-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Conversation
allennlp/__init__.py
Outdated
@@ -5,6 +5,11 @@ | |||
if sys.version_info < (3, 6): | |||
raise RuntimeError("AllenNLP requires Python 3.6 or later") | |||
|
|||
# Disable FutureWarnings raised by h5py | |||
# TODO(joelgrus): remove this (and pin requirements) when h5py 2.8.0 is available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will flatten the warnings that I added for the predictors - you can also use warnings
as a context manager, can we be smarter about this somehow?
h5py is only used in the following files, so maybe either adding context managers for the h5py import or importing h5py at the top level, such that this warning wouldn't be triggered repeatedly would be better?
allennlp/commands/elmo.py:The output is a hdf5 file (<http://docs.h5py.org/en/latest/>) where, with the --all flag, each
allennlp/commands/elmo.py:import h5py
allennlp/commands/elmo.py: with h5py.File(output_file_path, 'w') as fout:
allennlp/modules/elmo.py:import h5py
allennlp/modules/elmo.py: with h5py.File(cached_path(self._weight_file), 'r') as fin:
allennlp/modules/elmo.py: with h5py.File(cached_path(self._weight_file), 'r') as fin:
allennlp/modules/elmo.py: with h5py.File(cached_path(self._weight_file), 'r') as fin:
allennlp/modules/elmo.py: with h5py.File(cached_path(self._weight_file), 'r') as fin:
allennlp/modules/elmo_lstm.py:import h5py
allennlp/modules/elmo_lstm.py: with h5py.File(cached_path(weight_file), 'r') as fin:
allennlp/modules/token_embedders/embedding.py:import h5py
allennlp/modules/token_embedders/embedding.py: with h5py.File(embeddings_filename, 'r') as fin:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I should be able to just do it where h5py gets imported, I was being lazy
@@ -60,7 +65,7 @@ def main(prog: str = None, | |||
# so give the user some help. | |||
if 'func' in dir(args): | |||
# Import any additional modules needed (to register custom classes). | |||
for package_name in args.include_package: | |||
for package_name in getattr(args, 'include_package', ()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking if the name wasn't "configure" as you did above would be more explicit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's no "name" at this point, though. you'd have to do something like
if args.func.__name__ == "_configure"
which feels hackier than this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, yep my bad.
|
||
parts = full_path.split(".") | ||
class_name = parts[-1] | ||
module_name = ".".join(parts[:-1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will break if a user hasn't added imports of their classes to the __init__.py
of their module (I think). Is there any way we can be more robust to that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code assumes people always specify the full path:
allennlp.data.dataset_readers.snli.SnliReader
which is how they're exposed here -- that is, if you ask it for the possible types of dataset readers, it will return those paths rather than the ones one level above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right that is non-obvious. Why is this the case? It would be really annoying to have to write out the full path, when it isn't required to find the registerable class.
I would expect the API for allennlp components to be allennlp configure SnliReader
. I think this is possible if you just add back the "--include_package" flag and remove the importlib usage here, right? This seems better as it is 1) less verbose and 2) uses the package import method that the rest of the commands use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was imagining a workflow that looks something like this, where each time you're choosing one of the full paths from the previous step:
(allennlp) OS-XImage:allennlp joelg$ allennlp configure
configuration stub for AllenNLP:
{
"dataset_reader": <class 'allennlp.data.dataset_readers.dataset_reader.DatasetReader'> (configurable) // specify your dataset reader here
// "validation_dataset_reader": <class 'allennlp.data.dataset_readers.dataset_reader.DatasetReader'> (configurable) (default: None ) // same as dataset_reader by default
"train_data_path": <class 'str'> // path to the training data
// "validation_data_path": <class 'str'> (default: None ) // path to the validation data
// "test_data_path": <class 'str'> (default: None ) // path to the test data (you probably don't want to use this!)
// "evaluate_on_test": <class 'bool'> (default: False ) // whether to evaluate on the test dataset at the end of training (don't do it!
"model": <class 'allennlp.models.model.Model'> (configurable) // specify your model here
"iterator": <class 'allennlp.data.iterators.data_iterator.DataIterator'> (configurable) // specify your data iterator here
"trainer": <class 'allennlp.training.trainer.Trainer'> (configurable) // specify the trainer parameters here
// "datasets_for_vocab_creation": typing.List[str] (default: None ) // if not specified, use all datasets
// "vocabulary": <class 'allennlp.data.vocabulary.Vocabulary'> (configurable) (default: None ) // vocabulary options
}
(allennlp) OS-XImage:allennlp joelg$ allennlp configure allennlp.data.dataset_readers.dataset_reader.DatasetReader
DatasetReader is an abstract base class, choose one of the following subclasses:
allennlp.data.dataset_readers.conll2003.Conll2003DatasetReader
allennlp.data.dataset_readers.coreference_resolution.conll.ConllCorefReader
allennlp.data.dataset_readers.coreference_resolution.winobias.WinobiasReader
allennlp.data.dataset_readers.language_modeling.LanguageModelingReader
allennlp.data.dataset_readers.nlvr.NlvrDatasetReader
allennlp.data.dataset_readers.penn_tree_bank.PennTreeBankConstituencySpanDatasetReader
allennlp.data.dataset_readers.reading_comprehension.squad.SquadReader
allennlp.data.dataset_readers.reading_comprehension.triviaqa.TriviaQaReader
allennlp.data.dataset_readers.semantic_role_labeling.SrlReader
allennlp.data.dataset_readers.seq2seq.Seq2SeqDatasetReader
allennlp.data.dataset_readers.sequence_tagging.SequenceTaggingDatasetReader
allennlp.data.dataset_readers.snli.SnliReader
allennlp.data.dataset_readers.stanford_sentiment_tree_bank.StanfordSentimentTreeBankDatasetReader
allennlp.data.dataset_readers.wikitables.WikiTablesDatasetReader
(allennlp) OS-XImage:allennlp joelg$ allennlp configure allennlp.data.dataset_readers.conll2003.Conll2003DatasetReader
configuration stub for allennlp.data.dataset_readers.conll2003.Conll2003DatasetReader:
{
"type": "conll2003",
// "token_indexers": typing.Dict[str, allennlp.data.token_indexers.token_indexer.TokenIndexer] (default: None )
// "tag_label": <class 'str'> (default: ner )
// "feature_labels": typing.Sequence[str] (default: () )
// "lazy": <class 'bool'> (default: False )
// "coding_scheme": <class 'str'> (default: IOB1 )
}
I am hesitant about your suggestion because class names are not globally unique. They are (currently) in our codebase, but you could easily imagine that someone implements their own BasicIterator
or whatever, and then what does the configurator do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it would be that hard to either identify the duplication and ask the user to provide an additional path - e.g if you look at globals
or sys.modules or something, you could filter for allennlp and the name of any imported packages (if you pass the name to this function). Then you can build a dictionary of {module: List[classes]}
, and check this list to find the class, matching on the name. If you find duplicates, either throw an error or return both.
E.g:
import inspect
import allennlp
def get_classes(imported_module, module_to_class: Dict[str, List[class]]):
for name, obj in inspect.getmembers(imported_module):
if inspect.ismodule(obj):
get_classes(obj)
elif inspect.isclass(obj):
module_to_class[obj.__module__].append(obj)
class_mapping = defaultdict(list)
get_classes(allennlp, class_mapping)
And then do some kind of matching on first the name of the class, and then the name of the module or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But feel free to leave that for another PR/TODO/Issue - maybe it's more complicated to do the matching part.
|
||
return choices | ||
|
||
def configure(full_path: str = '') -> Union[Config, List[str]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be easy for this to support directories as well - it would be great if passing a directory contained within allennlp resulted in something like:
allennlp configure models
Found Registerable "Model" with subclasses:
SemanticRoleLabeller
CoreferenceResolver
Bidaf
...
I think this is easy to add - just check if the full path is a directory in allennlp and then list the Registerable classes and Types. Feel free to leave to another PR though.
|
||
parts = full_path.split(".") | ||
class_name = parts[-1] | ||
module_name = ".".join(parts[:-1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But feel free to leave that for another PR/TODO/Issue - maybe it's more complicated to do the matching part.
…into config-explorer
…into config-explorer
This reverts commit 03d6fad.
* config explorer * wip * wip * progress * work * config tool * configuration * configuration command line tool * add test for configure command * fix test collisions * fix docs * scope suppress FutureWarning to only h5py imports * fix import ordering * fix docs * fix docs (again)
I believe this is the right design to make it easy to add a Single-Page-App version.
I disabled the h5py warning because it was printing at the beginning of every configuration stub.
Let me know if you hate the 1337 variable names, it seemed like the easiest way to deal with
class
andtype
.