command-line configuration explorer #1309

joelgrus · 2018-05-30T20:00:59Z

I believe this is the right design to make it easy to add a Single-Page-App version.

I disabled the h5py warning because it was printing at the beginning of every configuration stub.

Let me know if you hate the 1337 variable names, it seemed like the easiest way to deal with class and type.

DeNeutoy · 2018-05-31T16:52:09Z

allennlp/__init__.py

@@ -5,6 +5,11 @@
 if sys.version_info < (3, 6):
    raise RuntimeError("AllenNLP requires Python 3.6 or later")

+# Disable FutureWarnings raised by h5py
+# TODO(joelgrus): remove this (and pin requirements) when h5py 2.8.0 is available.


This will flatten the warnings that I added for the predictors - you can also use warnings as a context manager, can we be smarter about this somehow?

h5py is only used in the following files, so maybe either adding context managers for the h5py import or importing h5py at the top level, such that this warning wouldn't be triggered repeatedly would be better?

allennlp/commands/elmo.py:The output is a hdf5 file (<http://docs.h5py.org/en/latest/>) where, with the --all flag, each allennlp/commands/elmo.py:import h5py allennlp/commands/elmo.py: with h5py.File(output_file_path, 'w') as fout: allennlp/modules/elmo.py:import h5py allennlp/modules/elmo.py: with h5py.File(cached_path(self._weight_file), 'r') as fin: allennlp/modules/elmo.py: with h5py.File(cached_path(self._weight_file), 'r') as fin: allennlp/modules/elmo.py: with h5py.File(cached_path(self._weight_file), 'r') as fin: allennlp/modules/elmo.py: with h5py.File(cached_path(self._weight_file), 'r') as fin: allennlp/modules/elmo_lstm.py:import h5py allennlp/modules/elmo_lstm.py: with h5py.File(cached_path(weight_file), 'r') as fin: allennlp/modules/token_embedders/embedding.py:import h5py allennlp/modules/token_embedders/embedding.py: with h5py.File(embeddings_filename, 'r') as fin:

yeah, I should be able to just do it where h5py gets imported, I was being lazy

DeNeutoy · 2018-05-31T16:54:28Z

allennlp/commands/__init__.py

@@ -60,7 +65,7 @@ def main(prog: str = None,
    # so give the user some help.
    if 'func' in dir(args):
        # Import any additional modules needed (to register custom classes).
-        for package_name in args.include_package:
+        for package_name in getattr(args, 'include_package', ()):


Checking if the name wasn't "configure" as you did above would be more explicit

there's no "name" at this point, though. you'd have to do something like

if args.func.__name__ == "_configure"

which feels hackier than this

oh, yep my bad.

DeNeutoy · 2018-05-31T17:05:40Z

allennlp/common/configuration.py

+
+    parts = full_path.split(".")
+    class_name = parts[-1]
+    module_name = ".".join(parts[:-1])


This will break if a user hasn't added imports of their classes to the __init__.py of their module (I think). Is there any way we can be more robust to that?

the code assumes people always specify the full path:

allennlp.data.dataset_readers.snli.SnliReader

which is how they're exposed here -- that is, if you ask it for the possible types of dataset readers, it will return those paths rather than the ones one level above

Ah right that is non-obvious. Why is this the case? It would be really annoying to have to write out the full path, when it isn't required to find the registerable class.

I would expect the API for allennlp components to be allennlp configure SnliReader. I think this is possible if you just add back the "--include_package" flag and remove the importlib usage here, right? This seems better as it is 1) less verbose and 2) uses the package import method that the rest of the commands use.

I was imagining a workflow that looks something like this, where each time you're choosing one of the full paths from the previous step:

(allennlp) OS-XImage:allennlp joelg$ allennlp configure configuration stub for AllenNLP: { "dataset_reader": <class 'allennlp.data.dataset_readers.dataset_reader.DatasetReader'> (configurable) // specify your dataset reader here // "validation_dataset_reader": <class 'allennlp.data.dataset_readers.dataset_reader.DatasetReader'> (configurable) (default: None ) // same as dataset_reader by default "train_data_path": <class 'str'> // path to the training data // "validation_data_path": <class 'str'> (default: None ) // path to the validation data // "test_data_path": <class 'str'> (default: None ) // path to the test data (you probably don't want to use this!) // "evaluate_on_test": <class 'bool'> (default: False ) // whether to evaluate on the test dataset at the end of training (don't do it! "model": <class 'allennlp.models.model.Model'> (configurable) // specify your model here "iterator": <class 'allennlp.data.iterators.data_iterator.DataIterator'> (configurable) // specify your data iterator here "trainer": <class 'allennlp.training.trainer.Trainer'> (configurable) // specify the trainer parameters here // "datasets_for_vocab_creation": typing.List[str] (default: None ) // if not specified, use all datasets // "vocabulary": <class 'allennlp.data.vocabulary.Vocabulary'> (configurable) (default: None ) // vocabulary options } (allennlp) OS-XImage:allennlp joelg$ allennlp configure allennlp.data.dataset_readers.dataset_reader.DatasetReader DatasetReader is an abstract base class, choose one of the following subclasses: allennlp.data.dataset_readers.conll2003.Conll2003DatasetReader allennlp.data.dataset_readers.coreference_resolution.conll.ConllCorefReader allennlp.data.dataset_readers.coreference_resolution.winobias.WinobiasReader allennlp.data.dataset_readers.language_modeling.LanguageModelingReader allennlp.data.dataset_readers.nlvr.NlvrDatasetReader allennlp.data.dataset_readers.penn_tree_bank.PennTreeBankConstituencySpanDatasetReader allennlp.data.dataset_readers.reading_comprehension.squad.SquadReader allennlp.data.dataset_readers.reading_comprehension.triviaqa.TriviaQaReader allennlp.data.dataset_readers.semantic_role_labeling.SrlReader allennlp.data.dataset_readers.seq2seq.Seq2SeqDatasetReader allennlp.data.dataset_readers.sequence_tagging.SequenceTaggingDatasetReader allennlp.data.dataset_readers.snli.SnliReader allennlp.data.dataset_readers.stanford_sentiment_tree_bank.StanfordSentimentTreeBankDatasetReader allennlp.data.dataset_readers.wikitables.WikiTablesDatasetReader (allennlp) OS-XImage:allennlp joelg$ allennlp configure allennlp.data.dataset_readers.conll2003.Conll2003DatasetReader configuration stub for allennlp.data.dataset_readers.conll2003.Conll2003DatasetReader: { "type": "conll2003", // "token_indexers": typing.Dict[str, allennlp.data.token_indexers.token_indexer.TokenIndexer] (default: None ) // "tag_label": <class 'str'> (default: ner ) // "feature_labels": typing.Sequence[str] (default: () ) // "lazy": <class 'bool'> (default: False ) // "coding_scheme": <class 'str'> (default: IOB1 ) }

I am hesitant about your suggestion because class names are not globally unique. They are (currently) in our codebase, but you could easily imagine that someone implements their own BasicIterator or whatever, and then what does the configurator do?

I don't think it would be that hard to either identify the duplication and ask the user to provide an additional path - e.g if you look at globals or sys.modules or something, you could filter for allennlp and the name of any imported packages (if you pass the name to this function). Then you can build a dictionary of {module: List[classes]}, and check this list to find the class, matching on the name. If you find duplicates, either throw an error or return both.

E.g:

import inspect import allennlp def get_classes(imported_module, module_to_class: Dict[str, List[class]]): for name, obj in inspect.getmembers(imported_module): if inspect.ismodule(obj): get_classes(obj) elif inspect.isclass(obj): module_to_class[obj.__module__].append(obj) class_mapping = defaultdict(list) get_classes(allennlp, class_mapping)

And then do some kind of matching on first the name of the class, and then the name of the module or something.

But feel free to leave that for another PR/TODO/Issue - maybe it's more complicated to do the matching part.

DeNeutoy · 2018-05-31T17:09:35Z

allennlp/common/configuration.py

+
+    return choices
+
+def configure(full_path: str = '') -> Union[Config, List[str]]:


It would be easy for this to support directories as well - it would be great if passing a directory contained within allennlp resulted in something like:

allennlp configure models Found Registerable "Model" with subclasses: SemanticRoleLabeller CoreferenceResolver Bidaf ...

I think this is easy to add - just check if the full path is a directory in allennlp and then list the Registerable classes and Types. Feel free to leave to another PR though.

matt-gardner · 2018-05-31T18:49:25Z

@joelgrus, if you want me specifically to look at it, I'm happy to, but @DeNeutoy has already looked at it, and I'm also happy just leaving it to him.

…into config-explorer

DeNeutoy · 2018-05-31T20:32:06Z

allennlp/common/configuration.py

+
+    parts = full_path.split(".")
+    class_name = parts[-1]
+    module_name = ".".join(parts[:-1])


But feel free to leave that for another PR/TODO/Issue - maybe it's more complicated to do the matching part.

…into config-explorer

This reverts commit 03d6fad.

* config explorer * wip * wip * progress * work * config tool * configuration * configuration command line tool * add test for configure command * fix test collisions * fix docs * scope suppress FutureWarning to only h5py imports * fix import ordering * fix docs * fix docs (again)

joelgrus added 15 commits January 17, 2018 10:37

config explorer

fe51c1b

wip

6228659

wip

b65e46a

progress

89c405e

work

c8a3d14

Merge branch 'master' into config-explorer

71694cf

config tool

a8660fb

Merge branch 'master' into config-explorer

3b2ab16

Merge branch 'master' into config-explorer

ad613ab

configuration

e5061ce

Merge branch 'master' into config-explorer

eb2ca4e

configuration command line tool

d6d35ea

Merge branch 'master' into config-explorer

2848d8a

add test for configure command

917bcd4

fix test collisions

1cdd195

joelgrus requested a review from matt-gardner May 30, 2018 20:01

fix docs

7b68a9d

schmmd assigned matt-gardner May 31, 2018

DeNeutoy reviewed May 31, 2018

View reviewed changes

matt-gardner assigned DeNeutoy and unassigned matt-gardner May 31, 2018

scope suppress FutureWarning to only h5py imports

64f8af1

joelgrus added 2 commits May 31, 2018 11:55

Merge branch 'master' into config-explorer

d1f25a8

Merge branch 'config-explorer' of https://github.com/joelgrus/allennlp …

cd12c61

…into config-explorer

DeNeutoy approved these changes May 31, 2018

View reviewed changes

joelgrus added 4 commits May 31, 2018 14:40

fix import ordering

a046700

Merge branch 'master' into config-explorer

2501bf4

Merge branch 'master' into config-explorer

5031825

Merge branch 'master' into config-explorer

592be4b

joelgrus added 7 commits May 31, 2018 15:57

Merge branch 'master' into config-explorer

9705b32

fix docs

1c8c16c

Merge branch 'config-explorer' of https://github.com/joelgrus/allennlp …

c89671d

…into config-explorer

Merge branch 'master' into config-explorer

de682aa

fix docs (again)

473b325

Merge branch 'config-explorer' of https://github.com/joelgrus/allennlp …

c0cde6c

…into config-explorer

Merge branch 'master' into config-explorer

0ecf5be

joelgrus merged commit 03d6fad into allenai:master Jun 1, 2018

joelgrus deleted the config-explorer branch June 1, 2018 00:55

easonnie added a commit to easonnie/allennlp that referenced this pull request Jun 2, 2018

Revert "command-line configuration explorer (allenai#1309)"

e495235

This reverts commit 03d6fad.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

command-line configuration explorer #1309

command-line configuration explorer #1309

joelgrus commented May 30, 2018

DeNeutoy May 31, 2018

joelgrus May 31, 2018

DeNeutoy May 31, 2018

joelgrus May 31, 2018

DeNeutoy May 31, 2018

DeNeutoy May 31, 2018

joelgrus May 31, 2018

DeNeutoy May 31, 2018

joelgrus May 31, 2018

DeNeutoy May 31, 2018 •

edited

Loading

DeNeutoy May 31, 2018

DeNeutoy May 31, 2018

matt-gardner commented May 31, 2018

DeNeutoy May 31, 2018


		return choices

		def configure(full_path: str = '') -> Union[Config, List[str]]:

command-line configuration explorer #1309

command-line configuration explorer #1309

Conversation

joelgrus commented May 30, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

DeNeutoy May 31, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

matt-gardner commented May 31, 2018

Choose a reason for hiding this comment

DeNeutoy May 31, 2018 •

edited

Loading