Skip to content
This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

command-line configuration explorer #1309

Merged
merged 30 commits into from
Jun 1, 2018
Merged

Conversation

joelgrus
Copy link
Contributor

I believe this is the right design to make it easy to add a Single-Page-App version.

I disabled the h5py warning because it was printing at the beginning of every configuration stub.

Let me know if you hate the 1337 variable names, it seemed like the easiest way to deal with class and type.

@@ -5,6 +5,11 @@
if sys.version_info < (3, 6):
raise RuntimeError("AllenNLP requires Python 3.6 or later")

# Disable FutureWarnings raised by h5py
# TODO(joelgrus): remove this (and pin requirements) when h5py 2.8.0 is available.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will flatten the warnings that I added for the predictors - you can also use warnings as a context manager, can we be smarter about this somehow?

h5py is only used in the following files, so maybe either adding context managers for the h5py import or importing h5py at the top level, such that this warning wouldn't be triggered repeatedly would be better?

allennlp/commands/elmo.py:The output is a hdf5 file (<http://docs.h5py.org/en/latest/>) where, with the --all flag, each
allennlp/commands/elmo.py:import h5py
allennlp/commands/elmo.py:        with h5py.File(output_file_path, 'w') as fout:
allennlp/modules/elmo.py:import h5py
allennlp/modules/elmo.py:        with h5py.File(cached_path(self._weight_file), 'r') as fin:
allennlp/modules/elmo.py:            with h5py.File(cached_path(self._weight_file), 'r') as fin:
allennlp/modules/elmo.py:            with h5py.File(cached_path(self._weight_file), 'r') as fin:
allennlp/modules/elmo.py:        with h5py.File(cached_path(self._weight_file), 'r') as fin:
allennlp/modules/elmo_lstm.py:import h5py
allennlp/modules/elmo_lstm.py:        with h5py.File(cached_path(weight_file), 'r') as fin:
allennlp/modules/token_embedders/embedding.py:import h5py
allennlp/modules/token_embedders/embedding.py:    with h5py.File(embeddings_filename, 'r') as fin:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I should be able to just do it where h5py gets imported, I was being lazy

@@ -60,7 +65,7 @@ def main(prog: str = None,
# so give the user some help.
if 'func' in dir(args):
# Import any additional modules needed (to register custom classes).
for package_name in args.include_package:
for package_name in getattr(args, 'include_package', ()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking if the name wasn't "configure" as you did above would be more explicit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no "name" at this point, though. you'd have to do something like

if args.func.__name__ == "_configure"

which feels hackier than this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, yep my bad.


parts = full_path.split(".")
class_name = parts[-1]
module_name = ".".join(parts[:-1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break if a user hasn't added imports of their classes to the __init__.py of their module (I think). Is there any way we can be more robust to that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code assumes people always specify the full path:

allennlp.data.dataset_readers.snli.SnliReader

which is how they're exposed here -- that is, if you ask it for the possible types of dataset readers, it will return those paths rather than the ones one level above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right that is non-obvious. Why is this the case? It would be really annoying to have to write out the full path, when it isn't required to find the registerable class.

I would expect the API for allennlp components to be allennlp configure SnliReader. I think this is possible if you just add back the "--include_package" flag and remove the importlib usage here, right? This seems better as it is 1) less verbose and 2) uses the package import method that the rest of the commands use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was imagining a workflow that looks something like this, where each time you're choosing one of the full paths from the previous step:

(allennlp) OS-XImage:allennlp joelg$ allennlp configure

configuration stub for AllenNLP:

{
    "dataset_reader": <class 'allennlp.data.dataset_readers.dataset_reader.DatasetReader'> (configurable) // specify your dataset reader here
    // "validation_dataset_reader": <class 'allennlp.data.dataset_readers.dataset_reader.DatasetReader'> (configurable) (default: None ) // same as dataset_reader by default
    "train_data_path": <class 'str'> // path to the training data
    // "validation_data_path": <class 'str'> (default: None ) // path to the validation data
    // "test_data_path": <class 'str'> (default: None ) // path to the test data (you probably don't want to use this!)
    // "evaluate_on_test": <class 'bool'> (default: False ) // whether to evaluate on the test dataset at the end of training (don't do it!
    "model": <class 'allennlp.models.model.Model'> (configurable) // specify your model here
    "iterator": <class 'allennlp.data.iterators.data_iterator.DataIterator'> (configurable) // specify your data iterator here
    "trainer": <class 'allennlp.training.trainer.Trainer'> (configurable) // specify the trainer parameters here
    // "datasets_for_vocab_creation": typing.List[str] (default: None ) // if not specified, use all datasets
    // "vocabulary": <class 'allennlp.data.vocabulary.Vocabulary'> (configurable) (default: None ) // vocabulary options
}


(allennlp) OS-XImage:allennlp joelg$ allennlp configure allennlp.data.dataset_readers.dataset_reader.DatasetReader

DatasetReader is an abstract base class, choose one of the following subclasses:

	 allennlp.data.dataset_readers.conll2003.Conll2003DatasetReader
	 allennlp.data.dataset_readers.coreference_resolution.conll.ConllCorefReader
	 allennlp.data.dataset_readers.coreference_resolution.winobias.WinobiasReader
	 allennlp.data.dataset_readers.language_modeling.LanguageModelingReader
	 allennlp.data.dataset_readers.nlvr.NlvrDatasetReader
	 allennlp.data.dataset_readers.penn_tree_bank.PennTreeBankConstituencySpanDatasetReader
	 allennlp.data.dataset_readers.reading_comprehension.squad.SquadReader
	 allennlp.data.dataset_readers.reading_comprehension.triviaqa.TriviaQaReader
	 allennlp.data.dataset_readers.semantic_role_labeling.SrlReader
	 allennlp.data.dataset_readers.seq2seq.Seq2SeqDatasetReader
	 allennlp.data.dataset_readers.sequence_tagging.SequenceTaggingDatasetReader
	 allennlp.data.dataset_readers.snli.SnliReader
	 allennlp.data.dataset_readers.stanford_sentiment_tree_bank.StanfordSentimentTreeBankDatasetReader
	 allennlp.data.dataset_readers.wikitables.WikiTablesDatasetReader

(allennlp) OS-XImage:allennlp joelg$ allennlp configure allennlp.data.dataset_readers.conll2003.Conll2003DatasetReader

configuration stub for allennlp.data.dataset_readers.conll2003.Conll2003DatasetReader:

{
    "type": "conll2003",
    // "token_indexers": typing.Dict[str, allennlp.data.token_indexers.token_indexer.TokenIndexer] (default: None )
    // "tag_label": <class 'str'> (default: ner )
    // "feature_labels": typing.Sequence[str] (default: () )
    // "lazy": <class 'bool'> (default: False )
    // "coding_scheme": <class 'str'> (default: IOB1 )
}

I am hesitant about your suggestion because class names are not globally unique. They are (currently) in our codebase, but you could easily imagine that someone implements their own BasicIterator or whatever, and then what does the configurator do?

Copy link
Contributor

@DeNeutoy DeNeutoy May 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would be that hard to either identify the duplication and ask the user to provide an additional path - e.g if you look at globals or sys.modules or something, you could filter for allennlp and the name of any imported packages (if you pass the name to this function). Then you can build a dictionary of {module: List[classes]}, and check this list to find the class, matching on the name. If you find duplicates, either throw an error or return both.

E.g:

import inspect
import allennlp
def get_classes(imported_module, module_to_class: Dict[str, List[class]]):
    for name, obj in inspect.getmembers(imported_module):
        if inspect.ismodule(obj):
	        get_classes(obj)
	    elif inspect.isclass(obj):
            module_to_class[obj.__module__].append(obj)

class_mapping = defaultdict(list)
get_classes(allennlp, class_mapping)

And then do some kind of matching on first the name of the class, and then the name of the module or something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But feel free to leave that for another PR/TODO/Issue - maybe it's more complicated to do the matching part.


return choices

def configure(full_path: str = '') -> Union[Config, List[str]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be easy for this to support directories as well - it would be great if passing a directory contained within allennlp resulted in something like:

allennlp configure models

Found Registerable "Model" with subclasses:
SemanticRoleLabeller
CoreferenceResolver
Bidaf
...

I think this is easy to add - just check if the full path is a directory in allennlp and then list the Registerable classes and Types. Feel free to leave to another PR though.

@matt-gardner
Copy link
Contributor

@joelgrus, if you want me specifically to look at it, I'm happy to, but @DeNeutoy has already looked at it, and I'm also happy just leaving it to him.


parts = full_path.split(".")
class_name = parts[-1]
module_name = ".".join(parts[:-1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But feel free to leave that for another PR/TODO/Issue - maybe it's more complicated to do the matching part.

@joelgrus joelgrus merged commit 03d6fad into allenai:master Jun 1, 2018
@joelgrus joelgrus deleted the config-explorer branch June 1, 2018 00:55
easonnie added a commit to easonnie/allennlp that referenced this pull request Jun 2, 2018
gabrielStanovsky pushed a commit to gabrielStanovsky/allennlp that referenced this pull request Sep 7, 2018
* config explorer

* wip

* wip

* progress

* work

* config tool

* configuration

* configuration command line tool

* add test for configure command

* fix test collisions

* fix docs

* scope suppress FutureWarning to only h5py imports

* fix import ordering

* fix docs

* fix docs (again)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants