Change default test from local => aws
As a default we set `aws=True`, `local=False`, `slow=False`.
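The defaults above can be sketched as environment-variable flags. This is a minimal illustration only; the helper name `parse_flag` is an assumption, not the repo's actual code:

```python
import os

def parse_flag(name, default):
    # Hypothetical helper: read a 0/1 environment variable as a boolean,
    # falling back to the given default when the variable is unset.
    value = os.environ.get(name)
    if value is None:
        return default
    return value == "1"

# Defaults as described: aws on, local and slow off.
RUN_AWS = parse_flag("RUN_AWS", True)
RUN_LOCAL = parse_flag("RUN_LOCAL", False)
RUN_SLOW = parse_flag("RUN_SLOW", False)
```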
1. `RUN_AWS=1` (default)

This runs 4 tests per dataset script:

a) Does the dataset script have a valid etag / can it be reached on AWS?
b) Can we load its `builder_class`?
c) Can we load all dataset configs?
d) Most importantly: can we load the dataset?

Important - we currently only test the first config of each dataset to reduce test time. Total test time is around 1min20s.
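A minimal sketch of how this environment-variable gate could look with `unittest` (the decorator name `require_aws` and the test class are hypothetical, not the repo's actual test code):

```python
import os
import unittest

def require_aws(test_case):
    # Hypothetical decorator: run the test only when RUN_AWS=1,
    # which mirrors the default described above.
    if os.environ.get("RUN_AWS", "1") != "1":
        return unittest.skip("test requires RUN_AWS=1")(test_case)
    return test_case

class DatasetScriptTest(unittest.TestCase):
    @require_aws
    def test_load_builder_class(self):
        # Placeholder for check b): load the script's builder_class.
        self.assertTrue(True)
```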
2. `RUN_LOCAL=1 RUN_AWS=0`

This should be done when debugging dataset scripts in the ./datasets folder.

This only runs 1 test per dataset script, which is equivalent to aws test d): can we load the dataset from the local `datasets` directory?

3. `RUN_SLOW=1`
We should set up to run these tests maybe 1 time per week? @thomwolf

The `slow` tests include two more important tests:

e) Can we load the dataset with all possible configs? This test will probably fail at the moment because a lot of dummy data is missing. We should add the dummy data step by step to be sure that all configs work.

f) Test that the actual dataset can be loaded. This will take quite some time to run, but is important to make sure that the "real" data can be loaded. It will also test whether the dataset script has the correct checksums file, which is currently not tested with `aws=True`. @lhoestq - is there an easy way to check cheaply whether the `dataset_info.json` is correct for each dataset script?
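One cheap approach, sketched under assumptions (the flat `checksums` key and file layout here are hypothetical, not the actual `dataset_info.json` schema), would be to compare the recorded hashes against freshly computed ones for files that are already available locally:

```python
import hashlib
import json

def verify_checksums(info_path, data_files):
    # Hypothetical sketch: load recorded sha256 checksums from a JSON file
    # and compare them against freshly computed digests of local files.
    with open(info_path) as f:
        recorded = json.load(f)["checksums"]
    for path in data_files:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if recorded.get(path) != digest:
            return False
    return True
```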