
[Tests] Local => aws #123

Merged · 3 commits · May 15, 2020

Conversation

Contributor

@patrickvonplaten commented May 15, 2020

Change the default tests from local => aws.

By default we set aws=True, local=False, slow=False.
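These defaults can be read from the environment with a small helper; a minimal sketch (the helper name and exact parsing semantics are illustrative, not necessarily the repo's actual utility):

```python
import os

def parse_flag_from_env(key, default=False):
    """Read a 0/1-style flag from the environment; fall back to `default`."""
    value = os.environ.get(key)
    if value is None:
        return default
    return value not in ("0", "false", "False")

# Defaults proposed in this PR: AWS tests on, local and slow tests off.
RUN_AWS = parse_flag_from_env("RUN_AWS", default=True)
RUN_LOCAL = parse_flag_from_env("RUN_LOCAL", default=False)
RUN_SLOW = parse_flag_from_env("RUN_SLOW", default=False)
```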

1. RUN_AWS=1 (default)

This runs 4 tests per dataset script.

a) Does the dataset script have a valid etag / Can it be reached on AWS?
b) Can we load its builder_class?
c) Can we load all dataset configs?
d) Most importantly: Can we load the dataset?

Important: to reduce test time, we currently test only the first config of each dataset. Total test time is around 1min20s.
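A rough sketch of how checks a)–d) can be gated behind RUN_AWS (stdlib unittest is used here for illustration; the class, method names, and bodies are hypothetical, not the PR's actual test code):

```python
import os
import unittest

RUN_AWS = os.environ.get("RUN_AWS", "1") == "1"  # AWS tests run by default

class AWSDatasetScriptTest(unittest.TestCase):
    @unittest.skipUnless(RUN_AWS, "set RUN_AWS=1 to enable AWS tests")
    def test_a_script_has_valid_etag(self):
        ...  # a) HEAD-request the script on AWS and check its etag

    @unittest.skipUnless(RUN_AWS, "set RUN_AWS=1 to enable AWS tests")
    def test_b_load_builder_class(self):
        ...  # b) import the dataset's builder class

    @unittest.skipUnless(RUN_AWS, "set RUN_AWS=1 to enable AWS tests")
    def test_c_load_all_configs(self):
        ...  # c) instantiate every dataset config

    @unittest.skipUnless(RUN_AWS, "set RUN_AWS=1 to enable AWS tests")
    def test_d_load_dataset(self):
        ...  # d) load the dataset itself (first config only, for speed)
```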

2. RUN_LOCAL=1 RUN_AWS=0

This should be done when debugging dataset scripts in the ./datasets folder.

This runs only 1 test per dataset script, equivalent to AWS test d): Can we load the dataset from the local ./datasets directory?

3. RUN_SLOW=1

We should set these tests up to run maybe once per week? @thomwolf

The slow tests include two more important tests.

e) Can we load the dataset with all possible configs? This test will probably fail at the moment because a lot of dummy data is missing. We should add the dummy data step by step to be sure that all configs work.

f) Test that the actual dataset can be loaded. This will take quite some time to run, but is important to make sure that the "real" data can be loaded. It will also test whether the dataset script has the correct checksums file, which is currently not tested with aws=True. @lhoestq - is there an easy way to cheaply check whether the dataset_infos.json is correct for each dataset script?

@patrickvonplaten changed the title [Tests] default to aws tests [Tests] Local => AWS May 15, 2020
@patrickvonplaten changed the title [Tests] Local => AWS [Tests] Local => aws May 15, 2020
@lhoestq
Member

lhoestq commented May 15, 2020

For each dataset, if there exists a dataset_infos.json, then the command nlp-cli test path/to/my/dataset --all_configs succeeds only if the dataset_infos.json is correct. The infos are correct if the sizes and checksums of the downloaded files are correct, and if the number of examples in each split is correct.

Note: the test command is supposed to test the script, that's why it runs the script even if the cached files already exist. Let me know if it's good for you.
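The check described above can be pictured with a small hypothetical helper (a sketch, not the nlp-cli implementation): the recorded infos pass only if the file size, the checksum, and the per-split example counts all match.

```python
import hashlib

def verify_dataset_infos(recorded, downloaded_bytes, examples_per_split):
    """Hypothetical sketch: return True iff the recorded infos match the
    downloaded data (size, checksum, and split example counts)."""
    if recorded["num_bytes"] != len(downloaded_bytes):
        return False
    if recorded["checksum"] != hashlib.sha256(downloaded_bytes).hexdigest():
        return False
    return recorded["splits"] == examples_per_split
```

In the real command, these values would come from dataset_infos.json on one side and from actually running the dataset script on the other.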

@patrickvonplaten
Contributor Author

> For each dataset, if there exists a dataset_infos.json, then the command nlp-cli test path/to/my/dataset --all_configs succeeds only if the dataset_infos.json is correct. The infos are correct if the sizes and checksums of the downloaded files are correct, and if the number of examples in each split is correct.
>
> Note: the test command is supposed to test the script, that's why it runs the script even if the cached files already exist. Let me know if it's good for you.

Does it have to download the whole dataset to check whether the checksums are correct? I guess so, no?

@patrickvonplaten merged commit 7d97b83 into master May 15, 2020
@patrickvonplaten deleted the release_aws_tests branch May 15, 2020 10:03
@lhoestq
Member

lhoestq commented May 15, 2020

> > For each dataset, if there exists a dataset_infos.json, then the command nlp-cli test path/to/my/dataset --all_configs succeeds only if the dataset_infos.json is correct. The infos are correct if the sizes and checksums of the downloaded files are correct, and if the number of examples in each split is correct.
> > Note: the test command is supposed to test the script, that's why it runs the script even if the cached files already exist.
>
> Does it have to download the whole dataset to check whether the checksums are correct? I guess so, no?

Yes, it has to download them all (unless they were already downloaded, in which case it just uses the cached files).
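Why the full download is needed: a checksum is computed over every byte of the file, so there is no cheap partial check. A minimal sketch (assuming SHA-256 over a stream of chunks; the helper name is illustrative):

```python
import hashlib

def sha256_of_stream(chunks):
    """Compute a SHA-256 digest over a stream of byte chunks. Every byte
    must be consumed, which is why the data has to be fully downloaded
    (or read back from the local download cache)."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()
```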
