Unittest ops + bugfix in Bucketize #496

bschifferer · 2020-12-15T12:55:45Z

Updates the unit tests for ops.

The unit tests fail when I run the whole test file at once, but the tests that failed pass when I run them individually:

pytest tests/unit/test_ops.py
================================================= short test summary info =================================================
FAILED tests/unit/test_ops.py::test_hash_bucket_lists - concurrent.futures._base.CancelledError
FAILED tests/unit/test_ops.py::test_categorify_lists[0] - concurrent.futures._base.CancelledError
FAILED tests/unit/test_ops.py::test_categorify_lists[1] - concurrent.futures._base.CancelledError
FAILED tests/unit/test_ops.py::test_categorify_lists[2] - concurrent.futures._base.CancelledError
======================================= 4 failed, 153 passed, 3 warnings in 57.00s ========================================

(rapids) root@3afe3038599f:/workspace/01_NVT/16_NewAPI_unit/NVTabular# pytest tests/unit/test_ops.py::test_hash_bucket_lists
=================================================== test session starts ===================================================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /workspace/01_NVT/16_NewAPI_unit/NVTabular, configfile: setup.cfg
plugins: asyncio-0.12.0, benchmark-3.2.3, timeout-1.4.2, hypothesis-5.37.4, cov-2.10.1
collected 1 item

tests/unit/test_ops.py .                                                                                            [100%]

==================================================== 1 passed in 1.22s ====================================================
(rapids) root@3afe3038599f:/workspace/01_NVT/16_NewAPI_unit/NVTabular#

(rapids) root@3afe3038599f:/workspace/01_NVT/16_NewAPI_unit/NVTabular# pytest tests/unit/test_ops.py::test_categorify_lists
=================================================== test session starts ===================================================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /workspace/01_NVT/16_NewAPI_unit/NVTabular, configfile: setup.cfg
plugins: asyncio-0.12.0, benchmark-3.2.3, timeout-1.4.2, hypothesis-5.37.4, cov-2.10.1
collected 3 items

tests/unit/test_ops.py ...                                                                                          [100%]

==================================================== 3 passed in 2.72s ====================================================
(rapids) root@3afe3038599f:/workspace/01_NVT/16_NewAPI_unit/NVTabular#
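One way to narrow down an order-dependent failure like this is to pull the failed node IDs out of the short test summary and rerun each one in its own pytest process. The snippet below is only a minimal sketch of that idea: the helper names `failed_test_ids` and `isolation_commands` are hypothetical, and the only thing taken from the output above is the `FAILED <node id> - <error>` line format. It prints the commands rather than running them.

```python
import re
import shlex

# Matches lines such as:
#   FAILED tests/unit/test_ops.py::test_hash_bucket_lists - concurrent.futures._base.CancelledError
FAILED_LINE = re.compile(r"^FAILED\s+(?P<node>\S+)(?:\s+-\s+(?P<reason>.*))?$")


def failed_test_ids(summary_text):
    """Return the node IDs found on 'FAILED ...' lines of a pytest short summary."""
    ids = []
    for line in summary_text.splitlines():
        match = FAILED_LINE.match(line.strip())
        if match:
            ids.append(match.group("node"))
    return ids


def isolation_commands(node_ids):
    """Build one 'pytest <node id>' argument list per test, so each runs in a fresh process."""
    return [["pytest", node_id] for node_id in node_ids]


# Two of the summary lines from the full run, copied from the output above.
SUMMARY = """\
FAILED tests/unit/test_ops.py::test_hash_bucket_lists - concurrent.futures._base.CancelledError
FAILED tests/unit/test_ops.py::test_categorify_lists[0] - concurrent.futures._base.CancelledError
"""

for cmd in isolation_commands(failed_test_ids(SUMMARY)):
    # shlex.quote protects the '[0]' parametrization suffix from shell globbing.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

If every command passes in isolation while the full run still fails, that points at state shared between tests rather than at the individual ops, which matches the behavior shown in the logs above.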

* API Overhaul
  First draft of the API overhauls changes. Adds most core functionality, including defining workflow graphs with a ColumnGroup class, the workflow and dataset changes, most operators converted to use the new api, etc.
* remove debug print statement
* Fix test_io unittest
  Also partially fix some tests inside test_workflow
* Handle multi-column joint/combo categorify
* Update JoinGroupby
* Fix differencelag
* add dependencies method (#498)
* Convert TargetEncoding op
* Update nvtabular/workflow.py
  Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
* Update nvtabular/workflow.py
  Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
* Remove workflow code from dataloaders
  We should be doing online transforms like ```KerasSequenceLoader(workflow.transform(dataset), ...``` instead of ```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now
* Unittest ops + bugfix in Bucketize (#496)
* test_minmix
* updates test
* unittest ops
* First draft get_embedding_sizes support
  Re-add get_embedding_sizes. Note that this changes how we support multi-hot columns here (sizes are returned same as single hot, and we don't use this method to distinguish between multi and singlehot columns)
* isort
* Remove groupbystatistics
* implement serialization of statistics
  add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting called as appropiate
* Fix TF dataloader unittests
* test_torch_dataloader fixes
* doc strings
* support min
* permutate index

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
Co-authored-by: Karl Higley <kmhigley@gmail.com>

* API Overhaul
  First draft of the API overhauls changes. Adds most core functionality, including defining workflow graphs with a ColumnGroup class, the workflow and dataset changes, most operators converted to use the new api, etc.
* remove debug print statement
* Fix test_io unittest
  Also partially fix some tests inside test_workflow
* Handle multi-column joint/combo categorify
* Update JoinGroupby
* Fix differencelag
* add dependencies method (#498)
* Convert TargetEncoding op
* Update nvtabular/workflow.py
  Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
* Update nvtabular/workflow.py
  Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
* Remove workflow code from dataloaders
  We should be doing online transforms like ```KerasSequenceLoader(workflow.transform(dataset), ...``` instead of ```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now
* Unittest ops + bugfix in Bucketize (#496)
* test_minmix
* updates test
* unittest ops
* First draft get_embedding_sizes support
  Re-add get_embedding_sizes. Note that this changes how we support multi-hot columns here (sizes are returned same as single hot, and we don't use this method to distinguish between multi and singlehot columns)
* isort
* Remove groupbystatistics
* implement serialization of statistics
  add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting called as appropiate
* Fix TF dataloader unittests
* test_torch_dataloader fixes
* doc strings
* fix tagas

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
Co-authored-by: Karl Higley <kmhigley@gmail.com>

* API Overhaul
  First draft of the API overhauls changes. Adds most core functionality, including defining workflow graphs with a ColumnGroup class, the workflow and dataset changes, most operators converted to use the new api, etc.
* remove debug print statement
* Fix test_io unittest
  Also partially fix some tests inside test_workflow
* Handle multi-column joint/combo categorify
* Update JoinGroupby
* Fix differencelag
* add dependencies method (#498)
* Convert TargetEncoding op
* Update nvtabular/workflow.py
  Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
* Update nvtabular/workflow.py
  Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
* Remove workflow code from dataloaders
  We should be doing online transforms like ```KerasSequenceLoader(workflow.transform(dataset), ...``` instead of ```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now
* Unittest ops + bugfix in Bucketize (#496)
* test_minmix
* updates test
* unittest ops
* First draft get_embedding_sizes support
  Re-add get_embedding_sizes. Note that this changes how we support multi-hot columns here (sizes are returned same as single hot, and we don't use this method to distinguish between multi and singlehot columns)
* isort
* Remove groupbystatistics
* implement serialization of statistics
  add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting called as appropiate
* Fix TF dataloader unittests
* test_torch_dataloader fixes
* doc strings
* update download

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
Co-authored-by: Julio Perez <37191411+jperez999@users.noreply.github.com>
Co-authored-by: Karl Higley <kmhigley@gmail.com>

* API Overhaul
  First draft of the API overhauls changes. Adds most core functionality, including defining workflow graphs with a ColumnGroup class, the workflow and dataset changes, most operators converted to use the new api, etc.
* remove debug print statement
* Fix test_io unittest
  Also partially fix some tests inside test_workflow
* Handle multi-column joint/combo categorify
* Update JoinGroupby
* Fix differencelag
* add dependencies method (#498)
* Convert TargetEncoding op
* Update nvtabular/workflow.py
  Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
* Update nvtabular/workflow.py
  Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
* Remove workflow code from dataloaders
  We should be doing online transforms like ```KerasSequenceLoader(workflow.transform(dataset), ...``` instead of ```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now
* Unittest ops + bugfix in Bucketize (#496)
* test_minmix
* updates test
* unittest ops
* First draft get_embedding_sizes support
  Re-add get_embedding_sizes. Note that this changes how we support multi-hot columns here (sizes are returned same as single hot, and we don't use this method to distinguish between multi and singlehot columns)
* isort
* Remove groupbystatistics
* implement serialization of statistics
  add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting called as appropiate
* Fix TF dataloader unittests
* test_torch_dataloader fixes
* doc strings
* aws sagemaker
* Update cloud_integration.md

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
Co-authored-by: Karl Higley <kmhigley@gmail.com>

* API Overhaul
  First draft of the API overhauls changes. Adds most core functionality, including defining workflow graphs with a ColumnGroup class, the workflow and dataset changes, most operators converted to use the new api, etc.
* remove debug print statement
* Fix test_io unittest
  Also partially fix some tests inside test_workflow
* Handle multi-column joint/combo categorify
* Update JoinGroupby
* Fix differencelag
* add dependencies method (#498)
* Convert TargetEncoding op
* Update nvtabular/workflow.py
  Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
* Update nvtabular/workflow.py
  Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
* Remove workflow code from dataloaders
  We should be doing online transforms like ```KerasSequenceLoader(workflow.transform(dataset), ...``` instead of ```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now
* Unittest ops + bugfix in Bucketize (#496)
* test_minmix
* updates test
* unittest ops
* First draft get_embedding_sizes support
  Re-add get_embedding_sizes. Note that this changes how we support multi-hot columns here (sizes are returned same as single hot, and we don't use this method to distinguish between multi and singlehot columns)
* isort
* Remove groupbystatistics
* implement serialization of statistics
  add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting called as appropiate
* Fix TF dataloader unittests
* test_torch_dataloader fixes
* doc strings
* add TS

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
Co-authored-by: Karl Higley <kmhigley@gmail.com>

* API Overhaul
  First draft of the API overhauls changes. Adds most core functionality, including defining workflow graphs with a ColumnGroup class, the workflow and dataset changes, most operators converted to use the new api, etc.
* remove debug print statement
* Fix test_io unittest
  Also partially fix some tests inside test_workflow
* Handle multi-column joint/combo categorify
* Update JoinGroupby
* Fix differencelag
* add dependencies method (#498)
* Convert TargetEncoding op
* Update nvtabular/workflow.py
  Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
* Update nvtabular/workflow.py
  Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
* Remove workflow code from dataloaders
  We should be doing online transforms like ```KerasSequenceLoader(workflow.transform(dataset), ...``` instead of ```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now
* Unittest ops + bugfix in Bucketize (#496)
* test_minmix
* updates test
* unittest ops
* First draft get_embedding_sizes support
  Re-add get_embedding_sizes. Note that this changes how we support multi-hot columns here (sizes are returned same as single hot, and we don't use this method to distinguish between multi and singlehot columns)
* isort
* Remove groupbystatistics
* implement serialization of statistics
  add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting called as appropiate
* Fix TF dataloader unittests
* test_torch_dataloader fixes
* doc strings
* criteo update

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
Co-authored-by: Karl Higley <kmhigley@gmail.com>

* API Overhaul
  First draft of the API overhauls changes. Adds most core functionality, including defining workflow graphs with a ColumnGroup class, the workflow and dataset changes, most operators converted to use the new api, etc.
* remove debug print statement
* Fix test_io unittest
  Also partially fix some tests inside test_workflow
* Handle multi-column joint/combo categorify
* Update JoinGroupby
* Fix differencelag
* add dependencies method (#498)
* Convert TargetEncoding op
* Update nvtabular/workflow.py
  Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
* Update nvtabular/workflow.py
  Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
* Remove workflow code from dataloaders
  We should be doing online transforms like ```KerasSequenceLoader(workflow.transform(dataset), ...``` instead of ```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now
* Unittest ops + bugfix in Bucketize (#496)
* test_minmix
* updates test
* unittest ops
* First draft get_embedding_sizes support
  Re-add get_embedding_sizes. Note that this changes how we support multi-hot columns here (sizes are returned same as single hot, and we don't use this method to distinguish between multi and singlehot columns)
* isort
* Remove groupbystatistics
* implement serialization of statistics
  add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting called as appropiate
* Fix TF dataloader unittests
* test_torch_dataloader fixes
* doc strings
* add comma to ps.json

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>

bschifferer added 2 commits December 15, 2020 10:09

test_minmix

7d99e91

updates test

8db8ecb

bschifferer mentioned this pull request Dec 15, 2020

API Overhaul #491

Merged

38 tasks

bschifferer added 2 commits December 16, 2020 14:49

Merge branch 'new_api' of https://github.com/NVIDIA/NVTabular into ne…

cda2025

…w_api_testops

unittest ops

9f01392

bschifferer changed the title ~~[WIP] Unittest ops + bugfix in Bucketize~~ Unittest ops + bugfix in Bucketize Dec 16, 2020

benfred approved these changes Dec 16, 2020

benfred merged commit f216edf into NVIDIA-Merlin:new_api Dec 16, 2020
