Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unittest ops + bugfix in Bucketize #496

Merged
merged 4 commits into from
Dec 16, 2020

Conversation

bschifferer
Copy link
Contributor

@bschifferer bschifferer commented Dec 15, 2020

Updates unittest for ops

The unittest fails, when I call them at once, but if I call the failed unittest individually, they run successful

pytest tests/unit/test_ops.py
================================================= short test summary info =================================================
FAILED tests/unit/test_ops.py::test_hash_bucket_lists - concurrent.futures._base.CancelledError
FAILED tests/unit/test_ops.py::test_categorify_lists[0] - concurrent.futures._base.CancelledError
FAILED tests/unit/test_ops.py::test_categorify_lists[1] - concurrent.futures._base.CancelledError
FAILED tests/unit/test_ops.py::test_categorify_lists[2] - concurrent.futures._base.CancelledError
======================================= 4 failed, 153 passed, 3 warnings in 57.00s ========================================

(rapids) root@3afe3038599f:/workspace/01_NVT/16_NewAPI_unit/NVTabular# pytest tests/unit/test_ops.py::test_hash_bucket_lists
=================================================== test session starts ===================================================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /workspace/01_NVT/16_NewAPI_unit/NVTabular, configfile: setup.cfg
plugins: asyncio-0.12.0, benchmark-3.2.3, timeout-1.4.2, hypothesis-5.37.4, cov-2.10.1
collected 1 item

tests/unit/test_ops.py .                                                                                            [100%]

==================================================== 1 passed in 1.22s ====================================================
(rapids) root@3afe3038599f:/workspace/01_NVT/16_NewAPI_unit/NVTabular#

(rapids) root@3afe3038599f:/workspace/01_NVT/16_NewAPI_unit/NVTabular# pytest tests/unit/test_ops.py::test_categorify_lists=================================================== test session starts ===================================================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /workspace/01_NVT/16_NewAPI_unit/NVTabular, configfile: setup.cfg
plugins: asyncio-0.12.0, benchmark-3.2.3, timeout-1.4.2, hypothesis-5.37.4, cov-2.10.1
collected 3 items

tests/unit/test_ops.py ...                                                                                          [100%]

==================================================== 3 passed in 2.72s ====================================================
(rapids) root@3afe3038599f:/workspace/01_NVT/16_NewAPI_unit/NVTabular#

@bschifferer bschifferer mentioned this pull request Dec 15, 2020
38 tasks
@bschifferer bschifferer changed the title [WIP] Unittest ops + bugfix in Bucketize Unittest ops + bugfix in Bucketize Dec 16, 2020
@benfred benfred merged commit f216edf into NVIDIA-Merlin:new_api Dec 16, 2020
karlhigley added a commit that referenced this pull request Feb 8, 2022
* API Overhaul

First draft of the API overhauls changes. Adds most core functionality, including
defining workflow graphs with a ColumnGroup class, the workflow and dataset changes
, most operators converted to use the new api, etc.

* remove debug print statement

* Fix test_io unittest

Also partially fix some tests inside test_workflow

* Handle multi-column joint/combo categorify

* Update JoinGroupby

* Fix differencelag

* add dependencies method (#498)

* Convert TargetEncoding op

* Update nvtabular/workflow.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>

* Update nvtabular/workflow.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>

* Remove workflow code from dataloaders

We should be doing online transforms like
```KerasSequenceLoader(workflow.transform(dataset), ...```  instead of
```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now

* Unittest ops + bugfix in Bucketize (#496)

* test_minmix

* updates test

* unittest ops

* First draft get_embedding_sizes support

Re-add get_embedding_sizes . Note that this changes how we support multi-hot columns here
(sizes are returned same as single hot, and we don't use this method to distinguish between
multi and singlehot columns)

* isort

* Remove groupbystatistics

* implement serialization of statistics

add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting
called as appropiate

* Fix TF dataloader unittests

* test_torch_dataloader fixes

* doc strings

* support min

* permutate index

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
Co-authored-by: Karl Higley <kmhigley@gmail.com>
karlhigley added a commit that referenced this pull request Mar 18, 2022
* API Overhaul

First draft of the API overhauls changes. Adds most core functionality, including
defining workflow graphs with a ColumnGroup class, the workflow and dataset changes
, most operators converted to use the new api, etc.

* remove debug print statement

* Fix test_io unittest

Also partially fix some tests inside test_workflow

* Handle multi-column joint/combo categorify

* Update JoinGroupby

* Fix differencelag

* add dependencies method (#498)

* Convert TargetEncoding op

* Update nvtabular/workflow.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>

* Update nvtabular/workflow.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>

* Remove workflow code from dataloaders

We should be doing online transforms like
```KerasSequenceLoader(workflow.transform(dataset), ...```  instead of
```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now

* Unittest ops + bugfix in Bucketize (#496)

* test_minmix

* updates test

* unittest ops

* First draft get_embedding_sizes support

Re-add get_embedding_sizes . Note that this changes how we support multi-hot columns here
(sizes are returned same as single hot, and we don't use this method to distinguish between
multi and singlehot columns)

* isort

* Remove groupbystatistics

* implement serialization of statistics

add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting
called as appropiate

* Fix TF dataloader unittests

* test_torch_dataloader fixes

* doc strings

* fix tagas

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
Co-authored-by: Karl Higley <kmhigley@gmail.com>
karlhigley added a commit that referenced this pull request Mar 21, 2022
* API Overhaul

First draft of the API overhauls changes. Adds most core functionality, including
defining workflow graphs with a ColumnGroup class, the workflow and dataset changes
, most operators converted to use the new api, etc.

* remove debug print statement

* Fix test_io unittest

Also partially fix some tests inside test_workflow

* Handle multi-column joint/combo categorify

* Update JoinGroupby

* Fix differencelag

* add dependencies method (#498)

* Convert TargetEncoding op

* Update nvtabular/workflow.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>

* Update nvtabular/workflow.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>

* Remove workflow code from dataloaders

We should be doing online transforms like
```KerasSequenceLoader(workflow.transform(dataset), ...```  instead of
```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now

* Unittest ops + bugfix in Bucketize (#496)

* test_minmix

* updates test

* unittest ops

* First draft get_embedding_sizes support

Re-add get_embedding_sizes . Note that this changes how we support multi-hot columns here
(sizes are returned same as single hot, and we don't use this method to distinguish between
multi and singlehot columns)

* isort

* Remove groupbystatistics

* implement serialization of statistics

add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting
called as appropiate

* Fix TF dataloader unittests

* test_torch_dataloader fixes

* doc strings

* update download

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
Co-authored-by: Julio Perez <37191411+jperez999@users.noreply.github.com>
Co-authored-by: Karl Higley <kmhigley@gmail.com>
karlhigley added a commit that referenced this pull request Mar 23, 2022
* API Overhaul

First draft of the API overhauls changes. Adds most core functionality, including
defining workflow graphs with a ColumnGroup class, the workflow and dataset changes
, most operators converted to use the new api, etc.

* remove debug print statement

* Fix test_io unittest

Also partially fix some tests inside test_workflow

* Handle multi-column joint/combo categorify

* Update JoinGroupby

* Fix differencelag

* add dependencies method (#498)

* Convert TargetEncoding op

* Update nvtabular/workflow.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>

* Update nvtabular/workflow.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>

* Remove workflow code from dataloaders

We should be doing online transforms like
```KerasSequenceLoader(workflow.transform(dataset), ...```  instead of
```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now

* Unittest ops + bugfix in Bucketize (#496)

* test_minmix

* updates test

* unittest ops

* First draft get_embedding_sizes support

Re-add get_embedding_sizes . Note that this changes how we support multi-hot columns here
(sizes are returned same as single hot, and we don't use this method to distinguish between
multi and singlehot columns)

* isort

* Remove groupbystatistics

* implement serialization of statistics

add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting
called as appropiate

* Fix TF dataloader unittests

* test_torch_dataloader fixes

* doc strings

* aws sagemaker

* Update cloud_integration.md

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
Co-authored-by: Karl Higley <kmhigley@gmail.com>
karlhigley added a commit that referenced this pull request Mar 31, 2022
* API Overhaul

First draft of the API overhauls changes. Adds most core functionality, including
defining workflow graphs with a ColumnGroup class, the workflow and dataset changes
, most operators converted to use the new api, etc.

* remove debug print statement

* Fix test_io unittest

Also partially fix some tests inside test_workflow

* Handle multi-column joint/combo categorify

* Update JoinGroupby

* Fix differencelag

* add dependencies method (#498)

* Convert TargetEncoding op

* Update nvtabular/workflow.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>

* Update nvtabular/workflow.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>

* Remove workflow code from dataloaders

We should be doing online transforms like
```KerasSequenceLoader(workflow.transform(dataset), ...```  instead of
```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now

* Unittest ops + bugfix in Bucketize (#496)

* test_minmix

* updates test

* unittest ops

* First draft get_embedding_sizes support

Re-add get_embedding_sizes . Note that this changes how we support multi-hot columns here
(sizes are returned same as single hot, and we don't use this method to distinguish between
multi and singlehot columns)

* isort

* Remove groupbystatistics

* implement serialization of statistics

add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting
called as appropiate

* Fix TF dataloader unittests

* test_torch_dataloader fixes

* doc strings

* add TS

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
Co-authored-by: Karl Higley <kmhigley@gmail.com>
karlhigley added a commit that referenced this pull request Apr 5, 2022
* API Overhaul

First draft of the API overhauls changes. Adds most core functionality, including
defining workflow graphs with a ColumnGroup class, the workflow and dataset changes
, most operators converted to use the new api, etc.

* remove debug print statement

* Fix test_io unittest

Also partially fix some tests inside test_workflow

* Handle multi-column joint/combo categorify

* Update JoinGroupby

* Fix differencelag

* add dependencies method (#498)

* Convert TargetEncoding op

* Update nvtabular/workflow.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>

* Update nvtabular/workflow.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>

* Remove workflow code from dataloaders

We should be doing online transforms like
```KerasSequenceLoader(workflow.transform(dataset), ...```  instead of
```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now

* Unittest ops + bugfix in Bucketize (#496)

* test_minmix

* updates test

* unittest ops

* First draft get_embedding_sizes support

Re-add get_embedding_sizes . Note that this changes how we support multi-hot columns here
(sizes are returned same as single hot, and we don't use this method to distinguish between
multi and singlehot columns)

* isort

* Remove groupbystatistics

* implement serialization of statistics

add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting
called as appropiate

* Fix TF dataloader unittests

* test_torch_dataloader fixes

* doc strings

* criteo update

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
Co-authored-by: Karl Higley <kmhigley@gmail.com>
karlhigley pushed a commit that referenced this pull request Apr 22, 2022
* API Overhaul

First draft of the API overhauls changes. Adds most core functionality, including
defining workflow graphs with a ColumnGroup class, the workflow and dataset changes
, most operators converted to use the new api, etc.

* remove debug print statement

* Fix test_io unittest

Also partially fix some tests inside test_workflow

* Handle multi-column joint/combo categorify

* Update JoinGroupby

* Fix differencelag

* add dependencies method (#498)

* Convert TargetEncoding op

* Update nvtabular/workflow.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>

* Update nvtabular/workflow.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>

* Remove workflow code from dataloaders

We should be doing online transforms like
```KerasSequenceLoader(workflow.transform(dataset), ...```  instead of
```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now

* Unittest ops + bugfix in Bucketize (#496)

* test_minmix

* updates test

* unittest ops

* First draft get_embedding_sizes support

Re-add get_embedding_sizes . Note that this changes how we support multi-hot columns here
(sizes are returned same as single hot, and we don't use this method to distinguish between
multi and singlehot columns)

* isort

* Remove groupbystatistics

* implement serialization of statistics

add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting
called as appropiate

* Fix TF dataloader unittests

* test_torch_dataloader fixes

* doc strings

* add comma to ps.json

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
mikemckiernan pushed a commit that referenced this pull request Nov 24, 2022
* API Overhaul

First draft of the API overhauls changes. Adds most core functionality, including
defining workflow graphs with a ColumnGroup class, the workflow and dataset changes
, most operators converted to use the new api, etc.

* remove debug print statement

* Fix test_io unittest

Also partially fix some tests inside test_workflow

* Handle multi-column joint/combo categorify

* Update JoinGroupby

* Fix differencelag

* add dependencies method (#498)

* Convert TargetEncoding op

* Update nvtabular/workflow.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>

* Update nvtabular/workflow.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>

* Remove workflow code from dataloaders

We should be doing online transforms like
```KerasSequenceLoader(workflow.transform(dataset), ...```  instead of
```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now

* Unittest ops + bugfix in Bucketize (#496)

* test_minmix

* updates test

* unittest ops

* First draft get_embedding_sizes support

Re-add get_embedding_sizes . Note that this changes how we support multi-hot columns here
(sizes are returned same as single hot, and we don't use this method to distinguish between
multi and singlehot columns)

* isort

* Remove groupbystatistics

* implement serialization of statistics

add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting
called as appropiate

* Fix TF dataloader unittests

* test_torch_dataloader fixes

* doc strings

* support min

* permutate index

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
Co-authored-by: Karl Higley <kmhigley@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants