add a dependencies method to LambdaOp #498

rnyak · 2020-12-16T00:05:52Z

Current lambdaop isn't defining dependencies. This PR is adding a dependencies method to lambdaop.

* API Overhaul First draft of the API overhauls changes. Adds most core functionality, including defining workflow graphs with a ColumnGroup class, the workflow and dataset changes , most operators converted to use the new api, etc. * remove debug print statement * Fix test_io unittest Also partially fix some tests inside test_workflow * Handle multi-column joint/combo categorify * Update JoinGroupby * Fix differencelag * add dependencies method (#498) * Convert TargetEncoding op * Update nvtabular/workflow.py Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> * Update nvtabular/workflow.py Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> * Remove workflow code from dataloaders We should be doing online transforms like ```KerasSequenceLoader(workflow.transform(dataset), ...``` instead of ```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now * Unittest ops + bugfix in Bucketize (#496) * test_minmix * updates test * unittest ops * First draft get_embedding_sizes support Re-add get_embedding_sizes . Note that this changes how we support multi-hot columns here (sizes are returned same as single hot, and we don't use this method to distinguish between multi and singlehot columns) * isort * Remove groupbystatistics * implement serialization of statistics add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting called as appropiate * Fix TF dataloader unittests * test_torch_dataloader fixes * doc strings * support min * permutate index Co-authored-by: Ben Frederickson <github@benfrederickson.com> Co-authored-by: rnyak <ronayak@hotmail.com> Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> Co-authored-by: root <root@dgx06.aselab.nvidia.com> Co-authored-by: Karl Higley <kmhigley@gmail.com>

* API Overhaul First draft of the API overhauls changes. Adds most core functionality, including defining workflow graphs with a ColumnGroup class, the workflow and dataset changes , most operators converted to use the new api, etc. * remove debug print statement * Fix test_io unittest Also partially fix some tests inside test_workflow * Handle multi-column joint/combo categorify * Update JoinGroupby * Fix differencelag * add dependencies method (#498) * Convert TargetEncoding op * Update nvtabular/workflow.py Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> * Update nvtabular/workflow.py Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> * Remove workflow code from dataloaders We should be doing online transforms like ```KerasSequenceLoader(workflow.transform(dataset), ...``` instead of ```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now * Unittest ops + bugfix in Bucketize (#496) * test_minmix * updates test * unittest ops * First draft get_embedding_sizes support Re-add get_embedding_sizes . Note that this changes how we support multi-hot columns here (sizes are returned same as single hot, and we don't use this method to distinguish between multi and singlehot columns) * isort * Remove groupbystatistics * implement serialization of statistics add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting called as appropiate * Fix TF dataloader unittests * test_torch_dataloader fixes * doc strings * fix tagas Co-authored-by: Ben Frederickson <github@benfrederickson.com> Co-authored-by: rnyak <ronayak@hotmail.com> Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> Co-authored-by: root <root@dgx06.aselab.nvidia.com> Co-authored-by: Karl Higley <kmhigley@gmail.com>

* API Overhaul First draft of the API overhauls changes. Adds most core functionality, including defining workflow graphs with a ColumnGroup class, the workflow and dataset changes , most operators converted to use the new api, etc. * remove debug print statement * Fix test_io unittest Also partially fix some tests inside test_workflow * Handle multi-column joint/combo categorify * Update JoinGroupby * Fix differencelag * add dependencies method (#498) * Convert TargetEncoding op * Update nvtabular/workflow.py Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> * Update nvtabular/workflow.py Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> * Remove workflow code from dataloaders We should be doing online transforms like ```KerasSequenceLoader(workflow.transform(dataset), ...``` instead of ```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now * Unittest ops + bugfix in Bucketize (#496) * test_minmix * updates test * unittest ops * First draft get_embedding_sizes support Re-add get_embedding_sizes . Note that this changes how we support multi-hot columns here (sizes are returned same as single hot, and we don't use this method to distinguish between multi and singlehot columns) * isort * Remove groupbystatistics * implement serialization of statistics add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting called as appropiate * Fix TF dataloader unittests * test_torch_dataloader fixes * doc strings * update download Co-authored-by: Ben Frederickson <github@benfrederickson.com> Co-authored-by: rnyak <ronayak@hotmail.com> Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> Co-authored-by: root <root@dgx06.aselab.nvidia.com> Co-authored-by: Julio Perez <37191411+jperez999@users.noreply.github.com> Co-authored-by: Karl Higley <kmhigley@gmail.com>

* API Overhaul First draft of the API overhauls changes. Adds most core functionality, including defining workflow graphs with a ColumnGroup class, the workflow and dataset changes , most operators converted to use the new api, etc. * remove debug print statement * Fix test_io unittest Also partially fix some tests inside test_workflow * Handle multi-column joint/combo categorify * Update JoinGroupby * Fix differencelag * add dependencies method (#498) * Convert TargetEncoding op * Update nvtabular/workflow.py Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> * Update nvtabular/workflow.py Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> * Remove workflow code from dataloaders We should be doing online transforms like ```KerasSequenceLoader(workflow.transform(dataset), ...``` instead of ```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now * Unittest ops + bugfix in Bucketize (#496) * test_minmix * updates test * unittest ops * First draft get_embedding_sizes support Re-add get_embedding_sizes . Note that this changes how we support multi-hot columns here (sizes are returned same as single hot, and we don't use this method to distinguish between multi and singlehot columns) * isort * Remove groupbystatistics * implement serialization of statistics add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting called as appropiate * Fix TF dataloader unittests * test_torch_dataloader fixes * doc strings * aws sagemaker * Update cloud_integration.md Co-authored-by: Ben Frederickson <github@benfrederickson.com> Co-authored-by: rnyak <ronayak@hotmail.com> Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> Co-authored-by: root <root@dgx06.aselab.nvidia.com> Co-authored-by: Karl Higley <kmhigley@gmail.com>

* API Overhaul First draft of the API overhauls changes. Adds most core functionality, including defining workflow graphs with a ColumnGroup class, the workflow and dataset changes , most operators converted to use the new api, etc. * remove debug print statement * Fix test_io unittest Also partially fix some tests inside test_workflow * Handle multi-column joint/combo categorify * Update JoinGroupby * Fix differencelag * add dependencies method (#498) * Convert TargetEncoding op * Update nvtabular/workflow.py Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> * Update nvtabular/workflow.py Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> * Remove workflow code from dataloaders We should be doing online transforms like ```KerasSequenceLoader(workflow.transform(dataset), ...``` instead of ```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now * Unittest ops + bugfix in Bucketize (#496) * test_minmix * updates test * unittest ops * First draft get_embedding_sizes support Re-add get_embedding_sizes . Note that this changes how we support multi-hot columns here (sizes are returned same as single hot, and we don't use this method to distinguish between multi and singlehot columns) * isort * Remove groupbystatistics * implement serialization of statistics add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting called as appropiate * Fix TF dataloader unittests * test_torch_dataloader fixes * doc strings * add TS Co-authored-by: Ben Frederickson <github@benfrederickson.com> Co-authored-by: rnyak <ronayak@hotmail.com> Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> Co-authored-by: root <root@dgx06.aselab.nvidia.com> Co-authored-by: Karl Higley <kmhigley@gmail.com>

* API Overhaul First draft of the API overhauls changes. Adds most core functionality, including defining workflow graphs with a ColumnGroup class, the workflow and dataset changes , most operators converted to use the new api, etc. * remove debug print statement * Fix test_io unittest Also partially fix some tests inside test_workflow * Handle multi-column joint/combo categorify * Update JoinGroupby * Fix differencelag * add dependencies method (#498) * Convert TargetEncoding op * Update nvtabular/workflow.py Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> * Update nvtabular/workflow.py Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> * Remove workflow code from dataloaders We should be doing online transforms like ```KerasSequenceLoader(workflow.transform(dataset), ...``` instead of ```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now * Unittest ops + bugfix in Bucketize (#496) * test_minmix * updates test * unittest ops * First draft get_embedding_sizes support Re-add get_embedding_sizes . Note that this changes how we support multi-hot columns here (sizes are returned same as single hot, and we don't use this method to distinguish between multi and singlehot columns) * isort * Remove groupbystatistics * implement serialization of statistics add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting called as appropiate * Fix TF dataloader unittests * test_torch_dataloader fixes * doc strings * criteo update Co-authored-by: Ben Frederickson <github@benfrederickson.com> Co-authored-by: rnyak <ronayak@hotmail.com> Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> Co-authored-by: root <root@dgx06.aselab.nvidia.com> Co-authored-by: Karl Higley <kmhigley@gmail.com>

* API Overhaul First draft of the API overhauls changes. Adds most core functionality, including defining workflow graphs with a ColumnGroup class, the workflow and dataset changes , most operators converted to use the new api, etc. * remove debug print statement * Fix test_io unittest Also partially fix some tests inside test_workflow * Handle multi-column joint/combo categorify * Update JoinGroupby * Fix differencelag * add dependencies method (#498) * Convert TargetEncoding op * Update nvtabular/workflow.py Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> * Update nvtabular/workflow.py Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> * Remove workflow code from dataloaders We should be doing online transforms like ```KerasSequenceLoader(workflow.transform(dataset), ...``` instead of ```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now * Unittest ops + bugfix in Bucketize (#496) * test_minmix * updates test * unittest ops * First draft get_embedding_sizes support Re-add get_embedding_sizes . Note that this changes how we support multi-hot columns here (sizes are returned same as single hot, and we don't use this method to distinguish between multi and singlehot columns) * isort * Remove groupbystatistics * implement serialization of statistics add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting called as appropiate * Fix TF dataloader unittests * test_torch_dataloader fixes * doc strings * add comma to ps.json Co-authored-by: Ben Frederickson <github@benfrederickson.com> Co-authored-by: rnyak <ronayak@hotmail.com> Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> Co-authored-by: root <root@dgx06.aselab.nvidia.com>

* API Overhaul First draft of the API overhauls changes. Adds most core functionality, including defining workflow graphs with a ColumnGroup class, the workflow and dataset changes , most operators converted to use the new api, etc. * remove debug print statement * Fix test_io unittest Also partially fix some tests inside test_workflow * Handle multi-column joint/combo categorify * Update JoinGroupby * Fix differencelag * add dependencies method (#498) * Convert TargetEncoding op * Update nvtabular/workflow.py Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> * Update nvtabular/workflow.py Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> * Remove workflow code from dataloaders We should be doing online transforms like ```KerasSequenceLoader(workflow.transform(dataset), ...``` instead of ```KerasSequenceLoader(dataset, workflows=[workflow], ...``` now * Unittest ops + bugfix in Bucketize (#496) * test_minmix * updates test * unittest ops * First draft get_embedding_sizes support Re-add get_embedding_sizes . Note that this changes how we support multi-hot columns here (sizes are returned same as single hot, and we don't use this method to distinguish between multi and singlehot columns) * isort * Remove groupbystatistics * implement serialization of statistics add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting called as appropiate * Fix TF dataloader unittests * test_torch_dataloader fixes * doc strings * support min * permutate index Co-authored-by: Ben Frederickson <github@benfrederickson.com> Co-authored-by: rnyak <ronayak@hotmail.com> Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> Co-authored-by: root <root@dgx06.aselab.nvidia.com> Co-authored-by: Karl Higley <kmhigley@gmail.com>

add dependencies method

ff58949

benfred approved these changes Dec 16, 2020

View reviewed changes

benfred merged commit fd3f35a into NVIDIA-Merlin:new_api Dec 16, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add a dependencies method to LambdaOp #498

add a dependencies method to LambdaOp #498

rnyak commented Dec 16, 2020

add a dependencies method to LambdaOp #498

add a dependencies method to LambdaOp #498

Conversation

rnyak commented Dec 16, 2020