add a dependencies method to LambdaOp #498

Merged: 1 commit, Dec 16, 2020

rnyak commented Dec 16, 2020

LambdaOp currently does not define its dependencies. This PR adds a `dependencies` method to LambdaOp.
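For readers unfamiliar with the idea, here is a minimal sketch of what a `dependencies` method on a lambda-style operator could look like. The `Operator` base class, the `dependency` constructor argument, and the `required_columns` helper are hypothetical stand-ins written for illustration; they are not taken from this PR's diff and may differ from the actual NVTabular code.

```python
from typing import Callable, List, Optional


class Operator:
    """Hypothetical stand-in for an operator base class (not NVTabular's)."""

    def dependencies(self) -> Optional[List[str]]:
        # By default an operator needs nothing beyond its input columns.
        return None


class LambdaOp(Operator):
    """Toy LambdaOp: wraps a user function and records extra columns it reads."""

    def __init__(self, f: Callable, dependency: Optional[List[str]] = None):
        self.f = f
        self.dependency = dependency  # assumed name, for illustration only

    def dependencies(self) -> Optional[List[str]]:
        # Report the extra columns so a graph builder can make them available.
        return self.dependency


def required_columns(op: Operator, inputs: List[str]) -> List[str]:
    """Columns that must exist before `op` runs: its inputs plus dependencies."""
    extra = op.dependencies() or []
    return inputs + [c for c in extra if c not in inputs]


# The lambda reads a second column ("total") that is not one of its inputs.
op = LambdaOp(lambda col, df: col / df["total"], dependency=["total"])
print(required_columns(op, ["price"]))  # ['price', 'total']
```

Without the method, a scheduler in this sketch would see only `price` and would have no way to know that `total` must also be present.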

@benfred benfred merged commit fd3f35a into NVIDIA-Merlin:new_api Dec 16, 2020
karlhigley added a commit that referenced this pull request Feb 8, 2022
* API Overhaul

First draft of the API overhaul changes. Adds most core functionality, including
defining workflow graphs with a ColumnGroup class, the workflow and dataset changes,
most operators converted to use the new API, etc.

* remove debug print statement

* Fix test_io unittest

Also partially fix some tests inside test_workflow

* Handle multi-column joint/combo categorify

* Update JoinGroupby

* Fix differencelag

* add dependencies method (#498)

* Convert TargetEncoding op

* Update nvtabular/workflow.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>

* Update nvtabular/workflow.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>

* Remove workflow code from dataloaders

We should now be doing online transforms like
`KerasSequenceLoader(workflow.transform(dataset), ...` instead of
`KerasSequenceLoader(dataset, workflows=[workflow], ...`

* Unittest ops + bugfix in Bucketize (#496)

* test_minmix

* updates test

* unittest ops

* First draft get_embedding_sizes support

Re-add get_embedding_sizes. Note that this changes how we support multi-hot columns here
(sizes are returned the same as for single-hot columns, and we don't use this method to
distinguish between multi-hot and single-hot columns)

* isort

* Remove groupbystatistics

* implement serialization of statistics

Add save_stats/load_stats/clear_stats methods to the workflow, with each statoperator getting
called as appropriate

* Fix TF dataloader unittests

* test_torch_dataloader fixes

* doc strings

* support min

* permutate index

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
Co-authored-by: Karl Higley <kmhigley@gmail.com>
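
The "Remove workflow code from dataloaders" item in the commit message above moves the transform step out of the loader and into the dataset passed to it. A rough sketch of that separation follows. The `Workflow` and `BatchLoader` classes here are hypothetical toys operating on plain Python dicts; they only illustrate the call shape and are not the NVTabular `Workflow` or `KerasSequenceLoader` implementations.

```python
from typing import Callable, Iterable, Iterator, List


class Workflow:
    """Hypothetical workflow: applies one function to each row lazily."""

    def __init__(self, fn: Callable[[dict], dict]):
        self.fn = fn

    def transform(self, dataset: Iterable[dict]) -> Iterator[dict]:
        # Generator: rows are transformed only when the loader pulls them.
        for row in dataset:
            yield self.fn(row)


class BatchLoader:
    """Hypothetical loader: knows nothing about workflows, only batches rows."""

    def __init__(self, dataset: Iterable[dict], batch_size: int):
        self.dataset = dataset
        self.batch_size = batch_size

    def __iter__(self) -> Iterator[List[dict]]:
        batch: List[dict] = []
        for row in self.dataset:
            batch.append(row)
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        if batch:  # emit the final partial batch
            yield batch


rows = [{"x": i} for i in range(5)]
workflow = Workflow(lambda row: {"x": row["x"] * 2})

# The loader receives an already-transformed (lazy) dataset,
# instead of a raw dataset plus a list of workflows.
batches = list(BatchLoader(workflow.transform(rows), batch_size=2))
print(len(batches))  # 3
```

In this arrangement the loader has a single responsibility (batching), and any object that yields rows can be handed to it.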
karlhigley added a commit that referenced this pull request Mar 18, 2022
* fix tagas

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
Co-authored-by: Karl Higley <kmhigley@gmail.com>
karlhigley added a commit that referenced this pull request Mar 21, 2022
* update download

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
Co-authored-by: Julio Perez <37191411+jperez999@users.noreply.github.com>
Co-authored-by: Karl Higley <kmhigley@gmail.com>
karlhigley added a commit that referenced this pull request Mar 23, 2022
* aws sagemaker

* Update cloud_integration.md

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
Co-authored-by: Karl Higley <kmhigley@gmail.com>
karlhigley added a commit that referenced this pull request Mar 31, 2022
* add TS

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
Co-authored-by: Karl Higley <kmhigley@gmail.com>
karlhigley added a commit that referenced this pull request Apr 5, 2022
* criteo update

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
Co-authored-by: Karl Higley <kmhigley@gmail.com>
karlhigley pushed a commit that referenced this pull request Apr 22, 2022
* add comma to ps.json

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
mikemckiernan pushed a commit that referenced this pull request Nov 24, 2022