Fix bug about criteo download notebook (#1453)
* API Overhaul

First draft of the API overhaul changes. Adds most core functionality, including
defining workflow graphs with a ColumnGroup class, the workflow and dataset changes,
most operators converted to use the new API, etc.

* remove debug print statement

* Fix test_io unittest

Also partially fix some tests inside test_workflow

* Handle multi-column joint/combo categorify

* Update JoinGroupby

* Fix differencelag

* add dependencies method (#498)

* Convert TargetEncoding op

* Update nvtabular/workflow.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>

* Update nvtabular/workflow.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>

* Remove workflow code from dataloaders

We should now be doing online transforms like
`KerasSequenceLoader(workflow.transform(dataset), ...` instead of
`KerasSequenceLoader(dataset, workflows=[workflow], ...`

* Unittest ops + bugfix in Bucketize (#496)

* test_minmix

* updates test

* unittest ops

* First draft get_embedding_sizes support

Re-add get_embedding_sizes. Note that this changes how we support multi-hot columns here
(sizes are returned the same as for single-hot, and we don't use this method to distinguish
between multi-hot and single-hot columns)

* isort

* Remove groupbystatistics

* implement serialization of statistics

Add save_stats/load_stats/clear_stats methods to the workflow, with each StatOperator getting
called as appropriate

* Fix TF dataloader unittests

* test_torch_dataloader fixes

* doc strings

* update download

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: root <root@dgx06.aselab.nvidia.com>
Co-authored-by: Julio Perez <37191411+jperez999@users.noreply.github.com>
Co-authored-by: Karl Higley <kmhigley@gmail.com>
7 people authored Mar 21, 2022
1 parent cac92f8 commit 4bb265c
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions examples/scaling-criteo/01-Download-Convert.ipynb
@@ -232,14 +232,15 @@
" dtypes[x] = \"hex\"\n",
"\n",
"# Create an NVTabular Dataset from a CSV-file glob\n",
-"file_list = glob.glob(os.path.join(INPUT_PATH, \"day_*\"))\n",
+"file_list = glob.glob(os.path.join(INPUT_PATH, \"day_*[!.gz]\"))\n",
"dataset = nvt.Dataset(\n",
" file_list,\n",
" engine=\"csv\",\n",
" names=cols,\n",
" part_mem_fraction=frac_size,\n",
" sep=\"\\t\",\n",
" dtypes=dtypes,\n",
+" client=client,\n",
")"
]
},
@@ -300,4 +301,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
-}
+}
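
Note on the new glob pattern: in Python's `glob`/`fnmatch` syntax, `[!.gz]` is a character class that matches one character that is not `.`, `g`, or `z`. It is not a test for the `.gz` suffix. The sketch below compares the pattern with an explicit suffix check. The file names are hypothetical, chosen only for illustration, and the suffix-based filter is an alternative shown for comparison, not what the notebook does.

```python
import fnmatch

# Hypothetical file names, for illustration only (not taken from the dataset).
names = ["day_0", "day_1", "day_0.gz", "day_1.gz", "day_2z", "day_10"]

# Pattern used in the commit. glob.glob applies the same fnmatch rules to
# file names, so the last character must not be '.', 'g', or 'z'.
pattern_matches = fnmatch.filter(names, "day_*[!.gz]")

# Alternative (an assumption, not the notebook's code): match every day_*
# file and drop only those that really end in ".gz".
suffix_matches = [n for n in fnmatch.filter(names, "day_*") if not n.endswith(".gz")]

print(pattern_matches)
print(suffix_matches)
```

For files named `day_0`, `day_1`, and so on, both approaches skip the compressed `.gz` archives. They differ only for names that happen to end in `.`, `g`, or `z`, such as the made-up `day_2z` above.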
