
[REVIEW] Creating dedicated loader submodule to build TF async dataloader #224

Merged
merged 114 commits on Aug 27, 2020

Conversation

alecgunny
Contributor

Addressing #218
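For context while reading the CI logs that follow, the new nvtabular.loader submodule is exercised along these lines in the unit tests (a minimal sketch adapted from the test_tf_gpu_dl test quoted later in this thread; the parquet paths, batch size, and the nvtabular.io.Dataset construction are illustrative assumptions, not part of this PR's public API contract):

import nvtabular as nvt
import nvtabular.ops as ops
from nvtabular.loader import tensorflow as tf_dataloader

# column roles mirror the unit test quoted in the CI logs below
cat_names = ["name-string", "name-cat"]
cont_names = ["x", "y", "id"]
label_name = ["label"]

processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
processor.add_feature([ops.FillMedian()])
processor.add_preprocess(ops.Normalize())
processor.add_preprocess(ops.Categorify())
processor.finalize()

paths = ["dataset-0.parquet", "dataset-1.parquet"]          # illustrative paths
dataset = nvt.io.Dataset(paths, engine="parquet")           # assumed constructor

# KerasSequenceLoader is the new TF async dataloader: it buffers chunks of the
# parquet data on the GPU in the background and yields (features, labels) batches.
data_itr = tf_dataloader.KerasSequenceLoader(
    paths,                    # file paths or an nvtabular.io.Dataset
    cat_names=cat_names,
    cont_names=cont_names,
    batch_size=64,
    buffer_size=0.06,         # fraction of GPU memory to buffer per chunk
    label_names=label_name,
    engine="parquet",
    shuffle=False,
)
processor.update_stats(dataset)   # fit workflow statistics, as in the unit test
data_itr.map(processor)           # apply the fitted workflow on the fly

for idx in range(len(data_itr)):
    X, y = next(data_itr)         # X: dict of column tensors, y: label tensor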

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 59e5a450d012fe6afb52cc9571da3be151ff0297, no merge conflicts.
Running as SYSTEM
Setting status of 59e5a450d012fe6afb52cc9571da3be151ff0297 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/547/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 59e5a450d012fe6afb52cc9571da3be151ff0297^{commit} # timeout=10
Checking out Revision 59e5a450d012fe6afb52cc9571da3be151ff0297 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 59e5a450d012fe6afb52cc9571da3be151ff0297 # timeout=10
Commit message: "cleaning up and blackening"
 > git rev-list --no-walk befdbecfce99b23272ecd4dc742294cec06cd250 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins303756966890646663.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
32 files would be left unchanged.
./nvtabular/loader/tensorflow.py:2:1: F401 'warnings' imported but unused
./nvtabular/loader/tensorflow.py:7:1: F401 '..io._shuffle_gdf' imported but unused
./nvtabular/loader/tensorflow.py:7:1: F401 '..io.device_mem_size' imported but unused
./nvtabular/loader/tensorflow.py:8:1: F401 '..workflow.BaseWorkflow' imported but unused
./nvtabular/loader/backend.py:1:1: F401 'math' imported but unused
./nvtabular/loader/backend.py:9:1: F401 'nvtabular.ops._get_embedding_order' imported but unused
./nvtabular/loader/backend.py:52:15: F821 undefined name 'torch'
./nvtabular/loader/backend.py:52:54: F821 undefined name 'torch'
./nvtabular/loader/backend.py:122:26: F821 undefined name 'workflows'
./nvtabular/loader/tf_utils.py:7:23: F821 undefined name 'device_mem_size'
./nvtabular/loader/tf_utils.py:9:29: F821 undefined name 'os'
./nvtabular/loader/tf_utils.py:19:18: F821 undefined name 'os'
./nvtabular/loader/tf_utils.py:26:9: F821 undefined name 'warnings'
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins6245704696221640167.sh
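(Aside on the flake8 failures in the log above: F401 flags an import that is never used, and F821 flags a name that is referenced but never defined, the usual fallout when code is split into a new submodule and the imports do not move with it. A contrived sketch of both patterns, not the actual NVTabular code:)

import warnings                       # F401: 'warnings' imported but unused

def gpu_visible_devices():
    # F821: 'os' is undefined here because the import was left behind
    return os.environ.get("CUDA_VISIBLE_DEVICES")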

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 16f9d754f6530865c3c8734fa6347c59ee1645f5, no merge conflicts.
Running as SYSTEM
Setting status of 16f9d754f6530865c3c8734fa6347c59ee1645f5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/549/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 16f9d754f6530865c3c8734fa6347c59ee1645f5^{commit} # timeout=10
Checking out Revision 16f9d754f6530865c3c8734fa6347c59ee1645f5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 16f9d754f6530865c3c8734fa6347c59ee1645f5 # timeout=10
Commit message: "finished separating loader code"
 > git rev-list --no-walk ad99e5becb61d8b246a5bd652700c9c8059d4d0b # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins7354477975365041431.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
31 files would be left unchanged.
./nvtabular/loader/tensorflow.py:20:1: F401 'tensorflow.python.feature_column.feature_column_v2 as fc' imported but unused
./nvtabular/loader/torch.py:44:16: F821 undefined name 'TorchTensorBatchDatasetItr'
./nvtabular/loader/tf_utils.py:10:29: F821 undefined name 'os'
./nvtabular/loader/tf_utils.py:20:18: F821 undefined name 'os'
./nvtabular/loader/tf_utils.py:27:9: F821 undefined name 'warnings'
./nvtabular/loader/tf_utils.py:68:19: F821 undefined name 'columns'
./nvtabular/loader/tf_utils.py:73:31: F821 undefined name 'fc'
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins414755964881219107.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit bb63cc5fb5a4817a5dcc96f394b84a81f01d886b, no merge conflicts.
Running as SYSTEM
Setting status of bb63cc5fb5a4817a5dcc96f394b84a81f01d886b to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/670/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse bb63cc5fb5a4817a5dcc96f394b84a81f01d886b^{commit} # timeout=10
Checking out Revision bb63cc5fb5a4817a5dcc96f394b84a81f01d886b (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f bb63cc5fb5a4817a5dcc96f394b84a81f01d886b # timeout=10
Commit message: "adding PARTS_PER_CHUNK to criteo example"
 > git rev-list --no-walk f272dc9c302d25f7025b3d97513860fd4c95f299 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins4374779857816902899.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
31 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.4, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: hypothesis-5.28.0, forked-1.3.0, xdist-2.1.0, cov-2.10.1
collected 419 items / 1 skipped / 418 selected

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 14%]
tests/unit/test_io.py .................................................. [ 26%]
............................ [ 32%]
tests/unit/test_notebooks.py F.s. [ 33%]
tests/unit/test_ops.py ................................................. [ 45%]
........................................................................ [ 62%]
........................ [ 68%]
tests/unit/test_tf_dataloader.py FFFFFFFFFFFF [ 71%]
tests/unit/test_torch_dataloader.py ......FFFFFFFFFFFFF
Build timed out (after 15 minutes). Marking the build as failed.
Build was aborted
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins7800292252871003934.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 716825debda4ed32108942f20ac45b1baa7fcbea, no merge conflicts.
Running as SYSTEM
Setting status of 716825debda4ed32108942f20ac45b1baa7fcbea to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/671/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 716825debda4ed32108942f20ac45b1baa7fcbea^{commit} # timeout=10
Checking out Revision 716825debda4ed32108942f20ac45b1baa7fcbea (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 716825debda4ed32108942f20ac45b1baa7fcbea # timeout=10
Commit message: "fixing tf unit tests"
 > git rev-list --no-walk bb63cc5fb5a4817a5dcc96f394b84a81f01d886b # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins4548874604139810294.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
would reformat /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/tensorflow.py
Oh no! 💥 💔 💥
1 file would be reformatted, 30 files would be left unchanged.
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins320685992468812347.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 056155ef1c13b9d976a86ce2d86cbe7a3607ed38, no merge conflicts.
Running as SYSTEM
Setting status of 056155ef1c13b9d976a86ce2d86cbe7a3607ed38 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/672/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 056155ef1c13b9d976a86ce2d86cbe7a3607ed38^{commit} # timeout=10
Checking out Revision 056155ef1c13b9d976a86ce2d86cbe7a3607ed38 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 056155ef1c13b9d976a86ce2d86cbe7a3607ed38 # timeout=10
Commit message: "blackening"
 > git rev-list --no-walk 716825debda4ed32108942f20ac45b1baa7fcbea # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins3255953055005442194.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
31 files would be left unchanged.
./nvtabular/loader/tf_utils.py:31:24: F821 undefined name 'tf_device'
./nvtabular/loader/tf_utils.py:31:87: F821 undefined name 'tf_mem_size'
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins1468419213717319336.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 9c82df5f68c2605773fa3b744ca8e67cf4594947, no merge conflicts.
Running as SYSTEM
Setting status of 9c82df5f68c2605773fa3b744ca8e67cf4594947 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/673/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 9c82df5f68c2605773fa3b744ca8e67cf4594947^{commit} # timeout=10
Checking out Revision 9c82df5f68c2605773fa3b744ca8e67cf4594947 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 9c82df5f68c2605773fa3b744ca8e67cf4594947 # timeout=10
Commit message: "fixed tf_util bug"
 > git rev-list --no-walk 056155ef1c13b9d976a86ce2d86cbe7a3607ed38 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7392629584272941548.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
31 files would be left unchanged.
./nvtabular/loader/tf_utils.py:31:84: F821 undefined name 'tf_mem_size'
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins996061379472624272.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 933b71900713e90a0044cf0d294ef3de24067712, no merge conflicts.
Running as SYSTEM
Setting status of 933b71900713e90a0044cf0d294ef3de24067712 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/674/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 933b71900713e90a0044cf0d294ef3de24067712^{commit} # timeout=10
Checking out Revision 933b71900713e90a0044cf0d294ef3de24067712 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 933b71900713e90a0044cf0d294ef3de24067712 # timeout=10
Commit message: "fixing tf_utils bug"
 > git rev-list --no-walk 9c82df5f68c2605773fa3b744ca8e67cf4594947 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins6709133231254430996.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
would reformat /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/tf_utils.py
Oh no! 💥 💔 💥
1 file would be reformatted, 30 files would be left unchanged.
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins8185771004694239439.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 3db05b05683d1af460215eca87b707b6ccfb7bbf, no merge conflicts.
Running as SYSTEM
Setting status of 3db05b05683d1af460215eca87b707b6ccfb7bbf to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/675/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 3db05b05683d1af460215eca87b707b6ccfb7bbf^{commit} # timeout=10
Checking out Revision 3db05b05683d1af460215eca87b707b6ccfb7bbf (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 3db05b05683d1af460215eca87b707b6ccfb7bbf # timeout=10
Commit message: "blackening"
 > git rev-list --no-walk 933b71900713e90a0044cf0d294ef3de24067712 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins8254015544088330717.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
31 files would be left unchanged.
./nvtabular/loader/tf_utils.py:32:64: F821 undefined name 'memory_allcation'
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins8077269931564506719.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 9458a8d241f925c7c38a438af3ed2c17334f9ad8, no merge conflicts.
Running as SYSTEM
Setting status of 9458a8d241f925c7c38a438af3ed2c17334f9ad8 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/676/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 9458a8d241f925c7c38a438af3ed2c17334f9ad8^{commit} # timeout=10
Checking out Revision 9458a8d241f925c7c38a438af3ed2c17334f9ad8 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 9458a8d241f925c7c38a438af3ed2c17334f9ad8 # timeout=10
Commit message: "blackening"
 > git rev-list --no-walk 3db05b05683d1af460215eca87b707b6ccfb7bbf # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins839065965201183044.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
31 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.4, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: hypothesis-5.28.0, forked-1.3.0, xdist-2.1.0, cov-2.10.1
collected 419 items / 1 skipped / 418 selected

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 14%]
tests/unit/test_io.py .................................................. [ 26%]
............................ [ 32%]
tests/unit/test_notebooks.py F.F. [ 33%]
tests/unit/test_ops.py ................................................. [ 45%]
........................................................................ [ 62%]
........................ [ 68%]
tests/unit/test_tf_dataloader.py FFFFFFFFFFFF [ 71%]
tests/unit/test_torch_dataloader.py ......FFFFFFFFFFFFFFF [ 76%]
tests/unit/test_workflow.py ............................................ [ 86%]
....................................................... [100%]

=================================== FAILURES ===================================
_____________________________ test_criteo_notebook _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_criteo_notebook0')

def test_criteo_notebook(tmpdir):
    # create a toy dataset in tmpdir, and point environment variables so the notebook
    # will read from it
    for i in range(24):
        df = _get_random_criteo_data(1000)
        df.to_parquet(os.path.join(tmpdir, f"day_{i}.parquet"))
    os.environ["INPUT_DATA_DIR"] = str(tmpdir)
    os.environ["OUTPUT_DATA_DIR"] = str(tmpdir)

    _run_notebook(
        tmpdir,
        os.path.join(dirname(TEST_PATH), "examples", "criteo-example.ipynb"),
        # disable rmm.reinitialize, seems to be causing issues
      transform=lambda line: line.replace("rmm.reinitialize(", "# rmm.reinitialize("),
    )

tests/unit/test_notebooks.py:29:


tests/unit/test_notebooks.py:92: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/opt/conda/lib/python3.7/subprocess.py:395: in check_output
**kwargs).stdout


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/opt/conda/bin/python', '/tmp/pytest-of-jenkins/pytest-13/test_criteo_notebook0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f577e07c7d0>
stdout = b'\xe2\x96\x88\repoch train_loss valid_loss accuracy time \n\xe2\x96\x88\r'
stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired:
            process.kill()
            stdout, stderr = process.communicate()
            raise TimeoutExpired(process.args, timeout, output=stdout,
                                 stderr=stderr)
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
            raise CalledProcessError(retcode, process.args,
                                   output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '/tmp/pytest-of-jenkins/pytest-13/test_criteo_notebook0/notebook.py']' returned non-zero exit status 1.

/opt/conda/lib/python3.7/subprocess.py:487: CalledProcessError
----------------------------- Captured stderr call -----------------------------
/opt/conda/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
/opt/conda/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-13/test_criteo_notebook0/notebook.py", line 90, in
learn.fit_one_cycle(epochs, learning_rate)
File "/opt/conda/lib/python3.7/site-packages/fastai/train.py", line 23, in fit_one_cycle
learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
File "/opt/conda/lib/python3.7/site-packages/fastai/basic_train.py", line 200, in fit
fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
File "/opt/conda/lib/python3.7/site-packages/fastai/basic_train.py", line 99, in fit
for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
File "/opt/conda/lib/python3.7/site-packages/fastprogress/fastprogress.py", line 47, in iter
raise e
File "/opt/conda/lib/python3.7/site-packages/fastprogress/fastprogress.py", line 41, in iter
for i,o in enumerate(self.gen):
File "/opt/conda/lib/python3.7/site-packages/fastai/basic_data.py", line 75, in iter
for b in self.dl: yield self.proc_batch(b)
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in next
data = self._next_data()
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 403, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 28, in fetch
data.append(next(self.dataset_iter))
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/backend.py", line 262, in next
return self._get_next_batch()
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/backend.py", line 289, in _get_next_batch
self._fetch_chunk()
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/backend.py", line 268, in _fetch_chunk
raise chunks
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/backend.py", line 140, in load_chunks
spill = dataloader._handle_tensors(spill)
TypeError: _handle_tensors() missing 2 required positional arguments: 'conts' and 'labels'
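The _handle_tensors failure above is the usual signature mismatch after a refactor: the method now takes the categorical, continuous, and label tensors as separate arguments, while the caller in load_chunks still passes one packed value. A contrived reproduction of the same error message (class and variable names are illustrative, not the NVTabular implementation):

class DataLoader:
    def _handle_tensors(self, cats, conts, labels):
        # post-refactor signature: three separate tensors
        return cats, conts, labels

loader = DataLoader()
spill = (["cat_col"], [0.1], [1])      # one packed tuple, pre-refactor style
try:
    loader._handle_tensors(spill)      # TypeError: _handle_tensors() missing 2 required
except TypeError as exc:               # positional arguments: 'conts' and 'labels'
    print(exc)
loader._handle_tensors(*spill)         # unpacking the packed value is the usual fix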
_____________________________ test_rossman_example _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_rossman_example0')

def test_rossman_example(tmpdir):
    pytest.importorskip("nvtabular.loader.tensorflow")
    _get_random_rossmann_data(1000).to_csv(os.path.join(tmpdir, "train.csv"))
    _get_random_rossmann_data(1000).to_csv(os.path.join(tmpdir, "valid.csv"))
    os.environ["INPUT_DATA_DIR"] = str(tmpdir)

    notebook_path = os.path.join(
        dirname(TEST_PATH), "examples", "rossmann-store-sales-example.ipynb"
    )
  _run_notebook(tmpdir, notebook_path, lambda line: line.replace("EPOCHS = 25", "EPOCHS = 1"))

tests/unit/test_notebooks.py:51:


tests/unit/test_notebooks.py:92: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/opt/conda/lib/python3.7/subprocess.py:395: in check_output
**kwargs).stdout


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/opt/conda/bin/python', '/tmp/pytest-of-jenkins/pytest-13/test_rossman_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f577dfb8550>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired:
            process.kill()
            stdout, stderr = process.communicate()
            raise TimeoutExpired(process.args, timeout, output=stdout,
                                 stderr=stderr)
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
            raise CalledProcessError(retcode, process.args,
                                   output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '/tmp/pytest-of-jenkins/pytest-13/test_rossman_example0/notebook.py']' returned non-zero exit status 1.

/opt/conda/lib/python3.7/subprocess.py:487: CalledProcessError
----------------------------- Captured stderr call -----------------------------
/opt/conda/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
/opt/conda/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-13/test_rossman_example0/notebook.py", line 59, in
proc.apply(train_dataset, record_stats=True, output_path=PREPROCESS_DIR_TRAIN, shuffle=nvt.io.Shuffle.PER_WORKER, out_files_per_proc=2)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 729, in apply
num_io_threads=num_io_threads,
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 829, in build_and_process_graph
self.exec_phase(idx, record_stats=record_stats)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 612, in exec_phase
self._aggregated_dask_transform(transforms)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 587, in _aggregated_dask_transform
ddf = self.get_ddf()
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 575, in get_ddf
return self.ddf.to_ddf(columns=columns, shuffle=self._shuffle_parts)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py", line 809, in to_ddf
ddf = self.engine.to_ddf(columns=columns)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py", line 1060, in to_ddf
return dask_cudf.read_csv(self.paths, chunksize=self.part_size, **self.csv_kwargs)[
File "/opt/conda/lib/python3.7/site-packages/dask_cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7.egg/dask_cudf/io/csv.py", line 19, in read_csv
return _internal_read_csv(path=path, chunksize=chunksize, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/dask_cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7.egg/dask_cudf/io/csv.py", line 59, in _internal_read_csv
meta = dask_reader(filenames[0], **kwargs)._meta
File "/opt/conda/lib/python3.7/site-packages/dask/dataframe/io/csv.py", line 649, in read
**kwargs,
File "/opt/conda/lib/python3.7/site-packages/dask/dataframe/io/csv.py", line 479, in read_pandas
**(storage_options or {}),
File "/opt/conda/lib/python3.7/site-packages/dask/bytes/core.py", line 125, in read_bytes
size = fs.info(path)["size"]
File "/opt/conda/lib/python3.7/site-packages/fsspec/implementations/local.py", line 60, in info
out = os.stat(path, follow_symlinks=False)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-jenkins/pytest-13/test_optimize_criteo0/train.csv'
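One thing worth flagging in the FileNotFoundError above: the missing train.csv is looked up under test_optimize_criteo0, not this test's own test_rossman_example0 tmpdir, which is the usual signature of a process-wide environment variable set by an earlier test leaking into a later one. A contrived sketch of that failure mode (test and variable names are illustrative):

import os

def test_earlier(tmp_path):
    # sets a process-wide variable and never restores it
    os.environ["INPUT_DATA_DIR"] = str(tmp_path)

def test_later(tmp_path):
    # still sees the earlier test's directory, so files written under this
    # test's tmp_path appear "not found" to code that reads the variable
    data_dir = os.environ.get("INPUT_DATA_DIR", str(tmp_path))
    assert data_dir != str(tmp_path)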
_____________________ test_tf_gpu_dl[True-1-parquet-0.01] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_True_1_parquet_0')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f580118fad0>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:64:


nvtabular/loader/backend.py:262: in __next__
return self._get_next_batch()
nvtabular/loader/backend.py:289: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:268: in _fetch_chunk
raise chunks
nvtabular/loader/backend.py:120: in load_chunks
chunks = dataloader._create_tensors(chunks)
nvtabular/loader/backend.py:372: in _create_tensors
conts = self._to_tensor(gdf_conts)
nvtabular/loader/tensorflow.py:265: in _to_tensor
dlpack = gdf.values.T.toDlpack()
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7-linux-x86_64.egg/cudf/core/dataframe.py:907: in values
return cupy.asarray(self.as_gpu_matrix())
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7-linux-x86_64.egg/cudf/core/dataframe.py:3208: in as_gpu_matrix
matrix[:, colidx] = dense
cupy/core/core.pyx:1248: in cupy.core.core.ndarray.__setitem__
???
cupy/core/_routines_indexing.pyx:49: in cupy.core._routines_indexing._ndarray_setitem
???
cupy/core/_routines_indexing.pyx:801: in cupy.core._routines_indexing._scatter_op
???
cupy/core/core.pyx:517: in cupy.core.core.ndarray.fill
???
cupy/core/_kernel.pyx:605: in cupy.core._kernel.ElementwiseKernel.__call__
???


???
E ValueError: Array device must be same as the current device: array device = 3 while current = 0

cupy/core/_kernel.pyx:95: ValueError
----------------------------- Captured stderr call -----------------------------
2020-08-26 21:33:55.366901: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-08-26 21:33:55.391236: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3198080000 Hz
2020-08-26 21:33:55.392182: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f573025c1e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-26 21:33:55.392220: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-26 21:33:55.945518: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f57302c7d70 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-26 21:33:55.945589: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-26 21:33:55.945613: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-26 21:33:55.945632: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (2): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-26 21:33:55.945651: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (3): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-26 21:33:55.949931: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-26 21:33:55.951931: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-26 21:33:55.953342: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 2 with properties:
pciBusID: 0000:0e:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-26 21:33:55.954727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 3 with properties:
pciBusID: 0000:0f:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-26 21:33:55.954841: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-08-26 21:33:55.954875: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-08-26 21:33:55.954904: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-08-26 21:33:55.954932: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-08-26 21:33:55.954958: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-08-26 21:33:55.954983: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-08-26 21:33:55.955010: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-26 21:33:55.965076: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0, 1, 2, 3
2020-08-26 21:33:55.965137: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-08-26 21:33:55.970257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-26 21:33:55.970285: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0 1 2 3
2020-08-26 21:33:55.970295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N Y Y Y
2020-08-26 21:33:55.970330: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 1: Y N Y Y
2020-08-26 21:33:55.970340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 2: Y Y N Y
2020-08-26 21:33:55.970348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 3: Y Y Y N
2020-08-26 21:33:55.975406: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory) -> physical GPU (device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0)
2020-08-26 21:33:55.976876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15212 MB memory) -> physical GPU (device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0)
2020-08-26 21:33:55.978348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15212 MB memory) -> physical GPU (device: 2, name: Tesla P100-DGXS-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0)
2020-08-26 21:33:55.979827: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15212 MB memory) -> physical GPU (device: 3, name: Tesla P100-DGXS-16GB, pci bus id: 0000:0f:00.0, compute capability: 6.0)
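The ValueError above ("array device = 3 while current = 0") is what CuPy raises when a kernel touches an array owned by one GPU while a different GPU is the current device; the usual guard is to do the conversion inside a cupy.cuda.Device context that matches the array. A minimal illustration, assuming a multi-GPU machine (the device index is hypothetical):

import cupy as cp

with cp.cuda.Device(3):
    arr = cp.zeros((4, 4), dtype=cp.float32)   # memory lives on GPU 3

# Touching arr while GPU 0 is current reproduces the error above;
# switching to the array's own device first avoids it.
with cp.cuda.Device(arr.device.id):
    arr.fill(1.0)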
_____________________ test_tf_gpu_dl[True-1-parquet-0.06] ______________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f57802342d0>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_True_1_parquet_1')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f57802347d0>
batch_size = 1, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
        X, y = next(data_itr)

        # first elements to check epoch-to-epoch consistency
        if idx == 0:
            X0, y0 = X, y

        # check that we have at most batch_size elements
        num_samples = y[0].shape[0]
        if num_samples != batch_size:
            try:
                next(data_itr)
            except StopIteration:
                continue
            else:
                raise ValueError("Batch size too small at idx {}".format(idx))

        # check that all the features in X have the
        # appropriate length and that the set of
        # their names is exactly the set of names in
        # `columns`
        these_cols = columns.copy()
        for column, x in X.items():
            try:
                these_cols.remove(column)
            except ValueError:
                raise AssertionError
            assert x.shape[0] == num_samples
        assert len(these_cols) == 0

        rows += num_samples

    # check start of next epoch to ensure consistency
  X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:96:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()


self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f57802342d0>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
        batch = next(self._batch_itr)
    except StopIteration:
        # check whether any more chunks are going to be created;
        # if not, raise the StopIteration
        if not self._working and self._buff.empty:
            self._workers = None
            self._batch_itr = None
          raise StopIteration

E StopIteration

nvtabular/loader/backend.py:300: StopIteration
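The failure above is the epoch-boundary case rather than a crash inside load_chunks: after len(data_itr) batches the loader resets self._workers and self._batch_itr and raises StopIteration, so the extra next(data_itr) the test issues to check the start of the next epoch surfaces that StopIteration. A minimal sketch of how the test's epoch check could tolerate that boundary, assuming the reset logic shown in the shim above (the variable names mirror the test; the retry branch is hypothetical):

    # check start of next epoch to ensure consistency
    try:
        X, y = next(data_itr)
    except StopIteration:
        # after the reset, self._workers is None again, so this call
        # re-enters DataLoader.__iter__ and fetches the first chunk
        # of the new epoch
        X, y = next(data_itr)
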
_____________________ test_tf_gpu_dl[True-10-parquet-0.01] _____________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f577e0799d0>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_True_10_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f58010d7210>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:64:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()
nvtabular/loader/backend.py:304: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:268: in _fetch_chunk
raise chunks


self = <nvtabular.loader.backend.ChunkQueue object at 0x7f58036618d0>, dev = 0
dataloader = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f577e0799d0>

def load_chunks(self, dev, dataloader):
    try:
        indices = dataloader._gather_indices_for_dev(dev)
        itr = dataloader.data.to_iter(indices=indices)

        with dataloader._get_device_ctx(dev):
            spill = None
            for chunks in self.batch(itr):
                if self.stopped:
                    return

                if spill and not spill.empty:
                    chunks.insert(0, spill)

                chunks = cudf.core.reshape.concat(chunks)
                chunks.reset_index(drop=True, inplace=True)
                chunks, spill = self.get_batch_div_chunk(chunks, dataloader.batch_size)
                if self.shuffle:
                    _shuffle_gdf(chunks)

                num_samples = len(chunks)
                if num_samples > 0:
                    for workflow in dataloader.workflows:
                        chunks = workflow.apply_ops(chunks)

                    # map from big chunk to framework-specific tensors
                    chunks = dataloader._create_tensors(chunks)

                    # split them into batches and map to
                    # the framework-specific output format
                    chunks = [dataloader._create_batch(x, num_samples) for x in chunks]
                    chunks = zip(*chunks)
                    chunks = [dataloader._handle_tensors(*tensors) for tensors in chunks]

                    # put returns True if buffer is stopped before
                    # packet can be put in queue. Keeps us from
                    # freezing on a put on a full queue
                    if self.put(chunks):
                        return
                chunks = None

            # takes care of the final batch, which is smaller than the batch size
            if spill:
                for workflow in dataloader.workflows:
                    spill = workflow.apply_ops(spill)
                spill = dataloader._create_tensors(spill)
              spill = dataloader._handle_tensors(spill)

E TypeError: _handle_tensors() missing 2 required positional arguments: 'conts' and 'labels'

nvtabular/loader/backend.py:140: TypeError
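The TypeError above points at the spill branch of load_chunks: the main loop calls dataloader._handle_tensors(*tensors) on an unpacked triple, while the spill path passes the whole output of _create_tensors as one positional argument, leaving conts and labels unfilled. A minimal sketch of the spill branch mirroring the main branch, assuming _create_tensors returns the same (cats, conts, labels) sequence the loop above zips over:

    # takes care of the final batch, which is smaller than the batch size
    if spill:
        for workflow in dataloader.workflows:
            spill = workflow.apply_ops(spill)
        tensors = dataloader._create_tensors(spill)
        # unpack, as the per-batch path does, instead of passing the
        # sequence as a single argument
        spill = dataloader._handle_tensors(*tensors)
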
_____________________ test_tf_gpu_dl[True-10-parquet-0.06] _____________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f57802d2ad0>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_True_10_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f578027b050>
batch_size = 10, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:64:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()
nvtabular/loader/backend.py:304: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:268: in _fetch_chunk
raise chunks


self = <nvtabular.loader.backend.ChunkQueue object at 0x7f5781635b10>, dev = 0
dataloader = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f57802d2ad0>

def load_chunks(self, dev, dataloader):
    try:
        indices = dataloader._gather_indices_for_dev(dev)
        itr = dataloader.data.to_iter(indices=indices)

        with dataloader._get_device_ctx(dev):
            spill = None
            for chunks in self.batch(itr):
                if self.stopped:
                    return

                if spill and not spill.empty:
                    chunks.insert(0, spill)

                chunks = cudf.core.reshape.concat(chunks)
                chunks.reset_index(drop=True, inplace=True)
                chunks, spill = self.get_batch_div_chunk(chunks, dataloader.batch_size)
                if self.shuffle:
                    _shuffle_gdf(chunks)

                num_samples = len(chunks)
                if num_samples > 0:
                    for workflow in dataloader.workflows:
                        chunks = workflow.apply_ops(chunks)

                    # map from big chunk to framework-specific tensors
                    chunks = dataloader._create_tensors(chunks)

                    # split them into batches and map to
                    # the framework-specific output format
                    chunks = [dataloader._create_batch(x, num_samples) for x in chunks]
                    chunks = zip(*chunks)
                    chunks = [dataloader._handle_tensors(*tensors) for tensors in chunks]

                    # put returns True if buffer is stopped before
                    # packet can be put in queue. Keeps us from
                    # freezing on a put on a full queue
                    if self.put(chunks):
                        return
                chunks = None

            # takes care of the final batch, which is smaller than the batch size
            if spill:
                for workflow in dataloader.workflows:
                    spill = workflow.apply_ops(spill)
                spill = dataloader._create_tensors(spill)
              spill = dataloader._handle_tensors(spill)

E TypeError: _handle_tensors() missing 2 required positional arguments: 'conts' and 'labels'

nvtabular/loader/backend.py:140: TypeError
____________________ test_tf_gpu_dl[True-100-parquet-0.01] _____________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f577dfb8f10>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_True_100_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f577dfb8310>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:64:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()
nvtabular/loader/backend.py:304: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:268: in _fetch_chunk
raise chunks


self = <nvtabular.loader.backend.ChunkQueue object at 0x7f577dfb8a90>, dev = 0
dataloader = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f577dfb8f10>

def load_chunks(self, dev, dataloader):
    try:
        indices = dataloader._gather_indices_for_dev(dev)
        itr = dataloader.data.to_iter(indices=indices)

        with dataloader._get_device_ctx(dev):
            spill = None
            for chunks in self.batch(itr):
                if self.stopped:
                    return

                if spill and not spill.empty:
                    chunks.insert(0, spill)

                chunks = cudf.core.reshape.concat(chunks)
                chunks.reset_index(drop=True, inplace=True)
                chunks, spill = self.get_batch_div_chunk(chunks, dataloader.batch_size)
                if self.shuffle:
                    _shuffle_gdf(chunks)

                num_samples = len(chunks)
                if num_samples > 0:
                    for workflow in dataloader.workflows:
                        chunks = workflow.apply_ops(chunks)

                    # map from big chunk to framework-specific tensors
                    chunks = dataloader._create_tensors(chunks)

                    # split them into batches and map to
                    # the framework-specific output format
                    chunks = [dataloader._create_batch(x, num_samples) for x in chunks]
                    chunks = zip(*chunks)
                    chunks = [dataloader._handle_tensors(*tensors) for tensors in chunks]

                    # put returns True if buffer is stopped before
                    # packet can be put in queue. Keeps us from
                    # freezing on a put on a full queue
                    if self.put(chunks):
                        return
                chunks = None

            # takes care of the final batch, which is smaller than the batch size
            if spill:
                for workflow in dataloader.workflows:
                    spill = workflow.apply_ops(spill)
                spill = dataloader._create_tensors(spill)
              spill = dataloader._handle_tensors(spill)

E TypeError: _handle_tensors() missing 2 required positional arguments: 'conts' and 'labels'

nvtabular/loader/backend.py:140: TypeError
____________________ test_tf_gpu_dl[True-100-parquet-0.06] _____________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f577d2aea50>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_True_100_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f5800f67890>
batch_size = 100, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:64:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()
nvtabular/loader/backend.py:304: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:268: in _fetch_chunk
raise chunks


self = <nvtabular.loader.backend.ChunkQueue object at 0x7f5803767f10>, dev = 0
dataloader = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f577d2aea50>

def load_chunks(self, dev, dataloader):
    try:
        indices = dataloader._gather_indices_for_dev(dev)
        itr = dataloader.data.to_iter(indices=indices)

        with dataloader._get_device_ctx(dev):
            spill = None
            for chunks in self.batch(itr):
                if self.stopped:
                    return

                if spill and not spill.empty:
                    chunks.insert(0, spill)

                chunks = cudf.core.reshape.concat(chunks)
                chunks.reset_index(drop=True, inplace=True)
                chunks, spill = self.get_batch_div_chunk(chunks, dataloader.batch_size)
                if self.shuffle:
                    _shuffle_gdf(chunks)

                num_samples = len(chunks)
                if num_samples > 0:
                    for workflow in dataloader.workflows:
                        chunks = workflow.apply_ops(chunks)

                    # map from big chunk to framework-specific tensors
                    chunks = dataloader._create_tensors(chunks)

                    # split them into batches and map to
                    # the framework-specific output format
                    chunks = [dataloader._create_batch(x, num_samples) for x in chunks]
                    chunks = zip(*chunks)
                    chunks = [dataloader._handle_tensors(*tensors) for tensors in chunks]

                    # put returns True if buffer is stopped before
                    # packet can be put in queue. Keeps us from
                    # freezing on a put on a full queue
                    if self.put(chunks):
                        return
                chunks = None

            # takes care of the final batch, which is smaller than the batch size
            if spill:
                for workflow in dataloader.workflows:
                    spill = workflow.apply_ops(spill)
                spill = dataloader._create_tensors(spill)
              spill = dataloader._handle_tensors(spill)

E TypeError: _handle_tensors() missing 2 required positional arguments: 'conts' and 'labels'

nvtabular/loader/backend.py:140: TypeError
_____________________ test_tf_gpu_dl[False-1-parquet-0.01] _____________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f5800fb5610>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_False_1_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f58085c3050>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
        X, y = next(data_itr)

        # first elements to check epoch-to-epoch consistency
        if idx == 0:
            X0, y0 = X, y

        # check that we have at most batch_size elements
        num_samples = y[0].shape[0]
        if num_samples != batch_size:
            try:
                next(data_itr)
            except StopIteration:
                continue
            else:
                raise ValueError("Batch size too small at idx {}".format(idx))

        # check that all the features in X have the
        # appropriate length and that the set of
        # their names is exactly the set of names in
        # `columns`
        these_cols = columns.copy()
        for column, x in X.items():
            try:
                these_cols.remove(column)
            except ValueError:
                raise AssertionError
            assert x.shape[0] == num_samples
        assert len(these_cols) == 0

        rows += num_samples

    # check start of next epoch to ensure consistency
  X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:96:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()


self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f5800fb5610>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
        batch = next(self._batch_itr)
    except StopIteration:
        # check whether any more chunks are going to be created;
        # if not, raise the StopIteration
        if not self._working and self._buff.empty:
            self._workers = None
            self._batch_itr = None
          raise StopIteration

E StopIteration

nvtabular/loader/backend.py:300: StopIteration
_____________________ test_tf_gpu_dl[False-1-parquet-0.06] _____________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f580104c550>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_False_1_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f5800f12fd0>
batch_size = 1, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
        X, y = next(data_itr)

        # first elements to check epoch-to-epoch consistency
        if idx == 0:
            X0, y0 = X, y

        # check that we have at most batch_size elements
        num_samples = y[0].shape[0]
        if num_samples != batch_size:
            try:
                next(data_itr)
            except StopIteration:
                continue
            else:
                raise ValueError("Batch size too small at idx {}".format(idx))

        # check that all the features in X have the
        # appropriate length and that the set of
        # their names is exactly the set of names in
        # `columns`
        these_cols = columns.copy()
        for column, x in X.items():
            try:
                these_cols.remove(column)
            except ValueError:
                raise AssertionError
            assert x.shape[0] == num_samples
        assert len(these_cols) == 0

        rows += num_samples

    # check start of next epoch to ensure consistency
  X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:96:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()


self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f580104c550>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
        batch = next(self._batch_itr)
    except StopIteration:
        # check whether any more chunks are going to be created;
        # if not, raise the StopIteration
        if not self._working and self._buff.empty:
            self._workers = None
            self._batch_itr = None
          raise StopIteration

E StopIteration

nvtabular/loader/backend.py:300: StopIteration
____________________ test_tf_gpu_dl[False-10-parquet-0.01] _____________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f5801077190>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_False_10_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f577e053a10>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:64:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()
nvtabular/loader/backend.py:304: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:268: in _fetch_chunk
raise chunks


self = <nvtabular.loader.backend.ChunkQueue object at 0x7f5800d47490>, dev = 0
dataloader = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f5801077190>

def load_chunks(self, dev, dataloader):
    try:
        indices = dataloader._gather_indices_for_dev(dev)
        itr = dataloader.data.to_iter(indices=indices)

        with dataloader._get_device_ctx(dev):
            spill = None
            for chunks in self.batch(itr):
                if self.stopped:
                    return

                if spill and not spill.empty:
                    chunks.insert(0, spill)

                chunks = cudf.core.reshape.concat(chunks)
                chunks.reset_index(drop=True, inplace=True)
                chunks, spill = self.get_batch_div_chunk(chunks, dataloader.batch_size)
                if self.shuffle:
                    _shuffle_gdf(chunks)

                num_samples = len(chunks)
                if num_samples > 0:
                    for workflow in dataloader.workflows:
                        chunks = workflow.apply_ops(chunks)

                    # map from big chunk to framework-specific tensors
                    chunks = dataloader._create_tensors(chunks)

                    # split them into batches and map to
                    # the framework-specific output format
                    chunks = [dataloader._create_batch(x, num_samples) for x in chunks]
                    chunks = zip(*chunks)
                    chunks = [dataloader._handle_tensors(*tensors) for tensors in chunks]

                    # put returns True if buffer is stopped before
                    # packet can be put in queue. Keeps us from
                    # freezing on a put on a full queue
                    if self.put(chunks):
                        return
                chunks = None

            # takes care of the final batch, which is smaller than the batch size
            if spill:
                for workflow in dataloader.workflows:
                    spill = workflow.apply_ops(spill)
                spill = dataloader._create_tensors(spill)
              spill = dataloader._handle_tensors(spill)

E TypeError: _handle_tensors() missing 2 required positional arguments: 'conts' and 'labels'

nvtabular/loader/backend.py:140: TypeError
____________________ test_tf_gpu_dl[False-10-parquet-0.06] _____________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f57802c8e90>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_False_10_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f58011aac10>
batch_size = 10, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:64:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()
nvtabular/loader/backend.py:304: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:268: in _fetch_chunk
raise chunks


self = <nvtabular.loader.backend.ChunkQueue object at 0x7f5781610a50>, dev = 0
dataloader = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f57802c8e90>

def load_chunks(self, dev, dataloader):
    try:
        indices = dataloader._gather_indices_for_dev(dev)
        itr = dataloader.data.to_iter(indices=indices)

        with dataloader._get_device_ctx(dev):
            spill = None
            for chunks in self.batch(itr):
                if self.stopped:
                    return

                if spill and not spill.empty:
                    chunks.insert(0, spill)

                chunks = cudf.core.reshape.concat(chunks)
                chunks.reset_index(drop=True, inplace=True)
                chunks, spill = self.get_batch_div_chunk(chunks, dataloader.batch_size)
                if self.shuffle:
                    _shuffle_gdf(chunks)

                num_samples = len(chunks)
                if num_samples > 0:
                    for workflow in dataloader.workflows:
                        chunks = workflow.apply_ops(chunks)

                    # map from big chunk to framework-specific tensors
                    chunks = dataloader._create_tensors(chunks)

                    # split them into batches and map to
                    # the framework-specific output format
                    chunks = [dataloader._create_batch(x, num_samples) for x in chunks]
                    chunks = zip(*chunks)
                    chunks = [dataloader._handle_tensors(*tensors) for tensors in chunks]

                    # put returns True if buffer is stopped before
                    # packet can be put in queue. Keeps us from
                    # freezing on a put on a full queue
                    if self.put(chunks):
                        return
                chunks = None

            # takes care of the final batch, which is smaller than the batch size
            if spill:
                for workflow in dataloader.workflows:
                    spill = workflow.apply_ops(spill)
                spill = dataloader._create_tensors(spill)
              spill = dataloader._handle_tensors(spill)

E TypeError: _handle_tensors() missing 2 required positional arguments: 'conts' and 'labels'

nvtabular/loader/backend.py:140: TypeError
____________________ test_tf_gpu_dl[False-100-parquet-0.01] ____________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f577e07cb90>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_False_100_parqu0')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f578034af50>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:64:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()
nvtabular/loader/backend.py:304: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:268: in _fetch_chunk
raise chunks


self = <nvtabular.loader.backend.ChunkQueue object at 0x7f5780208990>, dev = 0
dataloader = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f577e07cb90>

def load_chunks(self, dev, dataloader):
    try:
        indices = dataloader._gather_indices_for_dev(dev)
        itr = dataloader.data.to_iter(indices=indices)

        with dataloader._get_device_ctx(dev):
            spill = None
            for chunks in self.batch(itr):
                if self.stopped:
                    return

                if spill and not spill.empty:
                    chunks.insert(0, spill)

                chunks = cudf.core.reshape.concat(chunks)
                chunks.reset_index(drop=True, inplace=True)
                chunks, spill = self.get_batch_div_chunk(chunks, dataloader.batch_size)
                if self.shuffle:
                    _shuffle_gdf(chunks)

                num_samples = len(chunks)
                if num_samples > 0:
                    for workflow in dataloader.workflows:
                        chunks = workflow.apply_ops(chunks)

                    # map from big chunk to framework-specific tensors
                    chunks = dataloader._create_tensors(chunks)

                    # split them into batches and map to
                    # the framework-specific output format
                    chunks = [dataloader._create_batch(x, num_samples) for x in chunks]
                    chunks = zip(*chunks)
                    chunks = [dataloader._handle_tensors(*tensors) for tensors in chunks]

                    # put returns True if buffer is stopped before
                    # packet can be put in queue. Keeps us from
                    # freezing on a put on a full queue
                    if self.put(chunks):
                        return
                chunks = None

            # takes care of the final batch, which is smaller than the batch size
            if spill:
                for workflow in dataloader.workflows:
                    spill = workflow.apply_ops(spill)
                spill = dataloader._create_tensors(spill)
              spill = dataloader._handle_tensors(spill)

E TypeError: _handle_tensors() missing 2 required positional arguments: 'conts' and 'labels'

nvtabular/loader/backend.py:140: TypeError
____________________ test_tf_gpu_dl[False-100-parquet-0.06] ____________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f5780264610>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_False_100_parqu1')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f57802c4790>
batch_size = 100, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:64:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()
nvtabular/loader/backend.py:304: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:268: in _fetch_chunk
raise chunks


self = <nvtabular.loader.backend.ChunkQueue object at 0x7f578025c4d0>, dev = 0
dataloader = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f5780264610>

def load_chunks(self, dev, dataloader):
    try:
        indices = dataloader._gather_indices_for_dev(dev)
        itr = dataloader.data.to_iter(indices=indices)

        with dataloader._get_device_ctx(dev):
            spill = None
            for chunks in self.batch(itr):
                if self.stopped:
                    return

                if spill and not spill.empty:
                    chunks.insert(0, spill)

                chunks = cudf.core.reshape.concat(chunks)
                chunks.reset_index(drop=True, inplace=True)
                chunks, spill = self.get_batch_div_chunk(chunks, dataloader.batch_size)
                if self.shuffle:
                    _shuffle_gdf(chunks)

                num_samples = len(chunks)
                if num_samples > 0:
                    for workflow in dataloader.workflows:
                        chunks = workflow.apply_ops(chunks)

                    # map from big chunk to framework-specific tensors
                    chunks = dataloader._create_tensors(chunks)

                    # split them into batches and map to
                    # the framework-specific output format
                    chunks = [dataloader._create_batch(x, num_samples) for x in chunks]
                    chunks = zip(*chunks)
                    chunks = [dataloader._handle_tensors(*tensors) for tensors in chunks]

                    # put returns True if buffer is stopped before
                    # packet can be put in queue. Keeps us from
                    # freezing on a put on a full queue
                    if self.put(chunks):
                        return
                chunks = None

            # takes care of the final batch, which is smaller than the batch size
            if spill:
                for workflow in dataloader.workflows:
                    spill = workflow.apply_ops(spill)
                spill = dataloader._create_tensors(spill)
              spill = dataloader._handle_tensors(spill)

E TypeError: _handle_tensors() missing 2 required positional arguments: 'conts' and 'labels'

nvtabular/loader/backend.py:140: TypeError
___________________________ test_empty_cols[parquet] ___________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_empty_cols_parquet_0')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f577e08a290>, engine = 'parquet'

@pytest.mark.parametrize("engine", ["parquet"])
def test_empty_cols(tmpdir, df, dataset, engine):
    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    # first with no continuous columns
    no_conts = torch_dataloader.TorchAsyncItr(
        dataset, cats=["id"], conts=[], labels=["label"], batch_size=1
    )
  assert all(conts is None for _, conts, _ in no_conts)

tests/unit/test_torch_dataloader.py:52:


tests/unit/test_torch_dataloader.py:52: in
assert all(conts is None for _, conts, _ in no_conts)
nvtabular/loader/backend.py:262: in next
return self._get_next_batch()
nvtabular/loader/backend.py:289: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:268: in _fetch_chunk
raise chunks
nvtabular/loader/backend.py:124: in load_chunks
chunks = [dataloader._create_batch(x, num_samples) for x in chunks]
nvtabular/loader/backend.py:124: in
chunks = [dataloader._create_batch(x, num_samples) for x in chunks]
nvtabular/loader/torch.py:103: in _create_batch
return torch.split(tensor, idx)


tensor = None, split_size_or_sections = [1, 1, 1, 1, 1, 1, ...], dim = 0

def split(tensor, split_size_or_sections, dim=0):
    r"""Splits the tensor into chunks. Each chunk is a view of the original tensor.

    If :attr:`split_size_or_sections` is an integer type, then :attr:`tensor` will
    be split into equally sized chunks (if possible). Last chunk will be smaller if
    the tensor size along the given dimension :attr:`dim` is not divisible by
    :attr:`split_size`.

    If :attr:`split_size_or_sections` is a list, then :attr:`tensor` will be split
    into ``len(split_size_or_sections)`` chunks with sizes in :attr:`dim` according
    to :attr:`split_size_or_sections`.

    Arguments:
        tensor (Tensor): tensor to split.
        split_size_or_sections (int) or (list(int)): size of a single chunk or
            list of sizes for each chunk
        dim (int): dimension along which to split the tensor.

    Example::
        >>> a = torch.arange(10).reshape(5,2)
        >>> a
        tensor([[0, 1],
                [2, 3],
                [4, 5],
                [6, 7],
                [8, 9]])
        >>> torch.split(a, 2)
        (tensor([[0, 1],
                 [2, 3]]),
         tensor([[4, 5],
                 [6, 7]]),
         tensor([[8, 9]]))
        >>> torch.split(a, [1,4])
        (tensor([[0, 1]]),
         tensor([[2, 3],
                 [4, 5],
                 [6, 7],
                 [8, 9]]))
    """
    if not torch.jit.is_scripting():
        if type(tensor) is not Tensor and has_torch_function((tensor,)):
            return handle_torch_function(split, (tensor,), tensor, split_size_or_sections,
                                         dim=dim)
    # Overwriting reason:
    # This dispatches to two ATen functions depending on the type of
    # split_size_or_sections. The branching code is in tensor.py, which we
    # call here.
  return tensor.split(split_size_or_sections, dim)

E AttributeError: 'NoneType' object has no attribute 'split'

/opt/conda/lib/python3.7/site-packages/torch/functional.py:115: AttributeError
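
The empty-conts failure above is the batching code calling torch.split on a tensor that is legitimately None when a column group is empty (test_empty_cols passes conts=[] and expects conts to come back as None). A small sketch of the kind of None guard that avoids this; the helper name and sizes are illustrative only, not the fix actually made in this PR:

    import torch

    def split_or_none(tensor, sizes, dim=0):
        # an empty column group yields tensor=None; return matching Nones
        # instead of calling torch.split on it
        if tensor is None:
            return [None] * len(sizes)
        return torch.split(tensor, sizes, dim=dim)

    print(split_or_none(None, [1, 1, 1]))                        # [None, None, None]
    print(split_or_none(torch.arange(6).reshape(3, 2), [1, 2]))  # two chunks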
______________________ test_gpu_dl[None-parquet-1-1e-06] _______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_None_parquet_1_1e_0')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f570845a5d0>, batch_size = 1
part_mem_fraction = 1e-06, engine = 'parquet', devices = None

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
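
Each of the test_gpu_dl and test_kill_dl failures below is the same import problem: the tests still reach for nvt.torch_dataloader, while this PR moves the PyTorch dataloader into the new loader submodule (an earlier traceback in this log resolves the loader at nvtabular/loader/torch.py). A sketch of the import the tests would need instead, assuming TorchAsyncItr is exported from that module; the path argument is a placeholder, not a file from this CI run:

    import nvtabular as nvt
    from nvtabular.loader.torch import TorchAsyncItr

    def make_loader(parquet_path, batch_size=10):
        # parquet_path would be one of the tar_paths written by processor.apply above
        nvt_data = nvt.Dataset(parquet_path, engine="parquet")
        return TorchAsyncItr(
            nvt_data,
            batch_size=batch_size,
            cats=["name-cat", "name-string"],
            conts=["x", "y", "id"],
            labels=["label"],
        )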
_______________________ test_gpu_dl[None-parquet-1-0.06] _______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_None_parquet_1_0_00')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f577dfb8a90>, batch_size = 1
part_mem_fraction = 0.06, engine = 'parquet', devices = None

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
______________________ test_gpu_dl[None-parquet-10-1e-06] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_None_parquet_10_1e0')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f578020b610>, batch_size = 10
part_mem_fraction = 1e-06, engine = 'parquet', devices = None

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
______________________ test_gpu_dl[None-parquet-10-0.06] _______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_None_parquet_10_0_0')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f5800f85f50>, batch_size = 10
part_mem_fraction = 0.06, engine = 'parquet', devices = None

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
_____________________ test_gpu_dl[None-parquet-100-1e-06] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_None_parquet_100_10')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f578164e610>, batch_size = 100
part_mem_fraction = 1e-06, engine = 'parquet', devices = None

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
______________________ test_gpu_dl[None-parquet-100-0.06] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_None_parquet_100_00')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f5800f16d90>, batch_size = 100
part_mem_fraction = 0.06, engine = 'parquet', devices = None

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
____________________ test_gpu_dl[devices1-parquet-1-1e-06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_devices1_parquet_10')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f5781705790>, batch_size = 1
part_mem_fraction = 1e-06, engine = 'parquet', devices = [0, 1]

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
_____________________ test_gpu_dl[devices1-parquet-1-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_devices1_parquet_11')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f577dfc50d0>, batch_size = 1
part_mem_fraction = 0.06, engine = 'parquet', devices = [0, 1]

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
____________________ test_gpu_dl[devices1-parquet-10-1e-06] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_devices1_parquet_12')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f5781705cd0>, batch_size = 10
part_mem_fraction = 1e-06, engine = 'parquet', devices = [0, 1]

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
____________________ test_gpu_dl[devices1-parquet-10-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_devices1_parquet_13')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f5780264990>, batch_size = 10
part_mem_fraction = 0.06, engine = 'parquet', devices = [0, 1]

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
___________________ test_gpu_dl[devices1-parquet-100-1e-06] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_devices1_parquet_14')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f5775367c10>, batch_size = 100
part_mem_fraction = 1e-06, engine = 'parquet', devices = [0, 1]

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
____________________ test_gpu_dl[devices1-parquet-100-0.06] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_devices1_parquet_15')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f577e068790>, batch_size = 100
part_mem_fraction = 0.06, engine = 'parquet', devices = [0, 1]

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
_________________________ test_kill_dl[parquet-1e-06] __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_kill_dl_parquet_1e_06_0')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f577dfc5090>
part_mem_fraction = 1e-06, engine = 'parquet'

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
def test_kill_dl(tmpdir, df, dataset, part_mem_fraction, engine):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data, cats=cat_names, conts=cont_names, labels=["label"]
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:163: AttributeError
__________________________ test_kill_dl[parquet-0.1] ___________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_kill_dl_parquet_0_1_0')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f5779548190>
part_mem_fraction = 0.1, engine = 'parquet'

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
def test_kill_dl(tmpdir, df, dataset, part_mem_fraction, engine):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data, cats=cat_names, conts=cont_names, labels=["label"]
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:163: AttributeError
=============================== warnings summary ===============================
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_io.py::test_mulifile_parquet[True-0-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-0-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-2-csv]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:77: DeprecationWarning: shuffle=True is deprecated. Using PER_WORKER.
warnings.warn("shuffle=True is deprecated. Using PER_WORKER.", DeprecationWarning)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39959 instead
http_address["port"], self.http_server.port

tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7-linux-x86_64.egg/cudf/core/dataframe.py:660: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30548 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30800 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 29876 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 31108 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 31472 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 60480 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_workflow.py::test_chaining_3
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:748: UserWarning: part_mem_fraction is ignored for DataFrame input.
warnings.warn("part_mem_fraction is ignored for DataFrame input.")

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.4-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 6 0 0 0 100%
nvtabular/categorify.py 269 49 150 31 78% 69->70, 70-72, 74->75, 75, 76->77, 77, 95->98, 98, 109->110, 110, 116->118, 141->142, 142-143, 145->146, 146-147, 149->150, 150-166, 168->172, 172, 176->177, 177, 178->179, 179, 186->187, 187, 190->192, 192->193, 193, 196->200, 200-203, 213->214, 214, 216->218, 220->237, 237-240, 263->264, 264, 267->268, 268, 269->270, 270, 277->278, 278, 279->282, 282, 386->387, 387, 388->389, 389, 410->425, 450->455, 453->454, 454, 464->461, 469->461
nvtabular/column_similarity.py 88 21 28 4 70% 170-171, 180-182, 190-206, 221->231, 223->226, 226->227, 227, 236->237, 237
nvtabular/io.py 569 41 226 32 91% 73->74, 74, 78->81, 81, 86-88, 101->102, 102, 104->105, 105, 112->113, 113, 129->130, 130, 134->136, 136->132, 140->141, 141, 142->143, 143, 151->156, 162->163, 163-164, 182->183, 183, 195->198, 198, 213->214, 214, 230, 247, 271->272, 272, 310, 313, 374->375, 375, 396-398, 479->481, 526->549, 553, 685->688, 745->746, 746, 758->759, 759, 767->768, 768, 776->788, 781->786, 786-788, 863->864, 864, 961->963, 963-965, 973->975, 975, 1001->1002, 1002, 1055->1056, 1056, 1079->1080, 1080, 1085, 1100->1101, 1101
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 183 19 62 8 87% 71->72, 72, 77-78, 102->103, 103, 111->112, 112, 131->132, 132, 141, 147-151, 154, 221->223, 223, 228->229, 229-233, 243->244, 244, 248->249, 249, 379
nvtabular/loader/tensorflow.py 102 34 40 10 62% 37->38, 38-39, 49->50, 50, 57->58, 58-61, 70->71, 71, 74->75, 75, 76->81, 81, 240-251, 257->258, 258, 278->279, 279, 280->283, 283, 288->289, 289, 297-299, 302-304, 312, 315-323
nvtabular/loader/tf_utils.py 51 24 20 5 45% 13->16, 16->18, 23->25, 26->27, 27, 34-35, 40->48, 43-48, 60-65, 75-88
nvtabular/loader/torch.py 31 0 2 0 100%
nvtabular/ops.py 547 34 166 31 90% 54->53, 56->57, 57, 82-86, 109->111, 128->129, 129, 223, 281, 331, 356->357, 357, 365->366, 366, 370->372, 372->373, 373, 426->427, 427, 442->443, 443-445, 446->449, 449, 495->496, 496, 503->502, 536->537, 537, 546->548, 548-549, 585->586, 586, 615->616, 616, 718->719, 719, 743, 937->938, 938, 939->940, 940, 954->957, 957, 970->974, 1107->1108, 1108, 1116->1121, 1121, 1131->1132, 1132, 1174->1175, 1175, 1216->1217, 1217, 1220->1226, 1285->1286, 1286, 1287->1288, 1288, 1324->1325, 1325
nvtabular/worker.py 65 1 30 2 97% 80->92, 118->121, 121
nvtabular/workflow.py 415 47 230 22 87% 96->100, 100, 106->107, 107-111, 141->exit, 157->exit, 173->exit, 189->exit, 242->244, 292->293, 293, 372->375, 375, 396-411, 473->474, 474, 492->494, 494-503, 514->513, 563->568, 568, 571->572, 572, 607->608, 608, 655->646, 721->732, 732, 755-785, 813->814, 814, 827->830, 860->861, 861-863, 867->868, 868, 901->902, 902
setup.py 2 2 0 0 0% 18-20

TOTAL 2328 272 954 145 85%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 85.16%
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_criteo_notebook - subprocess.Called...
FAILED tests/unit/test_notebooks.py::test_rossman_example - subprocess.Called...
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-100-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[parquet] - Attrib...
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-1e-06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-1e-06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-1e-06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-1-1e-06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-1-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-10-1e-06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-10-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-100-1e-06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-100-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-1e-06] - Att...
FAILED tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-0.1] - Attri...
====== 29 failed, 390 passed, 1 skipped, 17 warnings in 350.82s (0:05:50) ======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins8374203174643733384.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 6f543f9d99d5aaf52ea62e6d512e2fa5647f42ff, no merge conflicts.
Running as SYSTEM
Setting status of 6f543f9d99d5aaf52ea62e6d512e2fa5647f42ff to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/677/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 6f543f9d99d5aaf52ea62e6d512e2fa5647f42ff^{commit} # timeout=10
Checking out Revision 6f543f9d99d5aaf52ea62e6d512e2fa5647f42ff (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6f543f9d99d5aaf52ea62e6d512e2fa5647f42ff # timeout=10
Commit message: "fixing bug in loader backend"
 > git rev-list --no-walk 9458a8d241f925c7c38a438af3ed2c17334f9ad8 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins6580127825192388955.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
31 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.4, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: hypothesis-5.28.0, forked-1.3.0, xdist-2.1.0, cov-2.10.1
collected 419 items / 1 skipped / 418 selected

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 14%]
tests/unit/test_io.py .................................................. [ 26%]
............................ [ 32%]
tests/unit/test_notebooks.py ..F. [ 33%]
tests/unit/test_ops.py ................................................. [ 45%]
........................................................................ [ 62%]
........................ [ 68%]
tests/unit/test_tf_dataloader.py FFFFFFFFFFFF [ 71%]
tests/unit/test_torch_dataloader.py ......FFFFFFFFFFFFBuild timed out (after 15 minutes). Marking the build as failed.
Build was aborted
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins3625263205336297522.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 1042d968da88cfba12597c35e42fae013d06d139, no merge conflicts.
Running as SYSTEM
Setting status of 1042d968da88cfba12597c35e42fae013d06d139 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/680/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 1042d968da88cfba12597c35e42fae013d06d139^{commit} # timeout=10
Checking out Revision 1042d968da88cfba12597c35e42fae013d06d139 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 1042d968da88cfba12597c35e42fae013d06d139 # timeout=10
Commit message: "tests passing"
 > git rev-list --no-walk 823142479af5643c9325479ca695ef2d5455a657 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins4226273224442509525.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
would reformat /var/jenkins_home/workspace/nvtabular_tests/nvtabular/tests/unit/test_tf_dataloader.py
Oh no! 💥 💔 💥
1 file would be reformatted, 30 files would be left unchanged.
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins5384196860892043366.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 05b9a25ee4c0bdb9001c40f597bacaad3921e37c, no merge conflicts.
Running as SYSTEM
Setting status of 05b9a25ee4c0bdb9001c40f597bacaad3921e37c to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/681/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 05b9a25ee4c0bdb9001c40f597bacaad3921e37c^{commit} # timeout=10
Checking out Revision 05b9a25ee4c0bdb9001c40f597bacaad3921e37c (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 05b9a25ee4c0bdb9001c40f597bacaad3921e37c # timeout=10
Commit message: "blackening"
 > git rev-list --no-walk 1042d968da88cfba12597c35e42fae013d06d139 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins6903780172210067657.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
31 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.4, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: hypothesis-5.28.0, forked-1.3.0, xdist-2.1.0, cov-2.10.1
collected 419 items / 1 skipped / 418 selected

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 14%]
tests/unit/test_io.py .................................................. [ 26%]
............................ [ 32%]
tests/unit/test_notebooks.py ..F. [ 33%]
tests/unit/test_ops.py ................................................. [ 45%]
........................................................................ [ 62%]
........................ [ 68%]
tests/unit/test_tf_dataloader.py F........... [ 71%]
tests/unit/test_torch_dataloader.py ..................... [ 76%]
tests/unit/test_workflow.py ............................................ [ 86%]
....................................................... [100%]

=================================== FAILURES ===================================
_____________________________ test_rossman_example _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-19/test_rossman_example0')

def test_rossman_example(tmpdir):
    pytest.importorskip("nvtabular.loader.tensorflow")
    _get_random_rossmann_data(1000).to_csv(os.path.join(tmpdir, "train.csv"))
    _get_random_rossmann_data(1000).to_csv(os.path.join(tmpdir, "valid.csv"))
    os.environ["INPUT_DATA_DIR"] = str(tmpdir)

    notebook_path = os.path.join(
        dirname(TEST_PATH), "examples", "rossmann-store-sales-example.ipynb"
    )
  _run_notebook(tmpdir, notebook_path, lambda line: line.replace("EPOCHS = 25", "EPOCHS = 1"))

tests/unit/test_notebooks.py:51:


tests/unit/test_notebooks.py:92: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/opt/conda/lib/python3.7/subprocess.py:395: in check_output
**kwargs).stdout


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/opt/conda/bin/python', '/tmp/pytest-of-jenkins/pytest-19/test_rossman_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f01e5141e50>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired:
            process.kill()
            stdout, stderr = process.communicate()
            raise TimeoutExpired(process.args, timeout, output=stdout,
                                 stderr=stderr)
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
            raise CalledProcessError(retcode, process.args,
                                   output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '/tmp/pytest-of-jenkins/pytest-19/test_rossman_example0/notebook.py']' returned non-zero exit status 1.

/opt/conda/lib/python3.7/subprocess.py:487: CalledProcessError
----------------------------- Captured stderr call -----------------------------
/opt/conda/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
/opt/conda/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-19/test_rossman_example0/notebook.py", line 59, in
proc.apply(train_dataset, record_stats=True, output_path=PREPROCESS_DIR_TRAIN, shuffle=nvt.io.Shuffle.PER_WORKER, out_files_per_proc=2)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 729, in apply
num_io_threads=num_io_threads,
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 829, in build_and_process_graph
self.exec_phase(idx, record_stats=record_stats)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 612, in exec_phase
self._aggregated_dask_transform(transforms)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 587, in _aggregated_dask_transform
ddf = self.get_ddf()
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 575, in get_ddf
return self.ddf.to_ddf(columns=columns, shuffle=self._shuffle_parts)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py", line 809, in to_ddf
ddf = self.engine.to_ddf(columns=columns)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py", line 1060, in to_ddf
return dask_cudf.read_csv(self.paths, chunksize=self.part_size, **self.csv_kwargs)[
File "/opt/conda/lib/python3.7/site-packages/dask_cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7.egg/dask_cudf/io/csv.py", line 19, in read_csv
return _internal_read_csv(path=path, chunksize=chunksize, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/dask_cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7.egg/dask_cudf/io/csv.py", line 59, in _internal_read_csv
meta = dask_reader(filenames[0], **kwargs)._meta
File "/opt/conda/lib/python3.7/site-packages/dask/dataframe/io/csv.py", line 649, in read
**kwargs,
File "/opt/conda/lib/python3.7/site-packages/dask/dataframe/io/csv.py", line 479, in read_pandas
**(storage_options or {}),
File "/opt/conda/lib/python3.7/site-packages/dask/bytes/core.py", line 125, in read_bytes
size = fs.info(path)["size"]
File "/opt/conda/lib/python3.7/site-packages/fsspec/implementations/local.py", line 60, in info
out = os.stat(path, follow_symlinks=False)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-jenkins/pytest-19/test_optimize_criteo0/train.csv'
_____________________ test_tf_gpu_dl[True-1-parquet-0.01] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-19/test_tf_gpu_dl_True_1_parquet_0')
paths = ['/tmp/pytest-of-jenkins/pytest-19/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-19/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f026462dad0>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    dont_iter = False
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:65:


nvtabular/loader/backend.py:267: in __next__
return self._get_next_batch()
nvtabular/loader/backend.py:294: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:273: in _fetch_chunk
raise chunks
nvtabular/loader/backend.py:124: in load_chunks
chunks = dataloader._create_tensors(chunks)
nvtabular/loader/backend.py:377: in _create_tensors
conts = self._to_tensor(gdf_conts)
nvtabular/loader/tensorflow.py:266: in _to_tensor
dlpack = gdf.values.T.toDlpack()
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7-linux-x86_64.egg/cudf/core/dataframe.py:907: in values
return cupy.asarray(self.as_gpu_matrix())
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7-linux-x86_64.egg/cudf/core/dataframe.py:3208: in as_gpu_matrix
matrix[:, colidx] = dense
cupy/core/core.pyx:1248: in cupy.core.core.ndarray.__setitem__
???
cupy/core/_routines_indexing.pyx:49: in cupy.core._routines_indexing._ndarray_setitem
???
cupy/core/_routines_indexing.pyx:801: in cupy.core._routines_indexing._scatter_op
???
cupy/core/core.pyx:517: in cupy.core.core.ndarray.fill
???
cupy/core/_kernel.pyx:605: in cupy.core._kernel.ElementwiseKernel.__call__
???


???
E ValueError: Array device must be same as the current device: array device = 3 while current = 0

cupy/core/_kernel.pyx:95: ValueError
----------------------------- Captured stderr call -----------------------------
2020-08-27 00:57:06.964471: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-08-27 00:57:06.987273: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3198080000 Hz
2020-08-27 00:57:06.988492: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f018c25caf0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-27 00:57:06.988555: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-27 00:57:07.310560: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f018c2c85f0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-27 00:57:07.310636: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-27 00:57:07.310660: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-27 00:57:07.310681: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (2): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-27 00:57:07.310711: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (3): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-27 00:57:07.315351: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-27 00:57:07.317501: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-27 00:57:07.319304: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 2 with properties:
pciBusID: 0000:0e:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-27 00:57:07.320892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 3 with properties:
pciBusID: 0000:0f:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-27 00:57:07.321014: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-08-27 00:57:07.321052: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-08-27 00:57:07.321083: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-08-27 00:57:07.321113: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-08-27 00:57:07.321141: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-08-27 00:57:07.321169: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-08-27 00:57:07.321198: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-27 00:57:07.329782: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0, 1, 2, 3
2020-08-27 00:57:07.329849: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-08-27 00:57:07.335064: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-27 00:57:07.335098: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0 1 2 3
2020-08-27 00:57:07.335111: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N Y Y Y
2020-08-27 00:57:07.335142: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 1: Y N Y Y
2020-08-27 00:57:07.335153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 2: Y Y N Y
2020-08-27 00:57:07.335161: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 3: Y Y Y N
2020-08-27 00:57:07.340168: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory) -> physical GPU (device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0)
2020-08-27 00:57:07.341703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15212 MB memory) -> physical GPU (device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0)
2020-08-27 00:57:07.343167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15212 MB memory) -> physical GPU (device: 2, name: Tesla P100-DGXS-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0)
2020-08-27 00:57:07.344638: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15212 MB memory) -> physical GPU (device: 3, name: Tesla P100-DGXS-16GB, pci bus id: 0000:0f:00.0, compute capability: 6.0)
=============================== warnings summary ===============================
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_io.py::test_mulifile_parquet[True-0-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-0-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-2-csv]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:77: DeprecationWarning: shuffle=True is deprecated. Using PER_WORKER.
warnings.warn("shuffle=True is deprecated. Using PER_WORKER.", DeprecationWarning)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39315 instead
http_address["port"], self.http_server.port

tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7-linux-x86_64.egg/cudf/core/dataframe.py:660: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 29400 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30828 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30100 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30464 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30352 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30856 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 60480 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_workflow.py::test_chaining_3
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:748: UserWarning: part_mem_fraction is ignored for DataFrame input.
warnings.warn("part_mem_fraction is ignored for DataFrame input.")

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.4-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 6 0 0 0 100%
nvtabular/categorify.py 269 49 150 31 78% 69->70, 70-72, 74->75, 75, 76->77, 77, 95->98, 98, 109->110, 110, 116->118, 141->142, 142-143, 145->146, 146-147, 149->150, 150-166, 168->172, 172, 176->177, 177, 178->179, 179, 186->187, 187, 190->192, 192->193, 193, 196->200, 200-203, 213->214, 214, 216->218, 220->237, 237-240, 263->264, 264, 267->268, 268, 269->270, 270, 277->278, 278, 279->282, 282, 386->387, 387, 388->389, 389, 410->425, 450->455, 453->454, 454, 464->461, 469->461
nvtabular/column_similarity.py 88 21 28 4 70% 170-171, 180-182, 190-206, 221->231, 223->226, 226->227, 227, 236->237, 237
nvtabular/io.py 569 41 226 32 91% 73->74, 74, 78->81, 81, 86-88, 101->102, 102, 104->105, 105, 112->113, 113, 129->130, 130, 134->136, 136->132, 140->141, 141, 142->143, 143, 151->156, 162->163, 163-164, 182->183, 183, 195->198, 198, 213->214, 214, 230, 247, 271->272, 272, 310, 313, 374->375, 375, 396-398, 479->481, 526->549, 553, 685->688, 745->746, 746, 758->759, 759, 767->768, 768, 776->788, 781->786, 786-788, 863->864, 864, 961->963, 963-965, 973->975, 975, 1001->1002, 1002, 1055->1056, 1056, 1079->1080, 1080, 1085, 1100->1101, 1101
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 188 6 60 6 95% 71->72, 72, 115->116, 116, 135->136, 136, 158, 233->235, 248->249, 249, 253->254, 254
nvtabular/loader/tensorflow.py 108 35 46 11 64% 37->38, 38-39, 49->50, 50, 57->58, 58-61, 70->71, 71, 74->75, 75, 76->81, 81, 240-251, 257->258, 258, 277->278, 278, 285->286, 286, 287->290, 290, 295->296, 296, 304-306, 309-311, 319, 322-330
nvtabular/loader/tf_utils.py 51 24 20 5 45% 13->16, 16->18, 23->25, 26->27, 27, 34-35, 40->48, 43-48, 60-65, 75-88
nvtabular/loader/torch.py 33 0 4 0 100%
nvtabular/ops.py 547 34 166 31 90% 54->53, 56->57, 57, 82-86, 109->111, 128->129, 129, 223, 281, 331, 356->357, 357, 365->366, 366, 370->372, 372->373, 373, 426->427, 427, 442->443, 443-445, 446->449, 449, 495->496, 496, 503->502, 536->537, 537, 546->548, 548-549, 585->586, 586, 615->616, 616, 718->719, 719, 743, 937->938, 938, 939->940, 940, 954->957, 957, 970->974, 1107->1108, 1108, 1116->1121, 1121, 1131->1132, 1132, 1174->1175, 1175, 1216->1217, 1217, 1220->1226, 1285->1286, 1286, 1287->1288, 1288, 1324->1325, 1325
nvtabular/worker.py 65 1 30 2 97% 80->92, 118->121, 121
nvtabular/workflow.py 415 47 230 22 87% 96->100, 100, 106->107, 107-111, 141->exit, 157->exit, 173->exit, 189->exit, 242->244, 292->293, 293, 372->375, 375, 396-411, 473->474, 474, 492->494, 494-503, 514->513, 563->568, 568, 571->572, 572, 607->608, 608, 655->646, 721->732, 732, 755-785, 813->814, 814, 827->830, 860->861, 861-863, 867->868, 868, 901->902, 902
setup.py 2 2 0 0 0% 18-20

TOTAL 2341 260 960 144 86%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 85.76%
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_rossman_example - subprocess.Called...
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.01]
====== 2 failed, 417 passed, 1 skipped, 20 warnings in 432.94s (0:07:12) =======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins7684354513045421438.sh

"metadata": {},
"outputs": [],
"source": [
"DATA_DIR = os.environ.get('INPUT_DATA_DIR', './data')\n",
"OUTPUT_DATA_DIR = os.environ.get('INPUT_DATA_DIR', './data')\n",
"DATA_DIR = os.environ.get(\"OUTPUT_DATA_DIR\", \"./data\")\n",
Copy link
Member

This probably should be INPUT_DATA_DIR, since it's also used for reading the dataset.

Right now the unit tests are failing because of this in tests/unit/test_notebooks.py::test_rossman_example

Traceback (most recent call last):
  File "/tmp/pytest-of-jenkins/pytest-20/test_rossman_example0/notebook.py", line 59, in <module>
    proc.apply(train_dataset, record_stats=True, output_path=PREPROCESS_DIR_TRAIN, shuffle=nvt.io.Shuffle.PER_WORKER, out_files_per_proc=2)
  File "/var/jenkins_home/nvtabular/nvtabular/workflow.py", line 729, in apply
    num_io_threads=num_io_threads,
  File "/var/jenkins_home/nvtabular/nvtabular/workflow.py", line 829, in build_and_process_graph
    self.exec_phase(idx, record_stats=record_stats)
  File "/var/jenkins_home/nvtabular/nvtabular/workflow.py", line 612, in exec_phase
    self._aggregated_dask_transform(transforms)
  File "/var/jenkins_home/nvtabular/nvtabular/workflow.py", line 587, in _aggregated_dask_transform
    ddf = self.get_ddf()
  File "/var/jenkins_home/nvtabular/nvtabular/workflow.py", line 575, in get_ddf
    return self.ddf.to_ddf(columns=columns, shuffle=self._shuffle_parts)
  File "/var/jenkins_home/nvtabular/nvtabular/io.py", line 809, in to_ddf
    ddf = self.engine.to_ddf(columns=columns)
  File "/var/jenkins_home/nvtabular/nvtabular/io.py", line 1060, in to_ddf
    return dask_cudf.read_csv(self.paths, chunksize=self.part_size, **self.csv_kwargs)[
  File "/opt/conda/lib/python3.7/site-packages/dask_cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7.egg/dask_cudf/io/csv.py", line 19, in read_csv
    return _internal_read_csv(path=path, chunksize=chunksize, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/dask_cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7.egg/dask_cudf/io/csv.py", line 59, in _internal_read_csv
    meta = dask_reader(filenames[0], **kwargs)._meta
  File "/opt/conda/lib/python3.7/site-packages/dask/dataframe/io/csv.py", line 649, in read
    **kwargs,
  File "/opt/conda/lib/python3.7/site-packages/dask/dataframe/io/csv.py", line 479, in read_pandas
    **(storage_options or {}),
  File "/opt/conda/lib/python3.7/site-packages/dask/bytes/core.py", line 125, in read_bytes
    size = fs.info(path)["size"]
  File "/opt/conda/lib/python3.7/site-packages/fsspec/implementations/local.py", line 60, in info
    out = os.stat(path, follow_symlinks=False)
FileNotFoundError: [Errno 2] No such file or directory: '/var/jenkins_home/nvtabular/data/train.csv'
Suggested change
"DATA_DIR = os.environ.get(\"OUTPUT_DATA_DIR\", \"./data\")\n",
"DATA_DIR = os.environ.get(\"INPUT_DATA_DIR\", \"./data\")\n",

Copy link
Contributor Author

Ok I think I'm confused then, because the preprocessing notebook uses INPUT_DATA_DIR to read the original data, but then places the preprocessed data with the extra features in OUTPUT_DATA_DIR, which is where train.csv should live. Unless we're not using the preprocessing notebook first in this test? When I run the preproc notebook followed by the example notebook locally, this works.
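
To make the hand-off described here concrete, below is a minimal sketch (directory values are hypothetical) of the environment-variable flow between the two notebooks: the preprocessing notebook reads the raw Rossmann data from INPUT_DATA_DIR and writes train.csv/valid.csv with the engineered features to OUTPUT_DATA_DIR, so whichever variable the example notebook uses, its DATA_DIR has to resolve to the directory the preprocessing notebook actually wrote to, or the read of train.csv fails exactly as in the CI traceback above.

    import os

    # Hypothetical values; in CI these would be exported before the notebooks run.
    RAW_DIR = os.environ.get("INPUT_DATA_DIR", "./data")       # raw Rossmann data
    PREPROC_DIR = os.environ.get("OUTPUT_DATA_DIR", "./data")  # preprocessing notebook output

    # Preprocessing notebook: read from RAW_DIR, write engineered train.csv/valid.csv to PREPROC_DIR.
    # Example notebook: must read train.csv from the directory the preprocessing notebook wrote to.
    DATA_DIR = PREPROC_DIR
    train_csv = os.path.join(DATA_DIR, "train.csv")  # the file the failing test could not find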

@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #224 of commit 32f15f8605bd770b66f8311c7de28504f26d12dd, no merge conflicts.
Running as SYSTEM
Setting status of 32f15f8605bd770b66f8311c7de28504f26d12dd to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/685/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 32f15f8605bd770b66f8311c7de28504f26d12dd^{commit} # timeout=10
Checking out Revision 32f15f8605bd770b66f8311c7de28504f26d12dd (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 32f15f8605bd770b66f8311c7de28504f26d12dd # timeout=10
Commit message: "Merge branch 'tfasync' of github.com:alecgunny/NVTabular into tfasync"
 > git rev-list --no-walk 0d354244b8c3f516737bf881bf4b78b8e002ebfb # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins8758181856450508653.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
31 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.4, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: hypothesis-5.28.0, forked-1.3.0, xdist-2.1.0, cov-2.10.1
collected 419 items / 1 skipped / 418 selected

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 14%]
tests/unit/test_io.py .................................................. [ 26%]
............................ [ 32%]
tests/unit/test_notebooks.py .... [ 33%]
tests/unit/test_ops.py ................................................. [ 45%]
........................................................................ [ 62%]
........................ [ 68%]
tests/unit/test_tf_dataloader.py F........... [ 71%]
tests/unit/test_torch_dataloader.py ..................... [ 76%]
tests/unit/test_workflow.py ............................................ [ 86%]
....................................................... [100%]

=================================== FAILURES ===================================
_____________________ test_tf_gpu_dl[True-1-parquet-0.01] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_tf_gpu_dl_True_1_parquet_0')
paths = ['/tmp/pytest-of-jenkins/pytest-3/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-3/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7fe9eabbc550>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    dont_iter = False
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:65:


nvtabular/loader/backend.py:267: in __next__
return self._get_next_batch()
nvtabular/loader/backend.py:294: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:273: in _fetch_chunk
raise chunks
nvtabular/loader/backend.py:124: in load_chunks
chunks = dataloader._create_tensors(chunks)
nvtabular/loader/backend.py:377: in _create_tensors
conts = self._to_tensor(gdf_conts)
nvtabular/loader/tensorflow.py:266: in _to_tensor
dlpack = gdf.values.T.toDlpack()
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7-linux-x86_64.egg/cudf/core/dataframe.py:907: in values
return cupy.asarray(self.as_gpu_matrix())
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7-linux-x86_64.egg/cudf/core/dataframe.py:3208: in as_gpu_matrix
matrix[:, colidx] = dense
cupy/core/core.pyx:1248: in cupy.core.core.ndarray.__setitem__
???
cupy/core/_routines_indexing.pyx:49: in cupy.core._routines_indexing._ndarray_setitem
???
cupy/core/_routines_indexing.pyx:801: in cupy.core._routines_indexing._scatter_op
???
cupy/core/core.pyx:517: in cupy.core.core.ndarray.fill
???
cupy/core/_kernel.pyx:605: in cupy.core._kernel.ElementwiseKernel.__call__
???


???
E ValueError: Array device must be same as the current device: array device = 3 while current = 0

cupy/core/_kernel.pyx:95: ValueError
----------------------------- Captured stderr call -----------------------------
2020-08-27 16:54:40.420477: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-08-27 16:54:40.455273: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3198080000 Hz
2020-08-27 16:54:40.456170: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fe99825c040 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-27 16:54:40.456198: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-27 16:54:40.772611: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fe9982c7bd0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-27 16:54:40.772681: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-27 16:54:40.772704: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-27 16:54:40.772722: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (2): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-27 16:54:40.772740: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (3): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-27 16:54:40.777045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-27 16:54:40.779733: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-27 16:54:40.782176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 2 with properties:
pciBusID: 0000:0e:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-27 16:54:40.783923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 3 with properties:
pciBusID: 0000:0f:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-27 16:54:40.784012: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-08-27 16:54:40.784036: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-08-27 16:54:40.784056: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-08-27 16:54:40.784075: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-08-27 16:54:40.784093: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-08-27 16:54:40.784110: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-08-27 16:54:40.784129: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-27 16:54:40.792073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0, 1, 2, 3
2020-08-27 16:54:40.792126: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-08-27 16:54:40.797814: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-27 16:54:40.797835: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0 1 2 3
2020-08-27 16:54:40.797846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N Y Y Y
2020-08-27 16:54:40.797878: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 1: Y N Y Y
2020-08-27 16:54:40.797889: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 2: Y Y N Y
2020-08-27 16:54:40.797896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 3: Y Y Y N
2020-08-27 16:54:40.802905: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory) -> physical GPU (device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0)
2020-08-27 16:54:40.804329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15212 MB memory) -> physical GPU (device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0)
2020-08-27 16:54:40.805955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15212 MB memory) -> physical GPU (device: 2, name: Tesla P100-DGXS-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0)
2020-08-27 16:54:40.807355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15212 MB memory) -> physical GPU (device: 3, name: Tesla P100-DGXS-16GB, pci bus id: 0000:0f:00.0, compute capability: 6.0)
=============================== warnings summary ===============================
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_io.py::test_mulifile_parquet[True-0-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-0-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-2-csv]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:77: DeprecationWarning: shuffle=True is deprecated. Using PER_WORKER.
warnings.warn("shuffle=True is deprecated. Using PER_WORKER.", DeprecationWarning)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43211 instead
http_address["port"], self.http_server.port

tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7-linux-x86_64.egg/cudf/core/dataframe.py:660: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 29148 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30212 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 29680 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 29876 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 31472 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 29960 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 60480 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_workflow.py::test_chaining_3
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:748: UserWarning: part_mem_fraction is ignored for DataFrame input.
warnings.warn("part_mem_fraction is ignored for DataFrame input.")

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.4-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 6 0 0 0 100%
nvtabular/categorify.py 269 49 150 31 78% 69->70, 70-72, 74->75, 75, 76->77, 77, 95->98, 98, 109->110, 110, 116->118, 141->142, 142-143, 145->146, 146-147, 149->150, 150-166, 168->172, 172, 176->177, 177, 178->179, 179, 186->187, 187, 190->192, 192->193, 193, 196->200, 200-203, 213->214, 214, 216->218, 220->237, 237-240, 263->264, 264, 267->268, 268, 269->270, 270, 277->278, 278, 279->282, 282, 386->387, 387, 388->389, 389, 410->425, 450->455, 453->454, 454, 464->461, 469->461
nvtabular/column_similarity.py 88 21 28 4 70% 170-171, 180-182, 190-206, 221->231, 223->226, 226->227, 227, 236->237, 237
nvtabular/io.py 569 41 226 32 91% 73->74, 74, 78->81, 81, 86-88, 101->102, 102, 104->105, 105, 112->113, 113, 129->130, 130, 134->136, 136->132, 140->141, 141, 142->143, 143, 151->156, 162->163, 163-164, 182->183, 183, 195->198, 198, 213->214, 214, 230, 247, 271->272, 272, 310, 313, 374->375, 375, 396-398, 479->481, 526->549, 553, 685->688, 745->746, 746, 758->759, 759, 767->768, 768, 776->788, 781->786, 786-788, 863->864, 864, 961->963, 963-965, 973->975, 975, 1001->1002, 1002, 1055->1056, 1056, 1079->1080, 1080, 1085, 1100->1101, 1101
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 188 4 60 4 97% 71->72, 72, 135->136, 136, 158, 233->235, 248->249, 249
nvtabular/loader/tensorflow.py 108 16 46 10 82% 37->38, 38-39, 49->50, 50, 57->58, 58-61, 70->71, 71, 76->81, 81, 242-251, 257->258, 258, 277->278, 278, 285->286, 286, 287->290, 290, 295->296, 296
nvtabular/loader/tf_utils.py 51 7 20 5 83% 13->16, 16->18, 23->25, 26->27, 27, 34-35, 40->48, 43-48
nvtabular/loader/torch.py 33 0 4 0 100%
nvtabular/ops.py 547 34 166 31 90% 54->53, 56->57, 57, 82-86, 109->111, 128->129, 129, 223, 281, 331, 356->357, 357, 365->366, 366, 370->372, 372->373, 373, 426->427, 427, 442->443, 443-445, 446->449, 449, 495->496, 496, 503->502, 536->537, 537, 546->548, 548-549, 585->586, 586, 615->616, 616, 718->719, 719, 743, 937->938, 938, 939->940, 940, 954->957, 957, 970->974, 1107->1108, 1108, 1116->1121, 1121, 1131->1132, 1132, 1174->1175, 1175, 1216->1217, 1217, 1220->1226, 1285->1286, 1286, 1287->1288, 1288, 1324->1325, 1325
nvtabular/worker.py 65 1 30 2 97% 80->92, 118->121, 121
nvtabular/workflow.py 415 38 230 24 89% 96->100, 100, 106->107, 107-111, 141->exit, 157->exit, 173->exit, 189->exit, 242->244, 292->293, 293, 372->375, 375, 400->401, 401, 407->410, 410, 473->474, 474, 492->494, 494-503, 514->513, 563->568, 568, 571->572, 572, 607->608, 608, 655->646, 721->732, 732, 755-785, 813->814, 814, 827->830, 860->861, 861-863, 867->868, 868, 901->902, 902
setup.py 2 2 0 0 0% 18-20

TOTAL 2341 213 960 143 88%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 88.00%
=========================== short test summary info ============================
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.01]
====== 1 failed, 418 passed, 1 skipped, 20 warnings in 470.77s (0:07:50) =======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins2718176934798198774.sh

@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #224 of commit 372520b35b46416e02ec52a45dc9dcaf731f29d8, no merge conflicts.
Running as SYSTEM
Setting status of 372520b35b46416e02ec52a45dc9dcaf731f29d8 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/692/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 372520b35b46416e02ec52a45dc9dcaf731f29d8^{commit} # timeout=10
Checking out Revision 372520b35b46416e02ec52a45dc9dcaf731f29d8 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 372520b35b46416e02ec52a45dc9dcaf731f29d8 # timeout=10
Commit message: "Fix cupy device errors"
 > git rev-list --no-walk a4468067a7a72c821fda94573dc264e26454ee40 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins2550838868390358978.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
31 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.4, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: hypothesis-5.28.0, forked-1.3.0, xdist-2.1.0, cov-2.10.1
collected 419 items / 1 skipped / 418 selected

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 14%]
tests/unit/test_io.py .................................................. [ 26%]
............................ [ 32%]
tests/unit/test_notebooks.py .... [ 33%]
tests/unit/test_ops.py ................................................. [ 45%]
........................................................................ [ 62%]
........................ [ 68%]
tests/unit/test_tf_dataloader.py ............ [ 71%]
tests/unit/test_torch_dataloader.py ..................... [ 76%]
tests/unit/test_workflow.py ............................................ [ 86%]
....................................................... [100%]

=============================== warnings summary ===============================
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_io.py::test_mulifile_parquet[True-0-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-0-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-2-csv]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:77: DeprecationWarning: shuffle=True is deprecated. Using PER_WORKER.
warnings.warn("shuffle=True is deprecated. Using PER_WORKER.", DeprecationWarning)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45061 instead
http_address["port"], self.http_server.port

tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7-linux-x86_64.egg/cudf/core/dataframe.py:660: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30380 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 31444 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 29932 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30128 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 29344 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30296 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 60480 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_workflow.py::test_chaining_3
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:748: UserWarning: part_mem_fraction is ignored for DataFrame input.
warnings.warn("part_mem_fraction is ignored for DataFrame input.")

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.4-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 6 0 0 0 100%
nvtabular/categorify.py 269 49 150 31 78% 69->70, 70-72, 74->75, 75, 76->77, 77, 95->98, 98, 109->110, 110, 116->118, 141->142, 142-143, 145->146, 146-147, 149->150, 150-166, 168->172, 172, 176->177, 177, 178->179, 179, 186->187, 187, 190->192, 192->193, 193, 196->200, 200-203, 213->214, 214, 216->218, 220->237, 237-240, 263->264, 264, 267->268, 268, 269->270, 270, 277->278, 278, 279->282, 282, 386->387, 387, 388->389, 389, 410->425, 450->455, 453->454, 454, 464->461, 469->461
nvtabular/column_similarity.py 88 21 28 4 70% 170-171, 180-182, 190-206, 221->231, 223->226, 226->227, 227, 236->237, 237
nvtabular/io.py 569 41 226 32 91% 73->74, 74, 78->81, 81, 86-88, 101->102, 102, 104->105, 105, 112->113, 113, 129->130, 130, 134->136, 136->132, 140->141, 141, 142->143, 143, 151->156, 162->163, 163-164, 182->183, 183, 195->198, 198, 213->214, 214, 230, 247, 271->272, 272, 310, 313, 374->375, 375, 396-398, 479->481, 526->549, 553, 685->688, 745->746, 746, 758->759, 759, 767->768, 768, 776->788, 781->786, 786-788, 863->864, 864, 961->963, 963-965, 973->975, 975, 1001->1002, 1002, 1055->1056, 1056, 1079->1080, 1080, 1085, 1100->1101, 1101
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 188 8 60 5 95% 71->72, 72, 135->136, 136, 146-147, 158, 233->235, 248->249, 249, 271->272, 272-273
nvtabular/loader/tensorflow.py 112 16 46 10 82% 39->40, 40-41, 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 244-253, 264->265, 265, 284->285, 285, 292->293, 293, 294->297, 297, 302->303, 303
nvtabular/loader/tf_utils.py 51 7 20 5 83% 13->16, 16->18, 23->25, 26->27, 27, 34-35, 40->48, 43-48
nvtabular/loader/torch.py 33 0 4 0 100%
nvtabular/ops.py 547 34 166 31 90% 54->53, 56->57, 57, 82-86, 109->111, 128->129, 129, 223, 281, 331, 356->357, 357, 365->366, 366, 370->372, 372->373, 373, 426->427, 427, 442->443, 443-445, 446->449, 449, 495->496, 496, 503->502, 536->537, 537, 546->548, 548-549, 585->586, 586, 615->616, 616, 718->719, 719, 743, 937->938, 938, 939->940, 940, 954->957, 957, 970->974, 1107->1108, 1108, 1116->1121, 1121, 1131->1132, 1132, 1174->1175, 1175, 1216->1217, 1217, 1220->1226, 1285->1286, 1286, 1287->1288, 1288, 1324->1325, 1325
nvtabular/worker.py 65 1 30 2 97% 80->92, 118->121, 121
nvtabular/workflow.py 415 38 230 24 89% 96->100, 100, 106->107, 107-111, 141->exit, 157->exit, 173->exit, 189->exit, 242->244, 292->293, 293, 372->375, 375, 400->401, 401, 407->410, 410, 473->474, 474, 492->494, 494-503, 514->513, 563->568, 568, 571->572, 572, 607->608, 608, 655->646, 721->732, 732, 755-785, 813->814, 814, 827->830, 860->861, 861-863, 867->868, 868, 901->902, 902
setup.py 2 2 0 0 0% 18-20

TOTAL 2345 217 960 144 88%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 87.87%
=========== 419 passed, 1 skipped, 20 warnings in 463.53s (0:07:43) ============
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins6204768901711015414.sh

@alecgunny alecgunny merged commit 7f9c780 into NVIDIA-Merlin:main Aug 27, 2020
mikemckiernan pushed a commit that referenced this pull request Nov 24, 2022
…loader (#224)

* adding tensorflow example stuff

* getting workflow working

* training of both workflows works

* notebook updates and addding image from run

* updating workflow for nightly tf build

* Create dummy.txt

* Add files via upload

* Delete dummy.txt

* adding tensorflow example stuff

* getting workflow working

* training of both workflows works

* notebook updates and addding image from run

* adding root Dockerfile

* updating root build for 2.3 rc1

* updating Dockerfile for tf 2.3-rc1 and filling out notebook

* updating throughput curves in README

* moving dlrm-train

* cleaning up notebook and layers code, adding cupti symlink to Dockerfile

* getting rid of modprobe install in Dockerfile

* playing with requirements

* updating for tf 2.3 full release

* updating notebook

* removing old Dockerfiles, updating environment and README and finishing example notebook

* removing old images

* consolidating data loading code

* cleaning up and blackening

* finished separating loader code

* adding fixed Dockerfile

* getting tf data loading running

* blackening

* fixing bug in torch loader

* applying isort fixes

* isort fixes

* ironing out data loaders

* creating parent dataloader class

* playing with thread safe iteration

* small change

* moving tensoritr loop into asynciterator

* fixing syntax error

* debugging iter issues

* fixing generator issues

* cleaning up backend code

* got torch data loader working

* working out tf missing gradient issues

* working on gradient issues

* reformatting loader backend to use only 2 classes

* undoing changes really quick

* backend changes

* getting tf dataloader working

* trying out tensor y

* tf data loader working

* undoing some testing changes to Tensorflow

* rerunning tf example for checks

* updating tests

* blackening

* blackening

* fixing dataloader bench bug

* fixing unused variables

* isort fixes

* adding qsize to chunkedbuffer

* fixed typo in backed

* simplifying and updating DataLoader

* updating dataloader backend

* trying new async scheme

* got new implementation working

* cleaning up

* blackening

* fixing bugs

* updating wait time

* isort fixes

* minor aesthetic change

* merging upstream changes

* bug fixes

* trying to update examples

* adding custom validation callback

* got examples working

* blackening

* fixing bug and documenting

* gettin criteo most of the way through

* rearranging and adding checks

* adding proper torch documentation

* documenting and blackening

* remove trailing whitespace

* updating tests

* changing cat and cont defaults to empty lists and including checks

* updating TF example notebook

* adding PARTS_PER_CHUNK to criteo example

* adding tf config changes

* fixing tf unit tests

* blackening

* fixed tf_util bug

* fixing tf_utils bug

* blackening

* blackening

* fixing bug in loader backend

* tests passing

* blackening

* updating rossmann notebook test

* Fix cupy device errors

Co-authored-by: Alec Gunny <agunny@nvidia.com>
Co-authored-by: Ben Frederickson <github@benfrederickson.com>
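
The ValueError in the earlier CI runs ("Array device must be same as the current device: array device = 3 while current = 0") came from the cuDF-to-CuPy conversion in the loader's _to_tensor path running while a different GPU was active than the one that owns the data; it was resolved by the "Fix cupy device errors" commit above. As a rough illustration only (not necessarily the exact change in that commit), the general pattern is to activate the owning device before the conversion:

    import cupy as cp

    def gdf_to_dlpack(gdf, device_id):
        # device_id is assumed (hypothetical helper) to be the GPU that owns gdf's buffers
        with cp.cuda.Device(device_id):
            # same conversion the loader performs, now with the owning device active
            return gdf.values.T.toDlpack()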