
[REVIEW] Creating dedicated loader submodule to build TF async dataloader #224

Merged
merged 114 commits on Aug 27, 2020

Conversation

alecgunny
Contributor

Addressing #218
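For context while reading the CI logs that follow, the new nvtabular.loader submodule is exercised along these lines in the unit tests (a minimal sketch adapted from the test_tf_gpu_dl test quoted later in this thread; the parquet paths, batch size, and the nvtabular.io.Dataset construction are illustrative assumptions, not part of this PR's public API contract):

import nvtabular as nvt
import nvtabular.ops as ops
from nvtabular.loader import tensorflow as tf_dataloader

# column roles mirror the unit test quoted in the CI logs below
cat_names = ["name-string", "name-cat"]
cont_names = ["x", "y", "id"]
label_name = ["label"]

processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
processor.add_feature([ops.FillMedian()])
processor.add_preprocess(ops.Normalize())
processor.add_preprocess(ops.Categorify())
processor.finalize()

paths = ["dataset-0.parquet", "dataset-1.parquet"]          # illustrative paths
dataset = nvt.io.Dataset(paths, engine="parquet")           # assumed constructor

# KerasSequenceLoader is the new TF async dataloader: it buffers chunks of the
# parquet data on the GPU in the background and yields (features, labels) batches.
data_itr = tf_dataloader.KerasSequenceLoader(
    paths,                    # file paths or an nvtabular.io.Dataset
    cat_names=cat_names,
    cont_names=cont_names,
    batch_size=64,
    buffer_size=0.06,         # fraction of GPU memory to buffer per chunk
    label_names=label_name,
    engine="parquet",
    shuffle=False,
)
processor.update_stats(dataset)   # fit workflow statistics, as in the unit test
data_itr.map(processor)           # apply the fitted workflow on the fly

for idx in range(len(data_itr)):
    X, y = next(data_itr)         # X: dict of column tensors, y: label tensor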

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 59e5a450d012fe6afb52cc9571da3be151ff0297, no merge conflicts.
Running as SYSTEM
Setting status of 59e5a450d012fe6afb52cc9571da3be151ff0297 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/547/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 59e5a450d012fe6afb52cc9571da3be151ff0297^{commit} # timeout=10
Checking out Revision 59e5a450d012fe6afb52cc9571da3be151ff0297 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 59e5a450d012fe6afb52cc9571da3be151ff0297 # timeout=10
Commit message: "cleaning up and blackening"
 > git rev-list --no-walk befdbecfce99b23272ecd4dc742294cec06cd250 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins303756966890646663.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
32 files would be left unchanged.
./nvtabular/loader/tensorflow.py:2:1: F401 'warnings' imported but unused
./nvtabular/loader/tensorflow.py:7:1: F401 '..io._shuffle_gdf' imported but unused
./nvtabular/loader/tensorflow.py:7:1: F401 '..io.device_mem_size' imported but unused
./nvtabular/loader/tensorflow.py:8:1: F401 '..workflow.BaseWorkflow' imported but unused
./nvtabular/loader/backend.py:1:1: F401 'math' imported but unused
./nvtabular/loader/backend.py:9:1: F401 'nvtabular.ops._get_embedding_order' imported but unused
./nvtabular/loader/backend.py:52:15: F821 undefined name 'torch'
./nvtabular/loader/backend.py:52:54: F821 undefined name 'torch'
./nvtabular/loader/backend.py:122:26: F821 undefined name 'workflows'
./nvtabular/loader/tf_utils.py:7:23: F821 undefined name 'device_mem_size'
./nvtabular/loader/tf_utils.py:9:29: F821 undefined name 'os'
./nvtabular/loader/tf_utils.py:19:18: F821 undefined name 'os'
./nvtabular/loader/tf_utils.py:26:9: F821 undefined name 'warnings'
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins6245704696221640167.sh
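(Aside on the flake8 failures in the log above: F401 flags an import that is never used, and F821 flags a name that is referenced but never defined, the usual fallout when code is split into a new submodule and the imports do not move with it. A contrived sketch of both patterns, not the actual NVTabular code:)

import warnings                       # F401: 'warnings' imported but unused

def gpu_visible_devices():
    # F821: 'os' is undefined here because the import was left behind
    return os.environ.get("CUDA_VISIBLE_DEVICES")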

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 16f9d754f6530865c3c8734fa6347c59ee1645f5, no merge conflicts.
Running as SYSTEM
Setting status of 16f9d754f6530865c3c8734fa6347c59ee1645f5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/549/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 16f9d754f6530865c3c8734fa6347c59ee1645f5^{commit} # timeout=10
Checking out Revision 16f9d754f6530865c3c8734fa6347c59ee1645f5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 16f9d754f6530865c3c8734fa6347c59ee1645f5 # timeout=10
Commit message: "finished separating loader code"
 > git rev-list --no-walk ad99e5becb61d8b246a5bd652700c9c8059d4d0b # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins7354477975365041431.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
31 files would be left unchanged.
./nvtabular/loader/tensorflow.py:20:1: F401 'tensorflow.python.feature_column.feature_column_v2 as fc' imported but unused
./nvtabular/loader/torch.py:44:16: F821 undefined name 'TorchTensorBatchDatasetItr'
./nvtabular/loader/tf_utils.py:10:29: F821 undefined name 'os'
./nvtabular/loader/tf_utils.py:20:18: F821 undefined name 'os'
./nvtabular/loader/tf_utils.py:27:9: F821 undefined name 'warnings'
./nvtabular/loader/tf_utils.py:68:19: F821 undefined name 'columns'
./nvtabular/loader/tf_utils.py:73:31: F821 undefined name 'fc'
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins414755964881219107.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit bb63cc5fb5a4817a5dcc96f394b84a81f01d886b, no merge conflicts.
Running as SYSTEM
Setting status of bb63cc5fb5a4817a5dcc96f394b84a81f01d886b to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/670/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse bb63cc5fb5a4817a5dcc96f394b84a81f01d886b^{commit} # timeout=10
Checking out Revision bb63cc5fb5a4817a5dcc96f394b84a81f01d886b (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f bb63cc5fb5a4817a5dcc96f394b84a81f01d886b # timeout=10
Commit message: "adding PARTS_PER_CHUNK to criteo example"
 > git rev-list --no-walk f272dc9c302d25f7025b3d97513860fd4c95f299 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins4374779857816902899.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
31 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.4, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: hypothesis-5.28.0, forked-1.3.0, xdist-2.1.0, cov-2.10.1
collected 419 items / 1 skipped / 418 selected

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 14%]
tests/unit/test_io.py .................................................. [ 26%]
............................ [ 32%]
tests/unit/test_notebooks.py F.s. [ 33%]
tests/unit/test_ops.py ................................................. [ 45%]
........................................................................ [ 62%]
........................ [ 68%]
tests/unit/test_tf_dataloader.py FFFFFFFFFFFF [ 71%]
tests/unit/test_torch_dataloader.py ......FFFFFFFFFFFFF
Build timed out (after 15 minutes). Marking the build as failed.
Build was aborted
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins7800292252871003934.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 716825debda4ed32108942f20ac45b1baa7fcbea, no merge conflicts.
Running as SYSTEM
Setting status of 716825debda4ed32108942f20ac45b1baa7fcbea to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/671/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 716825debda4ed32108942f20ac45b1baa7fcbea^{commit} # timeout=10
Checking out Revision 716825debda4ed32108942f20ac45b1baa7fcbea (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 716825debda4ed32108942f20ac45b1baa7fcbea # timeout=10
Commit message: "fixing tf unit tests"
 > git rev-list --no-walk bb63cc5fb5a4817a5dcc96f394b84a81f01d886b # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins4548874604139810294.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
would reformat /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/tensorflow.py
Oh no! 💥 💔 💥
1 file would be reformatted, 30 files would be left unchanged.
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins320685992468812347.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 056155ef1c13b9d976a86ce2d86cbe7a3607ed38, no merge conflicts.
Running as SYSTEM
Setting status of 056155ef1c13b9d976a86ce2d86cbe7a3607ed38 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/672/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 056155ef1c13b9d976a86ce2d86cbe7a3607ed38^{commit} # timeout=10
Checking out Revision 056155ef1c13b9d976a86ce2d86cbe7a3607ed38 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 056155ef1c13b9d976a86ce2d86cbe7a3607ed38 # timeout=10
Commit message: "blackening"
 > git rev-list --no-walk 716825debda4ed32108942f20ac45b1baa7fcbea # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins3255953055005442194.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
31 files would be left unchanged.
./nvtabular/loader/tf_utils.py:31:24: F821 undefined name 'tf_device'
./nvtabular/loader/tf_utils.py:31:87: F821 undefined name 'tf_mem_size'
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins1468419213717319336.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 9c82df5f68c2605773fa3b744ca8e67cf4594947, no merge conflicts.
Running as SYSTEM
Setting status of 9c82df5f68c2605773fa3b744ca8e67cf4594947 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/673/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 9c82df5f68c2605773fa3b744ca8e67cf4594947^{commit} # timeout=10
Checking out Revision 9c82df5f68c2605773fa3b744ca8e67cf4594947 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 9c82df5f68c2605773fa3b744ca8e67cf4594947 # timeout=10
Commit message: "fixed tf_util bug"
 > git rev-list --no-walk 056155ef1c13b9d976a86ce2d86cbe7a3607ed38 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7392629584272941548.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
31 files would be left unchanged.
./nvtabular/loader/tf_utils.py:31:84: F821 undefined name 'tf_mem_size'
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins996061379472624272.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 933b71900713e90a0044cf0d294ef3de24067712, no merge conflicts.
Running as SYSTEM
Setting status of 933b71900713e90a0044cf0d294ef3de24067712 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/674/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 933b71900713e90a0044cf0d294ef3de24067712^{commit} # timeout=10
Checking out Revision 933b71900713e90a0044cf0d294ef3de24067712 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 933b71900713e90a0044cf0d294ef3de24067712 # timeout=10
Commit message: "fixing tf_utils bug"
 > git rev-list --no-walk 9c82df5f68c2605773fa3b744ca8e67cf4594947 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins6709133231254430996.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
would reformat /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/tf_utils.py
Oh no! 💥 💔 💥
1 file would be reformatted, 30 files would be left unchanged.
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins8185771004694239439.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 3db05b05683d1af460215eca87b707b6ccfb7bbf, no merge conflicts.
Running as SYSTEM
Setting status of 3db05b05683d1af460215eca87b707b6ccfb7bbf to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/675/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 3db05b05683d1af460215eca87b707b6ccfb7bbf^{commit} # timeout=10
Checking out Revision 3db05b05683d1af460215eca87b707b6ccfb7bbf (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 3db05b05683d1af460215eca87b707b6ccfb7bbf # timeout=10
Commit message: "blackening"
 > git rev-list --no-walk 933b71900713e90a0044cf0d294ef3de24067712 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins8254015544088330717.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
31 files would be left unchanged.
./nvtabular/loader/tf_utils.py:32:64: F821 undefined name 'memory_allcation'
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins8077269931564506719.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 9458a8d241f925c7c38a438af3ed2c17334f9ad8, no merge conflicts.
Running as SYSTEM
Setting status of 9458a8d241f925c7c38a438af3ed2c17334f9ad8 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/676/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 9458a8d241f925c7c38a438af3ed2c17334f9ad8^{commit} # timeout=10
Checking out Revision 9458a8d241f925c7c38a438af3ed2c17334f9ad8 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 9458a8d241f925c7c38a438af3ed2c17334f9ad8 # timeout=10
Commit message: "blackening"
 > git rev-list --no-walk 3db05b05683d1af460215eca87b707b6ccfb7bbf # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins839065965201183044.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
31 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.4, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: hypothesis-5.28.0, forked-1.3.0, xdist-2.1.0, cov-2.10.1
collected 419 items / 1 skipped / 418 selected

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 14%]
tests/unit/test_io.py .................................................. [ 26%]
............................ [ 32%]
tests/unit/test_notebooks.py F.F. [ 33%]
tests/unit/test_ops.py ................................................. [ 45%]
........................................................................ [ 62%]
........................ [ 68%]
tests/unit/test_tf_dataloader.py FFFFFFFFFFFF [ 71%]
tests/unit/test_torch_dataloader.py ......FFFFFFFFFFFFFFF [ 76%]
tests/unit/test_workflow.py ............................................ [ 86%]
....................................................... [100%]

=================================== FAILURES ===================================
_____________________________ test_criteo_notebook _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_criteo_notebook0')

def test_criteo_notebook(tmpdir):
    # create a toy dataset in tmpdir, and point environment variables so the notebook
    # will read from it
    for i in range(24):
        df = _get_random_criteo_data(1000)
        df.to_parquet(os.path.join(tmpdir, f"day_{i}.parquet"))
    os.environ["INPUT_DATA_DIR"] = str(tmpdir)
    os.environ["OUTPUT_DATA_DIR"] = str(tmpdir)

    _run_notebook(
        tmpdir,
        os.path.join(dirname(TEST_PATH), "examples", "criteo-example.ipynb"),
        # disable rmm.reinitialize, seems to be causing issues
      transform=lambda line: line.replace("rmm.reinitialize(", "# rmm.reinitialize("),
    )

tests/unit/test_notebooks.py:29:


tests/unit/test_notebooks.py:92: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/opt/conda/lib/python3.7/subprocess.py:395: in check_output
**kwargs).stdout


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/opt/conda/bin/python', '/tmp/pytest-of-jenkins/pytest-13/test_criteo_notebook0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f577e07c7d0>
stdout = b'\xe2\x96\x88\repoch train_loss valid_loss accuracy time \n\xe2\x96\x88\r'
stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired:
            process.kill()
            stdout, stderr = process.communicate()
            raise TimeoutExpired(process.args, timeout, output=stdout,
                                 stderr=stderr)
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
            raise CalledProcessError(retcode, process.args,
                                   output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '/tmp/pytest-of-jenkins/pytest-13/test_criteo_notebook0/notebook.py']' returned non-zero exit status 1.

/opt/conda/lib/python3.7/subprocess.py:487: CalledProcessError
----------------------------- Captured stderr call -----------------------------
/opt/conda/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
/opt/conda/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-13/test_criteo_notebook0/notebook.py", line 90, in
learn.fit_one_cycle(epochs, learning_rate)
File "/opt/conda/lib/python3.7/site-packages/fastai/train.py", line 23, in fit_one_cycle
learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
File "/opt/conda/lib/python3.7/site-packages/fastai/basic_train.py", line 200, in fit
fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
File "/opt/conda/lib/python3.7/site-packages/fastai/basic_train.py", line 99, in fit
for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
File "/opt/conda/lib/python3.7/site-packages/fastprogress/fastprogress.py", line 47, in iter
raise e
File "/opt/conda/lib/python3.7/site-packages/fastprogress/fastprogress.py", line 41, in iter
for i,o in enumerate(self.gen):
File "/opt/conda/lib/python3.7/site-packages/fastai/basic_data.py", line 75, in iter
for b in self.dl: yield self.proc_batch(b)
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in next
data = self._next_data()
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 403, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 28, in fetch
data.append(next(self.dataset_iter))
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/backend.py", line 262, in next
return self._get_next_batch()
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/backend.py", line 289, in _get_next_batch
self._fetch_chunk()
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/backend.py", line 268, in _fetch_chunk
raise chunks
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/backend.py", line 140, in load_chunks
spill = dataloader._handle_tensors(spill)
TypeError: _handle_tensors() missing 2 required positional arguments: 'conts' and 'labels'
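The _handle_tensors failure above is the usual signature mismatch after a refactor: the method now takes the categorical, continuous, and label tensors as separate arguments, while the caller in load_chunks still passes one packed value. A contrived reproduction of the same error message (class and variable names are illustrative, not the NVTabular implementation):

class DataLoader:
    def _handle_tensors(self, cats, conts, labels):
        # post-refactor signature: three separate tensors
        return cats, conts, labels

loader = DataLoader()
spill = (["cat_col"], [0.1], [1])      # one packed tuple, pre-refactor style
try:
    loader._handle_tensors(spill)      # TypeError: _handle_tensors() missing 2 required
except TypeError as exc:               # positional arguments: 'conts' and 'labels'
    print(exc)
loader._handle_tensors(*spill)         # unpacking the packed value is the usual fix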
_____________________________ test_rossman_example _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_rossman_example0')

def test_rossman_example(tmpdir):
    pytest.importorskip("nvtabular.loader.tensorflow")
    _get_random_rossmann_data(1000).to_csv(os.path.join(tmpdir, "train.csv"))
    _get_random_rossmann_data(1000).to_csv(os.path.join(tmpdir, "valid.csv"))
    os.environ["INPUT_DATA_DIR"] = str(tmpdir)

    notebook_path = os.path.join(
        dirname(TEST_PATH), "examples", "rossmann-store-sales-example.ipynb"
    )
  _run_notebook(tmpdir, notebook_path, lambda line: line.replace("EPOCHS = 25", "EPOCHS = 1"))

tests/unit/test_notebooks.py:51:


tests/unit/test_notebooks.py:92: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/opt/conda/lib/python3.7/subprocess.py:395: in check_output
**kwargs).stdout


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/opt/conda/bin/python', '/tmp/pytest-of-jenkins/pytest-13/test_rossman_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f577dfb8550>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired:
            process.kill()
            stdout, stderr = process.communicate()
            raise TimeoutExpired(process.args, timeout, output=stdout,
                                 stderr=stderr)
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
            raise CalledProcessError(retcode, process.args,
                                   output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '/tmp/pytest-of-jenkins/pytest-13/test_rossman_example0/notebook.py']' returned non-zero exit status 1.

/opt/conda/lib/python3.7/subprocess.py:487: CalledProcessError
----------------------------- Captured stderr call -----------------------------
/opt/conda/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
/opt/conda/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-13/test_rossman_example0/notebook.py", line 59, in
proc.apply(train_dataset, record_stats=True, output_path=PREPROCESS_DIR_TRAIN, shuffle=nvt.io.Shuffle.PER_WORKER, out_files_per_proc=2)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 729, in apply
num_io_threads=num_io_threads,
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 829, in build_and_process_graph
self.exec_phase(idx, record_stats=record_stats)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 612, in exec_phase
self._aggregated_dask_transform(transforms)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 587, in _aggregated_dask_transform
ddf = self.get_ddf()
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 575, in get_ddf
return self.ddf.to_ddf(columns=columns, shuffle=self._shuffle_parts)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py", line 809, in to_ddf
ddf = self.engine.to_ddf(columns=columns)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py", line 1060, in to_ddf
return dask_cudf.read_csv(self.paths, chunksize=self.part_size, **self.csv_kwargs)[
File "/opt/conda/lib/python3.7/site-packages/dask_cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7.egg/dask_cudf/io/csv.py", line 19, in read_csv
return _internal_read_csv(path=path, chunksize=chunksize, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/dask_cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7.egg/dask_cudf/io/csv.py", line 59, in _internal_read_csv
meta = dask_reader(filenames[0], **kwargs)._meta
File "/opt/conda/lib/python3.7/site-packages/dask/dataframe/io/csv.py", line 649, in read
**kwargs,
File "/opt/conda/lib/python3.7/site-packages/dask/dataframe/io/csv.py", line 479, in read_pandas
**(storage_options or {}),
File "/opt/conda/lib/python3.7/site-packages/dask/bytes/core.py", line 125, in read_bytes
size = fs.info(path)["size"]
File "/opt/conda/lib/python3.7/site-packages/fsspec/implementations/local.py", line 60, in info
out = os.stat(path, follow_symlinks=False)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-jenkins/pytest-13/test_optimize_criteo0/train.csv'
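One thing worth flagging in the FileNotFoundError above: the missing train.csv is looked up under test_optimize_criteo0, not this test's own test_rossman_example0 tmpdir, which is the usual signature of a process-wide environment variable set by an earlier test leaking into a later one. A contrived sketch of that failure mode (test and variable names are illustrative):

import os

def test_earlier(tmp_path):
    # sets a process-wide variable and never restores it
    os.environ["INPUT_DATA_DIR"] = str(tmp_path)

def test_later(tmp_path):
    # still sees the earlier test's directory, so files written under this
    # test's tmp_path appear "not found" to code that reads the variable
    data_dir = os.environ.get("INPUT_DATA_DIR", str(tmp_path))
    assert data_dir != str(tmp_path)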
_____________________ test_tf_gpu_dl[True-1-parquet-0.01] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_True_1_parquet_0')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f580118fad0>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:64:


nvtabular/loader/backend.py:262: in __next__
return self._get_next_batch()
nvtabular/loader/backend.py:289: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:268: in _fetch_chunk
raise chunks
nvtabular/loader/backend.py:120: in load_chunks
chunks = dataloader._create_tensors(chunks)
nvtabular/loader/backend.py:372: in _create_tensors
conts = self._to_tensor(gdf_conts)
nvtabular/loader/tensorflow.py:265: in _to_tensor
dlpack = gdf.values.T.toDlpack()
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7-linux-x86_64.egg/cudf/core/dataframe.py:907: in values
return cupy.asarray(self.as_gpu_matrix())
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7-linux-x86_64.egg/cudf/core/dataframe.py:3208: in as_gpu_matrix
matrix[:, colidx] = dense
cupy/core/core.pyx:1248: in cupy.core.core.ndarray.__setitem__
???
cupy/core/_routines_indexing.pyx:49: in cupy.core._routines_indexing._ndarray_setitem
???
cupy/core/_routines_indexing.pyx:801: in cupy.core._routines_indexing._scatter_op
???
cupy/core/core.pyx:517: in cupy.core.core.ndarray.fill
???
cupy/core/_kernel.pyx:605: in cupy.core._kernel.ElementwiseKernel.__call__
???


???
E ValueError: Array device must be same as the current device: array device = 3 while current = 0

cupy/core/_kernel.pyx:95: ValueError
----------------------------- Captured stderr call -----------------------------
2020-08-26 21:33:55.366901: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-08-26 21:33:55.391236: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3198080000 Hz
2020-08-26 21:33:55.392182: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f573025c1e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-26 21:33:55.392220: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-26 21:33:55.945518: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f57302c7d70 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-26 21:33:55.945589: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-26 21:33:55.945613: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-26 21:33:55.945632: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (2): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-26 21:33:55.945651: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (3): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-26 21:33:55.949931: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-26 21:33:55.951931: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-26 21:33:55.953342: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 2 with properties:
pciBusID: 0000:0e:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-26 21:33:55.954727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 3 with properties:
pciBusID: 0000:0f:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-26 21:33:55.954841: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-08-26 21:33:55.954875: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-08-26 21:33:55.954904: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-08-26 21:33:55.954932: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-08-26 21:33:55.954958: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-08-26 21:33:55.954983: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-08-26 21:33:55.955010: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-26 21:33:55.965076: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0, 1, 2, 3
2020-08-26 21:33:55.965137: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-08-26 21:33:55.970257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-26 21:33:55.970285: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0 1 2 3
2020-08-26 21:33:55.970295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N Y Y Y
2020-08-26 21:33:55.970330: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 1: Y N Y Y
2020-08-26 21:33:55.970340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 2: Y Y N Y
2020-08-26 21:33:55.970348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 3: Y Y Y N
2020-08-26 21:33:55.975406: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory) -> physical GPU (device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0)
2020-08-26 21:33:55.976876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15212 MB memory) -> physical GPU (device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0)
2020-08-26 21:33:55.978348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15212 MB memory) -> physical GPU (device: 2, name: Tesla P100-DGXS-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0)
2020-08-26 21:33:55.979827: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15212 MB memory) -> physical GPU (device: 3, name: Tesla P100-DGXS-16GB, pci bus id: 0000:0f:00.0, compute capability: 6.0)
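The ValueError above ("array device = 3 while current = 0") is what CuPy raises when a kernel touches an array owned by one GPU while a different GPU is the current device; the usual guard is to do the conversion inside a cupy.cuda.Device context that matches the array. A minimal illustration, assuming a multi-GPU machine (the device index is hypothetical):

import cupy as cp

with cp.cuda.Device(3):
    arr = cp.zeros((4, 4), dtype=cp.float32)   # memory lives on GPU 3

# Touching arr while GPU 0 is current reproduces the error above;
# switching to the array's own device first avoids it.
with cp.cuda.Device(arr.device.id):
    arr.fill(1.0)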
_____________________ test_tf_gpu_dl[True-1-parquet-0.06] ______________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f57802342d0>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_True_1_parquet_1')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f57802347d0>
batch_size = 1, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
        X, y = next(data_itr)

        # first elements to check epoch-to-epoch consistency
        if idx == 0:
            X0, y0 = X, y

        # check that we have at most batch_size elements
        num_samples = y[0].shape[0]
        if num_samples != batch_size:
            try:
                next(data_itr)
            except StopIteration:
                continue
            else:
                raise ValueError("Batch size too small at idx {}".format(idx))

        # check that all the features in X have the
        # appropriate length and that the set of
        # their names is exactly the set of names in
        # `columns`
        these_cols = columns.copy()
        for column, x in X.items():
            try:
                these_cols.remove(column)
            except ValueError:
                raise AssertionError
            assert x.shape[0] == num_samples
        assert len(these_cols) == 0

        rows += num_samples

    # check start of next epoch to ensure consistency
  X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:96:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()


self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f57802342d0>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
        batch = next(self._batch_itr)
    except StopIteration:
        # check whether any more chunks are going to be created;
        # if not, raise the StopIteration
        if not self._working and self._buff.empty:
            self._workers = None
            self._batch_itr = None
          raise StopIteration

E StopIteration

nvtabular/loader/backend.py:300: StopIteration
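The failure above is the epoch-boundary case rather than a crash inside load_chunks: after len(data_itr) batches the loader resets self._workers and self._batch_itr and raises StopIteration, so the extra next(data_itr) the test issues to check the start of the next epoch surfaces that StopIteration. A minimal sketch of how the test's epoch check could tolerate that boundary, assuming the reset logic shown in the shim above (the variable names mirror the test; the retry branch is hypothetical):

    # check start of next epoch to ensure consistency
    try:
        X, y = next(data_itr)
    except StopIteration:
        # after the reset, self._workers is None again, so this call
        # re-enters DataLoader.__iter__ and fetches the first chunk
        # of the new epoch
        X, y = next(data_itr)
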
_____________________ test_tf_gpu_dl[True-10-parquet-0.01] _____________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f577e0799d0>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_True_10_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f58010d7210>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:64:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()
nvtabular/loader/backend.py:304: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:268: in _fetch_chunk
raise chunks


self = <nvtabular.loader.backend.ChunkQueue object at 0x7f58036618d0>, dev = 0
dataloader = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f577e0799d0>

def load_chunks(self, dev, dataloader):
    try:
        indices = dataloader._gather_indices_for_dev(dev)
        itr = dataloader.data.to_iter(indices=indices)

        with dataloader._get_device_ctx(dev):
            spill = None
            for chunks in self.batch(itr):
                if self.stopped:
                    return

                if spill and not spill.empty:
                    chunks.insert(0, spill)

                chunks = cudf.core.reshape.concat(chunks)
                chunks.reset_index(drop=True, inplace=True)
                chunks, spill = self.get_batch_div_chunk(chunks, dataloader.batch_size)
                if self.shuffle:
                    _shuffle_gdf(chunks)

                num_samples = len(chunks)
                if num_samples > 0:
                    for workflow in dataloader.workflows:
                        chunks = workflow.apply_ops(chunks)

                    # map from big chunk to framework-specific tensors
                    chunks = dataloader._create_tensors(chunks)

                    # split them into batches and map to
                    # the framework-specific output format
                    chunks = [dataloader._create_batch(x, num_samples) for x in chunks]
                    chunks = zip(*chunks)
                    chunks = [dataloader._handle_tensors(*tensors) for tensors in chunks]

                    # put returns True if buffer is stopped before
                    # packet can be put in queue. Keeps us from
                    # freezing on a put on a full queue
                    if self.put(chunks):
                        return
                chunks = None

            # takes care of the final batch, which is smaller than the batch size
            if spill:
                for workflow in dataloader.workflows:
                    spill = workflow.apply_ops(spill)
                spill = dataloader._create_tensors(spill)
              spill = dataloader._handle_tensors(spill)

E TypeError: _handle_tensors() missing 2 required positional arguments: 'conts' and 'labels'

nvtabular/loader/backend.py:140: TypeError
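The TypeError above points at the spill branch of load_chunks: the main loop calls dataloader._handle_tensors(*tensors) on an unpacked triple, while the spill path passes the whole output of _create_tensors as one positional argument, leaving conts and labels unfilled. A minimal sketch of the spill branch mirroring the main branch, assuming _create_tensors returns the same (cats, conts, labels) sequence the loop above zips over:

    # takes care of the final batch, which is smaller than the batch size
    if spill:
        for workflow in dataloader.workflows:
            spill = workflow.apply_ops(spill)
        tensors = dataloader._create_tensors(spill)
        # unpack, as the per-batch path does, instead of passing the
        # sequence as a single argument
        spill = dataloader._handle_tensors(*tensors)
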
_____________________ test_tf_gpu_dl[True-10-parquet-0.06] _____________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f57802d2ad0>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_True_10_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f578027b050>
batch_size = 10, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:64:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()
nvtabular/loader/backend.py:304: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:268: in _fetch_chunk
raise chunks


self = <nvtabular.loader.backend.ChunkQueue object at 0x7f5781635b10>, dev = 0
dataloader = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f57802d2ad0>

def load_chunks(self, dev, dataloader):
    try:
        indices = dataloader._gather_indices_for_dev(dev)
        itr = dataloader.data.to_iter(indices=indices)

        with dataloader._get_device_ctx(dev):
            spill = None
            for chunks in self.batch(itr):
                if self.stopped:
                    return

                if spill and not spill.empty:
                    chunks.insert(0, spill)

                chunks = cudf.core.reshape.concat(chunks)
                chunks.reset_index(drop=True, inplace=True)
                chunks, spill = self.get_batch_div_chunk(chunks, dataloader.batch_size)
                if self.shuffle:
                    _shuffle_gdf(chunks)

                num_samples = len(chunks)
                if num_samples > 0:
                    for workflow in dataloader.workflows:
                        chunks = workflow.apply_ops(chunks)

                    # map from big chunk to framework-specific tensors
                    chunks = dataloader._create_tensors(chunks)

                    # split them into batches and map to
                    # the framework-specific output format
                    chunks = [dataloader._create_batch(x, num_samples) for x in chunks]
                    chunks = zip(*chunks)
                    chunks = [dataloader._handle_tensors(*tensors) for tensors in chunks]

                    # put returns True if buffer is stopped before
                    # packet can be put in queue. Keeps us from
                    # freezing on a put on a full queue
                    if self.put(chunks):
                        return
                chunks = None

            # takes care of the final batch, which is smaller than the batch size
            if spill:
                for workflow in dataloader.workflows:
                    spill = workflow.apply_ops(spill)
                spill = dataloader._create_tensors(spill)
              spill = dataloader._handle_tensors(spill)

E TypeError: _handle_tensors() missing 2 required positional arguments: 'conts' and 'labels'

nvtabular/loader/backend.py:140: TypeError
____________________ test_tf_gpu_dl[True-100-parquet-0.01] _____________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f577dfb8f10>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_True_100_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f577dfb8310>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:64:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()
nvtabular/loader/backend.py:304: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:268: in _fetch_chunk
raise chunks


self = <nvtabular.loader.backend.ChunkQueue object at 0x7f577dfb8a90>, dev = 0
dataloader = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f577dfb8f10>

def load_chunks(self, dev, dataloader):
    try:
        indices = dataloader._gather_indices_for_dev(dev)
        itr = dataloader.data.to_iter(indices=indices)

        with dataloader._get_device_ctx(dev):
            spill = None
            for chunks in self.batch(itr):
                if self.stopped:
                    return

                if spill and not spill.empty:
                    chunks.insert(0, spill)

                chunks = cudf.core.reshape.concat(chunks)
                chunks.reset_index(drop=True, inplace=True)
                chunks, spill = self.get_batch_div_chunk(chunks, dataloader.batch_size)
                if self.shuffle:
                    _shuffle_gdf(chunks)

                num_samples = len(chunks)
                if num_samples > 0:
                    for workflow in dataloader.workflows:
                        chunks = workflow.apply_ops(chunks)

                    # map from big chunk to framework-specific tensors
                    chunks = dataloader._create_tensors(chunks)

                    # split them into batches and map to
                    # the framework-specific output format
                    chunks = [dataloader._create_batch(x, num_samples) for x in chunks]
                    chunks = zip(*chunks)
                    chunks = [dataloader._handle_tensors(*tensors) for tensors in chunks]

                    # put returns True if buffer is stopped before
                    # packet can be put in queue. Keeps us from
                    # freezing on a put on a full queue
                    if self.put(chunks):
                        return
                chunks = None

            # takes care of the final batch, which is smaller than the batch size
            if spill:
                for workflow in dataloader.workflows:
                    spill = workflow.apply_ops(spill)
                spill = dataloader._create_tensors(spill)
              spill = dataloader._handle_tensors(spill)

E TypeError: _handle_tensors() missing 2 required positional arguments: 'conts' and 'labels'

nvtabular/loader/backend.py:140: TypeError
____________________ test_tf_gpu_dl[True-100-parquet-0.06] _____________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f577d2aea50>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_True_100_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f5800f67890>
batch_size = 100, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:64:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()
nvtabular/loader/backend.py:304: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:268: in _fetch_chunk
raise chunks


self = <nvtabular.loader.backend.ChunkQueue object at 0x7f5803767f10>, dev = 0
dataloader = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f577d2aea50>

def load_chunks(self, dev, dataloader):
    try:
        indices = dataloader._gather_indices_for_dev(dev)
        itr = dataloader.data.to_iter(indices=indices)

        with dataloader._get_device_ctx(dev):
            spill = None
            for chunks in self.batch(itr):
                if self.stopped:
                    return

                if spill and not spill.empty:
                    chunks.insert(0, spill)

                chunks = cudf.core.reshape.concat(chunks)
                chunks.reset_index(drop=True, inplace=True)
                chunks, spill = self.get_batch_div_chunk(chunks, dataloader.batch_size)
                if self.shuffle:
                    _shuffle_gdf(chunks)

                num_samples = len(chunks)
                if num_samples > 0:
                    for workflow in dataloader.workflows:
                        chunks = workflow.apply_ops(chunks)

                    # map from big chunk to framework-specific tensors
                    chunks = dataloader._create_tensors(chunks)

                    # split them into batches and map to
                    # the framework-specific output format
                    chunks = [dataloader._create_batch(x, num_samples) for x in chunks]
                    chunks = zip(*chunks)
                    chunks = [dataloader._handle_tensors(*tensors) for tensors in chunks]

                    # put returns True if buffer is stopped before
                    # packet can be put in queue. Keeps us from
                    # freezing on a put on a full queue
                    if self.put(chunks):
                        return
                chunks = None

            # takes care of the final batch, which is smaller than the batch size
            if spill:
                for workflow in dataloader.workflows:
                    spill = workflow.apply_ops(spill)
                spill = dataloader._create_tensors(spill)
              spill = dataloader._handle_tensors(spill)

E TypeError: _handle_tensors() missing 2 required positional arguments: 'conts' and 'labels'

nvtabular/loader/backend.py:140: TypeError
_____________________ test_tf_gpu_dl[False-1-parquet-0.01] _____________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f5800fb5610>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_False_1_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f58085c3050>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
        X, y = next(data_itr)

        # first elements to check epoch-to-epoch consistency
        if idx == 0:
            X0, y0 = X, y

        # check that we have at most batch_size elements
        num_samples = y[0].shape[0]
        if num_samples != batch_size:
            try:
                next(data_itr)
            except StopIteration:
                continue
            else:
                raise ValueError("Batch size too small at idx {}".format(idx))

        # check that all the features in X have the
        # appropriate length and that the set of
        # their names is exactly the set of names in
        # `columns`
        these_cols = columns.copy()
        for column, x in X.items():
            try:
                these_cols.remove(column)
            except ValueError:
                raise AssertionError
            assert x.shape[0] == num_samples
        assert len(these_cols) == 0

        rows += num_samples

    # check start of next epoch to ensure consistency
  X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:96:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()


self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f5800fb5610>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
        batch = next(self._batch_itr)
    except StopIteration:
        # check whether any more chunks are going to be created;
        # if not, raise the StopIteration
        if not self._working and self._buff.empty:
            self._workers = None
            self._batch_itr = None
          raise StopIteration

E StopIteration

nvtabular/loader/backend.py:300: StopIteration
_____________________ test_tf_gpu_dl[False-1-parquet-0.06] _____________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f580104c550>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_False_1_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f5800f12fd0>
batch_size = 1, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
        X, y = next(data_itr)

        # first elements to check epoch-to-epoch consistency
        if idx == 0:
            X0, y0 = X, y

        # check that we have at most batch_size elements
        num_samples = y[0].shape[0]
        if num_samples != batch_size:
            try:
                next(data_itr)
            except StopIteration:
                continue
            else:
                raise ValueError("Batch size too small at idx {}".format(idx))

        # check that all the features in X have the
        # appropriate length and that the set of
        # their names is exactly the set of names in
        # `columns`
        these_cols = columns.copy()
        for column, x in X.items():
            try:
                these_cols.remove(column)
            except ValueError:
                raise AssertionError
            assert x.shape[0] == num_samples
        assert len(these_cols) == 0

        rows += num_samples

    # check start of next epoch to ensure consistency
  X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:96:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()


self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f580104c550>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
        batch = next(self._batch_itr)
    except StopIteration:
        # check whether any more chunks are going to be created;
        # if not, raise the StopIteration
        if not self._working and self._buff.empty:
            self._workers = None
            self._batch_itr = None
          raise StopIteration

E StopIteration

nvtabular/loader/backend.py:300: StopIteration
____________________ test_tf_gpu_dl[False-10-parquet-0.01] _____________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f5801077190>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_False_10_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f577e053a10>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:64:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()
nvtabular/loader/backend.py:304: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:268: in _fetch_chunk
raise chunks


self = <nvtabular.loader.backend.ChunkQueue object at 0x7f5800d47490>, dev = 0
dataloader = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f5801077190>

def load_chunks(self, dev, dataloader):
    try:
        indices = dataloader._gather_indices_for_dev(dev)
        itr = dataloader.data.to_iter(indices=indices)

        with dataloader._get_device_ctx(dev):
            spill = None
            for chunks in self.batch(itr):
                if self.stopped:
                    return

                if spill and not spill.empty:
                    chunks.insert(0, spill)

                chunks = cudf.core.reshape.concat(chunks)
                chunks.reset_index(drop=True, inplace=True)
                chunks, spill = self.get_batch_div_chunk(chunks, dataloader.batch_size)
                if self.shuffle:
                    _shuffle_gdf(chunks)

                num_samples = len(chunks)
                if num_samples > 0:
                    for workflow in dataloader.workflows:
                        chunks = workflow.apply_ops(chunks)

                    # map from big chunk to framework-specific tensors
                    chunks = dataloader._create_tensors(chunks)

                    # split them into batches and map to
                    # the framework-specific output format
                    chunks = [dataloader._create_batch(x, num_samples) for x in chunks]
                    chunks = zip(*chunks)
                    chunks = [dataloader._handle_tensors(*tensors) for tensors in chunks]

                    # put returns True if buffer is stopped before
                    # packet can be put in queue. Keeps us from
                    # freezing on a put on a full queue
                    if self.put(chunks):
                        return
                chunks = None

            # takes care of the final batch, which is smaller than the batch size
            if spill:
                for workflow in dataloader.workflows:
                    spill = workflow.apply_ops(spill)
                spill = dataloader._create_tensors(spill)
              spill = dataloader._handle_tensors(spill)

E TypeError: _handle_tensors() missing 2 required positional arguments: 'conts' and 'labels'

nvtabular/loader/backend.py:140: TypeError
____________________ test_tf_gpu_dl[False-10-parquet-0.06] _____________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f57802c8e90>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_False_10_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f58011aac10>
batch_size = 10, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:64:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()
nvtabular/loader/backend.py:304: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:268: in _fetch_chunk
raise chunks


self = <nvtabular.loader.backend.ChunkQueue object at 0x7f5781610a50>, dev = 0
dataloader = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f57802c8e90>

def load_chunks(self, dev, dataloader):
    try:
        indices = dataloader._gather_indices_for_dev(dev)
        itr = dataloader.data.to_iter(indices=indices)

        with dataloader._get_device_ctx(dev):
            spill = None
            for chunks in self.batch(itr):
                if self.stopped:
                    return

                if spill and not spill.empty:
                    chunks.insert(0, spill)

                chunks = cudf.core.reshape.concat(chunks)
                chunks.reset_index(drop=True, inplace=True)
                chunks, spill = self.get_batch_div_chunk(chunks, dataloader.batch_size)
                if self.shuffle:
                    _shuffle_gdf(chunks)

                num_samples = len(chunks)
                if num_samples > 0:
                    for workflow in dataloader.workflows:
                        chunks = workflow.apply_ops(chunks)

                    # map from big chunk to framework-specific tensors
                    chunks = dataloader._create_tensors(chunks)

                    # split them into batches and map to
                    # the framework-specific output format
                    chunks = [dataloader._create_batch(x, num_samples) for x in chunks]
                    chunks = zip(*chunks)
                    chunks = [dataloader._handle_tensors(*tensors) for tensors in chunks]

                    # put returns True if buffer is stopped before
                    # packet can be put in queue. Keeps us from
                    # freezing on a put on a full queue
                    if self.put(chunks):
                        return
                chunks = None

            # takes care of the final batch, which is smaller than the batch size
            if spill:
                for workflow in dataloader.workflows:
                    spill = workflow.apply_ops(spill)
                spill = dataloader._create_tensors(spill)
              spill = dataloader._handle_tensors(spill)

E TypeError: _handle_tensors() missing 2 required positional arguments: 'conts' and 'labels'

nvtabular/loader/backend.py:140: TypeError
____________________ test_tf_gpu_dl[False-100-parquet-0.01] ____________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f577e07cb90>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_False_100_parqu0')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f578034af50>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:64:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()
nvtabular/loader/backend.py:304: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:268: in _fetch_chunk
raise chunks


self = <nvtabular.loader.backend.ChunkQueue object at 0x7f5780208990>, dev = 0
dataloader = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f577e07cb90>

def load_chunks(self, dev, dataloader):
    try:
        indices = dataloader._gather_indices_for_dev(dev)
        itr = dataloader.data.to_iter(indices=indices)

        with dataloader._get_device_ctx(dev):
            spill = None
            for chunks in self.batch(itr):
                if self.stopped:
                    return

                if spill and not spill.empty:
                    chunks.insert(0, spill)

                chunks = cudf.core.reshape.concat(chunks)
                chunks.reset_index(drop=True, inplace=True)
                chunks, spill = self.get_batch_div_chunk(chunks, dataloader.batch_size)
                if self.shuffle:
                    _shuffle_gdf(chunks)

                num_samples = len(chunks)
                if num_samples > 0:
                    for workflow in dataloader.workflows:
                        chunks = workflow.apply_ops(chunks)

                    # map from big chunk to framework-specific tensors
                    chunks = dataloader._create_tensors(chunks)

                    # split them into batches and map to
                    # the framework-specific output format
                    chunks = [dataloader._create_batch(x, num_samples) for x in chunks]
                    chunks = zip(*chunks)
                    chunks = [dataloader._handle_tensors(*tensors) for tensors in chunks]

                    # put returns True if buffer is stopped before
                    # packet can be put in queue. Keeps us from
                    # freezing on a put on a full queue
                    if self.put(chunks):
                        return
                chunks = None

            # takes care of the final batch, which is smaller than the batch size
            if spill:
                for workflow in dataloader.workflows:
                    spill = workflow.apply_ops(spill)
                spill = dataloader._create_tensors(spill)
              spill = dataloader._handle_tensors(spill)

E TypeError: _handle_tensors() missing 2 required positional arguments: 'conts' and 'labels'

nvtabular/loader/backend.py:140: TypeError
____________________ test_tf_gpu_dl[False-100-parquet-0.06] ____________________

self = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f5780264610>

def _get_next_batch(self):
    """
    adding this cheap shim so that we can call this
    step without it getting overridden by the
    framework-specific parent class's `__next__` method.
    TODO: can this be better solved with a metaclass
    implementation? My gut is that we don't actually
    necessarily *want*, in general, to be overriding
    __next__ and __iter__ methods
    """
    # we've never initialized, do that now
    # need this because tf.keras.Model.fit will
    # call next() cold
    if self._workers is None:
        DataLoader.__iter__(self)

    # get the first chunks
    if self._batch_itr is None:
        self._fetch_chunk()

    # try to iterate through existing batches
    try:
      batch = next(self._batch_itr)

E StopIteration

nvtabular/loader/backend.py:293: StopIteration

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_tf_gpu_dl_False_100_parqu1')
paths = ['/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-13/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f57802c4790>
batch_size = 100, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:64:


nvtabular/loader/backend.py:262: in next
return self._get_next_batch()
nvtabular/loader/backend.py:304: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:268: in _fetch_chunk
raise chunks


self = <nvtabular.loader.backend.ChunkQueue object at 0x7f578025c4d0>, dev = 0
dataloader = <nvtabular.loader.tensorflow.KerasSequenceLoader object at 0x7f5780264610>

def load_chunks(self, dev, dataloader):
    try:
        indices = dataloader._gather_indices_for_dev(dev)
        itr = dataloader.data.to_iter(indices=indices)

        with dataloader._get_device_ctx(dev):
            spill = None
            for chunks in self.batch(itr):
                if self.stopped:
                    return

                if spill and not spill.empty:
                    chunks.insert(0, spill)

                chunks = cudf.core.reshape.concat(chunks)
                chunks.reset_index(drop=True, inplace=True)
                chunks, spill = self.get_batch_div_chunk(chunks, dataloader.batch_size)
                if self.shuffle:
                    _shuffle_gdf(chunks)

                num_samples = len(chunks)
                if num_samples > 0:
                    for workflow in dataloader.workflows:
                        chunks = workflow.apply_ops(chunks)

                    # map from big chunk to framework-specific tensors
                    chunks = dataloader._create_tensors(chunks)

                    # split them into batches and map to
                    # the framework-specific output format
                    chunks = [dataloader._create_batch(x, num_samples) for x in chunks]
                    chunks = zip(*chunks)
                    chunks = [dataloader._handle_tensors(*tensors) for tensors in chunks]

                    # put returns True if buffer is stopped before
                    # packet can be put in queue. Keeps us from
                    # freezing on a put on a full queue
                    if self.put(chunks):
                        return
                chunks = None

            # takes care of the final batch, which is smaller than the batch size
            if spill:
                for workflow in dataloader.workflows:
                    spill = workflow.apply_ops(spill)
                spill = dataloader._create_tensors(spill)
              spill = dataloader._handle_tensors(spill)

E TypeError: _handle_tensors() missing 2 required positional arguments: 'conts' and 'labels'

nvtabular/loader/backend.py:140: TypeError
___________________________ test_empty_cols[parquet] ___________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_empty_cols_parquet_0')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f577e08a290>, engine = 'parquet'

@pytest.mark.parametrize("engine", ["parquet"])
def test_empty_cols(tmpdir, df, dataset, engine):
    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    # first with no continuous columns
    no_conts = torch_dataloader.TorchAsyncItr(
        dataset, cats=["id"], conts=[], labels=["label"], batch_size=1
    )
  assert all(conts is None for _, conts, _ in no_conts)

tests/unit/test_torch_dataloader.py:52:


tests/unit/test_torch_dataloader.py:52: in
assert all(conts is None for _, conts, _ in no_conts)
nvtabular/loader/backend.py:262: in next
return self._get_next_batch()
nvtabular/loader/backend.py:289: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:268: in _fetch_chunk
raise chunks
nvtabular/loader/backend.py:124: in load_chunks
chunks = [dataloader._create_batch(x, num_samples) for x in chunks]
nvtabular/loader/backend.py:124: in
chunks = [dataloader._create_batch(x, num_samples) for x in chunks]
nvtabular/loader/torch.py:103: in _create_batch
return torch.split(tensor, idx)


tensor = None, split_size_or_sections = [1, 1, 1, 1, 1, 1, ...], dim = 0

def split(tensor, split_size_or_sections, dim=0):
    r"""Splits the tensor into chunks. Each chunk is a view of the original tensor.

    If :attr:`split_size_or_sections` is an integer type, then :attr:`tensor` will
    be split into equally sized chunks (if possible). Last chunk will be smaller if
    the tensor size along the given dimension :attr:`dim` is not divisible by
    :attr:`split_size`.

    If :attr:`split_size_or_sections` is a list, then :attr:`tensor` will be split
    into ``len(split_size_or_sections)`` chunks with sizes in :attr:`dim` according
    to :attr:`split_size_or_sections`.

    Arguments:
        tensor (Tensor): tensor to split.
        split_size_or_sections (int) or (list(int)): size of a single chunk or
            list of sizes for each chunk
        dim (int): dimension along which to split the tensor.

    Example::
        >>> a = torch.arange(10).reshape(5,2)
        >>> a
        tensor([[0, 1],
                [2, 3],
                [4, 5],
                [6, 7],
                [8, 9]])
        >>> torch.split(a, 2)
        (tensor([[0, 1],
                 [2, 3]]),
         tensor([[4, 5],
                 [6, 7]]),
         tensor([[8, 9]]))
        >>> torch.split(a, [1,4])
        (tensor([[0, 1]]),
         tensor([[2, 3],
                 [4, 5],
                 [6, 7],
                 [8, 9]]))
    """
    if not torch.jit.is_scripting():
        if type(tensor) is not Tensor and has_torch_function((tensor,)):
            return handle_torch_function(split, (tensor,), tensor, split_size_or_sections,
                                         dim=dim)
    # Overwriting reason:
    # This dispatches to two ATen functions depending on the type of
    # split_size_or_sections. The branching code is in tensor.py, which we
    # call here.
  return tensor.split(split_size_or_sections, dim)

E AttributeError: 'NoneType' object has no attribute 'split'

/opt/conda/lib/python3.7/site-packages/torch/functional.py:115: AttributeError
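
The empty-conts failure above is the batching code calling torch.split on a tensor that is legitimately None when a column group is empty (test_empty_cols passes conts=[] and expects conts to come back as None). A small sketch of the kind of None guard that avoids this; the helper name and sizes are illustrative only, not the fix actually made in this PR:

    import torch

    def split_or_none(tensor, sizes, dim=0):
        # an empty column group yields tensor=None; return matching Nones
        # instead of calling torch.split on it
        if tensor is None:
            return [None] * len(sizes)
        return torch.split(tensor, sizes, dim=dim)

    print(split_or_none(None, [1, 1, 1]))                        # [None, None, None]
    print(split_or_none(torch.arange(6).reshape(3, 2), [1, 2]))  # two chunks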
______________________ test_gpu_dl[None-parquet-1-1e-06] _______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_None_parquet_1_1e_0')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f570845a5d0>, batch_size = 1
part_mem_fraction = 1e-06, engine = 'parquet', devices = None

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
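
Each of the test_gpu_dl and test_kill_dl failures below is the same import problem: the tests still reach for nvt.torch_dataloader, while this PR moves the PyTorch dataloader into the new loader submodule (an earlier traceback in this log resolves the loader at nvtabular/loader/torch.py). A sketch of the import the tests would need instead, assuming TorchAsyncItr is exported from that module; the path argument is a placeholder, not a file from this CI run:

    import nvtabular as nvt
    from nvtabular.loader.torch import TorchAsyncItr

    def make_loader(parquet_path, batch_size=10):
        # parquet_path would be one of the tar_paths written by processor.apply above
        nvt_data = nvt.Dataset(parquet_path, engine="parquet")
        return TorchAsyncItr(
            nvt_data,
            batch_size=batch_size,
            cats=["name-cat", "name-string"],
            conts=["x", "y", "id"],
            labels=["label"],
        )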
_______________________ test_gpu_dl[None-parquet-1-0.06] _______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_None_parquet_1_0_00')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f577dfb8a90>, batch_size = 1
part_mem_fraction = 0.06, engine = 'parquet', devices = None

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
______________________ test_gpu_dl[None-parquet-10-1e-06] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_None_parquet_10_1e0')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f578020b610>, batch_size = 10
part_mem_fraction = 1e-06, engine = 'parquet', devices = None

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
______________________ test_gpu_dl[None-parquet-10-0.06] _______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_None_parquet_10_0_0')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f5800f85f50>, batch_size = 10
part_mem_fraction = 0.06, engine = 'parquet', devices = None

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
_____________________ test_gpu_dl[None-parquet-100-1e-06] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_None_parquet_100_10')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f578164e610>, batch_size = 100
part_mem_fraction = 1e-06, engine = 'parquet', devices = None

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
______________________ test_gpu_dl[None-parquet-100-0.06] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_None_parquet_100_00')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f5800f16d90>, batch_size = 100
part_mem_fraction = 0.06, engine = 'parquet', devices = None

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
____________________ test_gpu_dl[devices1-parquet-1-1e-06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_devices1_parquet_10')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f5781705790>, batch_size = 1
part_mem_fraction = 1e-06, engine = 'parquet', devices = [0, 1]

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
_____________________ test_gpu_dl[devices1-parquet-1-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_devices1_parquet_11')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f577dfc50d0>, batch_size = 1
part_mem_fraction = 0.06, engine = 'parquet', devices = [0, 1]

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
____________________ test_gpu_dl[devices1-parquet-10-1e-06] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_devices1_parquet_12')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f5781705cd0>, batch_size = 10
part_mem_fraction = 1e-06, engine = 'parquet', devices = [0, 1]

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
____________________ test_gpu_dl[devices1-parquet-10-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_devices1_parquet_13')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f5780264990>, batch_size = 10
part_mem_fraction = 0.06, engine = 'parquet', devices = [0, 1]

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
___________________ test_gpu_dl[devices1-parquet-100-1e-06] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_devices1_parquet_14')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f5775367c10>, batch_size = 100
part_mem_fraction = 1e-06, engine = 'parquet', devices = [0, 1]

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
____________________ test_gpu_dl[devices1-parquet-100-0.06] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_gpu_dl_devices1_parquet_15')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f577e068790>, batch_size = 100
part_mem_fraction = 0.06, engine = 'parquet', devices = [0, 1]

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, [0, 1]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data,
        batch_size=batch_size,
        cats=cat_names,
        conts=cont_names,
        labels=["label"],
        devices=devices,
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:91: AttributeError
_________________________ test_kill_dl[parquet-1e-06] __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_kill_dl_parquet_1e_06_0')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f577dfc5090>
part_mem_fraction = 1e-06, engine = 'parquet'

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
def test_kill_dl(tmpdir, df, dataset, part_mem_fraction, engine):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data, cats=cat_names, conts=cont_names, labels=["label"]
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:163: AttributeError
__________________________ test_kill_dl[parquet-0.1] ___________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_kill_dl_parquet_0_1_0')
df = name-cat name-string id label x y
0 Ingrid Jerry 954 966 -0.692361 0.614564
...er 1023 1012 -0.365027 0.816941
2160 Charlie Ray 1005 1016 0.056081 -0.808740

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f5779548190>
part_mem_fraction = 0.1, engine = 'parquet'

@pytest.mark.parametrize("part_mem_fraction", [0.000001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
def test_kill_dl(tmpdir, df, dataset, part_mem_fraction, engine):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
    )

    tar_paths = [
        os.path.join(output_train, x) for x in os.listdir(output_train) if x.endswith("parquet")
    ]

    nvt_data = nvt.Dataset(tar_paths[0], engine="parquet", part_mem_fraction=part_mem_fraction)
  data_itr = nvt.torch_dataloader.TorchAsyncItr(
        nvt_data, cats=cat_names, conts=cont_names, labels=["label"]
    )

E AttributeError: module 'nvtabular' has no attribute 'torch_dataloader'

tests/unit/test_torch_dataloader.py:163: AttributeError
=============================== warnings summary ===============================
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_io.py::test_mulifile_parquet[True-0-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-0-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-2-csv]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:77: DeprecationWarning: shuffle=True is deprecated. Using PER_WORKER.
warnings.warn("shuffle=True is deprecated. Using PER_WORKER.", DeprecationWarning)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39959 instead
http_address["port"], self.http_server.port

tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7-linux-x86_64.egg/cudf/core/dataframe.py:660: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30548 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30800 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 29876 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 31108 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 31472 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 60480 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_workflow.py::test_chaining_3
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:748: UserWarning: part_mem_fraction is ignored for DataFrame input.
warnings.warn("part_mem_fraction is ignored for DataFrame input.")

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.4-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 6 0 0 0 100%
nvtabular/categorify.py 269 49 150 31 78% 69->70, 70-72, 74->75, 75, 76->77, 77, 95->98, 98, 109->110, 110, 116->118, 141->142, 142-143, 145->146, 146-147, 149->150, 150-166, 168->172, 172, 176->177, 177, 178->179, 179, 186->187, 187, 190->192, 192->193, 193, 196->200, 200-203, 213->214, 214, 216->218, 220->237, 237-240, 263->264, 264, 267->268, 268, 269->270, 270, 277->278, 278, 279->282, 282, 386->387, 387, 388->389, 389, 410->425, 450->455, 453->454, 454, 464->461, 469->461
nvtabular/column_similarity.py 88 21 28 4 70% 170-171, 180-182, 190-206, 221->231, 223->226, 226->227, 227, 236->237, 237
nvtabular/io.py 569 41 226 32 91% 73->74, 74, 78->81, 81, 86-88, 101->102, 102, 104->105, 105, 112->113, 113, 129->130, 130, 134->136, 136->132, 140->141, 141, 142->143, 143, 151->156, 162->163, 163-164, 182->183, 183, 195->198, 198, 213->214, 214, 230, 247, 271->272, 272, 310, 313, 374->375, 375, 396-398, 479->481, 526->549, 553, 685->688, 745->746, 746, 758->759, 759, 767->768, 768, 776->788, 781->786, 786-788, 863->864, 864, 961->963, 963-965, 973->975, 975, 1001->1002, 1002, 1055->1056, 1056, 1079->1080, 1080, 1085, 1100->1101, 1101
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 183 19 62 8 87% 71->72, 72, 77-78, 102->103, 103, 111->112, 112, 131->132, 132, 141, 147-151, 154, 221->223, 223, 228->229, 229-233, 243->244, 244, 248->249, 249, 379
nvtabular/loader/tensorflow.py 102 34 40 10 62% 37->38, 38-39, 49->50, 50, 57->58, 58-61, 70->71, 71, 74->75, 75, 76->81, 81, 240-251, 257->258, 258, 278->279, 279, 280->283, 283, 288->289, 289, 297-299, 302-304, 312, 315-323
nvtabular/loader/tf_utils.py 51 24 20 5 45% 13->16, 16->18, 23->25, 26->27, 27, 34-35, 40->48, 43-48, 60-65, 75-88
nvtabular/loader/torch.py 31 0 2 0 100%
nvtabular/ops.py 547 34 166 31 90% 54->53, 56->57, 57, 82-86, 109->111, 128->129, 129, 223, 281, 331, 356->357, 357, 365->366, 366, 370->372, 372->373, 373, 426->427, 427, 442->443, 443-445, 446->449, 449, 495->496, 496, 503->502, 536->537, 537, 546->548, 548-549, 585->586, 586, 615->616, 616, 718->719, 719, 743, 937->938, 938, 939->940, 940, 954->957, 957, 970->974, 1107->1108, 1108, 1116->1121, 1121, 1131->1132, 1132, 1174->1175, 1175, 1216->1217, 1217, 1220->1226, 1285->1286, 1286, 1287->1288, 1288, 1324->1325, 1325
nvtabular/worker.py 65 1 30 2 97% 80->92, 118->121, 121
nvtabular/workflow.py 415 47 230 22 87% 96->100, 100, 106->107, 107-111, 141->exit, 157->exit, 173->exit, 189->exit, 242->244, 292->293, 293, 372->375, 375, 396-411, 473->474, 474, 492->494, 494-503, 514->513, 563->568, 568, 571->572, 572, 607->608, 608, 655->646, 721->732, 732, 755-785, 813->814, 814, 827->830, 860->861, 861-863, 867->868, 868, 901->902, 902
setup.py 2 2 0 0 0% 18-20

TOTAL 2328 272 954 145 85%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 85.16%
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_criteo_notebook - subprocess.Called...
FAILED tests/unit/test_notebooks.py::test_rossman_example - subprocess.Called...
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-100-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[parquet] - Attrib...
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-1e-06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-1e-06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-1e-06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-1-1e-06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-1-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-10-1e-06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-10-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-100-1e-06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-100-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-1e-06] - Att...
FAILED tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-0.1] - Attri...
====== 29 failed, 390 passed, 1 skipped, 17 warnings in 350.82s (0:05:50) ======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins8374203174643733384.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 6f543f9d99d5aaf52ea62e6d512e2fa5647f42ff, no merge conflicts.
Running as SYSTEM
Setting status of 6f543f9d99d5aaf52ea62e6d512e2fa5647f42ff to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/677/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 6f543f9d99d5aaf52ea62e6d512e2fa5647f42ff^{commit} # timeout=10
Checking out Revision 6f543f9d99d5aaf52ea62e6d512e2fa5647f42ff (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6f543f9d99d5aaf52ea62e6d512e2fa5647f42ff # timeout=10
Commit message: "fixing bug in loader backend"
 > git rev-list --no-walk 9458a8d241f925c7c38a438af3ed2c17334f9ad8 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins6580127825192388955.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
31 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.4, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: hypothesis-5.28.0, forked-1.3.0, xdist-2.1.0, cov-2.10.1
collected 419 items / 1 skipped / 418 selected

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 14%]
tests/unit/test_io.py .................................................. [ 26%]
............................ [ 32%]
tests/unit/test_notebooks.py ..F. [ 33%]
tests/unit/test_ops.py ................................................. [ 45%]
........................................................................ [ 62%]
........................ [ 68%]
tests/unit/test_tf_dataloader.py FFFFFFFFFFFF [ 71%]
tests/unit/test_torch_dataloader.py ......FFFFFFFFFFFFBuild timed out (after 15 minutes). Marking the build as failed.
Build was aborted
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins3625263205336297522.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 1042d968da88cfba12597c35e42fae013d06d139, no merge conflicts.
Running as SYSTEM
Setting status of 1042d968da88cfba12597c35e42fae013d06d139 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/680/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 1042d968da88cfba12597c35e42fae013d06d139^{commit} # timeout=10
Checking out Revision 1042d968da88cfba12597c35e42fae013d06d139 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 1042d968da88cfba12597c35e42fae013d06d139 # timeout=10
Commit message: "tests passing"
 > git rev-list --no-walk 823142479af5643c9325479ca695ef2d5455a657 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins4226273224442509525.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
would reformat /var/jenkins_home/workspace/nvtabular_tests/nvtabular/tests/unit/test_tf_dataloader.py
Oh no! 💥 💔 💥
1 file would be reformatted, 30 files would be left unchanged.
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins5384196860892043366.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #224 of commit 05b9a25ee4c0bdb9001c40f597bacaad3921e37c, no merge conflicts.
Running as SYSTEM
Setting status of 05b9a25ee4c0bdb9001c40f597bacaad3921e37c to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/681/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 05b9a25ee4c0bdb9001c40f597bacaad3921e37c^{commit} # timeout=10
Checking out Revision 05b9a25ee4c0bdb9001c40f597bacaad3921e37c (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 05b9a25ee4c0bdb9001c40f597bacaad3921e37c # timeout=10
Commit message: "blackening"
 > git rev-list --no-walk 1042d968da88cfba12597c35e42fae013d06d139 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins6903780172210067657.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
31 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.4, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: hypothesis-5.28.0, forked-1.3.0, xdist-2.1.0, cov-2.10.1
collected 419 items / 1 skipped / 418 selected

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 14%]
tests/unit/test_io.py .................................................. [ 26%]
............................ [ 32%]
tests/unit/test_notebooks.py ..F. [ 33%]
tests/unit/test_ops.py ................................................. [ 45%]
........................................................................ [ 62%]
........................ [ 68%]
tests/unit/test_tf_dataloader.py F........... [ 71%]
tests/unit/test_torch_dataloader.py ..................... [ 76%]
tests/unit/test_workflow.py ............................................ [ 86%]
....................................................... [100%]

=================================== FAILURES ===================================
_____________________________ test_rossman_example _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-19/test_rossman_example0')

def test_rossman_example(tmpdir):
    pytest.importorskip("nvtabular.loader.tensorflow")
    _get_random_rossmann_data(1000).to_csv(os.path.join(tmpdir, "train.csv"))
    _get_random_rossmann_data(1000).to_csv(os.path.join(tmpdir, "valid.csv"))
    os.environ["INPUT_DATA_DIR"] = str(tmpdir)

    notebook_path = os.path.join(
        dirname(TEST_PATH), "examples", "rossmann-store-sales-example.ipynb"
    )
  _run_notebook(tmpdir, notebook_path, lambda line: line.replace("EPOCHS = 25", "EPOCHS = 1"))

tests/unit/test_notebooks.py:51:


tests/unit/test_notebooks.py:92: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/opt/conda/lib/python3.7/subprocess.py:395: in check_output
**kwargs).stdout


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/opt/conda/bin/python', '/tmp/pytest-of-jenkins/pytest-19/test_rossman_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f01e5141e50>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired:
            process.kill()
            stdout, stderr = process.communicate()
            raise TimeoutExpired(process.args, timeout, output=stdout,
                                 stderr=stderr)
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
            raise CalledProcessError(retcode, process.args,
                                   output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '/tmp/pytest-of-jenkins/pytest-19/test_rossman_example0/notebook.py']' returned non-zero exit status 1.

/opt/conda/lib/python3.7/subprocess.py:487: CalledProcessError
----------------------------- Captured stderr call -----------------------------
/opt/conda/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
/opt/conda/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-19/test_rossman_example0/notebook.py", line 59, in
proc.apply(train_dataset, record_stats=True, output_path=PREPROCESS_DIR_TRAIN, shuffle=nvt.io.Shuffle.PER_WORKER, out_files_per_proc=2)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 729, in apply
num_io_threads=num_io_threads,
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 829, in build_and_process_graph
self.exec_phase(idx, record_stats=record_stats)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 612, in exec_phase
self._aggregated_dask_transform(transforms)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 587, in _aggregated_dask_transform
ddf = self.get_ddf()
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 575, in get_ddf
return self.ddf.to_ddf(columns=columns, shuffle=self._shuffle_parts)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py", line 809, in to_ddf
ddf = self.engine.to_ddf(columns=columns)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py", line 1060, in to_ddf
return dask_cudf.read_csv(self.paths, chunksize=self.part_size, **self.csv_kwargs)[
File "/opt/conda/lib/python3.7/site-packages/dask_cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7.egg/dask_cudf/io/csv.py", line 19, in read_csv
return _internal_read_csv(path=path, chunksize=chunksize, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/dask_cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7.egg/dask_cudf/io/csv.py", line 59, in _internal_read_csv
meta = dask_reader(filenames[0], **kwargs)._meta
File "/opt/conda/lib/python3.7/site-packages/dask/dataframe/io/csv.py", line 649, in read
**kwargs,
File "/opt/conda/lib/python3.7/site-packages/dask/dataframe/io/csv.py", line 479, in read_pandas
**(storage_options or {}),
File "/opt/conda/lib/python3.7/site-packages/dask/bytes/core.py", line 125, in read_bytes
size = fs.info(path)["size"]
File "/opt/conda/lib/python3.7/site-packages/fsspec/implementations/local.py", line 60, in info
out = os.stat(path, follow_symlinks=False)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-jenkins/pytest-19/test_optimize_criteo0/train.csv'
_____________________ test_tf_gpu_dl[True-1-parquet-0.01] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-19/test_tf_gpu_dl_True_1_parquet_0')
paths = ['/tmp/pytest-of-jenkins/pytest-19/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-19/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f026462dad0>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    dont_iter = False
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:65:


nvtabular/loader/backend.py:267: in __next__
return self._get_next_batch()
nvtabular/loader/backend.py:294: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:273: in _fetch_chunk
raise chunks
nvtabular/loader/backend.py:124: in load_chunks
chunks = dataloader._create_tensors(chunks)
nvtabular/loader/backend.py:377: in _create_tensors
conts = self._to_tensor(gdf_conts)
nvtabular/loader/tensorflow.py:266: in _to_tensor
dlpack = gdf.values.T.toDlpack()
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7-linux-x86_64.egg/cudf/core/dataframe.py:907: in values
return cupy.asarray(self.as_gpu_matrix())
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7-linux-x86_64.egg/cudf/core/dataframe.py:3208: in as_gpu_matrix
matrix[:, colidx] = dense
cupy/core/core.pyx:1248: in cupy.core.core.ndarray.__setitem__
???
cupy/core/_routines_indexing.pyx:49: in cupy.core._routines_indexing._ndarray_setitem
???
cupy/core/_routines_indexing.pyx:801: in cupy.core._routines_indexing._scatter_op
???
cupy/core/core.pyx:517: in cupy.core.core.ndarray.fill
???
cupy/core/_kernel.pyx:605: in cupy.core._kernel.ElementwiseKernel.__call__
???


???
E ValueError: Array device must be same as the current device: array device = 3 while current = 0

cupy/core/_kernel.pyx:95: ValueError
----------------------------- Captured stderr call -----------------------------
2020-08-27 00:57:06.964471: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-08-27 00:57:06.987273: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3198080000 Hz
2020-08-27 00:57:06.988492: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f018c25caf0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-27 00:57:06.988555: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-27 00:57:07.310560: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f018c2c85f0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-27 00:57:07.310636: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-27 00:57:07.310660: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-27 00:57:07.310681: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (2): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-27 00:57:07.310711: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (3): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-27 00:57:07.315351: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-27 00:57:07.317501: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-27 00:57:07.319304: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 2 with properties:
pciBusID: 0000:0e:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-27 00:57:07.320892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 3 with properties:
pciBusID: 0000:0f:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-27 00:57:07.321014: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-08-27 00:57:07.321052: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-08-27 00:57:07.321083: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-08-27 00:57:07.321113: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-08-27 00:57:07.321141: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-08-27 00:57:07.321169: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-08-27 00:57:07.321198: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-27 00:57:07.329782: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0, 1, 2, 3
2020-08-27 00:57:07.329849: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-08-27 00:57:07.335064: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-27 00:57:07.335098: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0 1 2 3
2020-08-27 00:57:07.335111: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N Y Y Y
2020-08-27 00:57:07.335142: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 1: Y N Y Y
2020-08-27 00:57:07.335153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 2: Y Y N Y
2020-08-27 00:57:07.335161: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 3: Y Y Y N
2020-08-27 00:57:07.340168: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory) -> physical GPU (device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0)
2020-08-27 00:57:07.341703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15212 MB memory) -> physical GPU (device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0)
2020-08-27 00:57:07.343167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15212 MB memory) -> physical GPU (device: 2, name: Tesla P100-DGXS-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0)
2020-08-27 00:57:07.344638: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15212 MB memory) -> physical GPU (device: 3, name: Tesla P100-DGXS-16GB, pci bus id: 0000:0f:00.0, compute capability: 6.0)
=============================== warnings summary ===============================
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_io.py::test_mulifile_parquet[True-0-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-0-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-2-csv]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:77: DeprecationWarning: shuffle=True is deprecated. Using PER_WORKER.
warnings.warn("shuffle=True is deprecated. Using PER_WORKER.", DeprecationWarning)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39315 instead
http_address["port"], self.http_server.port

tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7-linux-x86_64.egg/cudf/core/dataframe.py:660: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 29400 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30828 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30100 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30464 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30352 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30856 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 60480 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_workflow.py::test_chaining_3
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:748: UserWarning: part_mem_fraction is ignored for DataFrame input.
warnings.warn("part_mem_fraction is ignored for DataFrame input.")

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.4-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 6 0 0 0 100%
nvtabular/categorify.py 269 49 150 31 78% 69->70, 70-72, 74->75, 75, 76->77, 77, 95->98, 98, 109->110, 110, 116->118, 141->142, 142-143, 145->146, 146-147, 149->150, 150-166, 168->172, 172, 176->177, 177, 178->179, 179, 186->187, 187, 190->192, 192->193, 193, 196->200, 200-203, 213->214, 214, 216->218, 220->237, 237-240, 263->264, 264, 267->268, 268, 269->270, 270, 277->278, 278, 279->282, 282, 386->387, 387, 388->389, 389, 410->425, 450->455, 453->454, 454, 464->461, 469->461
nvtabular/column_similarity.py 88 21 28 4 70% 170-171, 180-182, 190-206, 221->231, 223->226, 226->227, 227, 236->237, 237
nvtabular/io.py 569 41 226 32 91% 73->74, 74, 78->81, 81, 86-88, 101->102, 102, 104->105, 105, 112->113, 113, 129->130, 130, 134->136, 136->132, 140->141, 141, 142->143, 143, 151->156, 162->163, 163-164, 182->183, 183, 195->198, 198, 213->214, 214, 230, 247, 271->272, 272, 310, 313, 374->375, 375, 396-398, 479->481, 526->549, 553, 685->688, 745->746, 746, 758->759, 759, 767->768, 768, 776->788, 781->786, 786-788, 863->864, 864, 961->963, 963-965, 973->975, 975, 1001->1002, 1002, 1055->1056, 1056, 1079->1080, 1080, 1085, 1100->1101, 1101
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 188 6 60 6 95% 71->72, 72, 115->116, 116, 135->136, 136, 158, 233->235, 248->249, 249, 253->254, 254
nvtabular/loader/tensorflow.py 108 35 46 11 64% 37->38, 38-39, 49->50, 50, 57->58, 58-61, 70->71, 71, 74->75, 75, 76->81, 81, 240-251, 257->258, 258, 277->278, 278, 285->286, 286, 287->290, 290, 295->296, 296, 304-306, 309-311, 319, 322-330
nvtabular/loader/tf_utils.py 51 24 20 5 45% 13->16, 16->18, 23->25, 26->27, 27, 34-35, 40->48, 43-48, 60-65, 75-88
nvtabular/loader/torch.py 33 0 4 0 100%
nvtabular/ops.py 547 34 166 31 90% 54->53, 56->57, 57, 82-86, 109->111, 128->129, 129, 223, 281, 331, 356->357, 357, 365->366, 366, 370->372, 372->373, 373, 426->427, 427, 442->443, 443-445, 446->449, 449, 495->496, 496, 503->502, 536->537, 537, 546->548, 548-549, 585->586, 586, 615->616, 616, 718->719, 719, 743, 937->938, 938, 939->940, 940, 954->957, 957, 970->974, 1107->1108, 1108, 1116->1121, 1121, 1131->1132, 1132, 1174->1175, 1175, 1216->1217, 1217, 1220->1226, 1285->1286, 1286, 1287->1288, 1288, 1324->1325, 1325
nvtabular/worker.py 65 1 30 2 97% 80->92, 118->121, 121
nvtabular/workflow.py 415 47 230 22 87% 96->100, 100, 106->107, 107-111, 141->exit, 157->exit, 173->exit, 189->exit, 242->244, 292->293, 293, 372->375, 375, 396-411, 473->474, 474, 492->494, 494-503, 514->513, 563->568, 568, 571->572, 572, 607->608, 608, 655->646, 721->732, 732, 755-785, 813->814, 814, 827->830, 860->861, 861-863, 867->868, 868, 901->902, 902
setup.py 2 2 0 0 0% 18-20

TOTAL 2341 260 960 144 86%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 85.76%
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_rossman_example - subprocess.Called...
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.01]
====== 2 failed, 417 passed, 1 skipped, 20 warnings in 432.94s (0:07:12) =======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins7684354513045421438.sh

"metadata": {},
"outputs": [],
"source": [
"DATA_DIR = os.environ.get('INPUT_DATA_DIR', './data')\n",
"OUTPUT_DATA_DIR = os.environ.get('INPUT_DATA_DIR', './data')\n",
"DATA_DIR = os.environ.get(\"OUTPUT_DATA_DIR\", \"./data\")\n",
Copy link
Member

This probably should be INPUT_DATA_DIR, since it's also used for reading the dataset.

Right now the unit tests are failing because of this in tests/unit/test_notebooks.py::test_rossman_example

Traceback (most recent call last):
  File "/tmp/pytest-of-jenkins/pytest-20/test_rossman_example0/notebook.py", line 59, in <module>
    proc.apply(train_dataset, record_stats=True, output_path=PREPROCESS_DIR_TRAIN, shuffle=nvt.io.Shuffle.PER_WORKER, out_files_per_proc=2)
  File "/var/jenkins_home/nvtabular/nvtabular/workflow.py", line 729, in apply
    num_io_threads=num_io_threads,
  File "/var/jenkins_home/nvtabular/nvtabular/workflow.py", line 829, in build_and_process_graph
    self.exec_phase(idx, record_stats=record_stats)
  File "/var/jenkins_home/nvtabular/nvtabular/workflow.py", line 612, in exec_phase
    self._aggregated_dask_transform(transforms)
  File "/var/jenkins_home/nvtabular/nvtabular/workflow.py", line 587, in _aggregated_dask_transform
    ddf = self.get_ddf()
  File "/var/jenkins_home/nvtabular/nvtabular/workflow.py", line 575, in get_ddf
    return self.ddf.to_ddf(columns=columns, shuffle=self._shuffle_parts)
  File "/var/jenkins_home/nvtabular/nvtabular/io.py", line 809, in to_ddf
    ddf = self.engine.to_ddf(columns=columns)
  File "/var/jenkins_home/nvtabular/nvtabular/io.py", line 1060, in to_ddf
    return dask_cudf.read_csv(self.paths, chunksize=self.part_size, **self.csv_kwargs)[
  File "/opt/conda/lib/python3.7/site-packages/dask_cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7.egg/dask_cudf/io/csv.py", line 19, in read_csv
    return _internal_read_csv(path=path, chunksize=chunksize, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/dask_cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7.egg/dask_cudf/io/csv.py", line 59, in _internal_read_csv
    meta = dask_reader(filenames[0], **kwargs)._meta
  File "/opt/conda/lib/python3.7/site-packages/dask/dataframe/io/csv.py", line 649, in read
    **kwargs,
  File "/opt/conda/lib/python3.7/site-packages/dask/dataframe/io/csv.py", line 479, in read_pandas
    **(storage_options or {}),
  File "/opt/conda/lib/python3.7/site-packages/dask/bytes/core.py", line 125, in read_bytes
    size = fs.info(path)["size"]
  File "/opt/conda/lib/python3.7/site-packages/fsspec/implementations/local.py", line 60, in info
    out = os.stat(path, follow_symlinks=False)
FileNotFoundError: [Errno 2] No such file or directory: '/var/jenkins_home/nvtabular/data/train.csv'
Suggested change
"DATA_DIR = os.environ.get(\"OUTPUT_DATA_DIR\", \"./data\")\n",
"DATA_DIR = os.environ.get(\"INPUT_DATA_DIR\", \"./data\")\n",

Copy link
Contributor Author

Ok I think I'm confused then, because the preprocessing notebook uses INPUT_DATA_DIR to read the original data, but then places the preprocessed data with the extra features in OUTPUT_DATA_DIR, which is where train.csv should live. Unless we're not using the preprocessing notebook first in this test? When I run the preproc notebook followed by the example notebook locally, this works.
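
To make the hand-off described here concrete, below is a minimal sketch (directory values are hypothetical) of the environment-variable flow between the two notebooks: the preprocessing notebook reads the raw Rossmann data from INPUT_DATA_DIR and writes train.csv/valid.csv with the engineered features to OUTPUT_DATA_DIR, so whichever variable the example notebook uses, its DATA_DIR has to resolve to the directory the preprocessing notebook actually wrote to, or the read of train.csv fails exactly as in the CI traceback above.

    import os

    # Hypothetical values; in CI these would be exported before the notebooks run.
    RAW_DIR = os.environ.get("INPUT_DATA_DIR", "./data")       # raw Rossmann data
    PREPROC_DIR = os.environ.get("OUTPUT_DATA_DIR", "./data")  # preprocessing notebook output

    # Preprocessing notebook: read from RAW_DIR, write engineered train.csv/valid.csv to PREPROC_DIR.
    # Example notebook: must read train.csv from the directory the preprocessing notebook wrote to.
    DATA_DIR = PREPROC_DIR
    train_csv = os.path.join(DATA_DIR, "train.csv")  # the file the failing test could not find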

@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #224 of commit 32f15f8605bd770b66f8311c7de28504f26d12dd, no merge conflicts.
Running as SYSTEM
Setting status of 32f15f8605bd770b66f8311c7de28504f26d12dd to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/685/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 32f15f8605bd770b66f8311c7de28504f26d12dd^{commit} # timeout=10
Checking out Revision 32f15f8605bd770b66f8311c7de28504f26d12dd (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 32f15f8605bd770b66f8311c7de28504f26d12dd # timeout=10
Commit message: "Merge branch 'tfasync' of github.com:alecgunny/NVTabular into tfasync"
 > git rev-list --no-walk 0d354244b8c3f516737bf881bf4b78b8e002ebfb # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins8758181856450508653.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
31 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.4, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: hypothesis-5.28.0, forked-1.3.0, xdist-2.1.0, cov-2.10.1
collected 419 items / 1 skipped / 418 selected

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 14%]
tests/unit/test_io.py .................................................. [ 26%]
............................ [ 32%]
tests/unit/test_notebooks.py .... [ 33%]
tests/unit/test_ops.py ................................................. [ 45%]
........................................................................ [ 62%]
........................ [ 68%]
tests/unit/test_tf_dataloader.py F........... [ 71%]
tests/unit/test_torch_dataloader.py ..................... [ 76%]
tests/unit/test_workflow.py ............................................ [ 86%]
....................................................... [100%]

=================================== FAILURES ===================================
_____________________ test_tf_gpu_dl[True-1-parquet-0.01] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_tf_gpu_dl_True_1_parquet_0')
paths = ['/tmp/pytest-of-jenkins/pytest-3/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-3/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7fe9eabbc550>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceLoader(
        paths if use_paths else dataset,
        cat_names=cat_names,
        cont_names=cont_names,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_names=label_name,
        engine=engine,
        shuffle=False,
    )
    processor.update_stats(dataset)
    data_itr.map(processor)

    rows = 0
    dont_iter = False
    for idx in range(len(data_itr)):
      X, y = next(data_itr)

tests/unit/test_tf_dataloader.py:65:


nvtabular/loader/backend.py:267: in __next__
return self._get_next_batch()
nvtabular/loader/backend.py:294: in _get_next_batch
self._fetch_chunk()
nvtabular/loader/backend.py:273: in _fetch_chunk
raise chunks
nvtabular/loader/backend.py:124: in load_chunks
chunks = dataloader._create_tensors(chunks)
nvtabular/loader/backend.py:377: in _create_tensors
conts = self._to_tensor(gdf_conts)
nvtabular/loader/tensorflow.py:266: in _to_tensor
dlpack = gdf.values.T.toDlpack()
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7-linux-x86_64.egg/cudf/core/dataframe.py:907: in values
return cupy.asarray(self.as_gpu_matrix())
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7-linux-x86_64.egg/cudf/core/dataframe.py:3208: in as_gpu_matrix
matrix[:, colidx] = dense
cupy/core/core.pyx:1248: in cupy.core.core.ndarray.__setitem__
???
cupy/core/_routines_indexing.pyx:49: in cupy.core._routines_indexing._ndarray_setitem
???
cupy/core/_routines_indexing.pyx:801: in cupy.core._routines_indexing._scatter_op
???
cupy/core/core.pyx:517: in cupy.core.core.ndarray.fill
???
cupy/core/_kernel.pyx:605: in cupy.core._kernel.ElementwiseKernel.__call__
???


???
E ValueError: Array device must be same as the current device: array device = 3 while current = 0

cupy/core/_kernel.pyx:95: ValueError
----------------------------- Captured stderr call -----------------------------
2020-08-27 16:54:40.420477: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-08-27 16:54:40.455273: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3198080000 Hz
2020-08-27 16:54:40.456170: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fe99825c040 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-27 16:54:40.456198: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-27 16:54:40.772611: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fe9982c7bd0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-27 16:54:40.772681: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-27 16:54:40.772704: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-27 16:54:40.772722: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (2): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-27 16:54:40.772740: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (3): Tesla P100-DGXS-16GB, Compute Capability 6.0
2020-08-27 16:54:40.777045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-27 16:54:40.779733: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-27 16:54:40.782176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 2 with properties:
pciBusID: 0000:0e:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-27 16:54:40.783923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 3 with properties:
pciBusID: 0000:0f:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-08-27 16:54:40.784012: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-08-27 16:54:40.784036: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-08-27 16:54:40.784056: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-08-27 16:54:40.784075: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-08-27 16:54:40.784093: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-08-27 16:54:40.784110: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-08-27 16:54:40.784129: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-27 16:54:40.792073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0, 1, 2, 3
2020-08-27 16:54:40.792126: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-08-27 16:54:40.797814: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-27 16:54:40.797835: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0 1 2 3
2020-08-27 16:54:40.797846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N Y Y Y
2020-08-27 16:54:40.797878: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 1: Y N Y Y
2020-08-27 16:54:40.797889: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 2: Y Y N Y
2020-08-27 16:54:40.797896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 3: Y Y Y N
2020-08-27 16:54:40.802905: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory) -> physical GPU (device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0)
2020-08-27 16:54:40.804329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15212 MB memory) -> physical GPU (device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0)
2020-08-27 16:54:40.805955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15212 MB memory) -> physical GPU (device: 2, name: Tesla P100-DGXS-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0)
2020-08-27 16:54:40.807355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15212 MB memory) -> physical GPU (device: 3, name: Tesla P100-DGXS-16GB, pci bus id: 0000:0f:00.0, compute capability: 6.0)
=============================== warnings summary ===============================
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_io.py::test_mulifile_parquet[True-0-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-0-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-2-csv]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:77: DeprecationWarning: shuffle=True is deprecated. Using PER_WORKER.
warnings.warn("shuffle=True is deprecated. Using PER_WORKER.", DeprecationWarning)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43211 instead
http_address["port"], self.http_server.port

tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7-linux-x86_64.egg/cudf/core/dataframe.py:660: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 29148 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30212 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 29680 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 29876 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 31472 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 29960 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 60480 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_workflow.py::test_chaining_3
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:748: UserWarning: part_mem_fraction is ignored for DataFrame input.
warnings.warn("part_mem_fraction is ignored for DataFrame input.")

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.4-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 6 0 0 0 100%
nvtabular/categorify.py 269 49 150 31 78% 69->70, 70-72, 74->75, 75, 76->77, 77, 95->98, 98, 109->110, 110, 116->118, 141->142, 142-143, 145->146, 146-147, 149->150, 150-166, 168->172, 172, 176->177, 177, 178->179, 179, 186->187, 187, 190->192, 192->193, 193, 196->200, 200-203, 213->214, 214, 216->218, 220->237, 237-240, 263->264, 264, 267->268, 268, 269->270, 270, 277->278, 278, 279->282, 282, 386->387, 387, 388->389, 389, 410->425, 450->455, 453->454, 454, 464->461, 469->461
nvtabular/column_similarity.py 88 21 28 4 70% 170-171, 180-182, 190-206, 221->231, 223->226, 226->227, 227, 236->237, 237
nvtabular/io.py 569 41 226 32 91% 73->74, 74, 78->81, 81, 86-88, 101->102, 102, 104->105, 105, 112->113, 113, 129->130, 130, 134->136, 136->132, 140->141, 141, 142->143, 143, 151->156, 162->163, 163-164, 182->183, 183, 195->198, 198, 213->214, 214, 230, 247, 271->272, 272, 310, 313, 374->375, 375, 396-398, 479->481, 526->549, 553, 685->688, 745->746, 746, 758->759, 759, 767->768, 768, 776->788, 781->786, 786-788, 863->864, 864, 961->963, 963-965, 973->975, 975, 1001->1002, 1002, 1055->1056, 1056, 1079->1080, 1080, 1085, 1100->1101, 1101
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 188 4 60 4 97% 71->72, 72, 135->136, 136, 158, 233->235, 248->249, 249
nvtabular/loader/tensorflow.py 108 16 46 10 82% 37->38, 38-39, 49->50, 50, 57->58, 58-61, 70->71, 71, 76->81, 81, 242-251, 257->258, 258, 277->278, 278, 285->286, 286, 287->290, 290, 295->296, 296
nvtabular/loader/tf_utils.py 51 7 20 5 83% 13->16, 16->18, 23->25, 26->27, 27, 34-35, 40->48, 43-48
nvtabular/loader/torch.py 33 0 4 0 100%
nvtabular/ops.py 547 34 166 31 90% 54->53, 56->57, 57, 82-86, 109->111, 128->129, 129, 223, 281, 331, 356->357, 357, 365->366, 366, 370->372, 372->373, 373, 426->427, 427, 442->443, 443-445, 446->449, 449, 495->496, 496, 503->502, 536->537, 537, 546->548, 548-549, 585->586, 586, 615->616, 616, 718->719, 719, 743, 937->938, 938, 939->940, 940, 954->957, 957, 970->974, 1107->1108, 1108, 1116->1121, 1121, 1131->1132, 1132, 1174->1175, 1175, 1216->1217, 1217, 1220->1226, 1285->1286, 1286, 1287->1288, 1288, 1324->1325, 1325
nvtabular/worker.py 65 1 30 2 97% 80->92, 118->121, 121
nvtabular/workflow.py 415 38 230 24 89% 96->100, 100, 106->107, 107-111, 141->exit, 157->exit, 173->exit, 189->exit, 242->244, 292->293, 293, 372->375, 375, 400->401, 401, 407->410, 410, 473->474, 474, 492->494, 494-503, 514->513, 563->568, 568, 571->572, 572, 607->608, 608, 655->646, 721->732, 732, 755-785, 813->814, 814, 827->830, 860->861, 861-863, 867->868, 868, 901->902, 902
setup.py 2 2 0 0 0% 18-20

TOTAL 2341 213 960 143 88%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 88.00%
=========================== short test summary info ============================
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.01]
====== 1 failed, 418 passed, 1 skipped, 20 warnings in 470.77s (0:07:50) =======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins2718176934798198774.sh

@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #224 of commit 372520b35b46416e02ec52a45dc9dcaf731f29d8, no merge conflicts.
Running as SYSTEM
Setting status of 372520b35b46416e02ec52a45dc9dcaf731f29d8 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/692/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/224/*:refs/remotes/origin/pr/224/* # timeout=10
 > git rev-parse 372520b35b46416e02ec52a45dc9dcaf731f29d8^{commit} # timeout=10
Checking out Revision 372520b35b46416e02ec52a45dc9dcaf731f29d8 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 372520b35b46416e02ec52a45dc9dcaf731f29d8 # timeout=10
Commit message: "Fix cupy device errors"
 > git rev-list --no-walk a4468067a7a72c821fda94573dc264e26454ee40 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins2550838868390358978.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
31 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.4, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: hypothesis-5.28.0, forked-1.3.0, xdist-2.1.0, cov-2.10.1
collected 419 items / 1 skipped / 418 selected

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 14%]
tests/unit/test_io.py .................................................. [ 26%]
............................ [ 32%]
tests/unit/test_notebooks.py .... [ 33%]
tests/unit/test_ops.py ................................................. [ 45%]
........................................................................ [ 62%]
........................ [ 68%]
tests/unit/test_tf_dataloader.py ............ [ 71%]
tests/unit/test_torch_dataloader.py ..................... [ 76%]
tests/unit/test_workflow.py ............................................ [ 86%]
....................................................... [100%]

=============================== warnings summary ===============================
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_io.py::test_mulifile_parquet[True-0-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-0-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-2-csv]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:77: DeprecationWarning: shuffle=True is deprecated. Using PER_WORKER.
warnings.warn("shuffle=True is deprecated. Using PER_WORKER.", DeprecationWarning)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45061 instead
http_address["port"], self.http_server.port

tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+4964.g4b6b7c0fd.dirty-py3.7-linux-x86_64.egg/cudf/core/dataframe.py:660: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30380 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 31444 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 29932 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30128 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 29344 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 30296 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:937: UserWarning: Row group size 60480 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_workflow.py::test_chaining_3
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io.py:748: UserWarning: part_mem_fraction is ignored for DataFrame input.
warnings.warn("part_mem_fraction is ignored for DataFrame input.")

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.4-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 6 0 0 0 100%
nvtabular/categorify.py 269 49 150 31 78% 69->70, 70-72, 74->75, 75, 76->77, 77, 95->98, 98, 109->110, 110, 116->118, 141->142, 142-143, 145->146, 146-147, 149->150, 150-166, 168->172, 172, 176->177, 177, 178->179, 179, 186->187, 187, 190->192, 192->193, 193, 196->200, 200-203, 213->214, 214, 216->218, 220->237, 237-240, 263->264, 264, 267->268, 268, 269->270, 270, 277->278, 278, 279->282, 282, 386->387, 387, 388->389, 389, 410->425, 450->455, 453->454, 454, 464->461, 469->461
nvtabular/column_similarity.py 88 21 28 4 70% 170-171, 180-182, 190-206, 221->231, 223->226, 226->227, 227, 236->237, 237
nvtabular/io.py 569 41 226 32 91% 73->74, 74, 78->81, 81, 86-88, 101->102, 102, 104->105, 105, 112->113, 113, 129->130, 130, 134->136, 136->132, 140->141, 141, 142->143, 143, 151->156, 162->163, 163-164, 182->183, 183, 195->198, 198, 213->214, 214, 230, 247, 271->272, 272, 310, 313, 374->375, 375, 396-398, 479->481, 526->549, 553, 685->688, 745->746, 746, 758->759, 759, 767->768, 768, 776->788, 781->786, 786-788, 863->864, 864, 961->963, 963-965, 973->975, 975, 1001->1002, 1002, 1055->1056, 1056, 1079->1080, 1080, 1085, 1100->1101, 1101
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 188 8 60 5 95% 71->72, 72, 135->136, 136, 146-147, 158, 233->235, 248->249, 249, 271->272, 272-273
nvtabular/loader/tensorflow.py 112 16 46 10 82% 39->40, 40-41, 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 244-253, 264->265, 265, 284->285, 285, 292->293, 293, 294->297, 297, 302->303, 303
nvtabular/loader/tf_utils.py 51 7 20 5 83% 13->16, 16->18, 23->25, 26->27, 27, 34-35, 40->48, 43-48
nvtabular/loader/torch.py 33 0 4 0 100%
nvtabular/ops.py 547 34 166 31 90% 54->53, 56->57, 57, 82-86, 109->111, 128->129, 129, 223, 281, 331, 356->357, 357, 365->366, 366, 370->372, 372->373, 373, 426->427, 427, 442->443, 443-445, 446->449, 449, 495->496, 496, 503->502, 536->537, 537, 546->548, 548-549, 585->586, 586, 615->616, 616, 718->719, 719, 743, 937->938, 938, 939->940, 940, 954->957, 957, 970->974, 1107->1108, 1108, 1116->1121, 1121, 1131->1132, 1132, 1174->1175, 1175, 1216->1217, 1217, 1220->1226, 1285->1286, 1286, 1287->1288, 1288, 1324->1325, 1325
nvtabular/worker.py 65 1 30 2 97% 80->92, 118->121, 121
nvtabular/workflow.py 415 38 230 24 89% 96->100, 100, 106->107, 107-111, 141->exit, 157->exit, 173->exit, 189->exit, 242->244, 292->293, 293, 372->375, 375, 400->401, 401, 407->410, 410, 473->474, 474, 492->494, 494-503, 514->513, 563->568, 568, 571->572, 572, 607->608, 608, 655->646, 721->732, 732, 755-785, 813->814, 814, 827->830, 860->861, 861-863, 867->868, 868, 901->902, 902
setup.py 2 2 0 0 0% 18-20

TOTAL 2345 217 960 144 88%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 87.87%
=========== 419 passed, 1 skipped, 20 warnings in 463.53s (0:07:43) ============
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins6204768901711015414.sh

@alecgunny alecgunny merged commit 7f9c780 into NVIDIA-Merlin:main Aug 27, 2020
mikemckiernan pushed a commit that referenced this pull request Nov 24, 2022
…loader (#224)

* adding tensorflow example stuff

* getting workflow working

* training of both workflows works

* notebook updates and addding image from run

* updating workflow for nightly tf build

* Create dummy.txt

* Add files via upload

* Delete dummy.txt

* adding tensorflow example stuff

* getting workflow working

* training of both workflows works

* notebook updates and addding image from run

* adding root Dockerfile

* updating root build for 2.3 rc1

* updating Dockerfile for tf 2.3-rc1 and filling out notebook

* updating throughput curves in README

* moving dlrm-train

* cleaning up notebook and layers code, adding cupti symlink to Dockerfile

* getting rid of modprobe install in Dockerfile

* playing with requirements

* updating for tf 2.3 full release

* updating notebook

* removing old Dockerfiles, updating environment and README and finishing example notebook

* removing old images

* consolidating data loading code

* cleaning up and blackening

* finished separating loader code

* adding fixed Dockerfile

* getting tf data loading running

* blackening

* fixing bug in torch loader

* applying isort fixes

* isort fixes

* ironing out data loaders

* creating parent dataloader class

* playing with thread safe iteration

* small change

* moving tensoritr loop into asynciterator

* fixing syntax error

* debugging iter issues

* fixing generator issues

* cleaning up backend code

* got torch data loader working

* working out tf missing gradient issues

* working on gradient issues

* reformatting loader backend to use only 2 classes

* undoing changes really quick

* backend changes

* getting tf dataloader working

* trying out tensor y

* tf data loader working

* undoing some testing changes to Tensorflow

* rerunning tf example for checks

* updating tests

* blackening

* blackening

* fixing dataloader bench bug

* fixing unused variables

* isort fixes

* adding qsize to chunkedbuffer

* fixed typo in backed

* simplifying and updating DataLoader

* updating dataloader backend

* trying new async scheme

* got new implementation working

* cleaning up

* blackening

* fixing bugs

* updating wait time

* isort fixes

* minor aesthetic change

* merging upstream changes

* bug fixes

* trying to update examples

* adding custom validation callback

* got examples working

* blackening

* fixing bug and documenting

* gettin criteo most of the way through

* rearranging and adding checks

* adding proper torch documentation

* documenting and blackening

* remove trailing whitespace

* updating tests

* changing cat and cont defaults to empty lists and including checks

* updating TF example notebook

* adding PARTS_PER_CHUNK to criteo example

* adding tf config changes

* fixing tf unit tests

* blackening

* fixed tf_util bug

* fixing tf_utils bug

* blackening

* blackening

* fixing bug in loader backend

* tests passing

* blackening

* updating rossmann notebook test

* Fix cupy device errors

Co-authored-by: Alec Gunny <agunny@nvidia.com>
Co-authored-by: Ben Frederickson <github@benfrederickson.com>
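
The ValueError in the earlier CI runs ("Array device must be same as the current device: array device = 3 while current = 0") came from the cuDF-to-CuPy conversion in the loader's _to_tensor path running while a different GPU was active than the one that owns the data; it was resolved by the "Fix cupy device errors" commit above. As a rough illustration only (not necessarily the exact change in that commit), the general pattern is to activate the owning device before the conversion:

    import cupy as cp

    def gdf_to_dlpack(gdf, device_id):
        # device_id is assumed (hypothetical helper) to be the GPU that owns gdf's buffers
        with cp.cuda.Device(device_id):
            # same conversion the loader performs, now with the owning device active
            return gdf.values.T.toDlpack()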