
Data inspect #521

Merged
merged 21 commits into from Jan 26, 2021

Conversation

@albert17 (Contributor) commented Jan 8, 2021

Opening because #510 was closed after the new_api branch was merged into main.

@jperez999 I applied the feedback you gave me, and I also added a unit test (a sketch of how the test drives the inspector follows the list below). I have seen your data generation branch; I can follow your packaging style once that PR is merged.

Pending for the future:

  1. Fix dask-cudf dtypes for lists: we need cudf support; an issue has been created ([FEA] Add list len support rapidsai/cudf#7157).
  2. When lists are supported, test them in the unit tests. @benfred, can I modify the testing dataset to add one column that is a list?
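For context, a minimal sketch of how the new unit test drives the inspector. Only the DatasetInspector() constructor appears verbatim in the CI logs below; the inspect() call and its argument order are assumptions, not the confirmed API:

    import glob

    import nvtabular as nvt

    # Hypothetical dataset location; the real test uses pytest tmpdir fixtures.
    paths = glob.glob("/path/to/dataset/*.parquet")
    output_file = "/path/to/dataset_info.json"

    # Column-type config mirroring tests/unit/test_tools.py in the CI logs.
    columns_dict = {
        "cats": ["name-cat", "name-string"],
        "conts": ["x", "y"],
        "labels": ["label"],
    }

    inspector = nvt.tools.DatasetInspector()
    # inspect() name and signature are assumed from the test's setup variables.
    inspector.inspect(paths, "parquet", columns_dict, output_file)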

@albert17 linked an issue Jan 8, 2021 that may be closed by this pull request
@nvidia-merlin-bot (Contributor) commented:

CI Results
GitHub pull request #521 of commit 3e29ef32bd211f40bca067ded04651caa09409ef, no merge conflicts.
Running as SYSTEM
Setting status of 3e29ef32bd211f40bca067ded04651caa09409ef to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1449/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 3e29ef32bd211f40bca067ded04651caa09409ef^{commit} # timeout=10
Checking out Revision 3e29ef32bd211f40bca067ded04651caa09409ef (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 3e29ef32bd211f40bca067ded04651caa09409ef # timeout=10
Commit message: "Adds unit test"
 > git rev-list --no-walk bc6dc7c51f0acd5514888b6be647907efde89d10 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins3274625358681975311.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
81 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 537 items

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_io.py .................................................. [ 20%]
........................................ssssssss [ 29%]
tests/unit/test_notebooks.py .... [ 29%]
tests/unit/test_ops.py ................................................. [ 38%]
........................................................................ [ 52%]
.................................... [ 59%]
tests/unit/test_s3.py .. [ 59%]
tests/unit/test_tf_dataloader.py ................... [ 62%]
tests/unit/test_tf_layers.py ........................................... [ 70%]
................................... [ 77%]
tests/unit/test_tools.py FF [ 77%]
tests/unit/test_torch_dataloader.py .............................. [ 83%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=================================== FAILURES ===================================
______________________________ test_inspect[csv] _______________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_inspect_csv_0')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-1/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-1/csv0'), 'csv-no... local('/tmp/pytest-of-jenkins/pytest-1/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-1/parquet0')}
engine = 'csv'

@pytest.mark.parametrize("engine", ["csv", "parquet"])
def test_inspect(tmpdir, datasets, engine):
    # Dataset
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    output_file = tmpdir + "/dataset_info.json"

    # Dataset columns type config
    columns_dict = {}
    columns_dict["cats"] = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    columns_dict["cats_mh"] = []
    columns_dict["conts"] = ["x", "y"]
    columns_dict["labels"] = ["label"]
    all_cols = (
        columns_dict["cats"]
        + columns_dict["cats_mh"]
        + columns_dict["conts"]
        + columns_dict["labels"]
    )

    # Create inspector and inspect
  a = nvt.tools.DatasetInspector()

E AttributeError: module 'nvtabular' has no attribute 'tools'

tests/unit/test_tools.py:34: AttributeError
____________________________ test_inspect[parquet] _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_inspect_parquet_0')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-1/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-1/csv0'), 'csv-no... local('/tmp/pytest-of-jenkins/pytest-1/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-1/parquet0')}
engine = 'parquet'

@pytest.mark.parametrize("engine", ["csv", "parquet"])
def test_inspect(tmpdir, datasets, engine):
    # Dataset
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    output_file = tmpdir + "/dataset_info.json"

    # Dataset columns type config
    columns_dict = {}
    columns_dict["cats"] = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    columns_dict["cats_mh"] = []
    columns_dict["conts"] = ["x", "y"]
    columns_dict["labels"] = ["label"]
    all_cols = (
        columns_dict["cats"]
        + columns_dict["cats_mh"]
        + columns_dict["conts"]
        + columns_dict["labels"]
    )

    # Create inspector and inspect
  a = nvt.tools.DatasetInspector()

E AttributeError: module 'nvtabular' has no attribute 'tools'

tests/unit/test_tools.py:34: AttributeError
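Both parametrized failures share one root cause: nvtabular/__init__.py does not yet import the new tools subpackage, so nvt.tools never resolves at test time. A minimal sketch of the kind of one-line fix this suggests (the actual contents of nvtabular/__init__.py are an assumption):

    # nvtabular/__init__.py (sketch): re-export the tools subpackage so that
    # `import nvtabular as nvt; nvt.tools.DatasetInspector()` resolves.
    from . import tools  # noqa: F401  (imported for re-export, not direct use)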
=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_similarity.py::test_column_similarity[tfidf-True]
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_similarity.py::test_column_similarity[tfidf-True]
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43315 instead
http_address["port"], self.http_server.port

tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39161 instead
http_address["port"], self.http_server.port

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45071 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 133 18 70 6 84% 66->67, 67, 77->78, 78, 81->83, 107->108, 108, 131-144, 168->171, 171, 255->258, 258
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 1 12 1 95% 46->47, 47
nvtabular/framework_utils/torch/models.py 38 0 22 0 100%
nvtabular/framework_utils/torch/utils.py 31 4 10 2 85% 51->52, 52, 55->56, 56-58
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 187->189
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 13 108 7 95% 71->72, 72, 77-78, 123->124, 124, 131-132, 202->204, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 18 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 86 260 50 79% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/dataset_inspector.py 65 65 32 0 0% 17-135
nvtabular/utils.py 27 6 10 5 70% 26->27, 27, 28->31, 31, 37->38, 38, 40->41, 41, 45->47, 47, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3324 630 1449 173 78%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 77.85%
=========================== short test summary info ============================
FAILED tests/unit/test_tools.py::test_inspect[csv] - AttributeError: module '...
FAILED tests/unit/test_tools.py::test_inspect[parquet] - AttributeError: modu...
====== 2 failed, 527 passed, 8 skipped, 44 warnings in 419.18s (0:06:59) =======
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins6213806168292701879.sh

@nvidia-merlin-bot (Contributor) commented:

CI Results
GitHub pull request #521 of commit a3c8722eeb1eac7202aff3170c4f218da3426b51, no merge conflicts.
Running as SYSTEM
Setting status of a3c8722eeb1eac7202aff3170c4f218da3426b51 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1468/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse a3c8722eeb1eac7202aff3170c4f218da3426b51^{commit} # timeout=10
Checking out Revision a3c8722eeb1eac7202aff3170c4f218da3426b51 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f a3c8722eeb1eac7202aff3170c4f218da3426b51 # timeout=10
Commit message: "Updates json output and multihot calculation"
 > git rev-list --no-walk ad83c7ae8a5d34adf9d21127946f98ae40795364 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins8836009779784977150.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+15.ga3c8722
    Uninstalling nvtabular-0.3.0+15.ga3c8722:
      Successfully uninstalled nvtabular-0.3.0+15.ga3c8722
  Running setup.py develop for nvtabular
Successfully installed nvtabular
error: cannot format /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/tools/dataset_inspector.py: Cannot parse: 69:16:                 if ddf[col].dtype.leaf_type == "string":
Oh no! 💥 💔 💥
80 files would be left unchanged, 1 file would fail to reformat.
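black refuses to format a file that does not parse as valid Python, so this failure points at a syntax error around the quoted line rather than a style problem. A minimal sketch of a parseable form of that check (the wrapping function is hypothetical; leaf_type is the cudf ListDtype attribute quoted in the error):

    def is_string_list_column(ddf, col):
        """Return True when `col` is a list column whose leaves are strings."""
        # Same expression black could not parse, placed at a valid indentation.
        return ddf[col].dtype.leaf_type == "string"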
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins8797665715078067190.sh

@nvidia-merlin-bot (Contributor) commented:

CI Results
GitHub pull request #521 of commit 2a52415d4f25ba13d8405b1e5496a0751958eb7b, no merge conflicts.
Running as SYSTEM
Setting status of 2a52415d4f25ba13d8405b1e5496a0751958eb7b to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1470/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 2a52415d4f25ba13d8405b1e5496a0751958eb7b^{commit} # timeout=10
Checking out Revision 2a52415d4f25ba13d8405b1e5496a0751958eb7b (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 2a52415d4f25ba13d8405b1e5496a0751958eb7b # timeout=10
Commit message: "Updates list processing"
 > git rev-list --no-walk fcfd38534871827d001d98c9db23af626749e375 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins5268918968425135953.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+16.g2a52415
    Uninstalling nvtabular-0.3.0+16.g2a52415:
      Successfully uninstalled nvtabular-0.3.0+16.g2a52415
  Running setup.py develop for nvtabular
Successfully installed nvtabular
would reformat /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/tools/dataset_inspector.py
Oh no! 💥 💔 💥
1 file would be reformatted, 80 files would be left unchanged.
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins3675556177946569106.sh

@nvidia-merlin-bot (Contributor) commented:

CI Results
GitHub pull request #521 of commit e9b87734ff56c66d4fee7f1060b1d9af7764167f, no merge conflicts.
Running as SYSTEM
Setting status of e9b87734ff56c66d4fee7f1060b1d9af7764167f to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1471/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse e9b87734ff56c66d4fee7f1060b1d9af7764167f^{commit} # timeout=10
Checking out Revision e9b87734ff56c66d4fee7f1060b1d9af7764167f (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f e9b87734ff56c66d4fee7f1060b1d9af7764167f # timeout=10
Commit message: "Updates test"
 > git rev-list --no-walk 2a52415d4f25ba13d8405b1e5496a0751958eb7b # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins4430635498173288458.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+17.ge9b8773
    Uninstalling nvtabular-0.3.0+17.ge9b8773:
      Successfully uninstalled nvtabular-0.3.0+17.ge9b8773
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
81 files would be left unchanged.
./nvtabular/tools/dataset_inspector.py:19:1: F401 'cudf' imported but unused
./nvtabular/tools/__init__.py:16:1: F401 '.dataset_inspector.DatasetInspector' imported but unused
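flake8's F401 flags imports that the importing module never references. The unused cudf import in dataset_inspector.py can simply be removed, but for an __init__.py whose whole purpose is re-export, the conventional fix is an explicit noqa marker (or an __all__ entry). A minimal sketch, assuming the re-export is intentional:

    # nvtabular/tools/__init__.py (sketch): keep the re-export but tell flake8
    # the "unused" import is deliberate.
    from .dataset_inspector import DatasetInspector  # noqa: F401

    __all__ = ["DatasetInspector"]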
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins8051002341542512124.sh

@nvidia-merlin-bot (Contributor) commented:

CI Results
GitHub pull request #521 of commit 199e818570a49d20c1f23132e2299aa595b16d2b, no merge conflicts.
Running as SYSTEM
Setting status of 199e818570a49d20c1f23132e2299aa595b16d2b to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1483/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 199e818570a49d20c1f23132e2299aa595b16d2b^{commit} # timeout=10
Checking out Revision 199e818570a49d20c1f23132e2299aa595b16d2b (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 199e818570a49d20c1f23132e2299aa595b16d2b # timeout=10
Commit message: "Adds cudf issue"
 > git rev-list --no-walk 7ea69ebc5bb4c9e588af5f0f37675b0a5f80ba72 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins3691191528921297468.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+18.g199e818
    Uninstalling nvtabular-0.3.0+18.g199e818:
      Successfully uninstalled nvtabular-0.3.0+18.g199e818
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
81 files would be left unchanged.
./nvtabular/tools/dataset_inspector.py:88:101: E501 line too long (103 > 100 characters)
./nvtabular/tools/dataset_inspector.py:89:101: E501 line too long (107 > 100 characters)
./nvtabular/tools/__init__.py:16:1: F401 '.dataset_inspector.DatasetInspector' imported but unused
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins5340054830216335842.sh

@albert17 marked this pull request as ready for review January 19, 2021 17:13
@nvidia-merlin-bot (Contributor) commented:

CI Results
GitHub pull request #521 of commit e3a807a231113f9d5b371b808c49f6f1bd80e98b, no merge conflicts.
Running as SYSTEM
Setting status of e3a807a231113f9d5b371b808c49f6f1bd80e98b to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1485/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse e3a807a231113f9d5b371b808c49f6f1bd80e98b^{commit} # timeout=10
Checking out Revision e3a807a231113f9d5b371b808c49f6f1bd80e98b (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f e3a807a231113f9d5b371b808c49f6f1bd80e98b # timeout=10
Commit message: "Data inspector ready"
 > git rev-list --no-walk b08f8781935a592601320e5beedec1ced0a1e113 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins7470636000492385073.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
84 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 559 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_datagen.py ..................... [ 14%]
tests/unit/test_io.py .................................................. [ 23%]
........................................ssssssss [ 31%]
tests/unit/test_notebooks.py .... [ 32%]
tests/unit/test_ops.py ................................................. [ 41%]
........................................................................ [ 54%]
.................................... [ 60%]
tests/unit/test_s3.py .. [ 61%]
tests/unit/test_tf_dataloader.py ................... [ 64%]
tests/unit/test_tf_layers.py ........................................... [ 72%]
................................... [ 78%]
tests/unit/test_tools.py FF [ 78%]
tests/unit/test_torch_dataloader.py .............................. [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=================================== FAILURES ===================================
______________________________ test_inspect[csv] _______________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_inspect_csv_0')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-1/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-1/csv0'), 'csv-no... local('/tmp/pytest-of-jenkins/pytest-1/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-1/parquet0')}
engine = 'csv'

@pytest.mark.parametrize("engine", ["csv", "parquet"])
def test_inspect(tmpdir, datasets, engine):
    # Dataset
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    output_file = tmpdir + "/dataset_info.json"

    # Dataset columns type config
    columns_dict = {}
    columns_dict["cats"] = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    columns_dict["conts"] = ["x", "y"]
    columns_dict["labels"] = ["label"]
    all_cols = columns_dict["cats"] + columns_dict["conts"] + columns_dict["labels"]

    # Create inspector and inspect
  a = nvt.tools.DatasetInspector()

E AttributeError: module 'nvtabular.tools' has no attribute 'DatasetInspector'

tests/unit/test_tools.py:28: AttributeError
____________________________ test_inspect[parquet] _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_inspect_parquet_0')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-1/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-1/csv0'), 'csv-no... local('/tmp/pytest-of-jenkins/pytest-1/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-1/parquet0')}
engine = 'parquet'

@pytest.mark.parametrize("engine", ["csv", "parquet"])
def test_inspect(tmpdir, datasets, engine):
    # Dataset
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    output_file = tmpdir + "/dataset_info.json"

    # Dataset columns type config
    columns_dict = {}
    columns_dict["cats"] = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    columns_dict["conts"] = ["x", "y"]
    columns_dict["labels"] = ["label"]
    all_cols = columns_dict["cats"] + columns_dict["conts"] + columns_dict["labels"]

    # Create inspector and inspect
  a = nvt.tools.DatasetInspector()

E AttributeError: module 'nvtabular.tools' has no attribute 'DatasetInspector'

tests/unit/test_tools.py:28: AttributeError
=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_datagen.py: 1392 warnings
tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_datagen.py: 696 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:556: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
[x.dtype for x in self._data.columns], index=self._data.names

tests/unit/test_datagen.py::test_full_df[None-1000]
tests/unit/test_datagen.py::test_full_df[None-100000]
tests/unit/test_datagen.py::test_full_df[distro1-1000]
tests/unit/test_datagen.py::test_full_df[distro1-100000]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/utils.py:47: UserWarning: get_memory_info is not supported. Using total device memory from NVML.
warnings.warn("get_memory_info is not supported. Using total device memory from NVML.")

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42165 instead
http_address["port"], self.http_server.port

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44197 instead
http_address["port"], self.http_server.port

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45623 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 19 80 7 85% 53->54, 54, 86->87, 87, 97->98, 98, 101->103, 127->128, 128, 151-164, 188->191, 191, 275->278, 278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 1 12 1 95% 46->47, 47
nvtabular/framework_utils/torch/models.py 38 0 22 0 100%
nvtabular/framework_utils/torch/utils.py 31 4 10 2 85% 51->52, 52, 55->56, 56-58
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 189->191
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 13 108 7 95% 71->72, 72, 77-78, 123->124, 124, 131-132, 202->204, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 18 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 86 260 50 79% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 233 1 60 4 98% 170->172, 217->221, 306->309, 307->306, 309
nvtabular/tools/dataset_inspector.py 77 77 34 0 0% 16-186
nvtabular/utils.py 27 4 10 3 81% 26->27, 27, 28->31, 31, 37->38, 38, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3580 642 1521 176 79%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 78.95%
=========================== short test summary info ============================
FAILED tests/unit/test_tools.py::test_inspect[csv] - AttributeError: module '...
FAILED tests/unit/test_tools.py::test_inspect[parquet] - AttributeError: modu...
===== 2 failed, 549 passed, 8 skipped, 2138 warnings in 454.22s (0:07:34) ======
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins1325219421683715633.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 6bee99fcdb5d675c2c56ff59825fe34f8c2f1350, no merge conflicts.
Running as SYSTEM
Setting status of 6bee99fcdb5d675c2c56ff59825fe34f8c2f1350 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1486/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 6bee99fcdb5d675c2c56ff59825fe34f8c2f1350^{commit} # timeout=10
Checking out Revision 6bee99fcdb5d675c2c56ff59825fe34f8c2f1350 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6bee99fcdb5d675c2c56ff59825fe34f8c2f1350 # timeout=10
Commit message: "Test works"
 > git rev-list --no-walk e3a807a231113f9d5b371b808c49f6f1bd80e98b # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins2320933489149017601.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+23.g6bee99f
    Uninstalling nvtabular-0.3.0+23.g6bee99f:
      Successfully uninstalled nvtabular-0.3.0+23.g6bee99f
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
84 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 557 items / 1 error / 556 selected

==================================== ERRORS ====================================
__________________ ERROR collecting tests/unit/test_tools.py ___________________
ImportError while importing test module '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/tests/unit/test_tools.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/conda/envs/rapids/lib/python3.7/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/test_tools.py:10: in <module>
import nvtabular.tools.data_inspector as datains
E ModuleNotFoundError: No module named 'nvtabular.tools.data_inspector'
=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 126 80 0 8% 38-66, 79-107, 120-135, 151-164, 172, 177-178, 184-191, 195-198, 202, 206-212, 217-235, 241-269, 273-278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 130 89 0 10% 23, 27-28, 34-35, 44-71, 75-89, 98-126, 130-133, 182-194, 197-220, 223-237, 240-248, 251, 258-274, 316-320, 323-353, 356-369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 39 20 0 12% 48-52, 55-71, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 20 12 0 18% 36-43, 46-51, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 38 33 22 0 8% 52-79, 82-99
nvtabular/framework_utils/torch/utils.py 31 29 10 0 5% 47-78
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 9 4 0 28% 28-36, 39-43
nvtabular/io/dask.py 82 68 34 0 12% 43-65, 82-115, 120-150, 154-160, 167-182
nvtabular/io/dataframe_engine.py 12 7 4 0 31% 26, 29-33, 37
nvtabular/io/dataset.py 134 101 56 0 17% 183-250, 271-297, 325-328, 380-412, 462-473, 489, 493-500, 505-507, 510, 513-519
nvtabular/io/dataset_engine.py 13 7 0 0 46% 25-31
nvtabular/io/hugectr.py 45 35 22 0 15% 26-39, 43-56, 59-68, 71-96, 99
nvtabular/io/parquet.py 124 89 40 0 21% 49-67, 72-83, 90-99, 102, 115-119, 122-127, 131-144, 147-155, 158-167, 173-176, 179-183, 186-192, 197-202, 207, 214-222
nvtabular/io/shuffle.py 25 16 10 0 26% 34-47, 53-56
nvtabular/io/writer.py 123 95 45 0 17% 30, 47, 65-101, 104-107, 110, 113, 118-153, 156-177, 181-196, 200, 203-205, 208-224
nvtabular/io/writer_factory.py 16 11 6 0 23% 31-35, 47-55
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 219 108 0 11% 29, 52-56, 60, 64, 67, 70-78, 85-97, 100-132, 136-140, 143, 147-154, 173-187, 190, 194-196, 201-207, 210-213, 216-233, 236, 239-243, 258-280, 283-364, 372-375, 406-412, 420-454, 457
nvtabular/loader/tensorflow.py 117 92 52 0 15% 34-66, 70-83, 206-218, 236, 244-255, 267, 270, 274, 278, 281-313, 316-343, 351, 354-363
nvtabular/loader/tf_utils.py 55 27 20 5 44% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70, 85-90, 100-113
nvtabular/loader/torch.py 41 23 8 0 37% 25-27, 30-36, 74, 87, 90, 93-95, 98, 102, 106, 109-111, 121
nvtabular/ops/__init__.py 18 0 0 0 100%
nvtabular/ops/bucketize.py 25 16 16 0 22% 45-55, 59-67
nvtabular/ops/categorify.py 463 407 260 0 8% 193-233, 242-275, 278-282, 285, 288, 291-292, 296-350, 353-356, 359, 362, 377, 383-406, 410-421, 425, 429, 436-503, 510-577, 581-586, 591-627, 632-666, 670, 687-781, 798-820, 849-924, 928-933, 937-949, 953-956, 960-961, 970-987
nvtabular/ops/clip.py 19 11 6 0 32% 43-47, 51-56
nvtabular/ops/column_similarity.py 86 62 32 0 20% 62-70, 74-87, 92, 118-147, 154-155, 164-166, 174-190, 199-224, 228-231, 235-236
nvtabular/ops/difference_lag.py 26 15 8 0 32% 56-58, 64-73, 78, 81, 84
nvtabular/ops/dropna.py 9 3 0 0 67% 39-41
nvtabular/ops/fill.py 40 17 6 0 50% 42-43, 47, 70-71, 75-80, 85-86, 90-91, 98, 101, 104
nvtabular/ops/filter.py 21 12 6 0 33% 42-45, 49-58
nvtabular/ops/hash_bucket.py 31 19 18 0 24% 68-78, 82-92, 97-102
nvtabular/ops/hashed_cross.py 29 18 13 0 26% 50-56, 60-71, 76, 81-84
nvtabular/ops/join_external.py 66 53 28 0 14% 83-96, 101-133, 136-143, 148-150, 156-166
nvtabular/ops/join_groupby.py 77 57 28 0 19% 85-100, 103-122, 125-126, 129-153, 156, 161-171, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 17 10 0 27% 59-66, 70-79, 84
nvtabular/ops/logop.py 9 1 0 0 89% 38
nvtabular/ops/moments.py 62 50 18 0 15% 30-62, 66-77, 81-86, 90-112
nvtabular/ops/normalize.py 70 38 14 0 38% 46-48, 52, 55-57, 61-66, 69, 72-73, 76-77, 95-96, 102-110, 116, 123-125, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 5 2 1 65% 22->24, 24, 65, 76, 79-81
nvtabular/ops/rename.py 18 11 10 0 25% 40-44, 47-48, 53-58
nvtabular/ops/stat_operator.py 11 1 0 0 91% 30
nvtabular/ops/target_encoding.py 151 126 66 0 12% 134-156, 159-196, 199-202, 205, 208-216, 219, 222-223, 226-227, 230-231, 235-313, 317-347, 357-366
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 233 189 60 0 15% 18-19, 22, 27, 30-42, 45, 55-57, 64-73, 83-109, 112-120, 125-133, 136-143, 150-158, 166-182, 191-221, 225-249, 253-271, 274-280, 284-286, 292-303, 306-309, 317-319, 335-341, 359-367, 372-373, 410-422
nvtabular/tools/dataset_inspector.py 77 77 34 0 0% 16-186
nvtabular/utils.py 27 6 10 5 70% 26->27, 27, 28->31, 31, 37->38, 38, 40->41, 41, 45->47, 47, 53
nvtabular/worker.py 65 54 30 0 12% 34-35, 44-57, 67-97, 105-122
nvtabular/workflow.py 127 102 72 1 13% 32->33, 33, 66-67, 85-87, 98-141, 155-156, 159-188, 191-214, 217-218, 221-222, 226-229, 233-241, 249, 254-289

TOTAL 3580 2782 1521 12 16%
Coverage XML written to file coverage.xml

FAIL Required test coverage of 70% not reached. Total coverage: 15.88%
=========================== short test summary info ============================
ERROR tests/unit/test_tools.py
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
========================= 4 warnings, 1 error in 6.44s =========================
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins7784097497585232826.sh

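The collection failure in the run above comes down to a module-name mismatch: tests/unit/test_tools.py imports nvtabular.tools.data_inspector, while the coverage table lists the module that actually exists as nvtabular/tools/dataset_inspector.py. A minimal sketch of the corrected import, assuming that module name is the intended target:

# tests/unit/test_tools.py -- point the import at the module present in the tree
import nvtabular.tools.dataset_inspector as datains

The next build (commit "Dataset inspect read - Tests passing") collects test_tools.py cleanly, which is consistent with this rename.
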
@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit d8a564333a67bb474d767f4d9d9baff212014e33, no merge conflicts.
Running as SYSTEM
Setting status of d8a564333a67bb474d767f4d9d9baff212014e33 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1488/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse d8a564333a67bb474d767f4d9d9baff212014e33^{commit} # timeout=10
Checking out Revision d8a564333a67bb474d767f4d9d9baff212014e33 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d8a564333a67bb474d767f4d9d9baff212014e33 # timeout=10
Commit message: "Dataset inspect read - Tests passing"
 > git rev-list --no-walk c0ae28de1e60c4fd14e4dc4e04e83775db55b7f5 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins6307068742077306661.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+24.gd8a5643
    Uninstalling nvtabular-0.3.0+24.gd8a5643:
      Successfully uninstalled nvtabular-0.3.0+24.gd8a5643
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
84 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 559 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_datagen.py ..................... [ 14%]
tests/unit/test_io.py .................................................. [ 23%]
........................................ssssssss [ 31%]
tests/unit/test_notebooks.py .... [ 32%]
tests/unit/test_ops.py ................................................. [ 41%]
........................................................................ [ 54%]
.................................... [ 60%]
tests/unit/test_s3.py .. [ 61%]
tests/unit/test_tf_dataloader.py ................... [ 64%]
tests/unit/test_tf_layers.py ........................................... [ 72%]
................................... [ 78%]
tests/unit/test_tools.py .. [ 78%]
tests/unit/test_torch_dataloader.py .............................. [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_datagen.py: 1392 warnings
tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_datagen.py: 696 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:556: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
[x.dtype for x in self._data.columns], index=self._data.names

tests/unit/test_datagen.py::test_full_df[None-1000]
tests/unit/test_datagen.py::test_full_df[None-100000]
tests/unit/test_datagen.py::test_full_df[distro1-1000]
tests/unit/test_datagen.py::test_full_df[distro1-100000]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/utils.py:47: UserWarning: get_memory_info is not supported. Using total device memory from NVML.
warnings.warn("get_memory_info is not supported. Using total device memory from NVML.")

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36353 instead
http_address["port"], self.http_server.port

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43973 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[csv]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33719 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[csv]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36473 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41465 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38073 instead
http_address["port"], self.http_server.port

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 32995 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 19 80 7 85% 53->54, 54, 86->87, 87, 97->98, 98, 101->103, 127->128, 128, 151-164, 188->191, 191, 275->278, 278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 1 12 1 95% 46->47, 47
nvtabular/framework_utils/torch/models.py 38 0 22 0 100%
nvtabular/framework_utils/torch/utils.py 31 4 10 2 85% 51->52, 52, 55->56, 56-58
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 189->191
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 13 108 7 95% 71->72, 72, 77-78, 123->124, 124, 131-132, 202->204, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 18 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 86 260 50 79% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 233 1 60 4 98% 170->172, 217->221, 306->309, 307->306, 309
nvtabular/tools/dataset_inspector.py 77 15 34 2 72% 30->32, 32-39, 74->75, 75-91
nvtabular/utils.py 27 4 10 3 81% 26->27, 27, 28->31, 31, 37->38, 38, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3580 580 1521 178 81%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.51%
========== 551 passed, 8 skipped, 2142 warnings in 487.53s (0:08:07) ===========
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins578013672945747349.sh

@albert17 albert17 requested a review from benfred January 19, 2021 18:34
Contributor

@jperez999 jperez999 left a comment

Remove all of those examples/dataset_inspector assets. Put the inspector example as a comment within the command-line tool when you move it into the code base. This will allow the docs to pick that up and turn it into an example within the documentation that users can reference.

@@ -0,0 +1,73 @@
#
Contributor

This file is actually the command-line tool for the dataset inspector class... it's not really an example. We should move it into the code base. We won't be able to have examples of this in this format; it will have to be more like a README, with examples of how to use the tool. We won't actually have notebooks.

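For reference, a command-line wrapper of the kind described above could look like the sketch below. This is only an illustration: the flag names and the DatasetInspector class with its inspect() method are assumptions rather than the actual API in nvtabular/tools/dataset_inspector.py. The point is that the usage example lives in the module docstring, where the docs build can pick it up.

"""Inspect a dataset and write per-column statistics to JSON.

Example usage (surfaced in the documentation from this docstring):

    python -m nvtabular.tools.inspector_script \
        --data_path /path/to/dataset --format parquet --output dataset_info.json
"""
import argparse


def parse_args():
    parser = argparse.ArgumentParser(description="Dataset inspection tool")
    parser.add_argument("--data_path", type=str, help="Path to the input dataset")
    parser.add_argument("--format", type=str, default="parquet", choices=["csv", "parquet"])
    parser.add_argument("--output", type=str, default="dataset_info.json", help="Output JSON file")
    return parser.parse_args()


def main():
    args = parse_args()
    # Hypothetical API: the real class and method names live in
    # nvtabular/tools/dataset_inspector.py and may differ.
    from nvtabular.tools.dataset_inspector import DatasetInspector

    DatasetInspector().inspect(args.data_path, args.format, args.output)


if __name__ == "__main__":
    main()
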
@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit caa312cd357e2d53925db9cd53d9b2420b551221, no merge conflicts.
Running as SYSTEM
Setting status of caa312cd357e2d53925db9cd53d9b2420b551221 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1489/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse caa312cd357e2d53925db9cd53d9b2420b551221^{commit} # timeout=10
Checking out Revision caa312cd357e2d53925db9cd53d9b2420b551221 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f caa312cd357e2d53925db9cd53d9b2420b551221 # timeout=10
Commit message: "Moves dataset inspector script"
 > git rev-list --no-walk d8a564333a67bb474d767f4d9d9baff212014e33 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7688479711240999931.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+25.gcaa312c
    Uninstalling nvtabular-0.3.0+25.gcaa312c:
      Successfully uninstalled nvtabular-0.3.0+25.gcaa312c
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
84 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 559 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_datagen.py ..................... [ 14%]
tests/unit/test_io.py .................................................. [ 23%]
........................................ssssssss [ 31%]
tests/unit/test_notebooks.py .... [ 32%]
tests/unit/test_ops.py ................................................. [ 41%]
........................................................................ [ 54%]
.................................... [ 60%]
tests/unit/test_s3.py .. [ 61%]
tests/unit/test_tf_dataloader.py ................... [ 64%]
tests/unit/test_tf_layers.py .............F............................. [ 72%]
................................... [ 78%]
tests/unit/test_tools.py .. [ 78%]
tests/unit/test_torch_dataloader.py .............................. [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=================================== FAILURES ===================================
_____________ test_dot_product_interaction_layer[True-None-64-16] ______________

embedding_dim = 16, num_features = 64, interaction_type = None
self_interaction = True

@pytest.mark.parametrize("embedding_dim", [1, 4, 16])
@pytest.mark.parametrize("num_features", [1, 16, 64])
@pytest.mark.parametrize("interaction_type", [None, "field_all", "field_each", "field_interaction"])
@pytest.mark.parametrize("self_interaction", [True, False])
def test_dot_product_interaction_layer(
    embedding_dim, num_features, interaction_type, self_interaction
):
    if num_features == 1 and not self_interaction:
        return

    input = tf.keras.Input(name="x", shape=(num_features, embedding_dim), dtype=tf.float32)
    interaction_layer = layers.DotProductInteraction(interaction_type, self_interaction)
    output = interaction_layer(input)
    model = tf.keras.Model(inputs=input, outputs=output)
    model.compile("sgd", "mse")

    x = np.random.randn(8, num_features, embedding_dim).astype(np.float32)
    y_hat = model.predict(x)

    if self_interaction:
        expected_dim = num_features * (num_features + 1) // 2
    else:
        expected_dim = num_features * (num_features - 1) // 2
    assert y_hat.shape[1] == expected_dim

    if interaction_type is not None:
        W = interaction_layer.kernel.numpy()
    expected_outputs = []
    for i in range(num_features):
        j_start = i if self_interaction else i + 1
        for j in range(j_start, num_features):
            x_i = x[:, i]
            x_j = x[:, j]
            if interaction_type == "field_all":
                W_ij = W
            elif interaction_type == "field_each":
                W_ij = W[i].T
            elif interaction_type == "field_interaction":
                W_ij = W[i, j]

            if interaction_type is not None:
                x_i = x_i @ W_ij
            expected_outputs.append((x_i * x_j).sum(axis=1))
    expected_output = np.stack(expected_outputs).T

    rtol = 1e-3
    atol = 1e-6
    frac_correct = 1.0
    match = np.isclose(expected_output, y_hat, rtol=rtol, atol=atol)
  assert match.mean() >= frac_correct

E assert 0.9999399038461538 >= 1.0
E + where 0.9999399038461538 = <built-in method mean of numpy.ndarray object at 0x7fb8c876cda0>()
E + where <built-in method mean of numpy.ndarray object at 0x7fb8c876cda0> = array([[ True, True, True, ..., True, True, True],\n [ True, True, True, ..., True, True, True],\n ...True],\n [ True, True, True, ..., True, True, True],\n [ True, True, True, ..., True, True, True]]).mean

tests/unit/test_tf_layers.py:291: AssertionError
=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_datagen.py: 1392 warnings
tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_datagen.py: 696 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:556: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
[x.dtype for x in self._data.columns], index=self._data.names

tests/unit/test_datagen.py::test_full_df[None-1000]
tests/unit/test_datagen.py::test_full_df[None-100000]
tests/unit/test_datagen.py::test_full_df[distro1-1000]
tests/unit/test_datagen.py::test_full_df[distro1-100000]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/utils.py:47: UserWarning: get_memory_info is not supported. Using total device memory from NVML.
warnings.warn("get_memory_info is not supported. Using total device memory from NVML.")

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34173 instead
http_address["port"], self.http_server.port

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44355 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[csv]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45053 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[csv]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40299 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44719 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44621 instead
http_address["port"], self.http_server.port

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43463 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 19 80 7 85% 53->54, 54, 86->87, 87, 97->98, 98, 101->103, 127->128, 128, 151-164, 188->191, 191, 275->278, 278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 1 12 1 95% 46->47, 47
nvtabular/framework_utils/torch/models.py 38 0 22 0 100%
nvtabular/framework_utils/torch/utils.py 31 4 10 2 85% 51->52, 52, 55->56, 56-58
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 189->191
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 13 108 7 95% 71->72, 72, 77-78, 123->124, 124, 131-132, 202->204, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 18 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 86 260 50 79% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 233 1 60 4 98% 170->172, 217->221, 306->309, 307->306, 309
nvtabular/tools/dataset_inspector.py 77 15 34 2 72% 30->32, 32-39, 74->75, 75-91
nvtabular/tools/inspector_script.py 17 17 0 0 0% 17-75
nvtabular/utils.py 27 4 10 3 81% 26->27, 27, 28->31, 31, 37->38, 38, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3597 597 1521 178 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.25%
=========================== short test summary info ============================
FAILED tests/unit/test_tf_layers.py::test_dot_product_interaction_layer[True-None-64-16]
===== 1 failed, 550 passed, 8 skipped, 2142 warnings in 489.03s (0:08:09) ======
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5745337491396734923.sh
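
The repeated "Restarting worker" / "ValueError: I/O operation on closed file" blocks above are shutdown noise rather than test failures: the dask nanny logs after pytest has already closed the captured output stream (the same blocks appear in the fully passing run further down). A minimal stdlib-only sketch of the mechanism, assuming nothing beyond Python's logging module:

import io
import logging

# When a handler's stream has been closed, Handler.emit raises
# ValueError("I/O operation on closed file") and logging prints a
# "--- Logging error ---" block to stderr instead of propagating it.
stream = io.StringIO()
logger = logging.getLogger("nanny-demo")
logger.addHandler(logging.StreamHandler(stream))

stream.close()                       # pytest closes captured streams at teardown
logger.warning("Restarting worker")  # emits a "--- Logging error ---" block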

@albert17
Contributor Author

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit caa312cd357e2d53925db9cd53d9b2420b551221, no merge conflicts.
Running as SYSTEM
Setting status of caa312cd357e2d53925db9cd53d9b2420b551221 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1490/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse caa312cd357e2d53925db9cd53d9b2420b551221^{commit} # timeout=10
Checking out Revision caa312cd357e2d53925db9cd53d9b2420b551221 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f caa312cd357e2d53925db9cd53d9b2420b551221 # timeout=10
Commit message: "Moves dataset inspector script"
 > git rev-list --no-walk caa312cd357e2d53925db9cd53d9b2420b551221 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7652535698069483529.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+25.gcaa312c
    Uninstalling nvtabular-0.3.0+25.gcaa312c:
      Successfully uninstalled nvtabular-0.3.0+25.gcaa312c
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
84 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 559 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_datagen.py ..................... [ 14%]
tests/unit/test_io.py .................................................. [ 23%]
........................................ssssssss [ 31%]
tests/unit/test_notebooks.py .... [ 32%]
tests/unit/test_ops.py ................................................. [ 41%]
........................................................................ [ 54%]
.................................... [ 60%]
tests/unit/test_s3.py .. [ 61%]
tests/unit/test_tf_dataloader.py ................... [ 64%]
tests/unit/test_tf_layers.py ........................................... [ 72%]
................................... [ 78%]
tests/unit/test_tools.py .. [ 78%]
tests/unit/test_torch_dataloader.py .............................. [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_datagen.py: 1392 warnings
tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_datagen.py: 696 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:556: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
[x.dtype for x in self._data.columns], index=self._data.names

tests/unit/test_datagen.py::test_full_df[None-1000]
tests/unit/test_datagen.py::test_full_df[None-100000]
tests/unit/test_datagen.py::test_full_df[distro1-1000]
tests/unit/test_datagen.py::test_full_df[distro1-100000]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/utils.py:47: UserWarning: get_memory_info is not supported. Using total device memory from NVML.
warnings.warn("get_memory_info is not supported. Using total device memory from NVML.")

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43873 instead
http_address["port"], self.http_server.port

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35987 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[csv]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40693 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[csv]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44211 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39353 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42403 instead
http_address["port"], self.http_server.port

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38443 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html
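
The recurring "Port 8787 is already in use" warnings above come from each test creating its own dask cluster while another scheduler still holds the default dashboard port; distributed simply falls back to a free port, so they are harmless. A hedged sketch of how a test could sidestep the noise entirely (hypothetical setup, not the repo's actual fixture):

from distributed import Client, LocalCluster

# dashboard_address=":0" lets distributed pick any free port, so parallel
# clusters never collide on 8787.
cluster = LocalCluster(n_workers=1, dashboard_address=":0")
client = Client(cluster)
try:
    print(cluster.dashboard_link)  # shows whichever port was chosen
finally:
    client.close()
    cluster.close()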

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 19 80 7 85% 53->54, 54, 86->87, 87, 97->98, 98, 101->103, 127->128, 128, 151-164, 188->191, 191, 275->278, 278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 1 12 1 95% 46->47, 47
nvtabular/framework_utils/torch/models.py 38 0 22 0 100%
nvtabular/framework_utils/torch/utils.py 31 4 10 2 85% 51->52, 52, 55->56, 56-58
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 189->191
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 13 108 7 95% 71->72, 72, 77-78, 123->124, 124, 131-132, 202->204, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 18 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 86 260 50 79% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 233 1 60 4 98% 170->172, 217->221, 306->309, 307->306, 309
nvtabular/tools/dataset_inspector.py 77 15 34 2 72% 30->32, 32-39, 74->75, 75-91
nvtabular/tools/inspector_script.py 17 17 0 0 0% 17-75
nvtabular/utils.py 27 4 10 3 81% 26->27, 27, 28->31, 31, 37->38, 38, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3597 597 1521 178 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.25%
========== 551 passed, 8 skipped, 2142 warnings in 488.86s (0:08:08) ===========
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins173310810183650036.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 1d2fb521abb4f3ad9fcff980eefbf3349626a53f, no merge conflicts.
Running as SYSTEM
Setting status of 1d2fb521abb4f3ad9fcff980eefbf3349626a53f to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1491/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 1d2fb521abb4f3ad9fcff980eefbf3349626a53f^{commit} # timeout=10
Checking out Revision 1d2fb521abb4f3ad9fcff980eefbf3349626a53f (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 1d2fb521abb4f3ad9fcff980eefbf3349626a53f # timeout=10
Commit message: "Initial inspect-datagent test"
 > git rev-list --no-walk caa312cd357e2d53925db9cd53d9b2420b551221 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7399859435030933647.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
83 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 560 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_io.py .................................................. [ 19%]
........................................ssssssss [ 28%]
tests/unit/test_notebooks.py .... [ 28%]
tests/unit/test_ops.py ................................................. [ 37%]
........................................................................ [ 50%]
.................................... [ 56%]
tests/unit/test_s3.py .. [ 57%]
tests/unit/test_tf_dataloader.py ................... [ 60%]
tests/unit/test_tf_layers.py ........................................... [ 68%]
................................... [ 74%]
tests/unit/test_tools.py FFFFFFFF....FFFFFFFFFFF. [ 78%]
tests/unit/test_torch_dataloader.py .............................F [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=================================== FAILURES ===================================
___________________________ test_powerlaw[None-1000] ___________________________

num_rows = 1000, distro = None

@pytest.mark.parametrize("num_rows", [1000, 10000])
@pytest.mark.parametrize("distro", [None, distros])
def test_powerlaw(num_rows, distro):
    json_sample["num_rows"] = num_rows
    cats = list(json_sample["cats"].keys())[1:]
  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:56:


schema = {'cats': {'cat_1': {'cardinality': 50, 'dtype': None, 'max_entry_size': 5, 'min_entry_size': 1, ...}, 'cat_2': {'cardi...y.float32'>, 'max_val': 1, 'min_val': 0}, ...}, 'labs': {'lab_1': {'cardinality': 2, 'dtype': None}}, 'num_rows': 1000}
distros = None

def _get_cols_from_schema(schema, distros=None):
    """
    schema = a dictionary comprised of column information,
             where keys are column names, and the value
             contains spec info about column.

    Schema example

    conts:
        col_name:
            dtype:
            min_val:
            max_val:
            mean:
            std:
            per_nan:
    cats:
        col_name:
            dtype:
            cardinality:
            min_entry_size:
            max_entry_size:
            avg_entry_size:
            per_nan:
            multi_min:
            multi_max:
            multi_avg:

    labels:
        col_name:
            dtype:
            cardinality:
            per_nan:
    """
    cols = {}
    executor = {"conts": ContCol, "cats": CatCol, "labels": LabelCol}
    for section, vals in schema.items():
        if section == "num_rows":
            continue
        cols[section] = []
        for col_name, val in vals.items():
            v_dict = {"name": col_name}
            v_dict.update(val)
            if distros and col_name in distros:
                dis = distros[col_name]
                new_distr = DISTRO_TYPES[dis["name"]](**dis["params"])
                v_dict.update({"distro": new_distr})
          cols[section].append(executor[section](**v_dict))

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
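
Every failure in this run raises at the same place: the test schema (json_sample) contains a section named 'labs', while the executor mapping in _get_cols_from_schema only covers 'conts', 'cats', and 'labels', so executor[section] fails with KeyError: 'labs'. A minimal sketch of the mismatch, with a hypothetical alias as one possible fix (equally plausible: renaming the section to 'labels' in the test's json_sample):

# Stand-ins for ContCol/CatCol/LabelCol keep the sketch self-contained.
executor = {"conts": "ContCol", "cats": "CatCol", "labels": "LabelCol"}
schema = {"labs": {"lab_1": {"cardinality": 2, "dtype": None}}, "num_rows": 1000}

for section, vals in schema.items():
    if section == "num_rows":
        continue
    # Hypothetical workaround: treat "labs" as an alias for "labels";
    # without it, executor[section] raises KeyError: 'labs'.
    key = {"labs": "labels"}.get(section, section)
    print(section, "->", executor[key])

The remaining parametrizations fail with the identical traceback, condensed below.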
__________________________ test_powerlaw[None-10000] ___________________________

num_rows = 10000, distro = None

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:56:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
_________________________ test_powerlaw[distro1-1000] __________________________

num_rows = 1000
distro = {'cat_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cat_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}, 'cont_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cont_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}}

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:56:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
_________________________ test_powerlaw[distro1-10000] _________________________

num_rows = 10000
distro = {'cat_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cat_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}, 'cont_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cont_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}}

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:56:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
___________________________ test_uniform[None-1000] ____________________________

num_rows = 1000, distro = None

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:72:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
___________________________ test_uniform[None-10000] ___________________________

num_rows = 10000, distro = None

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:72:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
__________________________ test_uniform[distro1-1000] __________________________

num_rows = 1000
distro = {'cat_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cat_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}, 'cont_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cont_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}}

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:72:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
_________________________ test_uniform[distro1-10000] __________________________

num_rows = 10000
distro = {'cat_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cat_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}, 'cont_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cont_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}}

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:72:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
___________________________ test_cat_rep[None-1000] ____________________________

num_rows = 1000, distro = None

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:101:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
___________________________ test_cat_rep[None-10000] ___________________________

num_rows = 10000, distro = None

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:101:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
__________________________ test_cat_rep[distro1-1000] __________________________

num_rows = 1000
distro = {'cat_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cat_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}, 'cont_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cont_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}}

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:101:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
_________________________ test_cat_rep[distro1-10000] __________________________

num_rows = 10000
distro = {'cat_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cat_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}, 'cont_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cont_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}}

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:101:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
______________________________ test_json_convert _______________________________

  cols = datagen._get_cols_from_schema(json_sample)

tests/unit/test_tools.py:120:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
___________________________ test_full_df[None-1000] ____________________________

num_rows = 1000
tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_full_df_None_1000_0')
distro = None

@pytest.mark.parametrize("num_rows", [1000, 100000])
@pytest.mark.parametrize("distro", [None, distros])
def test_full_df(num_rows, tmpdir, distro):
    json_sample["num_rows"] = num_rows
    cats = list(json_sample["cats"].keys())
>   cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:131:


schema = {'cats': {'cat_1': {'cardinality': 50, 'dtype': None, 'max_entry_size': 5, 'min_entry_size': 1, ...}, 'cat_2': {'cardi...y.float32'>, 'max_val': 1, 'min_val': 0}, ...}, 'labs': {'lab_1': {'cardinality': 2, 'dtype': None}}, 'num_rows': 1000}
distros = None

def _get_cols_from_schema(schema, distros=None):
    """
    schema = a dictionary comprised of column information,
             where keys are column names, and the value
             contains spec info about column.

    Schema example

    conts:
        col_name:
            dtype:
            min_val:
            max_val:
            mean:
            std:
            per_nan:
    cats:
        col_name:
            dtype:
            cardinality:
            min_entry_size:
            max_entry_size:
            avg_entry_size:
            per_nan:
            multi_min:
            multi_max:
            multi_avg:

    labels:
        col_name:
            dtype:
            cardinality:
            per_nan:
    """
    cols = {}
    executor = {"conts": ContCol, "cats": CatCol, "labels": LabelCol}
    for section, vals in schema.items():
        if section == "num_rows":
            continue
        cols[section] = []
        for col_name, val in vals.items():
            v_dict = {"name": col_name}
            v_dict.update(val)
            if distros and col_name in distros:
                dis = distros[col_name]
                new_distr = DISTRO_TYPES[dis["name"]](**dis["params"])
                v_dict.update({"distro": new_distr})
          cols[section].append(executor[section](**v_dict))

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
__________________________ test_full_df[None-100000] ___________________________

num_rows = 100000
tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_full_df_None_100000_0')
distro = None

@pytest.mark.parametrize("num_rows", [1000, 100000])
@pytest.mark.parametrize("distro", [None, distros])
def test_full_df(num_rows, tmpdir, distro):
    json_sample["num_rows"] = num_rows
    cats = list(json_sample["cats"].keys())
>   cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:131:


schema = {'cats': {'cat_1': {'cardinality': 50, 'dtype': None, 'max_entry_size': 5, 'min_entry_size': 1, ...}, 'cat_2': {'cardi...float32'>, 'max_val': 1, 'min_val': 0}, ...}, 'labs': {'lab_1': {'cardinality': 2, 'dtype': None}}, 'num_rows': 100000}
distros = None

def _get_cols_from_schema(schema, distros=None):
    """
    schema = a dictionary comprised of column information,
             where keys are column names, and the value
             contains spec info about column.

    Schema example

    conts:
        col_name:
            dtype:
            min_val:
            max_val:
            mean:
            std:
            per_nan:
    cats:
        col_name:
            dtype:
            cardinality:
            min_entry_size:
            max_entry_size:
            avg_entry_size:
            per_nan:
            multi_min:
            multi_max:
            multi_avg:

    labels:
        col_name:
            dtype:
            cardinality:
            per_nan:
    """
    cols = {}
    executor = {"conts": ContCol, "cats": CatCol, "labels": LabelCol}
    for section, vals in schema.items():
        if section == "num_rows":
            continue
        cols[section] = []
        for col_name, val in vals.items():
            v_dict = {"name": col_name}
            v_dict.update(val)
            if distros and col_name in distros:
                dis = distros[col_name]
                new_distr = DISTRO_TYPES[dis["name"]](**dis["params"])
                v_dict.update({"distro": new_distr})
          cols[section].append(executor[section](**v_dict))

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
__________________________ test_full_df[distro1-1000] __________________________

num_rows = 1000
tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_full_df_distro1_1000_0')
distro = {'cat_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cat_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}, 'cont_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cont_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}}

@pytest.mark.parametrize("num_rows", [1000, 100000])
@pytest.mark.parametrize("distro", [None, distros])
def test_full_df(num_rows, tmpdir, distro):
    json_sample["num_rows"] = num_rows
    cats = list(json_sample["cats"].keys())
>   cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:131:


schema = {'cats': {'cat_1': {'cardinality': 50, 'dtype': None, 'max_entry_size': 5, 'min_entry_size': 1, ...}, 'cat_2': {'cardi...y.float32'>, 'max_val': 1, 'min_val': 0}, ...}, 'labs': {'lab_1': {'cardinality': 2, 'dtype': None}}, 'num_rows': 1000}
distros = {'cat_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cat_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}, 'cont_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cont_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}}

def _get_cols_from_schema(schema, distros=None):
    """
    schema = a dictionary comprised of column information,
             where keys are column names, and the value
             contains spec info about column.

    Schema example

    conts:
        col_name:
            dtype:
            min_val:
            max_val:
            mean:
            std:
            per_nan:
    cats:
        col_name:
            dtype:
            cardinality:
            min_entry_size:
            max_entry_size:
            avg_entry_size:
            per_nan:
            multi_min:
            multi_max:
            multi_avg:

    labels:
        col_name:
            dtype:
            cardinality:
            per_nan:
    """
    cols = {}
    executor = {"conts": ContCol, "cats": CatCol, "labels": LabelCol}
    for section, vals in schema.items():
        if section == "num_rows":
            continue
        cols[section] = []
        for col_name, val in vals.items():
            v_dict = {"name": col_name}
            v_dict.update(val)
            if distros and col_name in distros:
                dis = distros[col_name]
                new_distr = DISTRO_TYPES[dis["name"]](**dis["params"])
                v_dict.update({"distro": new_distr})
          cols[section].append(executor[section](**v_dict))

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
_________________________ test_full_df[distro1-100000] _________________________

num_rows = 100000
tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_full_df_distro1_100000_0')
distro = {'cat_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cat_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}, 'cont_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cont_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}}

@pytest.mark.parametrize("num_rows", [1000, 100000])
@pytest.mark.parametrize("distro", [None, distros])
def test_full_df(num_rows, tmpdir, distro):
    json_sample["num_rows"] = num_rows
    cats = list(json_sample["cats"].keys())
>   cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:131:


schema = {'cats': {'cat_1': {'cardinality': 50, 'dtype': None, 'max_entry_size': 5, 'min_entry_size': 1, ...}, 'cat_2': {'cardi...float32'>, 'max_val': 1, 'min_val': 0}, ...}, 'labs': {'lab_1': {'cardinality': 2, 'dtype': None}}, 'num_rows': 100000}
distros = {'cat_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cat_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}, 'cont_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cont_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}}

def _get_cols_from_schema(schema, distros=None):
    """
    schema = a dictionary comprised of column information,
             where keys are column names, and the value
             contains spec info about column.

    Schema example

    conts:
        col_name:
            dtype:
            min_val:
            max_val:
            mean:
            std:
            per_nan:
    cats:
        col_name:
            dtype:
            cardinality:
            min_entry_size:
            max_entry_size:
            avg_entry_size:
            per_nan:
            multi_min:
            multi_max:
            multi_avg:

    labels:
        col_name:
            dtype:
            cardinality:
            per_nan:
    """
    cols = {}
    executor = {"conts": ContCol, "cats": CatCol, "labels": LabelCol}
    for section, vals in schema.items():
        if section == "num_rows":
            continue
        cols[section] = []
        for col_name, val in vals.items():
            v_dict = {"name": col_name}
            v_dict.update(val)
            if distros and col_name in distros:
                dis = distros[col_name]
                new_distr = DISTRO_TYPES[dis["name"]](**dis["params"])
                v_dict.update({"distro": new_distr})
          cols[section].append(executor[section](**v_dict))

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
______________________________ test_inspect[csv] _______________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_inspect_csv_0')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-1/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-1/csv0'), 'csv-no... local('/tmp/pytest-of-jenkins/pytest-1/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-1/parquet0')}
engine = 'csv'

@pytest.mark.parametrize("engine", ["csv", "parquet"])
def test_inspect(tmpdir, datasets, engine):
    # Dataset
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    output_file = tmpdir + "/dataset_info.json"

    # Dataset columns type config
    columns_dict = {}
    columns_dict["cats"] = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    columns_dict["conts"] = ["x", "y"]
    columns_dict["labels"] = ["label"]
    all_cols = columns_dict["cats"] + columns_dict["conts"] + columns_dict["labels"]

    # Create inspector and inspect
    a = datains.DatasetInspector()
    a.inspect(paths, engine, columns_dict, output_file)

    # Check output_file was created
    assert os.path.isfile(output_file)

    # Read output file
    with fsspec.open(output_file) as f:
        output = json.load(f)

    # Get ddf and cluster to check
    dataset = Dataset(paths, engine=engine)
    ddf = dataset.to_ddf()
    cluster = LocalCUDACluster()
    client = Client(cluster)

    # Dictionary with json output key names
    key_names = {}
    key_names["min"] = {}
    key_names["min"]["cat"] = "min_entry_size"
    key_names["min"]["cont"] = "min_val"
    key_names["max"] = {}
    key_names["max"]["cat"] = "max_entry_size"
    key_names["max"]["cont"] = "max_val"
    key_names["mean"] = {}
    key_names["mean"]["cat"] = "avg_entry_size"
    key_names["mean"]["cont"] = "mean"
    # Correct dtypes
    ddf_dtypes = ddf.head(1)

    # Check output
    for col in all_cols:
        # Check dtype for all
>       assert output[col]["dtype"] == str(ddf_dtypes[col].dtype)

E       KeyError: 'name-string'

tests/unit/test_tools.py:211: KeyError
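The test indexes output[col] at the top level of the JSON, but the KeyError on 'name-string' here (and on 'name-cat' in the parquet case below) suggests the inspector nests each column under its type. A hedged lookup that would work against such a nested layout (assuming section keys "cats"/"conts"/"labels"; find_col is a name invented here):

def find_col(output, col, columns_dict):
    # walk the cats/conts/labels sections of the inspector output and
    # return the entry for `col`, wherever it lives
    for col_type, names in columns_dict.items():
        if col in names:
            return output[col_type][col]
    raise KeyError(col)

# the dtype check in the test would then read:
# assert find_col(output, col, columns_dict)["dtype"] == str(ddf_dtypes[col].dtype)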
____________________________ test_inspect[parquet] _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_inspect_parquet_0')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-1/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-1/csv0'), 'csv-no... local('/tmp/pytest-of-jenkins/pytest-1/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-1/parquet0')}
engine = 'parquet'

@pytest.mark.parametrize("engine", ["csv", "parquet"])
def test_inspect(tmpdir, datasets, engine):
    # Dataset
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    output_file = tmpdir + "/dataset_info.json"

    # Dataset columns type config
    columns_dict = {}
    columns_dict["cats"] = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    columns_dict["conts"] = ["x", "y"]
    columns_dict["labels"] = ["label"]
    all_cols = columns_dict["cats"] + columns_dict["conts"] + columns_dict["labels"]

    # Create inspector and inspect
    a = datains.DatasetInspector()
    a.inspect(paths, engine, columns_dict, output_file)

    # Check output_file was created
    assert os.path.isfile(output_file)

    # Read output file
    with fsspec.open(output_file) as f:
        output = json.load(f)

    # Get ddf and cluster to check
    dataset = Dataset(paths, engine=engine)
    ddf = dataset.to_ddf()
    cluster = LocalCUDACluster()
    client = Client(cluster)

    # Dictionary with json output key names
    key_names = {}
    key_names["min"] = {}
    key_names["min"]["cat"] = "min_entry_size"
    key_names["min"]["cont"] = "min_val"
    key_names["max"] = {}
    key_names["max"]["cat"] = "max_entry_size"
    key_names["max"]["cont"] = "max_val"
    key_names["mean"] = {}
    key_names["mean"]["cat"] = "avg_entry_size"
    key_names["mean"]["cont"] = "mean"
    # Correct dtypes
    ddf_dtypes = ddf.head(1)

    # Check output
    for col in all_cols:
        # Check dtype for all
>       assert output[col]["dtype"] == str(ddf_dtypes[col].dtype)

E       KeyError: 'name-cat'

tests/unit/test_tools.py:211: KeyError
----------------------------- Captured stderr call -----------------------------
distributed.client - ERROR - Failed to reconnect to scheduler after 10.00 seconds, closing client
------------------------------ Captured log call -------------------------------
ERROR asyncio:base_events.py:1619 _GatheringFuture exception was never retrieved
future: <_GatheringFuture finished exception=CancelledError()>
concurrent.futures._base.CancelledError
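The 'Failed to reconnect to scheduler' error and the CancelledError in the captured output are follow-on symptoms: the test opens a LocalCUDACluster and Client but never closes them, so they leak once the assertion above fails. A small helper sketched under the assumption that each test should own and tear down its cluster (with_local_cluster is a name invented here):

from dask_cuda import LocalCUDACluster
from distributed import Client

def with_local_cluster(fn):
    # run fn(client) against a throwaway cluster and tear it down even
    # when fn raises, so a failing assertion cannot leak the scheduler
    cluster = LocalCUDACluster()
    client = Client(cluster)
    try:
        return fn(client)
    finally:
        client.close()
        cluster.close()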
____________________________ test_mh_model_support _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_mh_model_support0')

def test_mh_model_support(tmpdir):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Reviewers": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Null User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
            "Cont1": [0.3, 0.4, 0.5, 0.6],
            "Cont2": [0.3, 0.4, 0.5, 0.6],
            "Cat1": ["A", "B", "A", "C"],
        }
    )
    cat_names = ["Cat1", "Null User", "Authors", "Reviewers"]  # , "Engaging User"]
    cont_names = ["Cont1", "Cont2"]
    label_name = ["Post"]
    out_path = os.path.join(tmpdir, "train/")
    os.mkdir(out_path)

    cats = cat_names >> ops.Categorify()
    conts = cont_names >> ops.Normalize()

    processor = nvt.Workflow(cats + conts + label_name)
>   df_out = processor.fit_transform(nvt.Dataset(df)).to_ddf().compute()

tests/unit/test_torch_dataloader.py:279:


/opt/conda/envs/rapids/lib/python3.7/site-packages/dask/base.py:167: in compute
(result,) = compute(self, traverse=False, **kwargs)
/opt/conda/envs/rapids/lib/python3.7/site-packages/dask/base.py:452: in compute
results = schedule(dsk, keys, **kwargs)
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/client.py:2725: in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/client.py:1992: in gather
asynchronous=asynchronous,
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/client.py:833: in sync
self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py:340: in sync
raise exc.with_traceback(tb)
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py:324: in f
result[0] = yield future


self = <tornado.gen.Runner object at 0x7f29a01b6a10>

def run(self) -> None:
    """Starts or resumes the generator, running until it reaches a
    yield point that is not ready.
    """
    if self.running or self.finished:
        return
    try:
        self.running = True
        while True:
            future = self.future
            if future is None:
                raise Exception("No pending future")
            if not future.done():
                return
            self.future = None
            try:
                exc_info = None

                try:
>                 value = future.result()

E       concurrent.futures._base.CancelledError

/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/gen.py:735: CancelledError
----------------------------- Captured stderr call -----------------------------
distributed.utils - ERROR - 'ListDtype' object has no attribute 'str'
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 655, in log_errors
yield
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/comm/serialize.py", line 17, in dask_serialize_cudf_object
return x.host_serialize()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/abc.py", line 97, in host_serialize
header, frames = self.device_serialize()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/abc.py", line 41, in device_serialize
header, frames = self.serialize()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py", line 530, in serialize
column_header, column_frames = column.serialize_columns(self._columns)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/column/column.py", line 1931, in serialize_columns
header_columns = [c.serialize() for c in columns]
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/column/column.py", line 1931, in
header_columns = [c.serialize() for c in columns]
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/column/column.py", line 1112, in serialize
header["dtype"] = self.dtype.str
AttributeError: 'ListDtype' object has no attribute 'str'
distributed.protocol.core - CRITICAL - Failed to Serialize
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/core.py", line 54, in dumps
for key, value in data.items()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/core.py", line 55, in
if type(value) is Serialize
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 277, in serialize
raise TypeError(msg, str(x)[:10000])
TypeError: ('Could not serialize object of type DataFrame.', ' Authors Reviewers Engaging User ... Cont1 Cont2 Cat1\n0 [User_A] [User_A] User_B ... 0.3 0.3 A\n1 [User_A, User_E] [User_A, User_E] User_B ... 0.4 0.4 B\n2 [User_B, User_C] [User_B, User_C] User_A ... 0.5 0.5 A\n3 [User_C] [User_C] User_D ... 0.6 0.6 C\n\n[4 rows x 8 columns]')
distributed.comm.utils - ERROR - ('Could not serialize object of type DataFrame.', ' Authors Reviewers Engaging User ... Cont1 Cont2 Cat1\n0 [User_A] [User_A] User_B ... 0.3 0.3 A\n1 [User_A, User_E] [User_A, User_E] User_B ... 0.4 0.4 B\n2 [User_B, User_C] [User_B, User_C] User_A ... 0.5 0.5 A\n3 [User_C] [User_C] User_D ... 0.6 0.6 C\n\n[4 rows x 8 columns]')
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/comm/utils.py", line 35, in _to_frames
msg, serializers=serializers, on_error=on_error, context=context
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/core.py", line 54, in dumps
for key, value in data.items()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/core.py", line 55, in
if type(value) is Serialize
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 277, in serialize
raise TypeError(msg, str(x)[:10000])
TypeError: ('Could not serialize object of type DataFrame.', ' Authors Reviewers Engaging User ... Cont1 Cont2 Cat1\n0 [User_A] [User_A] User_B ... 0.3 0.3 A\n1 [User_A, User_E] [User_A, User_E] User_B ... 0.4 0.4 B\n2 [User_B, User_C] [User_B, User_C] User_A ... 0.5 0.5 A\n3 [User_C] [User_C] User_D ... 0.6 0.6 C\n\n[4 rows x 8 columns]')
distributed.batched - WARNING - Error in batched write, retrying
[the 'ListDtype' serialization traceback above repeats identically for each of six retries; elided]
distributed.batched - ERROR - Error in batched write
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/batched.py", line 94, in _background_send
payload, serializers=self.serializers, on_error="raise"
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
value = future.result()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/comm/tcp.py", line 230, in write
**self.handshake_options,
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/comm/utils.py", line 54, in to_frames
return _to_frames()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/comm/utils.py", line 35, in _to_frames
msg, serializers=serializers, on_error=on_error, context=context
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/core.py", line 54, in dumps
for key, value in data.items()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/core.py", line 55, in
if type(value) is Serialize
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 277, in serialize
raise TypeError(msg, str(x)[:10000])
TypeError: ('Could not serialize object of type DataFrame.', ' Authors Reviewers Engaging User ... Cont1 Cont2 Cat1\n0 [User_A] [User_A] User_B ... 0.3 0.3 A\n1 [User_A, User_E] [User_A, User_E] User_B ... 0.4 0.4 B\n2 [User_B, User_C] [User_B, User_C] User_A ... 0.5 0.5 A\n3 [User_C] [User_C] User_D ... 0.6 0.6 C\n\n[4 rows x 8 columns]')
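test_mh_model_support fails for an unrelated reason: at this cudf version ListDtype has no .str attribute, so distributed cannot serialize a cudf DataFrame that contains list columns (Authors/Reviewers here). A hedged guard for spotting such columns before a frame is handed to a Dask client (the import path is the cudf 0.17-era location and is an assumption):

from cudf.core.dtypes import ListDtype  # location as of cudf ~0.17 (assumption)

def list_columns(df):
    # names of columns whose dtype is a list type and would currently
    # fail dask serialization
    return [name for name in df.columns if isinstance(df[name].dtype, ListDtype)]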
=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39829 instead
http_address["port"], self.http_server.port

tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_tools.py: 40 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45829 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py: 20 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:556: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
[x.dtype for x in self._data.columns], index=self._data.names
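The fix this DeprecationWarning asks for is just an explicit dtype when constructing an empty Series, for example:

import pandas as pd

s = pd.Series([], dtype="float64")  # explicit dtype silences the warning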

tests/unit/test_tools.py::test_inspect[csv]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33571 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[csv]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42493 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36669 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33607 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33603 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/utils.py:47: UserWarning: get_memory_info is not supported. Using total device memory from NVML.
warnings.warn("get_memory_info is not supported. Using total device memory from NVML.")

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42549 instead
http_address["port"], self.http_server.port

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42421 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 19 80 7 85% 53->54, 54, 86->87, 87, 97->98, 98, 101->103, 127->128, 128, 151-164, 188->191, 191, 275->278, 278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 46->47, 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 38 6 22 7 75% 55->56, 56, 58->59, 59, 63->64, 64, 83->84, 84, 85->86, 86, 89->90, 90, 96->98
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 51->52, 52, 55->56, 56-58, 61->67, 67-69
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 189->191
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 15 108 8 94% 71->72, 72, 77-78, 123->124, 124, 131-132, 143, 202->204, 217->218, 218, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 18 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 89 260 51 78% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 401->404, 404-406, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 27 62 8 86% 22, 45, 89->90, 90-95, 104->106, 106, 126->127, 127-128, 141->137, 150-158, 170->172, 217->221, 264->265, 265, 274-280, 294->297, 297-300, 306-309
nvtabular/tools/dataset_inspector.py 77 15 34 2 72% 30->32, 32-39, 79->80, 80-96
nvtabular/tools/inspector_script.py 17 17 0 0 0% 17-75
nvtabular/utils.py 27 4 10 3 81% 26->27, 27, 28->31, 31, 37->38, 38, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3600 648 1523 192 79%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 78.65%
=========================== short test summary info ============================
FAILED tests/unit/test_tools.py::test_powerlaw[None-1000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_powerlaw[None-10000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_powerlaw[distro1-1000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_powerlaw[distro1-10000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_uniform[None-1000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_uniform[None-10000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_uniform[distro1-1000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_uniform[distro1-10000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_cat_rep[None-1000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_cat_rep[None-10000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_cat_rep[distro1-1000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_cat_rep[distro1-10000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_json_convert - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_full_df[None-1000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_full_df[None-100000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_full_df[distro1-1000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_full_df[distro1-100000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_inspect[csv] - KeyError: 'name-string'
FAILED tests/unit/test_tools.py::test_inspect[parquet] - KeyError: 'name-cat'
FAILED tests/unit/test_torch_dataloader.py::test_mh_model_support - concurren...
===== 20 failed, 532 passed, 8 skipped, 111 warnings in 486.32s (0:08:06) ======
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins4543960522503149747.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 8111e3e78efb2dcb5094177d720729acb68582fb, no merge conflicts.
Running as SYSTEM
Setting status of 8111e3e78efb2dcb5094177d720729acb68582fb to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1497/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 8111e3e78efb2dcb5094177d720729acb68582fb^{commit} # timeout=10
Checking out Revision 8111e3e78efb2dcb5094177d720729acb68582fb (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8111e3e78efb2dcb5094177d720729acb68582fb # timeout=10
Commit message: "Data gen and data inspect work together"
 > git rev-list --no-walk b55bef79bd496a2cf7505d302b5b48a7f4dc8da6 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins1032848889484682575.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+16.g779b544
ERROR: Exception:
Traceback (most recent call last):
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 228, in _main
    status = self.run(options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 406, in run
    pycompile=options.compile,
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 76, in install_given_reqs
    auto_confirm=True
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 685, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_uninstall.py", line 545, in from_dist
    link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /var/jenkins_home/workspace/nvtab_integration/nvtabular does not match installed location of nvtabular (at /var/jenkins_home/workspace/nvtab_docs/nvtabular)
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins4371275478190635855.sh

@albert17
Contributor Author

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 8111e3e78efb2dcb5094177d720729acb68582fb, no merge conflicts.
Running as SYSTEM
Setting status of 8111e3e78efb2dcb5094177d720729acb68582fb to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1498/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 8111e3e78efb2dcb5094177d720729acb68582fb^{commit} # timeout=10
Checking out Revision 8111e3e78efb2dcb5094177d720729acb68582fb (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8111e3e78efb2dcb5094177d720729acb68582fb # timeout=10
Commit message: "Data gen and data inspect work together"
 > git rev-list --no-walk 8111e3e78efb2dcb5094177d720729acb68582fb # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7317973124949593702.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+16.g779b544
ERROR: Exception:
Traceback (most recent call last):
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 228, in _main
    status = self.run(options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 406, in run
    pycompile=options.compile,
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 76, in install_given_reqs
    auto_confirm=True
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 685, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_uninstall.py", line 545, in from_dist
    link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /var/jenkins_home/workspace/nvtab_integration/nvtabular does not match installed location of nvtabular (at /var/jenkins_home/workspace/nvtab_docs/nvtabular)
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins6246142106023035049.sh

@jperez999
Contributor

rerun tests

1 similar comment
@albert17
Contributor Author

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 8111e3e78efb2dcb5094177d720729acb68582fb, no merge conflicts.
Running as SYSTEM
Setting status of 8111e3e78efb2dcb5094177d720729acb68582fb to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1499/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 8111e3e78efb2dcb5094177d720729acb68582fb^{commit} # timeout=10
Checking out Revision 8111e3e78efb2dcb5094177d720729acb68582fb (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8111e3e78efb2dcb5094177d720729acb68582fb # timeout=10
Commit message: "Data gen and data inspect work together"
 > git rev-list --no-walk 8111e3e78efb2dcb5094177d720729acb68582fb # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins2916650104798956803.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+16.g779b544
    Can't uninstall 'nvtabular'. No files were found to uninstall.
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
83 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 559 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_io.py .................................................. [ 19%]
........................................ssssssss [ 28%]
tests/unit/test_notebooks.py .... [ 28%]
tests/unit/test_ops.py ................................................. [ 37%]
........................................................................ [ 50%]
.................................... [ 56%]
tests/unit/test_s3.py .. [ 57%]
tests/unit/test_tf_dataloader.py ................... [ 60%]
tests/unit/test_tf_layers.py ........................................... [ 68%]
................................... [ 74%]
tests/unit/test_tools.py ....................... [ 78%]
tests/unit/test_torch_dataloader.py .............................. [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41295 instead
http_address["port"], self.http_server.port

tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_tools.py: 1416 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35017 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py: 708 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:556: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
[x.dtype for x in self._data.columns], index=self._data.names

tests/unit/test_tools.py::test_full_df[None-1000]
tests/unit/test_tools.py::test_full_df[None-100000]
tests/unit/test_tools.py::test_full_df[distro1-1000]
tests/unit/test_tools.py::test_full_df[distro1-100000]
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/utils.py:47: UserWarning: get_memory_info is not supported. Using total device memory from NVML.
warnings.warn("get_memory_info is not supported. Using total device memory from NVML.")

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43507 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36691 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38165 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38857 instead
http_address["port"], self.http_server.port

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34427 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 19 80 7 85% 53->54, 54, 86->87, 87, 97->98, 98, 101->103, 127->128, 128, 151-164, 188->191, 191, 275->278, 278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 1 12 1 95% 46->47, 47
nvtabular/framework_utils/torch/models.py 38 0 22 0 100%
nvtabular/framework_utils/torch/utils.py 31 4 10 2 85% 51->52, 52, 55->56, 56-58
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 189->191
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 13 108 7 95% 71->72, 72, 77-78, 123->124, 124, 131-132, 202->204, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 18 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 86 260 50 79% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 168->170, 215->219, 304->307, 305->304, 307
nvtabular/tools/dataset_inspector.py 80 15 36 2 73% 30->32, 32-39, 80->81, 81-97
nvtabular/tools/inspector_script.py 17 17 0 0 0% 17-75
nvtabular/utils.py 27 4 10 3 81% 26->27, 27, 28->31, 31, 37->38, 38, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3602 597 1525 178 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.28%
========== 551 passed, 8 skipped, 2177 warnings in 488.69s (0:08:08) ===========
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins7097503565645035849.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 8111e3e78efb2dcb5094177d720729acb68582fb, no merge conflicts.
Running as SYSTEM
Setting status of 8111e3e78efb2dcb5094177d720729acb68582fb to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1500/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 8111e3e78efb2dcb5094177d720729acb68582fb^{commit} # timeout=10
Checking out Revision 8111e3e78efb2dcb5094177d720729acb68582fb (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8111e3e78efb2dcb5094177d720729acb68582fb # timeout=10
Commit message: "Data gen and data inspect work together"
 > git rev-list --no-walk 8111e3e78efb2dcb5094177d720729acb68582fb # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins8781814357731727211.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+28.g8111e3e
    Uninstalling nvtabular-0.3.0+28.g8111e3e:
      Successfully uninstalled nvtabular-0.3.0+28.g8111e3e
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
83 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 559 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_io.py .................................................. [ 19%]
........................................ssssssss [ 28%]
tests/unit/test_notebooks.py .... [ 28%]
tests/unit/test_ops.py ................................................. [ 37%]
........................................................................ [ 50%]
.................................... [ 56%]
tests/unit/test_s3.py .. [ 57%]
tests/unit/test_tf_dataloader.py ................... [ 60%]
tests/unit/test_tf_layers.py ........................................... [ 68%]
................................... [ 74%]
tests/unit/test_tools.py ....................... [ 78%]
tests/unit/test_torch_dataloader.py .............................. [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43883 instead
http_address["port"], self.http_server.port

tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_tools.py: 1416 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46197 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py: 708 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:556: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
[x.dtype for x in self._data.columns], index=self._data.names

tests/unit/test_tools.py::test_full_df[None-1000]
tests/unit/test_tools.py::test_full_df[None-100000]
tests/unit/test_tools.py::test_full_df[distro1-1000]
tests/unit/test_tools.py::test_full_df[distro1-100000]
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/utils.py:47: UserWarning: get_memory_info is not supported. Using total device memory from NVML.
warnings.warn("get_memory_info is not supported. Using total device memory from NVML.")

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35651 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33635 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39797 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42993 instead
http_address["port"], self.http_server.port

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45851 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 19 80 7 85% 53->54, 54, 86->87, 87, 97->98, 98, 101->103, 127->128, 128, 151-164, 188->191, 191, 275->278, 278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 1 12 1 95% 46->47, 47
nvtabular/framework_utils/torch/models.py 38 0 22 0 100%
nvtabular/framework_utils/torch/utils.py 31 4 10 2 85% 51->52, 52, 55->56, 56-58
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 189->191
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 13 108 7 95% 71->72, 72, 77-78, 123->124, 124, 131-132, 202->204, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 18 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 86 260 50 79% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 168->170, 215->219, 304->307, 305->304, 307
nvtabular/tools/dataset_inspector.py 80 15 36 2 73% 30->32, 32-39, 80->81, 81-97
nvtabular/tools/inspector_script.py 17 17 0 0 0% 17-75
nvtabular/utils.py 27 4 10 3 81% 26->27, 27, 28->31, 31, 37->38, 38, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3602 597 1525 178 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.28%
========== 551 passed, 8 skipped, 2177 warnings in 488.35s (0:08:08) ===========
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins844357304054879981.sh

Comment on lines 41 to 192
cluster = LocalCUDACluster()
client = Client(cluster)
Member

We have test fixtures for these - can you use that instead? https://github.com/NVIDIA/NVTabular/blob/7a8fdd7f584f3c7ca3a0acf6b61c4493e3438255/tests/conftest.py#L63-L67

The danger here is that if you throw an exception in this method the cluster won't get shutdown, causing problems in other tests.
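
For illustration, the kind of fixture being suggested might look like this (a minimal sketch, assuming dask_cuda and distributed are installed; the actual fixture lives in tests/conftest.py):

import pytest
from dask_cuda import LocalCUDACluster
from distributed import Client

@pytest.fixture(scope="module")
def client():
    # Teardown code after `yield` runs even when a test raises,
    # so the cluster is always shut down.
    cluster = LocalCUDACluster()
    client = Client(cluster)
    yield client
    client.close()
    cluster.close()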

Dask dataframe with the data
ddf : dask.dataframe.DataFrame
Dask dataframe with the correct dtypes
col: string
Member

Can we use the numpy docstring guide here: https://numpydoc.readthedocs.io/en/latest/format.html#method-docstrings (see parameters)?

Suggested change
col: string
col: str

Contributor Author

Sorry about that, I will use the guide.
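
For reference, a numpydoc-style Parameters section for the snippet above could read as follows (a sketch only; the function name inspect_col is hypothetical, and the parameter names are taken from the diff context):

def inspect_col(ddf, col, data, col_type):
    """Compute statistics for a single column.

    Parameters
    ----------
    ddf : dask.dataframe.DataFrame
        Dask dataframe with the correct dtypes.
    col : str
        Column to process.
    data : dict
        Dictionary to store the output stats.
    col_type : str
        Column type (i.e. cat, cont, label).
    """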

Dask dataframe with the correct dtypes
col: string
Col to process
data: Dictionary
Member

Suggested change
data: Dictionary
data: dict

Col to process
data: Dictionary
Dictionary to store the output stats
col_type: tring
Member

Suggested change
col_type: tring
col_type: str

Dictionary to store the output stats
col_type: tring
Column type (i.e cat, cont, label)
key_names: Dictionary
Member

Suggested change
key_names: Dictionary
key_names: dict

"""
Parameters
-----------
path: str, list of str, or <dask.dataframe|cudf|pd>.DataFrame
Member

Why not use an nvtabular.Dataset object here? That wraps all the functionality you need (it takes a dask dataframe / path / cudf dataframe / pandas dataframe)

Contributor

Dataset is the nvtabular.io.Dataset, I think, based on the imports.

Contributor

The paths get turned into an NVT Dataset. We ask for the paths so that the class can be used from the command line to generate a JSON file.

Contributor Author

We are using the Dataset object here: dataset = Dataset(path, engine=dataset_format).

Member

What's the advantage here of taking a path/format and then immediately converting to a Dataset? Can't we just pass the dataset object in directly?
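
For context, the Dataset wrapper discussed above accepts any of those inputs, so either call style works (a sketch based on the line quoted earlier in this thread; the path and engine values are placeholders):

from nvtabular.io import Dataset

path = "data/*.parquet"     # placeholder path
dataset_format = "parquet"  # engine string, e.g. "parquet" or "csv"

# From file paths, as the inspector script does today
dataset = Dataset(path, engine=dataset_format)

# Or directly from an existing dask/cudf/pandas dataframe:
# dataset = Dataset(ddf)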

# Stop Dask Cluster
client.shutdown()
cluster.close()
Member

we should use a context manager for this - to make sure we get shutdown/close on exceptions
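
A minimal sketch of that pattern (both LocalCUDACluster and Client support the context-manager protocol, so shutdown/close happen even if an exception is raised inside the block):

from dask_cuda import LocalCUDACluster
from distributed import Client

with LocalCUDACluster() as cluster:
    with Client(cluster) as client:
        # run the inspection work here; both objects are
        # closed automatically when the block exits
        ...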

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 78b7571f39b0cbf8b1ec6aa47b3c66181a9a436c, no merge conflicts.
Running as SYSTEM
Setting status of 78b7571f39b0cbf8b1ec6aa47b3c66181a9a436c to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1504/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 78b7571f39b0cbf8b1ec6aa47b3c66181a9a436c^{commit} # timeout=10
Checking out Revision 78b7571f39b0cbf8b1ec6aa47b3c66181a9a436c (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 78b7571f39b0cbf8b1ec6aa47b3c66181a9a436c # timeout=10
Commit message: "Initial Stats computation as an operator"
 > git rev-list --no-walk c6c8faadcadae36f680cf2efd6cec8bf042fd483 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins2072347295624927623.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+29.g78b7571
    Uninstalling nvtabular-0.3.0+29.g78b7571:
      Successfully uninstalled nvtabular-0.3.0+29.g78b7571
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
84 files would be left unchanged.
./tests/unit/test_tools.py:12:1: F401 'nvtabular.io.Dataset' imported but unused
./tests/unit/test_tools.py:13:1: F401 'tests.conftest.client' imported but unused
./nvtabular/tools/dataset_inspector.py:17:1: F401 'contextlib.contextmanager' imported but unused
./nvtabular/tools/dataset_inspector.py:19:1: F401 'cudf' imported but unused
./nvtabular/tools/dataset_inspector.py:23:1: F401 'dask_cuda.LocalCUDACluster' imported but unused
./nvtabular/tools/dataset_inspector.py:32:21: F821 undefined name 'LocalCluster'
./nvtabular/tools/dataset_inspector.py:74:9: F841 local variable 'cats' is assigned to but never used
./nvtabular/tools/dataset_inspector.py:75:9: F841 local variable 'conts' is assigned to but never used
./nvtabular/tools/dataset_inspector.py:76:9: F841 local variable 'labels' is assigned to but never used
./nvtabular/tools/dataset_inspector.py:92:20: F821 undefined name 'all_cols'
./nvtabular/tools/dataset_inspector.py:102:24: F821 undefined name 'all_cols'
./nvtabular/ops/data_stats.py:28:6: F821 undefined name 'annotate'
./nvtabular/ops/data_stats.py:29:28: F821 undefined name 'ColumnNames'
./nvtabular/ops/data_stats.py:41:20: F821 undefined name 'ddf_dtypes'
./nvtabular/ops/data_stats.py:69:13: F841 local variable 'mean_val' is assigned to but never used
./nvtabular/ops/data_stats.py:99:13: F821 undefined name 'output'
./nvtabular/ops/data_stats.py:107:5: F821 undefined name 'transform'
./nvtabular/ops/data_stats.py:107:25: F821 undefined name 'Operator'
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins5151767239707400864.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 510e821db2a5b68be7c21092b54bf90f45d2b26a, no merge conflicts.
Running as SYSTEM
Setting status of 510e821db2a5b68be7c21092b54bf90f45d2b26a to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1512/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 510e821db2a5b68be7c21092b54bf90f45d2b26a^{commit} # timeout=10
Checking out Revision 510e821db2a5b68be7c21092b54bf90f45d2b26a (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 510e821db2a5b68be7c21092b54bf90f45d2b26a # timeout=10
Commit message: "Improves but still error"
 > git rev-list --no-walk e45c08fa4242ed7f390441f915b804e578d947b6 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins7264767334017863537.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+30.g510e821
    Uninstalling nvtabular-0.3.0+30.g510e821:
      Successfully uninstalled nvtabular-0.3.0+30.g510e821
  Running setup.py develop for nvtabular
Successfully installed nvtabular
would reformat /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/data_stats.py
would reformat /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/tools/dataset_inspector.py
Oh no! 💥 💔 💥
2 files would be reformatted, 82 files would be left unchanged.
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins4447000565955154219.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 827f7c42e7110062a24e124257c485bb219ec494, no merge conflicts.
Running as SYSTEM
Setting status of 827f7c42e7110062a24e124257c485bb219ec494 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1513/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 827f7c42e7110062a24e124257c485bb219ec494^{commit} # timeout=10
Checking out Revision 827f7c42e7110062a24e124257c485bb219ec494 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 827f7c42e7110062a24e124257c485bb219ec494 # timeout=10
Commit message: "Removes list support to simplify"
 > git rev-list --no-walk 510e821db2a5b68be7c21092b54bf90f45d2b26a # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins6520956366340217961.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+31.g827f7c4
    Uninstalling nvtabular-0.3.0+31.g827f7c4:
      Successfully uninstalled nvtabular-0.3.0+31.g827f7c4
  Running setup.py develop for nvtabular
Successfully installed nvtabular
would reformat /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/data_stats.py
would reformat /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/tools/dataset_inspector.py
Oh no! 💥 💔 💥
2 files would be reformatted, 82 files would be left unchanged.
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins1967979468793437349.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188, no merge conflicts.
Running as SYSTEM
Setting status of 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1518/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188^{commit} # timeout=10
Checking out Revision 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188 # timeout=10
Commit message: "Tests inspect-datagen working"
 > git rev-list --no-walk d8eb85c4d049d9e07509cd1da7c0515dddf73027 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins6924596389324111943.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+18.g10ee22c
ERROR: Exception:
Traceback (most recent call last):
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 228, in _main
    status = self.run(options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 406, in run
    pycompile=options.compile,
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 76, in install_given_reqs
    auto_confirm=True
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 685, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_uninstall.py", line 545, in from_dist
    link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /var/jenkins_home/workspace/nvtab_integration/nvtabular does not match installed location of nvtabular (at /var/jenkins_home/workspace/nvtab_docs/nvtabular)
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins6785374982083974648.sh

@albert17 albert17 requested a review from benfred January 23, 2021 11:05
@albert17
Contributor Author

@benfred I applied all the changes.

@jperez999 It looks like CI is broken.

@albert17
Contributor Author

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188, no merge conflicts.
Running as SYSTEM
Setting status of 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1519/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188^{commit} # timeout=10
Checking out Revision 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188 # timeout=10
Commit message: "Tests inspect-datagen working"
 > git rev-list --no-walk 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins4792261139226930572.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+18.g10ee22c
ERROR: Exception:
Traceback (most recent call last):
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 228, in _main
    status = self.run(options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 406, in run
    pycompile=options.compile,
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 76, in install_given_reqs
    auto_confirm=True
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 685, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_uninstall.py", line 545, in from_dist
    link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /var/jenkins_home/workspace/nvtab_integration/nvtabular does not match installed location of nvtabular (at /var/jenkins_home/workspace/nvtab_docs/nvtabular)
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins7461381200271302484.sh

Member

@benfred benfred left a comment

Couple minor things here - but aside from that looks good.

"""
Parameters
-----------
path: str, list of str, or <dask.dataframe|cudf|pd>.DataFrame
Member

What's the advantage here of taking a path/format and then immediately converting to a Dataset? Can't we just pass the dataset object in directly?

Comment on lines 118 to 122
if col_type != "labels":
data[col_type][col][key_names["min"][col_type]] = output[col]["min"]
data[col_type][col][key_names["max"][col_type]] = output[col]["max"]
data[col_type][col][key_names["mean"][col_type]] = output[col]["mean"]
if col_type == "conts":
Member

Nitpick: the key_names structure is confusing - and removing it in favour of an if/elif will reduce the number of lines slightly:

Suggested change
if col_type != "labels":
data[col_type][col][key_names["min"][col_type]] = output[col]["min"]
data[col_type][col][key_names["max"][col_type]] = output[col]["max"]
data[col_type][col][key_names["mean"][col_type]] = output[col]["mean"]
if col_type == "conts":
if col_type == "cats":
data[col_type][col]["min_entry_size"] = output[col]["min"]
data[col_type][col]["max_entry_size"] = output[col]["max"]
data[col_type][col]["avg_entry_size"] = output[col]["mean"]
elif col_type == "conts":
data[col_type][col]["min_val"] = output[col]["min"]
data[col_type][col]["max_val"] = output[col]["max"]
data[col_type][col]["mean"] = output[col]["mean"]

Comment on lines 47 to 50
if np.issubdtype(dtype, np.float):
    col_type = "cont"
else:
    col_type = "cat"
Member

Can we assume that all integers are categorical columns? What if the column is an integer representing something like the user's age?

Also - do we need a cat/cont/label breakdown at all here? Can we just calculate different statistics based on the dtype of the column?
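
One possible shape for that dtype-driven approach (a sketch only, not the PR's implementation; note that np.issubdtype checks are usually written against np.floating/np.integer rather than the deprecated np.float alias):

import numpy as np

def stats_for_dtype(dtype):  # hypothetical helper
    # Choose which statistics to compute purely from the column dtype.
    if np.issubdtype(dtype, np.floating):
        return ["min", "max", "mean", "std"]  # continuous
    if np.issubdtype(dtype, np.integer):
        return ["min", "max", "nunique"]      # ambiguous: cat or cont
    return ["nunique", "entry_size"]          # strings -> categorical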

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 71db921df009b9765d69e4d10f0070eb8b99c3b8, no merge conflicts.
Running as SYSTEM
Setting status of 71db921df009b9765d69e4d10f0070eb8b99c3b8 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1520/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 71db921df009b9765d69e4d10f0070eb8b99c3b8^{commit} # timeout=10
Checking out Revision 71db921df009b9765d69e4d10f0070eb8b99c3b8 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 71db921df009b9765d69e4d10f0070eb8b99c3b8 # timeout=10
Commit message: "Reestructures script and fixes review"
 > git rev-list --no-walk 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins8083792694137722375.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+18.g10ee22c
ERROR: Exception:
Traceback (most recent call last):
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 228, in _main
    status = self.run(options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 406, in run
    pycompile=options.compile,
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 76, in install_given_reqs
    auto_confirm=True
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 685, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_uninstall.py", line 545, in from_dist
    link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /var/jenkins_home/workspace/nvtab_integration/nvtabular does not match installed location of nvtabular (at /var/jenkins_home/workspace/nvtab_docs/nvtabular)
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins4579420982097147350.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit f71fd9d8d9f39ca9e4125f85eda14784a30bcd09, no merge conflicts.
Running as SYSTEM
Setting status of f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1521/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse f71fd9d8d9f39ca9e4125f85eda14784a30bcd09^{commit} # timeout=10
Checking out Revision f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 # timeout=10
Commit message: "All working"
 > git rev-list --no-walk 71db921df009b9765d69e4d10f0070eb8b99c3b8 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins3160668179631641617.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+18.g10ee22c
ERROR: Exception:
Traceback (most recent call last):
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 228, in _main
    status = self.run(options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 406, in run
    pycompile=options.compile,
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 76, in install_given_reqs
    auto_confirm=True
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 685, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_uninstall.py", line 545, in from_dist
    link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /var/jenkins_home/workspace/nvtab_integration/nvtabular does not match installed location of nvtabular (at /var/jenkins_home/workspace/nvtab_docs/nvtabular)
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins3355861542452888696.sh

@albert17 albert17 requested a review from benfred January 26, 2021 00:47
@albert17
Contributor Author

@benfred Changes applied.

I have added more configuration options for the Dask cluster, and I moved that part out of the inspector into the script. I think it makes more sense to just pass a client to the Inspector rather than dealing with this inside it.
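
As a rough illustration of that direction (the DatasetInspector(client) signature below is an assumption based on this comment, not the confirmed constructor; the inspect call mirrors the unit test later in this thread):

from dask_cuda import LocalCUDACluster
from distributed import Client

# The script owns the cluster configuration...
cluster = LocalCUDACluster()
client = Client(cluster)

# ...and the inspector just receives a ready-to-use client
# (hypothetical signature; dataset and columns_dict are built
# the same way as in the unit test).
inspector = DatasetInspector(client)
inspector.inspect(dataset, columns_dict, "dataset_info.json")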

@benfred
Member

benfred commented Jan 26, 2021

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit f71fd9d8d9f39ca9e4125f85eda14784a30bcd09, no merge conflicts.
Running as SYSTEM
Setting status of f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1522/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse f71fd9d8d9f39ca9e4125f85eda14784a30bcd09^{commit} # timeout=10
Checking out Revision f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 # timeout=10
Commit message: "All working"
 > git rev-list --no-walk f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7532762471586864464.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+18.g10ee22c
    Can't uninstall 'nvtabular'. No files were found to uninstall.
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
84 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 559 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_io.py .................................................. [ 19%]
........................................ssssssss [ 28%]
tests/unit/test_notebooks.py .... [ 28%]
tests/unit/test_ops.py ................................................. [ 37%]
........................................................................ [ 50%]
..................................... [ 57%]
tests/unit/test_s3.py .. [ 57%]
tests/unit/test_tf_dataloader.py ................... [ 60%]
tests/unit/test_tf_layers.py ........................................... [ 68%]
................................... [ 74%]
tests/unit/test_tools.py ...................... [ 78%]
tests/unit/test_torch_dataloader.py .............................. [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
.Build timed out (after 20 minutes). Marking the build as failed.
..Build was aborted
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins8151429897635378600.sh

@benfred
Member

benfred commented Jan 26, 2021

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit f71fd9d8d9f39ca9e4125f85eda14784a30bcd09, no merge conflicts.
Running as SYSTEM
Setting status of f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1523/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse f71fd9d8d9f39ca9e4125f85eda14784a30bcd09^{commit} # timeout=10
Checking out Revision f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 # timeout=10
Commit message: "All working"
 > git rev-list --no-walk f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins8861694820751636797.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+39.gf71fd9d
    Uninstalling nvtabular-0.3.0+39.gf71fd9d:
      Successfully uninstalled nvtabular-0.3.0+39.gf71fd9d
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
84 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 559 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_io.py .................................................. [ 19%]
........................................ssssssss [ 28%]
tests/unit/test_notebooks.py .... [ 28%]
tests/unit/test_ops.py ................................................. [ 37%]
........................................................................ [ 50%]
..................................... [ 57%]
tests/unit/test_s3.py .. [ 57%]
tests/unit/test_tf_dataloader.py ................... [ 60%]
tests/unit/test_tf_layers.py ........................................... [ 68%]
................................... [ 74%]
tests/unit/test_tools.py .....................F [ 78%]
tests/unit/test_torch_dataloader.py .............................. [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=================================== FAILURES ===================================
____________________ test_inspect_datagen[uniform-parquet] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-23/test_inspect_datagen_uniform_p0')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-23/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-23/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-23/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-23/parquet0')}
engine = 'parquet', dist = 'uniform'

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("dist", ["uniform"])
def test_inspect_datagen(tmpdir, datasets, engine, dist):
    # Dataset
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])

    # Dataset columns type config
    columns_dict = {}
    columns_dict["cats"] = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    columns_dict["conts"] = ["x", "y"]
    columns_dict["labels"] = ["label"]

    # Create inspector and inspect
    output_inspect1 = tmpdir + "/dataset_info1.json"
    dataset = Dataset(paths, engine=engine)
    a = datains.DatasetInspector()
    a.inspect(dataset, columns_dict, output_inspect1)
    assert os.path.isfile(output_inspect1)

    # Generate dataset using data_gen tool
    output_datagen = tmpdir + "/datagen"
    os.mkdir(output_datagen)
    with fsspec.open(output_inspect1) as f:
        output1 = json.load(f)
    cols = datagen._get_cols_from_schema(output1)
    if dist == "uniform":
        df_gen = datagen.DatasetGen(datagen.UniformDistro(), gpu_frac=0.00001)
    else:
        df_gen = datagen.DatasetGen(datagen.PowerLawDistro(0.1), gpu_frac=0.00001)

    output_datagen_files = df_gen.full_df_create(
        output1["num_rows"], cols, entries=True, output=output_datagen
    )

    # Inspect again and check output are the same
    output_inspect2 = tmpdir + "/dataset_info2.json"
    dataset = Dataset(output_datagen_files, engine=engine)
    a.inspect(dataset, columns_dict, output_inspect2)
    assert os.path.isfile(output_inspect2)

    # Compare json outputs
    with fsspec.open(output_inspect2) as f:
        output2 = json.load(f)
    for k1 in output1.keys():
        if k1 == "num_rows":
            assert output1[k1] == output2[k1]
        else:
            for k2 in output1[k1].keys():
                for k3 in output1[k1][k2].keys():
                    if k3 == "dtype":
                        if output1[k1][k2][k3] == "object":
                            assert (
                                output1[k1][k2][k3] == output2[k1][k2][k3]
                                or "int64" == output2[k1][k2][k3]
                            )
                        else:
                            assert output1[k1][k2][k3] == output2[k1][k2][k3]
                    else:
                        assert output1[k1][k2][k3] == pytest.approx(
                            output2[k1][k2][k3], rel=1e-1, abs=1e-1
                        )

E assert 5.279796343439019 == 4.76510067114094 ± 4.8e-01
E + where 4.76510067114094 ± 4.8e-01 = <function approx at 0x7f473a1a2cb0>(4.76510067114094, rel=0.1, abs=0.1)
E + where <function approx at 0x7f473a1a2cb0> = pytest.approx

tests/unit/test_tools.py:220: AssertionError
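
For the record, pytest.approx(expected, rel, abs) accepts values within max(rel * |expected|, abs) of expected. Here that tolerance is max(0.1 * 4.765, 0.1) ≈ 0.477, while the observed value differs by about 0.515, hence the failure:

import pytest

expected, observed = 4.76510067114094, 5.279796343439019
tolerance = max(0.1 * abs(expected), 0.1)    # ~0.477
assert abs(observed - expected) > tolerance  # ~0.515 > ~0.477
assert observed != pytest.approx(expected, rel=1e-1, abs=1e-1)
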
=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35563 instead
http_address["port"], self.http_server.port

tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_tools.py: 1416 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41379 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py: 708 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:556: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
[x.dtype for x in self._data.columns], index=self._data.names

tests/unit/test_tools.py::test_full_df[None-1000]
tests/unit/test_tools.py::test_full_df[None-100000]
tests/unit/test_tools.py::test_full_df[distro1-1000]
tests/unit/test_tools.py::test_full_df[distro1-100000]
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/utils.py:47: UserWarning: get_memory_info is not supported. Using total device memory from NVML.
warnings.warn("get_memory_info is not supported. Using total device memory from NVML.")

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39229 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 19 80 7 85% 53->54, 54, 86->87, 87, 97->98, 98, 101->103, 127->128, 128, 151-164, 188->191, 191, 275->278, 278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 1 12 1 95% 46->47, 47
nvtabular/framework_utils/torch/models.py 38 0 22 0 100%
nvtabular/framework_utils/torch/utils.py 31 4 10 2 85% 51->52, 52, 55->56, 56-58
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 189->191
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 13 108 7 95% 71->72, 72, 77-78, 123->124, 124, 131-132, 202->204, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 86 260 50 79% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/data_stats.py 57 2 24 4 93% 84->86, 86->88, 89->80, 92->80, 100, 103
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 168->170, 215->219, 304->307, 305->304, 307
nvtabular/tools/dataset_inspector.py 52 9 18 0 76% 30-39
nvtabular/tools/inspector_script.py 45 45 0 0 0% 17-168
nvtabular/utils.py 27 4 10 3 81% 26->27, 27, 28->31, 31, 37->38, 38, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3660 621 1531 180 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.14%
=========================== short test summary info ============================
FAILED tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet] - asse...
===== 1 failed, 550 passed, 8 skipped, 2173 warnings in 497.58s (0:08:17) ======
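The lone failure is an assertion in test_inspect_datagen[uniform-parquet]; the follow-up push below ("Increases error tolerance") loosens the comparison, which suggests the generated and inspected column statistics differed by more than the allowed margin. A minimal, purely illustrative sketch of that kind of tolerance check — the names here are hypothetical, not the test's actual code:

import math

def assert_stats_close(expected: float, actual: float, rel_tol: float = 1e-2) -> None:
    # Compare a generated column statistic against the value the inspector
    # recovered, allowing a small relative error instead of exact equality.
    assert math.isclose(expected, actual, rel_tol=rel_tol), (
        f"{actual} deviates from {expected} by more than rel_tol={rel_tol}"
    )

assert_stats_close(0.500, 0.496)  # passes at the 1% default tolerance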
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
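This "Logging error" traceback is benign teardown noise: dask's nanny thread logs "Restarting worker" after pytest has already closed the stream the log handler writes to. A small sketch that reproduces the same failure mode:

import io
import logging

stream = io.StringIO()
logger = logging.getLogger("demo")
logger.addHandler(logging.StreamHandler(stream))

stream.close()  # pytest closes its capture streams at session teardown
# The handler now hits "ValueError: I/O operation on closed file." and the
# logging module prints a "--- Logging error ---" block instead of raising.
logger.warning("Restarting worker")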
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5598061608533875040.sh
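The post-build step above hands test_res_push.py the issue-comments API URL and the build log; the script (not part of this repository's checkout) posts the log back to the PR as a bot comment. A plausible minimal sketch of such a script, assuming a requests-based POST and a token taken from the environment — the details are an assumption, not the script's actual source:

import os
import sys

import requests

comments_url, log_path = sys.argv[1], sys.argv[2]
with open(log_path, errors="replace") as f:
    body = f.read()[-60000:]  # GitHub caps a comment body at 65536 characters

resp = requests.post(
    comments_url,
    headers={"Authorization": "token " + os.environ["GITHUB_TOKEN"]},
    json={"body": body},
)
resp.raise_for_status()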

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #521 of commit 1d4b4e16944707581c3a3ab0fac91b1c0f0ce466, no merge conflicts.
Running as SYSTEM
Setting status of 1d4b4e16944707581c3a3ab0fac91b1c0f0ce466 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1524/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 1d4b4e16944707581c3a3ab0fac91b1c0f0ce466^{commit} # timeout=10
Checking out Revision 1d4b4e16944707581c3a3ab0fac91b1c0f0ce466 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 1d4b4e16944707581c3a3ab0fac91b1c0f0ce466 # timeout=10
Commit message: "Increases error tolerance"
 > git rev-list --no-walk f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7319059244865873169.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
84 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 559 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_io.py .................................................. [ 19%]
........................................ssssssss [ 28%]
tests/unit/test_notebooks.py .... [ 28%]
tests/unit/test_ops.py ................................................. [ 37%]
........................................................................ [ 50%]
..................................... [ 57%]
tests/unit/test_s3.py .. [ 57%]
tests/unit/test_tf_dataloader.py ................... [ 60%]
tests/unit/test_tf_layers.py ........................................... [ 68%]
................................... [ 74%]
tests/unit/test_tools.py ...................... [ 78%]
tests/unit/test_torch_dataloader.py .............................. [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35811 instead
http_address["port"], self.http_server.port

tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_tools.py: 1416 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43095 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py: 708 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:556: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
[x.dtype for x in self._data.columns], index=self._data.names

tests/unit/test_tools.py::test_full_df[None-1000]
tests/unit/test_tools.py::test_full_df[None-100000]
tests/unit/test_tools.py::test_full_df[distro1-1000]
tests/unit/test_tools.py::test_full_df[distro1-100000]
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/utils.py:47: UserWarning: get_memory_info is not supported. Using total device memory from NVML.
warnings.warn("get_memory_info is not supported. Using total device memory from NVML.")

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46369 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 19 80 7 85% 53->54, 54, 86->87, 87, 97->98, 98, 101->103, 127->128, 128, 151-164, 188->191, 191, 275->278, 278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 1 12 1 95% 46->47, 47
nvtabular/framework_utils/torch/models.py 38 0 22 0 100%
nvtabular/framework_utils/torch/utils.py 31 4 10 2 85% 51->52, 52, 55->56, 56-58
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 189->191
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 13 108 7 95% 71->72, 72, 77-78, 123->124, 124, 131-132, 202->204, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 86 260 50 79% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/data_stats.py 57 2 24 4 93% 84->86, 86->88, 89->80, 92->80, 100, 103
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 168->170, 215->219, 304->307, 305->304, 307
nvtabular/tools/dataset_inspector.py 52 9 18 0 76% 30-39
nvtabular/tools/inspector_script.py 45 45 0 0 0% 17-168
nvtabular/utils.py 27 4 10 3 81% 26->27, 27, 28->31, 31, 37->38, 38, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3660 621 1531 180 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.14%
========== 551 passed, 8 skipped, 2173 warnings in 457.01s (0:07:37) ===========
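"Required test coverage of 70% reached" is pytest-cov enforcing a coverage floor: the run fails if total coverage drops below the threshold even when every test passes. A hedged sketch of an equivalent invocation (the flags are pytest-cov's; whether this CI sets them in setup.cfg or on the command line is not shown in the log):

import sys

import pytest

# Equivalent to: pytest --cov=nvtabular --cov-fail-under=70 tests/unit
sys.exit(pytest.main(["--cov=nvtabular", "--cov-fail-under=70", "tests/unit"]))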
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins6824405630993703858.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #521 of commit 635376ececa34948c08dc6c133e33bc3c5d097ee, no merge conflicts.
Running as SYSTEM
Setting status of 635376ececa34948c08dc6c133e33bc3c5d097ee to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1525/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 635376ececa34948c08dc6c133e33bc3c5d097ee^{commit} # timeout=10
Checking out Revision 635376ececa34948c08dc6c133e33bc3c5d097ee (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 635376ececa34948c08dc6c133e33bc3c5d097ee # timeout=10
Commit message: "All Working"
 > git rev-list --no-walk 1d4b4e16944707581c3a3ab0fac91b1c0f0ce466 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins3994755179524787056.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+39.g635376e
    Uninstalling nvtabular-0.3.0+39.g635376e:
      Successfully uninstalled nvtabular-0.3.0+39.g635376e
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
84 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 559 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_io.py .................................................. [ 19%]
........................................ssssssss [ 28%]
tests/unit/test_notebooks.py .... [ 28%]
tests/unit/test_ops.py ................................................. [ 37%]
........................................................................ [ 50%]
..................................... [ 57%]
tests/unit/test_s3.py .. [ 57%]
tests/unit/test_tf_dataloader.py ................... [ 60%]
tests/unit/test_tf_layers.py ........................................... [ 68%]
................................... [ 74%]
tests/unit/test_tools.py ...................... [ 78%]
tests/unit/test_torch_dataloader.py .............................. [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40037 instead
http_address["port"], self.http_server.port

tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_tools.py: 1416 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46529 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py: 708 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:556: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
[x.dtype for x in self._data.columns], index=self._data.names

tests/unit/test_tools.py::test_full_df[None-1000]
tests/unit/test_tools.py::test_full_df[None-100000]
tests/unit/test_tools.py::test_full_df[distro1-1000]
tests/unit/test_tools.py::test_full_df[distro1-100000]
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/utils.py:47: UserWarning: get_memory_info is not supported. Using total device memory from NVML.
warnings.warn("get_memory_info is not supported. Using total device memory from NVML.")

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40585 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 19 80 7 85% 53->54, 54, 86->87, 87, 97->98, 98, 101->103, 127->128, 128, 151-164, 188->191, 191, 275->278, 278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 1 12 1 95% 46->47, 47
nvtabular/framework_utils/torch/models.py 38 0 22 0 100%
nvtabular/framework_utils/torch/utils.py 31 4 10 2 85% 51->52, 52, 55->56, 56-58
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 189->191
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 13 108 7 95% 71->72, 72, 77-78, 123->124, 124, 131-132, 202->204, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 86 260 50 79% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/data_stats.py 57 2 24 4 93% 84->86, 86->88, 89->80, 92->80, 100, 103
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 168->170, 215->219, 304->307, 305->304, 307
nvtabular/tools/dataset_inspector.py 52 9 18 0 76% 30-39
nvtabular/tools/inspector_script.py 45 45 0 0 0% 17-168
nvtabular/utils.py 27 4 10 3 81% 26->27, 27, 28->31, 31, 37->38, 38, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3660 621 1531 180 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.14%
========== 551 passed, 8 skipped, 2173 warnings in 454.22s (0:07:34) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5330818293151064816.sh

@albert17 albert17 merged commit db22a41 into NVIDIA-Merlin:main Jan 26, 2021
benfred added a commit to benfred/NVTabular that referenced this pull request Jan 30, 2021
It looks like some files got committed recently (in NVIDIA-Merlin#521) with Windows-style carriage-return/linefeed endings instead of just the linefeed used in the rest of the codebase. Fix so we don't generate massive whitespace diffs on every commit.
benfred added a commit that referenced this pull request Jan 30, 2021
It looks like some files got committed recently (in #521) with Windows-style carriage-return/linefeed endings instead of just the linefeed used in the rest of the codebase. Fix so we don't generate massive whitespace diffs on every commit.
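One way to surface the files the commit above describes is to scan the tree for CRLF sequences; a small illustrative sketch (the package path is this repo's, everything else is an assumption):

from pathlib import Path

# Report any Python sources that still carry Windows-style CRLF line endings.
for path in sorted(Path("nvtabular").rglob("*.py")):
    if b"\r\n" in path.read_bytes():
        print(path)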
@albert17 albert17 deleted the data-inspect branch April 20, 2021 16:57
mikemckiernan pushed a commit that referenced this pull request Nov 24, 2022
* Initial commit in new branch

* Adds unit test

* Updates json output and multihot calculation

* Updates list processing

* Updates test

* Adds cudf issue

* Data inspector ready

* Test works

* Dataset inspect read - Tests passing

* Moves dataset inspector script

* Initial inspect-datagen test

* Data gen and data inspect work together

* Initial Stats computation as an operator

* Improves but still error

* Removes list support to simplify

* Different Series type for computations

* Cleans and use attributes

* Data Stats Operator working

* Tests inspect-datagen working

* Restructures script and addresses review feedback

* All Working
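The squashed history above traces the tool from a standalone script to a DataStats operator that runs inside a workflow and feeds the dataset inspector. As a rough illustration of the per-column summary such an inspector emits, here is a hedged sketch with pandas standing in for cudf — the function name and JSON layout are illustrative, not nvtabular's actual output format:

import json

import pandas as pd

def inspect(df: pd.DataFrame, cats: list, conts: list, out_path: str) -> None:
    stats = {}
    for col in cats:  # categorical columns: dtype and cardinality
        stats[col] = {
            "dtype": str(df[col].dtype),
            "cardinality": int(df[col].nunique()),
        }
    for col in conts:  # continuous columns: range and simple moments
        s = df[col]
        stats[col] = {
            "dtype": str(s.dtype),
            "min": float(s.min()),
            "max": float(s.max()),
            "mean": float(s.mean()),
            "std": float(s.std()),
        }
    with open(out_path, "w") as f:
        json.dump(stats, f, indent=2)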
mikemckiernan pushed a commit that referenced this pull request Nov 24, 2022
It looks like some files got committed recently (in #521) with Windows-style carriage-return/linefeed endings instead of just the linefeed used in the rest of the codebase. Fix so we don't generate massive whitespace diffs on every commit.
Successfully merging this pull request may close these issues.

[FEA] Dataset Generation tool