
Data inspect #521

Merged
merged 21 commits into from Jan 26, 2021

Conversation

@albert17 (Contributor) commented Jan 8, 2021

Opening because #510 was closed after the new_api branch was merged into main.

@jperez999 I applied the feedback you gave me, and I also added a unit test (a sketch of how the test drives the inspector follows the list below). I have seen your data generation branch; I can follow your packaging style once that PR is merged.

Pending for the future:

  1. Fix dask-cudf dtypes for lists: we need cudf support; an issue has been created ([FEA] Add list len support rapidsai/cudf#7157).
  2. When lists are supported, test them in the unit tests. @benfred, can I modify the testing dataset to add one column that is a list?
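For context, a minimal sketch of how the new unit test drives the inspector. Only the DatasetInspector() constructor appears verbatim in the CI logs below; the inspect() call and its argument order are assumptions, not the confirmed API:

    import glob

    import nvtabular as nvt

    # Hypothetical dataset location; the real test uses pytest tmpdir fixtures.
    paths = glob.glob("/path/to/dataset/*.parquet")
    output_file = "/path/to/dataset_info.json"

    # Column-type config mirroring tests/unit/test_tools.py in the CI logs.
    columns_dict = {
        "cats": ["name-cat", "name-string"],
        "conts": ["x", "y"],
        "labels": ["label"],
    }

    inspector = nvt.tools.DatasetInspector()
    # inspect() name and signature are assumed from the test's setup variables.
    inspector.inspect(paths, "parquet", columns_dict, output_file)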

@albert17 linked an issue Jan 8, 2021 that may be closed by this pull request
@nvidia-merlin-bot (Contributor) commented:

CI Results
GitHub pull request #521 of commit 3e29ef32bd211f40bca067ded04651caa09409ef, no merge conflicts.
Running as SYSTEM
Setting status of 3e29ef32bd211f40bca067ded04651caa09409ef to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1449/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 3e29ef32bd211f40bca067ded04651caa09409ef^{commit} # timeout=10
Checking out Revision 3e29ef32bd211f40bca067ded04651caa09409ef (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 3e29ef32bd211f40bca067ded04651caa09409ef # timeout=10
Commit message: "Adds unit test"
 > git rev-list --no-walk bc6dc7c51f0acd5514888b6be647907efde89d10 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins3274625358681975311.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
81 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 537 items

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_io.py .................................................. [ 20%]
........................................ssssssss [ 29%]
tests/unit/test_notebooks.py .... [ 29%]
tests/unit/test_ops.py ................................................. [ 38%]
........................................................................ [ 52%]
.................................... [ 59%]
tests/unit/test_s3.py .. [ 59%]
tests/unit/test_tf_dataloader.py ................... [ 62%]
tests/unit/test_tf_layers.py ........................................... [ 70%]
................................... [ 77%]
tests/unit/test_tools.py FF [ 77%]
tests/unit/test_torch_dataloader.py .............................. [ 83%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=================================== FAILURES ===================================
______________________________ test_inspect[csv] _______________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_inspect_csv_0')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-1/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-1/csv0'), 'csv-no... local('/tmp/pytest-of-jenkins/pytest-1/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-1/parquet0')}
engine = 'csv'

@pytest.mark.parametrize("engine", ["csv", "parquet"])
def test_inspect(tmpdir, datasets, engine):
    # Dataset
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    output_file = tmpdir + "/dataset_info.json"

    # Dataset columns type config
    columns_dict = {}
    columns_dict["cats"] = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    columns_dict["cats_mh"] = []
    columns_dict["conts"] = ["x", "y"]
    columns_dict["labels"] = ["label"]
    all_cols = (
        columns_dict["cats"]
        + columns_dict["cats_mh"]
        + columns_dict["conts"]
        + columns_dict["labels"]
    )

    # Create inspector and inspect
  a = nvt.tools.DatasetInspector()

E AttributeError: module 'nvtabular' has no attribute 'tools'

tests/unit/test_tools.py:34: AttributeError
____________________________ test_inspect[parquet] _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_inspect_parquet_0')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-1/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-1/csv0'), 'csv-no... local('/tmp/pytest-of-jenkins/pytest-1/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-1/parquet0')}
engine = 'parquet'

@pytest.mark.parametrize("engine", ["csv", "parquet"])
def test_inspect(tmpdir, datasets, engine):
    # Dataset
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    output_file = tmpdir + "/dataset_info.json"

    # Dataset columns type config
    columns_dict = {}
    columns_dict["cats"] = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    columns_dict["cats_mh"] = []
    columns_dict["conts"] = ["x", "y"]
    columns_dict["labels"] = ["label"]
    all_cols = (
        columns_dict["cats"]
        + columns_dict["cats_mh"]
        + columns_dict["conts"]
        + columns_dict["labels"]
    )

    # Create inspector and inspect
  a = nvt.tools.DatasetInspector()

E AttributeError: module 'nvtabular' has no attribute 'tools'

tests/unit/test_tools.py:34: AttributeError
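Both parametrized failures share one root cause: nvtabular/__init__.py does not yet import the new tools subpackage, so nvt.tools never resolves at test time. A minimal sketch of the kind of one-line fix this suggests (the actual contents of nvtabular/__init__.py are an assumption):

    # nvtabular/__init__.py (sketch): re-export the tools subpackage so that
    # `import nvtabular as nvt; nvt.tools.DatasetInspector()` resolves.
    from . import tools  # noqa: F401  (imported for re-export, not direct use)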
=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_similarity.py::test_column_similarity[tfidf-True]
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_similarity.py::test_column_similarity[tfidf-True]
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43315 instead
http_address["port"], self.http_server.port

tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39161 instead
http_address["port"], self.http_server.port

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45071 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 133 18 70 6 84% 66->67, 67, 77->78, 78, 81->83, 107->108, 108, 131-144, 168->171, 171, 255->258, 258
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 1 12 1 95% 46->47, 47
nvtabular/framework_utils/torch/models.py 38 0 22 0 100%
nvtabular/framework_utils/torch/utils.py 31 4 10 2 85% 51->52, 52, 55->56, 56-58
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 187->189
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 13 108 7 95% 71->72, 72, 77-78, 123->124, 124, 131-132, 202->204, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 18 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 86 260 50 79% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/dataset_inspector.py 65 65 32 0 0% 17-135
nvtabular/utils.py 27 6 10 5 70% 26->27, 27, 28->31, 31, 37->38, 38, 40->41, 41, 45->47, 47, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3324 630 1449 173 78%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 77.85%
=========================== short test summary info ============================
FAILED tests/unit/test_tools.py::test_inspect[csv] - AttributeError: module '...
FAILED tests/unit/test_tools.py::test_inspect[parquet] - AttributeError: modu...
====== 2 failed, 527 passed, 8 skipped, 44 warnings in 419.18s (0:06:59) =======
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins6213806168292701879.sh

@nvidia-merlin-bot (Contributor) commented:

CI Results
GitHub pull request #521 of commit a3c8722eeb1eac7202aff3170c4f218da3426b51, no merge conflicts.
Running as SYSTEM
Setting status of a3c8722eeb1eac7202aff3170c4f218da3426b51 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1468/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse a3c8722eeb1eac7202aff3170c4f218da3426b51^{commit} # timeout=10
Checking out Revision a3c8722eeb1eac7202aff3170c4f218da3426b51 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f a3c8722eeb1eac7202aff3170c4f218da3426b51 # timeout=10
Commit message: "Updates json output and multihot calculation"
 > git rev-list --no-walk ad83c7ae8a5d34adf9d21127946f98ae40795364 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins8836009779784977150.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+15.ga3c8722
    Uninstalling nvtabular-0.3.0+15.ga3c8722:
      Successfully uninstalled nvtabular-0.3.0+15.ga3c8722
  Running setup.py develop for nvtabular
Successfully installed nvtabular
error: cannot format /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/tools/dataset_inspector.py: Cannot parse: 69:16:                 if ddf[col].dtype.leaf_type == "string":
Oh no! 💥 💔 💥
80 files would be left unchanged, 1 file would fail to reformat.
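black refuses to format a file that does not parse as valid Python, so this failure points at a syntax error around the quoted line rather than a style problem. A minimal sketch of a parseable form of that check (the wrapping function is hypothetical; leaf_type is the cudf ListDtype attribute quoted in the error):

    def is_string_list_column(ddf, col):
        """Return True when `col` is a list column whose leaves are strings."""
        # Same expression black could not parse, placed at a valid indentation.
        return ddf[col].dtype.leaf_type == "string"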
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins8797665715078067190.sh

@nvidia-merlin-bot (Contributor) commented:

CI Results
GitHub pull request #521 of commit 2a52415d4f25ba13d8405b1e5496a0751958eb7b, no merge conflicts.
Running as SYSTEM
Setting status of 2a52415d4f25ba13d8405b1e5496a0751958eb7b to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1470/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 2a52415d4f25ba13d8405b1e5496a0751958eb7b^{commit} # timeout=10
Checking out Revision 2a52415d4f25ba13d8405b1e5496a0751958eb7b (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 2a52415d4f25ba13d8405b1e5496a0751958eb7b # timeout=10
Commit message: "Updates list processing"
 > git rev-list --no-walk fcfd38534871827d001d98c9db23af626749e375 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins5268918968425135953.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+16.g2a52415
    Uninstalling nvtabular-0.3.0+16.g2a52415:
      Successfully uninstalled nvtabular-0.3.0+16.g2a52415
  Running setup.py develop for nvtabular
Successfully installed nvtabular
would reformat /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/tools/dataset_inspector.py
Oh no! 💥 💔 💥
1 file would be reformatted, 80 files would be left unchanged.
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins3675556177946569106.sh

@nvidia-merlin-bot (Contributor) commented:

CI Results
GitHub pull request #521 of commit e9b87734ff56c66d4fee7f1060b1d9af7764167f, no merge conflicts.
Running as SYSTEM
Setting status of e9b87734ff56c66d4fee7f1060b1d9af7764167f to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1471/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse e9b87734ff56c66d4fee7f1060b1d9af7764167f^{commit} # timeout=10
Checking out Revision e9b87734ff56c66d4fee7f1060b1d9af7764167f (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f e9b87734ff56c66d4fee7f1060b1d9af7764167f # timeout=10
Commit message: "Updates test"
 > git rev-list --no-walk 2a52415d4f25ba13d8405b1e5496a0751958eb7b # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins4430635498173288458.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+17.ge9b8773
    Uninstalling nvtabular-0.3.0+17.ge9b8773:
      Successfully uninstalled nvtabular-0.3.0+17.ge9b8773
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
81 files would be left unchanged.
./nvtabular/tools/dataset_inspector.py:19:1: F401 'cudf' imported but unused
./nvtabular/tools/__init__.py:16:1: F401 '.dataset_inspector.DatasetInspector' imported but unused
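flake8's F401 flags imports that the importing module never references. The unused cudf import in dataset_inspector.py can simply be removed, but for an __init__.py whose whole purpose is re-export, the conventional fix is an explicit noqa marker (or an __all__ entry). A minimal sketch, assuming the re-export is intentional:

    # nvtabular/tools/__init__.py (sketch): keep the re-export but tell flake8
    # the "unused" import is deliberate.
    from .dataset_inspector import DatasetInspector  # noqa: F401

    __all__ = ["DatasetInspector"]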
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins8051002341542512124.sh

@nvidia-merlin-bot (Contributor) commented:

CI Results
GitHub pull request #521 of commit 199e818570a49d20c1f23132e2299aa595b16d2b, no merge conflicts.
Running as SYSTEM
Setting status of 199e818570a49d20c1f23132e2299aa595b16d2b to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1483/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 199e818570a49d20c1f23132e2299aa595b16d2b^{commit} # timeout=10
Checking out Revision 199e818570a49d20c1f23132e2299aa595b16d2b (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 199e818570a49d20c1f23132e2299aa595b16d2b # timeout=10
Commit message: "Adds cudf issue"
 > git rev-list --no-walk 7ea69ebc5bb4c9e588af5f0f37675b0a5f80ba72 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins3691191528921297468.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+18.g199e818
    Uninstalling nvtabular-0.3.0+18.g199e818:
      Successfully uninstalled nvtabular-0.3.0+18.g199e818
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
81 files would be left unchanged.
./nvtabular/tools/dataset_inspector.py:88:101: E501 line too long (103 > 100 characters)
./nvtabular/tools/dataset_inspector.py:89:101: E501 line too long (107 > 100 characters)
./nvtabular/tools/__init__.py:16:1: F401 '.dataset_inspector.DatasetInspector' imported but unused
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins5340054830216335842.sh

@albert17 marked this pull request as ready for review January 19, 2021 17:13
@nvidia-merlin-bot (Contributor) commented:

CI Results
GitHub pull request #521 of commit e3a807a231113f9d5b371b808c49f6f1bd80e98b, no merge conflicts.
Running as SYSTEM
Setting status of e3a807a231113f9d5b371b808c49f6f1bd80e98b to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1485/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse e3a807a231113f9d5b371b808c49f6f1bd80e98b^{commit} # timeout=10
Checking out Revision e3a807a231113f9d5b371b808c49f6f1bd80e98b (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f e3a807a231113f9d5b371b808c49f6f1bd80e98b # timeout=10
Commit message: "Data inspector ready"
 > git rev-list --no-walk b08f8781935a592601320e5beedec1ced0a1e113 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins7470636000492385073.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
84 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 559 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_datagen.py ..................... [ 14%]
tests/unit/test_io.py .................................................. [ 23%]
........................................ssssssss [ 31%]
tests/unit/test_notebooks.py .... [ 32%]
tests/unit/test_ops.py ................................................. [ 41%]
........................................................................ [ 54%]
.................................... [ 60%]
tests/unit/test_s3.py .. [ 61%]
tests/unit/test_tf_dataloader.py ................... [ 64%]
tests/unit/test_tf_layers.py ........................................... [ 72%]
................................... [ 78%]
tests/unit/test_tools.py FF [ 78%]
tests/unit/test_torch_dataloader.py .............................. [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=================================== FAILURES ===================================
______________________________ test_inspect[csv] _______________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_inspect_csv_0')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-1/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-1/csv0'), 'csv-no... local('/tmp/pytest-of-jenkins/pytest-1/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-1/parquet0')}
engine = 'csv'

@pytest.mark.parametrize("engine", ["csv", "parquet"])
def test_inspect(tmpdir, datasets, engine):
    # Dataset
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    output_file = tmpdir + "/dataset_info.json"

    # Dataset columns type config
    columns_dict = {}
    columns_dict["cats"] = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    columns_dict["conts"] = ["x", "y"]
    columns_dict["labels"] = ["label"]
    all_cols = columns_dict["cats"] + columns_dict["conts"] + columns_dict["labels"]

    # Create inspector and inspect
  a = nvt.tools.DatasetInspector()

E AttributeError: module 'nvtabular.tools' has no attribute 'DatasetInspector'

tests/unit/test_tools.py:28: AttributeError
____________________________ test_inspect[parquet] _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_inspect_parquet_0')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-1/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-1/csv0'), 'csv-no... local('/tmp/pytest-of-jenkins/pytest-1/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-1/parquet0')}
engine = 'parquet'

@pytest.mark.parametrize("engine", ["csv", "parquet"])
def test_inspect(tmpdir, datasets, engine):
    # Dataset
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    output_file = tmpdir + "/dataset_info.json"

    # Dataset columns type config
    columns_dict = {}
    columns_dict["cats"] = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    columns_dict["conts"] = ["x", "y"]
    columns_dict["labels"] = ["label"]
    all_cols = columns_dict["cats"] + columns_dict["conts"] + columns_dict["labels"]

    # Create inspector and inspect
  a = nvt.tools.DatasetInspector()

E AttributeError: module 'nvtabular.tools' has no attribute 'DatasetInspector'

tests/unit/test_tools.py:28: AttributeError
=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_datagen.py: 1392 warnings
tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_datagen.py: 696 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:556: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
[x.dtype for x in self._data.columns], index=self._data.names

tests/unit/test_datagen.py::test_full_df[None-1000]
tests/unit/test_datagen.py::test_full_df[None-100000]
tests/unit/test_datagen.py::test_full_df[distro1-1000]
tests/unit/test_datagen.py::test_full_df[distro1-100000]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/utils.py:47: UserWarning: get_memory_info is not supported. Using total device memory from NVML.
warnings.warn("get_memory_info is not supported. Using total device memory from NVML.")

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42165 instead
http_address["port"], self.http_server.port

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44197 instead
http_address["port"], self.http_server.port

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45623 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 19 80 7 85% 53->54, 54, 86->87, 87, 97->98, 98, 101->103, 127->128, 128, 151-164, 188->191, 191, 275->278, 278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 1 12 1 95% 46->47, 47
nvtabular/framework_utils/torch/models.py 38 0 22 0 100%
nvtabular/framework_utils/torch/utils.py 31 4 10 2 85% 51->52, 52, 55->56, 56-58
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 189->191
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 13 108 7 95% 71->72, 72, 77-78, 123->124, 124, 131-132, 202->204, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 18 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 86 260 50 79% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 233 1 60 4 98% 170->172, 217->221, 306->309, 307->306, 309
nvtabular/tools/dataset_inspector.py 77 77 34 0 0% 16-186
nvtabular/utils.py 27 4 10 3 81% 26->27, 27, 28->31, 31, 37->38, 38, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3580 642 1521 176 79%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 78.95%
=========================== short test summary info ============================
FAILED tests/unit/test_tools.py::test_inspect[csv] - AttributeError: module '...
FAILED tests/unit/test_tools.py::test_inspect[parquet] - AttributeError: modu...
===== 2 failed, 549 passed, 8 skipped, 2138 warnings in 454.22s (0:07:34) ======
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins1325219421683715633.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 6bee99fcdb5d675c2c56ff59825fe34f8c2f1350, no merge conflicts.
Running as SYSTEM
Setting status of 6bee99fcdb5d675c2c56ff59825fe34f8c2f1350 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1486/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 6bee99fcdb5d675c2c56ff59825fe34f8c2f1350^{commit} # timeout=10
Checking out Revision 6bee99fcdb5d675c2c56ff59825fe34f8c2f1350 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6bee99fcdb5d675c2c56ff59825fe34f8c2f1350 # timeout=10
Commit message: "Test works"
 > git rev-list --no-walk e3a807a231113f9d5b371b808c49f6f1bd80e98b # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins2320933489149017601.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+23.g6bee99f
    Uninstalling nvtabular-0.3.0+23.g6bee99f:
      Successfully uninstalled nvtabular-0.3.0+23.g6bee99f
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
84 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 557 items / 1 error / 556 selected

==================================== ERRORS ====================================
__________________ ERROR collecting tests/unit/test_tools.py ___________________
ImportError while importing test module '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/tests/unit/test_tools.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/conda/envs/rapids/lib/python3.7/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/test_tools.py:10: in <module>
import nvtabular.tools.data_inspector as datains
E ModuleNotFoundError: No module named 'nvtabular.tools.data_inspector'
=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 126 80 0 8% 38-66, 79-107, 120-135, 151-164, 172, 177-178, 184-191, 195-198, 202, 206-212, 217-235, 241-269, 273-278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 130 89 0 10% 23, 27-28, 34-35, 44-71, 75-89, 98-126, 130-133, 182-194, 197-220, 223-237, 240-248, 251, 258-274, 316-320, 323-353, 356-369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 39 20 0 12% 48-52, 55-71, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 20 12 0 18% 36-43, 46-51, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 38 33 22 0 8% 52-79, 82-99
nvtabular/framework_utils/torch/utils.py 31 29 10 0 5% 47-78
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 9 4 0 28% 28-36, 39-43
nvtabular/io/dask.py 82 68 34 0 12% 43-65, 82-115, 120-150, 154-160, 167-182
nvtabular/io/dataframe_engine.py 12 7 4 0 31% 26, 29-33, 37
nvtabular/io/dataset.py 134 101 56 0 17% 183-250, 271-297, 325-328, 380-412, 462-473, 489, 493-500, 505-507, 510, 513-519
nvtabular/io/dataset_engine.py 13 7 0 0 46% 25-31
nvtabular/io/hugectr.py 45 35 22 0 15% 26-39, 43-56, 59-68, 71-96, 99
nvtabular/io/parquet.py 124 89 40 0 21% 49-67, 72-83, 90-99, 102, 115-119, 122-127, 131-144, 147-155, 158-167, 173-176, 179-183, 186-192, 197-202, 207, 214-222
nvtabular/io/shuffle.py 25 16 10 0 26% 34-47, 53-56
nvtabular/io/writer.py 123 95 45 0 17% 30, 47, 65-101, 104-107, 110, 113, 118-153, 156-177, 181-196, 200, 203-205, 208-224
nvtabular/io/writer_factory.py 16 11 6 0 23% 31-35, 47-55
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 219 108 0 11% 29, 52-56, 60, 64, 67, 70-78, 85-97, 100-132, 136-140, 143, 147-154, 173-187, 190, 194-196, 201-207, 210-213, 216-233, 236, 239-243, 258-280, 283-364, 372-375, 406-412, 420-454, 457
nvtabular/loader/tensorflow.py 117 92 52 0 15% 34-66, 70-83, 206-218, 236, 244-255, 267, 270, 274, 278, 281-313, 316-343, 351, 354-363
nvtabular/loader/tf_utils.py 55 27 20 5 44% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70, 85-90, 100-113
nvtabular/loader/torch.py 41 23 8 0 37% 25-27, 30-36, 74, 87, 90, 93-95, 98, 102, 106, 109-111, 121
nvtabular/ops/__init__.py 18 0 0 0 100%
nvtabular/ops/bucketize.py 25 16 16 0 22% 45-55, 59-67
nvtabular/ops/categorify.py 463 407 260 0 8% 193-233, 242-275, 278-282, 285, 288, 291-292, 296-350, 353-356, 359, 362, 377, 383-406, 410-421, 425, 429, 436-503, 510-577, 581-586, 591-627, 632-666, 670, 687-781, 798-820, 849-924, 928-933, 937-949, 953-956, 960-961, 970-987
nvtabular/ops/clip.py 19 11 6 0 32% 43-47, 51-56
nvtabular/ops/column_similarity.py 86 62 32 0 20% 62-70, 74-87, 92, 118-147, 154-155, 164-166, 174-190, 199-224, 228-231, 235-236
nvtabular/ops/difference_lag.py 26 15 8 0 32% 56-58, 64-73, 78, 81, 84
nvtabular/ops/dropna.py 9 3 0 0 67% 39-41
nvtabular/ops/fill.py 40 17 6 0 50% 42-43, 47, 70-71, 75-80, 85-86, 90-91, 98, 101, 104
nvtabular/ops/filter.py 21 12 6 0 33% 42-45, 49-58
nvtabular/ops/hash_bucket.py 31 19 18 0 24% 68-78, 82-92, 97-102
nvtabular/ops/hashed_cross.py 29 18 13 0 26% 50-56, 60-71, 76, 81-84
nvtabular/ops/join_external.py 66 53 28 0 14% 83-96, 101-133, 136-143, 148-150, 156-166
nvtabular/ops/join_groupby.py 77 57 28 0 19% 85-100, 103-122, 125-126, 129-153, 156, 161-171, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 17 10 0 27% 59-66, 70-79, 84
nvtabular/ops/logop.py 9 1 0 0 89% 38
nvtabular/ops/moments.py 62 50 18 0 15% 30-62, 66-77, 81-86, 90-112
nvtabular/ops/normalize.py 70 38 14 0 38% 46-48, 52, 55-57, 61-66, 69, 72-73, 76-77, 95-96, 102-110, 116, 123-125, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 5 2 1 65% 22->24, 24, 65, 76, 79-81
nvtabular/ops/rename.py 18 11 10 0 25% 40-44, 47-48, 53-58
nvtabular/ops/stat_operator.py 11 1 0 0 91% 30
nvtabular/ops/target_encoding.py 151 126 66 0 12% 134-156, 159-196, 199-202, 205, 208-216, 219, 222-223, 226-227, 230-231, 235-313, 317-347, 357-366
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 233 189 60 0 15% 18-19, 22, 27, 30-42, 45, 55-57, 64-73, 83-109, 112-120, 125-133, 136-143, 150-158, 166-182, 191-221, 225-249, 253-271, 274-280, 284-286, 292-303, 306-309, 317-319, 335-341, 359-367, 372-373, 410-422
nvtabular/tools/dataset_inspector.py 77 77 34 0 0% 16-186
nvtabular/utils.py 27 6 10 5 70% 26->27, 27, 28->31, 31, 37->38, 38, 40->41, 41, 45->47, 47, 53
nvtabular/worker.py 65 54 30 0 12% 34-35, 44-57, 67-97, 105-122
nvtabular/workflow.py 127 102 72 1 13% 32->33, 33, 66-67, 85-87, 98-141, 155-156, 159-188, 191-214, 217-218, 221-222, 226-229, 233-241, 249, 254-289

TOTAL 3580 2782 1521 12 16%
Coverage XML written to file coverage.xml

FAIL Required test coverage of 70% not reached. Total coverage: 15.88%
=========================== short test summary info ============================
ERROR tests/unit/test_tools.py
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
========================= 4 warnings, 1 error in 6.44s =========================
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins7784097497585232826.sh

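The collection failure in the run above comes down to a module-name mismatch: tests/unit/test_tools.py imports nvtabular.tools.data_inspector, while the coverage table lists the module that actually exists as nvtabular/tools/dataset_inspector.py. A minimal sketch of the corrected import, assuming that module name is the intended target:

# tests/unit/test_tools.py -- point the import at the module present in the tree
import nvtabular.tools.dataset_inspector as datains

The next build (commit "Dataset inspect read - Tests passing") collects test_tools.py cleanly, which is consistent with this rename.
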
@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit d8a564333a67bb474d767f4d9d9baff212014e33, no merge conflicts.
Running as SYSTEM
Setting status of d8a564333a67bb474d767f4d9d9baff212014e33 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1488/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse d8a564333a67bb474d767f4d9d9baff212014e33^{commit} # timeout=10
Checking out Revision d8a564333a67bb474d767f4d9d9baff212014e33 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d8a564333a67bb474d767f4d9d9baff212014e33 # timeout=10
Commit message: "Dataset inspect read - Tests passing"
 > git rev-list --no-walk c0ae28de1e60c4fd14e4dc4e04e83775db55b7f5 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins6307068742077306661.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+24.gd8a5643
    Uninstalling nvtabular-0.3.0+24.gd8a5643:
      Successfully uninstalled nvtabular-0.3.0+24.gd8a5643
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
84 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 559 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_datagen.py ..................... [ 14%]
tests/unit/test_io.py .................................................. [ 23%]
........................................ssssssss [ 31%]
tests/unit/test_notebooks.py .... [ 32%]
tests/unit/test_ops.py ................................................. [ 41%]
........................................................................ [ 54%]
.................................... [ 60%]
tests/unit/test_s3.py .. [ 61%]
tests/unit/test_tf_dataloader.py ................... [ 64%]
tests/unit/test_tf_layers.py ........................................... [ 72%]
................................... [ 78%]
tests/unit/test_tools.py .. [ 78%]
tests/unit/test_torch_dataloader.py .............................. [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_datagen.py: 1392 warnings
tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_datagen.py: 696 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:556: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
[x.dtype for x in self._data.columns], index=self._data.names

tests/unit/test_datagen.py::test_full_df[None-1000]
tests/unit/test_datagen.py::test_full_df[None-100000]
tests/unit/test_datagen.py::test_full_df[distro1-1000]
tests/unit/test_datagen.py::test_full_df[distro1-100000]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/utils.py:47: UserWarning: get_memory_info is not supported. Using total device memory from NVML.
warnings.warn("get_memory_info is not supported. Using total device memory from NVML.")

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36353 instead
http_address["port"], self.http_server.port

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43973 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[csv]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33719 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[csv]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36473 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41465 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38073 instead
http_address["port"], self.http_server.port

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 32995 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 19 80 7 85% 53->54, 54, 86->87, 87, 97->98, 98, 101->103, 127->128, 128, 151-164, 188->191, 191, 275->278, 278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 1 12 1 95% 46->47, 47
nvtabular/framework_utils/torch/models.py 38 0 22 0 100%
nvtabular/framework_utils/torch/utils.py 31 4 10 2 85% 51->52, 52, 55->56, 56-58
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 189->191
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 13 108 7 95% 71->72, 72, 77-78, 123->124, 124, 131-132, 202->204, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 18 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 86 260 50 79% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 233 1 60 4 98% 170->172, 217->221, 306->309, 307->306, 309
nvtabular/tools/dataset_inspector.py 77 15 34 2 72% 30->32, 32-39, 74->75, 75-91
nvtabular/utils.py 27 4 10 3 81% 26->27, 27, 28->31, 31, 37->38, 38, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3580 580 1521 178 81%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.51%
========== 551 passed, 8 skipped, 2142 warnings in 487.53s (0:08:07) ===========
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins578013672945747349.sh

@albert17 albert17 requested a review from benfred January 19, 2021 18:34
Contributor

@jperez999 jperez999 left a comment

Remove all of those examples/dataset_inspector assets. Put the inspector example as a comment within the command-line tool when you move it into the code base. This will allow the docs to pick that up and turn it into an example within the documentation that users can reference.

@@ -0,0 +1,73 @@
#
Contributor

This file is actually the command-line tool for the dataset inspector class... it's not really an example. We should move it into the code base. We won't be able to have examples of this in this format; it will have to be more like a README, with examples of how to use the tool. We won't actually have notebooks.

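For reference, a command-line wrapper of the kind described above could look like the sketch below. This is only an illustration: the flag names and the DatasetInspector class with its inspect() method are assumptions rather than the actual API in nvtabular/tools/dataset_inspector.py. The point is that the usage example lives in the module docstring, where the docs build can pick it up.

"""Inspect a dataset and write per-column statistics to JSON.

Example usage (surfaced in the documentation from this docstring):

    python -m nvtabular.tools.inspector_script \
        --data_path /path/to/dataset --format parquet --output dataset_info.json
"""
import argparse


def parse_args():
    parser = argparse.ArgumentParser(description="Dataset inspection tool")
    parser.add_argument("--data_path", type=str, help="Path to the input dataset")
    parser.add_argument("--format", type=str, default="parquet", choices=["csv", "parquet"])
    parser.add_argument("--output", type=str, default="dataset_info.json", help="Output JSON file")
    return parser.parse_args()


def main():
    args = parse_args()
    # Hypothetical API: the real class and method names live in
    # nvtabular/tools/dataset_inspector.py and may differ.
    from nvtabular.tools.dataset_inspector import DatasetInspector

    DatasetInspector().inspect(args.data_path, args.format, args.output)


if __name__ == "__main__":
    main()
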
@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit caa312cd357e2d53925db9cd53d9b2420b551221, no merge conflicts.
Running as SYSTEM
Setting status of caa312cd357e2d53925db9cd53d9b2420b551221 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1489/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse caa312cd357e2d53925db9cd53d9b2420b551221^{commit} # timeout=10
Checking out Revision caa312cd357e2d53925db9cd53d9b2420b551221 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f caa312cd357e2d53925db9cd53d9b2420b551221 # timeout=10
Commit message: "Moves dataset inspector script"
 > git rev-list --no-walk d8a564333a67bb474d767f4d9d9baff212014e33 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7688479711240999931.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+25.gcaa312c
    Uninstalling nvtabular-0.3.0+25.gcaa312c:
      Successfully uninstalled nvtabular-0.3.0+25.gcaa312c
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
84 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 559 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_datagen.py ..................... [ 14%]
tests/unit/test_io.py .................................................. [ 23%]
........................................ssssssss [ 31%]
tests/unit/test_notebooks.py .... [ 32%]
tests/unit/test_ops.py ................................................. [ 41%]
........................................................................ [ 54%]
.................................... [ 60%]
tests/unit/test_s3.py .. [ 61%]
tests/unit/test_tf_dataloader.py ................... [ 64%]
tests/unit/test_tf_layers.py .............F............................. [ 72%]
................................... [ 78%]
tests/unit/test_tools.py .. [ 78%]
tests/unit/test_torch_dataloader.py .............................. [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=================================== FAILURES ===================================
_____________ test_dot_product_interaction_layer[True-None-64-16] ______________

embedding_dim = 16, num_features = 64, interaction_type = None
self_interaction = True

@pytest.mark.parametrize("embedding_dim", [1, 4, 16])
@pytest.mark.parametrize("num_features", [1, 16, 64])
@pytest.mark.parametrize("interaction_type", [None, "field_all", "field_each", "field_interaction"])
@pytest.mark.parametrize("self_interaction", [True, False])
def test_dot_product_interaction_layer(
    embedding_dim, num_features, interaction_type, self_interaction
):
    if num_features == 1 and not self_interaction:
        return

    input = tf.keras.Input(name="x", shape=(num_features, embedding_dim), dtype=tf.float32)
    interaction_layer = layers.DotProductInteraction(interaction_type, self_interaction)
    output = interaction_layer(input)
    model = tf.keras.Model(inputs=input, outputs=output)
    model.compile("sgd", "mse")

    x = np.random.randn(8, num_features, embedding_dim).astype(np.float32)
    y_hat = model.predict(x)

    if self_interaction:
        expected_dim = num_features * (num_features + 1) // 2
    else:
        expected_dim = num_features * (num_features - 1) // 2
    assert y_hat.shape[1] == expected_dim

    if interaction_type is not None:
        W = interaction_layer.kernel.numpy()
    expected_outputs = []
    for i in range(num_features):
        j_start = i if self_interaction else i + 1
        for j in range(j_start, num_features):
            x_i = x[:, i]
            x_j = x[:, j]
            if interaction_type == "field_all":
                W_ij = W
            elif interaction_type == "field_each":
                W_ij = W[i].T
            elif interaction_type == "field_interaction":
                W_ij = W[i, j]

            if interaction_type is not None:
                x_i = x_i @ W_ij
            expected_outputs.append((x_i * x_j).sum(axis=1))
    expected_output = np.stack(expected_outputs).T

    rtol = 1e-3
    atol = 1e-6
    frac_correct = 1.0
    match = np.isclose(expected_output, y_hat, rtol=rtol, atol=atol)
  assert match.mean() >= frac_correct

E assert 0.9999399038461538 >= 1.0
E + where 0.9999399038461538 = <built-in method mean of numpy.ndarray object at 0x7fb8c876cda0>()
E + where <built-in method mean of numpy.ndarray object at 0x7fb8c876cda0> = array([[ True, True, True, ..., True, True, True],\n [ True, True, True, ..., True, True, True],\n ...True],\n [ True, True, True, ..., True, True, True],\n [ True, True, True, ..., True, True, True]]).mean

tests/unit/test_tf_layers.py:291: AssertionError
=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_datagen.py: 1392 warnings
tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_datagen.py: 696 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:556: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
[x.dtype for x in self._data.columns], index=self._data.names

tests/unit/test_datagen.py::test_full_df[None-1000]
tests/unit/test_datagen.py::test_full_df[None-100000]
tests/unit/test_datagen.py::test_full_df[distro1-1000]
tests/unit/test_datagen.py::test_full_df[distro1-100000]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/utils.py:47: UserWarning: get_memory_info is not supported. Using total device memory from NVML.
warnings.warn("get_memory_info is not supported. Using total device memory from NVML.")

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34173 instead
http_address["port"], self.http_server.port

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44355 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[csv]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45053 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[csv]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40299 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44719 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44621 instead
http_address["port"], self.http_server.port

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43463 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 19 80 7 85% 53->54, 54, 86->87, 87, 97->98, 98, 101->103, 127->128, 128, 151-164, 188->191, 191, 275->278, 278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 1 12 1 95% 46->47, 47
nvtabular/framework_utils/torch/models.py 38 0 22 0 100%
nvtabular/framework_utils/torch/utils.py 31 4 10 2 85% 51->52, 52, 55->56, 56-58
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 189->191
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 13 108 7 95% 71->72, 72, 77-78, 123->124, 124, 131-132, 202->204, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 18 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 86 260 50 79% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 233 1 60 4 98% 170->172, 217->221, 306->309, 307->306, 309
nvtabular/tools/dataset_inspector.py 77 15 34 2 72% 30->32, 32-39, 74->75, 75-91
nvtabular/tools/inspector_script.py 17 17 0 0 0% 17-75
nvtabular/utils.py 27 4 10 3 81% 26->27, 27, 28->31, 31, 37->38, 38, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3597 597 1521 178 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.25%
=========================== short test summary info ============================
FAILED tests/unit/test_tf_layers.py::test_dot_product_interaction_layer[True-None-64-16]
===== 1 failed, 550 passed, 8 skipped, 2142 warnings in 489.03s (0:08:09) ======
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5745337491396734923.sh
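
The repeated "Restarting worker" / "ValueError: I/O operation on closed file" blocks above are shutdown noise rather than test failures: the dask nanny logs after pytest has already closed the captured output stream (the same blocks appear in the fully passing run further down). A minimal stdlib-only sketch of the mechanism, assuming nothing beyond Python's logging module:

import io
import logging

# When a handler's stream has been closed, Handler.emit raises
# ValueError("I/O operation on closed file") and logging prints a
# "--- Logging error ---" block to stderr instead of propagating it.
stream = io.StringIO()
logger = logging.getLogger("nanny-demo")
logger.addHandler(logging.StreamHandler(stream))

stream.close()                       # pytest closes captured streams at teardown
logger.warning("Restarting worker")  # emits a "--- Logging error ---" block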

@albert17
Contributor Author

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit caa312cd357e2d53925db9cd53d9b2420b551221, no merge conflicts.
Running as SYSTEM
Setting status of caa312cd357e2d53925db9cd53d9b2420b551221 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1490/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse caa312cd357e2d53925db9cd53d9b2420b551221^{commit} # timeout=10
Checking out Revision caa312cd357e2d53925db9cd53d9b2420b551221 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f caa312cd357e2d53925db9cd53d9b2420b551221 # timeout=10
Commit message: "Moves dataset inspector script"
 > git rev-list --no-walk caa312cd357e2d53925db9cd53d9b2420b551221 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7652535698069483529.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+25.gcaa312c
    Uninstalling nvtabular-0.3.0+25.gcaa312c:
      Successfully uninstalled nvtabular-0.3.0+25.gcaa312c
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
84 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 559 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_datagen.py ..................... [ 14%]
tests/unit/test_io.py .................................................. [ 23%]
........................................ssssssss [ 31%]
tests/unit/test_notebooks.py .... [ 32%]
tests/unit/test_ops.py ................................................. [ 41%]
........................................................................ [ 54%]
.................................... [ 60%]
tests/unit/test_s3.py .. [ 61%]
tests/unit/test_tf_dataloader.py ................... [ 64%]
tests/unit/test_tf_layers.py ........................................... [ 72%]
................................... [ 78%]
tests/unit/test_tools.py .. [ 78%]
tests/unit/test_torch_dataloader.py .............................. [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_datagen.py: 1392 warnings
tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_datagen.py: 696 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:556: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
[x.dtype for x in self._data.columns], index=self._data.names

tests/unit/test_datagen.py::test_full_df[None-1000]
tests/unit/test_datagen.py::test_full_df[None-100000]
tests/unit/test_datagen.py::test_full_df[distro1-1000]
tests/unit/test_datagen.py::test_full_df[distro1-100000]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/utils.py:47: UserWarning: get_memory_info is not supported. Using total device memory from NVML.
warnings.warn("get_memory_info is not supported. Using total device memory from NVML.")

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43873 instead
http_address["port"], self.http_server.port

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35987 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[csv]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40693 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[csv]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44211 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39353 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42403 instead
http_address["port"], self.http_server.port

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38443 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html
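
The recurring "Port 8787 is already in use" warnings above come from each test creating its own dask cluster while another scheduler still holds the default dashboard port; distributed simply falls back to a free port, so they are harmless. A hedged sketch of how a test could sidestep the noise entirely (hypothetical setup, not the repo's actual fixture):

from distributed import Client, LocalCluster

# dashboard_address=":0" lets distributed pick any free port, so parallel
# clusters never collide on 8787.
cluster = LocalCluster(n_workers=1, dashboard_address=":0")
client = Client(cluster)
try:
    print(cluster.dashboard_link)  # shows whichever port was chosen
finally:
    client.close()
    cluster.close()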

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 19 80 7 85% 53->54, 54, 86->87, 87, 97->98, 98, 101->103, 127->128, 128, 151-164, 188->191, 191, 275->278, 278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 1 12 1 95% 46->47, 47
nvtabular/framework_utils/torch/models.py 38 0 22 0 100%
nvtabular/framework_utils/torch/utils.py 31 4 10 2 85% 51->52, 52, 55->56, 56-58
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 189->191
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 13 108 7 95% 71->72, 72, 77-78, 123->124, 124, 131-132, 202->204, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 18 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 86 260 50 79% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 233 1 60 4 98% 170->172, 217->221, 306->309, 307->306, 309
nvtabular/tools/dataset_inspector.py 77 15 34 2 72% 30->32, 32-39, 74->75, 75-91
nvtabular/tools/inspector_script.py 17 17 0 0 0% 17-75
nvtabular/utils.py 27 4 10 3 81% 26->27, 27, 28->31, 31, 37->38, 38, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3597 597 1521 178 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.25%
========== 551 passed, 8 skipped, 2142 warnings in 488.86s (0:08:08) ===========
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins173310810183650036.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 1d2fb521abb4f3ad9fcff980eefbf3349626a53f, no merge conflicts.
Running as SYSTEM
Setting status of 1d2fb521abb4f3ad9fcff980eefbf3349626a53f to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1491/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 1d2fb521abb4f3ad9fcff980eefbf3349626a53f^{commit} # timeout=10
Checking out Revision 1d2fb521abb4f3ad9fcff980eefbf3349626a53f (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 1d2fb521abb4f3ad9fcff980eefbf3349626a53f # timeout=10
Commit message: "Initial inspect-datagent test"
 > git rev-list --no-walk caa312cd357e2d53925db9cd53d9b2420b551221 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7399859435030933647.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
83 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 560 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_io.py .................................................. [ 19%]
........................................ssssssss [ 28%]
tests/unit/test_notebooks.py .... [ 28%]
tests/unit/test_ops.py ................................................. [ 37%]
........................................................................ [ 50%]
.................................... [ 56%]
tests/unit/test_s3.py .. [ 57%]
tests/unit/test_tf_dataloader.py ................... [ 60%]
tests/unit/test_tf_layers.py ........................................... [ 68%]
................................... [ 74%]
tests/unit/test_tools.py FFFFFFFF....FFFFFFFFFFF. [ 78%]
tests/unit/test_torch_dataloader.py .............................F [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=================================== FAILURES ===================================
___________________________ test_powerlaw[None-1000] ___________________________

num_rows = 1000, distro = None

@pytest.mark.parametrize("num_rows", [1000, 10000])
@pytest.mark.parametrize("distro", [None, distros])
def test_powerlaw(num_rows, distro):
    json_sample["num_rows"] = num_rows
    cats = list(json_sample["cats"].keys())[1:]
  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:56:


schema = {'cats': {'cat_1': {'cardinality': 50, 'dtype': None, 'max_entry_size': 5, 'min_entry_size': 1, ...}, 'cat_2': {'cardi...y.float32'>, 'max_val': 1, 'min_val': 0}, ...}, 'labs': {'lab_1': {'cardinality': 2, 'dtype': None}}, 'num_rows': 1000}
distros = None

def _get_cols_from_schema(schema, distros=None):
    """
    schema = a dictionary comprised of column information,
             where keys are column names, and the value
             contains spec info about column.

    Schema example

    conts:
        col_name:
            dtype:
            min_val:
            max_val:
            mean:
            std:
            per_nan:
    cats:
        col_name:
            dtype:
            cardinality:
            min_entry_size:
            max_entry_size:
            avg_entry_size:
            per_nan:
            multi_min:
            multi_max:
            multi_avg:

    labels:
        col_name:
            dtype:
            cardinality:
            per_nan:
    """
    cols = {}
    executor = {"conts": ContCol, "cats": CatCol, "labels": LabelCol}
    for section, vals in schema.items():
        if section == "num_rows":
            continue
        cols[section] = []
        for col_name, val in vals.items():
            v_dict = {"name": col_name}
            v_dict.update(val)
            if distros and col_name in distros:
                dis = distros[col_name]
                new_distr = DISTRO_TYPES[dis["name"]](**dis["params"])
                v_dict.update({"distro": new_distr})
          cols[section].append(executor[section](**v_dict))

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
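
Every failure in this run raises at the same place: the test schema (json_sample) contains a section named 'labs', while the executor mapping in _get_cols_from_schema only covers 'conts', 'cats', and 'labels', so executor[section] fails with KeyError: 'labs'. A minimal sketch of the mismatch, with a hypothetical alias as one possible fix (equally plausible: renaming the section to 'labels' in the test's json_sample):

# Stand-ins for ContCol/CatCol/LabelCol keep the sketch self-contained.
executor = {"conts": "ContCol", "cats": "CatCol", "labels": "LabelCol"}
schema = {"labs": {"lab_1": {"cardinality": 2, "dtype": None}}, "num_rows": 1000}

for section, vals in schema.items():
    if section == "num_rows":
        continue
    # Hypothetical workaround: treat "labs" as an alias for "labels";
    # without it, executor[section] raises KeyError: 'labs'.
    key = {"labs": "labels"}.get(section, section)
    print(section, "->", executor[key])

The remaining parametrizations fail with the identical traceback, condensed below.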
__________________________ test_powerlaw[None-10000] ___________________________

num_rows = 10000, distro = None

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:56:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
_________________________ test_powerlaw[distro1-1000] __________________________

num_rows = 1000
distro = {'cat_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cat_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}, 'cont_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cont_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}}

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:56:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
_________________________ test_powerlaw[distro1-10000] _________________________

num_rows = 10000
distro = {'cat_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cat_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}, 'cont_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cont_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}}

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:56:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
___________________________ test_uniform[None-1000] ____________________________

num_rows = 1000, distro = None

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:72:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
___________________________ test_uniform[None-10000] ___________________________

num_rows = 10000, distro = None

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:72:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
__________________________ test_uniform[distro1-1000] __________________________

num_rows = 1000
distro = {'cat_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cat_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}, 'cont_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cont_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}}

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:72:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
_________________________ test_uniform[distro1-10000] __________________________

num_rows = 10000
distro = {'cat_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cat_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}, 'cont_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cont_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}}

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:72:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
___________________________ test_cat_rep[None-1000] ____________________________

num_rows = 1000, distro = None

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:101:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
___________________________ test_cat_rep[None-10000] ___________________________

num_rows = 10000, distro = None

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:101:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
__________________________ test_cat_rep[distro1-1000] __________________________

num_rows = 1000
distro = {'cat_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cat_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}, 'cont_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cont_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}}

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:101:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
_________________________ test_cat_rep[distro1-10000] __________________________

num_rows = 10000
distro = {'cat_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cat_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}, 'cont_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cont_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}}

  cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:101:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
______________________________ test_json_convert _______________________________

  cols = datagen._get_cols_from_schema(json_sample)

tests/unit/test_tools.py:120:

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
___________________________ test_full_df[None-1000] ____________________________

num_rows = 1000
tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_full_df_None_1000_0')
distro = None

@pytest.mark.parametrize("num_rows", [1000, 100000])
@pytest.mark.parametrize("distro", [None, distros])
def test_full_df(num_rows, tmpdir, distro):
    json_sample["num_rows"] = num_rows
    cats = list(json_sample["cats"].keys())
>   cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:131:


schema = {'cats': {'cat_1': {'cardinality': 50, 'dtype': None, 'max_entry_size': 5, 'min_entry_size': 1, ...}, 'cat_2': {'cardi...y.float32'>, 'max_val': 1, 'min_val': 0}, ...}, 'labs': {'lab_1': {'cardinality': 2, 'dtype': None}}, 'num_rows': 1000}
distros = None

def _get_cols_from_schema(schema, distros=None):
    """
    schema = a dictionary comprised of column information,
             where keys are column names, and the value
             contains spec info about column.

    Schema example

    conts:
        col_name:
            dtype:
            min_val:
            max_val:
            mean:
            std:
            per_nan:
    cats:
        col_name:
            dtype:
            cardinality:
            min_entry_size:
            max_entry_size:
            avg_entry_size:
            per_nan:
            multi_min:
            multi_max:
            multi_avg:

    labels:
        col_name:
            dtype:
            cardinality:
            per_nan:
    """
    cols = {}
    executor = {"conts": ContCol, "cats": CatCol, "labels": LabelCol}
    for section, vals in schema.items():
        if section == "num_rows":
            continue
        cols[section] = []
        for col_name, val in vals.items():
            v_dict = {"name": col_name}
            v_dict.update(val)
            if distros and col_name in distros:
                dis = distros[col_name]
                new_distr = DISTRO_TYPES[dis["name"]](**dis["params"])
                v_dict.update({"distro": new_distr})
          cols[section].append(executor[section](**v_dict))

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
__________________________ test_full_df[None-100000] ___________________________

num_rows = 100000
tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_full_df_None_100000_0')
distro = None

@pytest.mark.parametrize("num_rows", [1000, 100000])
@pytest.mark.parametrize("distro", [None, distros])
def test_full_df(num_rows, tmpdir, distro):
    json_sample["num_rows"] = num_rows
    cats = list(json_sample["cats"].keys())
>   cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:131:


schema = {'cats': {'cat_1': {'cardinality': 50, 'dtype': None, 'max_entry_size': 5, 'min_entry_size': 1, ...}, 'cat_2': {'cardi...float32'>, 'max_val': 1, 'min_val': 0}, ...}, 'labs': {'lab_1': {'cardinality': 2, 'dtype': None}}, 'num_rows': 100000}
distros = None

def _get_cols_from_schema(schema, distros=None):
    """
    schema = a dictionary comprised of column information,
             where keys are column names, and the value
             contains spec info about column.

    Schema example

    conts:
        col_name:
            dtype:
            min_val:
            max_val:
            mean:
            std:
            per_nan:
    cats:
        col_name:
            dtype:
            cardinality:
            min_entry_size:
            max_entry_size:
            avg_entry_size:
            per_nan:
            multi_min:
            multi_max:
            multi_avg:

    labels:
        col_name:
            dtype:
            cardinality:
            per_nan:
    """
    cols = {}
    executor = {"conts": ContCol, "cats": CatCol, "labels": LabelCol}
    for section, vals in schema.items():
        if section == "num_rows":
            continue
        cols[section] = []
        for col_name, val in vals.items():
            v_dict = {"name": col_name}
            v_dict.update(val)
            if distros and col_name in distros:
                dis = distros[col_name]
                new_distr = DISTRO_TYPES[dis["name"]](**dis["params"])
                v_dict.update({"distro": new_distr})
          cols[section].append(executor[section](**v_dict))

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
__________________________ test_full_df[distro1-1000] __________________________

num_rows = 1000
tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_full_df_distro1_1000_0')
distro = {'cat_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cat_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}, 'cont_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cont_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}}

@pytest.mark.parametrize("num_rows", [1000, 100000])
@pytest.mark.parametrize("distro", [None, distros])
def test_full_df(num_rows, tmpdir, distro):
    json_sample["num_rows"] = num_rows
    cats = list(json_sample["cats"].keys())
>   cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:131:


schema = {'cats': {'cat_1': {'cardinality': 50, 'dtype': None, 'max_entry_size': 5, 'min_entry_size': 1, ...}, 'cat_2': {'cardi...y.float32'>, 'max_val': 1, 'min_val': 0}, ...}, 'labs': {'lab_1': {'cardinality': 2, 'dtype': None}}, 'num_rows': 1000}
distros = {'cat_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cat_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}, 'cont_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cont_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}}

def _get_cols_from_schema(schema, distros=None):
    """
    schema = a dictionary comprised of column information,
             where keys are column names, and the value
             contains spec info about column.

    Schema example

    conts:
        col_name:
            dtype:
            min_val:
            max_val:
            mean:
            std:
            per_nan:
    cats:
        col_name:
            dtype:
            cardinality:
            min_entry_size:
            max_entry_size:
            avg_entry_size:
            per_nan:
            multi_min:
            multi_max:
            multi_avg:

    labels:
        col_name:
            dtype:
            cardinality:
            per_nan:
    """
    cols = {}
    executor = {"conts": ContCol, "cats": CatCol, "labels": LabelCol}
    for section, vals in schema.items():
        if section == "num_rows":
            continue
        cols[section] = []
        for col_name, val in vals.items():
            v_dict = {"name": col_name}
            v_dict.update(val)
            if distros and col_name in distros:
                dis = distros[col_name]
                new_distr = DISTRO_TYPES[dis["name"]](**dis["params"])
                v_dict.update({"distro": new_distr})
          cols[section].append(executor[section](**v_dict))

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
_________________________ test_full_df[distro1-100000] _________________________

num_rows = 100000
tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_full_df_distro1_100000_0')
distro = {'cat_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cat_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}, 'cont_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cont_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}}

@pytest.mark.parametrize("num_rows", [1000, 100000])
@pytest.mark.parametrize("distro", [None, distros])
def test_full_df(num_rows, tmpdir, distro):
    json_sample["num_rows"] = num_rows
    cats = list(json_sample["cats"].keys())
>   cols = datagen._get_cols_from_schema(json_sample, distros=distro)

tests/unit/test_tools.py:131:


schema = {'cats': {'cat_1': {'cardinality': 50, 'dtype': None, 'max_entry_size': 5, 'min_entry_size': 1, ...}, 'cat_2': {'cardi...float32'>, 'max_val': 1, 'min_val': 0}, ...}, 'labs': {'lab_1': {'cardinality': 2, 'dtype': None}}, 'num_rows': 100000}
distros = {'cat_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cat_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}, 'cont_1': {'name': 'powerlaw', 'params': {'alpha': 0.1}}, 'cont_2': {'name': 'powerlaw', 'params': {'alpha': 0.2}}}

def _get_cols_from_schema(schema, distros=None):
    """
    schema = a dictionary comprised of column information,
             where keys are column names, and the value
             contains spec info about column.

    Schema example

    conts:
        col_name:
            dtype:
            min_val:
            max_val:
            mean:
            std:
            per_nan:
    cats:
        col_name:
            dtype:
            cardinality:
            min_entry_size:
            max_entry_size:
            avg_entry_size:
            per_nan:
            multi_min:
            multi_max:
            multi_avg:

    labels:
        col_name:
            dtype:
            cardinality:
            per_nan:
    """
    cols = {}
    executor = {"conts": ContCol, "cats": CatCol, "labels": LabelCol}
    for section, vals in schema.items():
        if section == "num_rows":
            continue
        cols[section] = []
        for col_name, val in vals.items():
            v_dict = {"name": col_name}
            v_dict.update(val)
            if distros and col_name in distros:
                dis = distros[col_name]
                new_distr = DISTRO_TYPES[dis["name"]](**dis["params"])
                v_dict.update({"distro": new_distr})
          cols[section].append(executor[section](**v_dict))

E KeyError: 'labs'

nvtabular/tools/data_gen.py:424: KeyError
______________________________ test_inspect[csv] _______________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_inspect_csv_0')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-1/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-1/csv0'), 'csv-no... local('/tmp/pytest-of-jenkins/pytest-1/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-1/parquet0')}
engine = 'csv'

@pytest.mark.parametrize("engine", ["csv", "parquet"])
def test_inspect(tmpdir, datasets, engine):
    # Dataset
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    output_file = tmpdir + "/dataset_info.json"

    # Dataset columns type config
    columns_dict = {}
    columns_dict["cats"] = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    columns_dict["conts"] = ["x", "y"]
    columns_dict["labels"] = ["label"]
    all_cols = columns_dict["cats"] + columns_dict["conts"] + columns_dict["labels"]

    # Create inspector and inspect
    a = datains.DatasetInspector()
    a.inspect(paths, engine, columns_dict, output_file)

    # Check output_file was created
    assert os.path.isfile(output_file)

    # Read output file
    with fsspec.open(output_file) as f:
        output = json.load(f)

    # Get ddf and cluster to check
    dataset = Dataset(paths, engine=engine)
    ddf = dataset.to_ddf()
    cluster = LocalCUDACluster()
    client = Client(cluster)

    # Dictionary with json output key names
    key_names = {}
    key_names["min"] = {}
    key_names["min"]["cat"] = "min_entry_size"
    key_names["min"]["cont"] = "min_val"
    key_names["max"] = {}
    key_names["max"]["cat"] = "max_entry_size"
    key_names["max"]["cont"] = "max_val"
    key_names["mean"] = {}
    key_names["mean"]["cat"] = "avg_entry_size"
    key_names["mean"]["cont"] = "mean"
    # Correct dtypes
    ddf_dtypes = ddf.head(1)

    # Check output
    for col in all_cols:
        # Check dtype for all
>       assert output[col]["dtype"] == str(ddf_dtypes[col].dtype)

E       KeyError: 'name-string'

tests/unit/test_tools.py:211: KeyError
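The test indexes output[col] at the top level of the JSON, but the KeyError on 'name-string' here (and on 'name-cat' in the parquet case below) suggests the inspector nests each column under its type. A hedged lookup that would work against such a nested layout (assuming section keys "cats"/"conts"/"labels"; find_col is a name invented here):

def find_col(output, col, columns_dict):
    # walk the cats/conts/labels sections of the inspector output and
    # return the entry for `col`, wherever it lives
    for col_type, names in columns_dict.items():
        if col in names:
            return output[col_type][col]
    raise KeyError(col)

# the dtype check in the test would then read:
# assert find_col(output, col, columns_dict)["dtype"] == str(ddf_dtypes[col].dtype)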
____________________________ test_inspect[parquet] _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_inspect_parquet_0')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-1/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-1/csv0'), 'csv-no... local('/tmp/pytest-of-jenkins/pytest-1/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-1/parquet0')}
engine = 'parquet'

@pytest.mark.parametrize("engine", ["csv", "parquet"])
def test_inspect(tmpdir, datasets, engine):
    # Dataset
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    output_file = tmpdir + "/dataset_info.json"

    # Dataset columns type config
    columns_dict = {}
    columns_dict["cats"] = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    columns_dict["conts"] = ["x", "y"]
    columns_dict["labels"] = ["label"]
    all_cols = columns_dict["cats"] + columns_dict["conts"] + columns_dict["labels"]

    # Create inspector and inspect
    a = datains.DatasetInspector()
    a.inspect(paths, engine, columns_dict, output_file)

    # Check output_file was created
    assert os.path.isfile(output_file)

    # Read output file
    with fsspec.open(output_file) as f:
        output = json.load(f)

    # Get ddf and cluster to check
    dataset = Dataset(paths, engine=engine)
    ddf = dataset.to_ddf()
    cluster = LocalCUDACluster()
    client = Client(cluster)

    # Dictionary with json output key names
    key_names = {}
    key_names["min"] = {}
    key_names["min"]["cat"] = "min_entry_size"
    key_names["min"]["cont"] = "min_val"
    key_names["max"] = {}
    key_names["max"]["cat"] = "max_entry_size"
    key_names["max"]["cont"] = "max_val"
    key_names["mean"] = {}
    key_names["mean"]["cat"] = "avg_entry_size"
    key_names["mean"]["cont"] = "mean"
    # Correct dtypes
    ddf_dtypes = ddf.head(1)

    # Check output
    for col in all_cols:
        # Check dtype for all
>       assert output[col]["dtype"] == str(ddf_dtypes[col].dtype)

E       KeyError: 'name-cat'

tests/unit/test_tools.py:211: KeyError
----------------------------- Captured stderr call -----------------------------
distributed.client - ERROR - Failed to reconnect to scheduler after 10.00 seconds, closing client
------------------------------ Captured log call -------------------------------
ERROR asyncio:base_events.py:1619 _GatheringFuture exception was never retrieved
future: <_GatheringFuture finished exception=CancelledError()>
concurrent.futures._base.CancelledError
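The 'Failed to reconnect to scheduler' error and the CancelledError in the captured output are follow-on symptoms: the test opens a LocalCUDACluster and Client but never closes them, so they leak once the assertion above fails. A small helper sketched under the assumption that each test should own and tear down its cluster (with_local_cluster is a name invented here):

from dask_cuda import LocalCUDACluster
from distributed import Client

def with_local_cluster(fn):
    # run fn(client) against a throwaway cluster and tear it down even
    # when fn raises, so a failing assertion cannot leak the scheduler
    cluster = LocalCUDACluster()
    client = Client(cluster)
    try:
        return fn(client)
    finally:
        client.close()
        cluster.close()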
____________________________ test_mh_model_support _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_mh_model_support0')

def test_mh_model_support(tmpdir):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Reviewers": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Null User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
            "Cont1": [0.3, 0.4, 0.5, 0.6],
            "Cont2": [0.3, 0.4, 0.5, 0.6],
            "Cat1": ["A", "B", "A", "C"],
        }
    )
    cat_names = ["Cat1", "Null User", "Authors", "Reviewers"]  # , "Engaging User"]
    cont_names = ["Cont1", "Cont2"]
    label_name = ["Post"]
    out_path = os.path.join(tmpdir, "train/")
    os.mkdir(out_path)

    cats = cat_names >> ops.Categorify()
    conts = cont_names >> ops.Normalize()

    processor = nvt.Workflow(cats + conts + label_name)
>   df_out = processor.fit_transform(nvt.Dataset(df)).to_ddf().compute()

tests/unit/test_torch_dataloader.py:279:


/opt/conda/envs/rapids/lib/python3.7/site-packages/dask/base.py:167: in compute
(result,) = compute(self, traverse=False, **kwargs)
/opt/conda/envs/rapids/lib/python3.7/site-packages/dask/base.py:452: in compute
results = schedule(dsk, keys, **kwargs)
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/client.py:2725: in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/client.py:1992: in gather
asynchronous=asynchronous,
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/client.py:833: in sync
self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py:340: in sync
raise exc.with_traceback(tb)
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py:324: in f
result[0] = yield future


self = <tornado.gen.Runner object at 0x7f29a01b6a10>

def run(self) -> None:
    """Starts or resumes the generator, running until it reaches a
    yield point that is not ready.
    """
    if self.running or self.finished:
        return
    try:
        self.running = True
        while True:
            future = self.future
            if future is None:
                raise Exception("No pending future")
            if not future.done():
                return
            self.future = None
            try:
                exc_info = None

                try:
>                 value = future.result()

E       concurrent.futures._base.CancelledError

/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/gen.py:735: CancelledError
----------------------------- Captured stderr call -----------------------------
distributed.utils - ERROR - 'ListDtype' object has no attribute 'str'
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 655, in log_errors
yield
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/comm/serialize.py", line 17, in dask_serialize_cudf_object
return x.host_serialize()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/abc.py", line 97, in host_serialize
header, frames = self.device_serialize()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/abc.py", line 41, in device_serialize
header, frames = self.serialize()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py", line 530, in serialize
column_header, column_frames = column.serialize_columns(self._columns)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/column/column.py", line 1931, in serialize_columns
header_columns = [c.serialize() for c in columns]
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/column/column.py", line 1931, in
header_columns = [c.serialize() for c in columns]
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/column/column.py", line 1112, in serialize
header["dtype"] = self.dtype.str
AttributeError: 'ListDtype' object has no attribute 'str'
distributed.protocol.core - CRITICAL - Failed to Serialize
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/core.py", line 54, in dumps
for key, value in data.items()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/core.py", line 55, in
if type(value) is Serialize
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 277, in serialize
raise TypeError(msg, str(x)[:10000])
TypeError: ('Could not serialize object of type DataFrame.', ' Authors Reviewers Engaging User ... Cont1 Cont2 Cat1\n0 [User_A] [User_A] User_B ... 0.3 0.3 A\n1 [User_A, User_E] [User_A, User_E] User_B ... 0.4 0.4 B\n2 [User_B, User_C] [User_B, User_C] User_A ... 0.5 0.5 A\n3 [User_C] [User_C] User_D ... 0.6 0.6 C\n\n[4 rows x 8 columns]')
distributed.comm.utils - ERROR - ('Could not serialize object of type DataFrame.', ' Authors Reviewers Engaging User ... Cont1 Cont2 Cat1\n0 [User_A] [User_A] User_B ... 0.3 0.3 A\n1 [User_A, User_E] [User_A, User_E] User_B ... 0.4 0.4 B\n2 [User_B, User_C] [User_B, User_C] User_A ... 0.5 0.5 A\n3 [User_C] [User_C] User_D ... 0.6 0.6 C\n\n[4 rows x 8 columns]')
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/comm/utils.py", line 35, in _to_frames
msg, serializers=serializers, on_error=on_error, context=context
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/core.py", line 54, in dumps
for key, value in data.items()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/core.py", line 55, in
if type(value) is Serialize
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 277, in serialize
raise TypeError(msg, str(x)[:10000])
TypeError: ('Could not serialize object of type DataFrame.', ' Authors Reviewers Engaging User ... Cont1 Cont2 Cat1\n0 [User_A] [User_A] User_B ... 0.3 0.3 A\n1 [User_A, User_E] [User_A, User_E] User_B ... 0.4 0.4 B\n2 [User_B, User_C] [User_B, User_C] User_A ... 0.5 0.5 A\n3 [User_C] [User_C] User_D ... 0.6 0.6 C\n\n[4 rows x 8 columns]')
distributed.batched - WARNING - Error in batched write, retrying
[the 'ListDtype' serialization traceback above repeats identically for each of six retries; elided]
distributed.batched - ERROR - Error in batched write
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/batched.py", line 94, in _background_send
payload, serializers=self.serializers, on_error="raise"
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
value = future.result()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/comm/tcp.py", line 230, in write
**self.handshake_options,
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/comm/utils.py", line 54, in to_frames
return _to_frames()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/comm/utils.py", line 35, in _to_frames
msg, serializers=serializers, on_error=on_error, context=context
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/core.py", line 54, in dumps
for key, value in data.items()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/core.py", line 55, in
if type(value) is Serialize
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 277, in serialize
raise TypeError(msg, str(x)[:10000])
TypeError: ('Could not serialize object of type DataFrame.', ' Authors Reviewers Engaging User ... Cont1 Cont2 Cat1\n0 [User_A] [User_A] User_B ... 0.3 0.3 A\n1 [User_A, User_E] [User_A, User_E] User_B ... 0.4 0.4 B\n2 [User_B, User_C] [User_B, User_C] User_A ... 0.5 0.5 A\n3 [User_C] [User_C] User_D ... 0.6 0.6 C\n\n[4 rows x 8 columns]')
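test_mh_model_support fails for an unrelated reason: at this cudf version ListDtype has no .str attribute, so distributed cannot serialize a cudf DataFrame that contains list columns (Authors/Reviewers here). A hedged guard for spotting such columns before a frame is handed to a Dask client (the import path is the cudf 0.17-era location and is an assumption):

from cudf.core.dtypes import ListDtype  # location as of cudf ~0.17 (assumption)

def list_columns(df):
    # names of columns whose dtype is a list type and would currently
    # fail dask serialization
    return [name for name in df.columns if isinstance(df[name].dtype, ListDtype)]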
=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39829 instead
http_address["port"], self.http_server.port

tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_tools.py: 40 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45829 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py: 20 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:556: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
[x.dtype for x in self._data.columns], index=self._data.names
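The fix this DeprecationWarning asks for is just an explicit dtype when constructing an empty Series, for example:

import pandas as pd

s = pd.Series([], dtype="float64")  # explicit dtype silences the warning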

tests/unit/test_tools.py::test_inspect[csv]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33571 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[csv]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42493 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36669 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33607 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33603 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/utils.py:47: UserWarning: get_memory_info is not supported. Using total device memory from NVML.
warnings.warn("get_memory_info is not supported. Using total device memory from NVML.")

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42549 instead
http_address["port"], self.http_server.port

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42421 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 19 80 7 85% 53->54, 54, 86->87, 87, 97->98, 98, 101->103, 127->128, 128, 151-164, 188->191, 191, 275->278, 278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 46->47, 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 38 6 22 7 75% 55->56, 56, 58->59, 59, 63->64, 64, 83->84, 84, 85->86, 86, 89->90, 90, 96->98
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 51->52, 52, 55->56, 56-58, 61->67, 67-69
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 189->191
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 15 108 8 94% 71->72, 72, 77-78, 123->124, 124, 131-132, 143, 202->204, 217->218, 218, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 18 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 89 260 51 78% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 401->404, 404-406, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 27 62 8 86% 22, 45, 89->90, 90-95, 104->106, 106, 126->127, 127-128, 141->137, 150-158, 170->172, 217->221, 264->265, 265, 274-280, 294->297, 297-300, 306-309
nvtabular/tools/dataset_inspector.py 77 15 34 2 72% 30->32, 32-39, 79->80, 80-96
nvtabular/tools/inspector_script.py 17 17 0 0 0% 17-75
nvtabular/utils.py 27 4 10 3 81% 26->27, 27, 28->31, 31, 37->38, 38, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3600 648 1523 192 79%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 78.65%
=========================== short test summary info ============================
FAILED tests/unit/test_tools.py::test_powerlaw[None-1000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_powerlaw[None-10000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_powerlaw[distro1-1000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_powerlaw[distro1-10000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_uniform[None-1000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_uniform[None-10000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_uniform[distro1-1000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_uniform[distro1-10000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_cat_rep[None-1000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_cat_rep[None-10000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_cat_rep[distro1-1000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_cat_rep[distro1-10000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_json_convert - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_full_df[None-1000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_full_df[None-100000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_full_df[distro1-1000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_full_df[distro1-100000] - KeyError: 'labs'
FAILED tests/unit/test_tools.py::test_inspect[csv] - KeyError: 'name-string'
FAILED tests/unit/test_tools.py::test_inspect[parquet] - KeyError: 'name-cat'
FAILED tests/unit/test_torch_dataloader.py::test_mh_model_support - concurren...
===== 20 failed, 532 passed, 8 skipped, 111 warnings in 486.32s (0:08:06) ======
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins4543960522503149747.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 8111e3e78efb2dcb5094177d720729acb68582fb, no merge conflicts.
Running as SYSTEM
Setting status of 8111e3e78efb2dcb5094177d720729acb68582fb to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1497/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 8111e3e78efb2dcb5094177d720729acb68582fb^{commit} # timeout=10
Checking out Revision 8111e3e78efb2dcb5094177d720729acb68582fb (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8111e3e78efb2dcb5094177d720729acb68582fb # timeout=10
Commit message: "Data gen and data inspect work together"
 > git rev-list --no-walk b55bef79bd496a2cf7505d302b5b48a7f4dc8da6 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins1032848889484682575.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+16.g779b544
ERROR: Exception:
Traceback (most recent call last):
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 228, in _main
    status = self.run(options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 406, in run
    pycompile=options.compile,
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 76, in install_given_reqs
    auto_confirm=True
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 685, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_uninstall.py", line 545, in from_dist
    link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /var/jenkins_home/workspace/nvtab_integration/nvtabular does not match installed location of nvtabular (at /var/jenkins_home/workspace/nvtab_docs/nvtabular)
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins4371275478190635855.sh

@albert17
Contributor Author

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 8111e3e78efb2dcb5094177d720729acb68582fb, no merge conflicts.
Running as SYSTEM
Setting status of 8111e3e78efb2dcb5094177d720729acb68582fb to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1498/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 8111e3e78efb2dcb5094177d720729acb68582fb^{commit} # timeout=10
Checking out Revision 8111e3e78efb2dcb5094177d720729acb68582fb (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8111e3e78efb2dcb5094177d720729acb68582fb # timeout=10
Commit message: "Data gen and data inspect work together"
 > git rev-list --no-walk 8111e3e78efb2dcb5094177d720729acb68582fb # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7317973124949593702.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+16.g779b544
ERROR: Exception:
Traceback (most recent call last):
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 228, in _main
    status = self.run(options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 406, in run
    pycompile=options.compile,
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 76, in install_given_reqs
    auto_confirm=True
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 685, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_uninstall.py", line 545, in from_dist
    link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /var/jenkins_home/workspace/nvtab_integration/nvtabular does not match installed location of nvtabular (at /var/jenkins_home/workspace/nvtab_docs/nvtabular)
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins6246142106023035049.sh

@jperez999
Contributor

rerun tests

1 similar comment
@albert17
Contributor Author

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 8111e3e78efb2dcb5094177d720729acb68582fb, no merge conflicts.
Running as SYSTEM
Setting status of 8111e3e78efb2dcb5094177d720729acb68582fb to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1499/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 8111e3e78efb2dcb5094177d720729acb68582fb^{commit} # timeout=10
Checking out Revision 8111e3e78efb2dcb5094177d720729acb68582fb (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8111e3e78efb2dcb5094177d720729acb68582fb # timeout=10
Commit message: "Data gen and data inspect work together"
 > git rev-list --no-walk 8111e3e78efb2dcb5094177d720729acb68582fb # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins2916650104798956803.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+16.g779b544
    Can't uninstall 'nvtabular'. No files were found to uninstall.
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
83 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 559 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_io.py .................................................. [ 19%]
........................................ssssssss [ 28%]
tests/unit/test_notebooks.py .... [ 28%]
tests/unit/test_ops.py ................................................. [ 37%]
........................................................................ [ 50%]
.................................... [ 56%]
tests/unit/test_s3.py .. [ 57%]
tests/unit/test_tf_dataloader.py ................... [ 60%]
tests/unit/test_tf_layers.py ........................................... [ 68%]
................................... [ 74%]
tests/unit/test_tools.py ....................... [ 78%]
tests/unit/test_torch_dataloader.py .............................. [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41295 instead
http_address["port"], self.http_server.port

tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_tools.py: 1416 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35017 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py: 708 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:556: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
[x.dtype for x in self._data.columns], index=self._data.names

tests/unit/test_tools.py::test_full_df[None-1000]
tests/unit/test_tools.py::test_full_df[None-100000]
tests/unit/test_tools.py::test_full_df[distro1-1000]
tests/unit/test_tools.py::test_full_df[distro1-100000]
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/utils.py:47: UserWarning: get_memory_info is not supported. Using total device memory from NVML.
warnings.warn("get_memory_info is not supported. Using total device memory from NVML.")

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43507 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36691 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38165 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38857 instead
http_address["port"], self.http_server.port

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34427 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 19 80 7 85% 53->54, 54, 86->87, 87, 97->98, 98, 101->103, 127->128, 128, 151-164, 188->191, 191, 275->278, 278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 1 12 1 95% 46->47, 47
nvtabular/framework_utils/torch/models.py 38 0 22 0 100%
nvtabular/framework_utils/torch/utils.py 31 4 10 2 85% 51->52, 52, 55->56, 56-58
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 189->191
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 13 108 7 95% 71->72, 72, 77-78, 123->124, 124, 131-132, 202->204, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 18 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 86 260 50 79% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 168->170, 215->219, 304->307, 305->304, 307
nvtabular/tools/dataset_inspector.py 80 15 36 2 73% 30->32, 32-39, 80->81, 81-97
nvtabular/tools/inspector_script.py 17 17 0 0 0% 17-75
nvtabular/utils.py 27 4 10 3 81% 26->27, 27, 28->31, 31, 37->38, 38, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3602 597 1525 178 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.28%
========== 551 passed, 8 skipped, 2177 warnings in 488.69s (0:08:08) ===========
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins7097503565645035849.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 8111e3e78efb2dcb5094177d720729acb68582fb, no merge conflicts.
Running as SYSTEM
Setting status of 8111e3e78efb2dcb5094177d720729acb68582fb to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1500/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 8111e3e78efb2dcb5094177d720729acb68582fb^{commit} # timeout=10
Checking out Revision 8111e3e78efb2dcb5094177d720729acb68582fb (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8111e3e78efb2dcb5094177d720729acb68582fb # timeout=10
Commit message: "Data gen and data inspect work together"
 > git rev-list --no-walk 8111e3e78efb2dcb5094177d720729acb68582fb # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins8781814357731727211.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+28.g8111e3e
    Uninstalling nvtabular-0.3.0+28.g8111e3e:
      Successfully uninstalled nvtabular-0.3.0+28.g8111e3e
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
83 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 559 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_io.py .................................................. [ 19%]
........................................ssssssss [ 28%]
tests/unit/test_notebooks.py .... [ 28%]
tests/unit/test_ops.py ................................................. [ 37%]
........................................................................ [ 50%]
.................................... [ 56%]
tests/unit/test_s3.py .. [ 57%]
tests/unit/test_tf_dataloader.py ................... [ 60%]
tests/unit/test_tf_layers.py ........................................... [ 68%]
................................... [ 74%]
tests/unit/test_tools.py ....................... [ 78%]
tests/unit/test_torch_dataloader.py .............................. [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43883 instead
http_address["port"], self.http_server.port

tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_tools.py: 1416 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46197 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py: 708 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:556: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
[x.dtype for x in self._data.columns], index=self._data.names

tests/unit/test_tools.py::test_full_df[None-1000]
tests/unit/test_tools.py::test_full_df[None-100000]
tests/unit/test_tools.py::test_full_df[distro1-1000]
tests/unit/test_tools.py::test_full_df[distro1-100000]
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/utils.py:47: UserWarning: get_memory_info is not supported. Using total device memory from NVML.
warnings.warn("get_memory_info is not supported. Using total device memory from NVML.")

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35651 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33635 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39797 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42993 instead
http_address["port"], self.http_server.port

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45851 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 19 80 7 85% 53->54, 54, 86->87, 87, 97->98, 98, 101->103, 127->128, 128, 151-164, 188->191, 191, 275->278, 278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 1 12 1 95% 46->47, 47
nvtabular/framework_utils/torch/models.py 38 0 22 0 100%
nvtabular/framework_utils/torch/utils.py 31 4 10 2 85% 51->52, 52, 55->56, 56-58
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 189->191
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 13 108 7 95% 71->72, 72, 77-78, 123->124, 124, 131-132, 202->204, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 18 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 86 260 50 79% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 168->170, 215->219, 304->307, 305->304, 307
nvtabular/tools/dataset_inspector.py 80 15 36 2 73% 30->32, 32-39, 80->81, 81-97
nvtabular/tools/inspector_script.py 17 17 0 0 0% 17-75
nvtabular/utils.py 27 4 10 3 81% 26->27, 27, 28->31, 31, 37->38, 38, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3602 597 1525 178 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.28%
========== 551 passed, 8 skipped, 2177 warnings in 488.35s (0:08:08) ===========
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins844357304054879981.sh

Comment on lines 41 to 192
cluster = LocalCUDACluster()
client = Client(cluster)
Member

We have test fixtures for these - can you use that instead? https://github.com/NVIDIA/NVTabular/blob/7a8fdd7f584f3c7ca3a0acf6b61c4493e3438255/tests/conftest.py#L63-L67

The danger here is that if you throw an exception in this method the cluster won't get shutdown, causing problems in other tests.
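
For illustration, the kind of fixture being suggested might look like this (a minimal sketch, assuming dask_cuda and distributed are installed; the actual fixture lives in tests/conftest.py):

import pytest
from dask_cuda import LocalCUDACluster
from distributed import Client

@pytest.fixture(scope="module")
def client():
    # Teardown code after `yield` runs even when a test raises,
    # so the cluster is always shut down.
    cluster = LocalCUDACluster()
    client = Client(cluster)
    yield client
    client.close()
    cluster.close()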

Dask dataframe with the data
ddf : dask.dataframe.DataFrame
Dask dataframe with the correct dtypes
col: string
Member

Can we use the numpy docstring guide here: https://numpydoc.readthedocs.io/en/latest/format.html#method-docstrings (see parameters)?

Suggested change
col: string
col: str

Contributor Author

Sorry about that, I will use the guide.
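
For reference, a numpydoc-style Parameters section for the snippet above could read as follows (a sketch only; the function name inspect_col is hypothetical, and the parameter names are taken from the diff context):

def inspect_col(ddf, col, data, col_type):
    """Compute statistics for a single column.

    Parameters
    ----------
    ddf : dask.dataframe.DataFrame
        Dask dataframe with the correct dtypes.
    col : str
        Column to process.
    data : dict
        Dictionary to store the output stats.
    col_type : str
        Column type (i.e. cat, cont, label).
    """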

Dask dataframe with the correct dtypes
col: string
Col to process
data: Dictionary
Member

Suggested change
data: Dictionary
data: dict

Col to process
data: Dictionary
Dictionary to store the output stats
col_type: tring
Member

Suggested change
col_type: tring
col_type: str

Dictionary to store the output stats
col_type: tring
Column type (i.e cat, cont, label)
key_names: Dictionary
Member

Suggested change
key_names: Dictionary
key_names: dict

"""
Parameters
-----------
path: str, list of str, or <dask.dataframe|cudf|pd>.DataFrame
Member

Why not use an nvtabular.Dataset object here? That wraps all the functionality you need (it takes a dask dataframe / path / cudf dataframe / pandas dataframe)

Contributor

Dataset is the nvtabular.io.Dataset, I think, based on the imports.

Contributor

The paths get turned into an NVT Dataset. We ask for the paths so that the class can be used from the command line to generate a JSON file.

Contributor Author

We are using the Dataset object here: dataset = Dataset(path, engine=dataset_format).

Member

What's the advantage here of taking a path/format and then immediately converting to a Dataset? Can't we just pass the dataset object in directly?
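
For context, the Dataset wrapper discussed above accepts any of those inputs, so either call style works (a sketch based on the line quoted earlier in this thread; the path and engine values are placeholders):

from nvtabular.io import Dataset

path = "data/*.parquet"     # placeholder path
dataset_format = "parquet"  # engine string, e.g. "parquet" or "csv"

# From file paths, as the inspector script does today
dataset = Dataset(path, engine=dataset_format)

# Or directly from an existing dask/cudf/pandas dataframe:
# dataset = Dataset(ddf)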

# Stop Dask Cluster
client.shutdown()
cluster.close()
Member

we should use a context manager for this - to make sure we get shutdown/close on exceptions
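
A minimal sketch of that pattern (both LocalCUDACluster and Client support the context-manager protocol, so shutdown/close happen even if an exception is raised inside the block):

from dask_cuda import LocalCUDACluster
from distributed import Client

with LocalCUDACluster() as cluster:
    with Client(cluster) as client:
        # run the inspection work here; both objects are
        # closed automatically when the block exits
        ...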

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 78b7571f39b0cbf8b1ec6aa47b3c66181a9a436c, no merge conflicts.
Running as SYSTEM
Setting status of 78b7571f39b0cbf8b1ec6aa47b3c66181a9a436c to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1504/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 78b7571f39b0cbf8b1ec6aa47b3c66181a9a436c^{commit} # timeout=10
Checking out Revision 78b7571f39b0cbf8b1ec6aa47b3c66181a9a436c (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 78b7571f39b0cbf8b1ec6aa47b3c66181a9a436c # timeout=10
Commit message: "Initial Stats computation as an operator"
 > git rev-list --no-walk c6c8faadcadae36f680cf2efd6cec8bf042fd483 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins2072347295624927623.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+29.g78b7571
    Uninstalling nvtabular-0.3.0+29.g78b7571:
      Successfully uninstalled nvtabular-0.3.0+29.g78b7571
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
84 files would be left unchanged.
./tests/unit/test_tools.py:12:1: F401 'nvtabular.io.Dataset' imported but unused
./tests/unit/test_tools.py:13:1: F401 'tests.conftest.client' imported but unused
./nvtabular/tools/dataset_inspector.py:17:1: F401 'contextlib.contextmanager' imported but unused
./nvtabular/tools/dataset_inspector.py:19:1: F401 'cudf' imported but unused
./nvtabular/tools/dataset_inspector.py:23:1: F401 'dask_cuda.LocalCUDACluster' imported but unused
./nvtabular/tools/dataset_inspector.py:32:21: F821 undefined name 'LocalCluster'
./nvtabular/tools/dataset_inspector.py:74:9: F841 local variable 'cats' is assigned to but never used
./nvtabular/tools/dataset_inspector.py:75:9: F841 local variable 'conts' is assigned to but never used
./nvtabular/tools/dataset_inspector.py:76:9: F841 local variable 'labels' is assigned to but never used
./nvtabular/tools/dataset_inspector.py:92:20: F821 undefined name 'all_cols'
./nvtabular/tools/dataset_inspector.py:102:24: F821 undefined name 'all_cols'
./nvtabular/ops/data_stats.py:28:6: F821 undefined name 'annotate'
./nvtabular/ops/data_stats.py:29:28: F821 undefined name 'ColumnNames'
./nvtabular/ops/data_stats.py:41:20: F821 undefined name 'ddf_dtypes'
./nvtabular/ops/data_stats.py:69:13: F841 local variable 'mean_val' is assigned to but never used
./nvtabular/ops/data_stats.py:99:13: F821 undefined name 'output'
./nvtabular/ops/data_stats.py:107:5: F821 undefined name 'transform'
./nvtabular/ops/data_stats.py:107:25: F821 undefined name 'Operator'
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins5151767239707400864.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 510e821db2a5b68be7c21092b54bf90f45d2b26a, no merge conflicts.
Running as SYSTEM
Setting status of 510e821db2a5b68be7c21092b54bf90f45d2b26a to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1512/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 510e821db2a5b68be7c21092b54bf90f45d2b26a^{commit} # timeout=10
Checking out Revision 510e821db2a5b68be7c21092b54bf90f45d2b26a (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 510e821db2a5b68be7c21092b54bf90f45d2b26a # timeout=10
Commit message: "Improves but still error"
 > git rev-list --no-walk e45c08fa4242ed7f390441f915b804e578d947b6 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins7264767334017863537.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+30.g510e821
    Uninstalling nvtabular-0.3.0+30.g510e821:
      Successfully uninstalled nvtabular-0.3.0+30.g510e821
  Running setup.py develop for nvtabular
Successfully installed nvtabular
would reformat /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/data_stats.py
would reformat /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/tools/dataset_inspector.py
Oh no! 💥 💔 💥
2 files would be reformatted, 82 files would be left unchanged.
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins4447000565955154219.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 827f7c42e7110062a24e124257c485bb219ec494, no merge conflicts.
Running as SYSTEM
Setting status of 827f7c42e7110062a24e124257c485bb219ec494 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1513/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 827f7c42e7110062a24e124257c485bb219ec494^{commit} # timeout=10
Checking out Revision 827f7c42e7110062a24e124257c485bb219ec494 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 827f7c42e7110062a24e124257c485bb219ec494 # timeout=10
Commit message: "Removes list support to simplify"
 > git rev-list --no-walk 510e821db2a5b68be7c21092b54bf90f45d2b26a # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins6520956366340217961.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+31.g827f7c4
    Uninstalling nvtabular-0.3.0+31.g827f7c4:
      Successfully uninstalled nvtabular-0.3.0+31.g827f7c4
  Running setup.py develop for nvtabular
Successfully installed nvtabular
would reformat /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/data_stats.py
would reformat /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/tools/dataset_inspector.py
Oh no! 💥 💔 💥
2 files would be reformatted, 82 files would be left unchanged.
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins1967979468793437349.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188, no merge conflicts.
Running as SYSTEM
Setting status of 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1518/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188^{commit} # timeout=10
Checking out Revision 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188 # timeout=10
Commit message: "Tests inspect-datagen working"
 > git rev-list --no-walk d8eb85c4d049d9e07509cd1da7c0515dddf73027 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins6924596389324111943.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+18.g10ee22c
ERROR: Exception:
Traceback (most recent call last):
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 228, in _main
    status = self.run(options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 406, in run
    pycompile=options.compile,
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 76, in install_given_reqs
    auto_confirm=True
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 685, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_uninstall.py", line 545, in from_dist
    link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /var/jenkins_home/workspace/nvtab_integration/nvtabular does not match installed location of nvtabular (at /var/jenkins_home/workspace/nvtab_docs/nvtabular)
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins6785374982083974648.sh

@albert17 albert17 requested a review from benfred January 23, 2021 11:05
@albert17
Contributor Author

@benfred I applied all the changes.

@jperez999 It looks like CI is broken.

@albert17
Contributor Author

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188, no merge conflicts.
Running as SYSTEM
Setting status of 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1519/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188^{commit} # timeout=10
Checking out Revision 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188 # timeout=10
Commit message: "Tests inspect-datagen working"
 > git rev-list --no-walk 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins4792261139226930572.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+18.g10ee22c
ERROR: Exception:
Traceback (most recent call last):
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 228, in _main
    status = self.run(options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 406, in run
    pycompile=options.compile,
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 76, in install_given_reqs
    auto_confirm=True
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 685, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_uninstall.py", line 545, in from_dist
    link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /var/jenkins_home/workspace/nvtab_integration/nvtabular does not match installed location of nvtabular (at /var/jenkins_home/workspace/nvtab_docs/nvtabular)
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins7461381200271302484.sh

Member

@benfred benfred left a comment

Couple minor things here - but aside from that looks good.

"""
Parameters
-----------
path: str, list of str, or <dask.dataframe|cudf|pd>.DataFrame
Member

What's the advantage here of taking a path/format and then immediately converting to a Dataset? Can't we just pass the dataset object in directly?

Comment on lines 118 to 122
if col_type != "labels":
data[col_type][col][key_names["min"][col_type]] = output[col]["min"]
data[col_type][col][key_names["max"][col_type]] = output[col]["max"]
data[col_type][col][key_names["mean"][col_type]] = output[col]["mean"]
if col_type == "conts":
Member

Nitpick: the key_names structure is confusing - and removing it in favour of an if/elif will reduce the number of lines slightly:

Suggested change
if col_type != "labels":
data[col_type][col][key_names["min"][col_type]] = output[col]["min"]
data[col_type][col][key_names["max"][col_type]] = output[col]["max"]
data[col_type][col][key_names["mean"][col_type]] = output[col]["mean"]
if col_type == "conts":
if col_type == "cats":
data[col_type][col]["min_entry_size"] = output[col]["min"]
data[col_type][col]["max_entry_size"] = output[col]["max"]
data[col_type][col]["avg_entry_size"] = output[col]["mean"]
elif col_type == "conts":
data[col_type][col]["min_val"] = output[col]["min"]
data[col_type][col]["max_val"] = output[col]["max"]
data[col_type][col]["mean"] = output[col]["mean"]

Comment on lines 47 to 50
if np.issubdtype(dtype, np.float):
    col_type = "cont"
else:
    col_type = "cat"
Member

Can we assume that all integers are categorical columns? What if the column is an integer representing something like the user's age?

Also - do we need a cat/cont/label breakdown at all here? Can we just calculate different statistics based on the dtype of the column?
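
One possible shape for that dtype-driven approach (a sketch only, not the PR's implementation; note that np.issubdtype checks are usually written against np.floating/np.integer rather than the deprecated np.float alias):

import numpy as np

def stats_for_dtype(dtype):  # hypothetical helper
    # Choose which statistics to compute purely from the column dtype.
    if np.issubdtype(dtype, np.floating):
        return ["min", "max", "mean", "std"]  # continuous
    if np.issubdtype(dtype, np.integer):
        return ["min", "max", "nunique"]      # ambiguous: cat or cont
    return ["nunique", "entry_size"]          # strings -> categorical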

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit 71db921df009b9765d69e4d10f0070eb8b99c3b8, no merge conflicts.
Running as SYSTEM
Setting status of 71db921df009b9765d69e4d10f0070eb8b99c3b8 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1520/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 71db921df009b9765d69e4d10f0070eb8b99c3b8^{commit} # timeout=10
Checking out Revision 71db921df009b9765d69e4d10f0070eb8b99c3b8 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 71db921df009b9765d69e4d10f0070eb8b99c3b8 # timeout=10
Commit message: "Reestructures script and fixes review"
 > git rev-list --no-walk 99e1d9c8825a7cc0dcbcebc5fc34dfcd5d814188 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins8083792694137722375.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+18.g10ee22c
ERROR: Exception:
Traceback (most recent call last):
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 228, in _main
    status = self.run(options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 406, in run
    pycompile=options.compile,
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 76, in install_given_reqs
    auto_confirm=True
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 685, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_uninstall.py", line 545, in from_dist
    link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /var/jenkins_home/workspace/nvtab_integration/nvtabular does not match installed location of nvtabular (at /var/jenkins_home/workspace/nvtab_docs/nvtabular)
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins4579420982097147350.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit f71fd9d8d9f39ca9e4125f85eda14784a30bcd09, no merge conflicts.
Running as SYSTEM
Setting status of f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1521/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse f71fd9d8d9f39ca9e4125f85eda14784a30bcd09^{commit} # timeout=10
Checking out Revision f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 # timeout=10
Commit message: "All working"
 > git rev-list --no-walk 71db921df009b9765d69e4d10f0070eb8b99c3b8 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins3160668179631641617.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+18.g10ee22c
ERROR: Exception:
Traceback (most recent call last):
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 228, in _main
    status = self.run(options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 406, in run
    pycompile=options.compile,
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 76, in install_given_reqs
    auto_confirm=True
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 685, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/pip/_internal/req/req_uninstall.py", line 545, in from_dist
    link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /var/jenkins_home/workspace/nvtab_integration/nvtabular does not match installed location of nvtabular (at /var/jenkins_home/workspace/nvtab_docs/nvtabular)
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins3355861542452888696.sh

@albert17 albert17 requested a review from benfred January 26, 2021 00:47
@albert17
Contributor Author

@benfred Changes applied.

I have added more configuration options for the Dask cluster, and I moved that part out of the inspector into the script. I think it makes more sense to just pass a client to the Inspector rather than dealing with this inside it.
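
As a rough illustration of that direction (the DatasetInspector(client) signature below is an assumption based on this comment, not the confirmed constructor; the inspect call mirrors the unit test later in this thread):

from dask_cuda import LocalCUDACluster
from distributed import Client

# The script owns the cluster configuration...
cluster = LocalCUDACluster()
client = Client(cluster)

# ...and the inspector just receives a ready-to-use client
# (hypothetical signature; dataset and columns_dict are built
# the same way as in the unit test).
inspector = DatasetInspector(client)
inspector.inspect(dataset, columns_dict, "dataset_info.json")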

@benfred
Member

benfred commented Jan 26, 2021

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit f71fd9d8d9f39ca9e4125f85eda14784a30bcd09, no merge conflicts.
Running as SYSTEM
Setting status of f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1522/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse f71fd9d8d9f39ca9e4125f85eda14784a30bcd09^{commit} # timeout=10
Checking out Revision f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 # timeout=10
Commit message: "All working"
 > git rev-list --no-walk f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7532762471586864464.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+18.g10ee22c
    Can't uninstall 'nvtabular'. No files were found to uninstall.
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
84 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 559 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_io.py .................................................. [ 19%]
........................................ssssssss [ 28%]
tests/unit/test_notebooks.py .... [ 28%]
tests/unit/test_ops.py ................................................. [ 37%]
........................................................................ [ 50%]
..................................... [ 57%]
tests/unit/test_s3.py .. [ 57%]
tests/unit/test_tf_dataloader.py ................... [ 60%]
tests/unit/test_tf_layers.py ........................................... [ 68%]
................................... [ 74%]
tests/unit/test_tools.py ...................... [ 78%]
tests/unit/test_torch_dataloader.py .............................. [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
.Build timed out (after 20 minutes). Marking the build as failed.
..Build was aborted
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins8151429897635378600.sh

@benfred
Member

benfred commented Jan 26, 2021

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #521 of commit f71fd9d8d9f39ca9e4125f85eda14784a30bcd09, no merge conflicts.
Running as SYSTEM
Setting status of f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1523/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse f71fd9d8d9f39ca9e4125f85eda14784a30bcd09^{commit} # timeout=10
Checking out Revision f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 # timeout=10
Commit message: "All working"
 > git rev-list --no-walk f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins8861694820751636797.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+39.gf71fd9d
    Uninstalling nvtabular-0.3.0+39.gf71fd9d:
      Successfully uninstalled nvtabular-0.3.0+39.gf71fd9d
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
84 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 559 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_io.py .................................................. [ 19%]
........................................ssssssss [ 28%]
tests/unit/test_notebooks.py .... [ 28%]
tests/unit/test_ops.py ................................................. [ 37%]
........................................................................ [ 50%]
..................................... [ 57%]
tests/unit/test_s3.py .. [ 57%]
tests/unit/test_tf_dataloader.py ................... [ 60%]
tests/unit/test_tf_layers.py ........................................... [ 68%]
................................... [ 74%]
tests/unit/test_tools.py .....................F [ 78%]
tests/unit/test_torch_dataloader.py .............................. [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=================================== FAILURES ===================================
____________________ test_inspect_datagen[uniform-parquet] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-23/test_inspect_datagen_uniform_p0')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-23/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-23/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-23/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-23/parquet0')}
engine = 'parquet', dist = 'uniform'

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("dist", ["uniform"])
def test_inspect_datagen(tmpdir, datasets, engine, dist):
    # Dataset
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])

    # Dataset columns type config
    columns_dict = {}
    columns_dict["cats"] = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    columns_dict["conts"] = ["x", "y"]
    columns_dict["labels"] = ["label"]

    # Create inspector and inspect
    output_inspect1 = tmpdir + "/dataset_info1.json"
    dataset = Dataset(paths, engine=engine)
    a = datains.DatasetInspector()
    a.inspect(dataset, columns_dict, output_inspect1)
    assert os.path.isfile(output_inspect1)

    # Generate dataset using data_gen tool
    output_datagen = tmpdir + "/datagen"
    os.mkdir(output_datagen)
    with fsspec.open(output_inspect1) as f:
        output1 = json.load(f)
    cols = datagen._get_cols_from_schema(output1)
    if dist == "uniform":
        df_gen = datagen.DatasetGen(datagen.UniformDistro(), gpu_frac=0.00001)
    else:
        df_gen = datagen.DatasetGen(datagen.PowerLawDistro(0.1), gpu_frac=0.00001)

    output_datagen_files = df_gen.full_df_create(
        output1["num_rows"], cols, entries=True, output=output_datagen
    )

    # Inspect again and check output are the same
    output_inspect2 = tmpdir + "/dataset_info2.json"
    dataset = Dataset(output_datagen_files, engine=engine)
    a.inspect(dataset, columns_dict, output_inspect2)
    assert os.path.isfile(output_inspect2)

    # Compare json outputs
    with fsspec.open(output_inspect2) as f:
        output2 = json.load(f)
    for k1 in output1.keys():
        if k1 == "num_rows":
            assert output1[k1] == output2[k1]
        else:
            for k2 in output1[k1].keys():
                for k3 in output1[k1][k2].keys():
                    if k3 == "dtype":
                        if output1[k1][k2][k3] == "object":
                            assert (
                                output1[k1][k2][k3] == output2[k1][k2][k3]
                                or "int64" == output2[k1][k2][k3]
                            )
                        else:
                            assert output1[k1][k2][k3] == output2[k1][k2][k3]
                    else:
                        assert output1[k1][k2][k3] == pytest.approx(
                            output2[k1][k2][k3], rel=1e-1, abs=1e-1
                        )

E assert 5.279796343439019 == 4.76510067114094 ± 4.8e-01
E + where 4.76510067114094 ± 4.8e-01 = <function approx at 0x7f473a1a2cb0>(4.76510067114094, rel=0.1, abs=0.1)
E + where <function approx at 0x7f473a1a2cb0> = pytest.approx

tests/unit/test_tools.py:220: AssertionError
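
For the record, pytest.approx(expected, rel, abs) accepts values within max(rel * |expected|, abs) of expected. Here that tolerance is max(0.1 * 4.765, 0.1) ≈ 0.477, while the observed value differs by about 0.515, hence the failure:

import pytest

expected, observed = 4.76510067114094, 5.279796343439019
tolerance = max(0.1 * abs(expected), 0.1)    # ~0.477
assert abs(observed - expected) > tolerance  # ~0.515 > ~0.477
assert observed != pytest.approx(expected, rel=1e-1, abs=1e-1)
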
=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35563 instead
http_address["port"], self.http_server.port

tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_tools.py: 1416 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41379 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py: 708 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:556: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
[x.dtype for x in self._data.columns], index=self._data.names

tests/unit/test_tools.py::test_full_df[None-1000]
tests/unit/test_tools.py::test_full_df[None-100000]
tests/unit/test_tools.py::test_full_df[distro1-1000]
tests/unit/test_tools.py::test_full_df[distro1-100000]
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/utils.py:47: UserWarning: get_memory_info is not supported. Using total device memory from NVML.
warnings.warn("get_memory_info is not supported. Using total device memory from NVML.")

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39229 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 19 80 7 85% 53->54, 54, 86->87, 87, 97->98, 98, 101->103, 127->128, 128, 151-164, 188->191, 191, 275->278, 278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 1 12 1 95% 46->47, 47
nvtabular/framework_utils/torch/models.py 38 0 22 0 100%
nvtabular/framework_utils/torch/utils.py 31 4 10 2 85% 51->52, 52, 55->56, 56-58
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 189->191
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 13 108 7 95% 71->72, 72, 77-78, 123->124, 124, 131-132, 202->204, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 86 260 50 79% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/data_stats.py 57 2 24 4 93% 84->86, 86->88, 89->80, 92->80, 100, 103
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 168->170, 215->219, 304->307, 305->304, 307
nvtabular/tools/dataset_inspector.py 52 9 18 0 76% 30-39
nvtabular/tools/inspector_script.py 45 45 0 0 0% 17-168
nvtabular/utils.py 27 4 10 3 81% 26->27, 27, 28->31, 31, 37->38, 38, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3660 621 1531 180 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.14%
=========================== short test summary info ============================
FAILED tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet] - asse...
===== 1 failed, 550 passed, 8 skipped, 2173 warnings in 497.58s (0:08:17) ======
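The lone failure is an assertion in test_inspect_datagen[uniform-parquet]; the follow-up push below ("Increases error tolerance") loosens the comparison, which suggests the generated and inspected column statistics differed by more than the allowed margin. A minimal, purely illustrative sketch of that kind of tolerance check — the names here are hypothetical, not the test's actual code:

import math

def assert_stats_close(expected: float, actual: float, rel_tol: float = 1e-2) -> None:
    # Compare a generated column statistic against the value the inspector
    # recovered, allowing a small relative error instead of exact equality.
    assert math.isclose(expected, actual, rel_tol=rel_tol), (
        f"{actual} deviates from {expected} by more than rel_tol={rel_tol}"
    )

assert_stats_close(0.500, 0.496)  # passes at the 1% default tolerance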
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/logging/init.py", line 1028, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/rapids/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 417, in run_loop
loop.start()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/opt/conda/envs/rapids/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/nanny.py", line 456, in _on_exit
logger.warning("Restarting worker")
Message: 'Restarting worker'
Arguments: ()
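This "Logging error" traceback is benign teardown noise: dask's nanny thread logs "Restarting worker" after pytest has already closed the stream the log handler writes to. A small sketch that reproduces the same failure mode:

import io
import logging

stream = io.StringIO()
logger = logging.getLogger("demo")
logger.addHandler(logging.StreamHandler(stream))

stream.close()  # pytest closes its capture streams at session teardown
# The handler now hits "ValueError: I/O operation on closed file." and the
# logging module prints a "--- Logging error ---" block instead of raising.
logger.warning("Restarting worker")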
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5598061608533875040.sh
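The post-build step above hands test_res_push.py the issue-comments API URL and the build log; the script (not part of this repository's checkout) posts the log back to the PR as a bot comment. A plausible minimal sketch of such a script, assuming a requests-based POST and a token taken from the environment — the details are an assumption, not the script's actual source:

import os
import sys

import requests

comments_url, log_path = sys.argv[1], sys.argv[2]
with open(log_path, errors="replace") as f:
    body = f.read()[-60000:]  # GitHub caps a comment body at 65536 characters

resp = requests.post(
    comments_url,
    headers={"Authorization": "token " + os.environ["GITHUB_TOKEN"]},
    json={"body": body},
)
resp.raise_for_status()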

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #521 of commit 1d4b4e16944707581c3a3ab0fac91b1c0f0ce466, no merge conflicts.
Running as SYSTEM
Setting status of 1d4b4e16944707581c3a3ab0fac91b1c0f0ce466 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1524/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 1d4b4e16944707581c3a3ab0fac91b1c0f0ce466^{commit} # timeout=10
Checking out Revision 1d4b4e16944707581c3a3ab0fac91b1c0f0ce466 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 1d4b4e16944707581c3a3ab0fac91b1c0f0ce466 # timeout=10
Commit message: "Increases error tolerance"
 > git rev-list --no-walk f71fd9d8d9f39ca9e4125f85eda14784a30bcd09 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7319059244865873169.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
84 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 559 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_io.py .................................................. [ 19%]
........................................ssssssss [ 28%]
tests/unit/test_notebooks.py .... [ 28%]
tests/unit/test_ops.py ................................................. [ 37%]
........................................................................ [ 50%]
..................................... [ 57%]
tests/unit/test_s3.py .. [ 57%]
tests/unit/test_tf_dataloader.py ................... [ 60%]
tests/unit/test_tf_layers.py ........................................... [ 68%]
................................... [ 74%]
tests/unit/test_tools.py ...................... [ 78%]
tests/unit/test_torch_dataloader.py .............................. [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35811 instead
http_address["port"], self.http_server.port

tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_tools.py: 1416 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43095 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py: 708 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:556: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
[x.dtype for x in self._data.columns], index=self._data.names

tests/unit/test_tools.py::test_full_df[None-1000]
tests/unit/test_tools.py::test_full_df[None-100000]
tests/unit/test_tools.py::test_full_df[distro1-1000]
tests/unit/test_tools.py::test_full_df[distro1-100000]
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/utils.py:47: UserWarning: get_memory_info is not supported. Using total device memory from NVML.
warnings.warn("get_memory_info is not supported. Using total device memory from NVML.")

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46369 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 19 80 7 85% 53->54, 54, 86->87, 87, 97->98, 98, 101->103, 127->128, 128, 151-164, 188->191, 191, 275->278, 278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 1 12 1 95% 46->47, 47
nvtabular/framework_utils/torch/models.py 38 0 22 0 100%
nvtabular/framework_utils/torch/utils.py 31 4 10 2 85% 51->52, 52, 55->56, 56-58
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 189->191
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 13 108 7 95% 71->72, 72, 77-78, 123->124, 124, 131-132, 202->204, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 86 260 50 79% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/data_stats.py 57 2 24 4 93% 84->86, 86->88, 89->80, 92->80, 100, 103
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 168->170, 215->219, 304->307, 305->304, 307
nvtabular/tools/dataset_inspector.py 52 9 18 0 76% 30-39
nvtabular/tools/inspector_script.py 45 45 0 0 0% 17-168
nvtabular/utils.py 27 4 10 3 81% 26->27, 27, 28->31, 31, 37->38, 38, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3660 621 1531 180 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.14%
========== 551 passed, 8 skipped, 2173 warnings in 457.01s (0:07:37) ===========
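"Required test coverage of 70% reached" is pytest-cov enforcing a coverage floor: the run fails if total coverage drops below the threshold even when every test passes. A hedged sketch of an equivalent invocation (the flags are pytest-cov's; whether this CI sets them in setup.cfg or on the command line is not shown in the log):

import sys

import pytest

# Equivalent to: pytest --cov=nvtabular --cov-fail-under=70 tests/unit
sys.exit(pytest.main(["--cov=nvtabular", "--cov-fail-under=70", "tests/unit"]))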
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins6824405630993703858.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #521 of commit 635376ececa34948c08dc6c133e33bc3c5d097ee, no merge conflicts.
Running as SYSTEM
Setting status of 635376ececa34948c08dc6c133e33bc3c5d097ee to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1525/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/521/*:refs/remotes/origin/pr/521/* # timeout=10
 > git rev-parse 635376ececa34948c08dc6c133e33bc3c5d097ee^{commit} # timeout=10
Checking out Revision 635376ececa34948c08dc6c133e33bc3c5d097ee (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 635376ececa34948c08dc6c133e33bc3c5d097ee # timeout=10
Commit message: "All Working"
 > git rev-list --no-walk 1d4b4e16944707581c3a3ab0fac91b1c0f0ce466 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins3994755179524787056.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.3.0+39.g635376e
    Uninstalling nvtabular-0.3.0+39.g635376e:
      Successfully uninstalled nvtabular-0.3.0+39.g635376e
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
84 files would be left unchanged.
/opt/conda/envs/rapids/lib/python3.7/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, asyncio-0.12.0, hypothesis-5.37.4, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.2.0
collected 559 items

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 9%]
........ [ 10%]
tests/unit/test_io.py .................................................. [ 19%]
........................................ssssssss [ 28%]
tests/unit/test_notebooks.py .... [ 28%]
tests/unit/test_ops.py ................................................. [ 37%]
........................................................................ [ 50%]
..................................... [ 57%]
tests/unit/test_s3.py .. [ 57%]
tests/unit/test_tf_dataloader.py ................... [ 60%]
tests/unit/test_tf_layers.py ........................................... [ 68%]
................................... [ 74%]
tests/unit/test_tools.py ...................... [ 78%]
tests/unit/test_torch_dataloader.py .............................. [ 84%]
tests/unit/test_workflow.py ............................................ [ 91%]
............................................. [100%]

=============================== warnings summary ===============================
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
../../../../../opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219
/opt/conda/envs/rapids/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_column_group.py::test_nested_column_group
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))

tests/unit/test_io.py::test_hugectr[True-0-op_columns0-parquet-hugectr]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40037 instead
http_address["port"], self.http_server.port

tests/unit/test_io.py: 5 warnings
tests/unit/test_tf_dataloader.py: 24 warnings
tests/unit/test_tools.py: 1416 warnings
tests/unit/test_torch_dataloader.py: 6 warnings
tests/unit/test_workflow.py: 2 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:672: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46529 instead
http_address["port"], self.http_server.port

tests/unit/test_tools.py: 708 warnings
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:556: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
[x.dtype for x in self._data.columns], index=self._data.names

tests/unit/test_tools.py::test_full_df[None-1000]
tests/unit/test_tools.py::test_full_df[None-100000]
tests/unit/test_tools.py::test_full_df[distro1-1000]
tests/unit/test_tools.py::test_full_df[distro1-100000]
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/utils.py:47: UserWarning: get_memory_info is not supported. Using total device memory from NVML.
warnings.warn("get_memory_info is not supported. Using total device memory from NVML.")

tests/unit/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40585 instead
http_address["port"], self.http_server.port

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 144 19 80 7 85% 53->54, 54, 86->87, 87, 97->98, 98, 101->103, 127->128, 128, 151-164, 188->191, 191, 275->278, 278
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 13-17, 54-288
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 47->56, 56, 64->45, 99->100, 100, 107->108, 108, 187->188, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 48->49, 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 22-23, 26-45, 56-69, 72
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 1 12 1 95% 46->47, 47
nvtabular/framework_utils/torch/models.py 38 0 22 0 100%
nvtabular/framework_utils/torch/utils.py 31 4 10 2 85% 51->52, 52, 55->56, 56-58
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 78 78 26 0 0% 16-175
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 82 3 34 7 91% 129->127, 157->160, 167->168, 168, 172->174, 174->170, 178->179, 179, 180->181, 181
nvtabular/io/dataframe_engine.py 12 1 4 1 88% 31->32, 32
nvtabular/io/dataset.py 134 18 56 10 84% 196->197, 197, 198->199, 199, 209->210, 210, 218->219, 219, 227->250, 232->236, 236-250, 325->326, 326, 465->466, 466-467, 495->496, 496-497, 515->516, 516
nvtabular/io/dataset_engine.py 13 0 0 0 100%
nvtabular/io/hugectr.py 45 2 22 2 91% 27->32, 32, 72->95, 99
nvtabular/io/parquet.py 124 2 40 2 98% 54->55, 55-63, 189->191
nvtabular/io/shuffle.py 25 7 10 2 63% 37->40, 38->39, 39-46
nvtabular/io/writer.py 123 9 45 2 92% 30, 47, 71->72, 72, 110, 113, 181->182, 182, 203-205
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 260 13 108 7 95% 71->72, 72, 77-78, 123->124, 124, 131-132, 202->204, 240->241, 241-242, 350->351, 351, 352->355, 355-356, 449->450, 450, 457
nvtabular/loader/tensorflow.py 117 8 52 7 90% 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 281->282, 282, 309->313, 341->342, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 58-60, 65->66, 66-70
nvtabular/loader/torch.py 41 10 8 0 67% 25-27, 30-36
nvtabular/ops/__init__.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 45->46, 46, 47->49, 49-52
nvtabular/ops/categorify.py 463 86 260 50 79% 203->204, 204, 220->221, 221, 224->225, 225, 230->233, 233, 244->249, 249, 347->348, 348, 411->413, 416->417, 417, 418->419, 419, 467->468, 468-470, 472->473, 473, 474->475, 475, 497->500, 500, 510->511, 511, 518->522, 522, 538->541, 541->542, 542-544, 546->547, 547-548, 550->551, 551-552, 554->555, 555-571, 573->577, 577, 581->582, 582, 583->584, 584, 591->592, 592, 593->594, 594, 599->600, 600, 609->616, 616-617, 621->622, 622, 634->635, 635, 636->640, 640, 643->661, 661-664, 687->688, 688, 691->692, 692, 693->694, 694, 704->706, 706-709, 816->817, 817, 818->819, 819, 856->871, 857->867, 861->871, 867-869, 871->872, 872-877, 899->900, 900, 911->912, 912, 928->933, 931->932, 932, 942->939, 947->939, 954->955, 955, 970->976, 972->973, 973, 976-986
nvtabular/ops/clip.py 19 2 6 3 80% 43->44, 44, 52->54, 54->55, 55
nvtabular/ops/column_similarity.py 86 22 32 5 69% 79->84, 84, 154-155, 164-166, 174-190, 205->215, 207->210, 210->211, 211, 220->221, 221
nvtabular/ops/data_stats.py 57 2 24 4 93% 84->86, 86->88, 89->80, 92->80, 100, 103
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 40 4 6 1 89% 75->76, 76, 98, 101, 104
nvtabular/ops/filter.py 21 1 6 1 93% 43->44, 44
nvtabular/ops/hash_bucket.py 31 2 18 2 88% 70->73, 73, 98->102, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51->52, 52, 66->67, 67, 81->82, 81->exit, 82
nvtabular/ops/join_external.py 66 5 28 6 88% 93->94, 94, 95->96, 96, 110->113, 113, 126->130, 148->150, 150, 164->165, 165
nvtabular/ops/join_groupby.py 77 5 28 2 93% 99->100, 100, 103->110, 174, 177, 180-181
nvtabular/ops/lambdaop.py 27 4 10 4 78% 60->61, 61, 64->65, 65, 72->73, 73, 74->78, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 62 0 18 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 63->62, 105->107, 107-108, 128, 131-132, 135-136
nvtabular/ops/operator.py 15 1 2 1 88% 22->24, 24
nvtabular/ops/rename.py 18 3 10 3 71% 40->41, 41, 53->54, 54, 55->58, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 12 66 6 90% 139->140, 140, 160->164, 167->176, 219, 222-223, 226-227, 235->236, 236-242, 303->306, 332->335
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 168->170, 215->219, 304->307, 305->304, 307
nvtabular/tools/dataset_inspector.py 52 9 18 0 76% 30-39
nvtabular/tools/inspector_script.py 45 45 0 0 0% 17-168
nvtabular/utils.py 27 4 10 3 81% 26->27, 27, 28->31, 31, 37->38, 38, 53
nvtabular/worker.py 65 1 30 3 96% 69->70, 70, 77->97, 80->92
nvtabular/workflow.py 127 10 72 7 91% 32->33, 33, 110->112, 112, 124-126, 163->164, 164, 196->197, 197, 200->201, 201, 274->275, 275, 283->284, 284

TOTAL 3660 621 1531 180 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.14%
========== 551 passed, 8 skipped, 2173 warnings in 454.22s (0:07:34) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5330818293151064816.sh

@albert17 albert17 merged commit db22a41 into NVIDIA-Merlin:main Jan 26, 2021
benfred added a commit to benfred/NVTabular that referenced this pull request Jan 30, 2021
It looks like some files got committed recently (in NVIDIA-Merlin#521) with Windows-style carriage-return/linefeed endings instead of just the linefeed used in the rest of the codebase. Fix so we don't generate massive whitespace diffs on every commit.
benfred added a commit that referenced this pull request Jan 30, 2021
It looks like some files got committed recently (in #521) with Windows-style carriage-return/linefeed endings instead of just the linefeed used in the rest of the codebase. Fix so we don't generate massive whitespace diffs on every commit.
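One way to surface the files the commit above describes is to scan the tree for CRLF sequences; a small illustrative sketch (the package path is this repo's, everything else is an assumption):

from pathlib import Path

# Report any Python sources that still carry Windows-style CRLF line endings.
for path in sorted(Path("nvtabular").rglob("*.py")):
    if b"\r\n" in path.read_bytes():
        print(path)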
@albert17 albert17 deleted the data-inspect branch April 20, 2021 16:57
mikemckiernan pushed a commit that referenced this pull request Nov 24, 2022
* Initial commit in new branch

* Adds unit test

* Updates json output and multihot calculation

* Updates list processing

* Updates test

* Adds cudf issue

* Data inspector ready

* Test works

* Dataset inspect read - Tests passing

* Moves dataset inspector script

* Initial inspect-datagen test

* Data gen and data inspect work together

* Initial Stats computation as an operator

* Improves but still error

* Removes list support to simplify

* Different Series type for computations

* Cleans and use attributes

* Data Stats Operator working

* Tests inspect-datagen working

* Restructures script and addresses review feedback

* All Working
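The squashed history above traces the tool from a standalone script to a DataStats operator that runs inside a workflow and feeds the dataset inspector. As a rough illustration of the per-column summary such an inspector emits, here is a hedged sketch with pandas standing in for cudf — the function name and JSON layout are illustrative, not nvtabular's actual output format:

import json

import pandas as pd

def inspect(df: pd.DataFrame, cats: list, conts: list, out_path: str) -> None:
    stats = {}
    for col in cats:  # categorical columns: dtype and cardinality
        stats[col] = {
            "dtype": str(df[col].dtype),
            "cardinality": int(df[col].nunique()),
        }
    for col in conts:  # continuous columns: range and simple moments
        s = df[col]
        stats[col] = {
            "dtype": str(s.dtype),
            "min": float(s.min()),
            "max": float(s.max()),
            "mean": float(s.mean()),
            "std": float(s.std()),
        }
    with open(out_path, "w") as f:
        json.dump(stats, f, indent=2)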
mikemckiernan pushed a commit that referenced this pull request Nov 24, 2022
It looks like some files got committed recently (in #521) with Windows-style carriage-return/linefeed endings instead of just the linefeed used in the rest of the codebase. Fix so we don't generate massive whitespace diffs on every commit.
Successfully merging this pull request may close these issues.

[FEA] Dataset Generation tool