Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix segmented_gather() for null LIST rows #9537

Merged

Conversation

mythrocks
Copy link
Contributor

segmented_gather() currently assumes that null LIST rows also have
a 0 size (as defined by the difference of adjacent offsets.)

This might not hold, for example, for LIST columns that are members
of STRUCT columns whose parent null masks are superimposed on its
children. This would cause a non-empty list row to be marked null,
without compaction. This leads to errors in fetching elements of a
list row as seen in NVIDIA/spark-rapids/pull/3770.

This commit adds the handling of uncompacted LIST rows in
segmented_gather().

`segmented_gather()` currently assumes that null LIST rows also have
a `0` size (as defined by the difference of adjacent offsets.)

This might not hold, for example, for LIST columns that are members
of STRUCT columns whose parent null masks are superimposed on its
children. This would cause a non-empty list row to be marked null,
without compaction. This leads to errors in fetching elements of a
list row as seen in NVIDIA/spark-rapids/pull/3770.

This commit adds the handling of uncompacted LIST rows in
`segmented_gather()`.
@mythrocks mythrocks self-assigned this Oct 27, 2021
@mythrocks mythrocks requested a review from a team as a code owner October 27, 2021 20:11
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Oct 27, 2021
@mythrocks mythrocks added bug Something isn't working non-breaking Non-breaking change and removed libcudf Affects libcudf (C++/CUDA) code. labels Oct 27, 2021
Copy link
Contributor

@codereport codereport left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@codecov
Copy link

codecov bot commented Oct 27, 2021

Codecov Report

Merging #9537 (6e3fd8d) into branch-21.12 (ab4bfaa) will decrease coverage by 0.12%.
The diff coverage is n/a.

❗ Current head 6e3fd8d differs from pull request most recent head 1778377. Consider uploading reports for the commit 1778377 to get more accurate results
Impacted file tree graph

@@               Coverage Diff                @@
##           branch-21.12    #9537      +/-   ##
================================================
- Coverage         10.79%   10.66%   -0.13%     
================================================
  Files               116      117       +1     
  Lines             18869    19735     +866     
================================================
+ Hits               2036     2104      +68     
- Misses            16833    17631     +798     
Impacted Files Coverage Δ
python/dask_cudf/dask_cudf/sorting.py 92.90% <0.00%> (-1.21%) ⬇️
python/cudf/cudf/io/csv.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/hdf.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/orc.py 0.00% <0.00%> (ø)
python/cudf/cudf/__init__.py 0.00% <0.00%> (ø)
python/cudf/cudf/_version.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/abc.py 0.00% <0.00%> (ø)
python/cudf/cudf/api/types.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/dlpack.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <0.00%> (ø)
... and 68 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c0951ba...1778377. Read the comment docs.

@mythrocks
Copy link
Contributor Author

Rerun tests

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Oct 29, 2021
@davidwendt
Copy link
Contributor

Could you run this with compute-sanitizer?

compute-sanitizer --tool memcheck gtests/COPYING_TEST --gtest_filter=SegmentedGatherTest* --rmm_mode=cuda

Same for the extract tests too.

@mythrocks
Copy link
Contributor Author

Could you run this with compute-sanitizer?

compute-sanitizer --tool memcheck gtests/COPYING_TEST --gtest_filter=SegmentedGatherTest* --rmm_mode=cuda

Same for the extract tests too.

Thank you for suggesting the memcheck test:

$ git:(extract-list-element-nulls) ✗ compute-sanitizer --tool memcheck gtests/COPYING_TEST --gtest_filter=SegmentedGatherTest\* --rmm_mode=cuda
...

[----------] Global test environment tear-down
[==========] 203 tests from 22 test suites ran. (142742 ms total)
[  PASSED  ] 203 tests.
========= ERROR SUMMARY: 0 errors

$ git:(extract-list-element-nulls) ✗ compute-sanitizer --tool memcheck gtests/LISTS_TEST --gtest_filter=ListsExtract\* --rmm_mode=cuda 
...
[----------] Global test environment tear-down
[==========] 80 tests from 35 test suites ran. (25638 ms total)
[  PASSED  ] 80 tests.
========= ERROR SUMMARY: 0 errors

@davidwendt
Copy link
Contributor

rerun tests

@mythrocks
Copy link
Contributor Author

Rerun tests

@galipremsagar
Copy link
Contributor

rerun tests

@mythrocks
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 2ef82f2 into rapidsai:branch-21.12 Nov 4, 2021
@mythrocks
Copy link
Contributor Author

@davidwendt, @codereport, @ttnghia: Thank you for the reviews.
@galipremsagar: Thank you for sorting out the 11.5 CI failure.

I have merged this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants