Fix segmented-sort to ignore indices outside the offsets #11888

Merged
rapids-bot merged 6 commits into rapidsai:branch-22.12 from bug-seg-sort-offset-gaps on Oct 13, 2022

Conversation

davidwendt
Contributor

Description

Fixes cudf::segmented_sorted_order so that it ignores indices outside the range covered by the specified offsets.

The segmented-sort function in general sorts subsets of the input using a column of offsets (integers) to identify the position of each segment. Here is an example:

input    = { 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }
offsets1 = { 0,       3,          7,      10 }

There are 3 segments to sort: [0,3), [3,7), and [7,10)
Segment 1 sorts to { 7, 8, 9 }
Segment 2 sorts to { 3, 4, 5, 6 }
Segment 3 sorts to { 0, 1, 2 }
The segmented-sort result is { 7, 8, 9, 3, 4, 5, 6, 0, 1, 2 }
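The example above can be reproduced with a small host-side reference. This is only a sketch in standard C++, not the libcudf implementation (which runs on the GPU); the function name `segmented_sort_reference` is hypothetical and is used here purely for illustration. It treats each adjacent pair of offsets as a half-open range and sorts that range independently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference, NOT the libcudf implementation.
// Sorts each half-open segment [offsets[i], offsets[i+1]) of `values`
// independently and returns the result.
std::vector<int> segmented_sort_reference(std::vector<int> values,
                                          std::vector<std::size_t> const& offsets)
{
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    std::size_t const begin = offsets[i];
    std::size_t const end   = offsets[i + 1];
    // Assumed precondition: offsets are non-decreasing and within bounds.
    assert(begin <= end && end <= values.size());
    std::sort(values.begin() + static_cast<std::ptrdiff_t>(begin),
              values.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return values;
}
```

Calling it with `input` and `offsets1` from the example yields `{ 7, 8, 9, 3, 4, 5, 6, 0, 1, 2 }`.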

If the offsets do not cover the entire input, the segmented sort should ignore any elements that fall outside the range of the offsets.

input    = { 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }
offsets2 = {          3,          7       }

Here there is only 1 segment to sort: [3,7) => { 3, 4, 5, 6 }
The segmented-sort result is { 9, 8, 7, 3, 4, 5, 6, 2, 1, 0 }
The values before the first offset and after the last offset should be left unchanged.
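The expected behavior for partial offsets can be sketched on the host in terms of a sorted order (a gather map of row indices). The sketch assumes that a sorted-order function returns indices rather than sorted values, as the name cudf::segmented_sorted_order suggests. Both function names below are hypothetical, and this is standard C++ rather than the libcudf code. Indices outside the offsets keep their identity mapping, so gathering with the result leaves those elements where they were.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical host-side reference, NOT the libcudf implementation.
// Returns a gather map: within each segment [offsets[i], offsets[i+1]) the
// indices are ordered by ascending value; all other indices map to themselves.
std::vector<std::size_t> segmented_sorted_order_reference(
  std::vector<int> const& values, std::vector<std::size_t> const& offsets)
{
  std::vector<std::size_t> order(values.size());
  std::iota(order.begin(), order.end(), std::size_t{0});  // identity mapping
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    std::size_t const begin = offsets[i];
    std::size_t const end   = offsets[i + 1];
    // Assumed precondition: offsets are non-decreasing and within bounds.
    assert(begin <= end && end <= values.size());
    std::stable_sort(order.begin() + static_cast<std::ptrdiff_t>(begin),
                     order.begin() + static_cast<std::ptrdiff_t>(end),
                     [&values](std::size_t a, std::size_t b) { return values[a] < values[b]; });
  }
  return order;
}

// Applies a gather map: result[i] = values[order[i]].
std::vector<int> gather_reference(std::vector<int> const& values,
                                  std::vector<std::size_t> const& order)
{
  std::vector<int> result;
  result.reserve(order.size());
  for (std::size_t idx : order) { result.push_back(values[idx]); }
  return result;
}
```

For `input` and `offsets2` above, the order is `{ 0, 1, 2, 6, 5, 4, 3, 7, 8, 9 }`, and gathering with it gives `{ 9, 8, 7, 3, 4, 5, 6, 2, 1, 0 }`.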

The gtests have been corrected to expect this behavior.
Also, the SegmentedReductionTestUntyped.PartialSegmentReduction gtest was improved to include offset gaps at the beginning and at the end to verify consistent behavior there as well.

Found while working on #11729

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

davidwendt added the bug, 2 - In Progress, libcudf, and non-breaking labels on Oct 10, 2022
davidwendt self-assigned this on Oct 10, 2022
davidwendt added the 3 - Ready for Review label and removed the 2 - In Progress label on Oct 10, 2022
davidwendt marked this pull request as ready for review on October 10, 2022 21:23
davidwendt requested a review from a team as a code owner on October 10, 2022 21:23

codecov bot commented Oct 10, 2022

Codecov Report

Base: 87.40% // Head: 88.11% // Increases project coverage by +0.70% 🎉

Coverage data is based on head (bbf6569) compared to base (f72c4ce).
Patch coverage: 86.42% of modified lines in pull request are covered.

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-22.12   #11888      +/-   ##
================================================
+ Coverage         87.40%   88.11%   +0.70%     
================================================
  Files               133      133              
  Lines             21833    21881      +48     
================================================
+ Hits              19084    19280     +196     
+ Misses             2749     2601     -148     
Impacted Files Coverage Δ
python/cudf/cudf/core/udf/__init__.py 97.05% <ø> (+47.05%) ⬆️
python/cudf/cudf/io/orc.py 92.94% <ø> (-0.09%) ⬇️
python/cudf/cudf/utils/ioutils.py 79.47% <ø> (ø)
...thon/dask_cudf/dask_cudf/tests/test_distributed.py 18.86% <ø> (+4.94%) ⬆️
python/cudf/cudf/core/_base_index.py 82.20% <43.75%> (-3.35%) ⬇️
python/cudf/cudf/io/text.py 91.66% <66.66%> (-8.34%) ⬇️
python/strings_udf/strings_udf/__init__.py 86.27% <76.00%> (-10.61%) ⬇️
python/cudf/cudf/core/index.py 92.91% <95.16%> (+0.28%) ⬆️
python/cudf/cudf/__init__.py 90.69% <100.00%> (ø)
python/cudf/cudf/core/column/categorical.py 89.34% <100.00%> (ø)
... and 13 more


Contributor

@ttnghia ttnghia left a comment


Most of the time I forgot to ask about doxygen 😞

Four resolved review threads on cpp/include/cudf/sorting.hpp (three marked outdated).
davidwendt requested a review from bdice on October 12, 2022 13:18
@davidwendt
Contributor Author

@gpucibot merge

rapids-bot merged commit 678946b into rapidsai:branch-22.12 on Oct 13, 2022
davidwendt deleted the bug-seg-sort-offset-gaps branch on October 13, 2022 12:38
Labels
3 - Ready for Review, bug, libcudf, non-breaking

5 participants