Fix segmented-sort to ignore indices outside the offsets #11888
Codecov Report
Base: 87.40% // Head: 88.11% // Increases project coverage by +0.70%
Additional details and impacted files
Coverage Diff (branch-22.12 vs. #11888)
Coverage: 87.40% -> 88.11% (+0.70%)
Files: 133 -> 133
Lines: 21833 -> 21881 (+48)
Hits: 19084 -> 19280 (+196)
Misses: 2749 -> 2601 (-148)
Most of the time I forgot to ask about doxygen 😞
@gpucibot merge
Description
Fixes cudf::segmented_sorted_order to ignore indices outside the specified offsets values.

The segmented-sort function in general sorts subsets of the input using a column of offsets (integers) to identify the position of each segment. Here is an example:
There are 3 segments to sort: [0,3), [3,7), and [7,10)
Segment 1 sorts to { 7, 8, 9 }
Segment 2 sorts to { 3, 4, 5, 6 }
Segment 3 sorts to { 0, 1, 2 }
The segmented-sort result is { 7, 8, 9, 3, 4, 5, 6, 0, 1, 2 }
If the offsets do not fully cover all of the input, the segmented-sort should ignore any elements outside of the offsets.
Here there is only 1 segment to sort: [3,7) => { 3, 4, 5, 6 }
The segmented-sort result is { 9, 8, 7, 3, 4, 5, 6, 2, 1, 0 }
The values before the first offset and after the last offset should be left unchanged.
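
The intended behavior can be sketched on the host with plain C++. This is only an illustration, not the libcudf implementation: `segmented_sort_sketch` is a hypothetical helper, it sorts values rather than producing a gather map, and the input `{ 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }` is an assumption chosen to be consistent with the results listed above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side sketch (not the libcudf API).
// Sorts each half-open segment [offsets[i], offsets[i+1]) of `input` independently.
// Elements before offsets.front() or at/after offsets.back() are left unchanged.
std::vector<int> segmented_sort_sketch(std::vector<int> input,
                                       std::vector<std::size_t> const& offsets)
{
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    // Offsets are assumed to be non-decreasing and within the input bounds.
    assert(offsets[i] <= offsets[i + 1] && offsets[i + 1] <= input.size());
    auto const first = input.begin() + static_cast<std::ptrdiff_t>(offsets[i]);
    auto const last  = input.begin() + static_cast<std::ptrdiff_t>(offsets[i + 1]);
    std::sort(first, last);  // ascending sort within this segment only
  }
  return input;
}
```

Under the assumed input, offsets `{ 0, 3, 7, 10 }` cover every element and all three segments are sorted, while offsets `{ 3, 7 }` sort only the middle segment and leave the first three and last three values in place.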
The gtests have been corrected to expect this behavior.
Also, the SegmentedReductionTestUntyped.PartialSegmentReduction gtest was improved to include offset gaps at the beginning and at the end to verify consistent behavior there as well.

Found while working on #11729
Checklist