-
Notifications
You must be signed in to change notification settings - Fork 891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HostColumnVectoreCore#isNull should return true for out-of-range rows #10779
HostColumnVectoreCore#isNull should return true for out-of-range rows #10779
Conversation
Signed-off-by: Gera Shegalov <gera@apache.org>
Codecov Report
@@ Coverage Diff @@
## branch-22.06 #10779 +/- ##
================================================
+ Coverage 86.40% 86.43% +0.03%
================================================
Files 143 143
Lines 22448 22448
================================================
+ Hits 19396 19403 +7
+ Misses 3052 3045 -7
Continue to review full report at Codecov.
|
return BitVectorHelper.isNull(offHeap.valid, rowIndex); | ||
} | ||
return false; | ||
return rowIndex < 0 || rowIndex >= rows // unknown, hence NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is logically correct. Accessing out of bounds is different from having null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We get to decide what is logically correct because we define the logic. E.g., we have different out-of-bound policies in other use cases. We could make it configurable here as well once there is a different use case necessitating it.
I need to instrument codegen in Spark to get a better grasp of how we end up in this lookup. However, in terms of tests it appears to be a fix that strictly improves the plugin because it does not break existing tests.
@gpucibot merge |
This PR adds tests verifying the fix for correctly dealing with all-null data partitions in rapidsai/cudf#10779. Signed-off-by: Gera Shegalov <gera@apache.org>
This PR suggests a 3VL way of interpreting
isNull
for arowId
out of bounds. Such a value is unknown and therefore isNull should betrue
.NVIDIA/spark-rapids#5140 shows that
SpecificUnsafeProjection
may probe child columns for NULL even though the parent column row is also NULL.However there are no rows in the child CV when the parent row is NULL leading to an assert violation if asserts are enabled or an NPE if disabled.
Signed-off-by: Gera Shegalov gera@apache.org