-
Notifications
You must be signed in to change notification settings - Fork 891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Struct binary comparison op functionality for spark rapids #11153
Struct binary comparison op functionality for spark rapids #11153
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving with suggestions after discussing offline with @rwlee. Thanks!
cpp/src/binaryop/binaryop.cpp
Outdated
scalar const& s, | ||
rmm::cuda_stream_view stream, | ||
rmm::mr::device_memory_resource* mr) | ||
rmm::device_buffer scalar_col_bitmask_and(column_view const& col, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a subtle distinction between valid masks and bitmasks. This case is a valid mask. "Bitmasks" are never nullptr
but "valid masks" might not be allocated (just nullptr
). This checks if the column is nullable so it's dealing with valid masks.
See past refactoring here: #9588
template <typename DeviceComparator, class AdaptableUnaryFunction, class Iterator> | ||
struct device_comparison_functor { | ||
using optional_iterator = thrust::transform_iterator<AdaptableUnaryFunction, Iterator>; | ||
device_comparison_functor(optional_iterator const& optional_iter, | ||
bool const is_lhs_scalar, | ||
bool const is_rhs_scalar, | ||
DeviceComparator const& comparator) | ||
: _optional_iter(optional_iter), | ||
_is_lhs_scalar(is_lhs_scalar), | ||
_is_rhs_scalar(is_rhs_scalar), | ||
_comparator(comparator){}; | ||
|
||
bool __device__ operator()(size_type i) | ||
{ | ||
return _optional_iter[i].has_value() && | ||
_comparator(cudf::experimental::row::lhs_index_type{_is_lhs_scalar ? 0 : i}, | ||
cudf::experimental::row::rhs_index_type{_is_rhs_scalar ? 0 : i}); | ||
} | ||
|
||
optional_iterator const _optional_iter; | ||
bool const _is_lhs_scalar; | ||
bool const _is_rhs_scalar; | ||
DeviceComparator const _comparator; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the comment about this adding net lines of code: it's fine to skip the constructor and use list initialization since the parameters are fairly self-contained.
Also this doesn't need so many template parameters, you can just write template <typename OptionalIterator>
and not define using optional_iterator = ...;
.
Finally, don't take iterators by reference. They're cheap to copy. See #11084. (But this is moot if we remove the constructor.)
I would try this:
template <typename DeviceComparator, class AdaptableUnaryFunction, class Iterator> | |
struct device_comparison_functor { | |
using optional_iterator = thrust::transform_iterator<AdaptableUnaryFunction, Iterator>; | |
device_comparison_functor(optional_iterator const& optional_iter, | |
bool const is_lhs_scalar, | |
bool const is_rhs_scalar, | |
DeviceComparator const& comparator) | |
: _optional_iter(optional_iter), | |
_is_lhs_scalar(is_lhs_scalar), | |
_is_rhs_scalar(is_rhs_scalar), | |
_comparator(comparator){}; | |
bool __device__ operator()(size_type i) | |
{ | |
return _optional_iter[i].has_value() && | |
_comparator(cudf::experimental::row::lhs_index_type{_is_lhs_scalar ? 0 : i}, | |
cudf::experimental::row::rhs_index_type{_is_rhs_scalar ? 0 : i}); | |
} | |
optional_iterator const _optional_iter; | |
bool const _is_lhs_scalar; | |
bool const _is_rhs_scalar; | |
DeviceComparator const _comparator; | |
}; | |
template <typename OptionalIterator, typename DeviceComparator> | |
struct device_comparison_functor { | |
bool __device__ operator()(size_type i) | |
{ | |
return optional_iter[i].has_value() && | |
comparator(cudf::experimental::row::lhs_index_type{is_lhs_scalar ? 0 : i}, | |
cudf::experimental::row::rhs_index_type{is_rhs_scalar ? 0 : i}); | |
} | |
OptionalIterator const optional_iter; | |
bool const is_lhs_scalar; | |
bool const is_rhs_scalar; | |
DeviceComparator const comparator; | |
}; |
No code changes should be needed at the calling sites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: If you have public member variables then you don't need the constructor and can directly use initializer list for instantiating the struct. If the member variables are private then you must have a constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why we use struct
instead of class
for functors like this where we expect to use list initialization (and no constructor). The members of a struct
are public by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had tried this originally (implicit constructor from initializer list), but it has compilation failures.
/home/ryanlee/dev/rapids/cudf/cpp/src/binaryop/compiled/struct_binary_ops.cuh(88): error: no instance of constructor "device_comparison_functor" matches the argument list
argument types are: (thrust::transform_iterator<cudf::detail::optional_accessor<__nv_bool, cudf::nullate::DYNAMIC>, cudf::column_device_view::count_it, thrust::use_default, thrust::use_default>, __nv_bool, __nv_bool, cudf::experimental::row::lexicographic::less_equivalent_comparator<cudf::experimental::row::lexicographic::strong_index_comparator_adapter<cudf::experimental::row::lexicographic::device_row_comparator<cudf::nullate::DYNAMIC, cudf::experimental::row::lexicographic::sorting_physical_element_comparator>>>)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great except that there are a few minor issues mentioned in my comments and one more thing that is desired: using const
qualifier for variables as much as possible.
Co-authored-by: Nghia Truong <ttnghia@users.noreply.github.com> Co-authored-by: Bradley Dice <bdice@bradleydice.com>
@gpucibot merge |
Adds support for Spark's null aware equality binop and expands/improves Java testing for struct binops. Properly tests null structs and full operator testing coverage. Utilizes existing Spark struct binop support with JNI changes to force the full null-aware comparison. Expands on #11153 Partial solution to #8964 -- `NULL_MAX` and `NULL_MIN` still outstanding. Authors: - Ryan Lee (https://github.com/rwlee) Approvers: - Tobias Ribizel (https://github.com/upsj) - Vukasin Milovanovic (https://github.com/vuule) - Jason Lowe (https://github.com/jlowe) URL: #11520
Replaces #9452
Takes the core cudf work from 9452 and integrates it into the JNI while removing struct support integration into cudf's binary operator API. Includes scalar comparison support in addition to vector to vector comparison support.
Follow on needed
Add null_equal operator support