Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Struct binary comparison op functionality for spark rapids #11153

Merged
merged 140 commits into from
Jul 8, 2022

Conversation

rwlee
Copy link
Contributor

@rwlee rwlee commented Jun 25, 2022

Replaces #9452
Takes the core cudf work from 9452 and integrates it into the JNI while removing struct support integration into cudf's binary operator API. Includes scalar comparison support in addition to vector to vector comparison support.

Follow on needed
Add null_equal operator support

cpp/include/cudf/binaryop.hpp Outdated Show resolved Hide resolved
cpp/src/binaryop/compiled/binary_ops.cu Show resolved Hide resolved
cpp/src/binaryop/compiled/struct_binary_ops.cuh Outdated Show resolved Hide resolved
cpp/src/binaryop/compiled/binary_ops.cu Outdated Show resolved Hide resolved
@bdice bdice self-requested a review July 7, 2022 23:22
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving with suggestions after discussing offline with @rwlee. Thanks!

cpp/include/cudf/binaryop.hpp Outdated Show resolved Hide resolved
scalar const& s,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr)
rmm::device_buffer scalar_col_bitmask_and(column_view const& col,
Copy link
Contributor

@bdice bdice Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a subtle distinction between valid masks and bitmasks. This case is a valid mask. "Bitmasks" are never nullptr but "valid masks" might not be allocated (just nullptr). This checks if the column is nullable so it's dealing with valid masks.

See past refactoring here: #9588

cpp/src/binaryop/binaryop.cpp Outdated Show resolved Hide resolved
cpp/src/binaryop/compiled/struct_binary_ops.cuh Outdated Show resolved Hide resolved
cpp/src/binaryop/compiled/struct_binary_ops.cuh Outdated Show resolved Hide resolved
cpp/src/binaryop/compiled/struct_binary_ops.cuh Outdated Show resolved Hide resolved
cpp/src/binaryop/compiled/struct_binary_ops.cuh Outdated Show resolved Hide resolved
Comment on lines 35 to 58
template <typename DeviceComparator, class AdaptableUnaryFunction, class Iterator>
struct device_comparison_functor {
using optional_iterator = thrust::transform_iterator<AdaptableUnaryFunction, Iterator>;
device_comparison_functor(optional_iterator const& optional_iter,
bool const is_lhs_scalar,
bool const is_rhs_scalar,
DeviceComparator const& comparator)
: _optional_iter(optional_iter),
_is_lhs_scalar(is_lhs_scalar),
_is_rhs_scalar(is_rhs_scalar),
_comparator(comparator){};

bool __device__ operator()(size_type i)
{
return _optional_iter[i].has_value() &&
_comparator(cudf::experimental::row::lhs_index_type{_is_lhs_scalar ? 0 : i},
cudf::experimental::row::rhs_index_type{_is_rhs_scalar ? 0 : i});
}

optional_iterator const _optional_iter;
bool const _is_lhs_scalar;
bool const _is_rhs_scalar;
DeviceComparator const _comparator;
};
Copy link
Contributor

@bdice bdice Jul 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the comment about this adding net lines of code: it's fine to skip the constructor and use list initialization since the parameters are fairly self-contained.

Also this doesn't need so many template parameters, you can just write template <typename OptionalIterator> and not define using optional_iterator = ...;.

Finally, don't take iterators by reference. They're cheap to copy. See #11084. (But this is moot if we remove the constructor.)

I would try this:

Suggested change
template <typename DeviceComparator, class AdaptableUnaryFunction, class Iterator>
struct device_comparison_functor {
using optional_iterator = thrust::transform_iterator<AdaptableUnaryFunction, Iterator>;
device_comparison_functor(optional_iterator const& optional_iter,
bool const is_lhs_scalar,
bool const is_rhs_scalar,
DeviceComparator const& comparator)
: _optional_iter(optional_iter),
_is_lhs_scalar(is_lhs_scalar),
_is_rhs_scalar(is_rhs_scalar),
_comparator(comparator){};
bool __device__ operator()(size_type i)
{
return _optional_iter[i].has_value() &&
_comparator(cudf::experimental::row::lhs_index_type{_is_lhs_scalar ? 0 : i},
cudf::experimental::row::rhs_index_type{_is_rhs_scalar ? 0 : i});
}
optional_iterator const _optional_iter;
bool const _is_lhs_scalar;
bool const _is_rhs_scalar;
DeviceComparator const _comparator;
};
template <typename OptionalIterator, typename DeviceComparator>
struct device_comparison_functor {
bool __device__ operator()(size_type i)
{
return optional_iter[i].has_value() &&
comparator(cudf::experimental::row::lhs_index_type{is_lhs_scalar ? 0 : i},
cudf::experimental::row::rhs_index_type{is_rhs_scalar ? 0 : i});
}
OptionalIterator const optional_iter;
bool const is_lhs_scalar;
bool const is_rhs_scalar;
DeviceComparator const comparator;
};

No code changes should be needed at the calling sites.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: If you have public member variables then you don't need the constructor and can directly use initializer list for instantiating the struct. If the member variables are private then you must have a constructor.

Copy link
Contributor

@bdice bdice Jul 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why we use struct instead of class for functors like this where we expect to use list initialization (and no constructor). The members of a struct are public by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had tried this originally (implicit constructor from initializer list), but it has compilation failures.

/home/ryanlee/dev/rapids/cudf/cpp/src/binaryop/compiled/struct_binary_ops.cuh(88): error: no instance of constructor "device_comparison_functor" matches the argument list
            argument types are: (thrust::transform_iterator<cudf::detail::optional_accessor<__nv_bool, cudf::nullate::DYNAMIC>, cudf::column_device_view::count_it, thrust::use_default, thrust::use_default>, __nv_bool, __nv_bool, cudf::experimental::row::lexicographic::less_equivalent_comparator<cudf::experimental::row::lexicographic::strong_index_comparator_adapter<cudf::experimental::row::lexicographic::device_row_comparator<cudf::nullate::DYNAMIC, cudf::experimental::row::lexicographic::sorting_physical_element_comparator>>>)

Copy link
Contributor

@ttnghia ttnghia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great except that there are a few minor issues mentioned in my comments and one more thing that is desired: using const qualifier for variables as much as possible.

rwlee and others added 3 commits July 8, 2022 00:24
Co-authored-by: Nghia Truong <ttnghia@users.noreply.github.com>
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
@rwlee
Copy link
Contributor Author

rwlee commented Jul 8, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit e6ff641 into rapidsai:branch-22.08 Jul 8, 2022
rapids-bot bot pushed a commit that referenced this pull request Aug 15, 2022
Adds support for Spark's null aware equality binop and expands/improves Java testing for struct binops. Properly tests null structs and full operator testing coverage. Utilizes existing Spark struct binop support with JNI changes to force the full null-aware comparison.

Expands on #11153

Partial solution to #8964 -- `NULL_MAX` and `NULL_MIN` still outstanding.

Authors:
  - Ryan Lee (https://github.com/rwlee)

Approvers:
  - Tobias Ribizel (https://github.com/upsj)
  - Vukasin Milovanovic (https://github.com/vuule)
  - Jason Lowe (https://github.com/jlowe)

URL: #11520
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants