Fast path for experimental::row::equality #12676

Merged: 22 commits, merged Feb 16, 2023
Commits: 22 (the diff below shows changes from 7 of them)
- 4a8085a building equality::self_comparator (divyegala, Feb 2, 2023)
- f71d161 two table comp (divyegala, Feb 2, 2023)
- 3ca298c copyright years (divyegala, Feb 2, 2023)
- 7c167a7 centralizing repeated logic (divyegala, Feb 2, 2023)
- 0ceb79e address review to create functors (divyegala, Feb 3, 2023)
- 37e7326 updating has_nested_columns docs (divyegala, Feb 3, 2023)
- b44f603 Merge remote-tracking branch 'upstream/branch-23.04' into equality-co… (divyegala, Feb 3, 2023)
- c2ff1fc address review for underscore prefixes in structs (divyegala, Feb 7, 2023)
- c2ca8ee Merge remote-tracking branch 'upstream/branch-23.04' into equality-co… (divyegala, Feb 7, 2023)
- ffdf10c Merge remote-tracking branch 'upstream/branch-23.04' into equality-co… (divyegala, Feb 8, 2023)
- 53e918f add rank (divyegala, Feb 8, 2023)
- 65e2bce fix compile times for rank (divyegala, Feb 8, 2023)
- c6bc7f5 Merge remote-tracking branch 'upstream/branch-23.04' into equality-co… (divyegala, Feb 8, 2023)
- 1344e33 Apply suggestions from code review (divyegala, Feb 11, 2023)
- 4123379 address review (divyegala, Feb 11, 2023)
- 26f38b3 Merge remote-tracking branch 'upstream/branch-23.04' into equality-co… (divyegala, Feb 11, 2023)
- 9d0f7a6 address review, mark members of functors as private (divyegala, Feb 11, 2023)
- fe41be8 removing partitioning (divyegala, Feb 11, 2023)
- 02dd5c5 simplify lists/contains since it already has a nested-type dispatch m… (divyegala, Feb 12, 2023)
- 5db4d03 Merge branch 'branch-23.04' into equality-comp-fast-path (divyegala, Feb 13, 2023)
- 9aa23a5 Merge branch 'branch-23.04' into equality-comp-fast-path (divyegala, Feb 15, 2023)
- b52d0f3 Merge branch 'branch-23.04' into equality-comp-fast-path (divyegala, Feb 16, 2023)
cpp/include/cudf/table/experimental/row_operators.cuh (79 changes: 72 additions, 7 deletions)
@@ -245,6 +245,16 @@ using optional_dremel_view = thrust::optional<detail::dremel_device_view const>;
* second letter in both words is the first non-equal letter, and `a < b`, thus
* `aac < abb`.
*
* @note: The operator overloads in sub-class `element_comparator` are templated via the
* `type_dispatcher` to help select an overload instance for each column in a table.
* So, `cudf::is_nested<Element>` will return `true` if the table has nested-type columns,
* but it will be a runtime error if template parameter `has_nested_columns != true`.
*
* @tparam has_nested_columns compile-time optimization for primitive types.
* This template parameter is to be used by the developer by querying
* `cudf::detail::has_nested_columns(input)`. `true` compiles operator
* overloads for nested types, while `false` only compiles operator
* overloads for primitive types.
* @tparam Nullate A cudf::nullate type describing whether to check for nulls.
* @tparam PhysicalElementComparator A relational comparator functor that compares individual values
* rather than logical elements, defaults to `NaN` aware relational comparator that evaluates `NaN`
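The `has_nested_columns` docs say the caller should choose the template argument by querying `cudf::detail::has_nested_columns(input)`. As a rough host-side illustration of what that query decides, the sketch below models a table as the list of its top-level column types. `type_id`, `is_nested`, and `has_nested_columns` here are simplified hypothetical stand-ins, and treating only LIST and STRUCT as nested is an assumption modeled on `cudf::is_nested`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified type ids; cudf's real type system has many more.
enum class type_id { INT32, FLOAT64, STRING, LIST, STRUCT };

// Assumption: "nested" means LIST or STRUCT, mirroring what cudf::is_nested reports.
constexpr bool is_nested(type_id t) { return t == type_id::LIST || t == type_id::STRUCT; }

// Hypothetical stand-in for cudf::detail::has_nested_columns. A table is modeled
// as the list of its top-level column types; one nested column is enough to
// require the has_nested_columns = true instantiation.
bool has_nested_columns(std::vector<type_id> const& column_types)
{
  return std::any_of(column_types.begin(), column_types.end(), is_nested);
}
```

A table of only flat columns (including strings) can take the `false` fast path; a single list or struct column forces `true`.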
@@ -857,6 +867,16 @@ class self_comparator {
*
* `F(i,j)` returns true if and only if row `i` compares lexicographically less than row `j`.
*
* @note: The operator overloads in sub-class `element_comparator` are templated via the
* `type_dispatcher` to help select an overload instance for each column in a table.
* So, `cudf::is_nested<Element>` will return `true` if the table has nested-type columns,
* but it will be a runtime error if template parameter `has_nested_columns != true`.
*
* @tparam has_nested_columns compile-time optimization for primitive types.
* This template parameter is to be used by the developer by querying
* `cudf::detail::has_nested_columns(input)`. `true` compiles operator
* overloads for nested types, while `false` only compiles operator
* overloads for primitive types.
* @tparam Nullate A cudf::nullate type describing whether to check for nulls.
* @tparam PhysicalElementComparator A relational comparator functor that compares individual
* values rather than logical elements, defaults to `NaN` aware relational comparator that
@@ -1009,6 +1029,16 @@ class two_table_comparator {
* only if row `i` of the right table compares lexicographically less than row
* `j` of the left table.
*
* @note: The operator overloads in sub-class `element_comparator` are templated via the
* `type_dispatcher` to help select an overload instance for each column in a table.
* So, `cudf::is_nested<Element>` will return `true` if the table has nested-type columns,
* but it will be a runtime error if template parameter `has_nested_columns != true`.
*
* @tparam has_nested_columns compile-time optimization for primitive types.
* This template parameter is to be used by the developer by querying
* `cudf::detail::has_nested_columns(input)`. `true` compiles operator
* overloads for nested types, while `false` only compiles operator
* overloads for primitive types.
* @tparam Nullate A cudf::nullate type describing whether to check for nulls.
* @tparam PhysicalElementComparator A relational comparator functor that compares individual
* values rather than logical elements, defaults to `NaN` aware relational comparator that
@@ -1131,11 +1161,22 @@ struct nan_equal_physical_equality_comparator {
* returns false, representing unequal rows. If the rows are compared without mismatched elements,
* the rows are equal.
*
* @note: The operator overloads in sub-class `element_comparator` are templated via the
* `type_dispatcher` to help select an overload instance for each column in a table.
* So, `cudf::is_nested<Element>` will return `true` if the table has nested-type columns,
* but it will be a runtime error if template parameter `has_nested_columns != true`.
*
* @tparam has_nested_columns compile-time optimization for primitive types.
* This template parameter is to be used by the developer by querying
* `cudf::detail::has_nested_columns(input)`. `true` compiles operator
* overloads for nested types, while `false` only compiles operator
* overloads for primitive types.
* @tparam Nullate A cudf::nullate type describing whether to check for nulls.
* @tparam PhysicalEqualityComparator An equality comparator functor that compares individual values
* rather than logical elements, defaults to a comparator for which `NaN == NaN`.
*/
template <typename Nullate,
template <bool has_nested_columns,
typename Nullate,
typename PhysicalEqualityComparator = nan_equal_physical_equality_comparator>
class device_row_comparator {
friend class self_comparator; ///< Allow self_comparator to access private members
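The `PhysicalEqualityComparator` default is described as a comparator for which `NaN == NaN`. A minimal host sketch of that rule follows. `nan_equal_comparator` is a hypothetical name, not cudf's `nan_equal_physical_equality_comparator` itself; it only shows the NaN special case for floating-point values and plain `==` for everything else.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Hypothetical host analogue of a NaN-aware physical equality comparator.
struct nan_equal_comparator {
  template <typename Element>
  bool operator()(Element const lhs, Element const rhs) const noexcept
  {
    if constexpr (std::is_floating_point_v<Element>) {
      // IEEE 754 says NaN != NaN; this comparator treats two NaNs as equal.
      if (std::isnan(lhs) && std::isnan(rhs)) { return true; }
    }
    // Non-NaN floats and all other types use ordinary equality.
    return lhs == rhs;
  }
};
```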
@@ -1246,14 +1287,14 @@ class device_row_comparator {

template <typename Element,
CUDF_ENABLE_IF(not cudf::is_equality_comparable<Element, Element>() and
not cudf::is_nested<Element>()),
(not has_nested_columns or not cudf::is_nested<Element>())),
typename... Args>
__device__ bool operator()(Args...)
{
CUDF_UNREACHABLE("Attempted to compare elements of uncomparable types.");
}

template <typename Element, CUDF_ENABLE_IF(cudf::is_nested<Element>())>
template <typename Element, CUDF_ENABLE_IF(has_nested_columns and cudf::is_nested<Element>())>
__device__ bool operator()(size_type const lhs_element_index,
size_type const rhs_element_index) const noexcept
{
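The new `CUDF_ENABLE_IF` conditions compile the nested-type overload only when `has_nested_columns` is `true`, and send nested types to the unreachable stub when it is `false`. The sketch below reproduces that overload partitioning in standard C++ with `std::enable_if_t`. The traits `is_comparable_v` and `is_nested`, the recursive vector comparison, and throwing `std::logic_error` in place of `CUDF_UNREACHABLE` are all assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical traits: arithmetic types are "directly comparable", and
// std::vector plays the role of a nested (list) type.
template <typename T>
constexpr bool is_comparable_v = std::is_arithmetic_v<T>;
template <typename T>
struct is_nested : std::false_type {};
template <typename T>
struct is_nested<std::vector<T>> : std::true_type {};
template <typename T>
constexpr bool is_nested_v = is_nested<T>::value;

template <bool has_nested_columns>
struct element_comparator {
  // Directly comparable elements: always compiled.
  template <typename Element, std::enable_if_t<is_comparable_v<Element>, int> = 0>
  bool operator()(Element const& lhs, Element const& rhs) const
  {
    return lhs == rhs;
  }

  // Uncomparable types, plus nested types when nested support is compiled out.
  template <typename Element,
            std::enable_if_t<!is_comparable_v<Element> &&
                               (!has_nested_columns || !is_nested_v<Element>),
                             int> = 0>
  bool operator()(Element const&, Element const&) const
  {
    throw std::logic_error("Attempted to compare elements of uncomparable types.");
  }

  // Nested types: only instantiated when has_nested_columns is true.
  template <typename Element,
            std::enable_if_t<has_nested_columns && is_nested_v<Element>, int> = 0>
  bool operator()(Element const& lhs, Element const& rhs) const
  {
    if (lhs.size() != rhs.size()) { return false; }
    for (std::size_t i = 0; i < lhs.size(); ++i) {
      if (!(*this)(lhs[i], rhs[i])) { return false; }  // recurse per child element
    }
    return true;
  }
};
```

With `element_comparator<false>`, the recursive nested overload is never instantiated, which is the compile-time saving the parameter is meant to buy; in exchange, a nested input reaching the flat instantiation fails at run time.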
@@ -1437,6 +1478,16 @@ class self_comparator {
*
* `F(i,j)` returns true if and only if row `i` compares equal to row `j`.
*
* @note: The operator overloads in sub-class `element_comparator` are templated via the
* `type_dispatcher` to help select an overload instance for each column in a table.
* So, `cudf::is_nested<Element>` will return `true` if the table has nested-type columns,
* but it will be a runtime error if template parameter `has_nested_columns != true`.
*
* @tparam has_nested_columns compile-time optimization for primitive types.
* This template parameter is to be used by the developer by querying
* `cudf::detail::has_nested_columns(input)`. `true` compiles operator
* overloads for nested types, while `false` only compiles operator
* overloads for primitive types.
* @tparam Nullate A cudf::nullate type describing whether to check for nulls.
* @tparam PhysicalEqualityComparator An equality comparator functor that compares individual
* values rather than logical elements, defaults to a comparator for which `NaN == NaN`.
@@ -1445,13 +1496,15 @@ class self_comparator {
* @param comparator Physical element equality comparison functor.
* @return A binary callable object
*/
template <typename Nullate,
template <bool has_nested_columns,
typename Nullate,
typename PhysicalEqualityComparator = nan_equal_physical_equality_comparator>
auto equal_to(Nullate nullate = {},
null_equality nulls_are_equal = null_equality::EQUAL,
PhysicalEqualityComparator comparator = {}) const noexcept
{
return device_row_comparator{nullate, *d_t, *d_t, nulls_are_equal, comparator};
return device_row_comparator<has_nested_columns, Nullate, PhysicalEqualityComparator>{
nullate, *d_t, *d_t, nulls_are_equal, comparator};
}

private:
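`equal_to` now spells out every template argument of `device_row_comparator` instead of relying on class template argument deduction (CTAD). The hypothetical sketch below (`row_comparator`, `table_comparator`, and `dynamic_nullate` are invented stand-ins) shows why: CTAD cannot deduce a leading non-type `bool` parameter that no constructor argument mentions, so the arguments have to be written out explicitly.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

// Hypothetical stand-in for a cudf::nullate-like flag type.
struct dynamic_nullate {
  bool value;
};

// Hypothetical simplified analogue of device_row_comparator.
template <bool has_nested_columns,
          typename Nullate,
          typename PhysicalEqualityComparator = std::equal_to<>>
class row_comparator {
 public:
  static constexpr bool supports_nested = has_nested_columns;

  row_comparator(Nullate nullate, PhysicalEqualityComparator comparator)
    : nullate_{nullate}, comparator_{comparator}
  {
  }

  bool checks_nulls() const { return nullate_.value; }

  template <typename T>
  bool operator()(T const& lhs, T const& rhs) const
  {
    return comparator_(lhs, rhs);
  }

 private:
  Nullate nullate_;
  PhysicalEqualityComparator comparator_;
};

struct table_comparator {
  template <bool has_nested_columns,
            typename Nullate,
            typename PhysicalEqualityComparator = std::equal_to<>>
  auto equal_to(Nullate nullate, PhysicalEqualityComparator comparator = {}) const
  {
    // `row_comparator{nullate, comparator}` would not compile: CTAD has no way to
    // deduce has_nested_columns from the constructor arguments.
    return row_comparator<has_nested_columns, Nullate, PhysicalEqualityComparator>{
      nullate, comparator};
  }
};
```

Callers then select the instantiation explicitly, e.g. `tc.equal_to<true>(dynamic_nullate{true})`, while `Nullate` and the comparator type are still deduced from the function arguments.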
@@ -1539,6 +1592,16 @@ class two_table_comparator {
* Similarly, `F(rhs_index_type i, lhs_index_type j)` returns true if and only if row `i` of the
* right table compares equal to row `j` of the left table.
*
* @note: The operator overloads in sub-class `element_comparator` are templated via the
* `type_dispatcher` to help select an overload instance for each column in a table.
* So, `cudf::is_nested<Element>` will return `true` if the table has nested-type columns,
* but it will be a runtime error if template parameter `has_nested_columns != true`.
*
* @tparam has_nested_columns compile-time optimization for primitive types.
* This template parameter is to be used by the developer by querying
* `cudf::detail::has_nested_columns(input)`. `true` compiles operator
* overloads for nested types, while `false` only compiles operator
* overloads for primitive types.
* @tparam Nullate A cudf::nullate type describing whether to check for nulls.
* @tparam PhysicalEqualityComparator An equality comparator functor that compares individual
* values rather than logical elements, defaults to a `NaN == NaN` equality comparator.
@@ -1547,14 +1610,16 @@ class two_table_comparator {
* @param comparator Physical element equality comparison functor.
* @return A binary callable object
*/
template <typename Nullate,
template <bool has_nested_columns,
typename Nullate,
typename PhysicalEqualityComparator = nan_equal_physical_equality_comparator>
auto equal_to(Nullate nullate = {},
null_equality nulls_are_equal = null_equality::EQUAL,
PhysicalEqualityComparator comparator = {}) const noexcept
{
return strong_index_comparator_adapter{
device_row_comparator(nullate, *d_left_table, *d_right_table, nulls_are_equal, comparator)};
device_row_comparator<has_nested_columns, Nullate, PhysicalEqualityComparator>(
nullate, *d_left_table, *d_right_table, nulls_are_equal, comparator)};
}

private:
cpp/src/binaryop/compiled/struct_binary_ops.cuh (76 changes: 58 additions, 18 deletions)
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2022, NVIDIA CORPORATION.
* Copyright (c) 2022-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -106,6 +106,35 @@ void apply_struct_binary_op(mutable_column_view& out,
}
}

template <typename OptionalIteratorType, typename DeviceComparatorType>
struct struct_equality_functor {
struct_equality_functor(OptionalIteratorType optional_iter_,
bool is_lhs_scalar_,
bool is_rhs_scalar_,
bool preserve_output_,
DeviceComparatorType device_comparator_)
: optional_iter(optional_iter_),
is_lhs_scalar(is_lhs_scalar_),
is_rhs_scalar(is_rhs_scalar_),
preserve_output(preserve_output_),
device_comparator(device_comparator_)
{
}

auto __device__ operator()(size_type i)
{
auto lhs = cudf::experimental::row::lhs_index_type{is_lhs_scalar ? 0 : i};
auto rhs = cudf::experimental::row::rhs_index_type{is_rhs_scalar ? 0 : i};
return optional_iter[i].has_value() and (device_comparator(lhs, rhs) == preserve_output);
}

OptionalIteratorType optional_iter;
bool is_lhs_scalar;
bool is_rhs_scalar;
bool preserve_output;
DeviceComparatorType device_comparator;
};

template <typename PhysicalEqualityComparator =
cudf::experimental::row::equality::physical_equality_comparator>
void apply_struct_equality_op(mutable_column_view& out,
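`struct_equality_functor` moves the state that the old `__device__` lambda captured into named members. The host-only sketch below mirrors its `operator()` logic so the scalar broadcasting and the `NOT_EQUAL` inversion can be traced by hand. `host_equality_functor`, `int_row_comparator`, and `host_tabulate` (a plain loop standing in for `thrust::tabulate`) are hypothetical, and `std::optional<bool> const*` stands in for the output column's optional iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical host stand-in for a device row comparator: each row is one int.
struct int_row_comparator {
  std::vector<int> const* lhs;
  std::vector<int> const* rhs;
  bool operator()(std::size_t l, std::size_t r) const { return (*lhs)[l] == (*rhs)[r]; }
};

template <typename OptionalIteratorType, typename ComparatorType>
class host_equality_functor {
 public:
  host_equality_functor(OptionalIteratorType optional_iter,
                        bool is_lhs_scalar,
                        bool is_rhs_scalar,
                        bool preserve_output,
                        ComparatorType comparator)
    : optional_iter_{optional_iter},
      is_lhs_scalar_{is_lhs_scalar},
      is_rhs_scalar_{is_rhs_scalar},
      preserve_output_{preserve_output},
      comparator_{comparator}
  {
  }

  bool operator()(std::size_t i) const
  {
    // A scalar operand is broadcast by always reading its row 0.
    auto const lhs = is_lhs_scalar_ ? std::size_t{0} : i;
    auto const rhs = is_rhs_scalar_ ? std::size_t{0} : i;
    // Null output rows yield false; otherwise EQUAL keeps the comparison result
    // and NOT_EQUAL (preserve_output == false) inverts it.
    return optional_iter_[i].has_value() and (comparator_(lhs, rhs) == preserve_output_);
  }

 private:
  OptionalIteratorType optional_iter_;
  bool is_lhs_scalar_;
  bool is_rhs_scalar_;
  bool preserve_output_;
  ComparatorType comparator_;
};

// Host analogue of thrust::tabulate: out[i] = f(i) for every index.
template <typename F>
std::vector<bool> host_tabulate(std::size_t n, F f)
{
  std::vector<bool> out(n);
  for (std::size_t i = 0; i < n; ++i) { out[i] = f(i); }
  return out;
}
```

For example, comparing the column `{1, 2, 3}` against the scalar `2` with an output whose last row is null gives `{false, true, false}` for EQUAL and `{true, false, false}` for NOT_EQUAL.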
@@ -125,26 +154,37 @@ void apply_struct_equality_op(mutable_column_view& out,
auto trhs = table_view{{rhs}};
auto table_comparator =
cudf::experimental::row::equality::two_table_comparator{tlhs, trhs, stream};
auto device_comparator =
table_comparator.equal_to(nullate::DYNAMIC{has_nested_nulls(tlhs) || has_nested_nulls(trhs)},
null_equality::EQUAL,
comparator);

auto outd = column_device_view::create(out, stream);
auto optional_iter =
cudf::detail::make_optional_iterator<bool>(*outd, nullate::DYNAMIC{out.has_nulls()});
thrust::tabulate(rmm::exec_policy(stream),
out.begin<bool>(),
out.end<bool>(),
[optional_iter,
is_lhs_scalar,
is_rhs_scalar,
preserve_output = (op != binary_operator::NOT_EQUAL),
device_comparator] __device__(size_type i) {
auto lhs = cudf::experimental::row::lhs_index_type{is_lhs_scalar ? 0 : i};
auto rhs = cudf::experimental::row::rhs_index_type{is_rhs_scalar ? 0 : i};
return optional_iter[i].has_value() and
(device_comparator(lhs, rhs) == preserve_output);
});

auto const comparator_helper = [&](auto const device_comparator) {
thrust::tabulate(rmm::exec_policy(stream),
out.begin<bool>(),
out.end<bool>(),
struct_equality_functor<decltype(optional_iter), decltype(device_comparator)>(
optional_iter,
is_lhs_scalar,
is_rhs_scalar,
op != binary_operator::NOT_EQUAL,
device_comparator));
};

if (cudf::detail::has_nested_columns(tlhs) or cudf::detail::has_nested_columns(trhs)) {
auto device_comparator = table_comparator.equal_to<true>(
nullate::DYNAMIC{has_nested_nulls(tlhs) || has_nested_nulls(trhs)},
null_equality::EQUAL,
comparator);

comparator_helper(device_comparator);
} else {
auto device_comparator = table_comparator.equal_to<false>(
nullate::DYNAMIC{has_nested_nulls(tlhs) || has_nested_nulls(trhs)},
null_equality::EQUAL,
comparator);

comparator_helper(device_comparator);
}
}
} // namespace cudf::binops::compiled::detail
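`apply_struct_equality_op` now chooses `equal_to<true>` or `equal_to<false>` at run time and hands the result to the generic `comparator_helper` lambda, so the `thrust::tabulate` call is compiled once per comparator type. A minimal sketch of that runtime-to-compile-time dispatch follows, using hypothetical names (`flagged_comparator`, `make_comparator`, `dispatch_on_nesting`).

```cpp
#include <cassert>
#include <string>

// Hypothetical comparator whose nested-type support is fixed at compile time.
template <bool has_nested_columns>
struct flagged_comparator {
  std::string kind() const { return has_nested_columns ? "nested" : "flat"; }
};

// Hypothetical factory mirroring equal_to<has_nested_columns>(...).
template <bool has_nested_columns>
flagged_comparator<has_nested_columns> make_comparator()
{
  return {};
}

// Turns runtime flags into a compile-time template argument. Both branches are
// instantiated, but only one runs; `helper` is generic, so its body is compiled
// once per comparator type, like a lambda taking `auto const`.
template <typename Helper>
void dispatch_on_nesting(bool lhs_has_nested, bool rhs_has_nested, Helper&& helper)
{
  if (lhs_has_nested or rhs_has_nested) {
    helper(make_comparator<true>());
  } else {
    helper(make_comparator<false>());
  }
}
```

The trade-off is two instantiations of the helper body in exchange for a flat-column path built without any nested-type overloads.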