Nested struct binop comparison #9452

Closed
wants to merge 79 commits into from
Changes from 9 commits
Commits
262c2a2
struct binop first pass
rwlee Sep 30, 2021
a865fe0
vector-vector nested struct comparison
rwlee Oct 14, 2021
ce4440c
cleanup and simplify core code
rwlee Oct 16, 2021
1472c5f
remove type dispatch and other code cleanup
rwlee Oct 20, 2021
4a31fb6
move struct comparison to compiled binops code
rwlee Oct 20, 2021
ce2d727
improved testing, type checks, and skipped null value calculations
rwlee Oct 21, 2021
10c95f9
cleanup
rwlee Oct 21, 2021
12cd09e
Merge branch 'branch-21.12' into rwlee/struct_col_compare
rwlee Oct 21, 2021
b6fa590
fix upmerge issues
rwlee Oct 21, 2021
d64c1f9
fix logic and improve documentation
rwlee Oct 22, 2021
57900da
clean up logic for nulls
rwlee Oct 22, 2021
f170149
remove unecessary call to superimpose parent nulls
rwlee Oct 22, 2021
de129a1
PR fixes
rwlee Oct 27, 2021
5e84e89
Merge branch 'branch-21.12' into rwlee/struct_col_compare
rwlee Oct 27, 2021
b632192
pr fixes
rwlee Oct 27, 2021
4266f8c
Merge branch 'branch-21.12' into rwlee/struct_col_compare
rwlee Oct 27, 2021
367ec07
restructure struct binop code and other pr fixes
rwlee Nov 3, 2021
5a1f016
Merge branch 'branch-21.12' into rwlee/struct_col_compare
rwlee Nov 3, 2021
1f29168
Merge branch 'branch-21.12' into rwlee/struct_col_compare
rwlee Nov 3, 2021
1d6263e
full paths for includes
rwlee Nov 9, 2021
97bd5e1
Merge branch 'branch-21.12' into rwlee/struct_col_compare
rwlee Nov 9, 2021
48d0355
move to new TU and remove common code
rwlee Nov 11, 2021
6cf0e16
fix logic errors and push down struct branching
rwlee Nov 11, 2021
9ec2acf
remove deleted file from CMakeLists
rwlee Nov 11, 2021
3016abf
Naming and comment fixes
rwlee Nov 11, 2021
b2a7973
naming
rwlee Nov 11, 2021
191da69
style formatting
rwlee Nov 11, 2021
2cf2b28
merge apply_binary_op and _impl implementation
rwlee Nov 12, 2021
2b634c4
all apply_binary_op calls call apply_binary_op_impl
rwlee Nov 13, 2021
19f1afb
common code path
rwlee Nov 23, 2021
f316a0a
explicit instantiation of struct_compare
rwlee Nov 23, 2021
c684ef1
Merge branch 'branch-22.02' into rwlee/struct_col_compare
rwlee Nov 23, 2021
7f36241
streamline explicit instantiation
rwlee Nov 29, 2021
8cf0660
Merge remote-tracking branch 'pub/branch-22.02' into rwlee/struct_col…
rwlee Nov 29, 2021
2abefd5
remove op argument
rwlee Dec 5, 2021
8cc05e2
documentation
rwlee Dec 6, 2021
1bde152
Merge branch 'branch-22.02' into rwlee/struct_col_compare
rwlee Dec 6, 2021
470acfe
Fix upmerge errors
rwlee Dec 6, 2021
ce21d90
Merge remote-tracking branch 'pub/branch-22.02' into rwlee/struct_col…
rwlee Dec 16, 2021
83fa370
Merge remote-tracking branch 'pub/branch-22.04' into rwlee/struct_col…
rwlee Feb 1, 2022
2b77739
fix new ops from upmerge
rwlee Feb 4, 2022
de09cec
Fix floating point nan handling in struct comparison binops
rwlee Feb 8, 2022
8ad9545
Merge remote-tracking branch 'pub/branch-22.04' into rwlee/struct_col…
rwlee Feb 8, 2022
251d607
fix formatting
rwlee Feb 9, 2022
703aaf8
fix copyright
rwlee Feb 14, 2022
43e451b
fix accidently deletd function
rwlee Feb 14, 2022
9ec4a41
style fix
rwlee Feb 15, 2022
201a89b
copyright fix
rwlee Feb 15, 2022
cc164d6
Merge remote-tracking branch 'pub/branch-22.04' into rwlee/struct_col…
rwlee Feb 15, 2022
1bb1534
fix cmake style
rwlee Feb 15, 2022
6c6c8ab
re-add missing function name
rwlee Feb 16, 2022
42e58ae
style fix
rwlee Feb 16, 2022
475c896
Fix struct equality binop comparisons
rwlee Feb 19, 2022
1dae04a
PR reviews
rwlee Mar 7, 2022
62224cf
Merge remote-tracking branch 'pub/branch-22.04' into rwlee/struct_col…
rwlee Mar 8, 2022
a35600d
refactor row comparison operators into common spaceship operator
rwlee Mar 22, 2022
b6f0397
Merge remote-tracking branch 'pub/branch-22.04' into rwlee/struct_col…
rwlee Mar 22, 2022
fcc1dd2
first pass, test failures
rwlee Mar 29, 2022
5abf2a8
Merge remote-tracking branch 'pub/branch-22.06' into rwlee/struct_col…
rwlee Mar 29, 2022
9d50ac0
Refactor struct binop comparison to use experimental ops
rwlee Apr 16, 2022
8628c24
Merge remote-tracking branch 'pub/branch-22.06' into rwlee/struct_col…
rwlee Apr 16, 2022
a836a96
Merge remote-tracking branch 'pub/branch-22.06' into rwlee/struct_col…
rwlee Apr 18, 2022
a537805
fix performance regression and code cleanup
rwlee May 2, 2022
f7af41f
Merge remote-tracking branch 'pub/branch-22.06' into rwlee/struct_col…
rwlee May 2, 2022
1af4643
fix merge errors
rwlee May 3, 2022
4d929d9
Merge remote-tracking branch 'upstream/branch-22.06' into rwlee/struc…
bdice May 3, 2022
2298988
Revert include changes.
bdice May 3, 2022
bf1c6ee
split off weak ordering row operator changes
rwlee May 4, 2022
5d87db2
device_row_comparator private with friend class
rwlee May 4, 2022
fd716b9
Merge remote-tracking branch 'pub/branch-22.06' into rwlee/row_op_split
rwlee May 4, 2022
2dd2045
device_less conversion to templated struct
rwlee May 6, 2022
7ba960e
fold parameter pack
rwlee May 9, 2022
84833e7
Apply suggestions from code review
rwlee May 10, 2022
a944b4f
Merge remote-tracking branch 'pub/branch-22.06' into rwlee/struct_col…
rwlee May 10, 2022
4d197ea
fix code style
rwlee May 10, 2022
08092fe
Merge branch 'rwlee/row_op_split' of github.com:rwlee/cudf into rwlee…
rwlee May 10, 2022
d8986c5
Merge branch 'rwlee/row_op_split' into rwlee/struct_col_compare
rwlee May 10, 2022
548dcf1
fix code format
rwlee May 10, 2022
1dd1159
Merge remote-tracking branch 'upstream/branch-22.06' into rwlee/struc…
bdice May 11, 2022
14 changes: 14 additions & 0 deletions cpp/include/cudf/detail/structs/utilities.hpp
@@ -245,6 +245,20 @@ std::tuple<cudf::table_view, std::vector<rmm::device_buffer>> superimpose_parent
table_view const& table,
rmm::cuda_stream_view stream = rmm::cuda_stream_default,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Checks if column or any of its children are struct columns with structs that are null.
*
* This function searches for structs that are null and differentiates between them and structs
* containing null values. Struct nulls add a column to the table result of the flatten column
* utility. The existence of struct nulls necessitates the use of column_nullability::FORCE when
* flattening the column for comparison.
*
* @param col Column to check for structs that are null
* @return true If the column is or contains a struct column with struct nulls
* @return false If the column is not a struct column or does not contain struct nulls
*/
bool contains_struct_nulls(column_view const& col);
} // namespace detail
} // namespace structs
} // namespace cudf
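The distinction drawn in the doc comment above (a struct row that is itself null versus a struct whose child values are null) can be illustrated with a host-only toy model. This is a sketch under stated assumptions, not cudf code: `ToyColumn` and `toy_contains_struct_nulls` are hypothetical stand-ins for `column_view` and `contains_struct_nulls`, and the recursion assumes that a single null struct anywhere in the tree is enough to return true.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for cudf::column_view; not part of cudf.
struct ToyColumn {
  bool is_struct = false;           // true for STRUCT-typed columns
  bool has_nulls = false;           // true if this column's own null mask marks some rows null
  std::vector<ToyColumn> children;  // child columns (empty for leaf columns)
};

// Returns true if `col` is a struct column with null rows, or if any descendant is.
// Null values in non-struct leaves do not count.
bool toy_contains_struct_nulls(ToyColumn const& col)
{
  return (col.is_struct && col.has_nulls) ||
         std::any_of(col.children.begin(), col.children.end(), [](ToyColumn const& child) {
           return toy_contains_struct_nulls(child);
         });
}
```

In this model a struct whose only nulls live in an `INT32` child reports false, while a non-null outer struct wrapping a nullable inner struct reports true.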
157 changes: 145 additions & 12 deletions cpp/src/binaryop/compiled/binary_ops.cu
@@ -19,13 +19,20 @@

#include <cudf/column/column_device_view.cuh>
#include <cudf/column/column_factories.hpp>
#include <cudf/detail/iterator.cuh>
#include <cudf/detail/utilities/vector_factories.hpp>
#include <cudf/scalar/scalar_device_view.cuh>
#include <cudf/strings/detail/utilities.cuh>
#include <cudf/table/row_operators.cuh>
#include <cudf/table/table_device_view.cuh>
#include <cudf/table/table_view.hpp>

#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>
#include <rmm/exec_policy.hpp>

#include <cudf/detail/structs/utilities.hpp>

namespace cudf {
namespace binops {
namespace compiled {
@@ -239,6 +246,111 @@ struct null_considering_binop {
}
};

rmm::device_uvector<order> get_orders(binary_operator op,
uint32_t const num_columns,
rmm::cuda_stream_view stream)
{
std::vector<order> op_modifier(
num_columns,
(op == binary_operator::LESS || op == binary_operator::GREATER_EQUAL) ? order::ASCENDING
: order::DESCENDING);
return cudf::detail::make_device_uvector_async(op_modifier, stream);
}

template <typename Comparator, typename OptionalOutIter>
void struct_compare_tabulation(mutable_column_view& out,
Comparator compare,
binary_operator op,
OptionalOutIter optional_iter,
rmm::cuda_stream_view stream)
{
(op == binary_operator::EQUAL || op == binary_operator::LESS || op == binary_operator::GREATER)
? thrust::tabulate(rmm::exec_policy(stream),
out.begin<bool>(),
out.end<bool>(),
[optional_iter, compare] __device__(size_type i) {
return optional_iter[i].has_value() and compare(i, i);
})
: thrust::tabulate(rmm::exec_policy(stream),
out.begin<bool>(),
out.end<bool>(),
[optional_iter, compare] __device__(size_type i) {
return optional_iter[i].has_value() and not compare(i, i);
});
}

void struct_binary_operation(mutable_column_view& out,
column_view const& lhs,
column_view const& rhs,
binary_operator op,
rmm::cuda_stream_view stream)
{
bool const has_struct_nulls =
structs::detail::contains_struct_nulls(lhs) || structs::detail::contains_struct_nulls(rhs);
auto const lhs_superimposed = structs::detail::superimpose_parent_nulls(lhs);
auto const rhs_superimposed = structs::detail::superimpose_parent_nulls(rhs);
auto const lhs_flattener = structs::detail::flatten_nested_columns(
table_view{{std::get<0>(lhs_superimposed)}},
{},
{},
has_struct_nulls ? structs::detail::column_nullability::FORCE
: structs::detail::column_nullability::MATCH_INCOMING);
auto const rhs_flattener = structs::detail::flatten_nested_columns(
table_view{{std::get<0>(rhs_superimposed)}},
{},
{},
has_struct_nulls ? structs::detail::column_nullability::FORCE
: structs::detail::column_nullability::MATCH_INCOMING);

auto lhs_flat = lhs_flattener.flattened_columns();
auto rhs_flat = rhs_flattener.flattened_columns();

auto d_out = column_device_view::create(out, stream);
auto d_lhs = table_device_view::create(lhs_flat);
auto d_rhs = table_device_view::create(rhs_flat);
bool has_nulls = has_nested_nulls(lhs_flat) || has_nested_nulls(rhs_flat);

if (op == binary_operator::EQUAL || op == binary_operator::NOT_EQUAL) {
if (has_nulls) {
auto out_iter = cudf::detail::make_optional_iterator<bool>(
*d_out, contains_nulls::DYNAMIC{}, out.has_nulls());
struct_compare_tabulation(
out, row_equality_comparator<true>{*d_lhs, *d_rhs, true}, op, out_iter, stream);
} else {
auto out_iter = cudf::detail::make_optional_iterator<bool>(
*d_out, contains_nulls::DYNAMIC{}, out.has_nulls());
struct_compare_tabulation(
out, row_equality_comparator<false>{*d_lhs, *d_rhs, true}, op, out_iter, stream);
}
} else if (op == binary_operator::LESS || op == binary_operator::LESS_EQUAL ||
op == binary_operator::GREATER || op == binary_operator::GREATER_EQUAL) {
if (has_nulls) {
auto out_iter = cudf::detail::make_optional_iterator<bool>(
*d_out, contains_nulls::DYNAMIC{}, out.has_nulls());
struct_compare_tabulation(
out,
row_lexicographic_comparator<true>{
*d_lhs, *d_rhs, get_orders(op, lhs_flat.num_columns(), stream).data()},
op,
out_iter,
stream);
} else {
auto out_iter = cudf::detail::make_optional_iterator<bool>(
*d_out, contains_nulls::DYNAMIC{}, out.has_nulls());
struct_compare_tabulation(
out,
row_lexicographic_comparator<false>{
*d_lhs, *d_rhs, get_orders(op, lhs_flat.num_columns(), stream).data()},
op,
out_iter,
stream);
}
// } else if (op == binary_operator::NULL_EQUALS) {
} else {
CUDF_FAIL("Unsupported operator for these types");
}
}

} // namespace

std::unique_ptr<column> string_null_min_max(scalar const& lhs,
@@ -351,10 +463,16 @@ void binary_operation(mutable_column_view& out,
binary_operator op,
rmm::cuda_stream_view stream)
{
auto lhsd = column_device_view::create(lhs, stream);
auto rhsd = column_device_view::create(rhs, stream);
auto outd = mutable_column_device_view::create(out, stream);
operator_dispatcher(*outd, *lhsd, *rhsd, false, false, op, stream);
if (lhs.type().id() == type_id::STRUCT && rhs.type().id() == type_id::STRUCT) {
CUDF_EXPECTS(struct_children_support_operation(out.type(), lhs, rhs, op),
"Unsupported operator for these types");
struct_binary_operation(out, lhs, rhs, op, stream);
} else {
auto lhsd = column_device_view::create(lhs, stream);
auto rhsd = column_device_view::create(rhs, stream);
auto outd = mutable_column_device_view::create(out, stream);
operator_dispatcher(*outd, *lhsd, *rhsd, false, false, op, stream);
}
}
// scalar_vector
void binary_operation(mutable_column_view& out,
@@ -363,10 +481,17 @@ void binary_operation(mutable_column_view& out,
binary_operator op,
rmm::cuda_stream_view stream)
{
auto [lhsd, aux] = scalar_to_column_device_view(lhs, stream);
auto rhsd = column_device_view::create(rhs, stream);
auto outd = mutable_column_device_view::create(out, stream);
operator_dispatcher(*outd, *lhsd, *rhsd, true, false, op, stream);
if (lhs.type().id() == type_id::STRUCT && rhs.type().id() == type_id::STRUCT) {
auto lhs_col = make_column_from_scalar(lhs, rhs.size(), stream);
CUDF_EXPECTS(struct_children_support_operation(out.type(), lhs_col->view(), rhs, op),
"Unsupported operator for these types");
struct_binary_operation(out, lhs_col->view(), rhs, op, stream);
} else {
auto [lhsd, aux] = scalar_to_column_device_view(lhs, stream);
auto rhsd = column_device_view::create(rhs, stream);
auto outd = mutable_column_device_view::create(out, stream);
operator_dispatcher(*outd, *lhsd, *rhsd, true, false, op, stream);
}
}
// vector_scalar
void binary_operation(mutable_column_view& out,
@@ -375,10 +500,18 @@ void binary_operation(mutable_column_view& out,
binary_operator op,
rmm::cuda_stream_view stream)
{
auto lhsd = column_device_view::create(lhs, stream);
auto [rhsd, aux] = scalar_to_column_device_view(rhs, stream);
auto outd = mutable_column_device_view::create(out, stream);
operator_dispatcher(*outd, *lhsd, *rhsd, false, true, op, stream);
if (lhs.type().id() == type_id::STRUCT && rhs.type().id() == type_id::STRUCT) {
auto rhs_col = make_column_from_scalar(rhs, lhs.size(), stream);
CUDF_EXPECTS(struct_children_support_operation(out.type(), lhs, rhs_col->view(), op),
"Unsupported operator for these types");

struct_binary_operation(out, lhs, rhs_col->view(), op, stream);
} else {
auto lhsd = column_device_view::create(lhs, stream);
auto [rhsd, aux] = scalar_to_column_device_view(rhs, stream);
auto outd = mutable_column_device_view::create(out, stream);
operator_dispatcher(*outd, *lhsd, *rhsd, false, true, op, stream);
}
}

} // namespace compiled
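Read together, `get_orders` and `struct_compare_tabulation` appear to derive all four ordering operators from one strict "comes before" comparator by choosing a sort order and optionally negating the result. The host-only sketch below illustrates that mapping. It is not the PR's implementation: `Op`, `Order`, `Row`, `comes_before`, `order_for`, and `evaluate` are hypothetical names, a `std::tuple` stands in for one flattened struct row, and nulls and NaN handling are ignored entirely.

```cpp
#include <cassert>
#include <tuple>

enum class Op { LESS, LESS_EQUAL, GREATER, GREATER_EQUAL };
enum class Order { ASCENDING, DESCENDING };

using Row = std::tuple<int, int>;  // stand-in for one flattened struct row

// Strict lexicographic "comes before" under the given order.
bool comes_before(Row const& a, Row const& b, Order order)
{
  return order == Order::ASCENDING ? a < b : b < a;
}

// Mirrors the choice made in get_orders: LESS and GREATER_EQUAL sort ascending,
// LESS_EQUAL and GREATER sort descending.
Order order_for(Op op)
{
  return (op == Op::LESS || op == Op::GREATER_EQUAL) ? Order::ASCENDING : Order::DESCENDING;
}

// LESS and GREATER use the strict comparison directly; the *_EQUAL operators are the
// negation of the opposite strict comparison, e.g. (a <= b) == !(a > b).
bool evaluate(Op op, Row const& lhs, Row const& rhs)
{
  bool const before = comes_before(lhs, rhs, order_for(op));
  return (op == Op::LESS || op == Op::GREATER) ? before : !before;
}
```

The negation identity only holds for a total order, which is one reason floating-point NaN values need separate treatment in the real comparator.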
16 changes: 15 additions & 1 deletion cpp/src/binaryop/compiled/binary_ops.hpp
@@ -167,11 +167,25 @@ std::optional<data_type> get_common_type(data_type out, data_type lhs, data_type
* @param out output type of the binary operation
* @param lhs first operand type of the binary operation
* @param rhs second operand type of the binary operation
* @param op binary operator enum.
* @param op binary operator enum
* @return true if given binary operator supports given input and output types.
*/
bool is_supported_operation(data_type out, data_type lhs, data_type rhs, binary_operator op);

/**
* @brief Check if the binary operation is supported for the given columns, recursing into struct children.
*
* @param out output type of the binary operation
* @param lhs left column of the binary operation
* @param rhs right column of the binary operation
* @param op binary operator enum
* @return true if given binary operator supports given input and output types.
*/
bool struct_children_support_operation(data_type out,
column_view const& lhs,
column_view const& rhs,
binary_operator op);

// Defined in individual .cu files.
/**
* @brief Deploys single type or double type dispatcher that runs binary operation on each element
26 changes: 26 additions & 0 deletions cpp/src/binaryop/compiled/util.cpp
@@ -18,6 +18,8 @@

#include <cudf/binaryop.hpp>
#include <cudf/column/column_device_view.cuh>
#include <cudf/column/column_view.hpp>
#include <cudf/types.hpp>
#include <cudf/utilities/traits.hpp>
#include <cudf/utilities/type_dispatcher.hpp>

@@ -181,6 +183,30 @@ std::optional<data_type> get_common_type(data_type out, data_type lhs, data_type

bool is_supported_operation(data_type out, data_type lhs, data_type rhs, binary_operator op)
{
if (lhs.id() == type_id::STRUCT && rhs.id() == type_id::STRUCT) {
return op == binary_operator::EQUAL || op == binary_operator::NOT_EQUAL ||
op == binary_operator::LESS || op == binary_operator::LESS_EQUAL ||
op == binary_operator::GREATER || op == binary_operator::GREATER_EQUAL;
}
return double_type_dispatcher(lhs, rhs, is_supported_operation_functor{}, out, op);
}

bool struct_children_support_operation(data_type out,
column_view const& lhs,
column_view const& rhs,
binary_operator op)
{
if (lhs.type().id() == type_id::STRUCT && rhs.type().id() == type_id::STRUCT) {
return lhs.num_children() == rhs.num_children() &&
std::all_of(thrust::counting_iterator<size_type>(0),
thrust::counting_iterator<size_type>(lhs.num_children()),
[&](size_type i) {
return struct_children_support_operation(
out, lhs.child(i), rhs.child(i), op);
});

} else {
return is_supported_operation(out, lhs.type(), rhs.type(), op);
}
}
} // namespace cudf::binops::compiled
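The recursive shape of `struct_children_support_operation` (equal child counts, then a pairwise check of children, with non-struct pairs deferring to a leaf-level rule) can be sketched on the host as follows. Everything here is a hypothetical model rather than cudf code: `TypeId`, `ToyColumn`, and `children_support` are illustrative names, `leaf_supported` is an invented placeholder rule standing in for `is_supported_operation`, and `std::equal` with a predicate is swapped in for the counting-iterator loop.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class TypeId { INT32, STRING, LIST, STRUCT };

// Hypothetical stand-in for cudf::column_view; not part of cudf.
struct ToyColumn {
  TypeId type;
  std::vector<ToyColumn> children;
};

// Placeholder leaf rule (an assumption for this sketch only):
// matching types are comparable unless they are LIST.
bool leaf_supported(TypeId lhs, TypeId rhs) { return lhs == rhs && lhs != TypeId::LIST; }

// Struct pairs must have the same number of children and every child pair must pass;
// any other pair falls through to the leaf rule.
bool children_support(ToyColumn const& lhs, ToyColumn const& rhs)
{
  if (lhs.type == TypeId::STRUCT && rhs.type == TypeId::STRUCT) {
    return lhs.children.size() == rhs.children.size() &&
           std::equal(
             lhs.children.begin(), lhs.children.end(), rhs.children.begin(), children_support);
  }
  return leaf_supported(lhs.type, rhs.type);
}
```

The size check runs first, so the three-iterator form of `std::equal` never reads past the end of the right-hand child range.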
8 changes: 8 additions & 0 deletions cpp/src/structs/utilities.cpp
@@ -437,6 +437,14 @@ std::tuple<cudf::table_view, std::vector<rmm::device_buffer>> superimpose_parent
return {table_view{superimposed_columns}, std::move(superimposed_nullmasks)};
}

bool contains_struct_nulls(column_view const& col)
{
return (col.type().id() == type_id::STRUCT && col.has_nulls()) ||
std::any_of(col.child_begin(), col.child_end(), [](auto const& child) {
return contains_struct_nulls(child);
});
}

} // namespace detail
} // namespace structs
} // namespace cudf
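The recursion in `contains_struct_nulls` bottoms out at leaf columns, whose child range is empty, so the standard algorithm's behavior on an empty range decides what a leaf reports. The standalone check below covers only standard-library semantics and is not cudf code; the two helper names are hypothetical. `std::all_of` is vacuously true on an empty range and `std::any_of` is false, so an existential question such as "does any child contain a null struct" needs the `any_of` behavior to report false for leaves.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// std::all_of over an empty range returns true regardless of the predicate.
bool all_of_on_empty_range()
{
  std::vector<int> const no_children{};
  return std::all_of(no_children.begin(), no_children.end(), [](int) { return false; });
}

// std::any_of over an empty range returns false regardless of the predicate.
bool any_of_on_empty_range()
{
  std::vector<int> const no_children{};
  return std::any_of(no_children.begin(), no_children.end(), [](int) { return true; });
}
```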
1 change: 1 addition & 0 deletions cpp/tests/CMakeLists.txt
@@ -175,6 +175,7 @@ ConfigureTest(BINARY_TEST
binaryop/binop-compiled-test.cpp
binaryop/binop-compiled-fixed_point-test.cpp
binaryop/binop-generic-ptx-test.cpp
binaryop/binop-struct-test.cpp
)

###################################################################################################