Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement mixed equality/conditional joins #9917

Merged
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
56 commits
Select commit Hold shift + click to select a range
c2ba04d
Add APIs for mixed equality/inequality joins.
vyasr Nov 23, 2021
c1eec27
Add new implementations for mixed joins.
vyasr Nov 24, 2021
bcc2c7b
Move mixed joins to separate file to avoid introducing hash join rela…
vyasr Nov 24, 2021
06c26ea
Start implementing map logic in new join, add notes.
vyasr Nov 25, 2021
061c74c
Create new kernels for mixed joins.
vyasr Nov 25, 2021
db11c2b
Pass hash table device view to kernel.
vyasr Nov 29, 2021
342a8d1
Revert change to build_join_hash_table allowing empty tables and filt…
vyasr Nov 29, 2021
1fdce0f
Version of mixed kernel with correct logic, but currently requires to…
vyasr Dec 2, 2021
9b173b0
Fix some bugs and reduce shared memory usage to the point of compilat…
vyasr Dec 7, 2021
a77fbaf
Create new array to hold per-row matches and define new comparator fo…
vyasr Dec 8, 2021
13c30d2
Compiling version of new code that computes per-row matches.
vyasr Dec 9, 2021
9c3c3a7
Change size APIs to return per-row counts and verify results.
vyasr Dec 9, 2021
ae446db
Change mixed join APIs to accept the offsets.
vyasr Dec 9, 2021
df173af
First attempt at using offsets column in join.
vyasr Dec 9, 2021
5b8bf2d
Fix seg fault, now just a buggy impl.
vyasr Dec 9, 2021
636a604
Fix bugs, implementation now works as expected for simple test.
vyasr Dec 11, 2021
cb14a53
Fix a bug and add one more extensive test.
vyasr Dec 13, 2021
890fd6f
Start addressing TODOs.
vyasr Dec 13, 2021
e693960
Address more TODOs.
vyasr Dec 14, 2021
6b39bdd
Start setting up scaffolding for more systematic testing.
vyasr Dec 14, 2021
444a140
Change APIs to accept separate table views for the hash join.
vyasr Dec 14, 2021
5515daa
Add compare_nulls parameter and various additional internal error che…
vyasr Dec 14, 2021
40ea881
Update tests to only use the subset of necessary columns for the cond…
vyasr Dec 14, 2021
9e54706
Move equality tables before conditional tables.
vyasr Dec 14, 2021
2b2c6cc
Plumb compare_nulls all the way through.
vyasr Dec 14, 2021
273be21
Start defining helper functions for generalized mixed join testing.
vyasr Dec 14, 2021
c34de6a
Move mixed join tests to separate TU.
vyasr Dec 14, 2021
473efdb
Define standard testing function for mixed inner joins.
vyasr Dec 15, 2021
bd87a3d
Update to use new cuco API with 4 iterators instead of two.
vyasr Dec 15, 2021
c5f8f9a
Add proper support for non-inner joins.
vyasr Dec 15, 2021
e11e84e
Add left join APIs and tests.
vyasr Dec 15, 2021
07b48c1
Add full join APIs and tests.
vyasr Dec 15, 2021
c74af29
Add tests of and fix bug with asymmetric joins.
vyasr Dec 15, 2021
8a4df8e
Clean up some TODOs.
vyasr Dec 15, 2021
f9739fc
Add tests for and fix null support.
vyasr Dec 15, 2021
80a268d
Rewrite docstrings.
vyasr Dec 16, 2021
b240cc5
Change size APIs to return columns instead of device_uvectors.
vyasr Dec 16, 2021
2bb25a3
Change join APIs to use column_views instead of device_spans.
vyasr Dec 16, 2021
5f49458
Merge remote-tracking branch 'origin/branch-22.02' into feature/mixed…
vyasr Dec 16, 2021
d724647
Update cuco.
vyasr Dec 16, 2021
ec5d601
Fix unnecessary extra line.
vyasr Dec 16, 2021
15a6d52
Apply suggestions from code review
vyasr Dec 16, 2021
1dc63d8
Fix swapping logic in operator in conjunction with proper handling of…
vyasr Dec 18, 2021
78dd48d
Simplify operator logic and add comments.
vyasr Dec 18, 2021
9041b99
Merge remote-tracking branch 'origin/branch-22.02' into feature/mixed…
vyasr Dec 21, 2021
5726c5e
Fix style.
vyasr Dec 21, 2021
ecb77ca
Remove addressed TODOs.
vyasr Dec 22, 2021
a7f9d15
Merge remote-tracking branch 'origin/branch-22.02' into feature/mixed…
vyasr Jan 6, 2022
b7accc3
Address simple PR comments.
vyasr Jan 6, 2022
0daaacd
Switch raw pointers to device spans for matches_per_row.
vyasr Jan 6, 2022
0915bff
Remove unsupported full join size calculation.
vyasr Jan 6, 2022
d7ab5a7
Merge remote-tracking branch 'origin/branch-22.02' into feature/mixed…
vyasr Jan 11, 2022
0442cd3
Construct a row equality comparator on the host and pass it to the ke…
vyasr Jan 11, 2022
ad41f9d
Some minor cleanup.
vyasr Jan 11, 2022
4d4f4e6
Change the offsets to an rmm::device_uvector<size_type> instead of a …
vyasr Jan 11, 2022
c0e2297
Fix APIs to accept const spans instead of non-const ones.
vyasr Jan 11, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 10 additions & 8 deletions cpp/include/cudf/ast/detail/expression_evaluator.cuh
Original file line number Diff line number Diff line change
Expand Up @@ -426,9 +426,10 @@ struct expression_evaluator {
* @param row_index Row index of all input and output data column(s).
*/
template <typename ResultSubclass, typename T, bool result_has_nulls>
CUDF_DFI void evaluate(expression_result<ResultSubclass, T, result_has_nulls>& output_object,
cudf::size_type const row_index,
IntermediateDataType<has_nulls>* thread_intermediate_storage) const
__device__ __forceinline__ void evaluate(
expression_result<ResultSubclass, T, result_has_nulls>& output_object,
cudf::size_type const row_index,
IntermediateDataType<has_nulls>* thread_intermediate_storage) const
{
evaluate(output_object, row_index, row_index, row_index, thread_intermediate_storage);
}
Expand All @@ -446,11 +447,12 @@ struct expression_evaluator {
* @param output_row_index The row in the output to insert the result.
*/
template <typename ResultSubclass, typename T, bool result_has_nulls>
CUDF_DFI void evaluate(expression_result<ResultSubclass, T, result_has_nulls>& output_object,
cudf::size_type const left_row_index,
cudf::size_type const right_row_index,
cudf::size_type const output_row_index,
IntermediateDataType<has_nulls>* thread_intermediate_storage) const
__device__ __forceinline__ void evaluate(
expression_result<ResultSubclass, T, result_has_nulls>& output_object,
cudf::size_type const left_row_index,
cudf::size_type const right_row_index,
cudf::size_type const output_row_index,
IntermediateDataType<has_nulls>* thread_intermediate_storage) const
{
cudf::size_type operator_source_index{0};
for (cudf::size_type operator_index = 0; operator_index < plan.operators.size();
Expand Down
43 changes: 22 additions & 21 deletions cpp/include/cudf/join.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -897,24 +897,25 @@ std::unique_ptr<rmm::device_uvector<size_type>> conditional_left_anti_join(
* @param right_conditional The right table used for the conditional join.
* @param binary_predicate The condition on which to join.
* @param compare_nulls Whether or not null values join to each other or not.
* @param output_size An optional pair of values indicating the exact output size and the number of
* matches for each row in the larger of the two input tables, left or right (may be precomputed
* using the corresponding mixed_inner_join_size API).
* @param output_size_data An optional pair of values indicating the exact output size and the
* number of matches for each row in the larger of the two input tables, left or right (may be
* precomputed using the corresponding mixed_inner_join_size API).
* @param mr Device memory resource used to allocate the returned table and columns' device memory
*
* @return A pair of vectors [`left_indices`, `right_indices`] that can be used to construct
* the result of performing a mixed inner join between the four input tables.
*/
std::pair<std::unique_ptr<rmm::device_uvector<size_type>>,
std::unique_ptr<rmm::device_uvector<size_type>>>
mixed_inner_join(table_view const& left_equality,
table_view const& right_equality,
table_view const& left_conditional,
table_view const& right_conditional,
ast::expression const& binary_predicate,
null_equality compare_nulls = null_equality::EQUAL,
std::optional<std::pair<std::size_t, column_view>> output_size_data = {},
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
mixed_inner_join(
table_view const& left_equality,
table_view const& right_equality,
table_view const& left_conditional,
table_view const& right_conditional,
ast::expression const& binary_predicate,
null_equality compare_nulls = null_equality::EQUAL,
std::optional<std::pair<std::size_t, device_span<size_type>>> output_size_data = {},
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Returns a pair of row index vectors corresponding to all pairs of
Expand Down Expand Up @@ -956,9 +957,9 @@ mixed_inner_join(table_view const& left_equality,
* @param right_conditional The right table used for the conditional join.
* @param binary_predicate The condition on which to join.
* @param compare_nulls Whether or not null values join to each other or not.
* @param output_size An optional pair of values indicating the exact output size and the number of
* matches for each row in the larger of the two input tables, left or right (may be precomputed
* using the corresponding mixed_left_join_size API).
* @param output_size_data An optional pair of values indicating the exact output size and the
* number of matches for each row in the larger of the two input tables, left or right (may be
* precomputed using the corresponding mixed_left_join_size API).
* @param mr Device memory resource used to allocate the returned table and columns' device memory
*
* @return A pair of vectors [`left_indices`, `right_indices`] that can be used to construct
Expand All @@ -972,7 +973,7 @@ mixed_left_join(table_view const& left_equality,
table_view const& right_conditional,
ast::expression const& binary_predicate,
null_equality compare_nulls = null_equality::EQUAL,
std::optional<std::pair<std::size_t, column_view>> output_size_data = {},
std::optional<std::pair<std::size_t, device_span<size_type>>> output_size_data = {},
vyasr marked this conversation as resolved.
Show resolved Hide resolved
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
Expand Down Expand Up @@ -1015,9 +1016,9 @@ mixed_left_join(table_view const& left_equality,
* @param right_conditional The right table used for the conditional join.
* @param binary_predicate The condition on which to join.
* @param compare_nulls Whether or not null values join to each other or not.
* @param output_size An optional pair of values indicating the exact output size and the number of
* matches for each row in the larger of the two input tables, left or right (may be precomputed
* using the corresponding mixed_full_join_size API).
* @param output_size_data An optional pair of values indicating the exact output size and the
* number of matches for each row in the larger of the two input tables, left or right (may be
* precomputed using the corresponding mixed_full_join_size API).
* @param mr Device memory resource used to allocate the returned table and columns' device memory
*
* @return A pair of vectors [`left_indices`, `right_indices`] that can be used to construct
Expand All @@ -1031,7 +1032,7 @@ mixed_full_join(table_view const& left_equality,
table_view const& right_conditional,
ast::expression const& binary_predicate,
null_equality compare_nulls = null_equality::EQUAL,
std::optional<std::pair<std::size_t, column_view>> output_size_data = {},
std::optional<std::pair<std::size_t, device_span<size_type>>> output_size_data = {},
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
Expand Down Expand Up @@ -1068,7 +1069,7 @@ mixed_full_join(table_view const& left_equality,
* be relied upon, simply passed to the corresponding `mixed_inner_join` API as
* is.
*/
std::pair<std::size_t, std::unique_ptr<column>> mixed_inner_join_size(
std::pair<std::size_t, std::unique_ptr<rmm::device_uvector<size_type>>> mixed_inner_join_size(
table_view const& left_equality,
table_view const& right_equality,
table_view const& left_conditional,
Expand Down Expand Up @@ -1111,7 +1112,7 @@ std::pair<std::size_t, std::unique_ptr<column>> mixed_inner_join_size(
* be relied upon, simply passed to the corresponding `mixed_left_join` API as
* is.
*/
std::pair<std::size_t, std::unique_ptr<column>> mixed_left_join_size(
std::pair<std::size_t, std::unique_ptr<rmm::device_uvector<size_type>>> mixed_left_join_size(
table_view const& left_equality,
table_view const& right_equality,
table_view const& left_conditional,
Expand Down
4 changes: 2 additions & 2 deletions cpp/include/cudf/table/row_operators.cuh
Original file line number Diff line number Diff line change
Expand Up @@ -492,11 +492,11 @@ template <template <typename> class hash_function, typename Nullate>
class row_hasher {
public:
row_hasher() = delete;
CUDA_HOST_DEVICE_CALLABLE row_hasher(Nullate has_nulls, table_device_view t)
CUDF_HOST_DEVICE row_hasher(Nullate has_nulls, table_device_view t)
: _table{t}, _has_nulls{has_nulls}
{
}
CUDA_HOST_DEVICE_CALLABLE row_hasher(Nullate has_nulls, table_device_view t, uint32_t seed)
CUDF_HOST_DEVICE row_hasher(Nullate has_nulls, table_device_view t, uint32_t seed)
: _table{t}, _seed(seed), _has_nulls{has_nulls}
{
}
Expand Down
4 changes: 2 additions & 2 deletions cpp/src/join/hash_join.cuh
Original file line number Diff line number Diff line change
Expand Up @@ -58,8 +58,8 @@ constexpr auto remap_sentinel_hash(H hash, S sentinel)
*/
class make_pair_function {
public:
CUDA_HOST_DEVICE_CALLABLE make_pair_function(row_hash const& hash,
hash_value_type const empty_key_sentinel)
CUDF_HOST_DEVICE make_pair_function(row_hash const& hash,
hash_value_type const empty_key_sentinel)
: _hash{hash}, _empty_key_sentinel{empty_key_sentinel}
{
}
Expand Down
Loading