Update join to use experimental row hasher and comparator (#12787)
Part of #11844. I will create a separate PR for `mixed_join`.

Compilation times:
`main` 94bbc82 : `16m47.513s`
This PR 5d75db8 : `16m47.520s`

Benchmarks: #12787 (comment)

Authors:
  - Divye Gala (https://github.com/divyegala)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - Nghia Truong (https://github.com/ttnghia)

URL: #12787
divyegala authored Apr 6, 2023
1 parent 0cf8c91 commit d5aad2f
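The commit title says join now uses the experimental row hasher and comparator, and the header diff swaps the `flattened_table` member for a `std::shared_ptr` to a `preprocessed_table`. The host-only C++ sketch below illustrates the general idea under simplifying assumptions. The build table is preprocessed once, then shared by a row hasher and a two-table row comparator, which drive the build and probe phases of an inner hash join. It is not cudf code. `preprocessed_table`, `row_hasher`, `two_table_equal`, and `inner_hash_join` are hypothetical stand-ins that hold only non-null `int64_t` columns. The real implementation runs on the GPU.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a preprocessed table: column-major int64 data, no nulls.
struct preprocessed_table {
  std::vector<std::vector<std::int64_t>> columns;
  std::size_t num_rows() const { return columns.empty() ? 0 : columns.front().size(); }
};

// Hypothetical row hasher: combines the hash of every column value in one row.
struct row_hasher {
  std::shared_ptr<preprocessed_table const> table;
  std::size_t operator()(std::size_t row) const
  {
    std::size_t h = 0;
    for (auto const& col : table->columns) {
      h ^= std::hash<std::int64_t>{}(col[row]) + 0x9e3779b9 + (h << 6) + (h >> 2);
    }
    return h;
  }
};

// Hypothetical two-table comparator: is row `lhs_row` of `lhs` equal to row `rhs_row` of `rhs`?
// Assumes both tables have the same number of columns.
struct two_table_equal {
  std::shared_ptr<preprocessed_table const> lhs;
  std::shared_ptr<preprocessed_table const> rhs;
  bool operator()(std::size_t lhs_row, std::size_t rhs_row) const
  {
    for (std::size_t c = 0; c < lhs->columns.size(); ++c) {
      if (lhs->columns[c][lhs_row] != rhs->columns[c][rhs_row]) { return false; }
    }
    return true;
  }
};

// Inner hash join returning sorted (probe_row, build_row) index pairs.
std::vector<std::pair<std::size_t, std::size_t>> inner_hash_join(
  std::shared_ptr<preprocessed_table const> build, std::shared_ptr<preprocessed_table const> probe)
{
  // The operators share ownership of the preprocessed data instead of copying it.
  row_hasher const build_hash{build};
  row_hasher const probe_hash{probe};
  two_table_equal const equal{probe, build};

  // Build phase: bucket build-row indices by row hash.
  std::unordered_multimap<std::size_t, std::size_t> hash_table;
  for (std::size_t b = 0; b < build->num_rows(); ++b) {
    hash_table.emplace(build_hash(b), b);
  }

  // Probe phase: equal hashes are only candidates, so confirm each with the comparator.
  std::vector<std::pair<std::size_t, std::size_t>> out;
  for (std::size_t p = 0; p < probe->num_rows(); ++p) {
    auto const [first, last] = hash_table.equal_range(probe_hash(p));
    for (auto it = first; it != last; ++it) {
      if (equal(p, it->second)) { out.emplace_back(p, it->second); }
    }
  }
  std::sort(out.begin(), out.end());  // deterministic order for callers and tests
  return out;
}
```

Holding the preprocessed data through `std::shared_ptr` lets several row operators refer to the same build-side data without copying it. In this sketch that is the only reason for the choice.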
Showing 5 changed files with 427 additions and 214 deletions.
19 changes: 9 additions & 10 deletions cpp/include/cudf/detail/join.hpp
@@ -36,8 +36,8 @@
template <typename T>
class default_allocator;

-namespace cudf::structs::detail {
-class flattened_table;
+namespace cudf::experimental::row::equality {
+class preprocessed_table;
}

namespace cudf {
@@ -77,9 +77,9 @@ struct hash_join {
rmm::device_buffer const _composite_bitmask; ///< Bitmask to denote whether a row is valid
cudf::null_equality const _nulls_equal; ///< whether to consider nulls as equal
cudf::table_view _build; ///< input table to build the hash map
-std::unique_ptr<cudf::structs::detail::flattened_table>
-_flattened_build_table; ///< flattened data structures for `_build`
-map_type _hash_table; ///< hash table built on `_build`
+std::shared_ptr<cudf::experimental::row::equality::preprocessed_table>
+_preprocessed_build; ///< input table preprocssed for row operators
+map_type _hash_table; ///< hash table built on `_build`

public:
/**
@@ -152,21 +152,20 @@ struct hash_join {
* i.e. if full join is specified as the join type then left join is called. Behavior
* is undefined if the provided `output_size` is smaller than the actual output size.
*
- * @throw cudf::logic_error if build table is empty and `JoinKind == INNER_JOIN`.
- *
- * @tparam JoinKind The type of join to be performed.
+ * @throw cudf::logic_error if build table is empty and `join == INNER_JOIN`.
*
* @param probe_table Table of probe side columns to join.
+ * @param join The type of join to be performed.
* @param output_size Optional value which allows users to specify the exact output size.
* @param stream CUDA stream used for device memory operations and kernel launches.
* @param mr Device memory resource used to allocate the returned vectors.
*
* @return Join output indices vector pair.
*/
-template <cudf::detail::join_kind JoinKind>
std::pair<std::unique_ptr<rmm::device_uvector<size_type>>,
std::unique_ptr<rmm::device_uvector<size_type>>>
probe_join_indices(cudf::table_view const& probe_table,
+join_kind join,
std::optional<std::size_t> output_size,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr) const;
@@ -179,10 +178,10 @@ struct hash_join {
* @throw cudf::logic_error if the number of columns in build table and probe table do not match.
* @throw cudf::logic_error if the column data types in build table and probe table do not match.
*/
-template <cudf::detail::join_kind JoinKind>
std::pair<std::unique_ptr<rmm::device_uvector<size_type>>,
std::unique_ptr<rmm::device_uvector<size_type>>>
compute_hash_join(cudf::table_view const& probe,
+join_kind join,
std::optional<std::size_t> output_size,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr) const;
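In the header diff, `probe_join_indices` and `compute_hash_join` drop the `template <cudf::detail::join_kind JoinKind>` parameter and take `join_kind join` as an ordinary argument instead. The `@throw` note changes accordingly, from `JoinKind == INNER_JOIN` to `join == INNER_JOIN`. The sketch below shows what a probe step with a runtime join type can look like. The enum and function are hypothetical host-side stand-ins, not the cudf API. A nested loop replaces the hash table so the example stays focused on the signature change. Following the doc comment, a full join probes exactly like a left join.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for cudf::detail::join_kind (only the kinds used here).
enum class join_kind { INNER_JOIN, LEFT_JOIN, FULL_JOIN };

// (probe_row, build_row) pairs; build_row == -1 marks a probe row with no match.
using index_pairs = std::vector<std::pair<std::ptrdiff_t, std::ptrdiff_t>>;

// Hypothetical probe step: the join type is a runtime argument, not a template parameter.
index_pairs probe_join_indices(std::vector<int> const& build,
                               std::vector<int> const& probe,
                               join_kind join)
{
  // Mirrors the documented precondition: an empty build table is an error for inner joins.
  if (build.empty() && join == join_kind::INNER_JOIN) {
    throw std::logic_error("build table is empty for an inner join");
  }
  // A full join probes like a left join; unmatched build rows would be
  // appended in a separate step that this sketch does not show.
  bool const keep_unmatched_probe_rows = (join != join_kind::INNER_JOIN);

  index_pairs out;
  for (std::size_t p = 0; p < probe.size(); ++p) {
    bool matched = false;
    // Nested loop in place of a hash-table lookup, to keep the sketch short.
    for (std::size_t b = 0; b < build.size(); ++b) {
      if (probe[p] == build[b]) {
        out.emplace_back(static_cast<std::ptrdiff_t>(p), static_cast<std::ptrdiff_t>(b));
        matched = true;
      }
    }
    if (!matched && keep_unmatched_probe_rows) {
      out.emplace_back(static_cast<std::ptrdiff_t>(p), std::ptrdiff_t{-1});
    }
  }
  return out;
}
```

With a runtime argument, a single function body serves every join kind and branches on `join` where the behavior differs. The template form instead compiles a separate copy per `JoinKind`.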