Fully support nested types in cudf::contains #10656

Merged: 394 commits, Aug 17, 2022
Changes from 250 commits
7863017
limit changes to distinct and not other stream compaction ops
devavret Apr 10, 2022
8bb7572
small ctor changes in row hasher
devavret Apr 10, 2022
3fb6865
Use accumulate wherever possible
devavret Apr 11, 2022
3cc1159
Merge branch 'branch-22.06' into list-row-eq
devavret Apr 11, 2022
e76a2f3
Add nulls in benchmark
devavret Apr 12, 2022
4f46db9
Add seed support to row hasher
devavret Apr 12, 2022
cbe757a
adding noexcept speeds up a bit
devavret Apr 12, 2022
92c77a5
hide device_row_hasher ctor to disallow use without owning row_hasher
devavret Apr 12, 2022
655bedd
Move nested traversal into an adapter class
devavret Apr 12, 2022
8078e3c
Column_device_view review changes
devavret Apr 12, 2022
aa5385b
Merge branch 'list-row-eq' into list-row-hash
devavret Apr 12, 2022
9c1b0d0
Add empty nesting tests
devavret Apr 12, 2022
e314601
Merge branch 'branch-22.06' into list-row-hash
devavret Apr 13, 2022
b5ca2e7
style check copyright
devavret Apr 13, 2022
0634092
Add a test
ttnghia Apr 14, 2022
7cb22f0
Fix bug that ignores null at the top level
ttnghia Apr 14, 2022
52ef585
Rename variable
ttnghia Apr 14, 2022
1534bc0
Rename variable
ttnghia Apr 14, 2022
2c40182
Review comments by @bdice
devavret Apr 14, 2022
cecf5ec
Add a test
ttnghia Apr 14, 2022
d9d1818
Fix test
ttnghia Apr 14, 2022
48938c5
Doc updates
devavret Apr 14, 2022
bede9db
Review changes
devavret Apr 14, 2022
405975c
Fix empty struct bug
devavret Apr 14, 2022
a7f6963
more review changes
devavret Apr 14, 2022
4388ab4
iterate over structs directly instead of transforming from indices
devavret Apr 14, 2022
0150c47
remove null hash as a param from adapter
devavret Apr 14, 2022
94bb3c1
Implement `table_comparator` in the new experimental struct comparator
ttnghia Apr 14, 2022
1cd15ff
Temporarily disable failed tests
ttnghia Apr 14, 2022
8e3689f
Fix slicing problem
ttnghia Apr 15, 2022
65a5cde
review changes
devavret Apr 15, 2022
51789d2
has_nulls -> check_nulls
devavret Apr 15, 2022
023ec1f
Fix a problem with struct offsets being already applied
devavret Apr 15, 2022
1b0238f
leftover has_nulls -> check_nulls, and docs
devavret Apr 17, 2022
4a97709
Change how seed is included in row hashing
devavret Apr 17, 2022
8bdcd9e
change adapter to stored element hasher
devavret Apr 18, 2022
70086b6
Fix broken pytest
devavret Apr 18, 2022
a8a2788
Fix index
ttnghia Apr 18, 2022
c532d7f
Revert "Fix index"
ttnghia Apr 19, 2022
5faa24f
Shorten function
ttnghia Apr 21, 2022
bc15949
Rewrite `search_bound`
ttnghia Apr 22, 2022
53bb891
Add `lexicographic::table_comparator`
ttnghia Apr 22, 2022
652e90d
Rewrite `contains_scalar_dispatch`
ttnghia Apr 22, 2022
0205e4e
Implement scalar search for list type
ttnghia Apr 22, 2022
4c99c81
Add const, and comments
ttnghia Apr 25, 2022
8a3adfa
Merge remote-tracking branch 'devavret/list-row-hash' into support_st…
ttnghia Apr 25, 2022
7b3de03
Add template keyword
ttnghia Apr 26, 2022
b256026
Use `cuco::static_map`
ttnghia Apr 28, 2022
6c9e0e4
Rename template parameter
ttnghia Apr 28, 2022
329ea9b
Use `make_device_uvector` instead of `thrust::uninitialized_fill`
ttnghia Apr 28, 2022
4904a65
Misc
ttnghia Apr 28, 2022
50b8891
Add strong index type.
bdice Apr 16, 2022
b9ed4d7
Revert changes to non-experimental row operators.
bdice Apr 20, 2022
d67f17e
Use enum for strongly typed index.
bdice May 3, 2022
464ed2b
Add two table comparator and adapter.
bdice May 3, 2022
b26b318
Add friends. :)
bdice May 3, 2022
1fd199d
Apply two-table comparator to search algorithms.
bdice May 3, 2022
18bd9f0
Move shared lhs/rhs logic into launch_search.
bdice May 3, 2022
b5b8b39
Improve comments, remove old code.
bdice May 3, 2022
96a72fa
Merge branch 'branch-22.06' into support_structs_in_contains
ttnghia May 4, 2022
6877ad7
Add template list tests
ttnghia May 4, 2022
075d9d6
Merge branch 'branch-22.06' into support_structs_in_contains
ttnghia May 11, 2022
388c516
Merge branch 'strong-index-type' into support_structs_in_contains
ttnghia May 11, 2022
f3dd8e7
Hack to get thing working
ttnghia May 11, 2022
4060b4f
Merge remote-tracking branch 'upstream/branch-22.06' into strong-inde…
bdice May 11, 2022
73c4b27
Move strong typing code into cudf::experimental::row::lexicographic.
bdice May 11, 2022
02774fb
Merge branch 'branch-22.06' into support_structs_in_contains
ttnghia May 13, 2022
f746182
Fix compiler warning treated as error
ttnghia May 13, 2022
ef40471
Fix error
ttnghia May 13, 2022
3a8f5ca
Change headers order
ttnghia May 13, 2022
f1c7079
Merge branch 'strong-index-type' into support_structs_in_contains
ttnghia May 13, 2022
ce81da4
Fix row comparator
ttnghia May 13, 2022
b339138
Adopt new row comparator
ttnghia May 13, 2022
9cdbe27
Merge remote-tracking branch 'upstream/branch-22.06' into strong-inde…
bdice May 13, 2022
c8a38fe
Improve comment.
bdice May 13, 2022
8b5ef34
Fix docstrings.
bdice May 13, 2022
4c63f4c
Fix tests
ttnghia May 13, 2022
1845e5e
Fix comparator
ttnghia May 13, 2022
205c0f2
Cleanup `search_ordered.cu`
ttnghia May 13, 2022
f475cf0
Add `contains_nested.cu` and remove struct `contains.hpp`
ttnghia May 13, 2022
48de04d
Remove `contains.cu` for struct
ttnghia May 13, 2022
0d8a208
Move file
ttnghia May 13, 2022
77f85b4
Enable weak ordering machinery (weak_ordering_comparator_impl) to wra…
bdice May 13, 2022
529e944
Remove template template parameters.
bdice May 13, 2022
fb0e192
Use references.
bdice May 13, 2022
d157199
Remove file
ttnghia May 13, 2022
56d99ba
Use Ts const...
bdice May 13, 2022
c5998b7
Move strong typing to cudf::experimental::row.
bdice May 13, 2022
b78d978
Use constexpr.
bdice May 13, 2022
427ce6e
Rewrite code
ttnghia May 13, 2022
f0f1efc
Add const
ttnghia May 13, 2022
81a4572
Fix compile error
ttnghia May 13, 2022
8b1f4e6
Add type check
ttnghia May 13, 2022
3aea8d4
Use custom iterator class.
bdice May 14, 2022
bbaf360
Use __device__ only.
bdice May 14, 2022
4a1d7aa
Add comment.
bdice May 14, 2022
09c5661
Use symmetry of comparator (now possible with weak ordering) to avoid…
bdice May 14, 2022
290323f
Add constexpr to two_table_device_row_comparator_adapter.
bdice May 14, 2022
4c69edd
Remove forward (always accepts lvalues).
bdice May 14, 2022
ea8c223
Merge branch 'strong-index-type' into strong_typed_index
ttnghia May 16, 2022
857f570
Implement strong typed index for equality comparator
ttnghia May 16, 2022
d9f63f0
Adopt new strong typed index
ttnghia May 16, 2022
12f7a8b
Remove header
ttnghia May 16, 2022
fbd5b90
Indicate reversed signature.
bdice May 16, 2022
3db6484
Move constructor to implementation, add shape compatibility check.
bdice May 16, 2022
3e81b53
Improve docstrings.
bdice May 16, 2022
1834095
Use thrust::iterator_facade.
bdice May 16, 2022
9cb656b
Merge branch 'branch-22.06' into strong_typed_index
ttnghia May 16, 2022
c766bf3
Merge branch 'strong-index-type' into strong_typed_index
ttnghia May 16, 2022
a311bcc
Add type check
ttnghia May 16, 2022
b935835
Change parameter side for comparator
ttnghia May 16, 2022
73b8caf
Add comment
ttnghia May 17, 2022
ff26024
Use const for struct members.
bdice May 17, 2022
f779bff
Slim down the strong index layer by using a templated struct.
bdice May 17, 2022
157abbc
Simplify construction.
bdice May 17, 2022
a2ac19d
Use size_type const where possible.
bdice May 17, 2022
75249e8
Require weakly or strongly typed values for lhs_index and rhs_index.
bdice May 17, 2022
f50faf5
Merge branch 'strong-index-type' into strong_typed_index
ttnghia May 17, 2022
bed1162
Unconstrain template typenames.
bdice May 18, 2022
6930952
Merge branch 'strong-index-type' into strong_typed_index
ttnghia May 18, 2022
2781af1
Remove type check
ttnghia May 18, 2022
8b239d4
Update adapter comparator
ttnghia May 18, 2022
17bd96c
Merge branch 'branch-22.06' into strong_typed_index
ttnghia May 18, 2022
ae77f68
Merge branch 'branch-22.06' into strong_typed_index
ttnghia May 18, 2022
d6b5eb9
Remove deprecated code
ttnghia May 18, 2022
5f39a28
Fix comments
ttnghia May 18, 2022
bf3555c
Fix header check
ttnghia May 18, 2022
df98698
Fix format
ttnghia May 18, 2022
b67d070
Rename variables and fix header
ttnghia May 18, 2022
4429cc4
Merge branch 'branch-22.06' into strong_typed_index
ttnghia May 18, 2022
893db8a
Address review comments
ttnghia May 18, 2022
7e69d3b
Merge branch 'branch-22.06' into strong_typed_index
ttnghia May 18, 2022
c51b053
Fix renaming issue
ttnghia May 18, 2022
934ee73
Switch operands
ttnghia May 18, 2022
cf996f0
Update cpp/src/structs/search/contains.cu
ttnghia May 18, 2022
1dd30eb
Merge branch 'strong_typed_index' into support_structs_in_contains
ttnghia May 18, 2022
73fbc47
Fix compile issue
ttnghia May 18, 2022
fdcbd43
Rewrite `multi_contains_nested_elements`
ttnghia May 18, 2022
261e98c
Add `nested_type_scalar_to_column_view`
ttnghia May 19, 2022
91447cf
Rename .cuh into .hpp
ttnghia May 19, 2022
16ea31d
WIP
ttnghia May 19, 2022
f0f2c4a
Merge remote-tracking branch 'nghia/column_utility' into support_stru…
ttnghia May 19, 2022
662933d
Adopt `nested_type_scalar_to_column_view`
ttnghia May 19, 2022
658cefb
Rewrite `contains_nested.cu`
ttnghia May 19, 2022
29613bc
Move implementation to cpp file
ttnghia May 19, 2022
a596d40
Merge branch 'column_utility' into support_structs_in_contains
ttnghia May 19, 2022
e765cc8
Add comment
ttnghia May 19, 2022
ca25394
Move file
ttnghia May 19, 2022
1b6c3e7
Fix comment
ttnghia May 19, 2022
85e426a
Merge branch 'branch-22.06' into support_structs_in_contains
ttnghia May 20, 2022
5a1d7b3
Add declarations into header
ttnghia May 20, 2022
7fd9b8f
Remove extern declaration
ttnghia May 20, 2022
0246b5a
Add doxygen
ttnghia May 20, 2022
5a3fefe
Rewrite comments
ttnghia May 20, 2022
2c1d8ad
Update doxygen
ttnghia May 20, 2022
1ba60a2
Cleanup
ttnghia May 20, 2022
d72da6e
Simplify code
ttnghia May 20, 2022
5079cce
Add empty test
ttnghia May 20, 2022
427a36d
Add TrivialInputTest
ttnghia May 20, 2022
e1edeb4
Implement `row_hasher_adapter`
ttnghia May 20, 2022
9811c4d
Fix test
ttnghia May 20, 2022
df13c3c
Rewrite comments
ttnghia May 20, 2022
bbbf223
Rewrite comments
ttnghia May 20, 2022
3e94ab6
Add SlicedInputNoNulls test
ttnghia May 20, 2022
5816703
Add SlicedInputHavingNulls test
ttnghia May 20, 2022
d0e2cbf
Add StructOfLists test
ttnghia May 20, 2022
705087b
Cleanup
ttnghia May 20, 2022
008520a
Rename variables
ttnghia May 20, 2022
bae8587
Add TrivialInputTest
ttnghia May 20, 2022
e8c72e6
Add EmptyInputTest
ttnghia May 20, 2022
eff240b
Rename variable
ttnghia May 20, 2022
8f3b640
Add more test
ttnghia May 20, 2022
bde4f2e
Rename tests and change macros
ttnghia May 20, 2022
34afa0f
Remove variable
ttnghia May 20, 2022
c9b3e56
Add more list tests
ttnghia May 20, 2022
07cf7e3
Modify test
ttnghia May 20, 2022
fb87126
Add verbosity
ttnghia May 20, 2022
40754e0
Add TrivialInput test
ttnghia May 20, 2022
cdbaee3
Add SlicedInputHavingNulls test
ttnghia May 20, 2022
c153d38
Add ListsOfStructs test
ttnghia May 20, 2022
f38b1ff
Fix cmake format
ttnghia May 20, 2022
2554afc
Update copyright year
ttnghia May 20, 2022
679524d
Materialize column from scalar
ttnghia May 23, 2022
3678871
Merge branch 'branch-22.08' into support_structs_in_contains
ttnghia May 23, 2022
fa0b92e
Merge branch 'branch-22.08' into support_structs_in_contains
ttnghia May 26, 2022
a6e88cb
Merge branch 'branch-22.08' into support_structs_in_contains
ttnghia May 31, 2022
8ff9c31
Minor corrections
ttnghia May 31, 2022
dfb6649
Fix doxygen warnings in cudf/*.hpp (#10896)
karthikeyann May 28, 2022
1d21874
Move new data structure into `row_operators.cuh`
ttnghia May 31, 2022
7247cd3
Rewrite comments
ttnghia May 31, 2022
0351a1b
Use `CUDF_ENABLE_IF` macro
ttnghia May 31, 2022
ff96d3d
Add comments
ttnghia May 31, 2022
f170d53
Explicitly add type
ttnghia May 31, 2022
1fec114
Change verbosity value
ttnghia May 31, 2022
17e47f6
Add comment
ttnghia May 31, 2022
1f38650
Update tests
ttnghia May 31, 2022
e5b2cae
Revert "Use `CUDF_ENABLE_IF` macro"
ttnghia May 31, 2022
bf5e47e
Merge branch 'branch-22.08' into support_structs_in_contains
ttnghia May 31, 2022
385fa1d
Use `CUDF_ENABLE_IF`
ttnghia May 31, 2022
0bd161e
Remove `needles_it` variable
ttnghia May 31, 2022
fc4b1e7
Rewrite doxygen
ttnghia Jun 1, 2022
f05d170
Fix typos
ttnghia Jun 1, 2022
4eb552c
Rename function and structs
ttnghia Jun 1, 2022
047cfd7
Use dynamic nullate
ttnghia Jun 1, 2022
d8e31b1
Rewrite comments
ttnghia Jun 1, 2022
bd99a5d
Rewrite doxygen
ttnghia Jun 1, 2022
e7a7ebc
Rename variable
ttnghia Jun 1, 2022
36c9d89
Rename variable
ttnghia Jun 1, 2022
1c92a02
Add a benchmark
ttnghia Jun 2, 2022
074d382
Fix typo
ttnghia Jun 2, 2022
221a9ce
Revert "Add a benchmark"
ttnghia Jun 2, 2022
ff5de21
Merge branch 'branch-22.08' into support_structs_in_contains
ttnghia Jun 5, 2022
875ee41
Fix merge error
ttnghia Jun 5, 2022
c18aeac
Merge branch 'branch-22.08' into support_structs_in_contains
ttnghia Jun 10, 2022
0d66a8c
Employ heterogeneous map lookup
ttnghia Jun 10, 2022
00017fd
Use customized cuCollections
ttnghia Jun 10, 2022
3a67f5e
Use `static_multimap`
ttnghia Jun 10, 2022
2f92850
Update cuco
ttnghia Jun 10, 2022
a50cd03
Remove redundant code
ttnghia Jun 10, 2022
cd1b5ca
Add comments
ttnghia Jun 10, 2022
5b502dc
Cleanup
ttnghia Jun 10, 2022
a2da86b
Remove unused header
ttnghia Jun 10, 2022
0c33171
Run benchmark for `static_map` vs `static_multimap`
ttnghia Jun 13, 2022
baa3ee7
Use CG in map
ttnghia Jun 13, 2022
94da829
Revert "Use `static_multimap`"
ttnghia Jun 13, 2022
789c6e5
Change benchmark
ttnghia Jun 13, 2022
51db095
Change benchmarks for 2 int columns
ttnghia Jun 13, 2022
7e51b86
Update bench
ttnghia Jun 13, 2022
ff69f00
Merge branch 'branch-22.10' into support_structs_in_contains
ttnghia Jul 27, 2022
2d89874
Merge branch 'branch-22.10' into support_structs_in_contains
ttnghia Jul 29, 2022
9ef1ecc
Reverse `row_operators.cuh`
ttnghia Jul 29, 2022
f8bce36
Reverse `stream_compaction_common.cuh`
ttnghia Jul 29, 2022
5f9e58e
Fix compile issue
ttnghia Jul 29, 2022
3a56e2b
Fix null mask
ttnghia Jul 29, 2022
4ce8673
Reverse `search.cpp`
ttnghia Jul 29, 2022
e597835
Add comment
ttnghia Aug 2, 2022
4870b67
Rewrite comments
ttnghia Aug 2, 2022
852b7ff
Merge branch 'branch-22.10' into support_structs_in_contains
ttnghia Aug 2, 2022
cf16b07
Merge branch 'branch-22.10' into support_structs_in_contains
ttnghia Aug 3, 2022
78f2624
Simplify column construction
ttnghia Aug 3, 2022
9f983df
Merge branch 'branch-22.10' into support_structs_in_contains
ttnghia Aug 9, 2022
d53875e
Update cpp/src/search/contains_column.cu
ttnghia Aug 17, 2022
34221ed
Merge branch 'branch-22.10' into support_structs_in_contains
ttnghia Aug 17, 2022
a962312
Remove line separators
ttnghia Aug 17, 2022
adfe6c6
Unify functions
ttnghia Aug 17, 2022
1f1e009
Remove function declarations
ttnghia Aug 17, 2022
c4a7fde
Rewrite comments
ttnghia Aug 17, 2022
c12a758
Extract `contains_scalar.cu`
ttnghia Aug 17, 2022
659d770
Change `Type` to `Element` to be consistent with other places
ttnghia Aug 17, 2022
28f115f
Merge branch 'branch-22.10' into support_structs_in_contains
ttnghia Aug 17, 2022
23 changes: 20 additions & 3 deletions cpp/include/cudf/detail/search.hpp
@@ -105,12 +105,29 @@ rmm::device_uvector<bool> contains(
* This function is designed for nested types only. It can also work with non-nested types
* but with lower performance due to the complexity of the implementation.
*
* @param haystack The column containing search space.
* @param needle A scalar value to check for existence in the search space.
* @return true if the given `needle` value exists in the `haystack` column.
* @param haystack The column containing search space
* @param needle A scalar value to check for existence in the search space
* @return true if the given `needle` value exists in the `haystack` column
*/
bool contains_nested_element(column_view const& haystack,
column_view const& needle,
rmm::cuda_stream_view stream);

/**
* @brief Check if each row of the `needles` column exists in the `haystack` column.
*
* This function is designed for nested types only. It can also work with non-nested types
* but with lower performance due to the complexity of the implementation.
*
* @param haystack The column containing search space
* @param needles A column of values to check for existence in the search space
* @param stream CUDA stream used for device memory operations and kernel launches
* @param mr Device memory resource used to allocate the returned column's device memory
* @return A BOOL column indicating if each element in `needles` exists in the search space
*/
std::unique_ptr<column> multi_contains_nested_elements(column_view const& haystack,
column_view const& needles,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr);

} // namespace cudf::detail
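
As a rough host-side analogy for the contract documented above (one BOOL result per row of `needles`, with null needle rows staying null in the output), the sketch below models nullable rows with `std::optional`. The name `multi_contains_host` and the use of `std::unordered_set` are assumptions made purely for illustration; this is not the cudf implementation, it handles only flat `int` values, and nulls in the haystack are not modelled.

```cpp
#include <optional>
#include <unordered_set>
#include <vector>

// Host-side analogy only. `multi_contains_host` is a hypothetical name,
// not a cudf function; std::optional stands in for a nullable row.
std::vector<std::optional<bool>> multi_contains_host(
  std::vector<int> const& haystack, std::vector<std::optional<int>> const& needles)
{
  // Build the search space once, then probe it for every needle.
  std::unordered_set<int> const haystack_set(haystack.begin(), haystack.end());

  std::vector<std::optional<bool>> result;
  result.reserve(needles.size());
  for (auto const& needle : needles) {
    if (!needle.has_value()) {
      // A null needle yields a null output row, mirroring the copied null mask.
      result.push_back(std::nullopt);
    } else {
      result.push_back(haystack_set.count(*needle) > 0);
    }
  }
  return result;
}
```

Calling it with a haystack of `{1, 2, 3}` and needles `{2, null, 7}` would give a present `true`, a null, and a present `false`, in that order.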
124 changes: 58 additions & 66 deletions cpp/src/search/contains_column.cu
@@ -17,6 +17,7 @@
#include <hash/unordered_multiset.cuh>

#include <cudf/column/column_factories.hpp>
#include <cudf/detail/null_mask.hpp>
#include <cudf/detail/nvtx/ranges.hpp>
#include <cudf/detail/search.hpp>
#include <cudf/dictionary/detail/search.hpp>
@@ -60,12 +61,16 @@ __device__ auto inline get_scalar_value(ScalarDView d_scalar)

struct contains_scalar_dispatch {
template <typename Type>
bool operator()(column_view const& haystack,
scalar const& needle,
rmm::cuda_stream_view stream) const
std::enable_if_t<!is_nested<Type>(), bool> operator()(column_view const& haystack,
scalar const& needle,
rmm::cuda_stream_view stream) const
{
CUDF_EXPECTS(haystack.type() == needle.type(), "scalar and column types must match");

// In case the input scalar is invalid, it should be handled at the caller site before
// dispatching to this function.
// (Handling such case is very simple: just check if the input haystack column has nulls).

using DType = device_storage_type_t<Type>;
auto const d_haystack = column_device_view::create(haystack, stream);
auto const d_needle =
@@ -90,28 +95,24 @@ struct contains_scalar_dispatch {
}) > 0;
}
}
};

template <>
bool contains_scalar_dispatch::operator()<cudf::list_view>(column_view const&,
scalar const&,
rmm::cuda_stream_view) const
{
CUDF_FAIL("list_view type not supported yet");
}
template <typename Type>
std::enable_if_t<is_nested<Type>(), bool> operator()(column_view const& haystack,
scalar const& needle,
rmm::cuda_stream_view stream) const
{
CUDF_EXPECTS(haystack.type() == needle.type(), "scalar and column types must match");
// Haystack and needle structure compatibility will be checked by the table comparator
// constructor during call to `contains_nested_element`.

template <>
bool contains_scalar_dispatch::operator()<cudf::struct_view>(column_view const& haystack,
scalar const& needle,
rmm::cuda_stream_view stream) const
{
CUDF_EXPECTS(haystack.type() == needle.type(), "scalar and column types must match");
// Haystack and needle structure compatibility will be checked by the table comparator
// constructor during call to `contains_nested_element`.
// In case the input scalar is invalid, it should be handled at the caller site before
// dispatching to this function.
// (Handling such case is very simple: just check if the input haystack column has nulls).

auto const needle_as_col = make_column_from_scalar(needle, 1, stream);
return contains_nested_element(haystack, needle_as_col->view(), stream);
}
auto const needle_as_col = make_column_from_scalar(needle, 1, stream);
return contains_nested_element(haystack, needle_as_col->view(), stream);
}
};

template <>
bool contains_scalar_dispatch::operator()<cudf::dictionary32>(column_view const& haystack,
@@ -130,15 +131,32 @@ bool contains_scalar_dispatch::operator()<cudf::dictionary32>(column_view const&
}

struct multi_contains_dispatch {
template <typename Type>
template <typename ElementType, typename Haystack>
struct contains_fn {
bool __device__ operator()(size_type const idx) const
{
if (needles_have_nulls && needles.is_null_nocheck(idx)) {
// `true` or `false`: doesn't matter, this will be masked out as a null element.
return true;
}

return haystack.contains(needles.template element<ElementType>(idx));
}

Haystack const haystack;
column_device_view const needles;
bool const needles_have_nulls;
};

template <typename Type, CUDF_ENABLE_IF(!is_nested<Type>())>
std::unique_ptr<column> operator()(column_view const& haystack,
column_view const& needles,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr) const
{
auto result = make_numeric_column(data_type{type_to_id<bool>()},
needles.size(),
copy_bitmask(needles),
copy_bitmask(needles, stream, mr),
needles.null_count(),
stream,
mr);
@@ -152,53 +170,27 @@ struct multi_contains_dispatch {
}

auto const haystack_set = cudf::detail::unordered_multiset<Type>::create(haystack, stream);
auto const haystack_set_dv = haystack_set.to_device();
auto const needles_cdv_ptr = column_device_view::create(needles, stream);
auto const needles_it = thrust::make_counting_iterator<size_type>(0);

if (needles.has_nulls()) {
thrust::transform(rmm::exec_policy(stream),
needles_it,
needles_it + needles.size(),
out_begin,
[haystack = haystack_set.to_device(),
needles = *needles_cdv_ptr] __device__(size_type const idx) {
return needles.is_null_nocheck(idx) ||
haystack.contains(needles.template element<Type>(idx));
});
} else {
thrust::transform(rmm::exec_policy(stream),
needles_it,
needles_it + needles.size(),
out_begin,
[haystack = haystack_set.to_device(),
needles = *needles_cdv_ptr] __device__(size_type const index) {
return haystack.contains(needles.template element<Type>(index));
});
}

thrust::transform(rmm::exec_policy(stream),
thrust::make_counting_iterator<size_type>(0),
thrust::make_counting_iterator<size_type>(needles.size()),
out_begin,
contains_fn<Type, decltype(haystack_set_dv)>{
haystack_set_dv, *needles_cdv_ptr, needles.has_nulls()});
return result;
}
};

template <>
std::unique_ptr<column> multi_contains_dispatch::operator()<list_view>(
column_view const&,
column_view const&,
rmm::cuda_stream_view,
rmm::mr::device_memory_resource*) const
{
CUDF_FAIL("list_view type not supported");
}

template <>
std::unique_ptr<column> multi_contains_dispatch::operator()<struct_view>(
column_view const&,
column_view const&,
rmm::cuda_stream_view,
rmm::mr::device_memory_resource*) const
{
CUDF_FAIL("struct_view type not supported");
}
template <typename Type, CUDF_ENABLE_IF(is_nested<Type>())>
std::unique_ptr<column> operator()(column_view const& haystack,
column_view const& needles,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr) const
{
return multi_contains_nested_elements(haystack, needles, stream, mr);
}
};

template <>
std::unique_ptr<column> multi_contains_dispatch::operator()<dictionary32>(
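
The diff above replaces explicit `list_view` / `struct_view` specializations that called `CUDF_FAIL` with two `operator()` overloads selected at compile time by `is_nested<Type>()`. A minimal, host-only sketch of that overload-selection pattern follows. The tag types, the `is_nested_type` trait, and the `contains_dispatch_sketch` functor are hypothetical stand-ins, not cudf names, and the returned strings only mark which overload was chosen.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-ins for nested element types.
struct list_tag {};
struct struct_tag {};

// Hypothetical trait playing the role that `is_nested<Type>()` plays in the diff.
template <typename T>
constexpr bool is_nested_type()
{
  return std::is_same<T, list_tag>::value || std::is_same<T, struct_tag>::value;
}

struct contains_dispatch_sketch {
  // Participates in overload resolution only for non-nested element types.
  template <typename T>
  std::enable_if_t<!is_nested_type<T>(), std::string> operator()() const
  {
    return "flat path";
  }

  // Participates in overload resolution only for nested element types.
  template <typename T>
  std::enable_if_t<is_nested_type<T>(), std::string> operator()() const
  {
    return "nested path";
  }
};
```

Because exactly one overload is viable for any given `T`, a call such as `contains_dispatch_sketch{}.operator()<list_tag>()` resolves without ambiguity, and no per-type specialization is needed for each nested type.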
22 changes: 22 additions & 0 deletions cpp/src/search/contains_nested.cu
@@ -15,6 +15,8 @@
*/

#include <cudf/detail/iterator.cuh>
#include <cudf/detail/null_mask.hpp>
#include <cudf/detail/search.hpp>
#include <cudf/table/experimental/row_operators.cuh>
#include <cudf/table/table_view.hpp>

@@ -63,4 +65,24 @@ bool contains_nested_element(column_view const& haystack,
}) > 0;
}

std::unique_ptr<column> multi_contains_nested_elements(column_view const& haystack,
column_view const& needles,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr)
{
auto result_v = contains(table_view{{haystack}},
table_view{{needles}},
null_equality::EQUAL,
nan_equality::ALL_EQUAL,
stream,
mr);
Contributor:

Just curious why this detail function returns a device_uvector instead of a column?
Will the result ever be larger than what a column supports?

Contributor Author (@ttnghia), Jul 29, 2022:

That `detail::contains(table_view, table_view)` function was initially refactored out of semi-join (#11100). It is now also used in several other places, but the result is only used for temporary computation and is never returned to the user. Thus, a `device_uvector` is enough at this time.

It is still unclear whether we will expose this function to the public.

Contributor:

I would also say that (especially for internal APIs like this, but occasionally even otherwise) I prefer using a strongly-typed object like a `device_uvector` rather than the type-erased (and nullable) column. The typed object carries more information and conveys it to the caller. Even for public APIs I like uvectors when we don't need type erasure or nullability, because the returned values are clearly defined.

Contributor:

This discussion is also related to #11356.


// todo: https://github.com/rapidsai/cudf/pull/11356
auto result =
std::make_unique<column>(data_type{type_to_id<bool>()}, needles.size(), result_v.release());
result->set_null_mask(copy_bitmask(needles, stream, mr), needles.null_count());

return result;
}

} // namespace cudf::detail
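
The review thread in this file weighs a strongly typed result (`device_uvector<bool>`) against a type-erased, nullable column, and the new function converts the former into the latter only at its boundary. The host-only sketch below illustrates that split under simplifying assumptions: `erased_column_sketch`, `type_id_sketch`, `contains_flags`, and `to_column` are hypothetical names, `std::vector` stands in for device memory, and the linear search is for brevity rather than a reflection of how cudf searches.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a type-erased, nullable column.
enum class type_id_sketch { BOOL8, INT32 };

struct erased_column_sketch {
  type_id_sketch type;                        // element type known only at run time
  std::vector<std::uint8_t> bytes;            // raw storage
  std::optional<std::vector<bool>> validity;  // present only if rows can be null
};

// Internal helper: the return type alone says "one non-null flag per needle".
std::vector<std::uint8_t> contains_flags(std::vector<int> const& haystack,
                                         std::vector<int> const& needles)
{
  std::vector<std::uint8_t> flags(needles.size(), 0);
  for (std::size_t i = 0; i < needles.size(); ++i) {
    for (int const value : haystack) {
      if (value == needles[i]) {
        flags[i] = 1;
        break;
      }
    }
  }
  return flags;
}

// Boundary wrapper: erase the type and attach nullability only where a column is required.
erased_column_sketch to_column(std::vector<std::uint8_t>&& flags,
                               std::optional<std::vector<bool>> validity)
{
  return erased_column_sketch{type_id_sketch::BOOL8, std::move(flags), std::move(validity)};
}
```

Internal callers can consume `contains_flags` directly and know from the signature that every element is a valid flag, while only code that must hand a column back to a user pays for the type erasure and null mask.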
4 changes: 2 additions & 2 deletions cpp/tests/CMakeLists.txt
@@ -357,8 +357,8 @@ ConfigureTest(
# ##################################################################################################
# * search test -----------------------------------------------------------------------------------
ConfigureTest(
SEARCH_TEST search/search_dictionary_test.cpp search/search_struct_test.cpp
search/search_test.cpp
SEARCH_TEST search/search_dictionary_test.cpp search/search_list_test.cpp
search/search_struct_test.cpp search/search_test.cpp
)

# ##################################################################################################