Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor cudf::contains, renaming and switching parameters role #10802

Merged
merged 22 commits into from
May 10, 2022

Conversation

ttnghia
Copy link
Contributor

@ttnghia ttnghia commented May 5, 2022

This PR does the following:

  • Renaming parameters of cudf::contains, changing from t/values, col/value, etc... into haystack/needle in a consistent way.
  • Switching the role of haystack and needles parameters of the overload cudf::contains(column_view, column_view), which incorrectly searches for haystack inside needles.
  • Update the internal usage of that overloads in cudf.
  • Update unit tests.
  • Rewriting all cudf::contains doxygen.
  • And some minor code cleanup/refactor.

Since the role of parameters is switched, this causes breaking changes.

Closes #10725. In addition, this is also a foundation for more changes in search.cu to support nested types in #10656.

@ttnghia ttnghia added 3 - Ready for Review Ready for review by team code quality libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS improvement Improvement / enhancement to an existing function breaking Breaking change labels May 5, 2022
@ttnghia ttnghia requested a review from a team as a code owner May 5, 2022 20:11
@ttnghia ttnghia self-assigned this May 5, 2022
@github-actions github-actions bot added the Java Affects Java cuDF API. label May 5, 2022
Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with a few docs nitpicks.

cpp/include/cudf/search.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/search.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/search.hpp Outdated Show resolved Hide resolved
@jrhemstad
Copy link
Contributor

image

Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor const nits, but otherwise Java changes lgtm.

java/src/main/native/src/ColumnViewJni.cpp Outdated Show resolved Hide resolved
@github-actions github-actions bot added the Python Affects Python cuDF API. label May 9, 2022
@ttnghia ttnghia requested a review from a team as a code owner May 9, 2022 23:57
@ttnghia ttnghia requested review from trxcllnt and bdice May 9, 2022 23:57
@codecov

This comment was marked as off-topic.

Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness, we should also update the parameter names for the Python bindings for lower and upper bound in this file. Also the second parameter for contains looks like it should be needles, not needle. These bindings are like a C++ header in that they only export declarations, so the parameter names don't actually need to match the definition for the code to function correctly, but it will still be confusing to readers if they don't.

@ttnghia
Copy link
Contributor Author

ttnghia commented May 10, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 4539e5e into rapidsai:branch-22.06 May 10, 2022
@ttnghia ttnghia deleted the rename_parameter_search branch May 10, 2022 18:42
rapids-bot bot pushed a commit that referenced this pull request May 11, 2022
Compile warning introduced with merge of PR #10802 

```
10 warnings like this:
12:43:23 $SRC_DIR/cpp/src/search/search.cu(108): warning #177-D: parameter "args" was declared but never referenced
12:43:23           detected during instantiation of function "lambda [](auto &&...)->auto [with <auto-1>=<rmm::exec_policy, const
thrust::counting_iterator<cudf::size_type, thrust::use_default, thrust::use_default, thrust::use_default> &, 
thrust::counting_iterator<cudf::size_type, thrust::use_default, thrust::use_default, thrust::use_default>, const 
thrust::counting_iterator<cudf::size_type, thrust::use_default, thrust::use_default, thrust::use_default> &, 
thrust::counting_iterator<cudf::size_type, thrust::use_default, thrust::use_default, thrust::use_default>, cudf::size_type *const &, 
const cudf::row_lexicographic_comparator<cudf::nullate::DYNAMIC> &>]" 
12:43:23 (121): here
```

Line 108 has a lambda refactoring that seems to confuse the compiler. 
```
  auto const do_search = [find_first](auto&&... args) {
    if (find_first) {
      thrust::lower_bound(std::forward<decltype(args)>(args)...);
    } else {
      thrust::upper_bound(std::forward<decltype(args)>(args)...);
    }
  };
  do_search(rmm::exec_policy(stream),
            count_it,  count_it + haystack.num_rows(), count_it, count_it + needles.num_rows(),
            out_it, comp);
```

The warning is wrong and the compiler generates the correct code so this is likely a compiler bug.

This PR fixes the warning by replacing the lambda with an if statement.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Yunsong Wang (https://github.com/PointKernel)

URL: #10827
rapids-bot bot pushed a commit that referenced this pull request Aug 17, 2022
This extends the `cudf::contains` API to support nested types (lists + structs) with arbitrarily nested levels. As such, `cudf::contains` will work with literally any type of input data.

In addition, this fixes null handling of `cudf::contains` with structs column + struct scalar input when the structs column contains null rows at the top level while the scalar key is valid but all nulls at children levels.

Closes: #8965
Depends on:
 * #10730
 * #10883
 * #10802
 * #10997
 * NVIDIA/cuCollections#172
 * NVIDIA/cuCollections#173
 * #11037
 * #11356

Authors:
  - Nghia Truong (https://github.com/ttnghia)
  - Devavret Makkar (https://github.com/devavret)
  - Bradley Dice (https://github.com/bdice)
  - Karthikeyan (https://github.com/karthikeyann)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Bradley Dice (https://github.com/bdice)
  - Yunsong Wang (https://github.com/PointKernel)

URL: #10656
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team breaking Breaking change improvement Improvement / enhancement to an existing function Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] cudf::contains(column, column) should be consistent with other contains() APIs
7 participants