Support contains() on lists of primitives #7039
Update CHANGELOG, + headers yaml
Codecov Report

| | branch-0.18 | #7039 | +/- |
|---|---|---|---|
| Coverage | 82.09% | 82.20% | +0.11% |
| Files | 97 | 98 | +1 |
| Lines | 16474 | 16692 | +218 |
| Hits | 13524 | 13722 | +198 |
| Misses | 2950 | 2970 | +20 |
Review feedback: 1. Optimized case where skey is null 2. Rephrased nested ternary if.
Review: 3. Construct mutable output view only if required.
Full disclosure: I should probably also implement the I'll follow the advice here for that as well.
Fix null-mask for non-null list rows containing null elements.
Further correction for null_mask
Further correction for null_mask
Added support for skey columns (instead of just scalars).
- Seems odd to have an API that only works on 1 level lists. Just for the sake of completeness, I'd've tried to make this generic.
- The only requirement for supported types seems to be an equality operator. This can work for chrono types. Not sure why they're implicitly left out.
Looks good to me.
Hey, thanks for the review, @karthikeyann. I have rebased things. Adding the
(Argh. @karthikeyann, please ignore the request for re-review. That was by accident.)
@gpucibot merge
Adds JNI and Java side bindings for `list_contains` that is being added as part of #7039.

Authors:
- Kuhu Shukla (@kuhushukla)

Approvers:
- Robert (Bobby) Evans (@revans2)
- MithunR (@mythrocks)

URL: #7125
`lists::contains()` (introduced in rapidsai#7039) returns a `BOOL8` column, indicating whether the specified search_key(s) exist at all in each corresponding list row of an input LIST column. It does not return the actual position.

This commit introduces `lists::index_of()`, to return the INT32 positions of the specified search_key(s) in a LIST column. The search keys may be searched for using either `FIND_FIRST` (which finds the position of the first occurrence), or `FIND_LAST` (which finds the last occurrence). Both column_view and scalar search keys are supported. As with `lists::contains()`, nested types are not supported as search keys in `lists::index_of()`.

If the search_key cannot be found, that output row is set to `-1`. Additionally, the row `output[i]` is set to null if:
1. The `search_key` (scalar) or `search_keys[i]` (column_view) is null.
2. The list row `lists[i]` is null.

In all other cases, `output[i]` is non-null: a non-negative position if the search key is found, or `-1` if it is not.
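The null and `-1` rules for `lists::index_of()` described above can be sketched on the host. This is only an illustrative model, not the libcudf implementation: `index_of_row`, the `FindOption` enum type, and the use of `std::optional` to stand in for null rows and elements are all assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Host-side model only: std::nullopt stands in for a null element, key, or list row.
using Element = std::optional<int>;
using ListRow = std::optional<std::vector<Element>>;

// Option names follow the text; the enum type name is an assumption.
enum class FindOption { FIND_FIRST, FIND_LAST };

// Hypothetical helper (not a libcudf function): computes output[i] for one list row.
std::optional<std::int32_t> index_of_row(ListRow const& row,
                                         Element const& key,
                                         FindOption option)
{
  // A null list row or a null search key yields a null output row.
  if (!row.has_value() || !key.has_value()) { return std::nullopt; }

  std::int32_t position = -1;  // -1 means "not found"
  for (std::size_t i = 0; i < row->size(); ++i) {
    Element const& element = (*row)[i];
    // Null elements never match a (non-null) key.
    if (element.has_value() && *element == *key) {
      position = static_cast<std::int32_t>(i);
      if (option == FindOption::FIND_FIRST) { break; }  // keep scanning for FIND_LAST
    }
  }
  return position;
}
```

For a row `[7, 8, 7]` and key `7`, this model gives position 0 under `FIND_FIRST` and position 2 under `FIND_LAST`; a key that is absent gives `-1` rather than null.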
Fixes #9164.

### Prelude

`lists::contains()` (introduced in #7039) returns a `BOOL8` column, indicating whether the specified search_key(s) exist at all in each corresponding list row of an input LIST column. It does not return the actual position.

### `index_of()`

This commit introduces `lists::index_of()`, to return the INT32 positions of the specified search_key(s) in a LIST column. The search keys may be searched for using either `FIND_FIRST` (which finds the position of the first occurrence), or `FIND_LAST` (which finds the last occurrence). Both column_view and scalar search keys are supported. As with `lists::contains()`, nested types are not supported as search keys in `lists::index_of()`.

If the search_key cannot be found, that output row is set to `-1`. Additionally, the row `output[i]` is set to null if:
1. The `search_key` (scalar) or `search_keys[i]` (column_view) is null.
2. The list row `lists[i]` is null.

In all other cases, `output[i]` is non-null: a non-negative position if the search key is found, or `-1` if it is not.

### Semantic changes for `lists::contains()`

This commit also modifies the semantics of `lists::contains()`: it will now return nulls only for the following cases:
1. The `search_key` (scalar) or `search_keys[i]` (column_view) is null.
2. The list row `lists[i]` is null.

In all other cases, a non-null bool is returned. Specifically, `lists::contains()` no longer conforms to the SQL semantics of returning `NULL` for list rows that contain nulls but do not contain the search key. In this case, `false` is returned.

### `lists::contains_null_elements()`

A new function has been introduced to check if each list row contains null elements. The semantics are similar to `lists::contains()`, in that the column returned is BOOL8 typed:
1. If even 1 element in a list row is null, the returned row is `true`.
2. If no element is null, the returned row is `false`.
3. If the list row is null, the returned row is `null`.
4. If the list row is empty, the returned row is `false`.

The current implementation is an inefficient placeholder, to be replaced once (#9588) is available. It is included here to reconstruct the SQL semantics dropped from `lists::contains()`.

Authors:
- MithunR (https://github.com/mythrocks)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Jason Lowe (https://github.com/jlowe)
- Mark Harris (https://github.com/harrism)
- Conor Hoekstra (https://github.com/codereport)

URL: #9510
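To make the revised `lists::contains()` semantics and the role of `lists::contains_null_elements()` concrete, here is a rough host-side sketch of how the two results might be recombined into the older SQL-style answer. Everything in it is an assumption for illustration: the `_model` helpers and `contains_sql_style` are hypothetical names, `std::optional` stands in for nulls, and none of this reflects how libcudf computes the columns on the GPU.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// Host-side model only: std::nullopt stands in for a null element, key, or list row.
using Element = std::optional<int>;
using ListRow = std::optional<std::vector<Element>>;

// Hypothetical model of the revised contains(): null only for a null row or null key.
std::optional<bool> contains_model(ListRow const& row, Element const& key)
{
  if (!row.has_value() || !key.has_value()) { return std::nullopt; }
  return std::any_of(row->begin(), row->end(), [&key](Element const& e) {
    return e.has_value() && *e == *key;
  });
}

// Hypothetical model of contains_null_elements(): null only for a null row;
// an empty row has no null elements, so it yields false.
std::optional<bool> contains_null_elements_model(ListRow const& row)
{
  if (!row.has_value()) { return std::nullopt; }
  return std::any_of(row->begin(), row->end(), [](Element const& e) {
    return !e.has_value();
  });
}

// Recombines the two results into the SQL-style answer: a row that holds nulls
// but not the key becomes null instead of false.
std::optional<bool> contains_sql_style(ListRow const& row, Element const& key)
{
  std::optional<bool> const found     = contains_model(row, key);
  std::optional<bool> const has_nulls = contains_null_elements_model(row);
  if (!found.has_value()) { return std::nullopt; }  // null row or null key
  if (!*found && has_nulls.value_or(false)) { return std::nullopt; }
  return found;
}
```

Under this model, the row `[1, null]` searched for `2` gives `false` from `contains_model` but null from `contains_sql_style`, which is the difference the semantic change describes.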
Closes #6944.

This commit adds a method (`contains()`) to check whether each row of a `LIST` column contains the scalar value specified as an argument. The operation returns a `BOOL8` column (with as many rows as the input `LIST`), each row indicating `true` if the value is found, `false` if not.

Output `column[i]` is set to null if even one of the following holds true (in line with the semantics of `array_contains()` in SQL):
1. `skey` is null
2. `lists[i]` is null
3. `lists[i]` contains even one null, and `lists[i]` does not contain the search key.

This implementation currently supports the operation on lists of numerics or strings.
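The three null rules above can be illustrated with a small host-side model that produces one output row per input list row. This is a sketch under stated assumptions rather than the PR's code: `contains_row` and `contains_column` are hypothetical helpers, elements are plain `int`, and `std::optional` stands in for null rows, null elements, and a null `skey`.

```cpp
#include <optional>
#include <vector>

// Host-side model only: std::nullopt stands in for a null element, key, or list row.
using Element = std::optional<int>;
using ListRow = std::optional<std::vector<Element>>;

// Hypothetical helper (not a libcudf function): computes column[i] for lists[i].
std::optional<bool> contains_row(ListRow const& list, Element const& skey)
{
  // Rules 1 and 2: a null search key or a null list row gives a null output row.
  if (!skey.has_value() || !list.has_value()) { return std::nullopt; }

  bool saw_null = false;
  for (Element const& element : *list) {
    if (!element.has_value()) {
      saw_null = true;
    } else if (*element == *skey) {
      return true;  // found: true even if the row also holds nulls
    }
  }
  // Rule 3: the key was not found, but a null element might have matched it.
  if (saw_null) { return std::nullopt; }
  return false;
}

// Hypothetical helper: the output has as many rows as the input LIST column.
std::vector<std::optional<bool>> contains_column(std::vector<ListRow> const& lists,
                                                 Element const& skey)
{
  std::vector<std::optional<bool>> output;
  output.reserve(lists.size());
  for (ListRow const& list : lists) { output.push_back(contains_row(list, skey)); }
  return output;
}
```

Searching the rows `[1, 2]`, `null`, and `[1, null]` for `2` would give `true`, null, and null respectively in this model.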