Support contains() on lists of primitives (#7039)
Closes #6944.

This commit adds a method (`contains()`) to check whether each row of a `LIST` column contains the scalar value specified as an argument. The operation returns a `BOOL8` column (with as many rows as the input `LIST`), each row indicating `true` if the value is found, `false` if not.

Output `column[i]` is set to null if even one of the following holds true (in line with the semantics of `array_contains()` in SQL):
1. The search key `skey` is null
2. The list row `lists[i]` is null
3. The list row `lists[i]` contains even *one* null, *and* `lists[i]` does not contain the search key.

This implementation currently supports the operation on lists of numerics or strings.

Authors:
- MithunR (@mythrocks)

Approvers:
- AJ Schmidt (@ajschmidt8)
- Mark Harris (@harrism)
- David (@davidwendt)
- Karthikeyan (@karthikeyann)

URL: #7039
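The three null rules in the commit message can be modeled on the host for a single row. The following is a minimal sketch, not the GPU implementation from this commit: `Element`, `ListRow`, and `contains_row` are hypothetical names, and `std::nullopt` is assumed to stand in for a null key, a null list row, or a null list element.

```cpp
#include <optional>
#include <vector>

// Hypothetical host-side model: std::nullopt plays the role of a null.
using Element = std::optional<int>;
using ListRow = std::optional<std::vector<Element>>;

// Returns true or false, or std::nullopt where the commit message
// says the output row is null.
std::optional<bool> contains_row(ListRow const& row, Element const& key)
{
  if (!key.has_value()) { return std::nullopt; }  // rule 1: null search key
  if (!row.has_value()) { return std::nullopt; }  // rule 2: null list row

  bool saw_null = false;
  for (auto const& element : *row) {
    if (!element.has_value()) {
      saw_null = true;
    } else if (*element == *key) {
      return true;  // a match yields true even if the row also holds nulls
    }
  }

  // rule 3: no match, but a null element might have been the key
  if (saw_null) { return std::nullopt; }
  return false;
}
```

Under this model, searching `[1, null, 3]` for `3` gives `true`, while searching the same row for `5` gives null, because the null element leaves the answer undetermined.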
Showing 9 changed files with 988 additions and 0 deletions.
@@ -0,0 +1,79 @@
/*
 * Copyright (c) 2021, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#pragma once

#include <cudf/column/column.hpp>
#include <cudf/lists/lists_column_view.hpp>

namespace cudf {
namespace lists {
/**
 * @addtogroup lists_contains
 * @{
 * @file
 */

/**
 * @brief Create a column of bool values indicating whether the specified scalar
 * is an element of each row of a list column.
 *
 * The output column has as many elements as the input `lists` column.
 * Output `column[i]` is set to true if the lists row `lists[i]` contains the value
 * specified in `search_key`. Otherwise, it is set to false.
 *
 * Output `column[i]` is set to null if one or more of the following are true:
 *   1. The search key `search_key` is null
 *   2. The list row `lists[i]` is null
 *   3. The list row `lists[i]` does not contain the search key, and contains at least
 *      one null.
 *
 * @param lists Lists column whose `n` rows are to be searched
 * @param search_key The scalar key to be looked up in each list row
 * @param mr Device memory resource used to allocate the returned column's device memory.
 * @return std::unique_ptr<column> BOOL8 column of `n` rows with the result of the lookup
 */
std::unique_ptr<column> contains(
  cudf::lists_column_view const& lists,
  cudf::scalar const& search_key,
  rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
 * @brief Create a column of bool values indicating whether the list rows of the first
 * column contain the corresponding values in the second column
 *
 * The output column has as many elements as the input `lists` column.
 * Output `column[i]` is set to true if the lists row `lists[i]` contains the value
 * in `search_keys[i]`. Otherwise, it is set to false.
 *
 * Output `column[i]` is set to null if one or more of the following are true:
 *   1. The row `search_keys[i]` is null
 *   2. The list row `lists[i]` is null
 *   3. The list row `lists[i]` does not contain the `search_keys[i]`, and contains at least
 *      one null.
 *
 * @param lists Lists column whose `n` rows are to be searched
 * @param search_keys Column of elements to be looked up in each list row
 * @param mr Device memory resource used to allocate the returned column's device memory.
 * @return std::unique_ptr<column> BOOL8 column of `n` rows with the result of the lookup
 */
std::unique_ptr<column> contains(
  cudf::lists_column_view const& lists,
  cudf::column_view const& search_keys,
  rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/** @} */  // end of group
}  // namespace lists
}  // namespace cudf
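The second overload documented in the header pairs each list row with its own key from `search_keys`. The sketch below is a plain host-side reference for that documented behavior, not the device code added by this commit. The names `Element`, `ListRow`, and `contains_per_row` are hypothetical, `std::nullopt` is assumed to model a null, and the requirement that both inputs have the same number of rows is an assumption taken from the `n` rows wording in the comments.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical host-side model: std::nullopt plays the role of a null.
using Element = std::optional<int>;
using ListRow = std::optional<std::vector<Element>>;

// Row i of the result describes whether lists[i] contains search_keys[i].
std::vector<std::optional<bool>> contains_per_row(std::vector<ListRow> const& lists,
                                                  std::vector<Element> const& search_keys)
{
  // Assumption: one key per list row.
  assert(lists.size() == search_keys.size());

  // Every output row starts out null and is only set when the answer is known.
  std::vector<std::optional<bool>> result(lists.size());

  for (std::size_t i = 0; i < lists.size(); ++i) {
    // Documented rules 1 and 2: a null key or a null list row gives a null output.
    if (!search_keys[i].has_value() || !lists[i].has_value()) { continue; }

    bool found    = false;
    bool saw_null = false;
    for (auto const& element : *lists[i]) {
      if (!element.has_value()) {
        saw_null = true;
      } else if (*element == *search_keys[i]) {
        found = true;
        break;
      }
    }

    if (found) {
      result[i] = true;
    } else if (!saw_null) {
      result[i] = false;
    }
    // Documented rule 3: no match plus at least one null leaves the row null.
  }
  return result;
}
```

A reference like this can be handy for generating expected values when checking a GPU result on small inputs, since each output row depends only on its own list row and key.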