Support map_filter operator #5436

Merged 3 commits on May 14, 2022

@res-life res-life commented May 7, 2022

Closes #5273

Support map_filter operator.

map_filter is used to filter entries in a map.

The usage of map_filter is:

map_filter(map_column_expression, (key, value) -> predicate(key, value))

e.g.:

>>> df = spark.sql("SELECT map_from_arrays(array(1, 2, 3), array(1, 2, 3)) as map")
>>> df.show(truncate = False)
+------------------------+
|map                     |
+------------------------+
|{1 -> 1, 2 -> 2, 3 -> 3}|
+------------------------+

>>> df.selectExpr("map_filter(map, (key, value) -> key + value > 5) as ret").show(truncate = False)
+--------+
|ret     |
+--------+
|{3 -> 3}|
+--------+
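
For readers without a Spark session at hand, the semantics shown above can be modelled in plain Python. This is only a sketch: the helper name `map_filter` below is a hypothetical stand-in operating on a `dict`, not the Spark function itself, and it ignores null keys, null values, and null predicate results.

```python
from typing import Callable, Dict, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def map_filter(m: Dict[K, V], predicate: Callable[[K, V], bool]) -> Dict[K, V]:
    """Hypothetical plain-Python model: keep entries where predicate(key, value) holds."""
    return {key: value for key, value in m.items() if predicate(key, value)}


# Same input and predicate as the Spark example above.
source_map = {1: 1, 2: 2, 3: 3}
result = map_filter(source_map, lambda key, value: key + value > 5)
print(result)  # {3: 3}
```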

The implementation is:

  1. Apply the predicate function in map_filter to get a plain boolean column.
  2. Convert the plain boolean column to a list-of-boolean column according to the offsets column of the input map column.
  3. Invoke the cuDF apply_boolean_mask method to filter. Refer to "Segmented apply_boolean_mask for LIST columns" (rapidsai/cudf#10773).
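
The three steps can be sketched on the CPU in plain Python. This sketch assumes the map column is stored as an offsets list plus flattened key and value lists. The function name `segmented_map_filter` and the data layout are illustrative assumptions; the code does not call cuDF and ignores nulls.

```python
from typing import Callable, List, Tuple


def segmented_map_filter(
    offsets: List[int],
    keys: List[int],
    values: List[int],
    predicate: Callable[[int, int], bool],
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical CPU model of the three steps; not the plugin's actual code."""
    # Step 1: evaluate the predicate over the flattened entries,
    # giving one plain boolean per map entry.
    flat_mask = [bool(predicate(k, v)) for k, v in zip(keys, values)]

    # Step 2: reuse the map column's offsets to view the flat mask
    # as one list of booleans per row.
    row_masks = [
        flat_mask[offsets[row]:offsets[row + 1]]
        for row in range(len(offsets) - 1)
    ]

    # Step 3: segmented boolean mask: filter each row's entries
    # and rebuild the offsets for the surviving entries.
    new_offsets, new_keys, new_values = [0], [], []
    for row, mask in enumerate(row_masks):
        start = offsets[row]
        for i, keep in enumerate(mask):
            if keep:
                new_keys.append(keys[start + i])
                new_values.append(values[start + i])
        new_offsets.append(len(new_keys))
    return new_offsets, new_keys, new_values


# Two rows: {1 -> 1, 2 -> 2, 3 -> 3} and {4 -> 4}.
out_offsets, out_keys, out_values = segmented_map_filter(
    [0, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], lambda k, v: k + v > 5
)
print(out_offsets, out_keys, out_values)  # [0, 1, 2] [3, 4] [3, 4]
```

Row boundaries are preserved even when a row loses all of its entries, because an offset is appended for every input row.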

Signed-off-by: Chong Gao res_life@163.com

res-life commented May 7, 2022

This depends on the JNI PR rapidsai/cudf#10812 being merged first.

@firestarman firestarman left a comment


Looks good to me


// set parameter 'valueContainsNull' to the argument's `valueContainsNull`
override def dataType: DataType = MapType(keyType, valueType, valueContainsNull)

NIT:

Suggested change:
- override def dataType: DataType = MapType(keyType, valueType, valueContainsNull)
+ override def dataType: DataType = argument.dataType


Done.

revans2 previously approved these changes May 9, 2022

@revans2 revans2 left a comment


Agree with the one nit by @firestarman. Otherwise, this looks really good.

@sameerz sameerz added the feature request New feature or request label May 10, 2022
@res-life

build

@res-life res-life merged commit b865ecb into NVIDIA:branch-22.06 May 14, 2022
@res-life res-life deleted the map-filter branch May 14, 2022 11:36
Linked issue: [FEA] Support map_filter