JNI changes for range-extents in window functions. #13199

Merged


mythrocks
Contributor

This commit adds back-end JNI changes to support explicit range extents for range window functions. The change is analogous to the addition of `cudf::range_window_bounds::extent_type` in `libcudf`. It allows `STRING` order-by columns in range window functions: without an explicit extent, there is no way to differentiate between `UNBOUNDED` and `CURRENT ROW` for a window function over a `STRING` order-by column.
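To illustrate why an explicit extent helps, here is a minimal, self-contained sketch. It is not the cuDF Java API: `RangeExtentSketch`, `Extent`, `RangeBound`, and `inferExtent` are hypothetical names, and the "null delta means unbounded, zero delta means current row" encoding is an assumption made only for this illustration. Under that assumed encoding, a `STRING` order-by column has no interval value at all, so inference can never yield `CURRENT_ROW`; carrying the extent as its own tag removes the ambiguity.

```java
public class RangeExtentSketch {
    /** Hypothetical extent tag, loosely modelled on the idea of an explicit extent type. */
    public enum Extent { CURRENT_ROW, BOUNDED, UNBOUNDED }

    /**
     * Assumed implicit encoding (illustration only): the extent is inferred
     * from the delta. A null delta means UNBOUNDED, zero means CURRENT ROW.
     * A STRING order-by column cannot supply a delta, so it always looks
     * UNBOUNDED here and CURRENT ROW cannot be expressed.
     */
    public static Extent inferExtent(Long delta) {
        if (delta == null) {
            return Extent.UNBOUNDED;
        }
        return delta == 0L ? Extent.CURRENT_ROW : Extent.BOUNDED;
    }

    /** A bound that carries its extent explicitly; delta is null unless BOUNDED. */
    public static final class RangeBound {
        public final Extent extent;
        public final Long delta;

        private RangeBound(Extent extent, Long delta) {
            this.extent = extent;
            this.delta = delta;
        }

        public static RangeBound unbounded() {
            return new RangeBound(Extent.UNBOUNDED, null);
        }

        public static RangeBound currentRow() {
            return new RangeBound(Extent.CURRENT_ROW, null);
        }

        public static RangeBound bounded(long delta) {
            return new RangeBound(Extent.BOUNDED, delta);
        }
    }

    public static void main(String[] args) {
        // Implicit encoding: no delta available, so only UNBOUNDED is reachable.
        System.out.println(inferExtent(null));
        // Explicit extent: CURRENT ROW is expressible without any delta value.
        System.out.println(RangeBound.currentRow().extent);
    }
}
```

With the explicit tag, both `unbounded()` and `currentRow()` leave the delta empty and are still distinguishable, which is the property a `STRING` order-by column needs.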

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mythrocks mythrocks added feature request New feature or request Spark Functionality that helps Spark RAPIDS non-breaking Non-breaking change labels Apr 21, 2023
@mythrocks mythrocks self-assigned this Apr 21, 2023
@mythrocks mythrocks requested a review from a team as a code owner April 21, 2023 19:03
@github-actions github-actions bot added the Java Affects Java cuDF API. label Apr 21, 2023
@mythrocks
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 181b946 into rapidsai:branch-23.06 Apr 24, 2023
mythrocks added a commit to NVIDIA/spark-rapids that referenced this pull request Apr 27, 2023
Closes #7883.
Depends on rapidsai/cudf#13143, rapidsai/cudf#13199.

This commit adds support for `STRING` order-by columns for RANGE window functions.

Before this commit, only numeric and timestamp types were supported as order-by columns in window specifications. However, it is possible to specify window frames such as the following:
```sql
SELECT COUNT(1) OVER( PARTITION BY gid ORDER BY str_col )
```
The implicit range here is `UNBOUNDED PRECEDING AND CURRENT ROW`, although explicit bounds may also be specified.
Note that range values cannot be specified here, because `STRING` does not support intervals.

This change should now allow the plugin to support `UNBOUNDED PRECEDING`, `UNBOUNDED FOLLOWING`, and `CURRENT ROW` as range window bounds, when the order-by column is `STRING`.
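
As a rough illustration of the semantics described above (not the plugin's implementation), the following plain-Java sketch computes `COUNT(1)` over `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` with a string order-by column. The class and method names are hypothetical, nulls are not handled, and `String.compareTo` stands in for whatever string ordering the engine actually uses. Rows with equal order-by values are peers, so they receive the same count.

```java
import java.util.Arrays;

public class StringRangeWindowSketch {
    /**
     * For each row, counts the rows in the same partition (gid) whose
     * order-by string sorts at or before the current row's string.
     * This is a brute-force O(n^2) model of the frame
     * RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
     */
    public static int[] runningCount(int[] gid, String[] strCol) {
        int n = gid.length;
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            int count = 0;
            for (int j = 0; j < n; j++) {
                boolean samePartition = gid[j] == gid[i];
                // "<= 0" includes peers: rows whose string equals the current row's.
                boolean inFrame = strCol[j].compareTo(strCol[i]) <= 0;
                if (samePartition && inFrame) {
                    count++;
                }
            }
            out[i] = count;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] gid = {1, 1, 1, 2, 2};
        String[] strCol = {"a", "b", "b", "x", "y"};
        // prints [1, 3, 3, 1, 2]
        System.out.println(Arrays.toString(runningCount(gid, strCol)));
    }
}
```

Note that no interval arithmetic on the strings is needed: the frame is defined purely by ordering and equality, which is why only `UNBOUNDED` and `CURRENT ROW` bounds apply to `STRING`.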

Signed-off-by: MithunR <mythrocks@gmail.com>