Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support nth_element for window functions #11158

Merged
merged 9 commits into from
Jul 7, 2022

Conversation

mythrocks
Copy link
Contributor

Fixes #9643.

This is to address spark-rapids/issues/4005 and
spark-rapids/issues/5061.

This change adds support for NTH_ELEMENT window aggregations.
This should allow for the implementation of FIRST(), LAST(),
and NTH_VALUE() window functions in Spark SQL.

NTH_ELEMENT in window function returns the Nth element from the
specified window for each row in a column. N is deemed to be
zero based, so NTH_ELEMENT(0) translates to the first element
in a window. Similarly, NTH_ELEMENT(-1) translates to the last.

If for any window of size W, if the specified N falls outside
the range [ -W, W-1 ], a null element is returned for that row.

@mythrocks mythrocks requested a review from a team as a code owner June 27, 2022 23:24
@github-actions github-actions bot added CMake CMake build issue libcudf Affects libcudf (C++/CUDA) code. labels Jun 27, 2022
@mythrocks mythrocks self-assigned this Jun 27, 2022
@mythrocks mythrocks added feature request New feature or request non-breaking Non-breaking change labels Jun 27, 2022
This is to address spark-rapids/issues/4005 and
spark-rapids/issues/5061.

This change adds support for `NTH_ELEMENT` window aggregations.
This should allow for the implementation of `FIRST()`, `LAST()`,
and `NTH_VALUE()` window functions in Spark SQL.

`NTH_ELEMENT` in window function returns the Nth element from the
specified window for each row in a column. `N` is deemed to be
zero based, so `NTH_ELEMENT(0)` translates to the first element
in a window. Similarly, `NTH_ELEMENT(-1)` translates to the last.

If for any window of size `W`, if the specified `N` falls outside
the range `[ -W, W-1 ]`, a null element is returned for that row.
@codecov
Copy link

codecov bot commented Jun 28, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.08@b5a59cf). Click here to learn what that means.
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-22.08   #11158   +/-   ##
===============================================
  Coverage                ?   86.30%           
===============================================
  Files                   ?      144           
  Lines                   ?    22698           
  Branches                ?        0           
===============================================
  Hits                    ?    19589           
  Misses                  ?     3109           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b5a59cf...53ef97f. Read the comment docs.

cpp/src/rolling/rolling.cu Outdated Show resolved Hide resolved
@davidwendt
Copy link
Contributor

Looks like this more than doubles the compile time for rolling.cu.
https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cudf/job/prb/job/cudf-cpu-cuda-build/CUDA=11.5/10413/Build_20Metrics_20Report/
Could this be broken up into separate source files perhaps?

@mythrocks
Copy link
Contributor Author

Could this be broken up into separate source files perhaps?

The snag is that this is a function template, whose type parameters are transform iterators. I'll take a crack at this.

@mythrocks mythrocks requested a review from a team as a code owner July 1, 2022 23:50
@mythrocks mythrocks requested a review from ttnghia July 5, 2022 19:51
mythrocks added a commit to mythrocks/cudf that referenced this pull request Jul 5, 2022
This commit adds the JNI bindings required to invoke `NTH_ELEMENT`
aggregations as a window aggregation. Java tests are included.

Depends on rapidsai#11158.
@mythrocks
Copy link
Contributor Author

@davidwendt, @ttnghia, thank you for the prompt reviews. I think I've addressed your concerns here.
I'll give it a moment to rerun tests.

Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMake LGTM.

I was a bit taken aback by the _detail named files, but it looks like that's not unique to this PR. We should probably be more consistent about using a detail/ directory, even in src/, rather than putting it in the filename. But out of scope for this PR.

1. Using #pragma once.
2. Doxygen format.
3. Renamed gather map functor.
4. Materialized gather indices.
@mythrocks
Copy link
Contributor Author

mythrocks commented Jul 6, 2022

CMake LGTM.

I was a bit taken aback by the _detail named files, but it looks like that's not unique to this PR. We should probably be more consistent about using a detail/ directory, even in src/, rather than putting it in the filename. But out of scope for this PR.

I started with making this change in this PR, but I now agree that this is peripheral to the current change. I've filed #11211 to track this.

To clarify: The changes required here would end up moving some of the files with substantial changes, and distract from the main change here. I'll address this after the current change has been merged.

Copy link
Contributor

@ttnghia ttnghia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one more minor issue. Otherwise it's good.

Also, switched to use gathered.front().
@mythrocks
Copy link
Contributor Author

Thank you for the reviews, @davidwendt, @ttnghia, @vyasr. I'll merge this now.

@mythrocks
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit cdd4e03 into rapidsai:branch-22.08 Jul 7, 2022
rapids-bot bot pushed a commit that referenced this pull request Jul 8, 2022
Depends on #11158.

This commit adds the JNI bindings required to invoke `NTH_ELEMENT` aggregations as a window aggregation. Java tests are included.

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - Ryan Lee (https://github.com/rwlee)
  - Raza Jafri (https://github.com/razajafri)

URL: #11201
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CMake build issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Support rolling_aggregation for NTH_ELEMENT
4 participants