Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[window] Disable GPU for COUNT(exp) queries #666

Merged
merged 1 commit into from
Sep 5, 2020

Conversation

mythrocks
Copy link
Collaborator

@mythrocks mythrocks commented Sep 3, 2020

Mitigates #218.

GpuWindowExec currently counts null-rows when running COUNT(col)
(or generally COUNT(expr)) window queries, owing to a bug in CUDF/Java.
Left unchecked, this will produce incorrect results for said queries.

This commit disables GPU acceleration for COUNT(expr) queries, while
retaining support for COUNT(1) and COUNT(*).

This may be reverted once we have a fix in CUDF/Java.

Please note that there is already an XFAIL test that checks this condition, in window_function_test.py

Signed-off-by: Mithun RK mythrocks@gmail.com

GpuWindowExec currently counts null-rows when running COUNT(col)
(or generally COUNT(expr)) window queries, owing to a bug in CUDF/Java.
Left unchecked, this will produce incorrect results for said queries.

This commit disables GPU acceleration for COUNT(expr) queries, while
retaining support for COUNT(1) and COUNT(*).

This may be reverted once we have a fix in CUDF/Java.

Signed-off-by: Mithun RK <mythrocks@gmail.com>
@mythrocks mythrocks self-assigned this Sep 3, 2020
@mythrocks mythrocks added the bug Something isn't working label Sep 3, 2020
@mythrocks mythrocks added this to the Aug 31 - Sep 11 milestone Sep 3, 2020
Copy link
Collaborator

@kuhushukla kuhushukla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we link the cudf bug issue to this PR? Approving as changes look good to me.

@mythrocks
Copy link
Collaborator Author

mythrocks commented Sep 4, 2020

Can we link the cudf bug issue to this PR? Approving as changes look good to me.

Linking the bug will close it when this PR is merged. (Please correct me if that's wrong.)

I'd like to keep it open, and actually fix it in 0.3, after pushing a fix to cudf/java.

@kuhushukla
Copy link
Collaborator

Oh, I didn't know that cudf issues would close automatically as well if a PR in this repo went in! Thought it was only the rapids plugin issues. I stand corrected.

@mythrocks
Copy link
Collaborator Author

Oh, I didn't know that cudf issues would close automatically as well if a PR in this repo went in! Thought it was only the rapids plugin issues. I stand corrected.

D'oh! Wait, maybe not. I'll raise the CUDF bug and leave a link here.

@mythrocks mythrocks linked an issue Sep 4, 2020 that may be closed by this pull request
@mythrocks
Copy link
Collaborator Author

Can we link the cudf bug issue to this PR? Approving as changes look good to me.

rapidsai/cudf#6156 seems to be the cause. The libcudf implementation seems fine. The COUNT operation is lost in translation, in the Java layer. #218 has been updated to reflect this.

@revans2
Copy link
Collaborator

revans2 commented Sep 4, 2020

For it to be closed you have to say something like fixes hash-bug-number or closes hash-bug-number

@revans2
Copy link
Collaborator

revans2 commented Sep 4, 2020

build

@mythrocks
Copy link
Collaborator Author

For it to be closed you have to say something like fixes hash-bug-number or closes hash-bug-number

The text under "Linked issues" on the right pane seems to suggest that the issues listed there "may" be closed.

In any case, I can't seem to manually link CUDF issues or indeed anything outside of spark-rapids there. 🤷 Oh, well.

@mythrocks mythrocks merged commit 5831400 into NVIDIA:branch-0.2 Sep 5, 2020
@mythrocks mythrocks deleted the window-count-nulls branch September 5, 2020 02:07
@mythrocks
Copy link
Collaborator Author

Thanks for the reviews, all.

nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
GpuWindowExec currently counts null-rows when running COUNT(col)
(or generally COUNT(expr)) window queries, owing to a bug in CUDF/Java.
Left unchecked, this will produce incorrect results for said queries.

This commit disables GPU acceleration for COUNT(expr) queries, while
retaining support for COUNT(1) and COUNT(*).

This may be reverted once we have a fix in CUDF/Java.

Signed-off-by: Mithun RK <mythrocks@gmail.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
GpuWindowExec currently counts null-rows when running COUNT(col)
(or generally COUNT(expr)) window queries, owing to a bug in CUDF/Java.
Left unchecked, this will produce incorrect results for said queries.

This commit disables GPU acceleration for COUNT(expr) queries, while
retaining support for COUNT(1) and COUNT(*).

This may be reverted once we have a fix in CUDF/Java.

Signed-off-by: Mithun RK <mythrocks@gmail.com>
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
…IDIA#666)

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Window function COUNT(x) includes null-values, when it shouldn't
3 participants