Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] WindowAggregate::count() assumes COUNT_ALL instead of COUNT_VALID #6156

Closed
mythrocks opened this issue Sep 4, 2020 · 2 comments
Closed
Assignees
Labels
bug Something isn't working Java Affects Java cuDF API. Spark Functionality that helps Spark RAPIDS

Comments

@mythrocks
Copy link
Contributor

In the CUDF Java API, one sees that WindowAggregate::count() translates to COUNT_ALL, with no option to use COUNT_VALID.
While COUNT_ALL has its place, it produces incorrect results when counting a specific column/expression that might contain nulls. This appears to be the cause of NVIDIA/spark-rapids#218.

One option would be to revert f73c600, and switch back to COUNT_VALID. (I am not entirely certain that this will suffice.)
Another would be to introduce a count_valid() operation in WindowAggregate, and consider deprecating WindowAggregate::count() in favour of a new WindowAggregate::count_all(). The choice of call can be left to the caller (e.g. spark-rapids):

  1. SELECT COUNT(*) OVER (...) => count_all()
  2. SELECT COUNT(col) OVER (...) => count_valid()
@mythrocks mythrocks added bug Something isn't working Needs Triage Need team to review and classify Java Affects Java cuDF API. Spark Functionality that helps Spark RAPIDS labels Sep 4, 2020
@mythrocks mythrocks self-assigned this Sep 4, 2020
@mythrocks mythrocks removed the Needs Triage Need team to review and classify label Sep 4, 2020
@mythrocks mythrocks added the Needs Triage Need team to review and classify label Sep 4, 2020
@kkraus14 kkraus14 removed the Needs Triage Need team to review and classify label Sep 15, 2020
@github-actions
Copy link

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@mythrocks
Copy link
Contributor Author

Resolved by #6287

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Java Affects Java cuDF API. Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

No branches or pull requests

2 participants