In the CUDF Java API, one sees that WindowAggregate::count() translates to COUNT_ALL, with no option to use COUNT_VALID.
While COUNT_ALL has its place, it produces incorrect results when counting a specific column/expression that might contain nulls, because null rows in the window are counted along with the valid ones. This appears to be the cause of NVIDIA/spark-rapids#218.
One option would be to revert f73c600, and switch back to COUNT_VALID. (I am not entirely certain that this will suffice.)
Another would be to introduce a count_valid() operation in WindowAggregate, and consider deprecating WindowAggregate::count() in favour of a new WindowAggregate::count_all(). The choice of call can be left to the caller (e.g. spark-rapids):
SELECT COUNT(*) OVER (...) => count_all()
SELECT COUNT(col) OVER (...) => count_valid()
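To make the difference concrete, here is a minimal plain-Java sketch of the two semantics over a row-based window. It does not use the cuDF API: the class `WindowCountSketch`, the methods `countAll` and `countValid`, and the convention that `preceding`/`following` exclude the current row are all assumptions made for illustration only.

```java
import java.util.Arrays;

// Hypothetical illustration only; this is not cuDF code.
public class WindowCountSketch {

    // COUNT_ALL semantics: every row in the window is counted, nulls included.
    static int[] countAll(Integer[] column, int preceding, int following) {
        int n = column.length;
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            int start = Math.max(0, i - preceding);
            int end = Math.min(n - 1, i + following);
            out[i] = end - start + 1;
        }
        return out;
    }

    // COUNT_VALID semantics: only non-null rows in the window are counted.
    static int[] countValid(Integer[] column, int preceding, int following) {
        int n = column.length;
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            int start = Math.max(0, i - preceding);
            int end = Math.min(n - 1, i + following);
            int count = 0;
            for (int j = start; j <= end; j++) {
                if (column[j] != null) {
                    count++;
                }
            }
            out[i] = count;
        }
        return out;
    }

    public static void main(String[] args) {
        // A column with nulls; window = 1 row before and 1 row after the current row.
        Integer[] col = {1, null, 3, null};
        System.out.println(Arrays.toString(countAll(col, 1, 1)));   // [2, 3, 3, 2]
        System.out.println(Arrays.toString(countValid(col, 1, 1))); // [1, 2, 1, 1]
    }
}
```

With nulls present the two results diverge on every row, which is the kind of discrepancy a caller would see if `COUNT(col) OVER (...)` were mapped to COUNT_ALL.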
This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.