[FEA] support min and max group by aggregations and reductions on lists of structs and strings #10408
Comments
Can you explain what ordering would look like on a list of struct? Say I have
Would we compare the first child and then the second child? meaning the order of comparisons is Or would we compare in this order |
It is the second one. When sorting elements, null == null, but null < non-null. If one list is longer than the other, then the shorter list is less than the longer list if and only if all of the elements in the shorter list match those in the longer list up to the length of the shorter list. The code here is for sorting lists/arrays in Spark. If you want some examples, I can come up with a few. |
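To make the ordering rules above concrete, here is a minimal Python sketch. It is not the Spark or cuDF code; the names `compare_elements` and `compare_lists` are hypothetical, and it only covers lists of scalars or strings (struct elements are not handled here).

```python
from functools import cmp_to_key


def compare_elements(a, b):
    """Null-aware element comparison: None == None, None < any non-null value."""
    if a is None and b is None:
        return 0
    if a is None:
        return -1
    if b is None:
        return 1
    return (a > b) - (a < b)


def compare_lists(left, right):
    """Compare element by element; on a shared equal prefix the shorter list is less."""
    for a, b in zip(left, right):
        c = compare_elements(a, b)
        if c != 0:
            return c
    # All shared positions matched, so only the lengths can differ.
    return (len(left) > len(right)) - (len(left) < len(right))


rows = [["b"], ["a", "c"], ["a"], [None, "z"], []]
ordered = sorted(rows, key=cmp_to_key(compare_lists))
print(ordered)
```

With these rules the empty list sorts first, a list starting with null sorts before any list starting with a non-null value, and `["a"]` sorts before `["a", "c"]`.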
@devavret I'd like to work on this, if nobody has been assigned. |
@ttnghia please hold off. This requires further discussion. |
This issue has been labeled |
Still needed. |
This issue has been labeled |
Still needed. |
Depends on #11129. |
This issue has been labeled |
…uction (#13676)

This adds support for list type in `min` and `max` aggregations in groupby and reduction contexts.

Closes #13667 and closes #10408.

Status:
* [X] Implementation.
* [X] Unit tests.
* [X] Run `compute-sanitizer`.
* [X] Test with spark-rapids (NVIDIA/spark-rapids#8689).

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Divye Gala (https://github.com/divyegala)
- MithunR (https://github.com/mythrocks)

URL: #13676
Is your feature request related to a problem? Please describe.
This is specifically for NVIDIA/spark-rapids#4929 for a customer query.
Describe the solution you'd like
Group by and reduction aggregations for min/max that support lists of structs and lists of strings. Ideally we could solve ordering in general, as #5890 is trying to do. This would then be a follow-on PR that reuses the row comparison code for min/max in this case.
If more information is needed, such as null ordering, I can provide it.
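As an illustration of the requested behavior, here is a rough Python sketch of a group-by min/max over list values. It is not a cuDF API; `list_key` and `groupby_min_max` are hypothetical names. It assumes nulls-first ordering for elements inside a list, that a shorter list sorts before a longer list sharing its prefix, and that null lists are skipped by the aggregation (as SQL-style min/max usually does). Struct elements are not covered.

```python
from collections import defaultdict


def list_key(values):
    """Map a list to a sort key: None becomes (0,), a value v becomes (1, v).

    Python compares lists lexicographically and treats a strict prefix as
    smaller, so this key gives nulls-first, shorter-prefix-first ordering.
    """
    return [(0,) if v is None else (1, v) for v in values]


def groupby_min_max(rows):
    """rows is an iterable of (group_key, list_or_None) pairs."""
    groups = defaultdict(list)
    for key, values in rows:
        bucket = groups[key]  # register the group even if its value is null
        if values is not None:  # assumption: null lists are ignored
            bucket.append(values)
    result = {}
    for key, lists in groups.items():
        if lists:
            result[key] = (min(lists, key=list_key), max(lists, key=list_key))
        else:
            result[key] = (None, None)  # group held only null lists
    return result


rows = [
    ("x", ["b"]), ("x", ["a", "c"]), ("x", ["a"]),
    ("y", [None, "z"]), ("y", None), ("y", []),
    ("z", None),
]
result = groupby_min_max(rows)
print(result)
```

A reduction is the same computation with a single group, i.e. `min(lists, key=list_key)` over all non-null rows.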
Describe alternatives you've considered
Just ugly hacks that we should not do.