Implement groupby collect_set #7420
Is this only for |
Hi Jake. I just started working on this, so some things are still unclear. I'll add more details to the description along the way. |
Might I suggest a change to the title and description of this issue? Let's just tackle renaming We can address |
Also, if this is a work in progress, it would be best to switch this to a draft PR rather than use |
Changing |
… This paves the way for the upcoming aggregation::Kind::COLLECT_SET.
Rerun tests. |
cmake / conda lgtm, will let @shwina review the Python side 😄
Java approval
# Conflicts: # cpp/src/groupby/sort/aggregate.cpp
Cython LGTM!
@gpucibot merge |
Thanks all 😃 |
Codecov Report
@@               Coverage Diff               @@
##           branch-0.19    #7420      +/-   ##
===============================================
+ Coverage        81.86%   82.47%   +0.61%
===============================================
  Files              101      101
  Lines            16884    17402    +518
===============================================
+ Hits             13822    14353    +531
+ Misses            3062     3049     -13
This partially addresses #2973.
This PR implements groupby collect_set aggregation. The idea of this PR is to simply apply drop_list_duplicates (#7528) to the result generated by groupby collect_list, obtaining collect lists without duplicate entries.
Examples:
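A minimal sketch of the intended semantics, written in plain Python rather than libcudf code. The helper names below mirror the aggregation names for readability but are hypothetical stand-ins, not the cudf API. Sorting the distinct entries is an assumption made here only to keep the output deterministic.

```python
from collections import defaultdict


def collect_list(keys, values):
    """Group values by key, keeping duplicates and input order."""
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    return dict(groups)


def drop_list_duplicates(lst):
    """Return the distinct entries of one list.

    Sorted here for determinism; this ordering is an assumption of the sketch.
    """
    return sorted(set(lst))


def collect_set(keys, values):
    """collect_list followed by per-list duplicate removal."""
    lists = collect_list(keys, values)
    return {k: drop_list_duplicates(v) for k, v in lists.items()}


keys = [1, 1, 2, 2, 1]
values = [10, 10, 20, 30, 40]

print(collect_list(keys, values))  # {1: [10, 10, 40], 2: [20, 30]}
print(collect_set(keys, values))   # {1: [10, 40], 2: [20, 30]}
```

The point of the two-step structure is that collect_set needs no separate grouping logic: it reuses the grouped lists and only post-processes each one.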
In this PR, a simple, incomplete Python binding for collect_set has been added, and no Java binding is implemented yet. Complete Python and Java bindings need to be implemented later in separate PRs.