Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for collect_list Spark aggregate function #9231
Add support for collect_list Spark aggregate function #9231
Changes from all commits
7870dd9
b89d2ee
c4dd224
36a7595
0f2b0d6
48edcdd
3b0a10a
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any particular reasons companion function testing is not included as part of testAggregations? testAggregationsWithCompanion calls appear too verbose and repetitive.
CC: @kagamiori
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liujiayi771 Would you take a look at this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbasmanova I think the possible reason is that some aggregate functions have not registered the companion functions due to certain restrictions, such as when
isResultTypeResolvableGivenIntermediateType
is false.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I don't understand is why we need to pass
[](auto& /*builder*/) {},
and{{BIGINT()}},
to testAggregationsWithCompanion and why do we need to call both testAggregations and testAggregationsWithCompanion.Why can't we just call
and have it test both regular functions as well as companion functions.
CC: @kagamiori
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is better, and the config parameter is also not necessary. Right now, many tests are calling
testAggregations
followed bytestAggregationsWithCompanion
. We need to combine these two test functions.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do this refactoring in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbasmanova I think this function should not be compared with DuckDB. If the fuzzer generates a group where all the data is null, DuckDB's result will be null, while Spark will return an empty array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. We need to change Fuzzer to verify results against Spark, not DuckDB: #9270