[FEA] pyspark integration tests basic aggregate tests #107

Closed
revans2 opened this issue Jun 2, 2020 · 1 comment
Labels: feature request (New feature or request), SQL (part of the SQL/Dataframe plugin), test (Only impacts tests)

Comments

@revans2
Collaborator

revans2 commented Jun 2, 2020

Is your feature request related to a problem? Please describe.
We want a comprehensive set of tests for various aggregation operations across all of the supported types.

count - test both with and without null filtering, and both with and without distinct
min, max - results should be bit-for-bit identical
sum, average - some leeway is acceptable for floating-point results, but results for all other types must match exactly
first, last - these may be difficult to test because, due to how shuffle works, they are not guaranteed to return the same value. We might need specially crafted data sets that produce consistent answers, or operations that make the test reliable under any conditions, such as pre-partitioning the data on the aggregation key.
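The comparison rules in the list above can be sketched in plain Python. This is a hypothetical helper: `rows_match` and its `approx_cols` and `rel_tol` parameters are illustrative names, not part of any existing test framework. It assumes collected results from the CPU and GPU runs arrive as lists of Python tuples. Because aggregate output order is not guaranteed, it sorts both sides by their exact columns (typically the group-by keys) before comparing. Columns listed in `approx_cols` (floating-point sum/average) are compared with a relative tolerance; every other column must match exactly, and floats are compared bit for bit, so `-0.0` and `0.0` differ. As a further assumption, any NaN is treated as equal to any other NaN. The sketch does not make first/last deterministic; that still depends on controlling the data layout.

```python
import math
import struct
from typing import AbstractSet, Any, Iterable, Tuple

Row = Tuple[Any, ...]


def _exact_equal(a: Any, b: Any) -> bool:
    # min/max and non-floating-point results: must match exactly.
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True  # assumption: NaN payloads are not compared
        # Compare raw IEEE-754 bytes so -0.0 and 0.0 are distinguished.
        return struct.pack("<d", a) == struct.pack("<d", b)
    return type(a) is type(b) and a == b


def _approx_equal(a: Any, b: Any, rel_tol: float) -> bool:
    # Floating-point sum/average: allow a small relative difference.
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) or math.isnan(b):
            return math.isnan(a) and math.isnan(b)
        return math.isclose(a, b, rel_tol=rel_tol)
    # Non-float values in an approximate column still need an exact match.
    return _exact_equal(a, b)


def _sort_key(row: Row, approx_cols: AbstractSet[int]) -> Tuple:
    # Order rows by their exact columns only; None sorts first and
    # str() gives a total order across mixed types.
    return tuple(
        (row[i] is not None, str(row[i]))
        for i in range(len(row))
        if i not in approx_cols
    )


def rows_match(cpu: Iterable[Row], gpu: Iterable[Row],
               approx_cols: AbstractSet[int] = frozenset(),
               rel_tol: float = 1e-9) -> bool:
    cpu_rows = sorted(cpu, key=lambda r: _sort_key(r, approx_cols))
    gpu_rows = sorted(gpu, key=lambda r: _sort_key(r, approx_cols))
    if len(cpu_rows) != len(gpu_rows):
        return False
    for c, g in zip(cpu_rows, gpu_rows):
        if len(c) != len(g):
            return False
        for i, (a, b) in enumerate(zip(c, g)):
            same = (_approx_equal(a, b, rel_tol) if i in approx_cols
                    else _exact_equal(a, b))
            if not same:
                return False
    return True
```

For example, with `cpu = [("a", 3, 0.1 + 0.2), ("b", 1, 1.0)]` and `gpu = [("b", 1, 1.0), ("a", 3, 0.3)]`, `rows_match(cpu, gpu, approx_cols={2})` returns `True`, while `rows_match(cpu, gpu)` returns `False` because the exact comparison treats `0.30000000000000004` and `0.3` as different values.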

We should also test these without a group-by so we get a single result for the entire data set.
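A reference model for the count variants, with and without a group-by, can be sketched in plain Python. The function names are hypothetical, and reading "null filtering" as the difference between `COUNT(*)` and `COUNT(col)` is an assumption about what the issue intends. The sketch also captures an edge case worth covering: an aggregate with no group-by returns a single row even for an empty input, while a grouped aggregate over an empty input returns no rows.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, Optional, Tuple

# (group key, value); a value of None stands in for SQL NULL.
Row = Tuple[Hashable, Optional[Any]]


def count_variants(values: Iterable[Optional[Any]]) -> Tuple[int, int, int]:
    """Return (COUNT(*), COUNT(col), COUNT(DISTINCT col)) for one group."""
    vals = list(values)
    non_null = [v for v in vals if v is not None]
    # COUNT(*) counts every row, COUNT(col) skips NULLs, and
    # COUNT(DISTINCT col) counts unique non-NULL values.
    return len(vals), len(non_null), len(set(non_null))


def grouped_counts(rows: Iterable[Row]) -> Dict[Hashable, Tuple[int, int, int]]:
    # With a group-by: one result per key; an empty input yields no rows.
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: count_variants(vals) for key, vals in groups.items()}


def global_counts(rows: Iterable[Row]) -> Tuple[int, int, int]:
    # No group-by: the whole data set collapses to a single result,
    # even when the input is empty.
    return count_variants(value for _, value in rows)
```

With `rows = [("a", 1), ("a", 1), ("a", None), ("b", 2), ("b", None)]`, `grouped_counts(rows)` gives `{"a": (3, 2, 1), "b": (2, 1, 1)}` and `global_counts(rows)` gives `(5, 3, 2)`.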

revans2 added the feature request, ? - Needs Triage, SQL, and test labels Jun 2, 2020
revans2 removed the ? - Needs Triage label Jun 3, 2020
@kuhushukla
Collaborator

Resolved via commit 10922c7.

revans2 added this to the Release 0.1 milestone Jun 12, 2020
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>