Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add partial and final only hash aggregate tests and fix nulls corner case for Average #157

Merged
merged 7 commits into from
Jun 12, 2020

Conversation

kuhushukla
Copy link
Collaborator

@kuhushukla kuhushukla commented Jun 11, 2020

Please write a description in this text box of the changes that are being made.

This PR fixes #110 and #154
and adds tests for final and partial modes for hash aggregate tests. It highlights a couple of failures/bugs caught during testing (marked xfail with comments)uses GpuNvl/GpuCoalesce to stop averages to pass down nulls to the CPU. Additionally contains minor nits to the python test file for hash aggregates.

@kuhushukla kuhushukla added bug Something isn't working SQL part of the SQL/Dataframe plugin labels Jun 11, 2020
@kuhushukla kuhushukla self-assigned this Jun 11, 2020
@kuhushukla kuhushukla changed the title [WIP] Add partial and final only hash aggregate tests and fix nulls corner case for Average Add partial and final only hash aggregate tests and fix nulls corner case for Average Jun 12, 2020
}

override def doColumnar(lhs: Scalar, rhs: GpuColumnVector): GpuColumnVector = {
throw new IllegalStateException("Should not be used with lhs as scalar")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to explore this and see if we need to add this to GpuOverrides as a follow on since it is even more outside the scope of this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nvl is a Coalesce that only takes 2 args. the nvl function gets turned into a Coalesce, which we turn into a GpuCoalesce. We should probably document that her to make it clear what is happening, or we should just sue GpuCoalesce directly instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for that pointer as my Nvl knowledge is limited. I will try out GpuCoalesce.

@kuhushukla
Copy link
Collaborator Author

build

@kuhushukla kuhushukla requested a review from revans2 June 12, 2020 15:26
}

override def doColumnar(lhs: Scalar, rhs: GpuColumnVector): GpuColumnVector = {
throw new IllegalStateException("Should not be used with lhs as scalar")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nvl is a Coalesce that only takes 2 args. the nvl function gets turned into a Coalesce, which we turn into a GpuCoalesce. We should probably document that her to make it clear what is happening, or we should just sue GpuCoalesce directly instead.

integration_tests/src/main/python/hash_aggregate_test.py Outdated Show resolved Hide resolved
@revans2
Copy link
Collaborator

revans2 commented Jun 12, 2020

The tests are getting really complicated with the parameters, etc, and at this point I would almost rather see more tests, that look like near duplicates of each other, just so it is more readable.

@kuhushukla
Copy link
Collaborator Author

The tests are getting really complicated with the parameters, etc, and at this point I would almost rather see more tests, that look like near duplicates of each other, just so it is more readable.

That makes sense. I will try and make it a bit more easy to grok.

@kuhushukla
Copy link
Collaborator Author

kuhushukla commented Jun 12, 2020

@revans2 , I have done the following to address:

The tests are getting really complicated with the parameters, etc

I removed get_params for data_gen markers and added incompat and approximate_float markers at test level. This significantly improves the readability IMHO. I have used get_params for confs, however, to avoid rewriting tests as the markers list is 2 elements long. Hope that seems acceptable. Additionally I made the partial and final only confs declarations a bit more concise.

To address:

or we should just sue GpuCoalesce directly instead.

I replaced GpuNvl with this and it works.

Thanks for the reviews and inputs. Appreciate it.

@kuhushukla kuhushukla linked an issue Jun 12, 2020 that may be closed by this pull request
@revans2
Copy link
Collaborator

revans2 commented Jun 12, 2020

build

@kuhushukla kuhushukla merged commit 8c27f70 into NVIDIA:branch-0.1 Jun 12, 2020
@kuhushukla kuhushukla added this to the Release 0.1 milestone Jun 12, 2020
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
…case for Average (NVIDIA#157)

* make config passing configurable in hash aggregate tests

Co-authored-by: Kuhu Shukla <kuhus@nvidia.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
…case for Average (NVIDIA#157)

* make config passing configurable in hash aggregate tests

Co-authored-by: Kuhu Shukla <kuhus@nvidia.com>
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working SQL part of the SQL/Dataframe plugin
Projects
None yet
2 participants