Add partial and final only hash aggregate tests and fix nulls corner case for Average #157

kuhushukla · 2020-06-11T22:28:49Z

This PR fixes #110 and #154 and adds tests for the final-only and partial-only modes of hash aggregates. It highlights a couple of failures/bugs caught during testing (marked xfail with comments) and uses GpuNvl/GpuCoalesce to stop averages from passing nulls down to the CPU. It also contains minor nits for the Python hash aggregate test file.
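To make the null corner case concrete, here is a minimal plain-Python sketch of a partial-only average. It is not the plugin's code: the names `null_sum`, `partial_avg` and `final_avg`, and the use of `None` for SQL NULL, are assumptions for illustration. The idea shown is that a sum over an all-null group is itself null, so the partial stage coalesces it to 0 before handing the (sum, count) buffer to whatever runs the final stage.

```python
from typing import List, Optional, Tuple


def coalesce(*values):
    """Return the first value that is not None, or None if every value is None."""
    for v in values:
        if v is not None:
            return v
    return None


def null_sum(values: List[Optional[float]]) -> Optional[float]:
    """SQL-style SUM: nulls are skipped, and an all-null input yields null."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None


def partial_avg(values: List[Optional[float]]) -> Tuple[float, int]:
    """Partial stage: emit a (sum, count) buffer whose sum is never null."""
    count = sum(1 for v in values if v is not None)
    # Without the coalesce, an all-null group would emit (None, 0).
    return coalesce(null_sum(values), 0.0), count


def final_avg(buffers: List[Tuple[float, int]]) -> Optional[float]:
    """Final stage: merge the partial buffers and divide; no rows means null."""
    total = sum(s for s, _ in buffers)
    count = sum(c for _, c in buffers)
    return total / count if count else None
```

In this sketch, `final_avg` can add the partial sums without any null checks, which is the property the coalesce is meant to guarantee.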

kuhushukla · 2020-06-12T13:15:41Z

sql-plugin/src/main/scala/ai/rapids/spark/nullExpressions.scala

+  }
+
+  override def doColumnar(lhs: Scalar, rhs: GpuColumnVector): GpuColumnVector = {
+    throw new IllegalStateException("Should not be used with lhs as scalar")


I would like to explore this and see if we need to add this to GpuOverrides as a follow on since it is even more outside the scope of this PR.

Nvl is a Coalesce that only takes 2 args. The nvl function gets turned into a Coalesce, which we turn into a GpuCoalesce. We should probably document that here to make it clear what is happening, or we should just use GpuCoalesce directly instead.

Thanks for that pointer as my Nvl knowledge is limited. I will try out GpuCoalesce.

kuhushukla · 2020-06-12T15:08:56Z

build

revans2 · 2020-06-12T15:39:44Z

sql-plugin/src/main/scala/ai/rapids/spark/nullExpressions.scala

+  }
+
+  override def doColumnar(lhs: Scalar, rhs: GpuColumnVector): GpuColumnVector = {
+    throw new IllegalStateException("Should not be used with lhs as scalar")


Nvl is a Coalesce that only takes 2 args. The nvl function gets turned into a Coalesce, which we turn into a GpuCoalesce. We should probably document that here to make it clear what is happening, or we should just use GpuCoalesce directly instead.
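The relationship described in this comment can be sketched in plain Python. This is only an illustration of the semantics, not the GpuCoalesce implementation: columns are modeled as Python lists, `None` stands in for NULL, and both function names are hypothetical.

```python
def coalesce(*columns):
    """Row-wise coalesce: for each row, take the first non-null value."""
    if not columns:
        raise ValueError("coalesce needs at least one column")
    return [
        next((v for v in row if v is not None), None)
        for row in zip(*columns)
    ]


def nvl(column, replacement):
    """nvl is coalesce restricted to exactly two arguments."""
    return coalesce(column, replacement)
```

Because `nvl` adds nothing beyond fixing the argument count, calling the general coalesce directly avoids maintaining a second expression.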

integration_tests/src/main/python/hash_aggregate_test.py

revans2 · 2020-06-12T15:48:59Z

The tests are getting really complicated with the parameters, etc, and at this point I would almost rather see more tests, that look like near duplicates of each other, just so it is more readable.

kuhushukla · 2020-06-12T16:06:24Z

The tests are getting really complicated with the parameters, etc, and at this point I would almost rather see more tests, that look like near duplicates of each other, just so it is more readable.

That makes sense. I will try to make it a bit easier to grok.

kuhushukla · 2020-06-12T17:30:11Z

@revans2 , I have done the following to address:

The tests are getting really complicated with the parameters, etc

I removed get_params for data_gen markers and added incompat and approximate_float markers at test level. This significantly improves the readability IMHO. I have used get_params for confs, however, to avoid rewriting tests as the markers list is 2 elements long. Hope that seems acceptable. Additionally I made the partial and final only confs declarations a bit more concise.

To address:

or we should just use GpuCoalesce directly instead.

I replaced GpuNvl with this and it works.

Thanks for the reviews and inputs. Appreciate it.

revans2 · 2020-06-12T18:06:37Z

build

…case for Average (NVIDIA#157) * make config passing configurable in hash aggregate tests Co-authored-by: Kuhu Shukla <kuhus@nvidia.com>

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>

kuhushukla added 6 commits June 11, 2020 20:03

make config passing configurable in hash aggregate tests

890d639

Avg with nvl

e113093

add tests with issues

1cacb44

cleanup 1

e932342

filter xfail

679295f

nit

b1c0d92

kuhushukla added the bug (Something isn't working) and SQL (part of the SQL/Dataframe plugin) labels Jun 11, 2020

kuhushukla self-assigned this Jun 11, 2020

kuhushukla changed the title ~~[WIP] Add partial and final only hash aggregate tests and fix nulls corner case for Average~~ Add partial and final only hash aggregate tests and fix nulls corner case for Average Jun 12, 2020

kuhushukla commented Jun 12, 2020

kuhushukla requested a review from revans2 June 12, 2020 15:26

revans2 reviewed Jun 12, 2020

Address review comments

1e037e6

kuhushukla linked an issue Jun 12, 2020 that may be closed by this pull request

[BUG] Incorrect output from partial-only averages with nulls #154

Closed

revans2 approved these changes Jun 12, 2020

kuhushukla merged commit 8c27f70 into NVIDIA:branch-0.1 Jun 12, 2020

kuhushukla added this to the Release 0.1 milestone Jun 12, 2020

tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023

Update submodule cudf to adec535 (NVIDIA#157)

2036bc5

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
