Remove INCOMPAT for NormalizeNanAndZero, KnownFloatingPointNormalized #181

mythrocks · 2020-06-15T23:53:19Z

Removed INCOMPAT flag for FP Normalization functions.
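
For context, here is a minimal sketch of what floating-point normalization does, written in plain Python rather than taken from the plugin. The helper names (`normalize_nan_and_zero`, `to_bits`, `from_bits`) are hypothetical. The sketch assumes that every NaN bit pattern should collapse to one canonical NaN and that -0.0 should become 0.0:

```python
import math
import struct

# One shared NaN value that every NaN input is mapped to (illustrative choice).
CANONICAL_NAN = float("nan")


def to_bits(x: float) -> int:
    """Return the 64-bit IEEE 754 pattern of a Python float as an integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def from_bits(b: int) -> float:
    """Build a Python float from a 64-bit IEEE 754 pattern."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def normalize_nan_and_zero(x: float) -> float:
    """Collapse all NaNs to one NaN and -0.0 to 0.0; leave other values alone."""
    if math.isnan(x):
        return CANONICAL_NAN
    if x == 0.0:  # true for both 0.0 and -0.0
        return 0.0
    return x
```

With these helpers, `to_bits(-0.0)` and `to_bits(0.0)` differ, while `to_bits(normalize_nan_and_zero(-0.0))` matches `to_bits(0.0)`. That bitwise agreement is what lets normalized values be compared or hashed consistently.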

kuhushukla

Is it worth considering removing the has nan check in hash aggregate at this time?

revans2 · 2020-06-16T13:47:34Z

build

revans2 · 2020-06-16T13:48:56Z

I wish we had a way to flag if a test was marked as incompat, but didn't need it. The pyspark hash aggregate tests also have incompat on several of them for this reason.

mythrocks · 2020-06-16T16:56:04Z

> I wish we had a way to flag if a test was marked as incompat...

I should try and remove @incompat for the obvious ones. I'll update here.

(Removed incompat in Python integration test for HashAggregateExec, where obvious.)


mythrocks · 2020-06-16T17:56:28Z

I have removed the @incompat for the NaN/Zero test cases added in #160.

I tried going through and removing the @allowed_on_cpu for other tests in hash_aggregate_test.py, but those are more involved. (The NOT_ON_GPU results from other reasons.)

revans2 · 2020-06-16T18:23:50Z

buil

revans2 · 2020-06-16T18:23:56Z

build


mythrocks · 2020-06-16T20:44:32Z

> Is it worth considering removing the has nan check in hash aggregate at this time?

@kuhushukla, I'm looking into this now.

Removed obviated check for NaNs from GpuHashAggregateMeta

mythrocks · 2020-06-16T20:55:24Z

@kuhushukla, I have now removed the NaNs check from GpuHashAggregateMeta. HashAggregateSuite and hash_aggregate_test.py pass.

That's the only change in the last commit (0a241a8). Could you please confirm that I haven't missed anything?

mythrocks · 2020-06-16T20:55:31Z

build

kuhushukla · 2020-06-16T21:07:56Z

> @kuhushukla, I have now removed the NaNs check from GpuHashAggregateMeta. HashAggregateSuite and hash_aggregate_test.py pass.
>
> That's the only change in the last commit (0a241a8). Could you please confirm that I haven't missed anything?

I don't think any of the tests that predate the ones you have just added test for NaN as the grouping key. I suspect we will hit some issues around that, but since all of the legacy tests were written with has_nans=false in mind, I would not rely on them too much and would look at adding more tests for it. That makes me wonder whether this change should be a separate PR at this point. We want coherency when we say this data has NaNs. We can now normalize them, but we have hit issues with sorting NaNs and the like before, so it may be better to test and identify what does or does not work with NaNs on the different operators, and to turn it off at the aggregate operator level. I apologize if I have misled you as far as this PR goes, but we need to look at how to use has_nans properly, either here or as a follow-on.

Added more aggregation functions to groupby tests.

mythrocks · 2020-06-16T23:47:43Z

The hasNans flag is fraught with peril. :/ This flag conflates a couple of scenarios:

1. NaN in GBY expressions
2. NaN in the aggregation column

I'd be delighted to revert my change to GpuHashAggregateMeta. But based on my testing, I think it might actually be safe for GpuHashAggregate to bank on normalization to do the GBY correctly. The first point above should be covered.

I have added more aggregation functions to the NaN/Zero normalization tests. These tests succeed with and without the hasNans check in GpuHashAggregateMeta.
Incidentally, I've uncovered an unrelated issue (#194) with COUNT(DISTINCT float_column), for which I've added an xfail test. It fails regardless of the GpuHashAggregateMeta hasNans check.
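
To illustrate why normalized keys matter for the GBY case, here is a rough sketch in plain Python. It is not the plugin's implementation. It assumes a hash-based group-by that keys on the raw 64-bit pattern of each float, and all names (`group_sum`, `normalize`, `to_bits`, `from_bits`) are hypothetical:

```python
import math
import struct
from collections import defaultdict

CANONICAL_NAN = float("nan")  # single NaN every NaN key is mapped to


def to_bits(x: float) -> int:
    """64-bit IEEE 754 pattern of a float, used here as the hash key."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def from_bits(b: int) -> float:
    """Build a float from a 64-bit IEEE 754 pattern."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def normalize(x: float) -> float:
    """Map all NaNs to one NaN and -0.0 to 0.0."""
    if math.isnan(x):
        return CANONICAL_NAN
    if x == 0.0:
        return 0.0
    return x


def group_sum(rows, normalize_keys: bool) -> dict:
    """SUM(value) GROUP BY key, keyed on the key's bit pattern (an assumption)."""
    groups = defaultdict(float)
    for key, value in rows:
        k = normalize(key) if normalize_keys else key
        groups[to_bits(k)] += value
    return dict(groups)


rows = [
    (0.0, 1.0),
    (-0.0, 2.0),                           # same value as 0.0, different bits
    (from_bits(0x7FF8000000000000), 3.0),  # a NaN with the sign bit clear
    (from_bits(0xFFF8000000000000), 4.0),  # a NaN with the sign bit set
    (1.5, 5.0),
]

raw = group_sum(rows, normalize_keys=False)        # five distinct bit patterns
normalized = group_sum(rows, normalize_keys=True)  # zeros and NaNs each merge
```

Under these assumptions the un-normalized run splits the zeros and the NaNs into separate groups, while the normalized run merges each pair into a single group. Normalization only touches the keys here: NaN values in the aggregation column are unaffected, which is the second scenario listed above.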

mythrocks · 2020-06-16T23:47:55Z

build


revans2 · 2020-06-18T13:22:32Z

build


Remove INCOMPAT for NormalizeNanAndZero, KnownFloatingPointNormalized

9e3705c

mythrocks self-assigned this Jun 16, 2020

mythrocks added this to the Jun 8 - Jun 19 milestone Jun 16, 2020

mythrocks added feature request New feature or request test Only impacts tests labels Jun 16, 2020

kuhushukla reviewed Jun 16, 2020


mythrocks mentioned this pull request Jun 16, 2020

Integration tests for normalizing NaN/zeroes. #160

Merged

mythrocks added 2 commits June 16, 2020 10:50

Remove INCOMPAT for NormalizeNanAndZero, KnownFloatingPointNormalized

11bd233

(Removed incompat in Python integration test for HashAggregateExec, where obvious.)

Merge remote-tracking branch 'origin/branch-0.1' into normalize-nans-…

0874681

…zeroes

revans2 previously approved these changes Jun 16, 2020


revans2 removed the test Only impacts tests label Jun 16, 2020

Merge remote-tracking branch 'origin/branch-0.1' into normalize-nans-…

0226599

…zeroes

Remove INCOMPAT for NormalizeNanAndZero, KnownFloatingPointNormalized

0a241a8

Removed obviated check for NaNs from GpuHashAggregateMeta

mythrocks dismissed revans2’s stale review via 0a241a8 June 16, 2020 20:50

mythrocks mentioned this pull request Jun 16, 2020

[BUG] count(distinct float_col) produces different results from CPU, for Float columns with NaNs #194

Closed

Remove INCOMPAT for NormalizeNanAndZero, KnownFloatingPointNormalized

f427e5f

Added more aggregation functions to groupby tests.

kuhushukla approved these changes Jun 17, 2020


Merge remote-tracking branch 'origin/branch-0.1' into normalize-nans-…

b0c38d6

…zeroes

revans2 approved these changes Jun 18, 2020


revans2 merged commit e0209e8 into NVIDIA:branch-0.1 Jun 18, 2020

nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021

Remove INCOMPAT for NormalizeNanAndZero, KnownFloatingPointNormalized (…

d1a9fb3

…NVIDIA#181)

nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021

Remove INCOMPAT for NormalizeNanAndZero, KnownFloatingPointNormalized (…

d189800

…NVIDIA#181)

tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023

Update submodule cudf to 0ea6f8e (NVIDIA#181)

a1085cf

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
