Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove INCOMPAT for NormalizeNanAndZero, KnownFloatingPointNormalized #181

Merged
merged 7 commits into from
Jun 18, 2020

Conversation

mythrocks
Copy link
Collaborator

Removed INCOMPAT flag for FP Normalization functions.

@mythrocks mythrocks self-assigned this Jun 16, 2020
@mythrocks mythrocks added this to the Jun 8 - Jun 19 milestone Jun 16, 2020
@mythrocks mythrocks added feature request New feature or request test Only impacts tests labels Jun 16, 2020
Copy link
Collaborator

@kuhushukla kuhushukla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth considering removing the has nan check in hash aggregate at this time?

@revans2
Copy link
Collaborator

revans2 commented Jun 16, 2020

build

@revans2
Copy link
Collaborator

revans2 commented Jun 16, 2020

I wish we had a way to flag if a test was marked as incompat, but didn't need it. The pyspark has aggregate tests also have incompat on several of them for this reason.

@mythrocks
Copy link
Collaborator Author

I wish we had a way to flag if a test was marked as incompat...

I should try and remove @incompat for the obvious ones. I'll update here.

@mythrocks
Copy link
Collaborator Author

I have removed the @incompat for the NaN/Zero test cases added in #160.

I tried going through and removing the @allowed_on_cpu for other tests in hash_aggregate_test.py, but those are more involved. (The NOT_ON_GPU results from other reasons.)

@revans2
Copy link
Collaborator

revans2 commented Jun 16, 2020

buil

@revans2
Copy link
Collaborator

revans2 commented Jun 16, 2020

build

revans2
revans2 previously approved these changes Jun 16, 2020
@revans2 revans2 removed the test Only impacts tests label Jun 16, 2020
@mythrocks
Copy link
Collaborator Author

Is it worth considering removing the has nan check in hash aggregate at this time?

@kuhushukla, I'm looking into this now.

Removed obviated check for NaNs from GpuHashAggregateMeta
@mythrocks
Copy link
Collaborator Author

@kuhushukla, I have now removed the NaNs check from GpuHashAggregateMeta. HashAggregateSuite and hash_aggregate_test.py pass.

That's the only change in the last commit (0a241a8). Could you please confirm that I haven't missed anything?

@mythrocks
Copy link
Collaborator Author

build

@kuhushukla
Copy link
Collaborator

@kuhushukla, I have now removed the NaNs check from GpuHashAggregateMeta. HashAggregateSuite and hash_aggregate_test.py pass.

That's the only change in the last commit (0a241a8). Could you please confirm that I haven't missed anything?

I dont think any of the tests from before what u have just added test for the grouping key as Nan. I suspect we will hit some issues around that but since all of the legacy tests were written with has_nans=false in mind I would probably not rely on them too much and possibly look at adding more tests for it. This change then makes me think if it should be a separate PR at this point. We want coherency when we say this data has nans and now we can normalize them but we have hit issues with sorting nans and such before so maybe testing and identifying what does or does not work with it on different operators and turning it off at aggregate operator level would be better. I apologize if I have mislead you as far as this PR goes but we need to look at how to use has_nans properly here or as a follow on.

Added more aggregation functions to groupby tests.
@mythrocks
Copy link
Collaborator Author

The hasNans flag is fraught with peril. :/ This flag conflates a couple of scenarios:

  1. NaN in GBY expressions
  2. NaN in the aggregation column

I'd be delighted to revert my change to GpuHashAggregateMeta. But based on my testing, I think it might actually be safe for GpuHashAggregate to bank on normalization to do the GBY correctly. The first point above should be covered.

I have added more aggregation functions to the NaN/Zero normalization tests. These tests succeed with and without the hasNans check in GpuHashAggregateMeta.
Incidentally, I've uncovered an unrelated issue (#194) with COUNT(DISTINCT float_column) that I've added an xfail test for. This fails regardless or the GpuHashAggregateMeta hasNans check.

@mythrocks
Copy link
Collaborator Author

build

@revans2
Copy link
Collaborator

revans2 commented Jun 18, 2020

build

@revans2 revans2 merged commit e0209e8 into NVIDIA:branch-0.1 Jun 18, 2020
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants