Default to equal NaNs in make_collect_set_aggregation. #11621

Merged: 9 commits merged into rapidsai:branch-22.12 on Oct 20, 2022

Conversation

@bdice bdice commented Aug 29, 2022

Description

Partially resolves #11329. This helps to align our default behaviors for null and NaN equality across APIs, specifically for make_collect_set_aggregation in this PR. All functions should default to treating null values as equal to one another and NaN values as equal to one another.
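
To make the intended default concrete, here is a minimal pure-Python sketch of collect-set semantics. It is a model only, not libcudf code: the function name `collect_set`, the flags `nulls_equal` and `nans_equal`, and the use of `None` as a stand-in for a null element are all assumptions chosen for illustration.

```python
import math


def collect_set(values, nulls_equal=True, nans_equal=True):
    """Return the distinct elements of `values` in first-seen order.

    Hypothetical model: None stands in for a null element. Both flags
    default to True, mirroring the defaults described in this PR.
    """
    result = []
    seen = set()
    seen_null = False
    seen_nan = False
    for v in values:
        if v is None:
            # Nulls collapse to one element only when treated as equal.
            if nulls_equal and seen_null:
                continue
            seen_null = True
            result.append(v)
        elif isinstance(v, float) and math.isnan(v):
            # NaN != NaN under IEEE 754, so equality must be opted into.
            if nans_equal and seen_nan:
                continue
            seen_nan = True
            result.append(v)
        elif v not in seen:
            seen.add(v)
            result.append(v)
    return result


data = [1.0, float("nan"), None, float("nan"), 1.0, None]
equal = collect_set(data)                      # NaNs and nulls each collapse
unequal = collect_set(data, nans_equal=False)  # every NaN is kept
```

With the defaults, the two NaNs and the two nulls each collapse to a single element; with `nans_equal=False`, both NaNs survive while the nulls still collapse.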

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions bot added the libcudf label (Affects libcudf (C++/CUDA) code.) Aug 29, 2022
@bdice self-assigned this Aug 29, 2022
@bdice added the improvement (Improvement / enhancement to an existing function) and breaking (Breaking change) labels Aug 29, 2022

bdice commented Aug 29, 2022

rerun tests

bdice commented Sep 9, 2022

rerun tests

@bdice changed the base branch from branch-22.10 to branch-22.12 October 18, 2022 21:05
@bdice marked this pull request as ready for review October 20, 2022 16:18
@bdice requested a review from a team as a code owner October 20, 2022 16:18

@davidwendt davidwendt left a comment

LGTM

codecov bot commented Oct 20, 2022

Codecov Report

Base: 87.40% // Head: 88.13% // Increases project coverage by +0.72% 🎉

Coverage data is based on head (643d02f) compared to base (f72c4ce).
Patch coverage: 89.92% of modified lines in pull request are covered.

❗ Current head 643d02f differs from pull request most recent head 3411708. Consider uploading reports for the commit 3411708 to get more accurate results

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-22.12   #11621      +/-   ##
================================================
+ Coverage         87.40%   88.13%   +0.72%     
================================================
  Files               133      133              
  Lines             21833    21987     +154     
================================================
+ Hits              19084    19379     +295     
+ Misses             2749     2608     -141     
Impacted Files                                          Coverage Δ
python/cudf/cudf/core/dataframe.py                      93.77% <ø> (ø)
python/cudf/cudf/core/indexed_frame.py                  92.03% <ø> (ø)
python/cudf/cudf/core/udf/__init__.py                   97.05% <ø> (+47.05%) ⬆️
python/cudf/cudf/io/orc.py                              92.94% <ø> (-0.09%) ⬇️
python/cudf/cudf/testing/dataset_generator.py           72.83% <ø> (-0.42%) ⬇️
...thon/dask_cudf/dask_cudf/tests/test_distributed.py   18.86% <ø> (+4.94%) ⬆️
python/cudf/cudf/core/_base_index.py                    82.20% <43.75%> (-3.35%) ⬇️
python/cudf/cudf/io/text.py                             91.66% <66.66%> (-8.34%) ⬇️
python/strings_udf/strings_udf/__init__.py              84.31% <76.00%> (-12.57%) ⬇️
python/dask_cudf/dask_cudf/backends.py                  84.90% <82.92%> (-0.37%) ⬇️
... and 27 more

@ttnghia ttnghia left a comment

NaN equality defaulted to UNEQUAL because collect_set was originally implemented to meet a request from Spark. In any case, since the parameter exists and Spark passes it explicitly, the changes in this PR will not break Spark.
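
The point about explicit callers can be sketched in pure Python. This is an illustrative model, not libcudf code: `collect_set_old`, `collect_set_new`, and the boolean `nans_equal` flag are hypothetical names standing in for the function before and after the default changes.

```python
import math


def _dedup(values, nans_equal):
    """Hypothetical helper: distinct values in first-seen order."""
    out, seen, seen_nan = [], set(), False
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            if nans_equal and seen_nan:
                continue
            seen_nan = True
            out.append(v)
        elif v not in seen:
            seen.add(v)
            out.append(v)
    return out


def collect_set_old(values, nans_equal=False):
    # Models the previous default: NaNs compare unequal.
    return _dedup(values, nans_equal)


def collect_set_new(values, nans_equal=True):
    # Models the new default: NaNs compare equal.
    return _dedup(values, nans_equal)


data = [float("nan"), 2.0, float("nan")]

# A caller that passes the flag explicitly sees no change.
explicit_old = collect_set_old(data, nans_equal=False)
explicit_new = collect_set_new(data, nans_equal=False)

# A caller that relies on the default does see a change.
implicit_old = collect_set_old(data)
implicit_new = collect_set_new(data)
```

Only callers that omit the argument observe different results, which is consistent with the PR carrying the breaking label while leaving explicit callers unaffected.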

bdice commented Oct 20, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit ee9ffd0 into rapidsai:branch-22.12 Oct 20, 2022
Labels

  • breaking: Breaking change
  • improvement: Improvement / enhancement to an existing function
  • libcudf: Affects libcudf (C++/CUDA) code.
Development

Successfully merging this pull request may close these issues.

Inconsistent default values for null equality and NaN equality