Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement mixed equality/conditional semi/anti joins #10037

Merged
merged 20 commits into from
Jan 24, 2022

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Jan 13, 2022

This PR is a follow-up to #9917 and should be merged after that PR. This resolves #9695 and resolves #5401. The implementation here is only a first pass, but in the interest of prioritizing a working feature for the upcoming release I'm postponing making various additional changes (including some breaking ones).

@vyasr vyasr added feature request New feature or request 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change labels Jan 13, 2022
@vyasr vyasr added this to the Conditional Joins milestone Jan 13, 2022
@vyasr vyasr requested a review from a team as a code owner January 13, 2022 18:56
@vyasr vyasr self-assigned this Jan 13, 2022
@vyasr vyasr requested a review from a team as a code owner January 13, 2022 18:56
@github-actions github-actions bot added the CMake CMake build issue label Jan 13, 2022
@vyasr
Copy link
Contributor Author

vyasr commented Jan 13, 2022

To reviewers (@karthikeyann and @davidwendt) I would recommend waiting to review until #9917 is merged. I've opened this to allow the Spark team to start testing but this PR currently shows twice the changeset it needs.

@codecov
Copy link

codecov bot commented Jan 13, 2022

Codecov Report

Merging #10037 (7a545a1) into branch-22.04 (e24fa8f) will increase coverage by 0.04%.
The diff coverage is 0.00%.

Impacted file tree graph

@@               Coverage Diff                @@
##           branch-22.04   #10037      +/-   ##
================================================
+ Coverage         10.37%   10.42%   +0.04%     
================================================
  Files               119      119              
  Lines             20149    20607     +458     
================================================
+ Hits               2091     2148      +57     
- Misses            18058    18459     +401     
Impacted Files Coverage Δ
python/cudf/cudf/__init__.py 0.00% <ø> (ø)
python/cudf/cudf/_fuzz_testing/io.py 0.00% <0.00%> (ø)
python/cudf/cudf/_fuzz_testing/orc.py 0.00% <ø> (ø)
python/cudf/cudf/_fuzz_testing/parquet.py 0.00% <ø> (ø)
python/cudf/cudf/_fuzz_testing/utils.py 0.00% <0.00%> (ø)
python/cudf/cudf/api/types.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/_base_index.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/__init__.py 0.00% <ø> (ø)
python/cudf/cudf/core/column/column.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/decimal.py 0.00% <0.00%> (ø)
... and 82 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cfb6cbe...7a545a1. Read the comment docs.

cpp/src/join/mixed_join.cu Outdated Show resolved Hide resolved
@vyasr vyasr force-pushed the feature/mixed_semi_anti_joins branch from 58c16f2 to 5f0378e Compare January 18, 2022 19:32
@vyasr
Copy link
Contributor Author

vyasr commented Jan 19, 2022

Sounds good. Also, it may be worth looking making the has_nulls a runtime parameter using the nullate pattern (like we do in the row-operators).

Yeah we can look into that. We'll have to make a call on what sort of performance hit is acceptable for the compile-time improvement, I can look at previous nullate PRs to assess how we've decided that in the past.

@vyasr vyasr requested a review from davidwendt January 19, 2022 22:54
@vyasr
Copy link
Contributor Author

vyasr commented Jan 20, 2022

rerun tests

@vyasr vyasr requested review from a team as code owners January 24, 2022 18:08
@vyasr vyasr requested review from rgsl888prabhu and brandon-b-miller and removed request for a team January 24, 2022 18:08
@vyasr vyasr changed the base branch from branch-22.02 to branch-22.04 January 24, 2022 18:08
@vyasr vyasr added 3 - Ready for Review Ready for review by team and removed 3 - Ready for Review Ready for review by team labels Jan 24, 2022
Copy link
Contributor

@robertmaynard robertmaynard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMake changes LGTM

@vyasr
Copy link
Contributor Author

vyasr commented Jan 24, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit b8d2969 into rapidsai:branch-22.04 Jan 24, 2022
rapids-bot bot pushed a commit that referenced this pull request Jan 25, 2022
This depends on #9941 and #10037.  Adds Java bindings for mixed left semi join and mixed left anti join.

Authors:
  - Jason Lowe (https://github.com/jlowe)

Approvers:
  - Alessandro Bellina (https://github.com/abellina)
  - Thomas Graves (https://github.com/tgravescs)
  - Jim Brennan (https://github.com/jbrennan333)

URL: #10040
@vyasr vyasr deleted the feature/mixed_semi_anti_joins branch March 9, 2022 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CMake CMake build issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Conditional hash join for left semi and left anti joins [FEA] Hash-based equality+inequality join
4 participants