Add accurate hash join size functions #8453
…emporary solution
Codecov Report
@@             Coverage Diff              @@
##        branch-21.08    #8453     +/-   ##
================================================
+ Coverage      82.53%   82.91%   +0.38%
================================================
  Files            110      110
  Lines          17739    18094    +355
================================================
+ Hits           14640    15002    +362
+ Misses          3099     3092      -7
The API looks great for what we would want.
@gpucibot merge
Addresses #8237

This PR adds three join size APIs (`hash_join::inner_join_size`, `hash_join::left_join_size` and `hash_join::full_join_size`) to the `hash_join` class, one for each type of join. Each returns the exact number of matches with the specified probe table. The PR also completely removes the deprecated size estimation logic from the current implementation.

In addition, this PR updates the existing join APIs by adding an optional `output_size` argument. If `output_size.has_value()`, that value is used directly for further computation. Otherwise, the target join internally invokes its corresponding size function.

TODO: the current `full_join_size` uses a 2-step algorithm similar to the one used in `hash_join::full_join`. It duplicates some of the computation done in `full_join` and should therefore be refactored during `cuco` integration.
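For readers unfamiliar with the pattern, here is a minimal host-side sketch of the idea described above: exact size functions per join type, plus a join that accepts an optional precomputed `output_size`. It is an illustration only and not the libcudf implementation. The class `toy_hash_join`, its single-column `int` keys, and its use of `std::unordered_multimap` are all assumptions made for the example; only the member names mirror the ones in this PR. The full join size is computed in two steps, loosely following the TODO note.

```cpp
// Illustrative sketch only: a hypothetical CPU stand-in, NOT the libcudf code.
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical class: hashes the build keys once, then can be probed repeatedly.
class toy_hash_join {
 public:
  explicit toy_hash_join(std::vector<int> const& build)
  {
    for (std::size_t i = 0; i < build.size(); ++i) { table_.emplace(build[i], i); }
  }

  // Exact number of (probe row, build row) pairs an inner join produces.
  std::size_t inner_join_size(std::vector<int> const& probe) const
  {
    std::size_t n = 0;
    for (int key : probe) { n += table_.count(key); }
    return n;
  }

  // Left join: a probe row with no match still yields one output row.
  std::size_t left_join_size(std::vector<int> const& probe) const
  {
    std::size_t n = 0;
    for (int key : probe) {
      std::size_t const c = table_.count(key);
      n += (c == 0) ? 1 : c;
    }
    return n;
  }

  // Full join in two steps: the left join size, plus build rows that no
  // probe row matched.
  std::size_t full_join_size(std::vector<int> const& probe) const
  {
    std::vector<bool> matched(table_.size(), false);
    for (int key : probe) {
      auto const range = table_.equal_range(key);
      for (auto it = range.first; it != range.second; ++it) { matched[it->second] = true; }
    }
    std::size_t unmatched = 0;
    for (bool m : matched) {
      if (!m) { ++unmatched; }
    }
    return left_join_size(probe) + unmatched;
  }

  // Inner join returning (probe index, build index) pairs. If the caller
  // already knows the size, the counting pass is skipped.
  std::vector<std::pair<std::size_t, std::size_t>> inner_join(
    std::vector<int> const& probe,
    std::optional<std::size_t> output_size = std::nullopt) const
  {
    std::size_t const size =
      output_size.has_value() ? *output_size : inner_join_size(probe);
    std::vector<std::pair<std::size_t, std::size_t>> out;
    out.reserve(size);
    for (std::size_t i = 0; i < probe.size(); ++i) {
      auto const range = table_.equal_range(probe[i]);
      for (auto it = range.first; it != range.second; ++it) {
        out.emplace_back(i, it->second);
      }
    }
    assert(out.size() == size);  // a caller-supplied size must be exact
    return out;
  }

 private:
  std::unordered_multimap<int, std::size_t> table_;  // key -> build row index
};
```

A caller that probes the same table more than once can compute `inner_join_size(probe)` a single time and pass the result as `output_size` to `inner_join`, which avoids repeating the counting pass. In this sketch the full join size repeats work that the left join size already does, which is the same kind of duplication the TODO mentions.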