Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SCALAR_IN_ARRAY: Optimization and behavioral follow-ups. #16311

Merged
merged 3 commits into from
Apr 26, 2024

Conversation

gianm
Copy link
Contributor

@gianm gianm commented Apr 19, 2024

Four changes to scalar_in_array as follow-ups to #16306:

  1. Align behavior for null scalars to the behavior of the native in and inType filters: return true if the array itself contains null, else return null.

  2. Rename the class to more closely match the function name.

  3. Add a specialization for constant arrays, where we build a HashSet.

  4. Use castForEqualityComparison to properly handle cross-type comparisons.
    Additional tests verify comparisons between LONG and DOUBLE are now
    handled properly.

@gianm gianm changed the title scalar_in_array: constant-array version, properly cast for comparison. scalar_in_array optimization and behavioral follow-ups Apr 20, 2024
@gianm gianm changed the title scalar_in_array optimization and behavioral follow-ups SCALAR_IN_ARRAY: Optimization and behavioral follow-ups. Apr 20, 2024
1) Align behavior for `null` scalars to the behavior of the native `in` and `inType` filters: return `true` if the array itself contains null, else return `null`.

2) Rename the class to more closely match the function name.

3) Add a specialization for constant arrays, where we build a `HashSet`.

4) Use `castForEqualityComparison` to properly handle cross-type comparisons.
   Additional tests verify comparisons between LONG and DOUBLE are now
   handled properly.
return ExprEval.ofLongBoolean(Arrays.asList(array).contains(scalarExpr.value()));

if (scalarEval.value() == null) {
return Arrays.asList(array).contains(null) ? ExprEval.ofLongBoolean(true) : ExprEval.ofLong(null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the array contains null - then doesn't all places when false would have been returned should return null? for example:

c IN (2,null)
c = 2 OR c = null
c = 2 OR null

this last rewrite essentially mean that false is no more
....other way around is to think about = as IS NOT DISTINCT FROM ; however in that case the return value will never be null as <=> is a 2-valued

this is not a blocking comment; I was just wondering...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that if the array contains null then the comparison acts like IS NOT DISTINCT FROM (always returns true or false), whereas if the array does not contain null, then the comparison acts like = (returns null if the lhs is null). It's the same way the native in and inType filters behave, so this is designed to align with those.

A future patch would convert from SQL IN to this SCALAR_IN_ARRAY function. We'll need to make sure to handle NULL appropriately in that patch-- it would not be ok to rewrite c IN (2, NULL) to SCALAR_IN_ARRAY(c, ARRAY[2, NULL]).

ExprEval doApply(final ExprEval arrayExpr, final ExprEval scalarEval)
{
if (matchType == null) {
return ExprEval.ofLong(null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: matchType is known at asSingleThreaded call time; a strategy pattern could create a class returning only null unconditionally
it may make it a bit more readable and matchType may not be null - other benefits are possibly negligable...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a specialization for WithNullArray, so now the WithConstantArray specialization requires a non-null constant array.

While doing this I also realized I had implemented WithConstantArray inefficiently: by overriding doApply rather than apply, the constant array expr would still have eval() called on it. I changed it to override apply instead.

@gianm gianm merged commit db82adc into apache:master Apr 26, 2024
87 checks passed
@gianm gianm deleted the sia-follow-up branch April 26, 2024 23:01
adarshsanjeev pushed a commit to adarshsanjeev/druid that referenced this pull request May 6, 2024
* Four changes to scalar_in_array as follow-ups to apache#16306:

1) Align behavior for `null` scalars to the behavior of the native `in` and `inType` filters: return `true` if the array itself contains null, else return `null`.

2) Rename the class to more closely match the function name.

3) Add a specialization for constant arrays, where we build a `HashSet`.

4) Use `castForEqualityComparison` to properly handle cross-type comparisons.
   Additional tests verify comparisons between LONG and DOUBLE are now
   handled properly.

* Fix spelling.

* Adjustments from review.
cryptoe pushed a commit that referenced this pull request May 8, 2024
…6398)

* Four changes to scalar_in_array as follow-ups to #16306:

1) Align behavior for `null` scalars to the behavior of the native `in` and `inType` filters: return `true` if the array itself contains null, else return `null`.

2) Rename the class to more closely match the function name.

3) Add a specialization for constant arrays, where we build a `HashSet`.

4) Use `castForEqualityComparison` to properly handle cross-type comparisons.
   Additional tests verify comparisons between LONG and DOUBLE are now
   handled properly.

* Fix spelling.

* Adjustments from review.

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants