fix greatest/least function non-vectorized processing to ignore null argument types #16649

clintropolis · 2024-06-24T22:46:14Z

Description

Fixes a flaw in greatest and least functions non-vectorized expression processing to ignore null valued arguments for determining the output type. For example, given an expression like least(1.0, 1, null), the getOutputType method correctly chooses DOUBLE as the output type, however during non-vectorized processing, we incorrectly consider it, which ends up with the default STRING output type, resulting in incorrectly using the STRING comparator instead of the double comparator as expected. This can cause odd effects and incorrect results because values like 1.0 and 1 are considered distinct when using the string comparator, while the same when using a numeric comparator.

The added tests fail without the changes in this PR.

This PR has:

been self-reviewed.
added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
been tested in a test Druid cluster.

…argument types

kgyrtkirk · 2024-06-25T07:46:35Z

I think it would be great to have consistency check testcase which ensures that from now on the contract that least should behave the same in non-vectorized mode and in vectorized mode .... (or that's already there with some trick I've missed?)

I understand that this patch fixes a consistency issue - but I'm a bit confused to see that it will be allowed to have DOUBLE typed null for the least function; shouldn't others like + also follow the same principle?

for example: with the proposed changes the following testLeast1() will pass; however testAdd will not.

  @Test
  public void testLeast1()
  {
    ObjectBinding a = new SingleInputBindings(ExpressionType.DOUBLE);
    ExpressionType t1 = Parser.parse("least(1.0 , 2.0)", ExprMacroTable.nil()).eval(a).type();
    ExpressionType t2 = Parser.parse("least(1.0 , null)", ExprMacroTable.nil()).eval(a).type();
    assertEquals(t1, t2);
  }

  @Test
  public void testAdd()
  {
    ObjectBinding a = new SingleInputBindings(ExpressionType.DOUBLE);
    ExpressionType t1 = Parser.parse("1.0 + 2.0", ExprMacroTable.nil()).eval(a).type();
    ExpressionType t2 = Parser.parse("1.0 + null", ExprMacroTable.nil()).eval(a).type();
    assertEquals(t1, t2);
  }

is that okay? couldn't the same happen for + ?

clintropolis · 2024-06-25T08:39:18Z

I think it would be great to have consistency check testcase which ensures that from now on the contract that least should behave the same in non-vectorized mode and in vectorized mode .... (or that's already there with some trick I've missed?)

Technically neither of these functions are vectorized yet and just rely on the fallback stuff introduced in #16366. There are such tests that fit this purpose, but none of the fallback stuff has been added here https://github.com/apache/druid/blob/master/processing/src/test/java/org/apache/druid/math/expr/VectorExprSanityTest.java https://github.com/apache/druid/blob/master/sql/src/test/java/org/apache/druid/sql/calcite/SqlVectorizedExpressionSanityTest.java because its basically like.. all the functions i guess.

I understand that this patch fixes a consistency issue - but I'm a bit confused to see that it will be allowed to have DOUBLE typed null for the least function; shouldn't others like + also follow the same principle?

So, the greatest and least functions do not follow the same behavior as math operations, for whatever reason a null argument does not result in the output of the function being null the same way add does. See comment in https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/math/expr/Function.java#L635.

The reason the output type is double in your first test is because the result is 1.0, while the reason it is string in the second one is a sort of legacy behavior of non-vectorized expressions where the output type of null values is STRING. Generally, when dealing with ExprEval, the type is ignored if the value is null because it isn't trusted. Vectorized expressions on the other hand will create a 'null' vector of whatever the inferred output type of the expression is, so that addition example when vectorized has an output type of double, with null value vectors.

kgyrtkirk

thank you for the explanation!

one more question: is it true that a null will always be of type STRING?

clintropolis · 2024-06-25T23:00:42Z

one more question: is it true that a null will always be of type STRING?

Null constants always will be currently, while null values from columns will be whatever column type they came from (e.g. long column make ExprEval with ExpressionType.LONG with null values). The exception to from columns is in cases where we have no type information and are using expressions, which discover the types on the fly with ExprEval.bestEffortOf, in which case they also will be string like constants.

fix greatest/least function non-vectorized processing to ignore null …

0d7c78f

…argument types

clintropolis added the Bug label Jun 24, 2024

fix tests

7554301

github-actions bot added the Area - Querying label Jun 25, 2024

kgyrtkirk approved these changes Jun 25, 2024

View reviewed changes

clintropolis merged commit d4f2636 into apache:master Jun 26, 2024
87 checks passed

clintropolis deleted the fix-greatest-least-nonvectorized branch June 26, 2024 19:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix greatest/least function non-vectorized processing to ignore null argument types #16649

fix greatest/least function non-vectorized processing to ignore null argument types #16649

clintropolis commented Jun 24, 2024

kgyrtkirk commented Jun 25, 2024

clintropolis commented Jun 25, 2024

kgyrtkirk left a comment

clintropolis commented Jun 25, 2024

fix greatest/least function non-vectorized processing to ignore null argument types #16649

fix greatest/least function non-vectorized processing to ignore null argument types #16649

Conversation

clintropolis commented Jun 24, 2024

Description

kgyrtkirk commented Jun 25, 2024

clintropolis commented Jun 25, 2024

kgyrtkirk left a comment

Choose a reason for hiding this comment

clintropolis commented Jun 25, 2024