-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Part 2 of changes for SQL Compatible Null Handling #5958
Conversation
@leventov : Created a new PR, please review. |
@nishantmonu51 as far as I can see you didn't address all comments that I left in the last batch in the previous PR, please answer to them there (if you are able to open that PR) |
93f41e8
to
6a6d35a
Compare
@leventov: last batch of comments was re lines more than 120 cols, add line breaks and reformatted code. Also replied to your comments on previous PR, If you have any more comments can you please add them here ? |
Since this PR is tagged as |
@nishantmonu51 is this the last PR for #4349? Would you let me know which are not done yet in #5278 and this PR? |
Some comments and questions for the PR description.
This should be moved to #5278.
I haven't checked through the whole patch, so am wondering this is true only for the SQL layer or the entire Druid query processing system. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great work. Thanks @nishantmonu51!
Reviewed up to LoadingLookup
and still reviewing.
@@ -99,10 +98,10 @@ public Object value() | |||
return value; | |||
} | |||
|
|||
public boolean isNull() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think it's better to call this method rather than doing value() == null
in call sites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method is replaced with isNumericNull as that is what we need to check for Expressions.
isNumericNull will check whether the primitive long/float/double equivalent is null or not.
It is possible for an expression to have a non-null String value but it can return null when parsed as a primitive long/float/double.
return ExprEval.of(null); | ||
} | ||
|
||
if (leftVal.type() == ExprType.STRING && rightVal.type() == ExprType.STRING) { | ||
return evalString(leftVal.asString(), rightVal.asString()); | ||
} else if (leftVal.type() == ExprType.LONG && rightVal.type() == ExprType.LONG) { | ||
if (NullHandling.sqlCompatible() && (leftVal.isNumericNull() || rightVal.isNumericNull())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonder why we check this here instead of changing evalLong()
or evalDouble
to return a nullable Long
or Double
, respectively. Is this to avoid boxing/unboxing? If so, I wonder how much it affects to the performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about how much performance effect boxing/unboxing will have, in this form atleast the change is on safer side and does not affect performance for non-sql case.
We can run more benchmarks in a follow up PR and change this api to return boxed values if the performance diff is not much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
return theLong == null ? 0L : theLong; | ||
Number number = asNumber(); | ||
if (number == null) { | ||
assert NullHandling.replaceWithDefault(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about checking and throwing an Exception rather than a assert
statement? The assert statement wouldn't be super helpful for users if it happens. Same for all other assert statements in this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, I was actually doing that before, but changed this to assert based on discussions on previous PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, would you point me out where the discussion is? I guess it's in #5278?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unable to find the discussion among lots of review comments.
I believe the intent here was to catch any coding errors/places where we may have missed handling nulls properly.
@leventov: can probably add more context here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -59,7 +60,9 @@ public ApproximateHistogramAggregator( | |||
@Override | |||
public void aggregate() | |||
{ | |||
histogram.offer(selector.getFloat()); | |||
if (NullHandling.replaceWithDefault() || !selector.isNull()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does NullHandling.replaceWithDefault()
mean here? Does that mean nulls have replaced with some default value at ingestion time? If so, I guess select.isNull()
always return false
and thus this check can be simplified into if (!selector.isNull())
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this means whether nulls should be replaced with default value.
In case of ExpressionColumnValueSelector isNull will compute the expression and then give the result, the check is there to not have any performance impact of calling isNull for default case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks. Would you add a comment here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure.
{ | ||
if (key == null) { | ||
String keyEquivalent = NullHandling.nullToEmptyIfNeeded(key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this kind of null value conversion is needed only when retrieving values from the storage layer (like in ColumnValueSelector
), so that we expect key
is already converted to some default value if needed. But, apparently it seems not. Would you explain why this is needed after retrieving values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is to preserve backwards compatibility,
previously, nulls and empty strings were considered same.
and if a looking has mapping "" -> "someval" then all nulls/empty strings will be resolved to "someval".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Thanks @jihoonson for helping with the review and taking this PR forward. |
import java.nio.ByteBuffer; | ||
import java.util.Arrays; | ||
import java.util.Comparator; | ||
import java.util.List; | ||
import java.util.Map; | ||
import java.util.Objects; | ||
|
||
public class FloatFirstAggregatorFactory extends AggregatorFactory | ||
public class FloatFirstAggregatorFactory extends NullableAggregatorFactory<ColumnValueSelector> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment: #5452 (comment) is not addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a blocker for this PR, would like to do it a follow up PR - #6039
@@ -73,7 +74,14 @@ public static PeriodGranularity toPeriodGranularity( | |||
} else { | |||
Chronology chronology = timeZone == null ? ISOChronology.getInstanceUTC() : ISOChronology.getInstance(timeZone); | |||
final Object value = originArg.eval(bindings).value(); | |||
origin = value != null ? new DateTime(value, chronology) : null; | |||
if (value instanceof String && StringUtils.isEmpty((String) value)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment: #5452 (comment) is not addressed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#6040
Will fix this line in this PR and prohibit the api in future PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -90,7 +95,7 @@ public SelectorDimFilter( | |||
@Override | |||
public DimFilter optimize() | |||
{ | |||
return new InDimFilter(dimension, ImmutableList.of(value), extractionFn).optimize(); | |||
return new InDimFilter(dimension, Arrays.asList(value), extractionFn).optimize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collections.singletonList()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -27,5 +29,7 @@ | |||
// converted to strings. We should also add functions | |||
// for these and modify ColumnComparisonFilter to handle | |||
// comparing Long and Float columns to eachother. | |||
// Returns null when the underlying Long/Float value is null. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not addressed: #5452 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -269,12 +270,22 @@ public int compare(Row left, Row right) | |||
|
|||
private Ordering<Row> metricOrdering(final String column, final Comparator comparator) | |||
{ | |||
return Ordering.from(Comparator.comparing((Row row) -> row.getRaw(column), Comparator.nullsLast(comparator))); | |||
if (NullHandling.sqlCompatible()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a comment explaining this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added comment.
@@ -93,7 +93,7 @@ public String getAnnouncementPath(String listenerName) | |||
{ | |||
return ZKPaths.makePath( | |||
getListenersPath(), Preconditions.checkNotNull( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each argument should be on a separate line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -107,6 +108,7 @@ public static SchemaPlus createRootSchema(final Schema druidSchema, final Author | |||
|
|||
public static String escapeStringLiteral(final String s) | |||
{ | |||
Preconditions.checkNotNull(s); | |||
if (s == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check immediately after Preconditions.checkNotNull?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed the == null check, it was not needed.
@@ -294,7 +295,11 @@ public RelOptCost computeSelfCost(final RelOptPlanner planner, final RelMetadata | |||
|
|||
for (int i : rightKeys) { | |||
final Object value = row[i]; | |||
final String stringValue = value != null ? String.valueOf(value) : ""; | |||
if (value == null) { | |||
// NULLS are not supposed to match NULLs in a join. So ignore them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NULLs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -84,7 +84,7 @@ public void reduce( | |||
if (sqlTypeName == SqlTypeName.BOOLEAN) { | |||
literal = rexBuilder.makeLiteral(exprResult.asBoolean(), constExp.getType(), true); | |||
} else if (sqlTypeName == SqlTypeName.DATE) { | |||
if (!constExp.getType().isNullable() && exprResult.isNull()) { | |||
if (!constExp.getType().isNullable() && exprResult.isNumericNull()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible for an expression to have a non-null String value but it can return null when parsed as a primitive long/float/double.
ExprEval.isNumericNull checks whether the parsed primitive value is null or not.
Added comment to code.
getCluster().getRexBuilder().makeLiteral(value) | ||
) | ||
); | ||
// NULLS are not supposed to match NULLs in a join. So ignore them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NULLs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@leventov @jihoonson : Thanks for the review comments, addressed them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nishantmonu51 I finished my first review. You did a really nice job. I would like to thank you.
Two more comments:
- There are some checkstyle errors including wrong license. Please fix them.
- Would you please check my previous comments?
if (NullHandling.sqlCompatible()) { | ||
// empty string should also have same behavior | ||
Assert.assertEquals(null, loadingLookup.apply("")); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be Assert.assertNull()
.
Also, do we need to test for non-sql-compatible mode? Like:
Assert.assertNull(loadingLookup.apply(null));
if (NullHandling.sqlCompatible()) {
// empty string should also have same behavior
Assert.assertNull(loadingLookup.apply(""));
} else {
Assert.assertEquals("", loadingLookup.apply(""));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
if (NullHandling.sqlCompatible()) { | ||
Assert.assertEquals(Collections.emptyList(), loadingLookup.unapply(null)); | ||
} else { | ||
Assert.assertEquals(null, loadingLookup.unapply(null)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assert.assertNull()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
Assert.assertEquals(ImmutableMap.of("value", Lists.newArrayList("key")), loadingLookup.unapplyAll(ImmutableSet.<String>of("value"))); | ||
Assert.assertEquals( | ||
ImmutableMap.of("value", Lists.newArrayList("key")), | ||
loadingLookup.unapplyAll(ImmutableSet.<String>of("value")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you please remove the unnecessary type argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed.
@@ -1466,7 +1505,7 @@ public void close() | |||
); | |||
} | |||
|
|||
public long sumMetric(final Task task, final DimFilter filter, final String metric) | |||
public Long sumMetric(final Task task, final DimFilter filter, final String metric) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please annotate @Nullable
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added.
@@ -1081,7 +1088,7 @@ public void close() | |||
return toolboxFactory.build(task); | |||
} | |||
|
|||
public long sumMetric(final Task task, final DimFilter filter, final String metric) | |||
public Long sumMetric(final Task task, final DimFilter filter, final String metric) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please annotate @Nullable
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
valuesBytesSize += valuesBytes[index].length + 1; | ||
++index; | ||
} | ||
byte[] extractionFnBytes = extractionFn == null ? new byte[0] : extractionFn.getCacheKey(); | ||
|
||
ByteBuffer filterCacheKey = ByteBuffer.allocate(3 | ||
ByteBuffer filterCacheKey = ByteBuffer.allocate(5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we use CacheKeyBuilder
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -77,10 +81,12 @@ public SelectorDimFilter( | |||
byte[] valueBytes = (value == null) ? new byte[]{} : StringUtils.toUtf8(value); | |||
byte[] extractionFnBytes = extractionFn == null ? new byte[0] : extractionFn.getCacheKey(); | |||
|
|||
return ByteBuffer.allocate(3 + dimensionBytes.length + valueBytes.length + extractionFnBytes.length) | |||
return ByteBuffer.allocate(5 + dimensionBytes.length + valueBytes.length + extractionFnBytes.length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we use CacheKeyBuilder
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -1700,6 +1712,78 @@ public BufferComparator getBufferComparator() | |||
return bufferComparator; | |||
} | |||
} | |||
|
|||
// This class is only used when SQL compatible null handling is enabled. | |||
private class NullableRowBasedKeySerdeHelper implements RowBasedKeySerdeHelper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment about the memory layout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -90,7 +90,10 @@ public boolean supportsSelectivityEstimation( | |||
{ | |||
if (isSimpleEquals()) { | |||
// Verify that dimension equals prefix. | |||
return ImmutableList.of(selector.getBitmapIndex(dimension, likeMatcher.getPrefix())); | |||
return ImmutableList.of(selector.getBitmapIndex( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
format
return ImmutableList.of(
selector.getBitmapIndex(
dimension,
NullHandling.emptyToNullIfNeeded(likeMatcher.getPrefix())
)
);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -80,7 +80,7 @@ public DruidExpression toDruidExpression( | |||
// So there is no simple extraction for this operator. | |||
return DruidExpression.fromFunctionCall( | |||
"timestamp_ceil", | |||
ImmutableList.of( | |||
Arrays.asList( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe Stream.of()
is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed because ImmutableList
doesn't support nulls, I suppose. Anyway, a parallel PR #5980 handles this in the whole codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM besides @jihoonson's comments, and if you also assign #6039 to yourself and do that after this PR.
Removed it from this PR.
it applies to whole query processing. updated PR description to add a note. |
@jihoonson : For the pending work, I have labelled the issues with Area : Null Handling Have updated the PR based on your comments, please check if i missed any comments. |
Thanks @nishantmonu51. Would you please check this comment: #5958 (comment)? Also, I'm not sure what have done in #5278 + this PR and not. I think at least |
this actually works now for Expressions, See BinaryEvalOpExprBase.eval, if any of the values being compared is null it returns null. |
@jihoonson In terms of functionality, After this PR, all things mentioned in the proposal except following should be working -
|
@jihoonson any more comments here ? |
@nishantmonu51 awesome! Thanks for clarifying. Everything looks good except the travis failure. Would you check it please? |
@@ -67,7 +82,14 @@ public void testUnapplyAll() throws ExecutionException | |||
.andReturn(Collections.singletonList("key")) | |||
.once(); | |||
EasyMock.replay(reverseLookupCache); | |||
<<<<<<< HEAD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please resolve the conflict..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@jihoonson: travis is passing now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@leventov @jihoonson @b-slim Thanks for the review. |
Part 2 of changes for SQL Compatible Null Handling -
This PR contains changes to improve handling of null values in druid by treating nulls as missing values which don’t actually exist. This will make it more compatible to SQL standard and help with integration with other existing BI systems which support ODBC/JDBC. (Detailed description in proposal - #4349)
NOTE - NULL and empty strings are treated differently in entire druid query processing system.
Nullability for Aggregators/Metrics
As per the current implementation, most aggregators are coded to replace null values with default e.g. sum treats them as zeroes, min treats them as positive infinity, etc.
To match the new semantics and make aggregators nullable, where if an aggregator encounters only the null values the result will be null value following changes are made -
Math Expressions Null Handling
Filtering on Null values
Changes to Druid build-in SQL layer
Backwards Compatibility
Testing
Pending work -
https://github.com/apache/incubator-druid/labels/Area%20-%20Null%20Handling