Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cast the values read in the EXTERN function to the type in the row signature #15183

Merged
merged 6 commits into from
Oct 19, 2023

Conversation

LakshSingla
Copy link
Contributor

@LakshSingla LakshSingla commented Oct 17, 2023

Description

Currently, when reading values from an external data source (EXTERN), we don't cast the values to the signature passed by the user. This can cause an issue when reading values that are not casted by the underlying selectors.

For example, when reading a boolean array [true, false], we may run into a bug where the array is not casted to ARRAY<LONG> (internal representation for boolean) by the underlying selector, we pass it to methods assuming the getObject() returns a valid array type (Object[] with long values). Therefore it's important to cast the externally read values to the row signature passed by the user. This PR addresses the issue and preserves the type that is entered by the user in the query, regardless of what gets read.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@LakshSingla LakshSingla added this to the 28.0 milestone Oct 17, 2023
@github-actions github-actions bot added Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Oct 17, 2023
Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 i suppose there is going to be a perf cost to this, but since InputFormat are not currently responsible for coercing values to common druid types I'm not sure what else we can do here.

I think since this is going to be sitting on top of a row based column selector factory getLong/getDouble/getFloat should probably be handled ok and don't need the same help.

I suppose this coercion could also be done in RowBasedColumnSelectorFactory, though the impact might be potentially larger because it is used in other places too, so we would want to be cautious probably. It is already doing some similar coercions, but nothing for getObject.

@@ -97,7 +103,11 @@ public void inspectRuntimeShape(RuntimeShapeInspector inspector)
public Object getObject()
{
try {
return delegateDimensionSelector.getObject();
ExpressionType expressionType = ExpressionType.fromColumnType(dimensionSpec.getOutputType());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably save this outside of getObject so don't have to recreate every time (it uses interned objects so it won't be making new ones really, but could still save some overhead)

@@ -195,7 +205,13 @@ public boolean isNull()
public Object getObject()
{
try {
return delegateColumnValueSelector.getObject();
ExpressionType expressionType = ExpressionType.fromColumnType(rowSignature
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment on saving outside of getObject

@@ -857,6 +858,53 @@ public void testScanWithOrderByOnDoubleArray()
.verifyResults();
}

@Test
public void testScanExternBooleanArray()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be nice to add some other mismatched types, maybe even some that are not convertible to show what happens (like maybe json input to a double?)

@LakshSingla LakshSingla merged commit fa311dd into apache:master Oct 19, 2023
53 checks passed
@LakshSingla LakshSingla deleted the cast-extern-selector branch October 19, 2023 04:54
LakshSingla added a commit to LakshSingla/druid that referenced this pull request Oct 19, 2023
CaseyPan pushed a commit to CaseyPan/druid that referenced this pull request Nov 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants