Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix 'is not null' is invalid for array-type in druid-sql #10921

Closed
wants to merge 2 commits into from

Conversation

chenyuzhi459
Copy link
Contributor

@chenyuzhi459 chenyuzhi459 commented Feb 24, 2021

Fixes #10525

Description

Fix the bug : incorrect results (including nulls) when querying string-array column(let's say it's col) with col <> '' and col is not null

Reason: When we use is not null filter in SQL, Calcite will treat it as truefilter through call stack as beflow:
DruidPlanner.plan(final String sql) -> PlannerImpl.rel(SqlNode sql) ->SqlToRelConverter.convertQuery(SqlNode query, boolean needsValidation, boolean top) ->SqlToRelConverter.convertQueryRecursive(SqlNode query, boolean top, RelDataType targetRowType)->SqlToRelConverter.convertSelect(SqlSelect select, boolean top) -> SqlToRelConverter.convertSelectImpl(SqlToRelConverter.Blackboard bb, SqlSelect select) -> SqlToRelConverter.convertWhere(SqlToRelConverter.Blackboard bb, SqlNode where) -> RexCall.isAlwaysTrue()-> AbstractSqlType.isNullable()

The last method AbstractSqlType.isNullable() will return false in our arrayColumnType. Because in the method Calcites.createSqlArrayTypeWithNullability(), we only specify nullable praram for elementType in arrayColumnType,
however, typeFactory.createArrayType() will invoke new ArraySqlType(elementType, false) to return a new RelDataType.
Thus, we will got incorrect results (including nulls) when use is not null

Solution: support specify nullable praram for arrayColumnType


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Key changed/added classes in this PR
  • org.apache.druid.sql.calcite.planner.Calcites

@chenyuzhi459 chenyuzhi459 changed the title fix is not null is invalid for array-type in druid-sql fix 'is not null' is invalid for array-type in druid-sql Feb 24, 2021
Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thanks for the fix, this was my bad I think 😅

Any chance you could add a test for this case? https://github.com/apache/druid/blob/master/sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java is the catch-all bucket where we typically add functional query tests, and https://github.com/apache/druid/blob/master/sql/src/test/java/org/apache/druid/sql/calcite/util/CalciteTests.java is where you can find the data that these tests operate on, if it helps.

@chenyuzhi459
Copy link
Contributor Author

thanks for the fix, this was my bad I think

Any chance you could add a test for this case? https://github.com/apache/druid/blob/master/sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java is the catch-all bucket where we typically add functional query tests, and https://github.com/apache/druid/blob/master/sql/src/test/java/org/apache/druid/sql/calcite/util/CalciteTests.java is where you can find the data that these tests operate on, if it helps.

Hey, i had try to add a test in https://github.com/apache/druid/blob/master/sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java.
However, I can not simulate a filter of string_array column,because we treat string_array column as string column in the org.apache.druid.sql.calcite.schema.DruidSchema.analysisToRowSignature(). For example, dim3 column in table 'foo' is a multivalue column, and in the schema ,it convert to a string column simply.

@clintropolis
Copy link
Member

Hey, i had try to add a test in https://github.com/apache/druid/blob/master/sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java.
However, I can not simulate a filter of string_array column,because we treat string_array column as string column in the org.apache.druid.sql.calcite.schema.DruidSchema.analysisToRowSignature(). For example, dim3 column in table 'foo' is a multivalue column, and in the schema ,it convert to a string column simply.

Hmm, yeah, true array types aren't quite fully supported throughout the query engine yet. We've got some functions to interact with multi-value string dimensions as if they were arrays, as well as array constants, but the result row signatures are still for the most part non-array types because it's not quite there yet (no grouping on the rows as complete arrays for example). You might be able to use an array constant on a scan query to get the right row signature, but I'm not certain about that because in the native druid query, array types will still currently end up as STRING virtual columns I believe.

@chenyuzhi459
Copy link
Contributor Author

chenyuzhi459 commented Mar 11, 2021

Hmm, yeah, true array types aren't quite fully supported throughout the query engine yet. We've got some functions to interact with multi-value string dimensions as if they were arrays, as well as array constants, but the result row signatures are still for the most part non-array types because it's not quite there yet (no grouping on the rows as complete arrays for example). You might be able to use an array constant on a scan query to get the right row signature, but I'm not certain about that because in the native druid query, array types will still currently end up as STRING virtual columns I believe.

Thanks for your tips, I have try to run testSelectConstantArrayExpressionFromTable() in CalciteQueryTest.java, however i found in the rowsignature, all column is treated as asingle-string column。And I found if i want to change the rowsignature of multi-string value, it will have a huge change on the source code. In this situation, it's hard to add a test for my code in createSqlArrayTypeWithNullability(typeFactory,elementTypeName,nullable).

@clintropolis
Copy link
Member

And I found if i want to change the rowsignature of multi-string value, it will have a huge change on the source code.

Ah yeah, don't do that 😅

Hmm, I guess i'm having trouble determining how this bug is hit if we can't replicate it in the tests. I guess this area of the code has seem a fair bit of refactor over the last few releases, so maybe it used to be possible? That said, it seems like it should be a harmless change. I'll have a closer look and see if I can think of a way to trigger it.

@chenyuzhi459
Copy link
Contributor Author

And I found if i want to change the rowsignature of multi-string value, it will have a huge change on the source code.

Ah yeah, don't do that

Hmm, I guess i'm having trouble determining how this bug is hit if we can't replicate it in the tests. I guess this area of the code has seem a fair bit of refactor over the last few releases, so maybe it used to be possible? That said, it seems like it should be a harmless change. I'll have a closer look and see if I can think of a way to trigger it.

I have checked the latest code in master branch, and it's still return dataType' directly in methond Calcites.createSqlArrayTypeWithNullability`, thus it's still a potential problem there i think. And it'll be my pleasure if could provide more help for your check and trigger work.

@clintropolis
Copy link
Member

I have checked the latest code in master branch, and it's still return dataType' directly in methond Calcites.createSqlArrayTypeWithNullability`, thus it's still a potential problem there i think. And it'll be my pleasure if could provide more help for your check and trigger work.

Oops, forgot to look into this, sorry! I will try to have a look this weekend.

@stale
Copy link

stale bot commented Apr 28, 2022

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Apr 28, 2022
@github-actions
Copy link

github-actions bot commented Oct 8, 2023

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Oct 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect results (including nulls) when querying string column with col <> '' and col is not null
2 participants