fix sql results mixed array and scalar values #16105

Merged


clintropolis
Member

Description

Fixes an issue that can occur when using schema discovery on columns with a mix of array and scalar values and querying them with scan queries: SQL result coercion does not expect scalar values to be present, so it fails while the results are being returned. This issue does not occur when grouping, since grouping homogenizes the results to the least restrictive type.

An easy way to repro this is to use the 'kttm' nested example data with full schema discovery, where the language column has a mix of scalar and array values, and then run a select * style query. The error shows up in the broker error logs like:

org.apache.druid.error.DruidException: Cannot coerce field [language] from type [java.lang.String] to type [ARRAY]
	at org.apache.druid.error.DruidException$DruidExceptionBuilder.build(DruidException.java:460) ~[classes/:?]
	at org.apache.druid.sql.calcite.run.SqlResults.cannotCoerce(SqlResults.java:243) ~[classes/:?]

and presents in the web console as something like this:

Query results were truncated midstream! This may indicate a server-side error or a client-side issue. Try re-running your query, or using a lower limit or a longer timeout.

which... is not a great error message for this scenario, I suppose, but it is indeed what happens when we explode while coercing results on the fly like this. I'm unsure what a better way to handle this case would be. Since it isn't really something the user can fix by retrying, we might want to figure out a better approach, but that is maybe out of scope for this PR, since we are returning a 200 response code and just happen to fail midway through returning the results.

This PR doesn't change any behaviors; it just makes things more permissive, allowing some queries with mixed array schemas to succeed where they previously might have failed, by wrapping scalar values into a single-element array (which is consistent with the native layer behavior and is why none of the other query types have this issue).
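
To make the wrapping idea concrete, here is a minimal standalone sketch of the technique. It is not the actual `SqlResults` code: the class name `MixedArrayCoercionSketch`, the method `coerceToArray`, and the choice to pass `null` through unchanged are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; hypothetical names, not Druid's SqlResults implementation.
final class MixedArrayCoercionSketch
{
  private MixedArrayCoercionSketch() {}

  /**
   * Coerces a result value to a list for a column whose SQL type is ARRAY.
   * Lists and object arrays are copied as-is; any other non-null value is
   * treated as a scalar and wrapped into a single-element list instead of
   * being rejected.
   */
  static List<Object> coerceToArray(Object value)
  {
    if (value == null) {
      // Assumption for this sketch: null stays null rather than becoming [null].
      return null;
    }
    if (value instanceof List) {
      return new ArrayList<>((List<?>) value);
    }
    if (value instanceof Object[]) {
      return new ArrayList<>(Arrays.asList((Object[]) value));
    }
    // Scalar value in an ARRAY-typed column: wrap it rather than fail.
    List<Object> wrapped = new ArrayList<>(1);
    wrapped.add(value);
    return wrapped;
  }
}
```

With this shape, a row where `language` is the string `"en"` and a row where it is the list `["en", "fr"]` both come back as lists, so a scan over the mixed column does not need to abort partway through the response.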


This PR has:

  • been self-reviewed.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

@gianm gianm merged commit 795e342 into apache:master Mar 13, 2024
83 checks passed
@clintropolis clintropolis deleted the fix-sql-array-result-mixed-array-scalar branch March 13, 2024 06:50
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024