Problem with array, UNNEST and JSON_PARSE #16543

Closed
fabricebaranski opened this issue Jun 4, 2024 · 2 comments · Fixed by #16551

Comments

@fabricebaranski

Affected Version

Druid version 29.0.1

Description

Step 1: Create a datasource with the following query, with the array ingest mode (arrayIngestMode) set to array:

INSERT INTO "jsondatasource"
SELECT ARRAY['{"key": "value1"}'] as "jsonarray"
PARTITIONED BY ALL

Step 2: Query the datasource:

SELECT PARSE_JSON("json") FROM "jsondatasource" CROSS JOIN UNNEST("jsonarray") AS "json"

I get the following error:

Error: RUNTIME_FAILURE (OPERATOR)

Selector must have a dictionary

org.apache.druid.java.util.common.ISE

@clintropolis
Member

Thanks for the report. This is a bug caused by an odd interaction between the underlying array column and the unnest functionality when a string expression is applied to the unnested value. The problem only occurs with string functions that take a single input column; internally these try to use a caching expression selector, which requires a dictionary-encoded column.

The underlying array column is actually dictionary encoded, but after unnesting the dictionary isn't directly usable anymore (currently, at least). The column metadata is still marked as dictionary encoded, however, so expression planning assumes it is dealing with a regular string column and picks the caching selector, which results in this error.

In the future I think unnest will be improved to take advantage of the dictionary-encoded nature of the underlying array column, but for now I will probably make a simpler fix that just avoids that code path. (For historical reasons, dictionary-encoded selectors are pretty tightly coupled with string types at the moment, so it's a bit of effort to change that.)
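
The failure mode described above can be illustrated with a small toy model. This is only a sketch, not Druid code: the class `UnnestDictionarySketch`, the `Selector` interface, and the `unnestedSelector` and `evaluate` methods are all hypothetical names invented for the example. Only the error message is taken from the report. The point it shows is that a planner which trusts a stale "dictionary encoded" flag ends up on a code path the selector cannot support.

```java
// Toy model only: none of these types are Druid classes.
final class UnnestDictionarySketch {

    /** Hypothetical stand-in for a selector over string values. */
    interface Selector {
        String currentValue();

        /** Whether values can be looked up by dictionary id. */
        boolean hasDictionary();
    }

    /** A selector like the one produced after unnesting: values only, no dictionary. */
    static Selector unnestedSelector(String value) {
        return new Selector() {
            @Override
            public String currentValue() {
                return value;
            }

            @Override
            public boolean hasDictionary() {
                return false;
            }
        };
    }

    /** Mimics a planner that trusts the reported metadata flag when picking a strategy. */
    static String evaluate(boolean reportedDictionaryEncoded, Selector selector) {
        if (reportedDictionaryEncoded) {
            // Caching path: results would be memoized per dictionary id,
            // so the selector must actually expose a dictionary.
            if (!selector.hasDictionary()) {
                throw new IllegalStateException("Selector must have a dictionary");
            }
            return "cached:" + selector.currentValue();
        }
        // Plain path: evaluate the expression for every row.
        return "plain:" + selector.currentValue();
    }

    public static void main(String[] args) {
        Selector selector = unnestedSelector("{\"key\": \"value1\"}");
        try {
            // Stale pre-unnest metadata claims the column is dictionary encoded.
            evaluate(true, selector);
        } catch (IllegalStateException e) {
            System.out.println("stale metadata: " + e.getMessage());
        }
        // Accurate metadata routes the same selector to the plain path.
        System.out.println("accurate metadata: " + evaluate(false, selector));
    }
}
```

In this model the selector itself is fine; the only thing that changes between the failing and the working call is the flag the planner is given.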

@fabricebaranski
Author

Thanks Clint

gianm added a commit to gianm/druid that referenced this issue Jun 5, 2024
UnnestStorageAdapter and its cursors did not return capabilities correctly
for the output column. This patch fixes two problems:

1) UnnestStorageAdapter returned the capabilities of the unnest virtual
   column prior to unnesting. It should return the post-unnest capabilities.

2) UnnestColumnValueSelectorCursor passed through isDictionaryEncoded from
   the unnest virtual column. This is incorrect, because the dimension selector
   created by this class never has a dictionary. This is the cause of apache#16543.
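
As a rough illustration of the two points in this commit message, here is a minimal sketch of deriving output-column metadata after unnesting. It is an assumption-laden toy, not the actual patch: `PostUnnestCapabilitiesSketch`, `Caps`, and `postUnnest` are hypothetical names, and the string-based type notation (`ARRAY<STRING>`) is used only to keep the example self-contained.

```java
// Toy model only: Caps is a simplified stand-in, not Druid's column capabilities type.
final class PostUnnestCapabilitiesSketch {

    /** Hypothetical, simplified column metadata. */
    static final class Caps {
        final String type;
        final boolean dictionaryEncoded;

        Caps(String type, boolean dictionaryEncoded) {
            this.type = type;
            this.dictionaryEncoded = dictionaryEncoded;
        }
    }

    /** Derives metadata for the unnest output column from the input column's metadata. */
    static Caps postUnnest(Caps input) {
        final String prefix = "ARRAY<";
        // Point 1: report the post-unnest type. Unnesting ARRAY<T> yields T;
        // any other type passes through unchanged in this model.
        String outputType = input.type.startsWith(prefix) && input.type.endsWith(">")
            ? input.type.substring(prefix.length(), input.type.length() - 1)
            : input.type;
        // Point 2: do not pass the input's dictionary flag through, because the
        // selector over unnested values has no dictionary in this model.
        return new Caps(outputType, false);
    }

    public static void main(String[] args) {
        Caps before = new Caps("ARRAY<STRING>", true);
        Caps after = postUnnest(before);
        System.out.println(after.type + " dictionaryEncoded=" + after.dictionaryEncoded);
    }
}
```

The design choice mirrored here is that the output metadata is computed for the unnested column rather than copied from the input, so downstream planning never sees a dictionary claim that the selector cannot honor.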
gianm added a commit that referenced this issue Jun 5, 2024