fix unnest bugs #16690

clintropolis · 2024-07-04T00:59:11Z

This PR fixes a regression caused by #16551, which modified the UnnestStorageAdapter to return the output capabilities of the unnest column instead of the pre-unnest capabilities. That change was totally cool and correct, but it left out calling setDictionaryValuesUnique, and unnest also uses that flag to check whether it can use a dimension cursor.

Though admittedly totally contrived, if the unnested column is used again in an UNNEST operator, this results in different behavior than before that change: with areDictionaryValuesUnique lost, the column value selector cursor is used instead of the dimension selector based cursor. The two cursors handle null values differently, because the dimension cursor attempts to be compatible with the implicit unnest used by group-by and topN queries on MVDs.
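To make the regression concrete, here is a minimal sketch of how the missing flag flips the cursor choice. Everything in it is a hypothetical stand-in (`Caps`, `CursorKind`, `chooseCursor`, and the two `outputCaps...` helpers are not Druid classes or methods), and the check is assumed to be simply "dictionary encoded and dictionary values unique"; the real logic in UnnestStorageAdapter may consider more than that.

```java
// Hypothetical, simplified stand-ins for illustration only; not Druid's actual classes.
final class UnnestCursorChoiceSketch {
    enum CursorKind { DIMENSION_SELECTOR, COLUMN_VALUE_SELECTOR }

    // Assumed minimal view of column capabilities: just the two flags discussed above.
    static final class Caps {
        final boolean dictionaryEncoded;
        final boolean dictionaryValuesUnique;

        Caps(boolean dictionaryEncoded, boolean dictionaryValuesUnique) {
            this.dictionaryEncoded = dictionaryEncoded;
            this.dictionaryValuesUnique = dictionaryValuesUnique;
        }
    }

    // Regressed behavior: the output capabilities never set the uniqueness flag.
    static Caps outputCapsBeforeFix(Caps underlying) {
        return new Caps(underlying.dictionaryEncoded, false);
    }

    // Fixed behavior: the uniqueness flag of the underlying column is carried over.
    static Caps outputCapsAfterFix(Caps underlying) {
        return new Caps(underlying.dictionaryEncoded, underlying.dictionaryValuesUnique);
    }

    // Assumed form of the check: the dimension cursor needs both flags to be true.
    static CursorKind chooseCursor(Caps caps) {
        return caps.dictionaryEncoded && caps.dictionaryValuesUnique
            ? CursorKind.DIMENSION_SELECTOR
            : CursorKind.COLUMN_VALUE_SELECTOR;
    }
}
```

In this sketch, a second UNNEST over the output of the first sees the flag as false before the fix and falls back to the column value selector cursor, which is where the different null handling comes from.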

While writing tests for this, I found another bug that could happen for any MVD rows with 0 size against any segment where the null value was not dictionary id 0 (such as realtime segments or segments without null values in the column). I fixed this by reporting 0 as the size of the unnested row if the size of the underlying row is 0, which aligns behavior with the implicit unnest done by group-by and topN queries (even though they are not consistent with each other, see #5897).
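A rough sketch of why the empty-row case goes wrong when dictionary id 0 is not null. The class and method names are hypothetical and the dictionary is modeled as a plain array; the "before" method only emulates the behavior described above (an empty row reported as a single value with dictionary id 0), it is not the actual Druid code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; a row is modeled as an array of dictionary ids.
final class EmptyRowUnnestSketch {
    // Emulates the buggy behavior: an empty row is reported as one value with id 0.
    // That is only harmless when id 0 happens to be the null value.
    static List<String> unnestBeforeFix(int[] rowIds, String[] dictionary) {
        List<String> out = new ArrayList<>();
        if (rowIds.length == 0) {
            out.add(dictionary[0]);
            return out;
        }
        for (int id : rowIds) {
            out.add(dictionary[id]);
        }
        return out;
    }

    // Emulates the fix: the unnested row size is 0 when the underlying row size is 0,
    // so no dictionary lookup happens for an empty row.
    static List<String> unnestAfterFix(int[] rowIds, String[] dictionary) {
        List<String> out = new ArrayList<>();
        for (int id : rowIds) {
            out.add(dictionary[id]);
        }
        return out;
    }
}
```

With a dictionary that has no null entry, such as `{"a", "b"}`, the "before" variant turns an empty row into the value at id 0 (`"a"`), while the "after" variant produces no values for that row.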

Finally, there is another bug with MSQ frames: the writers are hard-coded with multi-value true while the window 'rows and columns' readers are hard-coded to use false, and this mismatch results in some errors. It looks like window stuff doesn't really support multi-value strings, so maybe the solution there is to just explode? But I didn't do that for now; in this case I just made the field writer detect whether the input is multi-value so it isn't hard-coded to true (multi-value writers handle 0 length rows differently than single-valued ones).

clintropolis · 2024-07-11T12:01:16Z

closing in favor of #16723 for now, will revisit this later

fixes a bug with unnest storage adapter not preserving underlying col…

89f858d

…umns dictionary uniqueness when allowing dimension selector cursor, fixes a bug with unnest on realtime segments with empty rows incorrectly specifying index 0 as the row dictionary value

clintropolis added Bug Area - Querying labels Jul 4, 2024

github-actions bot added the Area - Segment Format and Ser/De label Jul 4, 2024

clintropolis added 2 commits July 3, 2024 18:35

dont rollup to make test results easier to understand

8718182

fix one test

e6c2e25

github-actions bot added Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Jul 4, 2024

clintropolis added 7 commits July 8, 2024 13:57

Merge remote-tracking branch 'upstream/master' into fix-unnest-adapte…

4d7c361

…r-capabilities-more

only write multi-value StringFieldWriter if data is detected as multi…

ce64a31

…-value

less aggro multi-value

f4fe598

dont plan array_to_mv into a postagg

f411e7b

fix complaining bot

1e36917

fix test

4b31d3b

better test fix

cfaedc8

clintropolis added the WIP label Jul 11, 2024

clintropolis mentioned this pull request Jul 11, 2024

fix unnest bugs #16723

Merged

6 tasks

clintropolis closed this Jul 11, 2024
