fix unnest bugs #16690

@clintropolis clintropolis commented Jul 4, 2024

This PR fixes a regression caused by #16551, which modified the UnnestStorageAdapter to return the output capabilities of the unnest column instead of the pre-unnest capabilities. That change was correct, but it omitted setting setDictionaryValuesUnique, which unnest also uses to check whether it can use a dimension cursor.

Though admittedly totally contrived, if the output is used again in an UNNEST operator, the behavior differs from what it was prior to that change: losing areDictionaryValuesUnique means the column value selector cursor is used instead of the dimension selector based cursor. The two cursors handle null values differently because the dimension cursor attempts to be compatible with the implicit unnest used by group-by and topN queries on MVDs.
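To make the regression concrete, here is a minimal toy sketch of the cursor choice. All names in it (`UnnestCursorChoiceSketch`, `ToyCapabilities`, `chooseCursor`) are hypothetical and are not Druid classes, and the rule that the column must also be dictionary encoded is an assumption made for illustration. It only shows how dropping the uniqueness flag flips the choice.

```java
// Toy model only: hypothetical names, not the actual Druid implementation.
final class UnnestCursorChoiceSketch {
    enum CursorKind { DIMENSION_SELECTOR, COLUMN_VALUE_SELECTOR }

    // Simplified stand-in for the column capabilities that unnest inspects.
    static final class ToyCapabilities {
        final boolean dictionaryEncoded;
        final boolean dictionaryValuesUnique;

        ToyCapabilities(boolean dictionaryEncoded, boolean dictionaryValuesUnique) {
            this.dictionaryEncoded = dictionaryEncoded;
            this.dictionaryValuesUnique = dictionaryValuesUnique;
        }
    }

    // Assumed rule: the dimension cursor is only chosen when dictionary ids
    // map one-to-one to values; otherwise fall back to the value selector cursor.
    static CursorKind chooseCursor(ToyCapabilities caps) {
        if (caps.dictionaryEncoded && caps.dictionaryValuesUnique) {
            return CursorKind.DIMENSION_SELECTOR;
        }
        return CursorKind.COLUMN_VALUE_SELECTOR;
    }

    public static void main(String[] args) {
        // Capabilities that still carry the uniqueness flag.
        System.out.println(chooseCursor(new ToyCapabilities(true, true)));
        // Same column, but the uniqueness flag was not set on the output capabilities.
        System.out.println(chooseCursor(new ToyCapabilities(true, false)));
    }
}
```

In this toy model the second call falls back to the column value selector cursor, which is the behavior change described above.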

While writing tests for this, I found another bug that could happen for any MVD rows of size 0 against any segment where the null value is not dictionary id 0 (such as realtime segments or segments without null values in the column). I fixed this by reporting 0 as the size of the unnested row if the underlying row has size 0, which aligns the behavior with the implicit unnest done by group-by and topN queries (even though they are not consistent with each other, see #5897).
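The empty-row problem can be illustrated with a small toy model. The class and method names below are hypothetical and the dictionary is just a string array, so this is a sketch of the idea rather than the code in this PR. The dictionary deliberately has no null entry, so id 0 is a real value.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not Druid classes.
final class UnnestEmptyRowSketch {
    // Problematic variant: an empty row is reported as one value with dictionary id 0,
    // which is only correct when id 0 happens to be the null value.
    static List<String> unnestAssumingIdZeroIsNull(String[] dictionary, int[][] rows) {
        List<String> out = new ArrayList<>();
        for (int[] row : rows) {
            if (row.length == 0) {
                out.add(dictionary[0]);
                continue;
            }
            for (int id : row) {
                out.add(dictionary[id]);
            }
        }
        return out;
    }

    // Variant matching the fix described above: an empty row has unnested size 0.
    static List<String> unnestReportingZeroSize(String[] dictionary, int[][] rows) {
        List<String> out = new ArrayList<>();
        for (int[] row : rows) {
            for (int id : row) {
                out.add(dictionary[id]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] dictionary = {"a", "b"};      // no null entry, so id 0 is "a"
        int[][] rows = {{0, 1}, {}, {1}};      // the middle row is empty
        System.out.println(unnestAssumingIdZeroIsNull(dictionary, rows)); // [a, b, a, b]
        System.out.println(unnestReportingZeroSize(dictionary, rows));    // [a, b, b]
    }
}
```

The first variant invents an extra "a" for the empty row, while the second emits nothing for it.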

Finally, there is another bug with MSQ frames: the writers are hard-coded with multi-value true while the window 'rows and columns' readers are hard-coded to use false, and this mismatch results in some errors. It looks like the window code doesn't really support multi-value strings, so maybe the solution there is to just explode? I didn't do that for now; in this case I just made the field writer detect whether the input is multi-value so it isn't hard-coded to true (multi-value writers handle 0 length rows differently than single-valued ones).
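As a rough illustration of why the hard-coded flag matters, the toy writer below encodes a row differently depending on the mode. The names (`FieldWriterModeSketch`, `write`, `writeHardCoded`, `writeDetected`) and the string encoding are invented for this sketch and do not reflect the actual MSQ frame format.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: hypothetical names and encoding, not the MSQ frame format.
final class FieldWriterModeSketch {
    // Multi-value mode keeps the list shape; single-value mode writes one value or null.
    static String write(List<String> row, boolean multiValue) {
        if (multiValue) {
            return "array" + row;
        }
        if (row.size() > 1) {
            throw new IllegalArgumentException("single-value mode got " + row.size() + " values");
        }
        return row.isEmpty() ? "null" : "value(" + row.get(0) + ")";
    }

    // Old behavior in this sketch: always write in multi-value mode.
    static String writeHardCoded(List<String> row) {
        return write(row, true);
    }

    // New behavior in this sketch: the mode follows what the input column reports.
    static String writeDetected(List<String> row, boolean inputIsMultiValue) {
        return write(row, inputIsMultiValue);
    }

    public static void main(String[] args) {
        List<String> empty = Collections.emptyList();
        System.out.println(writeHardCoded(empty));                    // array[]
        System.out.println(writeDetected(empty, false));              // null
        System.out.println(writeDetected(Arrays.asList("x"), false)); // value(x)
    }
}
```

A reader that expects single-value encoding would not know what to do with the empty array written by the hard-coded variant, which is the kind of mismatch described above.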

…umns dictionary uniqueness when allowing dimension selector cursor, fixes a bug with unnest on realtime segments with empty rows incorrectly specifying index 0 as the row dictionary value
@github-actions github-actions bot added Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Jul 4, 2024
@clintropolis
Member Author

closing in favor of #16723 for now, will revisit this later

Labels
Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Querying Area - Segment Format and Ser/De Bug WIP