Fix ParquetFooter parsing of legacy array-of-struct format #1475

Merged: 1 commit merged on Oct 4, 2023


@jlowe (Member) commented Oct 4, 2023

While testing cudf and RAPIDS Accelerator fixes that allow the parquet-testing file repeated_no_annotation.parquet to be parsed properly, I discovered that ParquetFooter had problems parsing this file's Parquet schema. The code comments that enumerate the various ways an array can appear in a Parquet schema state that a list can directly contain more than one child, but the code did not handle that case.
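For context, the Parquet format specification's backward-compatibility rules for lists say that when the repeated field inside a LIST-annotated group is itself a group with more than one field, that repeated group *is* the list element (a struct), not a wrapper around a single element field. Below is a minimal sketch of those rules, assuming a hypothetical schema model: `Node` and `list_element` are illustrative names, not the native ParquetFooter code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical, minimal Parquet schema node used only for this sketch."""
    name: str
    repetition: str  # "required", "optional" or "repeated"
    children: List["Node"] = field(default_factory=list)  # empty => primitive

    @property
    def is_group(self) -> bool:
        return len(self.children) > 0


def list_element(list_group: Node) -> Node:
    """Return the node that describes one element of a LIST-annotated group."""
    if len(list_group.children) != 1:
        raise ValueError("a LIST-annotated group must have exactly one child")
    repeated = list_group.children[0]
    if repeated.repetition != "repeated":
        raise ValueError("the child of a LIST-annotated group must be repeated")
    # Legacy 2-level list of primitives: the repeated field is the element.
    if not repeated.is_group:
        return repeated
    # Legacy array-of-struct: a repeated group with more than one field is
    # itself the element (a struct), not a wrapper around the element.
    if len(repeated.children) > 1:
        return repeated
    # Legacy single-field wrappers named "array" or "<list name>_tuple"
    # are also treated as the element.
    if repeated.name in ("array", list_group.name + "_tuple"):
        return repeated
    # Standard 3-level layout: the repeated group's only field is the element.
    return repeated.children[0]


# Usage: an optional list of structs in the legacy 2-level form.
phones = Node("phones", "optional", [
    Node("phone", "repeated", [
        Node("number", "required"),
        Node("kind", "optional"),
    ]),
])
print(list_element(phones).name)  # phone
```

The multi-field check is the case the PR description calls out: code that always descends into the repeated group's first child would treat `number` as the element and lose `kind`.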

This PR updates the native Parquet parsing logic to handle a list that directly contains more than one child. I verified that this change helps fix the RAPIDS Accelerator's parsing of the problematic repeated_no_annotation.parquet file.
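As its name suggests, repeated_no_annotation.parquet contains a repeated field with no LIST annotation. The Parquet format specification says such an unannotated repeated field should be read as a required list of required elements whose element type is the field's own type, so a repeated group with several fields becomes a list of structs. A minimal sketch under stated assumptions: `Field`, `value_type` and `field_type` are hypothetical names, LIST/MAP annotations and nullability are not modeled, and the example schema is only modeled on that file's shape (field names assumed).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Field:
    """Hypothetical schema node for this sketch; LIST/MAP annotations are not modeled."""
    name: str
    repetition: str                  # "required", "optional" or "repeated"
    primitive: Optional[str] = None  # primitive type name, or None for a group
    children: List["Field"] = field(default_factory=list)


def value_type(f: Field) -> str:
    """Type of the field's values, ignoring the field's own repetition."""
    if f.primitive is not None:
        return f.primitive
    # Any group, including a repeated group with several fields, is a struct.
    return "struct<" + ",".join(f"{c.name}:{field_type(c)}" for c in f.children) + ">"


def field_type(f: Field) -> str:
    """Type of a field, treating an unannotated repeated field as a list."""
    if f.repetition == "repeated":
        # Unannotated repeated field: a list whose element type is the
        # field's own type (a struct when the field is a multi-field group).
        return f"list<{value_type(f)}>"
    return value_type(f)


# Illustrative schema modeled on repeated_no_annotation.parquet (names assumed).
user = Field("user", "required", children=[
    Field("id", "required", "int32"),
    Field("phoneNumbers", "optional", children=[
        Field("phone", "repeated", children=[
            Field("number", "required", "int64"),
            Field("kind", "optional", "string"),
        ]),
    ]),
])
print(field_type(user))
# struct<id:int32,phoneNumbers:struct<phone:list<struct<number:int64,kind:string>>>>
```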

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
@jlowe requested a review from @revans2 on October 4, 2023, 17:12
@jlowe self-assigned this on Oct 4, 2023
@jlowe (Member, Author) commented Oct 4, 2023

build

@jlowe merged commit 05326c9 into NVIDIA:branch-23.12 on Oct 4, 2023
3 checks passed
@jlowe deleted the jni-footer-array-fix branch on October 4, 2023, 19:22