Fix ParquetFooter parsing of legacy array-of-struct format #1475
While testing cudf and RAPIDS Accelerator fixes that allow proper parsing of the parquet-testing file repeated_no_annotation.parquet, I discovered that ParquetFooter had issues parsing this file's Parquet schema. The code comments enumerating the various ways an array can appear state that a list can contain more than one child directly, but the code did not handle that case.
This PR updates the native Parquet parsing logic to handle the case where a list contains more than one child. I verified that this helps fix the RAPIDS Accelerator parsing of the problematic repeated_no_annotation.parquet file.
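
For reviewers unfamiliar with the legacy layouts, here is a rough sketch of how the element of a repeated field can be resolved, following the backward-compatibility rules for lists in the Parquet format specification. It is not the ParquetFooter code from this PR: it is written in Python for brevity, and `SchemaNode` and `list_element` are hypothetical names used only for illustration. The example schema at the end (a repeated group `phone` with children `number` and `kind` under `phoneNumbers`) is my reading of repeated_no_annotation.parquet and should be treated as an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchemaNode:
    """Hypothetical stand-in for one Parquet schema element."""
    name: str
    repetition: str  # "required", "optional", or "repeated"
    children: List["SchemaNode"] = field(default_factory=list)  # empty for primitives


def list_element(list_name: str, repeated: SchemaNode) -> SchemaNode:
    """Return the node describing one element of the list.

    `list_name` is the name of the group that encloses the repeated field.
    """
    if repeated.repetition != "repeated":
        raise ValueError(f"{repeated.name!r} is not a repeated field")

    # Rule 1: a repeated primitive is itself the element.
    if not repeated.children:
        return repeated

    # Rule 2: a repeated group with more than one child is the element,
    # i.e. the list holds structs (the array-of-struct case).
    if len(repeated.children) > 1:
        return repeated

    # Rule 3: a single-child group named "array" or "<list_name>_tuple"
    # is also the element itself (legacy writer conventions).
    if repeated.name == "array" or repeated.name == list_name + "_tuple":
        return repeated

    # Rule 4: otherwise this is the standard three-level layout and the
    # single child is the element.
    return repeated.children[0]


# Assumed shape of the repeated field in repeated_no_annotation.parquet.
phone = SchemaNode("phone", "repeated", [
    SchemaNode("number", "required"),
    SchemaNode("kind", "optional"),
])

# Rule 2 applies: the element is the `phone` struct, not one of its children.
element = list_element("phoneNumbers", phone)
```

The multi-child branch (rule 2) corresponds to the case described above: the repeated group is treated as a struct element rather than as a wrapper around a single child.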