[BUG] Parquet unsigned int scan test failure #7213
Comments
I can make this reliably fail on my desktop by applying the following patch:
and running:
I tried running this with compute-sanitizer, but it did not find any errors.
Tracked this down to some bad offsets for a LIST column being generated from the chunked Parquet reader. @nvdbaranec was able to reproduce the problem in pure C++ code for libcudf.
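For readers unfamiliar with what "bad offsets" means for a LIST column, here is a minimal sketch in Python. It is not libcudf code; the function name `find_offset_problems` and its checks are assumptions for illustration. It relies only on the general property of Arrow-style list columns that offsets are non-negative, non-decreasing, and never point past the end of the child data.

```python
from __future__ import annotations

from typing import Sequence


def find_offset_problems(offsets: Sequence[int], child_size: int) -> list[str]:
    """Return human-readable problems found in a list column's offsets.

    Hypothetical helper for illustration only; it is not part of libcudf
    or the spark-rapids plugin.
    """
    if len(offsets) == 0:
        return ["offsets must contain at least one entry"]

    problems: list[str] = []
    if offsets[0] < 0:
        problems.append(f"first offset {offsets[0]} is negative")

    # Row i spans child elements [offsets[i], offsets[i + 1]), so the
    # sequence can never decrease.
    for i in range(1, len(offsets)):
        if offsets[i] < offsets[i - 1]:
            problems.append(
                f"offset {i} ({offsets[i]}) is smaller than offset {i - 1} ({offsets[i - 1]})"
            )

    # The last offset must not point past the end of the child column.
    if offsets[-1] > child_size:
        problems.append(
            f"last offset {offsets[-1]} exceeds child size {child_size}"
        )
    return problems


# Two rows of two elements each over a child column of four values: valid.
valid = find_offset_problems([0, 2, 4], child_size=4)
# The last offset claims six child values where only four exist: invalid.
invalid = find_offset_problems([0, 2, 6], child_size=4)
```

A check like this only reports that offsets are inconsistent; it does not say which upstream step produced them.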
Triaged this. It turns out this is another case of a malformed (but plausible in the wild) Parquet file. Essentially, we have a table with 2 rows in it. However, 3 of the columns (members of a struct) contain 4 values, which we translate as 4 rows. This causes the figure-out-chunks code to blow up quietly. We had a similar issue a while back that got fixed in the reader itself, but it looks like it has reappeared in this new code path. The filename, nested-unsigned.parquet, jogged my memory too: it turns out this is the same file that caused the earlier issue.
…2360)
Fixes: NVIDIA/spark-rapids#7213 NVIDIA/spark-rapids#7228
This adds code to detect a subset of possible malformed parquet page data. Specifically: where the input file contains N rows, but the page data for some (non-list) columns contains a number of values != N. This is a very lightweight check.
There is an associated PR for the spark plugin that should be merged immediately after this one (otherwise builds will fail) so I'm adding the Do Not Merge tag.
Authors:
- https://github.com/nvdbaranec
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Vyas Ramasubramani (https://github.com/vyasr)
- Mike Wilson (https://github.com/hyperbolic2346)
URL: #12360
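The check described in the commit message can be illustrated with a small Python sketch. This is not the libcudf implementation. The function name, the dictionary layout, and the column names are all hypothetical; the only thing taken from the thread is the rule itself: a non-list column whose page data holds a number of values different from the file's row count N is treated as malformed.

```python
from __future__ import annotations


def find_row_count_mismatches(
    num_rows: int, values_per_column: dict[str, int]
) -> dict[str, int]:
    """Return the non-list columns whose value count differs from num_rows.

    Hypothetical helper: `values_per_column` maps a column name to the
    number of values found in its page data. List columns are assumed to
    have been excluded by the caller, because they can legitimately hold
    more values than rows.
    """
    return {
        name: count
        for name, count in values_per_column.items()
        if count != num_rows
    }


# Mirrors the shape of the case described in the thread: the file reports
# 2 rows, but three struct members carry 4 values. Column names are made up.
mismatches = find_row_count_mismatches(
    num_rows=2,
    values_per_column={"id": 2, "s.a": 4, "s.b": 4, "s.c": 4},
)
if mismatches:
    print(f"malformed page data, expected 2 values per column: {mismatches}")
```

Because it only compares counts, a check of this shape stays cheap, which fits the "very lightweight" description, and it catches only the subset of malformed files where the counts disagree.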
One of the tests in ParquetScanSuite has been failing in premerge and nightly builds of 23.02: