
reading spark log json file leads to PanicException #6703

Closed
2 tasks done
romanovacca opened this issue Feb 6, 2023 · 2 comments · Fixed by #6785
Labels
bug Something isn't working rust Related to Rust Polars

Comments

@romanovacca
Contributor

romanovacca commented Feb 6, 2023

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

I'm trying to read the Spark event log files that are generated automatically when a Spark job runs. The log file consists of different 'events', and for some reason a PanicException occurs when I try to read the "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart" event provided in the example.

[Screenshot 2023-02-06 at 20:56:17]

Attached is the data that triggers the error:
test-spark.zip

Note that other parts of the file are read fine by the parser; I left them out to make debugging easier.

Reproducible example

import polars as pl
pl.read_ndjson("test-spark")

Expected behavior

I would expect this to work and return all the values; if part of the JSON is empty, it should be returned as empty/null.

Even if the current behavior is intended, I would at least expect not to get a panic exception.
To me it sounds like at some point in the processing an empty vector is being indexed, so adding a check for this case to prevent the panic would be a better solution.
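The check suggested above can be illustrated with a minimal Python sketch of JSON type inference that guards against empty lists. It assumes a simplified inference step that looks at a list's first element to decide the inner type; `infer_type` and the type names it returns are hypothetical, and this is not Polars' API or its Rust code in buffer.rs.

```python
from typing import Any


def infer_type(value: Any) -> str:
    """Return a rough type name for a parsed JSON value (hypothetical helper)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    if isinstance(value, dict):
        fields = ", ".join(f"{k}: {infer_type(v)}" for k, v in value.items())
        return "struct{" + fields + "}"
    if isinstance(value, list):
        # Guard: an empty list has no first element to infer from, so fall
        # back to a null inner type instead of indexing value[0] unconditionally.
        if not value:
            return "list[null]"
        return f"list[{infer_type(value[0])}]"
    raise TypeError(f"unsupported JSON value: {value!r}")


print(infer_type({"foo": {"bar": []}}))
# struct{foo: struct{bar: list[null]}}
```

Without the `if not value` guard, `value[0]` on an empty list raises an IndexError, which is the Python analogue of the out-of-bounds panic reported here.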

Installed versions

'0.16.1'

@romanovacca romanovacca added bug Something isn't working rust Related to Rust Polars labels Feb 6, 2023
@cmdlineluser
Contributor

cmdlineluser commented Feb 10, 2023

It appears to be caused by the empty list in "children": [] on line 26.

This appears to be an MRE (minimal reproducible example):

import io
import polars as pl

pl.read_ndjson(io.StringIO("""{"foo": {"bar": []}}"""))
PanicException: index out of bounds: the len is 0 but the index is 0
thread '<unnamed>' panicked at 'index out of bounds: the len is 0 but the index is 0', ../polars-io/src/ndjson_core/buffer.rs:171:41

https://github.com/pola-rs/polars/blob/master/polars/polars-io/src/ndjson_core/buffer.rs#L170
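To find fields like "children": [] that trigger the panic in a larger file, a small stdlib-only Python helper along these lines could scan the NDJSON before passing it to pl.read_ndjson. This is a sketch: `empty_list_paths` and `scan_ndjson` are hypothetical names, and the `$.foo.bar` path notation is just a JSONPath-like convention chosen for readability.

```python
import json
from typing import Any, Iterator, List, Tuple


def empty_list_paths(value: Any, path: str = "$") -> Iterator[str]:
    """Yield a JSONPath-like string for every empty list nested in value."""
    if isinstance(value, list):
        if not value:
            yield path
        for i, item in enumerate(value):
            yield from empty_list_paths(item, f"{path}[{i}]")
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from empty_list_paths(item, f"{path}.{key}")


def scan_ndjson(text: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, path) pairs for empty lists in NDJSON text."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.strip():  # skip blank lines
            record = json.loads(line)
            hits.extend((lineno, p) for p in empty_list_paths(record))
    return hits


print(scan_ndjson('{"foo": {"bar": []}}\n{"foo": {"bar": [1]}}'))
# [(1, '$.foo.bar')]
```

For a file on disk, the same call works on its contents, e.g. scan_ndjson(open("test-spark").read()).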

@ritchie46
Member

Thanks for the breakdown @cmdlineluser 👍
