Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(rust, polars): use iterator instead of loop polars_io::csv::parser::skip_condition #5157

Merged
merged 1 commit into from
Dec 27, 2022
Merged

Conversation

dannyvankooten
Copy link
Contributor

This guarantees the removal of bounds checking inside the loop, which allows for a very minor performance gain.

Benchmark was run using Criterion using 1M rows.

Before (using while with indexing):

parse csv               time:   [391.85 ms 394.01 ms 396.03 ms]

After (using iter().position(...)

parse csv               time:   [387.96 ms 389.62 ms 391.05 ms]
                       change: [-1.7745% -1.1159% -0.4741%] (p = 0.00 < 0.05)

Sidenote: I re-used behavior from the old implementation but there is actually a potential OOB if the predicate does not satisfy any of the remaining bytes in the input buffer. Since input.len() is returned in those cases, the consecutive line will trigger an OOB. (This was also the case in the old implementation.) I am unsure how to fix, since a pointer has to be returned...

@ritchie46
Copy link
Member

Nice one. Just shows how often a minor code snippet is executed. :)

Sidenote: I re-used behavior from the old implementation but there is actually a potential OOB if the predicate does not satisfy any of the remaining bytes in the input buffer. Since input.len() is returned in those cases, the consecutive line will trigger an OOB. (This was also the case in the old implementation.) I am unsure how to fix, since a pointer has to be returned...

Right.. We could return an Option. by calling input.get(read..). We already do a bounds check as we do a normal indexing [] operation, so we might as well use a better OOB path than a panic.

@ritchie46 ritchie46 changed the title in polars_io::csv::parser::skip_condition, use iterator instead of loop (very minor perf gain) perf(rust, polars): use iterator instead of loop polars_io::csv::parser::skip_condition Dec 27, 2022
@ritchie46 ritchie46 merged commit 52295bb into pola-rs:master Dec 27, 2022
@github-actions github-actions bot added performance Performance issues or improvements rust Related to Rust Polars labels Dec 27, 2022
zundertj pushed a commit to zundertj/polars that referenced this pull request Jan 7, 2023
…ser::skip_condition` (pola-rs#5157)

Co-authored-by: Danny van Kooten <dannyvankooten@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance issues or improvements rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants