perf: Introduce MemReader to file buffer in Parquet reader #17712

Merged (1 commit) on Jul 19, 2024

@coastalwhite (Collaborator) commented Jul 18, 2024

This PR introduces three structures:

  • MemReader: Abstraction over part of a Parquet file loaded into memory
  • MemReaderSlice: A slice of a MemReader. This should not be kept around outside the Parquet crate.
  • CowBuffer: A Cow that abstracts between a MemReaderSlice and a Vec<u8>.
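
To make the first two structures concrete, here is a minimal sketch of how a reader over in-memory bytes can hand out slices without copying. It is not the code from this PR: the `Arc<[u8]>` backing store, the field names, and the `read_slice` method are all assumptions made for illustration.

```rust
use std::ops::{Deref, Range};
use std::sync::Arc;

/// Hypothetical stand-in for `MemReader`: shared, immutable bytes plus a cursor.
#[derive(Clone)]
pub struct MemReader {
    data: Arc<[u8]>,
    position: usize,
}

/// Hypothetical stand-in for `MemReaderSlice`: a byte range that keeps the
/// backing allocation alive instead of copying out of it.
#[derive(Clone)]
pub struct MemReaderSlice {
    data: Arc<[u8]>,
    range: Range<usize>,
}

impl MemReader {
    /// Takes ownership of bytes that were already loaded into memory.
    pub fn new(bytes: Vec<u8>) -> Self {
        Self { data: Arc::from(bytes), position: 0 }
    }

    /// Hands out the next `n` bytes (clamped to the end of the buffer)
    /// and advances the cursor. Only the reference count is touched.
    pub fn read_slice(&mut self, n: usize) -> MemReaderSlice {
        let start = self.position;
        let end = start.saturating_add(n).min(self.data.len());
        self.position = end;
        MemReaderSlice { data: Arc::clone(&self.data), range: start..end }
    }
}

impl Deref for MemReaderSlice {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        &self.data[self.range.clone()]
    }
}

fn main() {
    let mut reader = MemReader::new(vec![1, 2, 3, 4, 5]);
    let head = reader.read_slice(2);
    let tail = reader.read_slice(10); // clamped to the remaining 3 bytes

    assert_eq!(&*head, &[1, 2]);
    assert_eq!(&*tail, &[3, 4, 5]);
    // The slice points into the reader's allocation; no bytes were copied.
    assert_eq!(head.as_ptr(), reader.data.as_ptr());
}
```

Because every slice holds a reference to the whole backing buffer, a long-lived slice would keep the entire loaded file region alive, which is one plausible reason for not letting `MemReaderSlice` escape the Parquet crate.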

Following this PR, we can avoid copying memory around in the Parquet crate. It also lets us guarantee that the memory is already in RAM and no longer needs to be loaded from disk, which leads to fewer page faults.
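
The copy avoidance described above can be sketched with a copy-on-write buffer. The following is an illustration only, not the `CowBuffer` from this PR: the `SharedSlice` alias stands in for `MemReaderSlice`, and the variant and method names are assumed.

```rust
use std::ops::Deref;
use std::sync::Arc;

/// Hypothetical stand-in for a `MemReaderSlice`: shared bytes, cheap to clone.
type SharedSlice = Arc<[u8]>;

/// Sketch of a `CowBuffer`-like type: either bytes still shared with the
/// reader, or a private `Vec<u8>` (for example, decompressed page data).
pub enum CowBuffer {
    Borrowed(SharedSlice),
    Owned(Vec<u8>),
}

impl CowBuffer {
    /// Returns a mutable buffer. The shared bytes are copied only the first
    /// time mutation is requested; read-only users never pay for a copy.
    pub fn to_mut(&mut self) -> &mut Vec<u8> {
        if let CowBuffer::Borrowed(shared) = &*self {
            let owned = shared.to_vec();
            *self = CowBuffer::Owned(owned);
        }
        match self {
            CowBuffer::Owned(v) => v,
            CowBuffer::Borrowed(_) => unreachable!("converted to Owned above"),
        }
    }

    pub fn is_borrowed(&self) -> bool {
        matches!(self, CowBuffer::Borrowed(_))
    }
}

impl Deref for CowBuffer {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        match self {
            CowBuffer::Borrowed(s) => &s[..],
            CowBuffer::Owned(v) => v.as_slice(),
        }
    }
}

fn main() {
    let file_bytes: SharedSlice = Arc::from(vec![0u8, 1, 2, 3]);
    let mut page = CowBuffer::Borrowed(Arc::clone(&file_bytes));

    // Read path: the page views the already-loaded bytes directly.
    assert!(page.is_borrowed());
    assert_eq!(page.as_ptr(), file_bytes.as_ptr());

    // Write path: the first mutation copies; the shared bytes stay intact.
    page.to_mut().push(4);
    assert!(!page.is_borrowed());
    assert_eq!(&*page, &[0, 1, 2, 3, 4]);
    assert_eq!(&*file_bytes, &[0, 1, 2, 3]);
}
```

In this sketch, uncompressed pages can stay in the borrowed state for their whole lifetime, while code that must produce new bytes can store them as the owned variant behind the same type.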

Later it might be useful to make the reads a bit more granular so that we don't need to load unnecessary data pages.

@github-actions github-actions bot added performance Performance issues or improvements python Related to Python Polars rust Related to Rust Polars labels Jul 18, 2024

codecov bot commented Jul 18, 2024

Codecov Report

Attention: Patch coverage is 67.93249% with 76 lines in your changes missing coverage. Please review.

Project coverage is 80.38%. Comparing base (ebba58d) to head (dec9f03).
Report is 10 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| .../polars-parquet/src/parquet/read/page/memreader.rs | 67.30% | 34 Missing ⚠️ |
| ...rates/polars-parquet/src/arrow/write/dictionary.rs | 40.00% | 12 Missing ⚠️ |
| ...tes/polars-parquet/src/parquet/read/page/reader.rs | 71.87% | 9 Missing ⚠️ |
| crates/polars-parquet/src/parquet/page/mod.rs | 76.19% | 5 Missing ⚠️ |
| ...arrow/read/deserialize/fixed_size_binary/nested.rs | 0.00% | 4 Missing ⚠️ |
| ...s/polars-parquet/src/arrow/read/deserialize/mod.rs | 0.00% | 3 Missing ⚠️ |
| ...tes/polars-parquet/src/parquet/read/compression.rs | 57.14% | 3 Missing ⚠️ |
| .../arrow/read/deserialize/fixed_size_binary/basic.rs | 33.33% | 2 Missing ⚠️ |
| ...w/read/deserialize/fixed_size_binary/dictionary.rs | 0.00% | 1 Missing ⚠️ |
| ...quet/src/arrow/read/deserialize/primitive/basic.rs | 0.00% | 1 Missing ⚠️ |

... and 2 more

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17712      +/-   ##
==========================================
- Coverage   80.46%   80.38%   -0.09%     
==========================================
  Files        1498     1501       +3     
  Lines      196457   196723     +266     
  Branches     2790     2793       +3     
==========================================
+ Hits       158075   158131      +56     
- Misses      37869    38079     +210     
  Partials      513      513              


@ritchie46 ritchie46 merged commit a212ce9 into pola-rs:main Jul 19, 2024
23 checks passed
@c-peters c-peters added the accepted Ready for implementation label Jul 22, 2024