Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(rust, python): Improve rechunk check #6268

Merged
merged 2 commits into from
Jan 18, 2023
Merged

Conversation

c-peters
Copy link
Collaborator

I was walking through the core and found this rechunk check to be somewhat convoluted. Ran some benchmarks with criterion (I can add them if preferred) and this method was faster as it does not require the hash computations. I'm relatively new to rust / polars so any feedback is welcome.

Additionally I think this could be a little bit faster if we use get_unchecked. This however requires that that the chunks can never have length 0 otherwise it will go out of bounds

@ghuls
Copy link
Collaborator

ghuls commented Jan 16, 2023

Can you benchmark it when you have a few 100k columns and 20k rows and multiple chunks?
Some Polars code was relatively slow with 1M columns (while being very fast for a reasonable number of columns).

@c-peters
Copy link
Collaborator Author

Added the benchmarks for more clarity. Spoke to @ritchie46, will try to create a version without allocating the vec for the first column and/or one where we implement a fast path for single chunk columns

@c-peters c-peters marked this pull request as draft January 16, 2023 17:50
Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank your @c-peters. I took some inspiration from your approach and made a suggestion for variant which elides the Vec allocation.

I think this will perform better when we deal with real world allocations.

polars/benches/rechunk.rs Outdated Show resolved Hide resolved
polars/Cargo.toml Outdated Show resolved Hide resolved
polars/polars-core/src/frame/mod.rs Show resolved Hide resolved
@c-peters
Copy link
Collaborator Author

Benchmarks

Table of Contents

Benchmark Results

All tests were performed with 20.000 rows. The rows of the tables denote the number of columns used for the benchmark

Equal 1 Chunk

New + FastPath New Current
5 12.42 ns (✅ 1.00x) 31.69 ns (❌ 2.55x slower) 60.83 ns (❌ 4.90x slower)
100 186.04 ns (✅ 1.00x) 437.88 ns (❌ 2.35x slower) 771.80 ns (❌ 4.15x slower)
1000 2.15 us (✅ 1.00x) 7.61 us (❌ 3.54x slower) 14.91 us (❌ 6.93x slower)
10000 60.02 us (✅ 1.00x) 99.15 us (❌ 1.65x slower) 216.40 us (❌ 3.61x slower)
100000 1.95 ms (✅ 1.00x) 4.72 ms (❌ 2.43x slower) 8.19 ms (❌ 4.21x slower)

Equal 100 Chunks

New + FastPath New Current
5 853.60 ns (✅ 1.00x) 928.92 ns (✅ 1.09x slower) 1.97 us (❌ 2.31x slower)
100 30.43 us (✅ 1.00x) 27.39 us (✅ 1.11x faster) 73.20 us (❌ 2.41x slower)
1000 621.77 us (✅ 1.00x) 569.57 us (✅ 1.09x faster) 1.40 ms (❌ 2.26x slower)
10000 9.91 ms (✅ 1.00x) 10.16 ms (✅ 1.03x slower) 26.22 ms (❌ 2.65x slower)
100000 93.16 ms (✅ 1.00x) 88.38 ms (✅ 1.05x faster) 262.00 ms (❌ 2.81x slower)

Unequal Early

The columns of the dataframe alternate between having 1 and 2 chunks.

New + FastPath New Current
5 5.02 ns (✅ 1.00x) 18.69 ns (❌ 3.72x slower) 40.68 ns (❌ 8.10x slower)
100 5.00 ns (✅ 1.00x) 18.31 ns (❌ 3.66x slower) 40.72 ns (❌ 8.14x slower)
1000 4.94 ns (✅ 1.00x) 19.63 ns (❌ 3.97x slower) 40.44 ns (❌ 8.18x slower)
10000 5.00 ns (✅ 1.00x) 18.36 ns (❌ 3.67x slower) 40.32 ns (❌ 8.06x slower)

Unequal 10 Chunks Late

The last two chunks of the last column in the dataframe are altered such that they do not align with the others.

New + FastPath New Current
5 115.81 ns (✅ 1.00x) 110.91 ns (✅ 1.04x faster) 219.90 ns (❌ 1.90x slower)
100 2.83 us (✅ 1.00x) 2.33 us (✅ 1.22x faster) 5.18 us (❌ 1.83x slower)
1000 54.34 us (✅ 1.00x) 50.92 us (✅ 1.07x faster) 108.70 us (❌ 2.00x slower)
10000 1.16 ms (✅ 1.00x) 987.85 us (✅ 1.18x faster) 3.20 ms (❌ 2.75x slower)
100000 21.58 ms (✅ 1.00x) 17.93 ms (✅ 1.20x faster) 46.58 ms (❌ 2.16x slower)

Made with criterion-table

@c-peters c-peters marked this pull request as ready for review January 18, 2023 12:38
@ritchie46
Copy link
Member

Cool table! Thanks a lot Chiel!

@ritchie46 ritchie46 changed the title refactor(rust): Improve rechunk check perf(rust, python): Improve rechunk check Jan 18, 2023
@ritchie46 ritchie46 merged commit 7331f6e into pola-rs:master Jan 18, 2023
@github-actions github-actions bot added performance Performance issues or improvements python Related to Python Polars rust Related to Rust Polars labels Jan 18, 2023
@c-peters c-peters deleted the rechunk branch July 14, 2023 11:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance issues or improvements python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants