-
Notifications
You must be signed in to change notification settings - Fork 12.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve autovectorization of to_lowercase / to_uppercase functions #123778
base: master
Are you sure you want to change the base?
Improve autovectorization of to_lowercase / to_uppercase functions #123778
Conversation
r? @the8472 The assembly for x86 and aarch64 can also be seen at https://rust.godbolt.org/z/x6T65nE8E |
Hi @jhorstmann do you have any benchmark for this? Thx! |
@Marcondiro only the microbenchmark included in this PR. On my machine (Intel i9-11900KB) the performance increases by nearly 3x. This is without any target-specific compiler flags, rerunning them now with:
Before
After
|
This comment was marked as outdated.
This comment was marked as outdated.
Thanks for running the benchmarks, glad that there is no regression on arm. The improvement on x86 mostly comes from the usage of the |
This comment was marked as outdated.
This comment was marked as outdated.
f20f9e6
to
2a52a46
Compare
Thanks again, I was able to reproduce this on an aws c7g instance. It seems there were some bounds check remaining in the generated assembly, which were removed by the compiler on x86_64 and also in the simplified version I checked in compiler explorer. Adding some Before
After
|
☔ The latest upstream changes (presumably #124773) made this pull request unmergeable. Please resolve the merge conflicts. |
2a52a46
to
6c58e74
Compare
@the8472 can you take a look at this PR? It might have been lost in the review queue. |
A couple of thoughts:
|
6c58e74
to
b03d939
Compare
Updated benchmark results:
Benchmark results in the initial PR comments were using a different input, these are using the same input strings as several existing benchmarks. I did not get around yet to rerunning on aarch64. Update: Above results were with
|
Benchmark results on aarch64 (Apple M1)
Great results even without |
@bors r+ |
@the8472 thank you for the patience. I updated the PR and also squashed the previous commits. I could not get the codegen tests to work with test revisions, since it requires the std or at least core library. All other codegen tests with revisions that I looked at are actually using Another change is that I removed the code duplication in the codegen test and instead made |
…-vectorization, r=the8472 Improve autovectorization of to_lowercase / to_uppercase functions Refactor the code in the `convert_while_ascii` helper function to make it more suitable for auto-vectorization and also process the full ascii prefix of the string. The generic case conversion logic will only be invoked starting from the first non-ascii character. The runtime on a microbenchmark with a small ascii-only input decreases from ~55ns to ~18ns per iteration. The new implementation also reduces the amount of unsafe code and encapsulates all unsafe inside the helper function. Fixes rust-lang#123712
💔 Test failed - checks-actions |
This comment has been minimized.
This comment has been minimized.
looks like a network error @bors retry |
…-vectorization, r=the8472 Improve autovectorization of to_lowercase / to_uppercase functions Refactor the code in the `convert_while_ascii` helper function to make it more suitable for auto-vectorization and also process the full ascii prefix of the string. The generic case conversion logic will only be invoked starting from the first non-ascii character. The runtime on a microbenchmark with a small ascii-only input decreases from ~55ns to ~18ns per iteration. The new implementation also reduces the amount of unsafe code and encapsulates all unsafe inside the helper function. Fixes rust-lang#123712
This comment has been minimized.
This comment has been minimized.
💔 Test failed - checks-actions |
@bors r+ |
…-vectorization, r=the8472 Improve autovectorization of to_lowercase / to_uppercase functions Refactor the code in the `convert_while_ascii` helper function to make it more suitable for auto-vectorization and also process the full ascii prefix of the string. The generic case conversion logic will only be invoked starting from the first non-ascii character. The runtime on a microbenchmark with a small ascii-only input decreases from ~55ns to ~18ns per iteration. The new implementation also reduces the amount of unsafe code and encapsulates all unsafe inside the helper function. Fixes rust-lang#123712
⌛ Testing commit 60a13dd with merge dac43b914138744e2d158dbf140385a5ffd62638... |
The job Click to see the possible cause of the failure (guessed by this bot)
|
💔 Test failed - checks-actions |
Refactor the code in the
convert_while_ascii
helper function to make it more suitable for auto-vectorization and also process the full ascii prefix of the string. The generic case conversion logic will only be invoked starting from the first non-ascii character.The runtime on a microbenchmark with a small ascii-only input decreases from ~55ns to ~18ns per iteration. The new implementation also reduces the amount of unsafe code and encapsulates all unsafe inside the helper function.
Fixes #123712