Improve autovectorization of to_lowercase / to_uppercase functions #123778

jhorstmann · 2024-04-11T08:03:33Z

Refactor the code in the convert_while_ascii helper function to make it more suitable for auto-vectorization and to process the full ASCII prefix of the string. The generic case conversion logic is now only invoked starting from the first non-ASCII character.

The runtime on a microbenchmark with a small ASCII-only input decreases from ~55ns to ~18ns per iteration. The new implementation also reduces the amount of unsafe code and encapsulates all of it inside the helper function.

Fixes #123712
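
The chunked approach described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from this PR: the function name `lowercase_ascii_prefix`, the chunk size of 16, and the use of safe slice APIs are all assumptions made for the example. It converts the leading ASCII prefix in fixed-size chunks whose inner loops have no early exit (the shape auto-vectorizers tend to handle well) and returns the untouched remainder, which a caller would then pass to the generic case conversion logic.

```rust
/// Chunk size chosen for illustration only (an assumption, not taken from the PR).
const CHUNK: usize = 16;

/// Hypothetical helper: lowercases the leading ASCII prefix of `s` and
/// returns it together with the unconverted remainder of the input.
fn lowercase_ascii_prefix(s: &str) -> (String, &str) {
    let bytes = s.as_bytes();
    let mut out: Vec<u8> = Vec::with_capacity(bytes.len());

    // Whole chunks: each inner loop has a fixed trip count and no early exit.
    for chunk in bytes.chunks_exact(CHUNK) {
        // OR all bytes together; the high bit is set iff some byte is non-ASCII.
        let any_high_bit = chunk.iter().fold(0u8, |acc, &b| acc | b);
        if any_high_bit >= 0x80 {
            break;
        }
        out.extend(chunk.iter().map(|b| b.to_ascii_lowercase()));
    }

    // Byte-wise tail: finish the ASCII prefix up to the first non-ASCII byte.
    let mut i = out.len();
    while i < bytes.len() && bytes[i].is_ascii() {
        out.push(bytes[i].to_ascii_lowercase());
        i += 1;
    }

    // Every byte before `i` is ASCII, so `i` lies on a char boundary.
    let prefix = String::from_utf8(out).expect("ASCII bytes are valid UTF-8");
    (prefix, &s[i..])
}
```

For an input such as "ABCΣdef", the sketch returns the converted prefix "abc" and the remainder "Σdef"; only the remainder would need the slower Unicode-aware path.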

rustbot · 2024-04-11T08:03:41Z

r? @cuviper

rustbot has assigned @cuviper.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

jhorstmann · 2024-04-11T08:18:46Z

r? @the8472

The assembly for x86 and aarch64 can also be seen at https://rust.godbolt.org/z/x6T65nE8E

Marcondiro · 2024-05-05T21:37:25Z

Hi @jhorstmann do you have any benchmark for this? Thx!

jhorstmann · 2024-05-06T07:16:10Z

@Marcondiro Only the microbenchmark included in this PR. On my machine (Intel i9-11900KB) the performance improves by nearly 3x. This is without any target-specific compiler flags. Rerunning the benchmark now with:

./x.py bench library/alloc/ --stage 0 --test-args to_lowercase

Before

benchmarks:
    string::bench_to_lowercase 57.00ns/iter +/- 1.00ns

After

benchmarks:
    string::bench_to_lowercase 20.00ns/iter +/- 1.00ns

jhorstmann · 2024-05-08T07:26:08Z

Thanks for running the benchmarks, glad that there is no regression on arm. The improvement on x86 mostly comes from the use of the pmovmskb instruction; the equivalent on arm requires multiple instructions with higher latency and lower throughput.

jhorstmann · 2024-05-09T12:55:13Z

Thanks again, I was able to reproduce this on an AWS c7g instance. It seems there were some bounds checks remaining in the generated assembly, which were removed by the compiler on x86_64 and also in the simplified version I checked in Compiler Explorer. Adding some assume calls fixes that, and the benchmarks on c7g / Neoverse-V1 are now:

Before

benchmarks:
    string::bench_to_lowercase 61.00ns/iter +/- 0.00ns

After

benchmarks:
    string::bench_to_lowercase 28.00ns/iter +/- 0.00ns
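
The "assume" fix mentioned above can be illustrated with a small sketch. The exact calls added in this PR are not shown in this thread, so everything here is an assumption: `assume` is a hypothetical helper built on `std::hint::unreachable_unchecked`, and `copy_lowercase` is an invented example function. The idea is to state a fact the optimizer may not be able to prove by itself, so that it can drop the bounds checks on the indexing that follows. In a loop this simple the compiler may well remove the checks without help; the sketch only shows the mechanism.

```rust
use std::hint::unreachable_unchecked;

/// Hypothetical helper: tells the optimizer that `cond` always holds.
///
/// # Safety
/// Calling this with a false `cond` is undefined behavior.
#[inline(always)]
unsafe fn assume(cond: bool) {
    if !cond {
        // SAFETY: the caller guarantees that `cond` is true.
        unsafe { unreachable_unchecked() }
    }
}

/// Invented example: copies as many bytes as fit from `src` to `dst`,
/// lowercasing ASCII letters, and returns the number of bytes written.
fn copy_lowercase(src: &[u8], dst: &mut [u8]) -> usize {
    let n = src.len().min(dst.len());
    let mut i = 0;
    while i < n {
        // SAFETY: `i < n`, and `n` is at most both `src.len()` and `dst.len()`.
        unsafe {
            assume(i < src.len());
            assume(i < dst.len());
        }
        // With the facts above, the index checks below are provably redundant.
        dst[i] = src[i].to_ascii_lowercase();
        i += 1;
    }
    n
}
```

Because a wrong assumption is undefined behavior rather than a panic, each call needs a safety comment that justifies the condition.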

bors · 2024-05-09T19:52:49Z

☔ The latest upstream changes (presumably #124773) made this pull request unmergeable. Please resolve the merge conflicts.

jhorstmann · 2024-05-13T07:44:10Z

@the8472 can you take a look at this PR? It might have been lost in the review queue.

Marcondiro · 2024-05-13T15:04:06Z

A couple of thoughts:

A similar thing is done also here:

rust/library/core/src/slice/ascii.rs

Lines 328 to 341 in abb9563

    
    /// Optimized ASCII test that will use usize-at-a-time operations instead of
    /// byte-at-a-time operations (when possible).
    ///
    /// The algorithm we use here is pretty simple. If `s` is too short, we just
    /// check each byte and be done with it. Otherwise:
    ///
    /// - Read the first word with an unaligned load.
    /// - Align the pointer, read subsequent words until end with aligned loads.
    /// - Read the last `usize` from `s` with an unaligned load.
    ///
    /// If any of these loads produces something for which `contains_nonascii`
    /// (above) returns true, then we know the answer is false.
    #[inline]
    const fn is_ascii(s: &[u8]) -> bool {

Maybe it would be nice to somehow merge the two?

It would be interesting to understand why the compiler doesn't use pmovmskb unless the code is written in this specific way (e.g. when using & 0x8080808080808080).
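
For readers unfamiliar with the two formulations being compared, here is a minimal sketch. Both function names are hypothetical and neither is copied from the standard library or from this PR; the 8-byte width is an assumption chosen to match the 0x8080808080808080 mask. The first checks a whole word against the high-bit mask, the second ORs the bytes together one at a time. They compute the same answer, and the sketch makes no claim about which instructions a given compiler emits for either.

```rust
/// Word-at-a-time check: true if any of the 8 bytes has its high bit set,
/// i.e. if the chunk contains a non-ASCII byte.
fn contains_nonascii_word(chunk: [u8; 8]) -> bool {
    const HIGH_BITS: u64 = 0x8080_8080_8080_8080;
    (u64::from_ne_bytes(chunk) & HIGH_BITS) != 0
}

/// Byte-wise formulation of the same test: OR all bytes, then inspect bit 7.
fn contains_nonascii_bytes(chunk: [u8; 8]) -> bool {
    let combined = chunk.iter().fold(0u8, |acc, &b| acc | b);
    (combined & 0x80) != 0
}
```

The mask test does not depend on byte order, since every byte of the mask is the same, which is why `from_ne_bytes` is sufficient here.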

jhorstmann · 2024-06-02T20:54:05Z

Updated benchmark results:

master (#eda9d7f987de76b9d61c633a6ac328936e1b94f0)
benchmarks:
    str::to_lowercase::long_lorem_ipsum  534.48/iter +/- 33.86
    str::to_lowercase::short_ascii        20.11/iter  +/- 0.53
    str::to_lowercase::short_mixed       240.21/iter  +/- 3.40
    str::to_lowercase::short_pile_of_poo 262.54/iter  +/- 8.33

PR (#b03d93962816fd82afb619e0cf2083dc67e218e8)
benchmarks:
    str::to_lowercase::long_lorem_ipsum  148.49/iter +/- 1.24
    str::to_lowercase::short_ascii        12.14/iter +/- 0.13
    str::to_lowercase::short_mixed       240.31/iter +/- 3.51
    str::to_lowercase::short_pile_of_poo 259.42/iter +/- 7.45

Benchmark results in the initial PR comments used a different input; these use the same input strings as several existing benchmarks. I have not yet gotten around to rerunning on aarch64.

Update: The results above were with -Ctarget-cpu=native. With the default target the improvements are bigger:

master
    str::to_lowercase::long_lorem_ipsum  1027.63/iter +/- 39.76
    str::to_lowercase::short_ascii         34.39/iter  +/- 0.73
    str::to_lowercase::short_mixed        262.95/iter +/- 16.07
    str::to_lowercase::short_pile_of_poo  261.71/iter +/- 15.26

PR
    str::to_lowercase::long_lorem_ipsum  175.39/iter  +/- 3.18
    str::to_lowercase::short_ascii        12.25/iter  +/- 0.30
    str::to_lowercase::short_mixed       237.15/iter +/- 11.36
    str::to_lowercase::short_pile_of_poo 262.57/iter  +/- 8.37
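
To put the two sets of numbers side by side, the speedup for the long_lorem_ipsum case can be computed directly from the figures quoted above. This is a small illustrative calculation; the names `speedup`, `LONG_LOREM_IPSUM`, and `report` are invented for the example, and the figures are assumed to be nanoseconds per iteration as in the earlier benchmark output.

```rust
/// Speedup factor: baseline time divided by new time (same unit for both).
fn speedup(before: f64, after: f64) -> f64 {
    before / after
}

/// long_lorem_ipsum timings quoted in this comment: (configuration, master, PR).
const LONG_LOREM_IPSUM: [(&str, f64, f64); 2] = [
    ("-Ctarget-cpu=native", 534.48, 148.49),
    ("default target", 1027.63, 175.39),
];

/// Formats one line per configuration with the speedup rounded to one decimal.
fn report() -> Vec<String> {
    LONG_LOREM_IPSUM
        .iter()
        .map(|&(label, before, after)| format!("{label}: {:.1}x", speedup(before, after)))
        .collect()
}
```

The short_mixed and short_pile_of_poo cases are left out because their timings are essentially unchanged in both configurations.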

Marcondiro · 2024-06-04T09:44:07Z

Benchmark results on aarch64 (Apple M1)

master:
    str::to_lowercase::long_lorem_ipsum  523.29/iter  +/- 3.10
    str::to_lowercase::short_ascii        33.31/iter  +/- 0.29
    str::to_lowercase::short_mixed       299.64/iter +/- 23.58
    str::to_lowercase::short_pile_of_poo 295.03/iter  +/- 9.97

PR:
    str::to_lowercase::long_lorem_ipsum  129.03/iter  +/- 2.37
    str::to_lowercase::short_ascii        23.27/iter  +/- 0.49
    str::to_lowercase::short_mixed       271.17/iter +/- 31.33
    str::to_lowercase::short_pile_of_poo 272.03/iter  +/- 3.60

Great results even without pmovmskb 👍

the8472 · 2024-06-25T22:27:29Z

@bors r+

jhorstmann · 2024-09-18T21:38:38Z

@the8472 thank you for your patience. I updated the PR and also squashed the previous commits. I could not get the codegen tests to work with test revisions, since the test requires std or at least the core library. All other codegen tests with revisions that I looked at actually use no_core, probably for that reason. Instead, the test now only runs on x86-64. I think this should be OK, since autovectorization at the LLVM IR level should be mostly backend-independent, assuming the backend has some form of SIMD support.

Another change is that I removed the code duplication in the codegen test and instead made convert_while_ascii public under the str_internals feature. There is precedent for this in #111222 for the is_ascii_simple function.

bors · 2024-09-19T00:31:28Z

⌛ Testing commit 58b23cb with merge 2136b65...

bors · 2024-09-19T00:35:25Z

💔 Test failed - checks-actions

the8472 · 2024-09-19T09:57:24Z

looks like a network error

@bors retry

bors · 2024-09-19T11:55:00Z

⌛ Testing commit 58b23cb with merge 2c75a64...

bors · 2024-09-19T12:48:46Z

💔 Test failed - checks-actions

the8472 · 2024-09-19T22:49:00Z

@bors r+

bors · 2024-09-19T22:49:02Z

📌 Commit 60a13dd has been approved by the8472

It is now in the queue for this repository.

bors · 2024-09-19T23:56:29Z

⌛ Testing commit 60a13dd with merge dac43b914138744e2d158dbf140385a5ffd62638...

rust-log-analyzer · 2024-09-20T00:17:32Z

The job x86_64-gnu-llvm-19 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

failures:

---- [codegen] tests/codegen/issues/issue-111508-vec-tryinto-array.rs stdout ----

error: verification with 'FileCheck' failed
status: exit status: 1
command: "/usr/lib/llvm-19/bin/FileCheck" "--input-file" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issues/issue-111508-vec-tryinto-array/issue-111508-vec-tryinto-array.ll" "/checkout/tests/codegen/issues/issue-111508-vec-tryinto-array.rs" "--check-prefix=CHECK" "--check-prefix" "NONMSVC" "--allow-unused-prefixes" "--dump-input-context" "100"
--- stderr -------------------------------
/checkout/tests/codegen/issues/issue-111508-vec-tryinto-array.rs:12:15: error: CHECK-NOT: excluded string found in input
// CHECK-NOT: unwrap_failed
              ^
/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issues/issue-111508-vec-tryinto-array/issue-111508-vec-tryinto-array.ll:162:24: note: found here
; invoke core::result::unwrap_failed

Input file: /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issues/issue-111508-vec-tryinto-array/issue-111508-vec-tryinto-array.ll
Check file: /checkout/tests/codegen/issues/issue-111508-vec-tryinto-array.rs


-dump-input=help explains the following input dump.
Input was:
<<<<<<
        .
        .
        .
        .
       62:  tail call void @__rust_dealloc(ptr noundef nonnull %self4.i.i.i.i3, i64 noundef %_5.i.i.i.i1, i64 noundef 1) #8, !noalias !24 
       63:  br label %"_ZN4core3ptr53drop_in_place$LT$alloc..raw_vec..RawVec$LT$u8$GT$$GT$17h6c178d1dea818e7dE.exit4" 
       64:  
       65: "_ZN4core3ptr53drop_in_place$LT$alloc..raw_vec..RawVec$LT$u8$GT$$GT$17h6c178d1dea818e7dE.exit4": ; preds = %bb4, %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2" 
       66:  ret void 
       67: } 
       68:  
       69: ; <alloc::vec::Vec<T,A> as core::fmt::Debug>::fmt 
       70: ; Function Attrs: nonlazybind uwtable 
       71: define internal noundef zeroext i1 @"_ZN65_$LT$alloc..vec..Vec$LT$T$C$A$GT$$u20$as$u20$core..fmt..Debug$GT$3fmt17h071a08ce47b992ffE"(ptr noalias nocapture noundef readonly align 8 dereferenceable(24) %self, ptr noalias noundef align 8 dereferenceable(64) %f) unnamed_addr #0 personality ptr @rust_eh_personality { 
       72: start: 
       73:  %entry.i.i = alloca [8 x i8], align 8 
       74:  %_5.i = alloca [16 x i8], align 8 
       75:  %self1 = load ptr, ptr %self, align 8, !nonnull !3, !noundef !3 
       76:  %0 = getelementptr inbounds i8, ptr %self, i64 16 
       77:  %len = load i64, ptr %0, align 8, !noundef !3 
       78:  call void @llvm.lifetime.start.p0(i64 16, ptr nonnull %_5.i), !noalias !25 
       79: ; call core::fmt::Formatter::debug_list 
       80:  call void @_ZN4core3fmt9Formatter10debug_list17h4f2f427b0842a3ebE(ptr noalias nocapture noundef nonnull sret([16 x i8]) align 8 dereferenceable(16) %_5.i, ptr noalias noundef nonnull align 8 dereferenceable(64) %f), !noalias !29 
       81:  %_11.i = getelementptr inbounds i8, ptr %self1, i64 %len 
       82:  %1 = icmp eq i64 %len, 0 
       83:  br i1 %1, label %"_ZN48_$LT$$u5b$T$u5d$$u20$as$u20$core..fmt..Debug$GT$3fmt17hd710647428764915E.exit", label %bb5.i.i 
       84:  
       85: bb5.i.i: ; preds = %start, %bb5.i.i 
       86:  %iter.sroa.4.06.i.i = phi ptr [ %_24.i.i.i, %bb5.i.i ], [ %self1, %start ] 
       87:  %_24.i.i.i = getelementptr inbounds i8, ptr %iter.sroa.4.06.i.i, i64 1 
       88:  call void @llvm.lifetime.start.p0(i64 8, ptr nonnull %entry.i.i), !noalias !30 
       89:  store ptr %iter.sroa.4.06.i.i, ptr %entry.i.i, align 8, !noalias !30 
       90: ; call core::fmt::builders::DebugList::entry 
       91:  %_9.i.i = call noundef align 8 dereferenceable(16) ptr @_ZN4core3fmt8builders9DebugList5entry17h0e93e15c1edda619E(ptr noalias noundef nonnull align 8 dereferenceable(16) %_5.i, ptr noundef nonnull align 1 %entry.i.i, ptr noalias noundef nonnull readonly align 8 dereferenceable(32) @vtable.0) 
       92:  call void @llvm.lifetime.end.p0(i64 8, ptr nonnull %entry.i.i), !noalias !30 
       93:  %2 = icmp eq ptr %_24.i.i.i, %_11.i 
       94:  br i1 %2, label %"_ZN48_$LT$$u5b$T$u5d$$u20$as$u20$core..fmt..Debug$GT$3fmt17hd710647428764915E.exit", label %bb5.i.i 
       95:  
       96: "_ZN48_$LT$$u5b$T$u5d$$u20$as$u20$core..fmt..Debug$GT$3fmt17hd710647428764915E.exit": ; preds = %bb5.i.i, %start 
       97: ; call core::fmt::builders::DebugList::finish 
       98:  %_0.i = call noundef zeroext i1 @_ZN4core3fmt8builders9DebugList6finish17hc2340632c9bfa6bfE(ptr noalias noundef nonnull align 8 dereferenceable(16) %_5.i) 
       99:  call void @llvm.lifetime.end.p0(i64 16, ptr nonnull %_5.i), !noalias !25 
      100:  ret i1 %_0.i 
      101: } 
      102:  
      103: ; Function Attrs: nonlazybind uwtable 
      104: define noundef i8 @example(ptr noalias nocapture noundef readonly align 8 dereferenceable(24) %a) unnamed_addr #0 personality ptr @rust_eh_personality { 
      105: start: 
      106:  %e.i = alloca [24 x i8], align 8 
      107:  %_5.sroa.5 = alloca [16 x i8], align 8 
      108:  %0 = getelementptr inbounds i8, ptr %a, i64 16 
      109:  %_2 = load i64, ptr %0, align 8, !noundef !3 
      110:  %1 = icmp eq i64 %_2, 32 
      111:  br i1 %1, label %bb2, label %bb1 
      112:  
      113: bb2: ; preds = %start 
      114:  call void @llvm.lifetime.start.p0(i64 16, ptr nonnull %_5.sroa.5) 
      115:  %_5.sroa.0.0.copyload = load ptr, ptr %a, align 8 
      116:  %_5.sroa.5.0.a.sroa_idx = getelementptr inbounds i8, ptr %a, i64 8 
      117:  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(16) %_5.sroa.5, ptr noundef nonnull align 8 dereferenceable(16) %_5.sroa.5.0.a.sroa_idx, i64 16, i1 false) 
      118:  tail call void @llvm.experimental.noalias.scope.decl(metadata !33) 
      119:  tail call void @llvm.experimental.noalias.scope.decl(metadata !36) 
      120:  %_5.sroa.5.8.sroa_idx = getelementptr inbounds i8, ptr %_5.sroa.5, i64 8 
      121:  %_5.sroa.5.8._5.sroa.5.8._5.sroa.5.8._5.sroa.5.16._3.i = load i64, ptr %_5.sroa.5.8.sroa_idx, align 8 
      122:  %_2.not.i = icmp eq i64 %_5.sroa.5.8._5.sroa.5.8._5.sroa.5.8._5.sroa.5.16._3.i, 32 
      123:  br i1 %_2.not.i, label %bb6.i, label %bb2.i 
      124:  
      125: bb6.i: ; preds = %bb2 
      126:  %2 = icmp ne ptr %_5.sroa.0.0.copyload, null 
      127:  tail call void @llvm.assume(i1 %2) 
      128:  %_4.sroa.9.1.self.i.sroa_idx = getelementptr inbounds i8, ptr %_5.sroa.0.0.copyload, i64 15 
      129:  %_4.sroa.9.1.copyload = load i8, ptr %_4.sroa.9.1.self.i.sroa_idx, align 1, !noalias !36 
      130:  %_4.sroa.11.1.self.i.sroa_idx = getelementptr inbounds i8, ptr %_5.sroa.0.0.copyload, i64 24 
      131:  %_4.sroa.11.1.copyload = load i8, ptr %_4.sroa.11.1.self.i.sroa_idx, align 1, !noalias !36 
      132:  tail call void @llvm.experimental.noalias.scope.decl(metadata !38) 
      133:  tail call void @llvm.experimental.noalias.scope.decl(metadata !41) 
      134:  tail call void @llvm.experimental.noalias.scope.decl(metadata !44) 
      135:  tail call void @llvm.experimental.noalias.scope.decl(metadata !47) 
      136:  %_5.sroa.5.0._5.sroa.5.0._5.sroa.5.0._5.sroa.5.8._5.i.i.i.i1.i.i = load i64, ptr %_5.sroa.5, align 8, !alias.scope !50, !noalias !53 
      137:  %3 = icmp eq i64 %_5.sroa.5.0._5.sroa.5.0._5.sroa.5.0._5.sroa.5.8._5.i.i.i.i1.i.i, 0 
      138:  br i1 %3, label %"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h443f41158dd09e33E.exit", label %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2.i.i" 
      139:  
      140: "_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2.i.i": ; preds = %bb6.i 
      141:  tail call void @__rust_dealloc(ptr noundef nonnull %_5.sroa.0.0.copyload, i64 noundef %_5.sroa.5.0._5.sroa.5.0._5.sroa.5.0._5.sroa.5.8._5.i.i.i.i1.i.i, i64 noundef 1) #8, !noalias !55 
      142:  br label %"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h443f41158dd09e33E.exit" 
      143:  
      144: bb2.i: ; preds = %bb2 
      145:  %4 = lshr i64 %_5.sroa.5.8._5.sroa.5.8._5.sroa.5.8._5.sroa.5.16._3.i, 8 
      146:  %5 = trunc i64 %4 to i8 
      147:  %_5.sroa.5.0._5.sroa.5.0._5.sroa.5.0._5.sroa.5.8._4.sroa.9.8.copyload8 = load i8, ptr %_5.sroa.5, align 8, !alias.scope !56 
      148:  %_5.sroa.5.1.sroa_idx = getelementptr inbounds i8, ptr %_5.sroa.5, i64 1 
      149:  %_5.sroa.5.1._5.sroa.5.1._5.sroa.5.1._5.sroa.5.9._4.sroa.10.8.copyload9 = load i64, ptr %_5.sroa.5.1.sroa_idx, align 1, !alias.scope !56 
      150:  %6 = getelementptr inbounds i8, ptr %a, i64 18 
      151:  call void @llvm.lifetime.end.p0(i64 16, ptr nonnull %_5.sroa.5) 
      152:  call void @llvm.lifetime.start.p0(i64 24, ptr nonnull %e.i), !noalias !57 
      153:  store ptr %_5.sroa.0.0.copyload, ptr %e.i, align 8, !noalias !61 
      154:  %_4.sroa.9.8.e.i.sroa_idx = getelementptr inbounds i8, ptr %e.i, i64 8 
      155:  store i8 %_5.sroa.5.0._5.sroa.5.0._5.sroa.5.0._5.sroa.5.8._4.sroa.9.8.copyload8, ptr %_4.sroa.9.8.e.i.sroa_idx, align 8, !noalias !61 
      156:  %_4.sroa.10.8.e.i.sroa_idx = getelementptr inbounds i8, ptr %e.i, i64 9 
      157:  store i64 %_5.sroa.5.1._5.sroa.5.1._5.sroa.5.1._5.sroa.5.9._4.sroa.10.8.copyload9, ptr %_4.sroa.10.8.e.i.sroa_idx, align 1, !noalias !61 
      158:  %_4.sroa.11.8.e.i.sroa_idx = getelementptr inbounds i8, ptr %e.i, i64 17 
      159:  store i8 %5, ptr %_4.sroa.11.8.e.i.sroa_idx, align 1, !noalias !61 
      160:  %_4.sroa.12.8.e.i.sroa_idx = getelementptr inbounds i8, ptr %e.i, i64 18 
      161:  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 2 dereferenceable(6) %_4.sroa.12.8.e.i.sroa_idx, ptr noundef nonnull align 2 dereferenceable(6) %6, i64 6, i1 false) 
      162: ; invoke core::result::unwrap_failed 
not:12                            !~~~~~~~~~~~~  error: no match expected
      163:  invoke void @_ZN4core6result13unwrap_failed17hda82ba412d85e1ccE(ptr noalias noundef nonnull readonly align 1 @alloc_00ae4b301f7fab8ac9617c03fcbd7274, i64 noundef 43, ptr noundef nonnull align 1 %e.i, ptr noalias noundef nonnull readonly align 8 dereferenceable(32) @vtable.1, ptr noalias noundef nonnull readonly align 8 dereferenceable(32) @alloc_fdf1acc6022b418ce8fc31757b7c7478) #9 
      164:  to label %unreachable.i unwind label %cleanup.i, !noalias !57 
      165:  
      166: cleanup.i: ; preds = %bb2.i 
      167:  %7 = landingpad { ptr, i32 } 
      168:  cleanup 
      169:  call void @llvm.experimental.noalias.scope.decl(metadata !62) 
      170:  call void @llvm.experimental.noalias.scope.decl(metadata !65), !noalias !57 
      171:  call void @llvm.experimental.noalias.scope.decl(metadata !68), !noalias !57 
      172:  call void @llvm.experimental.noalias.scope.decl(metadata !71), !noalias !57 
      173:  %_5.i.i.i.i1.i = load i64, ptr %_4.sroa.9.8.e.i.sroa_idx, align 8, !alias.scope !74, !noalias !77 
      174:  %8 = icmp eq i64 %_5.i.i.i.i1.i, 0 
      175:  br i1 %8, label %bb5.i, label %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2.i" 
      176:  
      177: "_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2.i": ; preds = %cleanup.i 
      178:  %self4.i.i.i.i3.i = load ptr, ptr %e.i, align 8, !alias.scope !74, !noalias !77, !nonnull !3, !noundef !3 
      179:  call void @__rust_dealloc(ptr noundef nonnull %self4.i.i.i.i3.i, i64 noundef %_5.i.i.i.i1.i, i64 noundef 1) #8, !noalias !79 
      180:  br label %bb5.i 
      181:  
      182: unreachable.i: ; preds = %bb2.i 
      183:  unreachable 
      184:  
      185: bb5.i: ; preds = %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2.i", %cleanup.i 
      186:  resume { ptr, i32 } %7 
      187:  
      188: "_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h443f41158dd09e33E.exit": ; preds = %bb6.i, %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2.i.i" 
      189:  call void @llvm.lifetime.end.p0(i64 16, ptr nonnull %_5.sroa.5) 
      190:  %9 = add i8 %_4.sroa.11.1.copyload, %_4.sroa.9.1.copyload 
      191:  br label %bb4 
      192:  
      193: bb1: ; preds = %start 
      194:  tail call void @llvm.experimental.noalias.scope.decl(metadata !80) 
      195:  tail call void @llvm.experimental.noalias.scope.decl(metadata !83) 
      196:  tail call void @llvm.experimental.noalias.scope.decl(metadata !86) 
      197:  tail call void @llvm.experimental.noalias.scope.decl(metadata !89) 
      198:  %10 = getelementptr inbounds i8, ptr %a, i64 8 
      199:  %_5.i.i.i.i1.i2 = load i64, ptr %10, align 8, !alias.scope !92, !noalias !95 
      200:  %11 = icmp eq i64 %_5.i.i.i.i1.i2, 0 
      201:  br i1 %11, label %bb4, label %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2.i3" 
      202:  
      203: "_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2.i3": ; preds = %bb1 
      204:  %self4.i.i.i.i3.i4 = load ptr, ptr %a, align 8, !alias.scope !92, !noalias !95, !nonnull !3, !noundef !3 
      205:  tail call void @__rust_dealloc(ptr noundef nonnull %self4.i.i.i.i3.i4, i64 noundef %_5.i.i.i.i1.i2, i64 noundef 1) #8, !noalias !97 
      206:  br label %bb4 
      207:  
      208: bb4: ; preds = %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2.i3", %bb1, %"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h443f41158dd09e33E.exit" 
      209:  %_0.sroa.0.0 = phi i8 [ %9, %"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h443f41158dd09e33E.exit" ], [ 0, %bb1 ], [ 0, %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2.i3" ] 
      210:  ret i8 %_0.sroa.0.0 
      211: } 
      213: ; core::fmt::Formatter::debug_list 
      214: ; Function Attrs: nonlazybind uwtable 
      215: declare void @_ZN4core3fmt9Formatter10debug_list17h4f2f427b0842a3ebE(ptr dead_on_unwind noalias nocapture noundef writable sret([16 x i8]) align 8 dereferenceable(16), ptr noalias noundef align 8 dereferenceable(64)) unnamed_addr #0 
      217: ; core::fmt::builders::DebugList::finish 
      218: ; Function Attrs: nonlazybind uwtable 
      219: declare noundef zeroext i1 @_ZN4core3fmt8builders9DebugList6finish17hc2340632c9bfa6bfE(ptr noalias noundef align 8 dereferenceable(16)) unnamed_addr #0 
      220:  
      221: ; core::fmt::num::imp::<impl core::fmt::Display for u8>::fmt 
      222: ; Function Attrs: nonlazybind uwtable 
      223: declare noundef zeroext i1 @"_ZN4core3fmt3num3imp51_$LT$impl$u20$core..fmt..Display$u20$for$u20$u8$GT$3fmt17h40283dbe3b45cc8aE"(ptr noalias noundef readonly align 1 dereferenceable(1), ptr noalias noundef align 8 dereferenceable(64)) unnamed_addr #0 
      224:  
      225: ; core::fmt::num::<impl core::fmt::UpperHex for u8>::fmt 
      226: ; Function Attrs: nonlazybind uwtable 
      227: declare noundef zeroext i1 @"_ZN4core3fmt3num52_$LT$impl$u20$core..fmt..UpperHex$u20$for$u20$u8$GT$3fmt17hbb637f4ec75fe0b4E"(ptr noalias noundef readonly align 1 dereferenceable(1), ptr noalias noundef align 8 dereferenceable(64)) unnamed_addr #0 
      228:  
      229: ; core::fmt::num::<impl core::fmt::LowerHex for u8>::fmt 
      230: ; Function Attrs: nonlazybind uwtable 
      231: declare noundef zeroext i1 @"_ZN4core3fmt3num52_$LT$impl$u20$core..fmt..LowerHex$u20$for$u20$u8$GT$3fmt17hbae9e1526842e35cE"(ptr noalias noundef readonly align 1 dereferenceable(1), ptr noalias noundef align 8 dereferenceable(64)) unnamed_addr #0 
      232:  
      233: ; Function Attrs: nounwind nonlazybind uwtable 
      234: declare noundef range(i32 0, 10) i32 @rust_eh_personality(i32 noundef, i32 noundef range(i32 1, 17), i64 noundef, ptr noundef, ptr noundef) unnamed_addr #1 
      236: ; core::fmt::builders::DebugList::entry 
      237: ; Function Attrs: nonlazybind uwtable 
      238: declare noundef align 8 dereferenceable(16) ptr @_ZN4core3fmt8builders9DebugList5entry17h0e93e15c1edda619E(ptr noalias noundef align 8 dereferenceable(16), ptr noundef nonnull align 1, ptr noalias noundef readonly align 8 dereferenceable(32)) unnamed_addr #0 
      239:  
      240: ; Function Attrs: mustprogress nocallback nofree nounwind willreturn memory(argmem: readwrite) 
      241: declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg) #2 
      243: ; core::result::unwrap_failed 
      244: ; Function Attrs: cold noinline noreturn nonlazybind uwtable 
      245: declare void @_ZN4core6result13unwrap_failed17hda82ba412d85e1ccE(ptr noalias noundef nonnull readonly align 1, i64 noundef, ptr noundef nonnull align 1, ptr noalias noundef readonly align 8 dereferenceable(32), ptr noalias noundef readonly align 8 dereferenceable(32)) unnamed_addr #3 
      246:  
      247: ; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) 
      248: declare void @llvm.assume(i1 noundef) #4 
      249:  
      250: ; Function Attrs: nounwind nonlazybind allockind("free") uwtable 
      251: declare void @__rust_dealloc(ptr allocptr noundef, i64 noundef, i64 noundef) unnamed_addr #5 
      252:  
      253: ; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) 
      254: declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #6 
      255:  
      256: ; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) 
      257: declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #6 
      258:  
      259: ; Function Attrs: nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: readwrite) 
      260: declare void @llvm.experimental.noalias.scope.decl(metadata) #7 
      261:  
      262: attributes #0 = { nonlazybind uwtable "probe-stack"="inline-asm" "target-cpu"="x86-64" } 
        .
        .
>>>>>>
------------------------------------------

bors · 2024-09-20T00:18:11Z

💔 Test failed - checks-actions

rustbot assigned cuviper Apr 11, 2024

rustbot added S-waiting-on-review and T-libs labels Apr 11, 2024

rustbot assigned the8472 and unassigned cuviper Apr 11, 2024

jhorstmann mentioned this pull request May 5, 2024

Sometimes 'Σ' is not handled correctly in str.to_lowercase() #124714

Closed

Marcondiro reviewed May 7, 2024

jhorstmann force-pushed the optimize-upper-lower-auto-vectorization branch from f20f9e6 to 2a52a46 Compare May 9, 2024 12:49

jhorstmann force-pushed the optimize-upper-lower-auto-vectorization branch from 2a52a46 to 6c58e74 Compare May 10, 2024 19:55

Marcondiro reviewed May 13, 2024

the8472 requested changes Jun 1, 2024

rustbot added S-waiting-on-author and removed S-waiting-on-review labels Jun 1, 2024

jhorstmann force-pushed the optimize-upper-lower-auto-vectorization branch from 6c58e74 to b03d939 Compare June 2, 2024 20:49

the8472 reviewed Jun 2, 2024

Marcondiro reviewed Jun 3, 2024

bors added S-waiting-on-bors and removed S-waiting-on-author labels Sep 18, 2024

bors added S-waiting-on-review and removed S-waiting-on-bors labels Sep 19, 2024

bors added S-waiting-on-bors and removed S-waiting-on-review labels Sep 19, 2024

bors added S-waiting-on-review and removed S-waiting-on-bors labels Sep 19, 2024

Do not specify target triple for codegen test so it also runs on msvc builds (60a13dd)

bors added S-waiting-on-bors and removed S-waiting-on-review labels Sep 19, 2024

bors added S-waiting-on-review and removed S-waiting-on-bors labels Sep 20, 2024

nikic mentioned this pull request Sep 21, 2024

issue-111508-vec-tryinto-array.rs fails spuriously on x86_64-gnu-llvm-19 #130656

Open
