
Performance regression in tight loop since rust 1.25 #53340

Open · pedrocr opened this issue Aug 14, 2018 · 34 comments · Fixed by #86823

Labels:
I-slow (Issue: Problems and improvements with respect to performance of generated code.)
regression-from-stable-to-stable (Performance or correctness regression from one stable version to another.)
T-libs (Relevant to the library team, which will review and decide on the PR/issue.)

Comments

pedrocr commented Aug 14, 2018

I've finally gotten around to doing some proper benchmarking of rust versions for my crate:

http://chimper.org/rawloader-rustc-benchmarks/

As can be seen in the graph on that page, there's a general performance improvement over time, but there are some very negative outliers. Most (maybe all) of them seem to be very simple loops that decode packed formats. Since rust 1.25 those have seen 30-40% degradations in performance. I've extracted a minimal test case that shows the issue:

// Decode 12-bit little-endian packed data: every 3 input bytes
// hold two 12-bit output samples.
fn decode_12le(buf: &[u8], width: usize, height: usize) -> Vec<u16> {
  let mut out: Vec<u16> = vec![0; width*height];

  for (row, line) in out.chunks_mut(width).enumerate() {
    let inb = &buf[(row*width*12/8)..];

    for (o, i) in line.chunks_mut(2).zip(inb.chunks(3)) {
      let g1: u16 = i[0] as u16;
      let g2: u16 = i[1] as u16;
      let g3: u16 = i[2] as u16;

      o[0] = ((g2 & 0x0f) << 8) | g1;
      o[1] = (g3 << 4) | (g2 >> 4);
    }
  }
  out
}

fn main() {
  let width = 5000;
  let height = 4000;

  let buffer: Vec<u8> = vec![0; width*height*12/8];
  
  for _ in 0..100 {
    decode_12le(&buffer, width, height);
  }
}
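
For reference, here's a quick sanity check of the bit layout (a sketch, not part of the benchmark): every 3 input bytes hold two 12-bit little-endian samples.

fn check_packing() {
  let i = [0xABu8, 0xCD, 0xEF];
  let (g1, g2, g3) = (i[0] as u16, i[1] as u16, i[2] as u16);
  assert_eq!(((g2 & 0x0f) << 8) | g1, 0x0DAB); // low nibble of i[1] + i[0]
  assert_eq!((g3 << 4) | (g2 >> 4), 0x0EFC);   // i[2] + high nibble of i[1]
}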

Here's a test run on my machine:

$ rustc +1.24.0 -C opt-level=3 bench_decode.rs 
$ time ./bench_decode 

real	0m4.817s
user	0m3.581s
sys	0m1.236s
$ rustc +1.25.0 -C opt-level=3 bench_decode.rs 
$ time ./bench_decode 

real	0m6.263s
user	0m5.067s
sys	0m1.196s
pedrocr (Author) commented Aug 14, 2018

godbolt shows quite a big diff between 1.24 and 1.25:

https://godbolt.org/g/fbpuHp

sanxiyn added the I-slow label on Aug 17, 2018
pedrocr (Author) commented Oct 27, 2018

I've scripted the checking of this across versions and the regression is present all the way through to nightly:

packed12le : 1.24.1  BASE 3.53
packed12le : 1.25.0  FAIL 4.84 (+37%)
packed12le : 1.26.2  FAIL 4.80 (+35%)
packed12le : 1.27.2  FAIL 4.81 (+36%)
packed12le : 1.28.0  FAIL 4.87 (+37%)
packed12le : 1.29.2  FAIL 4.77 (+35%)
packed12le : 1.30.0  FAIL 4.83 (+36%)
packed12le : beta    FAIL 4.83 (+36%)
packed12le : nightly FAIL 4.95 (+40%)

The 35-40% increase in runtime is very consistent.

pedrocr (Author) commented Dec 15, 2018

The regression is still present in 1.31 and all the way to current nightly:

packed12le : 1.24.1  BASE 3.51
packed12le : 1.25.0  FAIL 4.84 (+37%)
packed12le : 1.26.2  FAIL 4.83 (+37%)
packed12le : 1.27.2  FAIL 4.80 (+36%)
packed12le : 1.28.0  FAIL 4.86 (+38%)
packed12le : 1.29.2  FAIL 4.98 (+41%)
packed12le : 1.30.1  FAIL 5.00 (+42%)
packed12le : 1.31.0  FAIL 4.90 (+39%)
packed12le : beta    FAIL 4.88 (+39%)
packed12le : nightly FAIL 4.92 (+40%)

The same ~40% regression seen in the minimal test case is also seen in the full benchmark:

http://chimper.org/rawloader-rustc-benchmarks/version-1.31.0.html
(see the bottom of the page)

bluss (Member) commented Dec 19, 2018

@pedrocr Thanks for the careful benchmarks. I'd suspect that the zip specialization for chunks_mut is causing this; it was introduced between 1.24 and 1.25 in PR #47142.

It's not far-fetched that it's not actually an optimization for these iterators, and that the implementation should be revisited.

bluss (Member) commented Dec 19, 2018

What's the performance if you compare this version with something based on the newer chunks_exact/_mut?

pedrocr (Author) commented Dec 19, 2018

Using chunks_exact_mut, the regression is completely reversed and turns into a ~10% improvement:

packed12le : 1.24.1  BASE 3.53
packed12le : 1.25.0  FAIL 4.93 (+39%)
packed12le : 1.26.2  FAIL 4.82 (+36%)
packed12le : 1.27.2  FAIL 4.85 (+37%)
packed12le : 1.28.0  FAIL 4.93 (+39%)
packed12le : 1.29.2  FAIL 4.94 (+39%)
packed12le : 1.30.1  FAIL 5.03 (+42%)
packed12le : 1.31.0  OK   3.19 (-9%)
packed12le : beta    OK   3.18 (-9%)
packed12le : nightly OK   3.08 (-12%)

Starting with 1.31 (when chunks_exact_mut was stabilized), the benchmark uses it instead of chunks_mut. At some point I'll probably make 1.31 (and Rust 2018) the new minimum required version so I can use this. But it's probably still worth fixing this regression, since the exact versions are not always usable.

nikic added the T-libs-api label on Dec 19, 2018
bluss (Member) commented Dec 21, 2018

@pedrocr just to clarify, did you update all occurrences of chunks/_mut to be the exact version?

pedrocr (Author) commented Dec 21, 2018

@bluss Both the inner and outer loop. Here's the full code:

fn decode_12le(buf: &[u8], width: usize, height: usize) -> Vec<u16> {
  let mut out: Vec<u16> = vec![0; width*height];

  for (row, line) in out.chunks_exact_mut(width).enumerate() {
    let inb = &buf[(row*width*12/8)..];

    for (o, i) in line.chunks_exact_mut(2).zip(inb.chunks(3)) {
      let g1: u16 = i[0] as u16;
      let g2: u16 = i[1] as u16;
      let g3: u16 = i[2] as u16;

      o[0] = ((g2 & 0x0f) << 8) | g1;
      o[1] = (g3 << 4) | (g2 >> 4);
    }
  }
  out
}

fn main() {
  let width = 5000;
  let height = 4000;

  let mut buffer: Vec<u8> = vec![0; width*height*12/8];
  // Make sure we don't get optimized out by writing some data into the buffer
  for (i, val) in buffer.chunks_mut(1).enumerate() {
    val[0] = i as u8;
  }
  
  for _ in 0..100 {
    decode_12le(&buffer, width, height);
  }
}

I've also initialized the buffer with some data so the work doesn't get optimized out. I think that became necessary after one of the LLVM upgrades since 1.25.

bluss (Member) commented Dec 21, 2018

There's still a chunks in there; why not try the exact version for that one too?

pedrocr (Author) commented Dec 21, 2018

I did, and it doesn't make a difference; that's just the initialization, which doesn't take much time. I left it alone because it's not in the code that's actually being benchmarked.

bluss (Member) commented Dec 21, 2018

Maybe change the chunks to chunks_exact on this line; I'm just curious:

for (o, i) in line.chunks_exact_mut(2).zip(inb.chunks(3)) {
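
That is, presumably:

for (o, i) in line.chunks_exact_mut(2).zip(inb.chunks_exact(3)) {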

pedrocr (Author) commented Dec 21, 2018

Ah, right, missed that one. I'll check.

pedrocr (Author) commented Dec 21, 2018

It either works extremely well or somehow LLVM figured out how to optimize away too much:

packed12le : 1.24.1  BASE 3.46
packed12le : 1.25.0  FAIL 5.00 (+44%)
packed12le : 1.26.2  FAIL 4.87 (+40%)
packed12le : 1.27.2  FAIL 4.91 (+41%)
packed12le : 1.28.0  FAIL 4.83 (+39%)
packed12le : 1.29.2  FAIL 4.99 (+44%)
packed12le : 1.30.1  FAIL 5.01 (+44%)
packed12le : 1.31.0  OK   1.80 (-47%)
packed12le : beta    OK   1.80 (-47%)
packed12le : nightly OK   1.83 (-47%)

bluss (Member) commented Dec 21, 2018

That's exactly what we want :)

bluss (Member) commented Dec 21, 2018

The exact-chunks code looks OK in godbolt. Nothing spectacular, just a clean inner loop with no redundant bounds checks and a single loop-exit conditional.

A minor, boring trick for the old code is to change the order to this:

      let g3: u16 = i[2] as u16;
      let g2: u16 = i[1] as u16;
      let g1: u16 = i[0] as u16;

      o[1] = (g3 << 4) | (g2 >> 4);
      o[0] = ((g2 & 0x0f) << 8) | g1;

With the bounds check at i[2] done first, the other bounds checks on i become redundant. But the exact-chunks versions are much better: no bounds checks at all. I'm not sure the bounds checks are the biggest drag, though: the old loop also has a jumble of conditional moves that compute each slice's length.

pedrocr (Author) commented Dec 21, 2018

The 10% improvement was already interesting, but the 40% one definitely makes me want to use this. I have almost 50 chunks()/chunks_mut() calls like these in rawloader, so I'll definitely be benchmarking that. If I remember correctly the original C++ code did have better performance on these kinds of very simple formats, so this probably closes one of the few performance gaps I saw when writing the rust code. I just need to figure out what to do about older versions of rust; maybe have a dummy implementation that falls back to the non-exact versions (see the sketch at the end of this comment).

The bounds-check trick is interesting, but it's a little disappointing that the compiler doesn't figure that out itself. I've intentionally kept the code clean instead of trying to make it fast by being clever, and that has paid off well in terms of productivity and maintainability.

I don't think this closes the regression itself though, or does it? I assume chunks_exact() doesn't always fit, so there's still performance to be gained in the other cases.
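
For the record, a minimal sketch of that fallback idea (hypothetical: it assumes a build script, e.g. using the rustc_version crate, that sets a has_chunks_exact cfg on Rust >= 1.31):

// Hypothetical shim: forward to chunks_exact_mut when available,
// fall back to plain chunks_mut on older compilers.
#[cfg(has_chunks_exact)]
fn pairs_mut<'a>(line: &'a mut [u16]) -> std::slice::ChunksExactMut<'a, u16> {
  line.chunks_exact_mut(2)
}

#[cfg(not(has_chunks_exact))]
fn pairs_mut<'a>(line: &'a mut [u16]) -> std::slice::ChunksMut<'a, u16> {
  line.chunks_mut(2)
}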

bluss (Member) commented Dec 21, 2018

I agree. It looks like we should fix the performance of chunks().zip(chunks()) for any combination of chunks/chunks_mut in order to close this regression.

sfackler (Member) commented:

> The bounds check trick is interesting but a little disappointing that the compiler doesn't figure that out itself.

It can't make that change itself since it would change the visible behavior of the program: the panic message includes the index.
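
To spell that out (an illustrative sketch, not from the thread): with a too-short slice the two orderings panic at different indices, and the message names the index.

fn read(i: &[u16]) -> (u16, u16) {
  // Original order: a 1-element slice panics here, reporting
  // "index out of bounds: the len is 1 but the index is 1".
  let g2 = i[1];
  // Hoisting this check above would make the same slice panic
  // naming index 2 instead, an observable behavior change.
  let g3 = i[2];
  (g2, g3)
}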

pedrocr (Author) commented Dec 21, 2018

@sfackler ah, that's annoying but makes perfect sense, thanks.

pedrocr (Author) commented Jan 16, 2019

1.32 maintains the same regression:

packed12le : 1.24.1  BASE 3.48
packed12le : 1.25.0  FAIL 4.93 (+41%)
packed12le : 1.26.2  FAIL 4.87 (+39%)
packed12le : 1.27.2  FAIL 4.87 (+39%)
packed12le : 1.28.0  FAIL 4.89 (+40%)
packed12le : 1.29.2  FAIL 4.99 (+43%)
packed12le : 1.30.1  FAIL 5.00 (+43%)
packed12le : 1.31.1  FAIL 4.95 (+42%)
packed12le : 1.32.0  FAIL 4.90 (+40%)
packed12le : beta    FAIL 4.91 (+41%)
packed12le : nightly FAIL 4.88 (+40%)

The chunks_exact version is still very fast.

pedrocr (Author) commented Feb 25, 2019

1.33 maintains the same regression:

packed12le : 1.24.1  BASE 3.77
packed12le : 1.25.0  FAIL 5.21 (+38%)
packed12le : 1.26.2  FAIL 5.15 (+36%)
packed12le : 1.27.2  FAIL 5.13 (+36%)
packed12le : 1.28.0  FAIL 5.11 (+35%)
packed12le : 1.29.2  FAIL 5.25 (+39%)
packed12le : 1.30.1  FAIL 5.25 (+39%)
packed12le : 1.31.1  FAIL 5.16 (+36%)
packed12le : 1.32.0  FAIL 5.13 (+36%)
packed12le : 1.33.0  FAIL 5.12 (+35%)
packed12le : beta    FAIL 5.17 (+37%)
packed12le : nightly FAIL 5.19 (+37%)

jonas-schievink added the regression-from-stable-to-stable label on Mar 28, 2019
pedrocr (Author) commented May 21, 2019

I don't know if this is useful, but 1.34/1.35 and the current beta/nightly still have the same regression:

packed12le : 1.24.1  BASE 3.59
packed12le : 1.25.0  FAIL 4.97 (+38%)
packed12le : 1.26.2  FAIL 4.89 (+36%)
packed12le : 1.27.2  FAIL 4.98 (+38%)
packed12le : 1.28.0  FAIL 4.99 (+38%)
packed12le : 1.29.2  FAIL 5.15 (+43%)
packed12le : 1.30.1  FAIL 4.95 (+37%)
packed12le : 1.31.1  FAIL 5.03 (+40%)
packed12le : 1.32.0  FAIL 5.02 (+39%)
packed12le : 1.33.0  FAIL 5.10 (+42%)
packed12le : 1.34.2  FAIL 5.08 (+41%)
packed12le : 1.35.0  FAIL 5.01 (+39%)
packed12le : beta    FAIL 5.06 (+40%)
packed12le : nightly FAIL 5.16 (+43%)

pedrocr (Author) commented Sep 27, 2019

1.38 recovers roughly half this regression:

packed12le : 1.24.1  BASE 3.67
packed12le : 1.25.0  FAIL 5.09 (+38%)
packed12le : 1.26.2  FAIL 5.19 (+41%)
packed12le : 1.27.2  FAIL 5.10 (+38%)
packed12le : 1.28.0  FAIL 5.15 (+40%)
packed12le : 1.29.2  FAIL 5.24 (+42%)
packed12le : 1.30.1  FAIL 5.10 (+38%)
packed12le : 1.31.1  FAIL 5.27 (+43%)
packed12le : 1.32.0  FAIL 5.16 (+40%)
packed12le : 1.33.0  FAIL 5.19 (+41%)
packed12le : 1.34.2  FAIL 5.35 (+45%)
packed12le : 1.35.0  FAIL 5.18 (+41%)
packed12le : 1.36.0  FAIL 5.20 (+41%)
packed12le : 1.37.0  FAIL 5.11 (+39%)
packed12le : 1.38.0  FAIL 4.50 (+22%)
packed12le : beta    FAIL 4.50 (+22%)
packed12le : nightly FAIL 4.48 (+22%)

The chunks_exact version is unchanged, so the improvement doesn't seem to come from some unrelated change that makes everything faster.

pedrocr (Author) commented Jun 3, 2021

As an update, the latest rust versions seem to have gotten this down to only a ~4% penalty:

packed12le : 1.24.1  BASE 4.69
packed12le : 1.25.0  FAIL 6.36 (+35%)
packed12le : 1.26.2  FAIL 6.34 (+35%)
packed12le : 1.27.2  FAIL 6.07 (+29%)
packed12le : 1.28.0  FAIL 6.03 (+28%)
packed12le : 1.29.2  FAIL 6.08 (+29%)
packed12le : 1.30.1  FAIL 6.12 (+30%)
packed12le : 1.31.1  FAIL 6.03 (+28%)
packed12le : 1.32.0  FAIL 6.09 (+29%)
packed12le : 1.33.0  FAIL 6.06 (+29%)
packed12le : 1.34.2  FAIL 6.01 (+28%)
packed12le : 1.35.0  FAIL 6.04 (+28%)
packed12le : 1.36.0  FAIL 6.01 (+28%)
packed12le : 1.37.0  FAIL 6.12 (+30%)
packed12le : 1.38.0  FAIL 5.32 (+13%)
packed12le : 1.39.0  FAIL 5.39 (+14%)
packed12le : 1.40.0  FAIL 5.68 (+21%)
packed12le : 1.41.1  FAIL 5.75 (+22%)
packed12le : 1.42.0  FAIL 5.27 (+12%)
packed12le : 1.43.1  FAIL 5.34 (+13%)
packed12le : 1.44.1  FAIL 5.32 (+13%)
packed12le : 1.45.2  FAIL 5.31 (+13%)
packed12le : 1.46.0  FAIL 5.32 (+13%)
packed12le : 1.47.0  FAIL 5.28 (+12%)
packed12le : 1.48.0  FAIL 4.95 (+5%)
packed12le : 1.49.0  FAIL 5.35 (+14%)
packed12le : 1.50.0  FAIL 4.93 (+5%)
packed12le : 1.51.0  FAIL 4.93 (+5%)
packed12le : 1.52.1  FAIL 4.90 (+4%)
packed12le : beta    FAIL 4.92 (+4%)
packed12le : nightly FAIL 4.88 (+4%)

m-ou-se added the T-libs label and removed the T-libs-api label on Jun 23, 2021
bstrie (Contributor) commented Jul 3, 2021

You say the modern versions have almost closed the gap in performance; can you look at the modern assembly and see what still might be worse than 1.24, and what has improved since 1.25?

pedrocr (Author) commented Jul 3, 2021

My knowledge of x86 assembly is rudimentary, sorry.

the8472 (Member) commented Jul 3, 2021

I have a potential fix in #86823. If you want to benchmark it, you can grab the try build for 5c392fe307a7b9c6ca1d328ad7dbed69fb03897d.

pedrocr (Author) commented Jul 5, 2021

Does the build system save its artifacts anywhere? I don't think I can currently build rustc on this machine.

the8472 (Member) commented Jul 5, 2021

You can use https://github.com/kennytm/rustup-toolchain-install-master to install CI builds as rustup toolchains.
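
For reference, usage is roughly this (from memory; check the tool's README for the exact invocation):

$ cargo install rustup-toolchain-install-master
$ rustup-toolchain-install-master 5c392fe307a7b9c6ca1d328ad7dbed69fb03897d
$ rustc +5c392fe307a7b9c6ca1d328ad7dbed69fb03897d -C opt-level=3 bench_decode.rs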

pedrocr (Author) commented Jul 6, 2021

That tool worked really well: it gave me an installed toolchain, and then the benchmarking automation just worked. Here's the result of running that branch compared to the other recent toolchains:

packed12le : 1.24.1  BASE 4.59
packed12le : 1.52.1  FAIL 5.00 (+8%)
packed12le : 1.53.0  FAIL 4.91 (+6%)
packed12le : beta    FAIL 4.91 (+6%)
packed12le : nightly FAIL 5.06 (+10%)
packed12le : 5c392fe OK   4.21 (-8%)

Hopefully the patch doesn't have any soundness issues, because it seems to fix things completely: the same code now becomes ~8% faster than the 1.24 baseline.

pedrocr (Author) commented Dec 8, 2022

Just to confirm this, here are the full results:

packed12le : 1.24.1  BASE 4.50
packed12le : 1.25.0  FAIL 6.39 (+41%)
packed12le : 1.26.2  FAIL 6.31 (+40%)
packed12le : 1.27.2  FAIL 6.01 (+33%)
packed12le : 1.28.0  FAIL 5.95 (+32%)
packed12le : 1.29.2  FAIL 5.95 (+32%)
packed12le : 1.30.1  FAIL 6.07 (+34%)
packed12le : 1.31.1  FAIL 6.01 (+33%)
packed12le : 1.32.0  FAIL 5.99 (+33%)
packed12le : 1.33.0  FAIL 5.97 (+32%)
packed12le : 1.34.2  FAIL 6.02 (+33%)
packed12le : 1.35.0  FAIL 5.98 (+32%)
packed12le : 1.36.0  FAIL 6.05 (+34%)
packed12le : 1.37.0  FAIL 6.01 (+33%)
packed12le : 1.38.0  FAIL 5.22 (+15%)
packed12le : 1.39.0  FAIL 5.30 (+17%)
packed12le : 1.40.0  FAIL 5.60 (+24%)
packed12le : 1.41.1  FAIL 5.69 (+26%)
packed12le : 1.42.0  FAIL 5.29 (+17%)
packed12le : 1.43.1  FAIL 5.22 (+15%)
packed12le : 1.44.1  FAIL 5.31 (+17%)
packed12le : 1.45.2  FAIL 5.23 (+16%)
packed12le : 1.46.0  FAIL 5.22 (+15%)
packed12le : 1.47.0  FAIL 5.21 (+15%)
packed12le : 1.48.0  FAIL 4.91 (+9%)
packed12le : 1.49.0  FAIL 5.18 (+15%)
packed12le : 1.50.0  FAIL 4.87 (+8%)
packed12le : 1.51.0  FAIL 4.86 (+8%)
packed12le : 1.52.1  FAIL 4.84 (+7%)
packed12le : 1.53.0  FAIL 4.85 (+7%)
packed12le : 1.54.0  FAIL 4.88 (+8%)
packed12le : 1.55.0  OK   4.14 (-8%)
packed12le : 1.56.1  OK   4.15 (-7%)
packed12le : 1.57.0  OK   3.47 (-22%)
packed12le : 1.58.1  OK   4.16 (-7%)
packed12le : 1.59.0  OK   4.05 (-10%)
packed12le : 1.60.0  OK   3.42 (-24%)
packed12le : 1.61.0  OK   4.16 (-7%)
packed12le : 1.62.1  OK   3.42 (-24%)
packed12le : 1.63.0  OK   4.16 (-7%)
packed12le : 1.64.0  OK   4.09 (-9%)
packed12le : 1.65.0  OK   4.49 (-0%)
packed12le : beta    OK   4.50 (+0%)
packed12le : nightly OK   4.50 (+0%)

Some recent versions actually reached 20%+ performance improvements, but now we're back to 0. Possibly there's a separate performance improvement/regression going on there.

the8472 reopened this on Dec 8, 2022
the8472 self-assigned this on Dec 8, 2022
scottmcm (Member) commented Jan 5, 2023

This might be related to whether it auto-vectorizes.

This version gets a nice vectorized loop: https://rust.godbolt.org/z/TMKvzozjv

  %58 = zext <4 x i8> %57 to <4 x i16>, !dbg !283
  %59 = shl nuw <4 x i16> %45, <i16 8, i16 8, i16 8, i16 8>, !dbg !314
  %60 = and <4 x i16> %59, <i16 3840, i16 3840, i16 3840, i16 3840>, !dbg !314
  %61 = or <4 x i16> %60, %32, !dbg !316
  %62 = shl nuw nsw <4 x i16> %58, <i16 4, i16 4, i16 4, i16 4>, !dbg !317
  %63 = lshr <4 x i16> %45, <i16 4, i16 4, i16 4, i16 4>, !dbg !318
  %64 = or <4 x i16> %62, %63, !dbg !319

So that would plausibly be a reason for the 40% improvement mentioned in #53340 (comment).

But peeling the last iteration from the non-exact version is probably hard, so the auto-vectorization is lost when the exact variants aren't used.

pedrocr (Author) commented Mar 29, 2024

This can probably be closed. The regression seems definitively solved since 1.55, and performance now hovers around a 10-20% improvement over the 1.24 baseline, depending on the version:

packed12le : 1.24.1  BASE 1.58
packed12le : 1.25.0  FAIL 2.12 (+34%)
packed12le : 1.26.2  FAIL 2.06 (+30%)
packed12le : 1.27.2  FAIL 2.07 (+31%)
packed12le : 1.28.0  FAIL 2.08 (+31%)
packed12le : 1.29.2  FAIL 2.08 (+31%)
packed12le : 1.30.1  FAIL 2.13 (+34%)
packed12le : 1.31.1  FAIL 2.17 (+37%)
packed12le : 1.32.0  FAIL 2.14 (+35%)
packed12le : 1.33.0  FAIL 2.11 (+33%)
packed12le : 1.34.2  FAIL 2.13 (+34%)
packed12le : 1.35.0  FAIL 2.11 (+33%)
packed12le : 1.36.0  FAIL 2.12 (+34%)
packed12le : 1.37.0  FAIL 2.10 (+32%)
packed12le : 1.38.0  FAIL 1.81 (+14%)
packed12le : 1.39.0  FAIL 1.79 (+13%)
packed12le : 1.40.0  FAIL 1.81 (+14%)
packed12le : 1.41.1  FAIL 1.86 (+17%)
packed12le : 1.42.0  FAIL 1.81 (+14%)
packed12le : 1.43.1  FAIL 1.81 (+14%)
packed12le : 1.44.1  FAIL 1.83 (+15%)
packed12le : 1.45.2  FAIL 1.80 (+13%)
packed12le : 1.46.0  FAIL 1.77 (+12%)
packed12le : 1.47.0  FAIL 1.72 (+8%)
packed12le : 1.48.0  FAIL 1.69 (+6%)
packed12le : 1.49.0  FAIL 1.68 (+6%)
packed12le : 1.50.0  FAIL 1.69 (+6%)
packed12le : 1.51.0  FAIL 1.73 (+9%)
packed12le : 1.52.1  FAIL 1.67 (+5%)
packed12le : 1.53.0  FAIL 1.66 (+5%)
packed12le : 1.54.0  FAIL 1.63 (+3%)
packed12le : 1.55.0  OK   1.43 (-9%)
packed12le : 1.56.1  OK   1.38 (-12%)
packed12le : 1.57.0  OK   1.37 (-13%)
packed12le : 1.58.1  OK   1.32 (-16%)
packed12le : 1.59.0  OK   1.30 (-17%)
packed12le : 1.60.0  OK   1.40 (-11%)
packed12le : 1.61.0  OK   1.40 (-11%)
packed12le : 1.62.1  OK   1.37 (-13%)
packed12le : 1.63.0  OK   1.34 (-15%)
packed12le : 1.64.0  OK   1.42 (-10%)
packed12le : 1.65.0  OK   1.42 (-10%)
packed12le : 1.66.0  OK   1.46 (-7%)
packed12le : 1.67.1  OK   1.41 (-10%)
packed12le : 1.68.2  OK   1.42 (-10%)
packed12le : 1.69.0  OK   1.43 (-9%)
packed12le : 1.70.0  OK   1.46 (-7%)
packed12le : 1.71.1  OK   1.45 (-8%)
packed12le : 1.72.1  OK   1.46 (-7%)
packed12le : 1.73.0  OK   1.41 (-10%)
packed12le : 1.74.1  OK   1.42 (-10%)
packed12le : 1.75.0  OK   1.44 (-8%)
packed12le : 1.76.0  OK   1.44 (-8%)
packed12le : 1.77.0  OK   1.42 (-10%)
packed12le : beta    OK   1.39 (-12%)
packed12le : nightly OK   1.24 (-21%)

the8472 (Member) commented Mar 29, 2024

Nice. Adding a codegen test might be useful, though, since this seems like a fickle optimization.
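
A rough sketch of what such a test could look like (hypothetical file name and CHECK patterns; they'd need tuning against the real LLVM output):

// tests/codegen/issue-53340-chunks-zip.rs (hypothetical)
// compile-flags: -O

#![crate_type = "lib"]

// CHECK-LABEL: @unpack_12le
#[no_mangle]
pub fn unpack_12le(line: &mut [u16], inb: &[u8]) {
  for (o, i) in line.chunks_exact_mut(2).zip(inb.chunks_exact(3)) {
    let (g1, g2, g3) = (i[0] as u16, i[1] as u16, i[2] as u16);
    o[0] = ((g2 & 0x0f) << 8) | g1;
    o[1] = (g3 << 4) | (g2 >> 4);
  }
  // The loop should vectorize (cf. the IR above) with no
  // bounds-check panics left:
  // CHECK: zext <{{[0-9]+}} x i8>
  // CHECK-NOT: panic_bounds_check
}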
