Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hint optimizer about reserved capacity #116568

Closed
wants to merge 1 commit into from

Conversation

kornelski
Copy link
Contributor

@kornelski kornelski commented Oct 9, 2023

The fact that Vec had capacity increased is not known to the optimizer, because functions like do_reserve_and_handle are not inlined. (#82801) Because of that, code such as:

vec.try_reserve(123)?;
vec.extend_from_slice(&s[..123]);

Tries to reserve the capacity twice. This is unnecessary code bloat.

https://rust.godbolt.org/z/YWY16Ezej

Adding a hint after reserve optimizes out the next check of self.needs_to_grow(len, additional).

@rustbot
Copy link
Collaborator

rustbot commented Oct 9, 2023

r? @m-ou-se

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Oct 9, 2023
@rust-log-analyzer

This comment has been minimized.

@CryZe
Copy link
Contributor

CryZe commented Oct 9, 2023

I think this is what core::intrinsics::assume is for, which should probably be preferred. Seems to generate the same assembly too.

@workingjubilee
Copy link
Member

Let's give this a whirl.
@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 10, 2023
@bors
Copy link
Contributor

bors commented Oct 10, 2023

⌛ Trying commit 9a55d7a with merge 5137d8a...

bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 10, 2023
Hint optimizer about reserved capacity

The fact that `Vec` had capacity increased is not known to the optimizer, because functions like `do_reserve_and_handle` are not inlined. Because of that, code such as:

```rust
vec.try_reserve(123)?;
vec.extend_from_slice(&s[..123]);
```

Tries to reserve the capacity **twice**. This is unnecessary code bloat.

https://rust.godbolt.org/z/YWY16Ezej

Adding a hint after reserve optimizes out the next check of `self.needs_to_grow(len, additional)`.
@bors
Copy link
Contributor

bors commented Oct 10, 2023

☀️ Try build successful - checks-actions
Build commit: 5137d8a (5137d8ae3f0ffd6ebe1f4667505311a29e3c56a6)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (5137d8a): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.5% [0.3%, 0.8%] 6
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.5% [0.3%, 0.8%] 6

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
4.9% [4.9%, 4.9%] 1
Regressions ❌
(secondary)
4.2% [4.2%, 4.2%] 1
Improvements ✅
(primary)
-2.6% [-2.6%, -2.6%] 1
Improvements ✅
(secondary)
-1.1% [-1.1%, -1.1%] 1
All ❌✅ (primary) 1.2% [-2.6%, 4.9%] 2

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.4% [0.8%, 8.0%] 6
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.9% [-1.9%, -1.9%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 2.7% [-1.9%, 8.0%] 7

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.1% [0.0%, 0.2%] 49
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.1% [-0.1%, -0.0%] 3
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.1% [-0.1%, 0.2%] 52

Bootstrap: 626.006s -> 627.675s (0.27%)
Artifact size: 270.80 MiB -> 270.90 MiB (0.04%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Oct 10, 2023
@workingjubilee
Copy link
Member

workingjubilee commented Oct 10, 2023

slightly concerned that at larger scales like rustc, we're somehow seeing even larger artifacts, despite the minimal example's well-demonstrated changes. with no concomitant perf gains, either. I think we should have a codegen test that unquestionably demonstrates the improvement here before we commit this.

and maybe a more comprehensive change that makes the results slightly less middling at the larger scale.

@kornelski
Copy link
Contributor Author

kornelski commented Oct 10, 2023

Unfortunately, the hint in std doesn't seem to have the same effect as a hint in user code ;(

I've even tried using #[inline(always)] on the Vec::reserve methods:

    #[inline(always)]
    pub fn reserve(&mut self, additional: usize) {
        self.buf.reserve(self.len, additional);
        unsafe {
            core::intrinsics::assume(additional <= self.buf.capacity().wrapping_sub(self.len));
        }
    }

The unexpected speed regression happens also in #114370

@rust-log-analyzer

This comment has been minimized.

@kornelski kornelski force-pushed the hint-reserved branch 2 times, most recently from 048c423 to aa25a20 Compare October 20, 2023 03:15
@rust-log-analyzer

This comment has been minimized.

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 22, 2023
@kornelski
Copy link
Contributor Author

kornelski commented Oct 22, 2023

I think the regressions are expected here:

  • Mixed-bag results in optimization cost could be because the hint is added always, but not always has redundant allocation code to optimize out. The hint adds additional (dead) code to inlined functions, and extra info for the optimizer to work with.
  • binary size increases are almost all in debug profile. I would not be surprised if Rust generated debug info for the extra assume lines. Rust's debug info is already really large.
  • the only non-debug binary that has increased in size is the image library (10046380 to 10065324), but that's .rlib size where 54% of the rlib size is still the debug info. I have tested that locally, and rlib does increase, but when compiling a stripped binary that uses the image crate, I see no difference in size.

So I think paths forward:

  1. Accept that's the cost of hints, and maybe investigate faster/more efficient debug info for std separately.
  2. I can limit the hint to only try_reserve functions, since .try_reserve(); extend() is an important pattern, the rest was a nice to have.

@workingjubilee
Copy link
Member

workingjubilee commented Oct 28, 2023

This does look more promising. The binary size remarks earlier were more of a "hrm, apparent cost without benefit" kind of remark.

I think we'd still want a codegen test either way, and then if you feel it would be worth testing the hint on try_reserve specifically, that might be good to see. Otherwise I think we could accept this with the test.

bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 2, 2023
Hint optimizer about try-reserved capacity

This is rust-lang#116568, but limited only to the less-common `try_reserve` functions to reduce bloat in debug binaries from debug info, while still addressing the main use-case rust-lang#116570
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 5, 2023
…ngjubilee

Hint optimizer about try-reserved capacity

This is rust-lang#116568, but limited only to the less-common `try_reserve` functions to reduce bloat in debug binaries from debug info, while still addressing the main use-case rust-lang#116570
@bors
Copy link
Contributor

bors commented Nov 5, 2023

☔ The latest upstream changes (presumably #117503) made this pull request unmergeable. Please resolve the merge conflicts.

@workingjubilee
Copy link
Member

~10% instruction count improvements on various benchmarks

Well, that's surprising.

@workingjubilee
Copy link
Member

I wonder if that's just noise or if something changed in the meantime or the smaller change is genuinely that much better on its own. Let's see what we can see.

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 6, 2023
@bors
Copy link
Contributor

bors commented Nov 6, 2023

⌛ Trying commit 3e1cf5b with merge 361898b...

bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 6, 2023
Hint optimizer about reserved capacity

The fact that `Vec` had capacity increased is not known to the optimizer, because functions like `do_reserve_and_handle` are not inlined. (rust-lang#82801) Because of that, code such as:

```rust
vec.try_reserve(123)?;
vec.extend_from_slice(&s[..123]);
```

Tries to reserve the capacity **twice**. This is unnecessary code bloat.

https://rust.godbolt.org/z/YWY16Ezej

Adding a hint after reserve optimizes out the next check of `self.needs_to_grow(len, additional)`.
@bors bors mentioned this pull request Nov 6, 2023
@bors
Copy link
Contributor

bors commented Nov 6, 2023

☀️ Try build successful - checks-actions
Build commit: 361898b (361898b6c86fdf12241aea3b6d338064507e220a)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (361898b): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.5% [0.3%, 0.8%] 9
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.2% [-0.2%, -0.2%] 1
Improvements ✅
(secondary)
-0.3% [-0.4%, -0.3%] 3
All ❌✅ (primary) 0.4% [-0.2%, 0.8%] 10

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.9% [0.6%, 12.0%] 9
Regressions ❌
(secondary)
3.3% [2.3%, 4.3%] 2
Improvements ✅
(primary)
-1.7% [-2.8%, -0.6%] 5
Improvements ✅
(secondary)
-0.6% [-0.6%, -0.6%] 1
All ❌✅ (primary) 1.9% [-2.8%, 12.0%] 14

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.6% [0.4%, 0.7%] 4
Regressions ❌
(secondary)
0.7% [0.7%, 0.7%] 1
Improvements ✅
(primary)
-0.5% [-0.5%, -0.4%] 2
Improvements ✅
(secondary)
-2.4% [-2.4%, -2.4%] 1
All ❌✅ (primary) 0.2% [-0.5%, 0.7%] 6

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.1% [0.0%, 0.3%] 45
Regressions ❌
(secondary)
0.2% [0.2%, 0.2%] 1
Improvements ✅
(primary)
-0.4% [-0.4%, -0.4%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.1% [-0.4%, 0.3%] 46

Bootstrap: 636.185s -> 634.65s (-0.24%)
Artifact size: 304.50 MiB -> 304.39 MiB (-0.04%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 6, 2023
@kornelski
Copy link
Contributor Author

It looks like a mixed bag again. With try_reserve() hint merged I think it's fine to leave this one out. Users of reserve() can get this benefit by switching to try_reserve().

Maybe eventually debug info/assume in llvm or whatever is costing here will get optimized and this can be revisited.

@kornelski kornelski closed this Nov 6, 2023
@workingjubilee
Copy link
Member

Yeah, that's a much less beneficial diff, whereas the try-reserve-only version was much better on its own (even if it was mostly noise, it had no regressions), so this seems like the right balance.

@mati865
Copy link
Contributor

mati865 commented Nov 6, 2023

~10% instruction count improvements on various benchmarks

Well, that's surprising.

This is likely mostly noise, look at PR that changed only a comment:
image
#115333 (comment)

Similar things has happened to other PRs as well.

@workingjubilee
Copy link
Member

workingjubilee commented Nov 6, 2023

@mati865 It would be nice to have a better idea somehow of how noisy a statistical output is so that we can make better-informed decisions, I feel the current metrics don't adequately capture this thing. I feel like the decisions here were correct anyways, though, because it seems to be that this PR was consistently mixed and the try-reserve version was consistently mildly-positive-or-not-bad.

@mati865
Copy link
Contributor

mati865 commented Nov 7, 2023

It's not noisy typically but lately something odd happened, take a look at this graph:
image

I feel like the decisions here were correct anyways, though, because it seems to be that this PR was consistently mixed and the try-reserve version was consistently mildly-positive-or-not-bad.

I do agree with that, previous runs had no significant noise.
The noise that we observed is "all or nothing", so either there is this two digit variation meaning the result is not meaningful or there is virtually no noise.

One more PR with similar noise: #116412 (comment)

EDIT: Created Zulip thread: https://rust-lang.zulipchat.com/#narrow/stream/247081-t-compiler.2Fperformance/topic/Recent.20noise.20spikes

@kornelski kornelski deleted the hint-reserved branch February 3, 2024 11:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants