Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use undef for uninitialized bytes in constants #83698

Merged
merged 10 commits into from
Aug 26, 2021

Conversation

erikdesjardins
Copy link
Contributor

@erikdesjardins erikdesjardins commented Mar 31, 2021

Fixes #83657

This generates good code when the const is fully uninit, e.g.

#[no_mangle]
pub const fn fully_uninit() -> MaybeUninit<[u8; 10]> {
    const M: MaybeUninit<[u8; 10]> = MaybeUninit::uninit();
    M
}

generates

fully_uninit:
	ret

as you would expect.

There is no improvement, however, when it's partially uninit, e.g.

pub struct PartiallyUninit {
    x: u64,
    y: MaybeUninit<[u8; 10]>
}

#[no_mangle]
pub const fn partially_uninit() -> PartiallyUninit {
    const X: PartiallyUninit = PartiallyUninit { x: 0xdeadbeefcafe, y: MaybeUninit::uninit() };
    X
}

generates

partially_uninit:
	mov	rax, rdi
	mov	rcx, qword ptr [rip + .L__unnamed_1+16]
	mov	qword ptr [rdi + 16], rcx
	movups	xmm0, xmmword ptr [rip + .L__unnamed_1]
	movups	xmmword ptr [rdi], xmm0
	ret

.L__unnamed_1:
	.asciz	"\376\312\357\276\255\336\000"
	.zero	16
	.size	.L__unnamed_1, 24

which copies a bunch of zeros in place of the undef bytes, the same as before this change.

Edit: generating partially-undef constants isn't viable at the moment anyways due to #84565, so it's disabled

@rust-highfive
Copy link
Collaborator

r? @estebank

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 31, 2021
@oli-obk
Copy link
Contributor

oli-obk commented Mar 31, 2021

r? @oli-obk

@rust-highfive rust-highfive assigned oli-obk and unassigned estebank Mar 31, 2021
@oli-obk
Copy link
Contributor

oli-obk commented Mar 31, 2021

cc @rust-lang/wg-llvm we're now marking undef bytes from constants as undef, just like we mark padding in constructed values.

@oli-obk
Copy link
Contributor

oli-obk commented Apr 1, 2021

@bors r+

@bors
Copy link
Contributor

bors commented Apr 1, 2021

📌 Commit be66c2e has been approved by oli-obk

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 1, 2021
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Apr 1, 2021
Use undef for uninitialized bytes in constants

Fixes rust-lang#83657

This generates good code when the const is fully uninit, e.g.

```rust
#[no_mangle]
pub const fn fully_uninit() -> MaybeUninit<[u8; 10]> {
    const M: MaybeUninit<[u8; 10]> = MaybeUninit::uninit();
    M
}
```
generates
```asm
fully_uninit:
	ret
```

as you would expect.

There is no improvement, however, when it's partially uninit, e.g.

```rust
pub struct PartiallyUninit {
    x: u64,
    y: MaybeUninit<[u8; 10]>
}

#[no_mangle]
pub const fn partially_uninit() -> PartiallyUninit {
    const X: PartiallyUninit = PartiallyUninit { x: 0xdeadbeefcafe, y: MaybeUninit::uninit() };
    X
}
```
generates
```asm
partially_uninit:
	mov	rax, rdi
	mov	rcx, qword ptr [rip + .L__unnamed_1+16]
	mov	qword ptr [rdi + 16], rcx
	movups	xmm0, xmmword ptr [rip + .L__unnamed_1]
	movups	xmmword ptr [rdi], xmm0
	ret

.L__unnamed_1:
	.asciz	"\376\312\357\276\255\336\000"
	.zero	16
	.size	.L__unnamed_1, 24
```
which copies a bunch of zeros in place of the undef bytes, the same as before this change.

The constant is undef where it should be, though (see the added test) so this seems to be a missed optimization in LLVM.
@erikdesjardins
Copy link
Contributor Author

I don't think this should be rolled up, it could affect perf or expose existing UB (because previously reading uninitialized bytes from constants at runtime would "work")

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Apr 2, 2021
Use undef for uninitialized bytes in constants

Fixes rust-lang#83657

This generates good code when the const is fully uninit, e.g.

```rust
#[no_mangle]
pub const fn fully_uninit() -> MaybeUninit<[u8; 10]> {
    const M: MaybeUninit<[u8; 10]> = MaybeUninit::uninit();
    M
}
```
generates
```asm
fully_uninit:
	ret
```

as you would expect.

There is no improvement, however, when it's partially uninit, e.g.

```rust
pub struct PartiallyUninit {
    x: u64,
    y: MaybeUninit<[u8; 10]>
}

#[no_mangle]
pub const fn partially_uninit() -> PartiallyUninit {
    const X: PartiallyUninit = PartiallyUninit { x: 0xdeadbeefcafe, y: MaybeUninit::uninit() };
    X
}
```
generates
```asm
partially_uninit:
	mov	rax, rdi
	mov	rcx, qword ptr [rip + .L__unnamed_1+16]
	mov	qword ptr [rdi + 16], rcx
	movups	xmm0, xmmword ptr [rip + .L__unnamed_1]
	movups	xmmword ptr [rdi], xmm0
	ret

.L__unnamed_1:
	.asciz	"\376\312\357\276\255\336\000"
	.zero	16
	.size	.L__unnamed_1, 24
```
which copies a bunch of zeros in place of the undef bytes, the same as before this change.

The constant is undef where it should be, though (see the added test) so this seems to be a missed optimization in LLVM.
@RalfJung
Copy link
Member

RalfJung commented Apr 2, 2021

@bors rollup=never

@Dylan-DPC might be better to re-roll that rollup... sorry for this.

@bors
Copy link
Contributor

bors commented Apr 2, 2021

⌛ Testing commit be66c2e with merge 24f957fcbe9f9432f0c74cc707a5f008268f4984...

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Apr 2, 2021

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Apr 2, 2021
@erikdesjardins
Copy link
Contributor Author

erikdesjardins commented Apr 6, 2021

wc -c < /checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/wasm-panic-small/wasm-panic-small/foo.wasm
1139
[ "`wc -c < /checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/wasm-panic-small/wasm-panic-small/foo.wasm`" -lt "1024" ]

When building this test locally, foo.wasm is never below 1024 bytes (which is strange), but there is a regression: on master foo.wasm is 6k, and with these changes it's 140k (!). It seems to be defeating some optimization in LLVM, so we're no longer able to eliminate a bunch of formatting code. I'm trying to figure out why.

It may be worth a perf run to see if this change as-is causes widespread regressions (I suspect it will), or if that wasm test is just fragile for some reason

@nagisa
Copy link
Member

nagisa commented Apr 6, 2021 via email

@RalfJung
Copy link
Member

RalfJung commented Apr 6, 2021

@bors try @rust-timer queue

@oli-obk
Copy link
Contributor

oli-obk commented Aug 26, 2021

@bors r=RalfJung,oli-obk

@bors
Copy link
Contributor

bors commented Aug 26, 2021

📌 Commit adf3b01 has been approved by RalfJung,oli-obk

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 26, 2021
@bors
Copy link
Contributor

bors commented Aug 26, 2021

⌛ Testing commit adf3b01 with merge 20997f6...

@bors
Copy link
Contributor

bors commented Aug 26, 2021

☀️ Test successful - checks-actions
Approved by: RalfJung,oli-obk
Pushing 20997f6 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Aug 26, 2021
@bors bors merged commit 20997f6 into rust-lang:master Aug 26, 2021
@rustbot rustbot added this to the 1.56.0 milestone Aug 26, 2021
@erikdesjardins erikdesjardins deleted the undefconst branch August 26, 2021 16:25
@the8472
Copy link
Member

the8472 commented Sep 17, 2021

While the perf results show improvements in instructions counts this looks like a significant (non-noise) regression in wall times.

The jump is noticable in the timelines of encodings-opt or await-call-tree-opt helloworld-opt

Labeling as regression so that the perf team can have a look at it.

@the8472 the8472 added the perf-regression Performance regression. label Sep 17, 2021
@oli-obk
Copy link
Contributor

oli-obk commented Sep 18, 2021

the await call tree opt change is in #88308 not in this PR

there are a few other noticeable regressions in the timelines though, yes.

Most of the non-incremental regressions I looked at seem to point to LLVM spending more time, so this may either end up with better optimized code or just make it harder for LLVM to figure things out.

@the8472
Copy link
Member

the8472 commented Sep 18, 2021

hello-world-check regressed too (especially incr-unchanged), that shouldn't involve llvm.

@erikdesjardins
Copy link
Contributor Author

erikdesjardins commented Oct 1, 2021

I tried to reproduce this locally but was unable to. I don't see a regression in wall-time, nor when running under cachegrind. Cachegrind actually shows a slight improvement for encoding-opt.

(For posterity, using e.g. cargo +nightly run --release --bin collector -- diff_local cachegrind --include helloworld --builds Opt --runs IncrUnchanged +3b3ce374d203445eb1d0dce50f4211f4aceb7db6 +20997f6ad81721542e9ef97bb2f58190903a34d8, patched to use --cache-sim=yes --branch-sim=yes)

So, I decided to look at the timeline now that more time has passed and...it seems kinda weird.

Zoomed out, it's not that unusual. This PR introduced a regression and then a later change fixed it--right?

image

But if we zoom in on the point where wall-time returns to normal, it doesn't make sense:

First, it bounces between slow and fast before settling on fast. Which seems implausible--it's not likely that we introduced a perf improvement, then cancelled it out in the next commit, then reintroduced it three commits later.

Second, the two commits responsible for those two improvements don't make sense:

  • [A] 38e5764: This is a perf improvement, but it just avoids executing code. So for a >10% decrease in wall-time you'd expect a significant decrease in instruction count, but helloworld-opt only has -0.02%.
  • [B] 9dd4ce8: This is a cargo update and none of the cargo changes seem related to perf.

(All helloworld and encoding benchmarks exhibit this behavior. And probably other ones but I didn't check.)

I'm not sure what to make of this. Some intermittent issue with the perf collector that makes wall-time measurements longer?


Actually, while typing this up, I think I figured out what it is.

See this comment (on [B] 9dd4ce8) about the cpu governor being reset to ondemand on the perf machine: #88956 (comment)

For any perf triagers noticing the bootstrap and wall time improvement from this PR, it is almost certainly due to enabling performance rather than on demand cpu frequency governor on the perf collector. I think this got disabled by accident at some point (likely after a reboot).

#88308 was merged immediately after this PR. It too had some (unexplainable) wall-time regressions, in await-call-tree, deeply-nested-closures, and deeply-nested-async.

Similarly, if you look at the timeline for those benchmarks:

image

And zoom in:

The improvements are due to [A] 38e5764 and [C] a58db2e (merged right after [B] 9dd4ce8).

So I think this is what happened:

  • This PR is merged (20997f6). Halfway though the perf run, the cpu governor gets reset, regressing the wall-time of remaining benchmarks.

  • Morph layout_raw query into layout_of. #88308 is merged right after (4b9f4b2). It "regresses" the benchmarks that the previous run had already completed* before the governor changed.

...time passes...

  • [B] 9dd4ce8 hits the perf collector. Halfway through, the cpu governor gets fixed, undoing the regression in some benchmarks.

  • [C] a58db2e comes right after, undoing it in the rest.

And judging from the timestamp on this comment, [A] 38e5764 was missed initially and manually re-enqueued later. This would explain the downward spike that appears everywhere--although it was committed earlier, its perf run occurred later, after the governor was fixed.


* The regressions attributed to this PR (encoding, helloworld) come (alphabetically) after the regressions attributed to #88308 (await-call-tree, deeply-nested-*), which makes sense because the collector runs benchmarks in alphabetical order. And the same for the perf improvements attributed to [B] and [C].

@the8472
Copy link
Member

the8472 commented Oct 8, 2021

That seems like a reasonable explanation, marking as triaged.

@the8472 the8472 added the perf-regression-triaged The performance regression has been triaged. label Oct 8, 2021
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 25, 2022
Use undef for (some) partially-uninit constants

There needs to be some limit to avoid perf regressions on large arrays
with undef in each element (see comment in the code).

Fixes: rust-lang#84565
Original PR: rust-lang#83698

Depends on LLVM 14: rust-lang#93577
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Returning MaybeUninit consts generates code to zero out the return value.