Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify MIR semantics of storage statements #99050

Merged
merged 1 commit into from
Jul 9, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 7 additions & 1 deletion compiler/rustc_const_eval/src/transform/validate.rs
Original file line number Diff line number Diff line change
Expand Up @@ -205,7 +205,13 @@ impl<'a, 'tcx> Visitor<'tcx> for TypeChecker<'a, 'tcx> {
}

if self.reachable_blocks.contains(location.block) && context.is_use() {
// Uses of locals must occur while the local's storage is allocated.
// We check that the local is live whenever it is used. Technically, violating this
// restriction is only UB and not actually indicative of not well-formed MIR. This means
// that an optimization which turns MIR that already has UB into MIR that fails this
// check is not necessarily wrong. However, we have no such optimizations at the moment,
// and so we include this check anyway to help us catch bugs. If you happen to write an
// optimization that might cause this to incorrectly fire, feel free to remove this
// check.
self.storage_liveness.seek_after_primary_effect(location);
let locals_with_storage = self.storage_liveness.get();
if !locals_with_storage.contains(local) {
Expand Down
26 changes: 13 additions & 13 deletions compiler/rustc_middle/src/mir/syntax.rs
Original file line number Diff line number Diff line change
Expand Up @@ -237,19 +237,19 @@ pub enum StatementKind<'tcx> {

/// `StorageLive` and `StorageDead` statements mark the live range of a local.
///
/// Using a local before a `StorageLive` or after a `StorageDead` is not well-formed. These
/// statements are not required. If the entire MIR body contains no `StorageLive`/`StorageDead`
/// statements for a particular local, the local is always considered live.
///
/// More precisely, the MIR validator currently does a `MaybeStorageLiveLocals` analysis to
/// check validity of each use of a local. I believe this is equivalent to requiring for every
/// use of a local, there exist at least one path from the root to that use that contains a
/// `StorageLive` more recently than a `StorageDead`.
///
/// **Needs clarification**: Is it permitted to have two `StorageLive`s without an intervening
/// `StorageDead`? Two `StorageDead`s without an intervening `StorageLive`? LLVM says poison,
/// yes. If the answer to any of these is "no," is breaking that rule UB or is it an error to
/// have a path in the CFG that might do this?
/// At any point during the execution of a function, each local is either allocated or
/// unallocated. Except as noted below, all locals except function parameters are initially
/// unallocated. `StorageLive` statements cause memory to be allocated for the local while
/// `StorageDead` statements cause the memory to be freed. Using a local in any way (not only
/// reading/writing from it) while it is unallocated is UB.
///
/// Some locals have no `StorageLive` or `StorageDead` statements within the entire MIR body.
/// These locals are implicitly allocated for the full duration of the function. There is a
/// convenience method at `rustc_mir_dataflow::storage::always_storage_live_locals` for
/// computing these locals.
///
/// If the local is already allocated, calling `StorageLive` again is UB. However, for an
/// unallocated local an additional `StorageDead` all is simply a nop.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// unallocated local an additional `StorageDead` all is simply a nop.
/// unallocated local an additional `StorageDead` call is simply a nop.

(probably too late for this PR though, it's already rolled up)

Copy link
Member

@RalfJung RalfJung Jul 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should consider double-free to also be UB, just for consistency? The only reason I allow this is that we do generate redundant StorageDead in MIR currently. (Or at least we did last time I checked this.)

MiniRust says double-dead is UB: I think I'd prefer that given that double-free of a heap allocation is also UB.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recently reported #98896 indicates this is still an issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually going to suggest that we make the StorageLive version of this not be UB either. I don't see what we gain from adding the UB

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you suggest translating StorageLive to LLVM ir if double StorageLive isn't UB? I believe StorageLive(_1); _1 = 42; StorageLive(_1); _2 = _1; is UB in LLVM ir.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be UB in MIR too - the semantics I am suggesting for StorageLive are that calling it when the local is already live is allowed but still fully reallocates the local

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allowed but still fully reallocates the local

Oh I see. Yes that would satisfy the "reset to uninit" condition.

FWIW a separate StorageReallocate has been suggested in the past for this purpose (#61849, #68622).

Copy link
Member

@RalfJung RalfJung Jul 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still we should get confirmation from LLVM that lifetime.start will not have any effects that flow "backwards" in the execution and make preceding accesses UB -- ideally, to explicitly state in their docs "doing this on an already live local is fine (but reset its contents to be uninitialized)".

Copy link
Contributor Author

@JakobDegen JakobDegen Jul 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ptr is a non-stack-allocated object, it does not point to the first byte of the object or it is a stack object that is already alive, it simply fills all bytes of the object with poison.

https://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic

Seems like they already do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I didn't see the "already alive" part. Good point. And for lifetime.end it says

Calling llvm.lifetime.end on an already dead alloca is no-op.

So yeah this removes my main motivation for making redundant annotations UB in the first place. That means I do not have a strong opinion either way. I don't know if there are any good motivations for doing one or the other, or any optimizations/transformations that would be helped by either choice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss this in a new issue: #99160

StorageLive(Local),

/// See `StorageLive` above.
Expand Down