RFC for an operator to take a raw reference
RalfJung committed Nov 1, 2018
1 parent 7bf6206 commit fd4b4cd
Showing 1 changed file with 146 additions and 0 deletions.
146 changes: 146 additions & 0 deletions text/0000-raw-reference-operator.md
@@ -0,0 +1,146 @@
- Feature Name: raw_reference_operator
- Start Date: 2018-11-01
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

Introduce a new primitive operator on the MIR level: `&[mut|const] raw <place>`
to create a raw pointer to the given place (this is not surface syntax, it is
just how MIR might be printed). Desugar the surface syntax `&[mut] <place> as
*[mut|const] _` to use this operator, instead of two statements (first take a
normal reference, then cast it).

# Motivation
[motivation]: #motivation

Currently, if one wants to create a raw pointer pointing to something, one has
no choice but to create a reference and immediately cast it to a raw pointer.
The problem with this is that there are some invariants that we want to attach
to references, that have to *always hold*. (This has not been finally decided,
but it is true in practice because of the annotations we emit to LLVM. It is
also the next topic of discussion in the
[Unsafe Code Guidelines](https://github.com/rust-rfcs/unsafe-code-guidelines/).)
In particular, references must be aligned and dereferenceable, even when they
are created and never used.

One consequence of these rules is that it becomes essentially impossible to
create a raw pointer pointing to an unaligned struct field: `&packed.field as
*const _` creates an intermediate unaligned reference, which triggers undefined
behavior. Similarly, `&(*raw).field as *const _` is not just computing an
offset of the raw pointer `raw`, it also asserts that the intermediate shared
reference is aligned and dereferenceable. In both cases, that is likely not
what the author of the code intended.

To fix this, we propose to introduce a new primitive operation on the MIR level
that, in a single statement, creates a raw pointer to a given place. No
intermediate reference exists, so no invariants have to be adhered to.
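
To make the "just computing an offset" reading concrete, here is a minimal
sketch. It is not the operator proposed here: it assumes the standard library
macro `std::ptr::addr_of!` as a stand-in for a one-step "take raw reference"
operation, and `Header`, `len_ptr` and the byte buffer are hypothetical names
chosen for illustration.

```rust
use std::ptr;

// Hypothetical example type; with repr(C), `len` sits at offset 4.
#[allow(dead_code)]
#[repr(C)]
struct Header {
    kind: u8,
    len: u32,
}

/// Computes the address of `len` from a raw pointer that may be misaligned.
/// No `&Header` or `&u32` is created, so no reference invariants apply.
fn len_ptr(raw: *const Header) -> *const u32 {
    // SAFETY (assumed precondition): `raw` and the projected field lie within
    // a single live allocation. Alignment of `raw` is not required.
    unsafe { ptr::addr_of!((*raw).len) }
}

fn main() {
    let buf = [0u8; 16];
    let base = buf.as_ptr();
    // Pick a start address that is not 4-byte aligned.
    let skew = if (base as usize) % 4 == 0 { 1 } else { 0 };
    let raw = base.wrapping_add(skew) as *const Header;

    let p = len_ptr(raw);
    // The result is the base address plus the field offset, nothing more.
    assert_eq!(p as usize - raw as usize, 4);
}
```

The pointer is only computed, never dereferenced, which is exactly the
behavior the author of `&(*raw).field as *const _` most likely wanted.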

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

When working with unaligned or potentially dangling pointers, it is crucial that
you always use raw pointers and not references: References come with guarantees
that the compiler assumes are always upheld, and these guarantees include proper
alignment and not being dangling. Importantly, these guarantees must be
maintained even when the reference is created and never used! The following is
UB (assuming `packed` is a variable of packed type):

```rust
let x = unsafe { &packed.field }; // `x` is not aligned -> undefined behavior
```

There is no situation in which the above code is correct, and hence it is a hard
error to write this.

The only way to create a pointer to an unaligned or dangling location without
triggering undefined behavior is to take the reference and *immediately* cast
it to a raw pointer:

```rust
let x = unsafe { &packed.field as *const _ };
```

These two operations (taking a reference, casting to a raw pointer) are actually
considered a single operation happening in one step, and hence the invariants
incurred by references do not come into play.
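
A self-contained sketch of working with such a pointer follows. Note the
assumption: instead of the reference-then-cast spelling discussed above, it
uses the standard library macro `std::ptr::addr_of!`, which likewise yields the
raw pointer in one step. The `Packed` type and the `read_field` helper are
hypothetical.

```rust
use std::ptr;

// Hypothetical packed type: `field` lives at offset 1 and is therefore
// not guaranteed to be 4-byte aligned.
#[repr(C, packed)]
struct Packed {
    tag: u8,
    field: u32,
}

/// Reads `field` through a raw pointer without ever creating a `&u32`.
fn read_field(p: &Packed) -> u32 {
    // One step from place to raw pointer; no intermediate reference exists.
    let raw: *const u32 = ptr::addr_of!(p.field);
    // SAFETY: `raw` points into the live `Packed` behind `p`, and
    // `read_unaligned` does not require the pointer to be aligned.
    unsafe { raw.read_unaligned() }
}

fn main() {
    let packed = Packed { tag: 1, field: 0xDEAD_BEEF };
    let tag = packed.tag; // copying a field out by value is always fine
    println!("tag = {}, field = {:#x}", tag, read_field(&packed));
}
```

The raw pointer must still be accessed with `read_unaligned` or
`write_unaligned`; creating it is what no longer carries any alignment promise.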

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

When translating HIR to MIR, we recognize `&[mut] <place> as *[mut|const] _` as
a special pattern and turn it into a single MIR `Rvalue` that takes the address
and produces it as a raw pointer -- a "take raw reference" operation. This
might be a variant of the existing `Ref` operation (say, a boolean flag for
whether this is raw), or a new `Rvalue` variant. The borrow checker should do
the usual checks on `<place>`, but can just ignore the result of this operation
and the newly created "reference" can have any lifetime. (Currently this will
be some form of unbounded inference variable because the only use is a
cast-to-raw; the new "raw reference" operation can have the same behavior.)
When translating MIR to LLVM, nothing special has to happen as references and
raw pointers have the same LLVM type anyway; the new operation behaves like
`Ref`.

When interpreting MIR in the miri engine, the engine will recognize that the
value produced by this `Rvalue` has raw pointer type, and hence does not have
to satisfy any special invariants.
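
The pattern recognition during lowering could look roughly like the toy model
below. Everything in it is an assumption made for illustration: `Expr`,
`Rvalue` and `lower` are simplified stand-ins, not the compiler's actual HIR or
MIR data structures, and the sketch picks the "new `Rvalue` variant" option
rather than a flag on `Ref`.

```rust
/// Toy stand-in for a HIR expression (hypothetical, heavily simplified).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// `&[mut] <place>`
    AddrOf { mutable: bool, place: String },
    /// `<operand> as *[mut|const] _`
    CastToRawPtr { mutable: bool, operand: Box<Expr> },
}

/// Toy stand-in for a MIR rvalue (hypothetical, heavily simplified).
#[derive(Debug, Clone, PartialEq)]
enum Rvalue {
    /// Ordinary reference: must satisfy the reference invariants.
    Ref { mutable: bool, place: String },
    /// "Take raw reference": yields a raw pointer in a single step.
    RawRef { mutable: bool, place: String },
    /// Any other cast to a raw pointer.
    Cast { mutable: bool, operand: Box<Rvalue> },
}

fn lower(expr: &Expr) -> Rvalue {
    match expr {
        Expr::CastToRawPtr { mutable, operand } => match &**operand {
            // The special pattern: a reference that is immediately cast.
            // The pointer's mutability is taken from the cast.
            Expr::AddrOf { place, .. } => Rvalue::RawRef {
                mutable: *mutable,
                place: place.clone(),
            },
            other => Rvalue::Cast {
                mutable: *mutable,
                operand: Box::new(lower(other)),
            },
        },
        Expr::AddrOf { mutable, place } => Rvalue::Ref {
            mutable: *mutable,
            place: place.clone(),
        },
    }
}

fn main() {
    let borrow = Expr::AddrOf { mutable: false, place: "packed.field".into() };
    let cast = Expr::CastToRawPtr { mutable: false, operand: Box::new(borrow.clone()) };
    println!("{:?}", lower(&borrow));
    println!("{:?}", lower(&cast));
}
```

In this model a later pass or interpreter only has to look at the variant to
know whether reference invariants apply; no surrounding context is needed.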

When doing unsafety checking, we make references to packed fields that do *not*
use this new "raw reference" operation a *hard error even in unsafe blocks*
(after a transition period). There is no situation in which this code is okay;
it creates a reference that violates basic invariants. Taking a raw reference
to a packed field, on the other hand, is a safe operation as the raw pointer
comes with no special promises. "Unsafety checking" is thus not even a good
term for this; perhaps it should be a special pass dedicated to packed fields
that traverses MIR, or the check could happen when lowering HIR to MIR. Either
way, it has nothing to do with whether we are in an unsafe block or not.
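
The rule this check enforces can be sketched as follows. This is a toy model
under stated assumptions: `Borrow`, `BorrowKind` and `check_packed_borrow` are
hypothetical names, and a real pass would inspect MIR places rather than a
boolean flag.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum BorrowKind {
    Ref,
    RawRef,
}

/// Hypothetical summary of one borrow found while traversing a function body.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
struct Borrow {
    kind: BorrowKind,
    place_is_packed_field: bool,
    /// Recorded only to show that the check never consults it.
    in_unsafe_block: bool,
}

fn check_packed_borrow(b: Borrow) -> Result<(), &'static str> {
    match (b.kind, b.place_is_packed_field) {
        // An ordinary reference to a packed field is rejected everywhere.
        (BorrowKind::Ref, true) => Err("reference to packed field"),
        // Raw references, and borrows of non-packed places, are accepted.
        _ => Ok(()),
    }
}

fn main() {
    let bad = Borrow {
        kind: BorrowKind::Ref,
        place_is_packed_field: true,
        in_unsafe_block: true,
    };
    let good = Borrow { kind: BorrowKind::RawRef, ..bad };
    println!("{:?} {:?}", check_packed_borrow(bad), check_packed_borrow(good));
}
```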

# Drawbacks
[drawbacks]: #drawbacks

It might be surprising that the following two pieces of code are not equivalent:
```rust
// Variant 1
let x = unsafe { &packed.field }; // Undefined behavior!
let x = x as *const _;
// Variant 2
let x = unsafe { &packed.field as *const _ };
```

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

This is a compromise: I see no reasonable way to translate the first variant
shown in the "Drawbacks" section to a raw reference operation, and the second
variant is so common that we likely do not want to rule it out. Hence the
proposal to make them not equivalent.

One alternative to introducing a new primitive operation might be to somehow
exempt "references immediately cast to a raw pointer" from the invariant.
However, it is unclear how that information is supposed to be encoded in the
MIR, and how it is to be maintained by optimizations. We believe that the
semantics of a MIR program, including whether it has undefined behavior, should
be deducible by executing it one step at a time.

Instead of compiling `&[mut] <place> as *[mut|const] _` to a raw reference
operation, we could introduce new surface syntax and keep the existing HIR->MIR
lowering the way it is. However, that would make lots of carefully written
existing code dealing with packed structs have undefined behavior. (There is
likely also lots of code that forgets to cast to a raw pointer, but I see no way
to make that legal -- and the proposal would make such uses a hard error in the
long term, so we should catch many of these bugs.) Also, no good proposal for a
surface syntax has been made yet -- and if one comes up later, this proposal is
forwards-compatible with also having explicit syntax for taking a raw reference
(and deprecating the safe-ref-then-cast way of writing this operation).

# Prior art
[prior-art]: #prior-art

I am not aware of another language with both comparatively strong invariants for
its reference types, and raw pointers. The need for taking a raw reference only
arises because Rust has both of these features.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

None I can think of.
