RFC for an operator to take a raw reference
Showing 1 changed file with 146 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,146 @@ | ||
- Feature Name: raw_reference_operator | ||
- Start Date: 2018-11-01 | ||
- RFC PR: (leave this empty) | ||
- Rust Issue: (leave this empty) | ||
|
||
# Summary | ||
[summary]: #summary | ||
|
||
Introduce a new primitive operator on the MIR level: `&[mut|const] raw <place>` | ||
to create a raw pointer to the given place (this is not surface syntax, it is | ||
just how MIR might be printed). Desugar the surface syntax `&[mut] <place> as | ||
*[mut|const] _` to use this operator, instead of two statements (first take | ||
normal reference, then cast). | ||
|
||
# Motivation | ||
[motivation]: #motivation | ||
|
||
Currently, if one wants to create a raw pointer pointing to something, one has | ||
no choice but to create a reference and immediately cast it to a raw pointer. | ||
The problem with this is that there are some invariants that we want to attach | ||
to references, that have to *always hold*. (This is not finally decided yet, | ||
but true in practice because of annotations we emit to LLVM. It is also the | ||
next topic of discussion in the | ||
[Unsafe Code Guidelines](https://github.com/rust-rfcs/unsafe-code-guidelines/).) | ||
In particular, references must be aligned and dereferenceable, even when they are | ||
created and never used. | ||
|
||
One consequence of these rules is that it becomes essentially impossible to | ||
create a raw pointer pointing to an unaligned struct field: `&packed.field as | ||
*const _` creates an intermediate unaligned reference, which triggers undefined | ||
behavior. Similarly, `&(*raw).field as *const _` is | ||
not just computing an offset of the raw pointer `raw`, it also asserts that the | ||
intermediate shared reference is aligned and dereferenceable. In both cases, | ||
that is likely not what the author of the code intended. | ||
|
||
To fix this, we propose to introduce a new primitive operation on the MIR level | ||
that, in a single statement, creates a raw pointer to a given place. No | ||
intermediate reference exists, so no invariants have to be adhered to. | ||
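
As a rough illustration of the `&(*raw).field` case, the sketch below computes a field pointer from a raw pointer without ever forming a reference. It assumes a toolchain that provides `std::ptr::addr_of!`, which is not part of this RFC's text; the `Header` type and `parse_len` function are hypothetical names chosen for the example.

```rust
use std::ptr;

// Hypothetical packed layout: `len` sits at offset 1, so it is
// generally not 4-byte aligned.
#[repr(C, packed)]
pub struct Header {
    pub kind: u8,
    pub len: u32,
}

/// Reads the `len` field out of a 5-byte buffer laid out like `Header`.
pub fn parse_len(bytes: &[u8; 5]) -> u32 {
    // `Header` has alignment 1 and size 5, so this cast is in bounds.
    let raw: *const Header = bytes.as_ptr().cast::<Header>();
    // SAFETY: `raw` points to 5 readable bytes. `addr_of!` only computes
    // the field address; no `&u32` is created, so no alignment is asserted.
    let len_ptr: *const u32 = unsafe { ptr::addr_of!((*raw).len) };
    // SAFETY: `read_unaligned` has no alignment requirement.
    unsafe { len_ptr.read_unaligned() }
}
```

The point of the sketch is that the only unsafe obligations are about the memory behind `raw`, not about any intermediate reference.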
|
||
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
When working with unaligned or potentially dangling pointers, it is crucial that | ||
you always use raw pointers and not references: References come with guarantees | ||
that the compiler assumes are always upheld, and these guarantees include proper | ||
alignment and not being dangling. Importantly, these guarantees must be | ||
maintained even when the reference is created and never used! The following is | ||
UB (assuming `packed` is a variable of packed type): | ||
|
||
```rust | ||
let x = unsafe { &packed.field }; // `x` is not aligned -> undefined behavior | ||
``` | ||
|
||
There is no situation in which the above code is correct, and hence it is a hard | ||
error to write this. | ||
|
||
The only way to create a pointer to an unaligned or dangling location without | ||
triggering undefined behavior is to take the reference and *immediately* cast it to a raw pointer: | ||
|
||
```rust | ||
let x = unsafe { &packed.field as *const _ }; | ||
``` | ||
|
||
These two operations (taking a reference, casting to a raw pointer) are actually | ||
considered a single operation happening in one step, and hence the invariants | ||
incurred by references do not come into play. | ||
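
A minimal sketch of the single-step operation in use follows. It assumes a current toolchain, where the operation is exposed through the `std::ptr::addr_of!` macro and where recent compilers reject a plain reference to a packed field; neither detail comes from this RFC's text. `Packed` and `read_field` are hypothetical names.

```rust
use std::ptr;

// Hypothetical packed type: `field` is at offset 1 and thus unaligned.
#[repr(C, packed)]
pub struct Packed {
    pub tag: u8,
    pub field: u32,
}

/// Reads `field` through a raw pointer, never through a `&u32`.
pub fn read_field(packed: &Packed) -> u32 {
    // Taking the raw pointer is a safe operation: the pointer carries
    // no alignment or dereferenceability promise.
    let p: *const u32 = ptr::addr_of!(packed.field);
    // SAFETY: `p` points into a live `Packed`, and `read_unaligned`
    // does not require alignment.
    unsafe { p.read_unaligned() }
}
```

Only the read itself needs `unsafe`; creating the pointer does not, which matches the claim that a raw pointer comes with no special promises.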
|
||
# Reference-level explanation | ||
[reference-level-explanation]: #reference-level-explanation | ||
|
||
When translating HIR to MIR, we recognize `&[mut] <place> as *[mut|const] _` as | ||
a special pattern and turn it into a single MIR `Rvalue` that takes the address | ||
and produces it as a raw pointer -- a "take raw reference" operation. This | ||
might be a variant of the existing `Ref` operation (say, a boolean flag for | ||
whether this is raw), or a new `Rvalue` variant. The borrow checker should do | ||
the usual checks on `<place>`, but can just ignore the result of this operation | ||
and the newly created "reference" can have any lifetime. (Currently this will | ||
be some form of unbounded inference variable because the only use is a | ||
cast-to-raw; the new "raw reference" operation can have the same behavior.) | ||
When translating MIR to LLVM, nothing special has to happen as references and | ||
raw pointers have the same LLVM type anyway; the new operation behaves like | ||
`Ref`. | ||
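
The pattern recognition described above can be sketched with a toy model. Everything here is hypothetical: `Expr`, `Rvalue`, and `lower` are simplified stand-ins invented for illustration, not rustc's actual HIR or MIR types, and using the cast target's mutability for the raw reference is a simplifying assumption.

```rust
// Toy surface expressions (hypothetical, not rustc's HIR).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    AddrOf { mutable: bool, place: String },
    Cast { operand: Box<Expr>, to_raw_mut: bool },
    Var(String), // a previously bound value, e.g. a stored reference
}

// Toy MIR rvalues (hypothetical, not rustc's MIR).
#[derive(Debug, Clone, PartialEq)]
pub enum Rvalue {
    Ref { mutable: bool, place: String },
    RawRef { mutable: bool, place: String },
    CastToRaw { operand: Box<Rvalue>, mutable: bool },
    Use(String),
}

pub fn lower(expr: &Expr) -> Rvalue {
    match expr {
        Expr::Cast { operand, to_raw_mut } => match operand.as_ref() {
            // `&[mut] <place> as *[mut|const] _`: one raw-reference rvalue,
            // with no intermediate `Ref`.
            Expr::AddrOf { place, .. } => Rvalue::RawRef {
                mutable: *to_raw_mut,
                place: place.clone(),
            },
            // Any other operand is lowered first, then cast.
            other => Rvalue::CastToRaw {
                operand: Box::new(lower(other)),
                mutable: *to_raw_mut,
            },
        },
        Expr::AddrOf { mutable, place } => Rvalue::Ref {
            mutable: *mutable,
            place: place.clone(),
        },
        Expr::Var(name) => Rvalue::Use(name.clone()),
    }
}
```

In this model, a reference that is bound to a variable before being cast goes through `Ref` and then `CastToRaw`, so the special case applies only when the cast wraps the borrow directly.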
|
||
When interpreting MIR in the miri engine, the engine will recognize that the | ||
value produced by this `Rvalue` has raw pointer type, and hence need not satisfy | ||
any special invariants. | ||
|
||
When doing unsafety checking, we make references to packed fields that do *not* | ||
use this new "raw reference" operation a *hard error even in unsafe blocks* | ||
(after a transition period). There is no situation in which this code is okay; | ||
it creates a reference that violates basic invariants. Taking a raw reference | ||
to a packed field, on the other hand, is a safe operation as the raw pointer | ||
comes with no special promises. "Unsafety checking" is thus not even a good | ||
term for this; maybe it should be a special pass dedicated to packed fields | ||
traversing MIR, or this can happen when lowering HIR to MIR. This check has | ||
nothing to do with whether we are in an unsafe block or not. | ||
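
The rule for packed fields can be sketched as a small check. This is a toy model under stated assumptions: `BorrowKind`, `Borrow`, and `check_packed_borrow` are hypothetical names, and a real pass would walk MIR rather than inspect a flat record.

```rust
// Hypothetical summary of one borrow statement (not a rustc type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BorrowKind {
    Ref,    // ordinary `&` or `&mut`
    RawRef, // the new "raw reference" operation
}

#[derive(Debug, Clone, Copy)]
pub struct Borrow {
    pub kind: BorrowKind,
    pub of_packed_field: bool,
    pub in_unsafe_block: bool,
}

/// Rejects ordinary references to packed fields and accepts everything else.
pub fn check_packed_borrow(b: Borrow) -> Result<(), &'static str> {
    // `in_unsafe_block` is deliberately ignored: the outcome must not
    // depend on whether the borrow sits inside an unsafe block.
    match (b.kind, b.of_packed_field) {
        (BorrowKind::Ref, true) => Err("reference to packed field"),
        _ => Ok(()),
    }
}
```

Carrying `in_unsafe_block` in the record and never reading it is a design choice of the sketch, meant to make the independence from unsafe blocks visible.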
|
||
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
It might be surprising that the following two pieces of code are not equivalent: | ||
```rust | ||
// Variant 1 | ||
let x = unsafe { &packed.field }; // Undefined behavior! | ||
let x = x as *const _; | ||
// Variant 2 | ||
let x = unsafe { &packed.field as *const _ }; | ||
``` | ||
|
||
# Rationale and alternatives | ||
[rationale-and-alternatives]: #rationale-and-alternatives | ||
|
||
This is a compromise: I see no reasonable way to translate the first variant | ||
shown in the "Drawbacks" section to a raw reference operation, and the second | ||
variant is so common that we likely do not want to rule it out. Hence the | ||
proposal to make them not equivalent. | ||
|
||
One alternative to introducing a new primitive operation might be to somehow | ||
exempt "references immediately cast to a raw pointer" from the invariant. | ||
However, it is unclear how that information is supposed to be encoded in the | ||
MIR, and how it is to be maintained by optimizations. We believe that the | ||
semantics of a MIR program, including whether it has undefined behavior, should | ||
be deducible by executing it one step at a time. | ||
|
||
Instead of compiling `&[mut] <place> as *[mut|const] _` to a raw reference | ||
operation, we could introduce new surface syntax and keep the existing HIR->MIR | ||
lowering the way it is. However, that would make lots of carefully written | ||
existing code dealing with packed structs have undefined behavior. (There is | ||
likely also lots of code that forgets to cast to a raw pointer, but I see no way | ||
to make that legal -- and the proposal would make such uses a hard error in the | ||
long term, so we should catch many of these bugs.) Also, no good proposal for a | ||
surface syntax has been made yet -- and if one comes up later, this proposal is | ||
forwards-compatible with also having explicit syntax for taking a raw reference | ||
(and deprecating the safe-ref-then-cast way of writing this operation). | ||
|
||
# Prior art | ||
[prior-art]: #prior-art | ||
|
||
I am not aware of another language with both comparatively strong invariants for | ||
its reference types, and raw pointers. The need for taking a raw reference only | ||
arises because Rust has both of these features. | ||
|
||
# Unresolved questions | ||
[unresolved-questions]: #unresolved-questions | ||
|
||
None I can think of. |