Validity of references: Memory-related properties #77

Closed
RalfJung opened this issue Jan 10, 2019 · 114 comments
Labels
A-validity Topic: Related to validity invariants

Comments

@RalfJung
Member

RalfJung commented Jan 10, 2019

Discussing the memory-related properties of references: does &T have to point to allocated memory (with at least size_of::<T>() bytes being allocated)? If yes, does the memory have to contain data that satisfies the validity invariant of T?

If the answer to both of these questions is "yes", one consequence is that &! is uninhabited: There is no valid reference of type &!.

Currently, during LLVM lowering, we add a "dereferencable" attribute to references, indicating that the answer to the first question should be "yes". This is rather unique in that it is the only case where validity depends on the contents of memory. This opens some new, interesting questions:

  1. I mentioned above that size_of::<T>() many bytes need to be dereferencable. How do we handle unsized types? We could determine the size according to the metadata and the type of the unsized tail. For slices, that's really easy, but for trait objects this involves the vtable, so it would introduce yet another kind of dependency of validity on the memory (see the sketch after this list). However, vtables must not be modified, and they are never deallocated (right?), so this is a fairly weak form of dependency: if a pointer was a valid vtable pointer once, then it always will be.

    With more exotic forms of unsized types, this becomes less easy. extern type we can mostly ignore: we cannot even dynamically know its size, so we basically have to assume it is 0 and check dereferencability for that. But what about custom DSTs? I don't think we want to make validity depend on executing arbitrary user-defined code. We could just check validity for the sized prefix of such an unsized type, but that would introduce an inconsistency between primitive DSTs and user-defined custom DSTs. Is that a problem?

    For unsized types, even the requirement that the pointer be well-aligned becomes subtle, because determining the alignment has similar issues to determining the size.

  2. What about validity of ManuallyDrop<&T>? ManuallyDrop<T> certainly shares all the bit-level properties of T, because we perform layout optimization on it. But does ManuallyDrop<&T> have to be dereferencable?
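
For illustration (a sketch of mine, not part of the original issue text), the dependency described in point 1 is visible with std::mem::size_of_val and align_of_val: for a slice, the dynamic size follows purely from the length metadata, while for a trait object it has to be read out of the vtable that the wide pointer carries.

use std::fmt::Debug;
use std::mem::{align_of_val, size_of_val};

fn main() {
    let slice: &[u16] = &[1, 2, 3];
    // Slice metadata is just the element count; no memory access is needed.
    assert_eq!(size_of_val(slice), 3 * std::mem::size_of::<u16>());

    let obj: &dyn Debug = &0u64;
    // Trait-object size and alignment are looked up in the vtable behind the wide pointer.
    assert_eq!(size_of_val(obj), std::mem::size_of::<u64>());
    assert_eq!(align_of_val(obj), std::mem::align_of::<u64>());
}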

Note that this is not about aliasing or provenance; those should be discussed separately -- a bunch of open issues already exist for provenance in general and stacked borrows specifically.

CURRENT STATE: The thread is long and contains many positions without a good summary. My own latest position can be found here.

@RalfJung RalfJung added active discussion topic A-validity Topic: Related to validity invariants labels Jan 10, 2019
@RalfJung
Member Author

Notice that there is an alternative that would make validity not depend on memory, while maintaining the dereferencable attribute: the Stacked Borrows aliasing model includes an operation on references called "retagging" that, among other things, raises UB if the reference is not dereferencable. So, if we answer the two questions from the OP with "yes" and "no", respectively, we could equivalently say that validity does not make any requirements about references being dereferencable, but the aliasing model does. That would make validity be a property that only depends on raw bits, not on memory, which would simplify the discussion elsewhere (and resolve #50 (comment)).

With this approach, the properties of ManuallyDrop<&T> would be determined by whether retagging descends into the fields of ManuallyDrop or not.

@RalfJung
Member Author

Concerning the second question, my personal thinking is that we should not require the pointed-to memory to be valid itself.

One good argument for making things UB via a strict validity invariant is that it helps bug-checking tools; but in this case, actually doing recursive checking of references all the time is really costly, and would probably make it near impossible to develop useful tools.

On the other hand, it is very useful when writing unsafe code to be able to pass around a &mut T to some uninitialized data, and have another function write through that reference to initialize it. If we say that valid references must point to valid data, this pattern becomes UB. As a consequence, we would then have to offer tons of new APIs in libstd that take raw pointers instead of references, so that they can be used for initialization.

Some examples:
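
A minimal sketch of the initialization pattern described above (not one of the APIs originally referenced; it uses MaybeUninit rather than a bare &mut T so the sketch is uncontroversial under either answer, and the helper name is made up):

use std::mem::MaybeUninit;

// Hypothetical initializer: writes the pointee through the reference
// without ever reading the uninitialized bytes.
fn init_in_place(slot: &mut MaybeUninit<u32>) {
    slot.write(42);
}

fn main() {
    let mut slot = MaybeUninit::<u32>::uninit();
    init_in_place(&mut slot);
    // Sound only because `init_in_place` fully initialized the value.
    let value = unsafe { slot.assume_init() };
    assert_eq!(value, 42);
}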

@nikomatsakis
Contributor

So @arielb1, for example, has traditionally maintained that having &T require that its referent is valid would invalidate far too much code. I'm inclined to agree. I think that our idea for ! patterns kind of assuaged my concerns about how to handle &!, so I feel comfortable with making the validity invariant shallow. (The argument about &mut T to an uninitialized T is also strong.)

I am also intrigued by this comment from @RalfJung :

Notice that there is an alternative that would make validity not depend on memory,

That seems like a very good property to have. I am inclined to pursue this approach, personally.

@nagisa
Member

nagisa commented Feb 4, 2019

I (personally) think that not considering & to be "pointers" is the only sensible solution here (similar to how C++ does it, where references behave more like plain values than like pointers). My motivating example is that, given a function like this:

fn generic<T>(foo: &T) {
    // body
}

it is way too easy to end up with something that will most likely indefinitely stay UB for T = ! where code would otherwise be valid for all other Ts. Making &! uninhabited avoids this problem altogether and we may be able to relax this later on if we figure out that:

  1. There are convincing use cases for &! not being uninhabited;
  2. We can make all the safe constructs for &T well-defined for &!.

@arielb1

arielb1 commented Feb 4, 2019

it is way too easy to end up with something that will most likely indefinitely stay UB for T = ! where code would otherwise be valid for all other Ts.

Could you come up with such an example - that is UB for T = ! but not UB for say T = bool and foo containing a pointer to an invalid bool?

@nagisa
Member

nagisa commented Feb 4, 2019

@arielb1 I do not think I'm able to come up with an example (is there one?) where it would not be UB if the bool had an invalid bit pattern; but for &bool it is at least possible to produce a reference to something valid at all.

I realized since I last wrote that comment that, in order to obtain a &!, unsafe code is necessary one way or the other (even though @RalfJung says this is a property of the safety system, not the value-validity system, and those are independent). With that in mind, I'm fine with whatever ends up being decided here.

@RalfJung
Member Author

RalfJung commented Feb 5, 2019

We talked about this at the all-hands. @cramertj expressed interest in &! being uninhabited, to be able to optimize away functions as dead code. @Centril noted that, in particular in relation to matches, if we just make validity of &T recursive, there is no question about automatically going below reference types in a match, such as in

fn foo<T>(x: &!) -> T { match x { } }

Even in unsafe code, the match can never cause issues on its own; the reference would already be invalid, and hence you'd have UB earlier.

I believe we should handle all types consistently, meaning that if &! is uninhabited (from a validity perspective, not just from a safety perspective), then we should also say that &bool is UB if it does not point to a valid bool, and so on.

One issue with this is that it makes validity very hard to check for in a UB checker like Miri, or in a valgrind tool: you'd have to do a recursive walk following all the pointers. Also, it is unclear how much optimizations would benefit from this (beyond removing dead code for &!), because a value that used to be valid at some point might become invalid later when the contents of memory change.
Also, new hard questions then pop up about the interaction with Stacked Borrows, where I think it might be hard to make sure that, transitively through the pointer chain, all the aliasing works out the right way. Retagging is currently a key ingredient for this, but if we do this transitively we'd have to Retag references that are stored in memory, which I don't think we want to do -- magically modifying memory seems like a bad idea.

@RalfJung
Member Author

RalfJung commented Feb 5, 2019

it is way too easy to end up with something that will most likely indefinitely stay UB for T = ! where code would otherwise be valid for all other Ts. Making &! uninhabited avoids this problem altogether.

I don't understand what you are saying here. Making &! uninhabited makes strictly more programs UB? How is that supposed to solve problems with programs being UB?

@RalfJung
Member Author

RalfJung commented Feb 5, 2019

Also one thing @Centril brought up at the all-hands: we need more data. In particular, we should figure out if there are interesting patterns of unsafe code that rely on having references to invalid data, and that would be too disruptive to convert to raw pointers or too widely used to break.

@RalfJung
Member Author

RalfJung commented Feb 8, 2019

One issue with requiring references to be transitively valid: we have a whole bunch of existing reference-based APIs, such as for slices, that we could then not use. I expect this to cause a lot of trouble with existing code, but I am not sure.


Another proposal for references that enables @cramertj's optimizations could be: if a reference's validity depends on memory in complex ways, we will need a notion of "bitstring validity". (Avoiding that is one argument for shallow validity, IMO.) We could define validity of a reference to require that the pointee is bitstring-valid. This makes checking validity feasible and enables some optimizations. However, it would mean that &! is uninhabited while &&! is not.

@CAD97

CAD97 commented Mar 13, 2019

Another data point is rust-lang/rfcs#2645 (FCP merge), which theoretically will allow transmuting between &mut T and &mut MaybeUninit<T>; that removes some of the pressure of wanting to use &mut T to point to uninitialized memory.

I'm in favor of the validity invariant of &_ being shallow, but with the borrowing model requiring some amount of deep validity (though it could potentially be just one dereference deep) at most usages of the type, including reborrows for function calls.
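
A minimal sketch of the reference-level cast behind the transmute mentioned above (my illustration, not code from the RFC). Going from initialized to possibly-uninitialized is the easy direction; the reverse is only sound once the memory actually holds a valid T:

use std::mem::MaybeUninit;

// T and MaybeUninit<T> have the same size and alignment, so this cast only
// reinterprets the pointee type; it never reads the memory.
fn as_uninit_mut<T>(x: &mut T) -> &mut MaybeUninit<T> {
    unsafe { &mut *(x as *mut T as *mut MaybeUninit<T>) }
}

fn main() {
    let mut x = 5u32;
    as_uninit_mut(&mut x).write(7);
    assert_eq!(x, 7);
}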

@RalfJung
Member Author

@CAD97 that RFC is not necessary for the transmute you mentioned -- it is only needed if we want to pass things by value. In memory, T and MaybeUninit<T> are already the same (as in, they have the same size and alignment; there might be a difference for layout optimizations).

@petertodd

Data point for a use-case where a reference to invalid data is useful: https://users.rust-lang.org/t/am-i-triggering-undefined-behavior-here/28364/10

tl;dr:

/// A byte array with the same size as type `T`
///
/// If `T: !Sized`, this size may be dynamic.
#[repr(packed)]
pub struct Bytes<T: ?Sized> {
    buf: ManuallyDrop<T>,
}

where Bytes<T> is never created directly, but rather is always used via a reference such as &Bytes<T> or &mut Bytes<T>.

Counterpoint: if MaybeUninit<T> accepted T: ?Sized the reference would be valid, and thus this wouldn't be an issue.

@gnzlbg
Contributor

gnzlbg commented May 21, 2019

@petertodd can you walk me through why you think that &Bytes<T> is invalid in that case? I don't see any UB in the code you show in the post.

@RalfJung
Member Author

"A byte array with the same size as type T" where T: ?Sized... that's weird.^^

We don't have a story currently for uninitialized unsized data -- and given some of the plans for custom DSTs, it will not be possible to support that in general (namely, for "thin" DSTs).

@gnzlbg
Contributor

gnzlbg commented May 21, 2019

I don't follow. I don't see anything wrong with Bytes<T>. IIUC, @petertodd always uses it when T is properly initialized. If anything, there would be something wrong in the conversions from &[u8] to &Bytes<T> and vice versa, but I don't see why @petertodd cannot use &[MaybeUninit<u8>] there instead; although, depending on #71, that might not be necessary and @petertodd's code might be correct.

@RalfJung
Member Author

@gnzlbg the issue is in this function:

impl<T> Bytes<T> {
    pub fn from_slice(buf: &[u8]) -> &Bytes<T> {
        assert_eq!(buf.len(), mem::size_of::<T>());
        unsafe {
            &*Self::from_ptr(buf.as_ptr() as *const T)
        }   
    }
}

I can use that to turn a &[u8; 1] into a &Bytes<bool> even if the array contains a 3.
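
A self-contained sketch of the same concern without the Bytes wrapper (my illustration, not code from the thread): the cast below produces a &bool whose pointee holds the bit pattern 3; whether merely creating that reference is UB is exactly the question of this issue.

fn main() {
    let buf = [3u8; 1];
    // Aligned and dereferencable, but the pointee is not a valid `bool`.
    let r: &bool = unsafe { &*(buf.as_ptr() as *const bool) };
    // Use the reference without ever reading through it.
    let _addr = r as *const bool;
}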

@gnzlbg
Contributor

gnzlbg commented May 21, 2019

That function should be unsafe, and then the caller needs to assert the validity of the buf. Or is there an expectation that we will ever be able to do better than that?

@RalfJung
Member Author

If we don't make validity of references recursive (and my opinion still is that we should not :D ), then that code is just fine.

@petertodd

@gnzlbg So the full use case where I came up with that beast is in-place "deserialization", e.g. for memmapped files (which, as @RalfJung correctly noted elsewhere, have issues with other processes changing them; but for the sake of argument assume that has been solved).

As we know, for many types not all bit-patterns are valid. Thus we can have an API along the lines of:

unsafe trait Validate {
    type Error;
    /// Validate that `buf` represents a valid instance of `Self`.
    fn validate(buf: &Bytes<Self>) -> Result<(), Self::Error>;
}

The validate() function is safe to call, because Bytes can only be (safely) created from byte slices for which all bytes in the slice are initialized; Validate is unsafe to implement, because other code depends on it actually working.

This is why Bytes has a safety problem: the whole point is to verify that the bytes are valid for the type in question.

Unsized Types

So how do unsized types fit into all this? See https://users.rust-lang.org/t/am-i-triggering-undefined-behavior-here/28364/9

As @RalfJung correctly notes, it's not clear that you can always (or will always be able to) determine the size of an unsized type's value from pointer metadata. However, it's certainly possible to do this for a subset of types, such as slices. So I simply made a Pointee trait for the subset of types that can do this -- essentially implementing part of what a custom DST API would do.

But MaybeUninit doesn't (yet) support unsized types, which leads me to this problem.

Alternative Solution

Wrap a pointer instead:

struct Bytes<'a, T: ?Sized> {
    marker: PhantomData<&'a ()>,
    ptr: *const T,
}

Which is fine for me - not quite as nice an API for what I'm doing, as I'll need BytesMut etc., but it'll work. I'm just bringing all this up to give an example of a potential use-case where validity of references matters.

@RalfJung
Member Author

Another case of libstd manifesting invalid references.

@RalfJung
Member Author

See rust-lang/rust-memory-model#2 for some earlier discussion of basically the same subject.

jrvanwhy added a commit to jrvanwhy/libtock-rs that referenced this issue May 20, 2021
According to the current Rust Reference [1], storing an uninitialized `u8` is undefined behavior. This may change in the future [2], but for now we should continue to assume it is undefined behavior.

Every use of `core::mem::uninitialized` in `ufmt` is to create a local `[u8; _]`, and therefore is an example of this undefined behavior. I removed the undefined behavior in the simplest way possible, which is to replace the initializers with `[u8; _]`.

[1] https://doc.rust-lang.org/reference/behavior-considered-undefined.html
[2] rust-lang/unsafe-code-guidelines#77
bors bot added a commit to tock/libtock-rs that referenced this issue May 20, 2021
306: Remove uses of `core::mem::uninitialized` from `ufmt`. r=jrvanwhy a=jrvanwhy

According to the current Rust Reference [1], storing an uninitialized `u8` is undefined behavior. This may change in the future [2], but for now we should continue to assume it is undefined behavior.

Every use of `core::mem::uninitialized` in `ufmt` is to create a local `[u8; _]`, and therefore is an example of this undefined behavior. I removed the undefined behavior in the simplest way possible, which is to replace the initializers with `[u8; _]`.

[1] https://doc.rust-lang.org/reference/behavior-considered-undefined.html
[2] rust-lang/unsafe-code-guidelines#77

Co-authored-by: Johnathan Van Why <jrvanwhy@google.com>
JohnTitor added a commit to JohnTitor/rust that referenced this issue May 17, 2022
interpret/validity: reject references to uninhabited types

According to https://doc.rust-lang.org/reference/behavior-considered-undefined.html, this is definitely UB. And we can check this without actually looking up anything in memory; we just need the reference value and its type, which makes this a great candidate for a validity invariant IMO and my favorite resolution of rust-lang/unsafe-code-guidelines#77.

With this PR, Miri with `-Zmiri-check-number-validity` implements all my preferred options for what the validity invariants of our types could be. :)

CTFE has been doing recursive checking anyway, so this is backwards compatible but might change the error output. I will submit a PR with the new Miri tests soon.

r? `@oli-obk`
@JakobDegen
Contributor

I'm going to reproduce a comment I made over in #346 here, since I believe it is a new point in favor of requiring some amount of validity on Retag operations:

So I'm going to contest the claim that there's no known optimization benefit to this. The code below gives an example. It uses & instead of &mut but presumably the intent is that this should apply there equally. The code is nearly identical to code I gave in the write-up linked above, although the motivation is different.

enum E {
    A(UnsafeCell<u8>),
    B(u8),
}

let b = E::B(0);
opaque_func(&b);
assert_eq!(b, E::B(0));

The goal is to optimize out the assert. However, it is unclear how to justify this optimization. Specifically, we pass a &E to the opaque function, but we must somehow know that if the "active variant" of b is A, then the passed reference has write permission to the second byte of the enum, but if the active variant is B, then it only has read permission. The obvious way to implement this is to read from memory and check every time we do a retag operation.

This necessarily means that retagging (which happens basically every time a reference is copied) must assert validity conditions in at least some cases. We have some control over when this happens though. For example, there's no need to assert validity when retagging a &u8.

There are alternatives to this model as well -- for example, we could try to adjust the aliasing model to make this unnecessary, by having some rule like "retagged references get the same permissions as their parents" or such. It's unclear how precisely to do this, though -- a couple of naive ideas I've thought of don't work -- and so this would probably need quite some investigation.

@RalfJung
Member Author

RalfJung commented Jul 9, 2022

Indeed, that is the canonical example for having fully precise UnsafeCell tracking in Stacked Borrows. That's a different issue, #236, but it bears some relation to this one: doing fully precise UnsafeCell tracking requires checking some amount of validity behind references, on a retag. It would not require checking validity of &bool though, so your optimization is not an argument in favor of general validity-behind-references.

That said, I would find it odd to have some validity but not full validity. So if we want fully precise UnsafeCell tracking then I think we should also do proper validation. Then we could still either do that checking just one level deep, or fully recursively.

@RalfJung
Member Author

RalfJung commented Jul 12, 2022

(Relaying from #346 (comment))

I think a general "check validity behind references" is not possible. Consider:

// This function can be called with `rc` and `unique` pointing to
// overlapping regions of memory.
fn evil_alias<T>(rc: &RefCell<T>, unique: &mut T) {}

fn proof<T>(x: T) {
    let rc = RefCell::new(x);
    let mut bmut = rc.borrow_mut();
    evil_alias(&rc, &mut *bmut);
}

Now when evil_alias is called, to validate that rc points to a valid RefCell<T>, we have to read the T in there. But that means we are reading from memory that unique points to, with a different pointer, which violates the uniqueness assumptions of &mut!

@RalfJung RalfJung closed this as completed Jun 6, 2023