addr_of[_mut]! docs should be more precise about what's sound #114902

Closed
joshlf opened this issue Aug 16, 2023 · 21 comments · Fixed by #117572
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. T-opsem Relevant to the opsem team

Comments

@joshlf
Contributor

joshlf commented Aug 16, 2023

This example comes from this URLO thread, which in turn was inspired by this zerocopy issue.

The addr_of! and addr_of_mut! docs imply that they can handle unaligned pointers so long as a reference is never materialized from that pointer (which would be insta-UB). However, running the following program under Miri fails:

fn main() {
    #[repr(C, packed)]
    struct Unalign<T>(T);
    
    #[repr(C)]
    struct Foo {
        a: u8,
        b: u16,
    }
    
    // Has alignment `align_of::<T>()`, and the `Unalign<T>`
    // is at byte offset 1; so long as `align_of::<T>() > 1`,
    // the contained `T` is misaligned.
    #[repr(C)]
    struct Misalign<T>(u8, Unalign<T>, [T; 0]);
    
    let u = Misalign(0, Unalign(Foo{ a: 1, b: 2 }), []);
    let u_ptr: *const Unalign<Foo> = &u.1;
    // Sound because `Unalign` contains a `T` and nothing else.
    let f_ptr: *const Foo = u_ptr.cast();
    // Should be sound because we never construct a reference.
    let addr_of_b: *const u16 = unsafe { core::ptr::addr_of!((*f_ptr).b) };
    println!("{:?}", addr_of_b);
}

Here's the failure:

error: Undefined Behavior: accessing memory with alignment 1, but alignment 2 is required
  --> src/main.rs:22:42
   |
22 |     let addr_of_b: *const u16 = unsafe { core::ptr::addr_of!((*f_ptr).b) };
   |                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ accessing memory with alignment 1, but alignment 2 is required
   |
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
   = note: BACKTRACE:
   = note: inside `main` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:1980:5: 1980:22
   = note: this error originates in the macro `core::ptr::addr_of` (in Nightly builds, run with -Z macro-backtrace for more info)

@cuviper pointed out my issue:

The example shows an unaligned field (since it's packed), but that doesn't excuse everything. "Note, however, that the expr in addr_of!(expr) is still subject to all the usual rules." -- I think this would include alignment for (*f_ptr), although the docs only call out null pointers.

Miri is happy with addr_of!((*u_ptr).0.b).
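A minimal sketch of that accepted variant, reusing the types from the example above. The helper name `read_b_via_packed_path` is hypothetical. The projection starts from the align-1 `Unalign<Foo>` pointer rather than from a `*const Foo`, and the resulting pointer is only read with `read_unaligned`:

```rust
use core::ptr::addr_of;

#[allow(dead_code)]
#[repr(C, packed)]
struct Unalign<T>(T);

#[allow(dead_code)]
#[repr(C)]
struct Foo {
    a: u8,
    b: u16,
}

#[allow(dead_code)]
#[repr(C)]
struct Misalign<T>(u8, Unalign<T>, [T; 0]);

/// Hypothetical helper: returns the byte offset of `b` inside
/// `Misalign<Foo>` together with the value read from it.
fn read_b_via_packed_path() -> (usize, u16) {
    let u = Misalign(0, Unalign(Foo { a: 1, b: 2 }), []);
    let base = addr_of!(u).cast::<u8>();
    let u_ptr: *const Unalign<Foo> = addr_of!(u.1);
    // The only deref is of a pointer to the align-1 `Unalign<Foo>`, so the
    // place expression never claims more alignment than the memory has.
    let b_ptr: *const u16 = unsafe { addr_of!((*u_ptr).0.b) };
    let offset = b_ptr as usize - base as usize;
    // `b_ptr` sits at an odd address, so it must be read unaligned.
    let value = unsafe { b_ptr.read_unaligned() };
    (offset, value)
}

fn main() {
    let (offset, value) = read_b_via_packed_path();
    println!("b at offset {offset}, value {value}");
}
```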

The docs strongly imply that this would be sound. It does this in two ways. First, when contrasting references and raw pointers, it specifically calls out that references need to be aligned:

Creating a reference with &/&mut is only allowed if the pointer is properly aligned and points to initialized data. For cases where those requirements do not hold, raw pointers should be used instead.

Second, it seems to enumerate the cases in which addr_of! would be unsound, and operating on unaligned pointers isn't on the list:

Note, however, that the expr in addr_of!(expr) is still subject to all the usual rules. In particular, addr_of!(*ptr::null()) is Undefined Behavior because it dereferences a null pointer.

IMO the docs as currently written are both vague and misleading with respect to what's allowed. It would be good if the docs were more precise on exactly what is and isn't allowed. I was pretty confident in the soundness of the code I'd written, and only discovered the issue thanks to Miri.
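For contrast, the case the quoted docs clearly do cover is taking the address of a field that is itself unaligned, without ever forming a reference to it. A short sketch (the `Packed` type and `read_f2` helper are illustrative names, not taken from the thread):

```rust
use core::ptr::addr_of;

#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    f1: u8,
    f2: u16,
}

/// Hypothetical helper: reads the possibly misaligned field `f2`.
fn read_f2(p: &Packed) -> u16 {
    // `&p.f2` would create a reference to a misaligned `u16`, which is not
    // allowed. `addr_of!` goes straight from the place to a raw pointer.
    let raw_f2: *const u16 = addr_of!(p.f2);
    // The pointer may be misaligned, so use `read_unaligned`.
    unsafe { raw_f2.read_unaligned() }
}

fn main() {
    let packed = Packed { f1: 1, f2: 2 };
    println!("{}", read_f2(&packed));
}
```

Here the base place (`*p`, behind an ordinary reference to an align-1 type) is properly aligned; only the field is not. The example in this issue differs in that the base pointer `f_ptr` is itself misaligned for its pointee type.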

rustbot added the needs-triage label Aug 16, 2023
scottmcm added the T-libs-api label Aug 16, 2023
@ChrisDenton
Member

Maybe @rust-lang/opsem have some thoughts on precise wording that should be used.

@RalfJung
Member

RalfJung commented Aug 16, 2023

The docs strongly imply that this would be sound.

That is unfortunate, since when I read them I find them fairly clearly stating this to be not sound. :/ Our mental models must be quite different here.

I wonder, if the code did addr_of!(***ptr), would you be surprised that the two loads from memory that are happening here are still subject to the usual alignment rules? Your case is really another instance of the same thing -- each of these three * operators requires alignment. It's as simple as that. The addr_of! only becomes relevant after the place expression ***ptr was evaluated to a place.

Sadly there are many inaccurate mental models out there about how place and value expression evaluation interacts. You are by far not the first to be confused by these docs. That's why we added the part about addr_of!(*ptr::null()) being UB -- I would assume that conflicts with your model, so my hope was that people with the wrong model would read that part and then realize they took a wrong turn somewhere. (Or maybe your model makes that UB but allows your code... but then I can't even imagine what your model is like, unfortunately.)

I think the proper model here is beautiful and I'd love to teach it to everyone... but maybe, this one time, we don't actually have to. If rust-lang/reference#1387 gets accepted, your code will become sound. addr_of!(**ptr::null()) will still be UB but I don't think anyone is surprised by that. The "wrong" model isn't actually wrong any more, it is just unnecessarily complicated in my eyes, but it should produce the right predictions. At least I hope so.

@saethlin
Member

I think that any invalid example involving ptr::null() is too easily hand-waved away; null is special-cased for reads so it does not necessarily register as surprising that it is also not allowed here.

@joshlf
Contributor Author

joshlf commented Aug 16, 2023

The docs strongly imply that this would be sound.

That is unfortunate, since when I read them I find them fairly clearly stating this to be not sound. :/ Our mental models must be quite different here.

Yeah, and I wish I had your mental model! I know what it's like to be on the other side of "I don't see what's confusing about this for you", so I'll try as best as I can to articulate how I interpreted the docs.

I wonder, if the code did addr_of!(***ptr), would you be surprised that the two loads from memory that are happening here are still subject to the usual alignment rules? Your case is really another instance of the same thing -- each of these three * operators requires alignment. It's as simple as that. The addr_of! only becomes relevant after the place expression ***ptr was evaluated to a place.

Maybe I'm missing something, but you said that there are two loads - doesn't that mean that the final dereference (since there are three of them) isn't performing a load? (And, by implication, that final dereference doesn't need to be properly aligned?)

Maybe this is the issue: In my mental model, "dereference" doesn't necessarily mean "load". A few examples of why that's my mental model:

  • In C, you can do ptr->foo = bar - that's a dereference but not a load of the referent of ptr
  • (This one I'm less sure about) Also in C, you can do *ptr = foo - also a dereference but not a load of the referent
  • In Rust, doing &*ptr is the way you spell "cast this to a reference", and I've always assumed this doesn't actually perform a load of the referent of ptr? (though, as the docs mention, materializing an invalid reference is UB, so it's still dangerous for other reasons)

For that reason, my mental model of something like addr_of!((*ptr).foo) is "give me the address that I'd be storing into if I did (*ptr).foo = bar". That's reinforced by code like the following:

fn main() {
    let mut x = (0usize,);
    let x_ptr: *mut (usize,) = &mut x;
    unsafe { (*x_ptr).0 = 1 };
    println!("{}", x.0); // prints 1
}

My reading of this example is that, if *x_ptr loaded from x_ptr, then it would create a temporary, and so the assignment would have no effect. The fact that it has an effect implies to me that *x_ptr is closer to a "place" (I'm using that term loosely to mean "thing I can assign into", not necessarily in its precise Rust meaning).
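The distinction being reached for here can be made concrete. In the sketch below (the function name `place_vs_value` is hypothetical), the same `*x_ptr` is used once where a value is needed, which copies the tuple out of memory, and once as the target of an assignment, where it only names a location:

```rust
/// Hypothetical helper: returns what `x.0` holds after writing to a copy
/// and after writing through the place.
fn place_vs_value() -> (usize, usize) {
    let mut x = (0usize,);
    let x_ptr: *mut (usize,) = &mut x;

    // Value context: `*x_ptr` is loaded (copied) into a new local.
    let mut copy = unsafe { *x_ptr };
    copy.0 = 1; // changes only the copy
    let after_copy_write = unsafe { (*x_ptr).0 };

    // Place context: `(*x_ptr).0` names the field in `x` itself.
    unsafe { (*x_ptr).0 = 2 };
    let after_place_write = unsafe { (*x_ptr).0 };

    let _ = copy;
    (after_copy_write, after_place_write)
}

fn main() {
    println!("{:?}", place_vs_value());
}
```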

In your example of addr_of!(***ptr), I'd assume that *ptr is loaded - let's call it ptr2 - and then **ptr (in other words, *ptr2) is loaded - let's call it ptr3 - and then ***ptr (in other words, *ptr3) constructs a place expression but does not result in a load from the memory location addressed by ptr3.

Sadly there are many inaccurate mental models out there about how place and value expression evaluation interacts. You are by far not the first to be confused by these docs. That's why we added the part about addr_of!(*ptr::null()) being UB -- I would assume that conflicts with your model, so my hope was that people with the wrong model would read that part and then realize they took a wrong turn somewhere. (Or maybe your model makes that UB but allows your code... but then I can't even imagine what your model is like, unfortunately.)

Honestly, I think I read the docs like this:

Docs: "Note, however, that the expr in addr_of!(expr) is still subject to all the usual rules."
Me: "Hmmm, I'm not sure what "all the usual rules" are."
Docs: "In particular, addr_of!(*ptr::null()) is Undefined Behavior because it dereferences a null pointer."
Me: "Oh, I guess that's the only requirement, cool. Not sure why they said "rules" plural. Oh well."

I think the proper model here is beautiful and I'd love to teach it to everyone... but maybe, this one time, we don't actually have to. If rust-lang/reference#1387 gets accepted, your code will become sound. addr_of!(**ptr::null()) will still be UB but I don't think anyone is surprised by that. The "wrong" model isn't actually wrong any more, it is just unnecessarily complicated in my eyes, but it should produce the right predictions. At least I hope so.

Yeah I would absolutely love to have the correct model in my head! If I had to speculate, I'd say that part of the disconnect between folks like you - who work on this stuff, and especially its internals - and folks like me - who merely consume these semantics - is that it'd be a lot harder for your mental model to be wrong-but-accidentally-not-causing-any-problems. For me, it's easy to feel like I understand things, and so long as I come to the correct conclusions (where "correct" means something like "write code that other programmers and Miri agree is sound"), I never realize I'm wrong until I come to edge cases like these. By contrast, it'd be a lot harder for you to persist with an incorrect mental model because, when working on the compiler internals, much smaller/finer-grained errors in your model have practical consequences than is the case for me.

I think about how confusing it used to be for me to reason about indexing/off-by-one errors/etc when I was first learning to program, and now it's not just something I'm good at, it feels obvious and easy. It's hard for me to empathize with what it'd be like not to have an intuitive grasp of those concepts. I think this is basically a Hard Problem. I don't really know of any way past it besides a) lots of navel gazing about how you came to your current understanding and, b) trying out your explanations on people (and then seeing if they can expand on the model they've acquired from those explanations, and if those expansions are correct).

On a more pragmatic note, I think there are two pieces of low-hanging fruit here that would make things a lot better:

  • Canonical locations for definitions, and more linking between docs. E.g., "all the usual rules" doesn't help me if I don't already know what that refers to. It was only while writing this bullet point that I thought to look and discovered that "place" is a well-documented term.
  • In the canonical definitions, lists of examples (e.g., this code is sound, this code is unsound, this code is sound, this code is unsound, etc). That gives folks a way of testing their own understanding.

@digama0
Contributor

digama0 commented Aug 17, 2023

Maybe I'm missing something, but you said that there are two loads - doesn't that mean that the final dereference (since there are three of them) isn't performing a load? (And, by implication, that final dereference doesn't need to be properly aligned?)

The first sentence is true, but the "by implication" is not. The final dereference produces a place, exactly as you have described, but constructing a place already adds the requirement of being properly aligned, same as you would get if you put a & on the front. This is what the example involving *ptr::null() was supposed to indicate: the place itself has to be valid, even though a load is not actually happening because it is wrapped in addr_of!(_).

We found this behavior confusing and foot-gunny as well, which is why rust-lang/reference#1387 documents the new behavior, that constructing a place on its own doesn't add any requirements.

@RalfJung
Member

RalfJung commented Aug 17, 2023 via email

saethlin added the T-opsem label and removed the needs-triage label Aug 17, 2023
@joshlf
Contributor Author

joshlf commented Aug 17, 2023

Sounds good!

A few more clarifying questions:

First, do field accesses generate loads? E.g.:

let p: *const (u8, (u16, (u32, u64))) = ...;
addr_of!((*p).1.1.1);

Does this addr_of! expression perform any loads? If not, then under the new model, is this also sound even if p is unaligned?

Second, I assume that indirecting through a reference generates a load unconditionally? E.g.:

let p: *const (u8, &'static (u16, (u32, u64))) = ...;
addr_of!((*p).1.1.1);

I assume that this generates a single load (namely, it loads the value of the p.1 (the &'static) so that it can then calculate the offset of .1.1 within the type (u16, (u32, u64)) and add that offset to the address stored in p.1)? I assume that, under the new model, p can not be unaligned because this generates a load from p.1?

@RalfJung
Member

Under the rules as currently documented and enforced by Miri, loads are irrelevant for alignment. Alignment is checked on *, i.e. on derefs. This implies that all loaded-from values are aligned, so a separate check at load time is not needed.

Under rust-lang/reference#1387, it is only loads that generate alignment requirements. Field accesses do not require alignment, but they do require "inbounds" (ptr.offset rules).
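A sketch of what that means for the first question, assuming a toolchain that implements the rust-lang/reference#1387 rules (under the rules Miri enforced when this thread started, the `*p` below would be flagged). All type and function names are illustrative, and `repr(C)` structs stand in for the tuples in the question because tuple layout is unspecified. The pointer `p` is misaligned, the projection stays inside the buffer, and nothing is loaded until the explicit `read_unaligned`:

```rust
use core::ptr::addr_of;

#[allow(dead_code)]
#[repr(C)]
struct Inner {
    x: u16,
    y: u32, // offset 4 within `Inner`
}

#[allow(dead_code)]
#[repr(C)]
struct Outer {
    tag: u8,
    inner: Inner, // offset 4 within `Outer`, so `inner.y` is at offset 8
}

#[repr(C, align(4))]
struct Buf([u8; 16]);

/// Hypothetical helper: projects to `inner.y` through a misaligned pointer.
fn project_misaligned() -> (usize, u32) {
    let mut buf = Buf([0; 16]);
    // `Outer` will start at byte 1, so `inner.y` occupies bytes 9..13.
    buf.0[9..13].copy_from_slice(&0xDEAD_BEEFu32.to_ne_bytes());

    let base: *const u8 = buf.0.as_ptr();
    // In bounds: 1 + size_of::<Outer>() = 13 <= 16. Misaligned for `Outer`.
    let p: *const Outer = unsafe { base.add(1) }.cast();
    // Field projection only: needs "inbounds", performs no load.
    let y_ptr: *const u32 = unsafe { addr_of!((*p).inner.y) };
    let offset = y_ptr as usize - base as usize;
    // The load happens here, and it is explicitly unaligned.
    let value = unsafe { y_ptr.read_unaligned() };
    (offset, value)
}

fn main() {
    let (offset, value) = project_misaligned();
    println!("y at buffer offset {offset}, value {value:#x}");
}
```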

For the second point, yes -- that code desugars to addr_of!((*(*p).1).1.1) which desugars to addr_of!((*load(*load(p)).1).1.1) and each load requires alignment. (The first load here is just the load from the local itself, which is always aligned since the compiler ensures it is allocated in an aligned way.)
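The reference case can be sketched with the inner deref written out explicitly. Names here are illustrative and `repr(C)` structs replace the tuples. Because the reference stored in `r` has to be loaded from `*p`, the caller must supply an aligned, valid `p`; the result points into the referenced `Inner`, not into `Outer`:

```rust
use core::ptr::addr_of;

#[allow(dead_code)]
#[repr(C)]
struct Inner {
    x: u16,
    y: u32,
}

#[allow(dead_code)]
#[repr(C)]
struct Outer {
    tag: u8,
    r: &'static Inner,
}

static INNER: Inner = Inner { x: 1, y: 2 };

/// Hypothetical helper.
///
/// # Safety
/// `p` must be aligned and point to a valid `Outer`, since the reference
/// in `(*p).r` is loaded from memory.
unsafe fn project_through_reference(p: *const Outer) -> *const u32 {
    // `(*p).r` is loaded (a real memory access); the outer `*` and `.y`
    // then only compute an address inside the referenced `Inner`.
    unsafe { addr_of!((*(*p).r).y) }
}

fn main() {
    let outer = Outer { tag: 0, r: &INNER };
    let y_ptr = unsafe { project_through_reference(&outer) };
    println!("{}", y_ptr == addr_of!(INNER.y));
}
```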

@saethlin
Member

addr_of! applies to a Place expression. A Place can only contain one Deref. Ergo, even if in surface Rust you pile a lot of code into one expr, you get loads for all but the last (in evaluation order) Deref. The difference with references and raw pointers is that a Deref of a pointer must be explicit, reference Deref may be implicit.

I don't know a better way to explain this. I feel like we want to avoid the explanation that a Place can only contain one Deref. That's why your example has a load through the reference, it desugars into multiple Places.

@RalfJung
Member

RalfJung commented Aug 18, 2023

A Place can only contain one Deref.

That's not even true for the initially built MIR. (Normalizing to "only one deref" is a MIR pass that runs at some point.) It is certainly not true for something like MiniRust.

The thing is that addr_of!(**p) is just omitting a lot of details. * is an operation that takes a value expression as an argument. So for something like **p, where *p is a place expression, this isn't even grammatically valid -- we have to insert the place-to-value coercion, which is spelled load. That's how we end up at addr_of!(*load(*p)). If we systematically apply this everywhere we realize that if p is a local variable, then p is also a place expression, so *p is not in the grammar, but *load(p) is, and we end up with addr_of!(*load(*load(p))). Under the current rules, both * are doing an alignment check or else we have UB. Under the new rules, the load is doing the alignment check, and for the first load that's trivial since the compiler ensures that the local variable p is allocated sufficiently aligned.
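The desugaring can be followed in a small runnable sketch. `load` is notation from the comment above, not a real function, so it appears only in comments; `innermost_place` is a hypothetical name:

```rust
use core::ptr::addr_of;

/// Hypothetical helper.
///
/// # Safety
/// `pp` must be aligned and point to an initialized `*const u32`, because
/// the inner pointer is loaded from it.
unsafe fn innermost_place(pp: *const *const u32) -> *const u32 {
    // `addr_of!(**pp)` reads as `addr_of!(*load(*load(pp)))`:
    //   load(pp)  : reads the local variable `pp` (always aligned)
    //   load(*pp) : reads the inner pointer out of memory
    //   outer `*` : only forms a place; `addr_of!` turns that place into
    //               a raw pointer without loading the `u32`
    unsafe { addr_of!(**pp) }
}

fn main() {
    let x = 5u32;
    let p: *const u32 = &x;
    let q = unsafe { innermost_place(&p) };
    println!("{}", q == p);
}
```

The result is simply the inner pointer again, which is one way to see that the final `*` contributed no memory access of its own.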

@joshlf
Contributor Author

joshlf commented Sep 13, 2023

I ran into another issue with addr_of!, and it's not a big enough deal to open a new issue, so I figured I'd just mention it here.

In particular, I have a use case for out-of-bounds projection: using it to compute type layout information based on synthesized pointers. The playground is having issues right now, so here's the code instead.

The idea is that I'm trying to figure out the offset of a field within a struct, and I want that to become part of its type information (by assigning to a trait's associated constant). The type is unsized, so I can't construct a MaybeUninit<Self> so that I actually have an instance of it during const eval. My attempt here synthesizes pointers and does field projection + pointer math to deduce the byte offset. However, this is both unsound and actually rejected by the compiler due to the in-bounds requirement for projection.

(As an aside, I'm also not sure about whether this abides by strict provenance. A previous attempt used NonNull::dangling(), and Miri didn't like that. This version - which uses a const reference to start with and converts it to a pointer - seems to be doing better, but maybe I'm missing something.)

Code
use core::mem::{align_of, size_of};
use core::num::NonZeroUsize;
use core::ptr::NonNull;

/// A trait which describes the layout of sized types
/// and of slice-based DSTs.
unsafe trait KnownLayout {
    const MIN_SIZE: usize;
    const ALIGN: NonZeroUsize;
    const TRAILING_ELEM_SIZE: Option<usize>;
    // # Safety
    //
    // Implementer promises this is aligned to `LargestSupportedAlign`.
    const MAX_ALIGN_DANGLING_PTR: NonNull<Self>;
}

macro_rules! impl_known_layout {
    ($($t:ty),*) => {
        $(
            unsafe impl KnownLayout for $t {
                const MIN_SIZE: usize = size_of::<Self>();
                const ALIGN: NonZeroUsize = const_align_of::<Self>();
                const MAX_ALIGN_DANGLING_PTR: NonNull<Self> = {
                    assert!(Self::ALIGN.get() <= align_of::<LargestSupportedAlign>());
                    LARGEST_SUPPORTED_ALIGN_RAW.cast::<Self>()
                };
                const TRAILING_ELEM_SIZE: Option<usize> = None;
            }
        )*
    }
}

impl_known_layout!((), u8, u16, u32);

unsafe impl<const N: usize, T> KnownLayout for [T; N] {
    const MIN_SIZE: usize = size_of::<Self>();
    const ALIGN: NonZeroUsize = const_align_of::<Self>();
    const MAX_ALIGN_DANGLING_PTR: NonNull<Self> = {
        assert!(Self::ALIGN.get() <= align_of::<LargestSupportedAlign>());
        LARGEST_SUPPORTED_ALIGN_RAW.cast::<Self>()
    };
    const TRAILING_ELEM_SIZE: Option<usize> = None;
}

unsafe impl<T> KnownLayout for [T] {
    const MIN_SIZE: usize = 0;
    const ALIGN: NonZeroUsize = const_align_of::<T>();
    const MAX_ALIGN_DANGLING_PTR: NonNull<Self> = {
        assert!(Self::ALIGN.get() <= align_of::<LargestSupportedAlign>());
        let elem = LARGEST_SUPPORTED_ALIGN_RAW.cast::<T>();
        let slc = core::ptr::slice_from_raw_parts(elem.as_ptr().cast_const(), 0);
        unsafe { NonNull::new_unchecked(slc.cast_mut()) }
    };
    const TRAILING_ELEM_SIZE: Option<usize> = Some(size_of::<T>());
}

#[repr(C)]
struct Foo<A, B, T: ?Sized + KnownLayout> {
    a: A,
    b: B,
    t: T,
}

unsafe impl<A, B, T: ?Sized + KnownLayout> KnownLayout for Foo<A, B, T> {
    const MIN_SIZE: usize = {
        let slf = Self::MAX_ALIGN_DANGLING_PTR.as_ptr().cast_const();
        // TODO: Provenance issues here?
        let t_ptr = unsafe { core::ptr::addr_of!((*slf).t) };
        let t_offset = unsafe { t_ptr.cast::<u8>().offset_from(slf.cast::<u8>()) };
        t_offset as usize + T::MIN_SIZE
    };
    const ALIGN: NonZeroUsize = {
        let aligns = [const_align_of::<A>(), const_align_of::<B>(), T::ALIGN];
        let mut max_align = aligns[0];
        let mut i = 0;
        while i < aligns.len() {
            max_align = if aligns[i].get() > max_align.get() {
                aligns[i]
            } else {
                max_align
            };
            i += 1;
        }
        max_align
    };
    const TRAILING_ELEM_SIZE: Option<usize> = T::TRAILING_ELEM_SIZE;
    const MAX_ALIGN_DANGLING_PTR: NonNull<Self> = {
        assert!(Self::ALIGN.get() <= align_of::<LargestSupportedAlign>());
        // SAFETY: Cannot produce un-aligned pointer due to preceding assert.
        let self_raw = T::MAX_ALIGN_DANGLING_PTR.as_ptr() as *mut Self;
        // SAFETY: `self_raw` came from a non-null pointer.
        unsafe { NonNull::new_unchecked(self_raw) }
    };
}

// TODO: Making this really big makes rustc very unhappy.
// Luckily, invalidly-aligned pointers shouldn't be a problem
// once rust-lang/reference#1387 lands.
// #[repr(align(536_870_912))] // 2^29
#[repr(align(4096))] // 2^12
struct LargestSupportedAlign;

const LARGEST_SUPPORTED_ALIGN: &LargestSupportedAlign = &LargestSupportedAlign;
const LARGEST_SUPPORTED_ALIGN_RAW: NonNull<LargestSupportedAlign> = {
    let ptr: *const LargestSupportedAlign = LARGEST_SUPPORTED_ALIGN;
    unsafe { NonNull::new_unchecked(ptr.cast_mut()) }
};

const fn const_align_of<T>() -> NonZeroUsize {
    match NonZeroUsize::new(align_of::<T>()) {
        Some(align) => align,
        None => unreachable!(),
    }
}

fn main() {
    macro_rules! print_meta {
        ($($t:ty),*) => {
            println!("{:10}\tmin_size align\ttrailing_size\tdangling", "type");
            $(
                print_meta!(@inner $t);
                print_meta!(@inner [$t]);

                print_meta!(@inner Foo<u16, u8, $t>);
                print_meta!(@inner Foo<u16, u8, [$t]>);
                print_meta!(@inner Foo<u16, u8, Foo<u16, u8, [$t]>>);
            )*
        };
        (@inner $t:ty) => {{
            let min_size = <$t as KnownLayout>::MIN_SIZE;
            let align = <$t as KnownLayout>::ALIGN;
            let trailing_size = <$t as KnownLayout>::TRAILING_ELEM_SIZE;
            let dangling = <$t as KnownLayout>::MAX_ALIGN_DANGLING_PTR;
            println!("{:10}\t{min_size}\t {align}\t{trailing_size:?}\t\t{dangling:?}", stringify!($t));
        }};
    }

    print_meta!((), u8, u16, u32, [(); 2], [u8; 2], [u16; 2], [u32; 2]);
}

@RalfJung
Member

RalfJung commented Sep 13, 2023

using it to compute type layout information based on synthesized pointers.

This came up in the t-opsem meeting about whether we should change the rules for ptr.offset and place projection to be "nowrap" rather than "inbounds". However, the general consensus was that offset_of! solves this problem, so we don't need to go for "nowrap". (The issue with "nowrap" being that it is not currently supported by LLVM, so we need (a) someone to write that patch and (b) a good usecase to convince the LLVM people to take the patch.)

Why is offset_of! not enough in your case? Is it because of unsizedness? Is that more than "just" a feature request for offset_of!?

@joshlf
Contributor Author

joshlf commented Sep 13, 2023

This came up in the t-opsem meeting about whether we should change the rules for ptr.offset and place projection to be "nowrap" rather than "inbounds". However, the general consensus was that offset_of! solves this problem, so we don't need to go for "nowrap". (The issue with "nowrap" being that it is not currently supported by LLVM, so we need (a) someone to write that patch and (b) a good usecase to convince the LLVM people to take the patch.)

Yeah, sorry, I should have clarified. I saw discussions to this effect, I just wanted to put a "real world" out-of-bounds projection use case out there in case it was useful information. I'm not actually advocating for this, as overall it sounds like the wrong tradeoff to support it given LLVM.

Why is offset_of! not enough in your case? Is it because of unsizedness? Is that more than "just" a feature request for offset_of!?

Once offset_of! is stable and supports field offset within unsized types, it would be a valid solution to this problem.
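For the sized case, a sketch of how `offset_of!` can feed an associated constant without synthesizing any pointer. This assumes a toolchain where `core::mem::offset_of!` is available. The trait `TailOffset` and the zero-length array standing in for an unsized tail are illustrative choices; whether the macro accepts a genuinely unsized tail is exactly the open question here:

```rust
use core::mem::offset_of;

#[allow(dead_code)]
#[repr(C)]
struct Foo {
    a: u8,
    b: [u16; 0], // sized stand-in for a trailing `[u16]`
}

/// Hypothetical trait carrying a field offset as type-level information.
trait TailOffset {
    const TAIL_OFFSET: usize;
}

impl TailOffset for Foo {
    // Evaluated at compile time; no pointer or instance is needed.
    const TAIL_OFFSET: usize = offset_of!(Foo, b);
}

fn main() {
    println!("{}", <Foo as TailOffset>::TAIL_OFFSET);
}
```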

@RalfJung
Member

Offsets of the unsized field specifically are not a constant so I don't see how offset_of! could support them.

@joshlf
Contributor Author

joshlf commented Sep 13, 2023

Offsets of the unsized field specifically are not a constant so I don't see how offset_of! could support them.

I'm referring to the offset to the beginning of an unsized field:

#[repr(C)]
struct Foo {
    a: u8,
    b: [u16], // offset to this field guaranteed to be 2, right?
}

@cuviper
Member

cuviper commented Sep 13, 2023

I think that's ok for slices, but dyn Trait does not have statically known alignment.

@joshlf
Contributor Author

joshlf commented Sep 13, 2023

I think that's ok for slices, but dyn Trait does not have statically known alignment.

Given the repr(C) type layout reference, I would assume that the offset of b in this example is guaranteed to be a function of T's alignment even if T is a dyn Trait? Or should that section only be read as applying to sized types and slice-based DSTs?

#[repr(C)]
struct Foo<T: ?Sized>{
    a: u8,
    b: T,
}

@RalfJung
Member

RalfJung commented Sep 13, 2023 via email

@joshlf
Contributor Author

joshlf commented Sep 13, 2023

The offset of b depends on the dynamic alignment of T, so for dyn Trait it can differ depending on the underlying dynamic type. This is necessary to ensure that the field is properly aligned.

Hmm that raises an interesting question. IIUC, in practice, the way you would obtain a &Foo<T> where T happens to be a dyn Trait is that you'd start off with a Foo<U> (for concrete U), and then coerce &Foo<U> into &Foo<dyn Trait> where U: Trait. When you have &Foo<U>, you know the alignment of Foo<U>, and you know the field offset of the U within Foo<U>. That offset will be a function of U's alignment.

However, in a &Foo<dyn Trait> context, how does Rust know what offset the U lives at? This must be a dynamic runtime property, which I guess means that, what, the vtable must encode the offset or something wild like that? It does indeed seem to be able to figure it out:

Demo code
use std::fmt::Debug;

struct Foo<T: ?Sized> {
    a: u8,
    t: T,
}

fn print_foo_t(f: &Foo<dyn Debug>) {
    let offset = {
        let base: *const Foo<dyn Debug> = f;
        let t: *const dyn Debug = core::ptr::addr_of!(f.t);
        unsafe { t.cast::<u8>().offset_from(base.cast::<u8>()) }
    };
    println!("{:?}\tat offset {offset}", &f.t);
}

fn main() {
    print_foo_t(&Foo { a: 0, t: 0u8 });
    print_foo_t(&Foo {
        a: 0,
        t: (0u32, 1u32),
    });
    print_foo_t(&Foo {
        a: 0,
        t: (0u128, 1u128),
    });
    print_foo_t(&Foo {
        a: 0,
        t: "hello",
    });
}

Prints:

0	at offset 1
(0, 1)	at offset 4
(0, 1)	at offset 8
"hello"	at offset 8

@cuviper
Member

cuviper commented Sep 13, 2023

The vtable includes the size and alignment of the sole unsized field, which is enough to dynamically compute its offset.

@RalfJung
Member

RalfJung commented Sep 13, 2023

Yes, the code for projecting to a field of unsized type will, at runtime, run align_of_val to compute the dynamic alignment from the metadata, and compute the offset from that.
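A user-level sketch of the same computation, not the compiler's actual codegen. It assumes `repr(C)` so that the field order and rounding rule are fixed (the demo above used the default representation), and the function names are hypothetical. The predicted offset is the size of the sized prefix rounded up to `align_of_val` of the tail, and it is compared against the offset observed from pointers:

```rust
use core::fmt::Debug;
use core::mem::{align_of_val, size_of};

#[allow(dead_code)]
#[repr(C)]
struct Foo<T: ?Sized> {
    a: u8,
    t: T,
}

/// Offset predicted from the tail's dynamic alignment (read via the vtable).
fn predicted_offset(f: &Foo<dyn Debug>) -> usize {
    let align = align_of_val(&f.t);
    let prefix = size_of::<u8>();
    // Round `prefix` up to a multiple of `align` (a power of two).
    (prefix + align - 1) & !(align - 1)
}

/// Offset observed by subtracting addresses.
fn actual_offset(f: &Foo<dyn Debug>) -> usize {
    let base = (f as *const Foo<dyn Debug>).cast::<u8>();
    let field = (&f.t as *const dyn Debug).cast::<u8>();
    field as usize - base as usize
}

fn main() {
    let small: &Foo<dyn Debug> = &Foo { a: 0, t: 0u8 };
    let wide: &Foo<dyn Debug> = &Foo { a: 0, t: (0u32, 1u32) };
    println!("{} {}", predicted_offset(small), actual_offset(small));
    println!("{} {}", predicted_offset(wide), actual_offset(wide));
}
```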

The codegen logic for this is around here.

github-actions bot pushed a commit to rust-lang/miri that referenced this issue Nov 15, 2023
update and clarify addr_of docs

This updates the docs to match rust-lang/reference#1387. Cc `@rust-lang/opsem`

`@chorman0773` not sure if you had anything else you wanted to say here, I'd be happy to get your feedback. :)

Fixes rust-lang/rust#114902, so Cc `@joshlf`