Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alignment of the bytes of Allocation to match align parameter #100467

Closed
wants to merge 25 commits into from

Conversation

emarteca
Copy link

Aligning the bytes field of Allocation so that it is aligned matching the align parameter.

Main changes are to:

Instead of just converting the relevant [u8] to a Box<[u8]> directly we now:

  • create a std::alloc::Layout of the alignment the Allocation is specified to have
  • allocate a buffer with this layout
  • get the corresponding [u8] for this buffer

Then we write the relevant bytes to it, check for uninit, etc -- same process as previously.

NOTE
With this current code, when the bytes field is deallocated, even though the size is right, in the layout the alignment might be under-required: for example we might have allocated it with alignment 4 and deallocate it with alignment 2. We know this is undefined behaviour violating the memory fitting requirements of dealloc -- we're going to work around this by adding a wrapper struct for the Box<[u8]> that stores the alignment it was allocated with, and manually implements deallocation with the right alignment.

Why are we doing this?

We want the address of the bytes field of an Allocation to be accessible, and represent the actual address of the [u8]. This is part of an effort to pass pointers in Miri to the C FFI.

maurer and others added 12 commits August 2, 2022 15:01
Punch a hole to allow direct access to bytes - this API should be
changed to provide a `Size` value corresponding to an `AllocId`.

Additionally, force bytes to be appropriately aligned - the current
trick is a hack, and over-aligns. We should use the `Align` parameter to
appropriately align these buffers, as otherwise they'll never be able to
be used by native code.

The "adjust" callback used on deserialized allocations might not have
that information - if it doesn't, just overalign to max alignment
…he check make sense (should be len of bytes) for real addrs
@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Aug 12, 2022
@rustbot
Copy link
Collaborator

rustbot commented Aug 12, 2022

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

The Miri submodule was changed

cc @rust-lang/miri

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @davidtwco (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive
Copy link
Collaborator

⚠️ Warning ⚠️

  • These commits modify submodules.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 12, 2022
@RalfJung
Copy link
Member

The Miri submodule was changed

Please take the submodule update out of this MR. :) That will be done separately.

@davidtwco
Copy link
Member

r? @RalfJung

@rust-highfive rust-highfive assigned RalfJung and unassigned davidtwco Aug 15, 2022
let size = Size::from_bytes(slice.len());
let align_usize: usize = align.bytes().try_into().unwrap();
let layout = std::alloc::Layout::from_size_align(slice.len(), align_usize).unwrap();
let bytes = unsafe {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my biggest concern is that this introduces unsafe code deep in the core of the interpreter engine. This is a rather significant cost and risk that the interpreter has to carry, even the compile-time interpreter that definitely does not want or need this.

At the least, we should find a way to factor this such that all the unsafe code lives in the Miri repository. Even that is still not great though, it is exactly the kind of global cost I was worried about.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see your point, but I don't think we can align the bytes of the allocation without unsafe code.
If we want to factor this out into Miri, then one option could be to refactor Allocation to expose bytes so that the bytes alignment can be done on the Miri side. Although, this would be a bigger change to Allocation.
What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we certainly don't want to expose bytes. But we can make Allocation generic over how bytes are allocated.

You can add a new type parameter for that:

pub struct Allocation<Prov = AllocId, Extra = (), Bytes = Box<[u8]>> {
  bytes: Bytes

and then define a suitable trait

trait AllocationBytes

and add a bound Bytes: AllocationBytes whereveer needed.

The trait needs to then contain all the operations you need. The implementation in rustc can use Box<[u8]> and be completely safe, and that way only Miri needs unsafe code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that's smart -- I had tried earlier to make Allocation generic over the allocator for bytes which didn't work. But this makes sense!

@RalfJung
Copy link
Member

RalfJung commented Aug 15, 2022

This is part of rust-lang/miri#2365.

So this is a first step of a many-step plan -- would be good to have at least some idea of the overall plan, rather than going blindly into some direction without knowing whether it will work out in the end.

@rustbot
Copy link
Collaborator

rustbot commented Aug 15, 2022

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

Some changes occurred in src/tools/rustfmt

cc @rust-lang/rustfmt

Comment on lines 26 to 27
pub trait AllocBytes:
Clone + core::fmt::Debug + Eq + PartialEq + PartialOrd + Ord + core::hash::Hash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you make this also have Deref<Target = [u8]>, you can remove everything but adjust_to_align, from_bytes and uninit and should have to adjust much less code as it would pick up the slice methods that Box<[u8]> also exposes via Deref.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point! I was hoping to keep AllocBytes more general though, so I can use it to represent foreign memory in Miri, that isn't represented by a [u8]. Given that, do you think it's ok to keep these extra functions in AllocBytes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of memory are you thinking of? You already support get_slice_from_range, which returns a &[u8]. Most of what I'm proposing is to get rid of that method and require the implementor of AllocBytes to put the logic into the Deref::deref method instead. Or are you planning to change the return type of the get_slice methods?

Copy link
Author

@emarteca emarteca Sep 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, if we can make a custom get_slice method with the same return type then that should work! In my implementation these methods still return &[u8] -- my current representation of foreign memory is represented as a pair of machine address and length, and the get_slice methods build the slice from std::slice::from_raw_parts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, so you could implement Deref and DerefMut for your foreign memory representation and have the deref and deref_mut methods use from_raw_parts. Then you can remove the get_slice* methods and others from the trait, as you get them for free once Deref is a super trait

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see -- that's smart, thanks for the suggestion! (Will push these changes soon)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the trait has a lot more methods than I was expecting so this sounds good.

Comment on lines 520 to 522
/// Get the base address for the bytes in an `Allocation` specified by the
/// `AllocID` passed in; error if no such allocation exists.
pub fn get_alloc_base_addr(&self, id: AllocId) -> InterpResult<'tcx, usize> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Get the base address for the bytes in an `Allocation` specified by the
/// `AllocID` passed in; error if no such allocation exists.
pub fn get_alloc_base_addr(&self, id: AllocId) -> InterpResult<'tcx, usize> {
/// Get the base address for the bytes in an `Allocation` specified by the
/// `AllocID` passed in; error if no such allocation exists. The address will
/// be exposed (on the host system!).
///
/// It is up to the caller to take sufficient care when using this address:
/// there could be provenance or uninit memory in there, and other memory
/// accesses could invalidate the exposed pointer.
pub fn expose_alloc_base_addr(&self, id: AllocId) -> InterpResult<'tcx, usize> {

Also note that this only allows read-only access to the bytes. If C code writes to this memory, it's writing to a shared reference! So I think you probably want to make this mut.


// Default `bytes` for `Allocation` is a `Box<[u8]>`.
impl AllocBytes for Box<[u8]> {
fn uninit<'tcx, F: Fn() -> InterpError<'tcx>>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not uninit though, it returns a zeroed buffer?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh that's true -- I just took this code from here where it was originally in the file

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entire allocation is considered uninit based on its InitMask, yeah. But looking just at the bytes, which is the level of abstraction you have here, this is zeroed. The bytes cannot be uninit.

/// Adjust the bytes to the specified alignment -- by default, this is a no-op.
fn adjust_to_align(self, _align: Align) -> Self {
self
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this about? Alignment should be set at allocation time.

@JohnCSimon
Copy link
Member

@emarteca
Ping from triage:
Can you please address the comments from the reviewer?

FYI: when a PR is ready for review, send a message containing
@rustbot ready to switch to S-waiting-on-review so the PR is in the reviewer's backlog.

@emarteca
Copy link
Author

emarteca commented Nov 9, 2022

Note: the build is failing b/c the Miri machine doesn't specify its Bytes type, but I don't think I'm meant to be editing the submodules in this PR (correct me if I'm wrong though!)
Sorry for the delay in PR edits, grad school really picked up.

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 9, 2022
@rust-log-analyzer
Copy link
Collaborator

The job mingw-check failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

@RalfJung
Copy link
Member

RalfJung commented Nov 10, 2022 via email

@bors
Copy link
Contributor

bors commented Nov 13, 2022

☔ The latest upstream changes (presumably #104370) made this pull request unmergeable. Please resolve the merge conflicts.

@RalfJung
Copy link
Member

@rustbot author
waiting for the Miri fixes

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 26, 2022
Comment on lines +285 to +287
// REEEEEEE TODO ELLEN
// make a new constructor that just takes a Bytes
// then, make the allocation this way in ffi_support, and add it to the memory with allocate_raw_ptr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't look like it should land in this form ;)

Comment on lines +37 to +38
/// The address of the bytes.
fn expose_addr(&self) -> u64;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to make this return a raw pointer, and leave exposing and things like that to the caller.

And in fact... why is this even needed? One can use the deref to get such a pointer, right?

Comment on lines +28 to +33
+ core::fmt::Debug
+ Eq
+ PartialEq
+ PartialOrd
+ Ord
+ core::hash::Hash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the Ord bounds, they should not be needed. No idea why Allocation derives PartialOrd, but #104933 removes that.

@Dylan-DPC
Copy link
Member

@emarteca any updates on this?

@Dylan-DPC
Copy link
Member

Closing this due to inactivity. Feel free to create a new pr if you wish to continue with this

@Dylan-DPC Dylan-DPC closed this Feb 23, 2023
@Dylan-DPC Dylan-DPC added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 23, 2023
compiler-errors added a commit to compiler-errors/rust that referenced this pull request Mar 2, 2023
Support allocations with non-Box<[u8]> bytes

This is prep work for allowing miri to support passing pointers to C code, which will require `Allocation`s to be correctly aligned. Currently, it just makes `Allocation` generic and plumbs the necessary changes through the right places.

The follow-up to this will be adding a type in the miri interpreter which correctly aligns the bytes, using that for the Miri engine, then allowing Miri to pass pointers into these allocations to C calls.

Based off of rust-lang#100467, credit to `@emarteca` for the code
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Mar 2, 2023
Support allocations with non-Box<[u8]> bytes

This is prep work for allowing miri to support passing pointers to C code, which will require `Allocation`s to be correctly aligned. Currently, it just makes `Allocation` generic and plumbs the necessary changes through the right places.

The follow-up to this will be adding a type in the miri interpreter which correctly aligns the bytes, using that for the Miri engine, then allowing Miri to pass pointers into these allocations to C calls.

Based off of rust-lang#100467, credit to ``@emarteca`` for the code
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Mar 2, 2023
Support allocations with non-Box<[u8]> bytes

This is prep work for allowing miri to support passing pointers to C code, which will require `Allocation`s to be correctly aligned. Currently, it just makes `Allocation` generic and plumbs the necessary changes through the right places.

The follow-up to this will be adding a type in the miri interpreter which correctly aligns the bytes, using that for the Miri engine, then allowing Miri to pass pointers into these allocations to C calls.

Based off of rust-lang#100467, credit to ```@emarteca``` for the code
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.