Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsafe checking skips pointer dereferences in unused places #80059

Closed
scottmcm opened this issue Dec 15, 2020 · 29 comments · Fixed by #102256
Closed

Unsafe checking skips pointer dereferences in unused places #80059

scottmcm opened this issue Dec 15, 2020 · 29 comments · Fixed by #102256
Labels
C-bug Category: This is a bug. fixed-by-thir-unsafeck `-Z thir-unsafeck` handles this correctly. P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@scottmcm
Copy link
Member

Seen in #79735

I tried this code:

fn foo(ptr: *const bool) {
    let _ = *ptr;
}

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=44e42a069bc7601365f5f21a102178d9

Which, oddly, happily compiled despite the syntactic pointer dereference.

Back in 1.21, however, that gave the expected error:

error[E0133]: dereference of raw pointer requires unsafe function or block
 --> <source>:2:13
  |
2 |     let _ = *ptr;
  |             ^^^^ dereference of raw pointer

error: aborting due to previous error

https://rust.godbolt.org/z/rP93nf

The conversation in the 2020-12-15 lang team meeting implied that this was an unexpected regression, so opening and nomination for further discussion.

@scottmcm scottmcm added I-nominated T-lang Relevant to the language team, which will review and decide on the PR/issue. regression-from-stable-to-stable Performance or correctness regression from one stable version to another. C-bug Category: This is a bug. labels Dec 15, 2020
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Dec 15, 2020
@scottmcm
Copy link
Member Author

Probably P-low as it's sound as-is, and has been this way for over 3 years.

@LeSeulArtichaut LeSeulArtichaut added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Dec 15, 2020
@LeSeulArtichaut
Copy link
Contributor

Why is this sound? Because the dereference gets optimized away?

@camelid
Copy link
Member

camelid commented Dec 15, 2020

Why is this sound? Because the dereference gets optimized away?

I think so. If you look at the MIR (in debug mode, so this should only have 'guaranteed' optimizations):

// WARNING: This output format is intended for human consumers only
// and is subject to change without notice. Knock yourself out.
fn foo(_1: *const bool) -> () {
    debug ptr => _1;                     // in scope 0 at src/lib.rs:1:8: 1:11
    let mut _0: ();                      // return place in scope 0 at src/lib.rs:1:26: 1:26
    scope 1 {
    }

    bb0: {
        _0 = const ();                   // scope 0 at src/lib.rs:1:26: 3:2
        return;                          // scope 0 at src/lib.rs:3:2: 3:2
    }
}

You'll see that the dereference never happens. I think this is because the wildcard pattern (_) has special semantics.

@scottmcm
Copy link
Member Author

Because the dereference gets optimized away?

It's because _ is magic. If you instead try this:

fn foo(ptr: *const bool) {
    *ptr;
}

the error is still there even though LLVM also optimizes away that dereference.

@LeSeulArtichaut LeSeulArtichaut added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Dec 15, 2020
@LeSeulArtichaut
Copy link
Contributor

@spastorino
Copy link
Member

@tmiasko
Copy link
Contributor

tmiasko commented Dec 23, 2020

A few more examples:

pub struct S {
    x: (),
    y: u32,
}

pub fn f(s: *const S) {
    let S { x: (), .. } = *s;
}

pub fn g(a: *const u8) {
    let 0u8..=255u8 = *a;
}

It also affects behaviour of borrow checker, allowing following, which was disallowed back in 1.21:

pub fn f() {
    let _ = {
        let a = 0;
        &a
    };
}

@LeSeulArtichaut
Copy link
Contributor

LeSeulArtichaut commented Dec 27, 2020

Is this nominated for T-lang?

@LeSeulArtichaut
Copy link
Contributor

LeSeulArtichaut commented Dec 27, 2020

I'd like to tinker around and see what I can find :D

@LeSeulArtichaut LeSeulArtichaut self-assigned this Dec 27, 2020
@LeSeulArtichaut
Copy link
Contributor

LeSeulArtichaut commented Dec 27, 2020

Interestingly, this doesn't compile:

fn foo(ptr: *const bool) {
    let _ = { *ptr };
}

Seems like the MIR builder immediately optimizes ignored results from BlockFrame::Statement but not BlockFrame::TailExpr.

EDIT: in the code above, the deref get optimized away later in the MIR opts, so it appears with -Zmir-opt-level=0 but not with -Zmir-opt-level=1. IIUC, this means that simply emitting the statement in raw MIR and letting it be optimized by the MIR opts later should fix this issue.

@scottmcm
Copy link
Member Author

let _ = { *ptr };

The critical difference there, I think, is that the {}s make it no longer a place, but an rvalue.

@camelid
Copy link
Member

camelid commented Dec 28, 2020

I think scottmcm is saying that adding curly braces makes a new value, rather than a "projection" on a Place. A projection is a field access, a pointer dereference, etc.

@LeSeulArtichaut LeSeulArtichaut removed their assignment Dec 28, 2020
@LeSeulArtichaut
Copy link
Contributor

@RalfJung
Copy link
Member

Cc #79735

@RalfJung
Copy link
Member

The critical difference there, I think, is that the {}s make it no longer a place, but an rvalue.

Yeah, curly braces force the presence of a place-to-value coercion, at which point a deref definitely happens.

@apiraino apiraino removed I-nominated T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 30, 2020
@pnkfelix
Copy link
Member

pnkfelix commented Jan 8, 2021

Another interesting variation that I do not think is covered by the comments above:

fn foo(ptr: *const bool) {
    let _: bool = *ptr;
}

this also does not compile, due to unsafety-checking:

error[E0133]: dereference of raw pointer is unsafe and requires unsafe function or block
 --> issue-80059.rs:2:12
  |
2 |     let _: bool = *ptr;
  |            ^^^^ dereference of raw pointer
  |
  = note: raw pointers may be NULL, dangling or unaligned; they can violate aliasing rules and cause data races: all of these are undefined behavior

Update: Also, the span is wrong. 😆

@RalfJung
Copy link
Member

RalfJung commented Jan 8, 2021

What... is going on there? Why does adding a type annotation change anything?!?

The only thing I can imagine is that this forces some kind of MIR to be emitted. --emit mir on playground shows an empty body... likely this is optimized away already then. :/

But that sounds very wrong -- one can debate whether a _ pattern should trigger the place-to-value coercion or not, but I would never expect the presence of type annotations to have such a fundamental operational effect.

@pnkfelix
Copy link
Member

pnkfelix commented Jan 8, 2021

What... is going on there? Why does adding a type annotation change anything?!?

I think the answer is going to be "because this is 'just a bug' that arose from the ad-hoc manner that some things are implemented in the compiler."

If you look at this code in the compiler:

match *irrefutable_pat.kind {

it is a spot where we have two special-cases, one for let x = expr and another for let pat: T = expr, before we finally fall-through to general-purpose code for all the other cases here:

I haven't yet 100% confirmed this hypothesis, but my current thinking is that these odd differences in behavior are arising due to differences in those three code paths.

@RalfJung
Copy link
Member

RalfJung commented Jan 8, 2021

Yeah, something like that... e.g. I could imagine that the type ascription case emits a AscribeUserType MIR statement, and maybe that is considered a "use" by unsafety checking. That would also explain the funny span of the error message.

@camelid
Copy link
Member

camelid commented Jan 8, 2021

Yeah, something like that... e.g. I could imagine that the type ascription case emits a AscribeUserType MIR statement, and maybe that is considered a "use" by unsafety checking. That would also explain the funny span of the error message.

I think this is right. This is the dumped MIR:

// MIR for `foo` 0 mir_map

| User Type Annotations
| 0: Canonical { max_universe: U0, variables: [], value: Ty(bool) } at foo.rs:2:12: 2:16
| 1: Canonical { max_universe: U0, variables: [], value: Ty(bool) } at foo.rs:2:12: 2:16
|
fn foo(_1: *const bool) -> () {
    debug ptr => _1;                     // in scope 0 at foo.rs:1:8: 1:11
    let mut _0: ();                      // return place in scope 0 at foo.rs:1:26: 1:26
    scope 1 {
    }

    bb0: {
        AscribeUserType((*_1), +, UserTypeProjection { base: UserType(1), projs: [] }); // scope 0 at foo.rs:2:12: 2:16
        _0 = const ();                   // scope 0 at foo.rs:1:26: 3:2
        return;                          // scope 0 at foo.rs:3:2: 3:2
    }
}

@camelid
Copy link
Member

camelid commented Jan 8, 2021

(Working on an experimental PR to not treat AscribeUserType as a use – it'll be interesting to see what if any side effects it has.)

camelid added a commit to camelid/rust that referenced this issue Jan 8, 2021
Previously, if the MIR had an AscribeUserType statement that ascribed a
type to the pointee of a raw pointer, it would be treated as a dereference
of the raw pointer for purposes of unsafe-checking. For example, the
following code would be treated as containing a raw-pointer dereference:

    fn foo(ptr: *const bool) {
        let _: bool = *ptr;
    }

Producing this error:

    error[E0133]: dereference of raw pointer is unsafe and requires unsafe function or block
     --> issue-80059.rs:2:12
      |
    2 |     let _: bool = *ptr;
      |            ^^^^ dereference of raw pointer

Note that the error points to the type ascription as having a
dereference! That's because the generated AscribeUserType MIR statement
is treated as containing a dereference of `_1`:

    AscribeUserType((*_1), +, UserTypeProjection { base: UserType(1), projs: [] });

Now the unsafe-checker ignores uses inside `AscribeUserType` statements,
which means this code now compiles successfully.

-----

Note that this code:

    fn foo(ptr: *const bool) {
        let _: bool = *ptr;
    }

does *not* produce an error (it compiles fine) because of the magical
behavior of the `_` (wildcard) pattern (see rust-lang#80059).
@rodrimati1992
Copy link
Contributor

rodrimati1992 commented Jan 11, 2021

This also compiles

fn main(){
    let uninit: &bool;
    let _: bool = *uninit;
}

Is this caused by the same bug?

@camelid
Copy link
Member

camelid commented Jan 11, 2021

Yes, in general any checks that are done on the MIR will not happen for assignments to _.

@pnkfelix
Copy link
Member

pnkfelix commented Jan 12, 2021

This also compiles

fn main(){
    let uninit: &bool;
    let _: bool = *uninit;
}

Is this caused by the same bug?

While the matching behaviors here may have the same root cause, I do not immediately classify that example as a bug. The idea is that *uninit is just describing how to reach a place, not necessarily a memory read, and the use of _ in the pattern means "we won't try to read from that point in the pattern match.

  • I do recognize that its difficult to reconcile viewing the above as "not a bug" while also asserting that the unsafety-checks should be applied in the case from the description.

Having said that, I am not saying I would veto changing our behavior in this scenario. I guess it depends on whether one views the unsafety-checking and the initialization-checking as being on similar footing or not.

@scottmcm
Copy link
Member Author

It wasn't obvious to me that the initialization checking wasn't also a bug. But @pnkfelix said something in the meeting today that was persuasive to me: it's fundamentally flow-sensitive, not scope-based, so while this might feel strange, it fits.

@RalfJung
Copy link
Member

I feel like it would be rather strange to ignore let _: bool = *uninit; for initialization checking but honor it for unsafety checking.

@nikomatsakis
Copy link
Contributor

We discussed this in a previous lang team meeting and concluded that we did not find that strange =)

Unsafe checking is not flow sensitive (e.g., it applies to dead code), but borrow checking is.

@RalfJung
Copy link
Member

IMO we should fix this by having a special MIR statement for "evaluate this place and then discard the result (without place-to-value conversion)". Alternatively this can be expressed by using &raw const <place expr> and storing the result in an unused temporary, but that seems a bit too hacky?

This would also help with Miri's rust-lang/miri#2360.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. fixed-by-thir-unsafeck `-Z thir-unsafeck` handles this correctly. P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.