Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve match ergonomics #1944

Closed
wants to merge 2 commits into from
Closed

Improve match ergonomics #1944

wants to merge 2 commits into from

Conversation

nrc
Copy link
Member

@nrc nrc commented Mar 6, 2017

Prevent some of the boilerplate required for match expressions by auto-
dereferencing the target expression, and 'auto-referencing' pattern variables.
The rules for the latter are based closely on those for captured variables in
closures.

For example, current:

match *x {
    Foo(ref x) => { ... }
    Bar(ref mut y, z) => { ... }
}

proposed:

match x {
    Foo(x) => { ... }
    Bar(y, z) => { ... }
}

Prevent some of the boilerplate required for match expressions by auto-
dereferencing the target expression, and 'auto-referencing' pattern variables.
The rules for the latter are based closely on those for closures.

For example, current:

```
match *x {
    Foo(ref x) => { ... }
    Bar(ref mut y, z) => { ... }
}
```

proposed:

```
match x {
    Foo(x) => { ... }
    Bar(y, z) => { ... }
}
```
@nrc nrc added the T-lang Relevant to the language team, which will review and decide on the RFC. label Mar 6, 2017
@nrc nrc self-assigned this Mar 6, 2017
@joshtriplett
Copy link
Member

Rendered

@joshtriplett
Copy link
Member

joshtriplett commented Mar 6, 2017

Target auto-deref seems fine to me. I can imagine some potential surprises from it, but nothing worse than we currently get from method calls.

Arm auto-ref, and especially arm auto-ref-mut, makes me very nervous. I think we need to consider the case of programs that didn't compile before, do compile now, and shouldn't.

In particular, with the help of type inference, I've seen people analyzing programs on the Rust IRC channels (and I've written a few myself) where the types accidentally ended up with double-references and similar (e.g. &&str, or Vec<&&T>, just because Rust happily allowed that to happen through type inference. I worry that this change will lead to many more such programs.

I do agree that ref and ref mut in patterns represent an unusual bit of Rust. And I won't dispute that they seem non-trivial to learn or to teach. However, I want to make sure that in the course of addressing that, we don't introduce something more mysterious, and more inscrutable.

@joshtriplett
Copy link
Member

One other thought: it seems a bit inconsistent to apply this only to match, and not to let, or more importantly to if let, often used as a one-branch match statement. For instance:

if let Some(ref y) = x {
    ...
}

@nrc
Copy link
Member Author

nrc commented Mar 6, 2017

where the types accidentally ended up with double-references and similar

is this bad? I've hit this myself, and it feels like a type error where I've written the wrong code and the compiler has caught it. That doesn't seem any worse than any other kind of type error.

I worry that this change will lead to many more such programs

Can you give an example of how?

However, I want to make sure that in the course of addressing that, we don't introduce something more mysterious, and more inscrutable.

This is a good point, but I don't think we are - like closures, we're introducing something that mostly 'just works'. Where we have to explain the mechanism we can insert the refs, just as we insert explicit types to teach type inference.

One other thought: it seems a bit inconsistent to apply this only to match, and not to let, or more importantly to if let, often used as a one-branch match statement

The RFC proposes applying this to if let. I have agonised over whether we should apply to let (the inconsistency is bothersome), my reasoning not to is that there is no explicit scope for the variables in a let statement and so the code on which to do inference is more 'open' than for match/if let. Furthermore, there is not so much motivation - people don't tend to write such patterns anywhere near as often as in matches.

@withoutboats
Copy link
Contributor

Consider this example:

let x = String::from("hello");
if let "hello" = x {
     println!("hello")
}

This gives a type error. You have to do &x[..] (and neither &x nor x[..] are sufficient).

I don't think the RFC as written covers this case, which has not been uncommon in my experience. Could it possibly? Is there a reason we can't autoref as well as autoderef the target, in the same style as we handle methods today? Philosophically, it seems appropriate that the transformation on the target of a match should be the same as the transformation on the receiver of a method.

@joshtriplett
Copy link
Member

joshtriplett commented Mar 6, 2017

@nrc

is this bad? I've hit this myself, and it feels like a type error where I've written the wrong code and the compiler has caught it.

I've helped debug people's programs on #rust and #rust-beginners, where in the natural course of type inference, people have ended up with double-references and vectors/slices of double-references, in programs that type-checked and compiled just fine. Then, when trying to make a change, they ended up with a type error, and suddenly wondered "how did I end up with &&str / Vec<&&T> / &&&str / etc in the first place". Not infrequently, the point at which they ended up with that type error and realized the problem involved a match statement.

This seems to happen especially often when dealing with iterators, and functions over iterators such as maps/filters/etc, which tend to introduce another level of reference.

I worry that auto-ref and auto-deref will make it even easier to dig even further before realizing you have a weird type. (And then wonder "why didn't Rust (with its strong typing) catch this sooner?".)

I wonder if we could introduce a lint of some kind for variables or expressions that unnecessarily have a type like &&T, if the compiler has enough information from inference to realize that every usage would work just as well with &T. If we had a lint like that, I'd feel much more comfortable with this proposal.

@nrc
Copy link
Member Author

nrc commented Mar 6, 2017

@withoutboats

I assume &*x works in this example? (And btw, it is annoying to me that &x[..] works, but &x doesn't, but I fear that ship has sailed).

Anyway, I don't think there is a technical reason not to do auto-ref on match targets, however, I have not given it as much thought as auto-deref. My reason for avoiding it is to avoid implicitly borrowing owned data (i.e., we only convert one kind of borrowing to another). This is the philosophy behind which conversions we allow for 'deref' coercions, and it just 'felt' right to me here. I can totally see the argument for allowing auto-ref - it makes things more flexible, should work fine, and is analogous to the conversions we do with the dot operator - and I don't have a strong argument against it. It is however less conservative - it would be easier to add it later than to take it away.

@nrc
Copy link
Member Author

nrc commented Mar 6, 2017

I wonder if we could introduce a lint of some kind for variables or expressions that unnecessarily have a type like &&T, if the compiler has enough information from inference to realize that every usage would work just as well with &T. If we had a lint like that, I'd feel much more comfortable with this proposal.

Thanks for the expanded explanation. Personally, I think such a lint and this proposal are fairly orthogonal, however, I see where you are coming from here and I agree it could be helpful (especially if it runs even if compilation fails). In fact, independent of this proposal, I think such a lint would be helpful in general - afaik there is never a reason to have &&T rather than &T (outside of generic code) and whenever this happens to me it is a mistake.

duration of the match. It also does not work with nested types, `match (*e,)
...` for example is not allowed.

In either case if we further bind variables, we must ensure that we do not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This of course is still a problem if you don't use derefs - you'll just get "use of move data" errors instead of "move out of borrow" errors.


Auto-deref allows the programmer to access borrowed data within a limited scope
without having to explicitly dereference the data. This is analogous to the
implicit dereferencing performed by the dot operator, however, the scope of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"scope of the dereference"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "the scope of dereference is inline, rather than defined by a function" mean?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read it as "the scope of the dereference is the match expression".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the dereference exists for the duration of the match expression c.f. the duration of a function call

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean for the dereference to exist? There is no temporaries that can "stop existing" in both function calls and match expressions. The resulting value lives as long as borrowck lets it live.

patterns. In order to be backwards compatible, we must be able to fall back to
the type known from the patterns if type inference is stuck without that
information. I think this will only require choosing `m = 0` if type inference
can't make progress.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should act like deref coercions - if we have a match with k derefs, proceed with it even if it "forces" some inference variables.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case I'm covering here is if we get stuck (if we can proceed by forcing inference, we should) and can't find m/k. Then we arbitrarily choose 0, and try to continue.

Copy link
Contributor

@arielb1 arielb1 Mar 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get stuck = encounter a type error for every k? Then continuing is a matter of error recovery, right?

There are 4 situations:

  1. There is some k such that we can proceed using k autoderefs, possibly "forcing inference". In such case, choose the minimal such k and proceed.
  2. For all j < k, we can autoderef j times but can't proceed with the match, and the kth autoderef is ambiguous - in that case, we generally report a "the type of this value must be known in this context" error and do something random for error recovery, through coercion forces 0 autoderefs.
  3. For all j < k, we can autoderef j times but can't proceed with the match, and the kth autoderef can't be done. This is an error. To display the error to the user, we can e.g. show the type error you get from 0 autoderefs.
  4. For all j < RECURSION_LIMIT, we can autoderef j times but can't proceed with the match. Report a recursion limit error.

Because the current situation corresponds to 0 autoderefs, and the 0th autoderef always succeeds, cases (2), (3) and (4) would have caused an error pre-RFC, and so aren't backwards-compatibility issues.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of an example like

match from_str("foo").unwrap() {
    Foo { ... } => { ... }
}

where the type of the target is completely unknown. We currently use the type of the pattern to infer the type of the target. Under the rules in the RFC we can't do that - we need the type of the target and pattern to decide on the number of derefs. In cases like this, we must pick zero derefs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what "forcing inference" means.

```
let mut i: i32 = 0;
match i {
j => j += 1, // `j` is treated as `ref mut j` since `+=` takes `j` as `&mut i32`.
Copy link
Contributor

@arielb1 arielb1 Mar 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

silent binding mutability? This looks scary. I would prefer mut j.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a valid alternative, IMO. However, the back-compat question is more complex because if you write mut x that would force x to be by value in the current proposal. If we allow that to also be something like ref mut x, then the user can't force a mutable value in any way - we'd need move in this case, and we still might not be 100% back compat, we'd need to look into that.

More conservatively, we could just not to any auto-ref'ing for mutable variables - roughly, we would infer ref, but never ref mut. That is obviously less flexible, but avoids implicit mutable binding.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a valid alternative, IMO. However, the back-compat question is more complex because if you write mut x that would force x to be by value in the current proposal.

But the current proposal doesn't allow you to force an immutable move, no? Is it your intent that users would have to write mut x when they want to move x, whether or not they need mutability? That would raise a warning and seems like the wrong thing to teach people to do.

If we want users to be able to force a by-value binding, we should add the move keyword in patterns to do that. It seems separate from inferring mut x to a ref.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it your intent that users would have to write mut x when they want to move x

No, my belief is that 'inference' of move/borrow should be perfect (c.f., closures) and so move or forcing moving a single variable is not necessary. I agree that would be the wrong thing to do.

If we want users to be able to force a by-value binding, ... It seems separate from inferring mut x to a ref.

You might be right (I admit, I haven't thought this through as thoroughly as the actual proposal. My worry is that with mutability, there is another category of use that doesn't exist without it. Consider

match x {
    mut y => { y.f = 0 }
}

Should this mutate x or not (or, IOW should x be moved)? There is no guide for the compiler whether y should be borrowed or not (both would compile) and the semantics are clearly different. If we do adopt the mut requirement and infer this as ref mut we break back compat, as far as I can see.

Copy link
Contributor

@withoutboats withoutboats Mar 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today, x is moved, so while the semantics are different, they're not observable in any code that compiles today, are they? I guess this is what you mean about needing to look into the backcompat issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do adopt the mut requirement and infer this as ref mut we break back compat, as far as I can see.

We break back compat even if the code compiles without the mut. If we want to keep back-compat, we need to make mut force by-move - though that would leave Cell as a nasty edge case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@comex If the data is Copy, it is never referenced

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Closure inference does not do anything smart wrt. by-ref vs. by-value" I'm not sure what you mean by "smart" here. Closure inference clearly does some fairly smart stuff to know whether to borrow or move data. AIUI, you only need to mark a closure move if it is returned by a function or otherwise escapes its stack frame, in that case the inference cannot be perfect and the programmer has to handle this stuff manually.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"If we want to keep back-compat, we need to make mut force by-move" - right, this is what the RFC currently does.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nrc Oops, I missed that. Doesn't that turn implementing Copy for a given type into a breaking change? Especially in the alternative where mut y can be a mutable reference, but even as written in edge cases, e.g. a huge array that won't fit on the stack.

}
```

To opt out of all implicit dereferencing in a pattern, we could allow the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move-mode would also be the default for let-statements (which are basically a one-armed match).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, though not if let

@e-oz
Copy link

e-oz commented Mar 6, 2017

Implicit "mut" is a dangerous thing in my opinion. I think "mut" should be always explicit only.

@Stebalien
Copy link
Contributor

Allowing auto-ref without a matching auto-deref will change drop order (possibly introducing deadlock in programs that rely on it). Consider:

#[derive(Debug)]
struct Dropper;
impl Drop for Dropper {
    fn drop(&mut self) {
        println!("dropping");
    }
}
fn main() {
    let thing = Some(Dropper);
    match thing {
        Some(thing) => { // Auto ref will turn this into a reference instead of a move.
            println!("{}", thing);
            // Currently drops here.
        },
        None => {},
    }
    println!("after");
}

This currently prints "Dropper", "dropping", then "after". After this proposed change, it would print "Dropper", "after", "dropping".


Also, please see my objections on the internals thread.

@leoyvens
Copy link

leoyvens commented Mar 6, 2017

Consider a pattern Some(x) where Some(ref mut x) is inferred. Now consider someone implementsCopy for x. Now that pattern no longer infers ref mut and instead copies x, most likely introducing a bug since future uses of x expected it to be mutated. Seems bad.

I'd suggest starting with a minimal RFC for auto-deref of target only, which is much less controversial.

@withoutboats
Copy link
Contributor

withoutboats commented Mar 6, 2017

Example of a deadlock after @Stebalien's example:

use std::sync::Mutex;

fn main() {
    let x = Mutex::new(());
    let lock = x.lock();
    match lock {
        Ok(_lock) => { } // deadlocks if we infer this to `ref _lock`
        _ => ()
    };
    let _ = x.lock();
}

@withoutboats
Copy link
Contributor

withoutboats commented Mar 6, 2017

Given the way drop order affects this, it seems to me like there are two possible avenues:

  1. Rather than only moving if necessary, only ref if necessary. This seems like the most backwards compatible (AFAICT it would mean no existing code changes), but I suspect it may be more complicated to define and implement.
  2. Support move and do a crater run. What appeals to me about this is that it seems like it works the same way as closures. This is similar to what I like about auto-reffing the discriminant - that would work the same as methods. Then there aren't really new concepts to learn here, just the application of old concepts to new places.

@Ericson2314
Copy link
Contributor

Ericson2314 commented Mar 6, 2017

The fact that adding this sugar opens up so many thorny issues makes me wonder if it will really make anybody's life easier. Perhaps this makes code easier to write, but does it make it easier to read? Per the evaluation procedure in https://blog.rust-lang.org/2017/03/02/lang-ergonomics.html I'd think this worrisomely context-dependent.

@burdges
Copy link

burdges commented Mar 7, 2017

Could rustfix, or rust itself, just suggest code transformations that eliminated &&T?

@oli-obk
Copy link
Contributor

oli-obk commented Mar 7, 2017

Could rustfix, or rust itself, just suggest code transformations that eliminated &&T?

clippy does this for existing code already (let x: &T = &&t), If this gets merged, we'll obviously extend the lint to also suggest the neatest representation possible.

@arielb1
Copy link
Contributor

arielb1 commented Mar 7, 2017

This is similar to what I like about auto-reffing the discriminant - that would work the same as methods.

If we don't auto-ref the discriminant, this would be like deref coercions, which I feel are more similar because they are fundamentally both lvalue operations (matching an autoref-ed pattern by ref would create a ref to an "extraneous" temporary).

@nrc
Copy link
Member Author

nrc commented Mar 7, 2017

Given the way drop order affects this, it seems to me like there are two possible avenues:

Option 1 did not seem possible when I investigated it when working on the RFC - it breaks back compat and means we have different rules to closures (possibly some other issues too).

Option 2 sounds good to me. I agree this feels like a real issue that can't be worked around, but also one that is unlikely to come up in practice.

@nrc
Copy link
Member Author

nrc commented Mar 7, 2017

The biggest sticking point here seems to be the inference of ref mut. I am sympathetic to this concern. I see two possibilities:

  1. we never infer ref mut, if you have a variable that needs to be mutably referenced you have to write it. This has the disadvantage of moving the 'annotation cliff' forward - if we you want to change a variable from referenced to mutably referenced the change is from x to ref mut x.

  2. require mut to infer ref mut, i.e., mut x means x is either a mutable value of a mutable reference. This breaks backwards compatibility, and means there is no way to force a mutable value. The latter can be addressed by supporting move on patterns. The former we could solve by investigating how much of a problem this is and if not too bad, then just changing it, or by making this a Rust 2.0 change.

ISTM that unless we decide to schedule Rust 2.0, that option 1 is better here.

@withoutboats
Copy link
Contributor

Option 1 did not seem possible when I investigated it when working on the RFC - it breaks back compat and means we have different rules to closures (possibly some other issues too).

How does it break backwards compatibility?

@burdges
Copy link

burdges commented Mar 8, 2017

Aren't @Stebalien's examples a problem when inferring even just ref though?

@glaebhoerl
Copy link
Contributor

I get your point, and agree that it's a learnability hazard. I still think it's better for there to be a consistent logic that needs a bit of effort to learn, than for there not to be a consistent logic you can learn with any amount of effort. And while right now the voices of people who find the current meaning confusing are being heard, I suspect if we were to flip it, we'd instead hear the voices of the other half (especially considering we taught them). And if it's going to defy the expectations of half the audience either way we go (which seems quite likely), then as with while..else I suspect the wisest route would be neither way. That is, to avoid having it.

@withoutboats
Copy link
Contributor

withoutboats commented Apr 21, 2017

I suspect the wisest route would be neither way. That is, to avoid having it.

I think I agree with this, and that's why I think I prefer using keywords here when possible (and elide thise keywords as much as possible!)

@joshtriplett
Copy link
Member

joshtriplett commented Apr 21, 2017

For the record, so far I still find all the active proposals fairly confusing, in that they introduce magic type translation rules that make it hard for me to form a clear mental model of the types involved. When I see match expression { pattern => ... , I expect expression and pattern to have exactly the same type, not "the same type modulo some magic rules about references". Auto-deref is quite understandable in most cases, but auto-ref on the other hand introduces confusion about whether something is owned or referenced.

I think the biggest problem is that automatic dereferencing goes from "this couldn't possibly make sense as written" to "here's a plausible interpretation", while automatic referencing goes from one reasonable interpretation (ownership) to another reasonable interpretation (referencing). (That's true even if the reasonable interpretation of ownership leads to a compile error; better to have a clear mental model that leads to an error than to have a fuzzier model that allows the code to compile but perhaps not mean what the user expected it to mean.) @glaebhoerl's point about wanting a consistent underlying logic echoes exactly the concerns I have.

ref and ref mut took me a little time to learn, but once I learned how they worked, they just felt like the pattern equivalent of writing &a.b.c.d or &mut a.b.c.d: they let you reference something out of the middle of a structure you're matching.

All that said, I do think there might be a way to simplify match; I just don't think the current proposals are worth the cost in consistency and reasoning.

@phaylon
Copy link

phaylon commented Apr 21, 2017

I understand the logic of this & even agree with it, yet it seems that in practice a large number of users who have no problem understanding destructuring in general still expect the reference operators to perform a reference in patterns. I think the majority of users don't see & and &mut as constructors per se and I'm not sure that it would be so bad for the syntax to match their expectations.

Is this actually quantifiable? I'm much more comfortable with the current situation. And although I wouldn't be too surprised if I'm in the minority, I'm not sure it can be seen as certain. It seems hard to measure the number of people that have an easier time with the way it works now.

@joshtriplett
Copy link
Member

joshtriplett commented Apr 21, 2017

Responding to an earlier comment from @nrc:

In fact, independent of this proposal, I think such a lint would be helpful in general - afaik there is never a reason to have &&T rather than &T (outside of generic code) and whenever this happens to me it is a mistake.

I agree, but I've also run into cases where it happens and it's not obvious how to avoid it. As an example:

  • You have a collection of String objects.
  • You create another data structure consisting of references to the string objects in that collection. For instance, you have a HashMap with &str as either the key or value.
  • You want to iterate over the keys/values. The iterator's Item type ends up as &&str.

That situation ("iterator over a collection of references") seems to happen fairly often, and I don't see an easy way to avoid momentarily ending up with such types (though you can then eliminate the extra reference). Where that then becomes a much more easily avoided error is if you then use collect() or similar to turn those references into a collection, and end up with types like Vec<&&str> or HashSet<&&str>, without eliminating the extra reference. (And if you keep going down that road, and iterate again, you can even end up with types like &&&str.)

When working with generic functions, type-inferred closures, and auto-deref, it's surprisingly easy for all of the above to happen without ever encountering a type error. And where that comes back to the topic of this RFC is that it's surprisingly common for the first point where you realize you've ended up with a && or &&& is when you go to pattern-match it and get a type error.

@nikomatsakis
Copy link
Contributor

@glaebhoerl I definitely agree that we should strive to make "deref patterns" as unnecessary as possible. We can put aside the question of how to write them for the time being, I suspect. =)

@withoutboats
Copy link
Contributor

withoutboats commented Apr 23, 2017

Returning to @nagisa's point about not focusing only on match, I really think these rules would be valuable in every kind of pattern. Its not uncommon that I iter over a Vec<(_, _)> and find or filter by one of the tuple members:

vec.iter().find(|(name, _)| name == some_name)

Even today it usually takes me more than one attempt to get this right, because it is behind two references:

vec.iter().find(|&&(name, _)| name == some_name)

@burdges
Copy link

burdges commented Apr 23, 2017

What about assignment let x = y; or let x = bar.foo(); where fn foo(&self) -> &mut T?

@withoutboats
Copy link
Contributor

withoutboats commented Apr 23, 2017

@burdges There's no destructuring going on, so it doesn't impact those.

However, if you were to do:

struct Foo { arg: usize }

let Foo { arg } = &Foo { arg: 0 };

arg would be an &usize.

@nikomatsakis
Copy link
Contributor

So, I think I'd like to see an RFC with my take on @Stebalien's proposal, and then we can weigh them side-by-side? @Stebalien, would you be interested in working with me on it?

@glaebhoerl
Copy link
Contributor

glaebhoerl commented Apr 25, 2017

What about @ patterns?

match &Some(foo()) {
    x @ Some(y) => { ... },
    _ => { ... }
}

Is the type of x here Option<T> or &Option<T>? I suspect the latter is more appropriate, that is the @ binding (x) gets "reference-inverted" the same way as sub-bindings (y: &T) do.

EDIT: Context is I was looking at this code linked from @nikomatsakis's recent blog post. It seems like that choice (@ gets reference-inverted) would work out better in practice as well in this case: all of the &s and refs in this match arm could simply be removed.

@glaebhoerl
Copy link
Contributor

glaebhoerl commented Apr 26, 2017

I meant to add that my proposal as written left out the deref coercions contained in the current proposal, but I suspect we should include those too. i.e., if you match a &Rc<Option<i32>> with Some(x) that would be a by-ref binding into the Rc.

This sounds good too. If I understand correctly, this would end up obviating things like box patterns? That sounds really good.

Thoughts related to this:

  • I recall one of the motivations for prior RFCs about things like DerefPure or SafeDeref was to enable the use of user-provided deref impls in pattern matching. Given that this would involve calling user-provided deref impls in pattern matching, do the same concerns re-arise here?

  • If I understand correctly, if you were to match on a Box<Option<T>> (or some structure containing one) by-value, and use a Some(x) pattern, reference-inversion would not happen for the Box -- to enable them, you would have to match on &Box<Option<T>> or (&MyStructure<Box<Option<T>>>) instead, after which x would end up being an &T reference through an extension of the "default binding mode transition" rules in @nikomatsakis's comment to types-implementing-Deref. This sounds appropriate to me. The issue with allowing it for by-value Box directly would be that ref mut would be the "strongest" mode we currently support for the inner bindings, but if we ever added something like &move references, we would presumably want to "upgrade" the default-mode-for-Box to that instead, which wouldn't be backwards compatible at that point.

  • Suppose I match on an &Option<Box<T>>. Is there any pattern I could write such that the type of the binding is &T, rather than &Box<T>? As far as I can tell, no - that is, things like Box and Rc are always "transparent" in pattern matching, and you can't refer to them "directly". Is this ever a significant issue? (My gut feeling is no: you can just manually deref or rely on deref coercions on the RHS of the match.)

@nikomatsakis
Copy link
Contributor

This sounds good too. If I understand correctly, this would end up obviating things like box patterns? That sounds really good.

Well, that "auto-deref" I was referring to was intended to be applied only at the top-level. So if you were matching a Rc<Foo>, if would be helpful, but not a (Rc<Foo>, Rc<Foo>). That said, part of why I like the idea of a general "deref" pattern (as opposed to a &T pattern) is that it could subsume all derefable things (e.g., Rc). But I think we should just leave that part out of this discussion for now and focus on the clear wins in front of us.

@nikomatsakis
Copy link
Contributor

@glaebhoerl

I recall one of the motivations for prior RFCs about things like DerefPure or SafeDeref was to enable the use of user-provided deref impls in pattern matching. Given that this would involve calling user-provided deref impls in pattern matching, do the same concerns re-arise here?

This is one of the reasons that I want to avoid general deref patterns for the moment. =)

@aturon
Copy link
Member

aturon commented Apr 29, 2017

@joshtriplett

When I see match expression { pattern => ... , I expect expression and pattern to have exactly the same type, not "the same type modulo some magic rules about references". Auto-deref is quite understandable in most cases, but auto-ref on the other hand introduces confusion about whether something is owned or referenced.

Thanks for articulating this perspective and concern, which I imagine many people share. Here are a couple of things that, in my mind, help mitigate the worry:

  • First, and most importantly, the expectation is that you already have some idea about the ownership status of the value being matched against. This proposal is basically "extending" that knowledge through the decomposition. IOW, there would only be uncertainty about the resulting ownership if you were uncertain about the ownership of the value to being with.

  • It's also the case that, if you misunderstand the situation with respect to borrowing, you'll end up falling afoul of the borrowchecker. We should be able to give quite good error messages here (explaining that e.g. a field binding was borrowed because the value being matched was borrowed) as well. So the readability concern is mostly about telling "at a glance" what's going on. But again, as above, I don't think that really changes here; it's still a highly local consideration. (Which is quite different from the RFC itself.)

  • I tried to spell out above the mental model I see at play here, but let me try to rephrase it. I find that when I'm programming "at speed", I tend to think about & mostly as a kind of marker that "this value is borrowed". From that perspective, matching &Option<T> against Some(x) makes perfect sense: I'm matching against an Option that happens to be borrowed, and of course the components of the match would likewise be borrowed.

  • I suspect that this is one of those features where we'll need to try it at scale before we can fully know the impact on the reasoning footprint for ownership. I definitely think we'll want to take care around stabilization for that reason.


@nikomatsakis, I quite like your proposal as it stands, and the longer I sit with it, the more I prefer this general direction over the original RFC (as clever as it is).

@aturon
Copy link
Member

aturon commented Apr 29, 2017

Oh, there was one downside I wanted to mention, which is the mild "inconsistency" with field projection. Compare:

struct Foo { x: Bar, y: Bar }
let my_foo = Foo { x: ..., y: ... };

let Foo { x, y } = &my_foo; // here, x and y have type &Bar
&my_foo.x // this also has type &Bar, and there's a decent 
          // intuition that we're applying `&` to the whole path
          // to say "please borrow this path"

let my_foo_ref = &my_foo;
let Foo { x, y } = my_foo_ref; // here, x and y have type &Bar
my_foo_ref.x // error
&my_foo_ref.x // here, we get &Bar, but we had to *ask* for
              // the borrow, even though we started with a borrow;
              // the match doesn't, though

@Stebalien
Copy link
Contributor

@nikomatsakis

Your proposal is similar mine but a bit more DWIM.

match & &mut Some(3) {
    Some(x) if false => {
        // `x` would be an `&i32` here, because we dereference the `&`, getting into
        // ref mode, then the `&mut`, but we are already in `ref` mode, so we stay that way
    }
    _ => { }
}

In my proposal, x would either be an &&mut i32 or this wouldn't compile (I didn't really consider repeated applications). However, I prefer your way; it's consistent with how the IntoIterator trait works.

One thing I'm a bit fuzzy on is the reference table:

Kind of reference Current default New default
&T move ref
&T ref ref
&T ref mut ref
&mut T move ref mut
&mut T ref ref
&mut T ref mut ref mut

To me, "default" generally means implicit. However, unless I'm mistaken, patterns always bind by value at the moment so that can't be what you're talking about. Are you talking about the case where the programmer explicitly binds by ref/ref mut? That is:

Kind of reference Explicit Pattern New default
&T ref
&T ref ref
&T ref mut (??) ref
&mut T ref mut
&mut T ref ref
&mut T ref mut ref mut

By example,

match &Some(3i32) {
    Some(ref x) => {
        // Is `x` a `&i32` or an `&&i32`? In my proposal, it would have been the
        // latter. If I understood your proposal correctly, it would be the
        // former.
    },
    _ => {},
}

Now, if I am reading this table correctly, I disagree with the third entry (marked by "??"; I think it should be a type error. That is:

match &Some(3i32) {
    Some(ref mut x) => {
        // Type error because we can't mutably bind `x`. Based on my understanding
        // of your current proposal, `x` would be an `&i32`.
    },
    _ => {},
}

However, converse (the fifth entry) should be perfectly valid:

match &mut Some(3i32) {
    Some(ref x) => {
        // `x` is an `&i32` (downgrade mutability).
    },
    _ => {},
}

On the topic of not needing a special box pattern, a generalized move pattern should be a valid replacement. That is, a move pattern would match box (e.g., using Deref*), &, or &mut.

However, the word move is a bit unfortunate as I'd expect ref mut move to be equivalent to DerefMut. That is, I'd expect the following to work:

let mut y = Box::new(1i32);
let ref mut move x: &mut i32 = y;
*x = 2;

However, as y isn't really "moved", it's dereferenced and then re-referenced.


As for writing an RFC, I'm unfortunately a bit busy at the moment (getting ready to start a new job) so I really shouldn't take on any more responsibilities (I'm already falling behind on my current ones).

However, I'd be happy to give feedback and kibitz (and let you do all the real work :)).

@aturon
Copy link
Member

aturon commented May 1, 2017

@Stebalien

One thing I'm a bit fuzzy on is the reference table:

It also took me a long time to grok what the table was getting at. What "default" means here is basically what implicit mode you're already inside. In particular, if you are taking apart a &&mut Option<T> and you have a Some arm, you want the borrowing mode to be ref, not ref mut. At the outside, you're in move mode. After going through &, you're in ref mode. But now, when you go through &mut, you stay at ref mode rather than going to ref mut mode.

re: RFCs planning, so far it seems like those in favor of doing something along these lines are in favor of @Stebalien and @nikomatsakis's proposal in particular. @nrc, though, has been on parental leave, and obviously his opinion is very much desired. But assuming that he also agrees that these new proposals are an improvement, perhaps he can just update this RFC accordingly?

@nikomatsakis
Copy link
Contributor

I spoke with @nrc, and he seemed to be generally in favor of the new direction, but I guess he can speak for himself. Pretty sure he's buried under notifications at the moment. =)

@nikomatsakis
Copy link
Contributor

@Stebalien Hopefully @aturon's explanation helped. In any case, the reason I called it the default mode because the idea was that it can be overridden by an explicit mode declaration. (So, e.g., one could write let Some(move x) = &Some(3), and x would have type i32.) Today, things work the same way, except that the default mode is always move (regardless of context); but one can always override with an explicit ref or ref mut mode.

That said, @nrc was pointing out that there isn't really a need to make it possible to declare "move" mode explicitly, since one could still use an explicit & pattern for that: let &Some(x) = &Some(3). I'm sort of indifferent about whether we permit an explicit move x to be written or not.

@withoutboats
Copy link
Contributor

I think move would be more intuitive to most users than & destructuring (though I think neither would be used often). I'd like to at least support it unstabley and try to get some practical experience with it.

@phaylon
Copy link

phaylon commented May 3, 2017

That said, @nrc was pointing out that there isn't really a need to make it possible to declare "move" mode explicitly, since one could still use an explicit & pattern for that: let &Some(x) = &Some(3). I'm sort of indifferent about whether we permit an explicit move x to be written or not.

It seems quite counter-intuitive that I'd have to use reference-facilities to perform a move. I'd also consider it a lot more vague, given the goal is the ability to be explicit. Additionally I'm not sure how that would work outside of Copy types.

@nrc
Copy link
Member Author

nrc commented May 22, 2017

Closing in favour of #2005

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.