Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add std::cell::Ref::map and friends #25747

Merged
merged 2 commits into from
May 29, 2015
Merged

Add std::cell::Ref::map and friends #25747

merged 2 commits into from
May 29, 2015

Conversation

SimonSapin
Copy link
Contributor

For slightly complex data structures like rustc_serialize::json::Json, it is often convenient to have helper methods like Json::as_string(&self) -> Option<&str> that return a borrow of some component of &self.

However, when RefCells are involved, keeping a Ref around is required to hold a borrow to the insides of a RefCell. But Ref so far only references the entirety of the contents of a RefCell, not a component. But there is no reason it couldn’t: Ref internally contains just a data reference and a borrow count reference. The two can be dissociated.

This adds a map_ref function that creates a new Ref for some other data, but borrowing the same RefCell as an existing Ref.

Example:

struct RefCellJson(RefCell<Json>);

impl RefCellJson {
    fn as_string(&self) -> Option<Ref<str>> {
        map_ref(self.borrow(), |j| j.as_string())
    }
}

r? @alexcrichton

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way Github handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see CONTRIBUTING.md for more information.

@SimonSapin
Copy link
Contributor Author

https://github.com/rust-lang/rfcs#when-you-need-to-follow-this-process lists "Additions to std", but maybe it should be "#[stable] additions to std"? Anyway let me know if I should write an RFC.

Unresolved questions:

  • Is there a better name for this?
  • Should it be a method instead? It would shadow T methods that could otherwise be used through Deref.

@SimonSapin SimonSapin force-pushed the map_ref branch 4 times, most recently from 62e8200 to 2b49fc6 Compare May 24, 2015 13:10
@bluss
Copy link
Member

bluss commented May 24, 2015

I've added and experimented with this once. I scrapped the idea for my use case, because this makes it easy to return Ref, at which point you push the responsibility to not do a borrowing error onto the user of your API. I didn't like that, but if you have a use case for it, I suspect this is so useful we should have it available for when you need it.

@bluss
Copy link
Member

bluss commented May 24, 2015

Wait, why don't you use a closure here?

@SimonSapin
Copy link
Contributor Author

What borrowing error do you mean?

This design of taking &U doesn't work. The usage I want of iit doesn't borrow-check. Trying something with a closure.

@bluss
Copy link
Member

bluss commented May 24, 2015

By borrowing error I mean a panic in borrow() or (more likely) borrow_mut.

@SimonSapin
Copy link
Contributor Author

Updated

@SimonSapin
Copy link
Contributor Author

Sigh. I realized the PR as-is introduces memory unsafety: the closure can keep around the pointer it’s given longer than the lifetime of the Ref. This test case prints false then true, meaning we’re accessing Foo after its destructor has run.

#![feature(core)]

use std::cell::{RefCell, Ref, map_ref};

struct Foo {
    destroyed: bool
}

impl Drop for Foo {
    fn drop(&mut self) {
        self.destroyed = true;
    }
}

fn main() {
    let a = RefCell::new(Box::new(Foo { destroyed: false }));
    let mut b = None;
    let _: Option<Ref<()>> = map_ref(a.borrow(), |s| {
        b = Some(&**s);
        None
    });
    println!("{}", b.unwrap().destroyed);
    *a.borrow_mut() = Box::new(Foo { destroyed: false });
    println!("{}", b.unwrap().destroyed);  // Use after free
}

I think making map_ref accept a more restrictive Fn closure instead of FnOnce would patch this particular hole, but is it enough?

@bluss
Copy link
Member

bluss commented May 25, 2015

It looks like the closure idea was broken then. Sorry about that, my bad.

@huonw
Copy link
Member

huonw commented May 25, 2015

cc #19220

I think making map_ref accept a more restrictive Fn closure instead of FnOnce would patch this particular hole, but is it enough?

I suspect not; couldn't one make b a Cell<Option<...>> instead?

@SimonSapin
Copy link
Contributor Author

Right. Here is an alternative idea.

@jdm
Copy link
Contributor

jdm commented May 25, 2015

There's prior art that should definitely be considered at #19220.

@bluss
Copy link
Member

bluss commented May 25, 2015

I believe this version is sound, it looks good. Why use an optional return value though?

@SimonSapin
Copy link
Contributor Author

My use case it returning a new Ref to some component that may or may not be there. But checking whether it’s there is the same amount of work as accessing it. If map_ref didn’t involve options, I’d have to do this work twice.

Example in Kuchiki:

pub struct ElementData {
    pub name: QualName,
    pub attributes: RefCell<HashMap<QualName, String>>,
}

impl ElementData {
    fn get_attribute(&self, name: Atom) -> Option<Ref<str>> {
        // With `map_ref` as proposed:
        map_ref(self.attributes.borrow(), |attrs| {
            attrs.get(&QualifiedName { ns: ns!(""), local: name }).map(|s| &**s)
        })

        // With a simplified `map_ref`, two HashMap lookups:
        let borrow = self.attributes.borrow();
        let key = QualifiedName { ns: ns!(""), local: name };
        if borrow.contains_key(&key) {
            Some(map_ref(borrow, |attrs| &*attrs[&key]))
        } else {
            None
        }
    }
}

@SimonSapin
Copy link
Contributor Author

Unlike #19220, this doesn’t change the semantic of existing items (it just adds a new one), so I don’t think it has the same issues. The example program at #19220 (comment) correctly fails to build with a lifetime error (after updating it for language changes).

@SimonSapin
Copy link
Contributor Author

Maybe my get_attribute method could return Ref<Option<_>> instead of Option<Ref<_>>, but that seems more awkward to deal with.

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label May 26, 2015
@alexcrichton
Copy link
Member

I, too, have wanted this kind of functionality from time to time before, and I'm totally fine seeing it prototyped! This is a minor enough API that I don't believe it will require an RFC, but I'll cc @rust-lang/libs to see if other have opinons as well :)

I have the same question as @bluss as well in that from an API perspective, could you elaborate on why the closure returns Option<&U>? I'd expect it to return &U.

Is there a better name for this?

Naming-wise this may also wish to consider RefMut which may also want the capability to map itself to another reference.

Should it be a method instead? It would shadow T methods that could otherwise be used through Deref.

We've been quite wary of adding methods on types that implement Deref, so I think for now it'd want to stick around as a global function. I think when considering the name for how to map a RefMut the names map_ref and map_ref_mut do mirror each other well. Another option could possibly be an associated function Ref::map and RefMut::map, although I'm not sure if there's a way to force the compiler to not think that they're available as methods as well.

Apart from the Option return value, I agree with @bluss that this appears sound to me.

@SimonSapin
Copy link
Contributor Author

Regarding Option<&U>, my use case is that I want to return Option<Ref<_>>, an optional reference to a component that may or may not be there. Another example:

pub enum NodeData {
    Element(ElementData),
    Text(String),
    Comment(String),
    Doctype(Doctype),
    Document(DocumentData),
}

pub struct Node {
    // ...
    pub data: RefCell<NodeData>,
}

impl Node {
    fn as_element(&self) -> Option<Ref<ElementData>> {
        map_ref(self.data.borrow(), |data| match *data {
            Element(ref element) => Some(element),
            _ => None
        })
    }
}

If map_ref was simplified to

pub fn map_ref<'b, T: ?Sized, U: ?Sized, F>(orig: Ref<'b, T>, f: F) -> Ref<'b, U>
where F: FnOnce(&T) -> &U

… this would be more tricky to implement and require matching twice:

impl Node {
    fn as_element(&self) -> Option<Ref<ElementData>> {
        let borrow = self.data.borrow();
        match *borrow {
            Element(_) => map_ref(borrow, |data| match *data {
                Element(ref element) => element,
                _ => unreachable!()
            }),
            _ => None
        }
    }
}

I mentioned before that maybe I could instead return Ref<Option<_>> but that doesn’t work: There is no existing Option I can return a &Option reference to, from the closure.

@SimonSapin
Copy link
Contributor Author

Added map_ref_mut.

How can I run the doctests?

@SimonSapin
Copy link
Contributor Author

I’ve found out (thanks eddyb!) that I can run the doctests with make check-stage1-doc-crate-core TESTNAME=map_ref NO_REBUILD=1. But they’re failing and I have no idea why :( The output does not include the build error or the panic! message that make the test fail, and I didn’t manage to fix it by guessing and making changes blindly.

@alexcrichton
Copy link
Member

Regarding Option<&U>, my use case is that I want to return Option<Ref<_>>, an optional reference to a component that may or may not be there.

I'm worried about the compositionality of this API, however. For example if I have a Ref<Result<T, E>> I may want to turn it into a Result<Ref<U>, &E> (or something like that). Singling out Option seems like it's addressing a problem, but perhaps tackling it from the wrong angle. For example Ref<Option<T>> is very similar to &Option<T> on which we use the as_ref method to convert to Option<&T> (or in this case Option<Ref<T>>).

I would personally rather see this as &T -> &U (aka as zero-cost as possible) and explore an ergonomic transformation of Ref<Option<T>> to Option<Ref<T>> later on.


Also, it looks like associated functions can indeed be defined that aren't methods:

struct A;
impl A {
    fn foo(a: &A) {}
}

let a = A;
a.foo(); // error
A::foo(&a); // ok

Perhaps static map functions could be used? You've already imported Ref and RefMut most likely if you're returning them, so it avoids the import of std::cell as well.

@bluss
Copy link
Member

bluss commented May 26, 2015

Using static functions sounds great unless there is any plan to expand UFCS and make them resolve with method syntax.

@SimonSapin
Copy link
Contributor Author

I would personally rather see this as &T -> &U (aka as zero-cost as possible) and explore an ergonomic transformation of Ref<Option> to Option<Ref> later on.

I agree that using a &T -> Option<&U> closure as proposed is not a great solution, but Ref<Option<T>> doesn’t actually work:

#![feature(core)]
use std::cell::*;

// So that I don’t have to re-bootrap...
fn simplified_map_ref<'b, T, U, F>(orig: Ref<'b, T>, f: F) -> Ref<'b, U>
where F: FnOnce(&T) -> &U {
    map_ref(orig, move |v| Some(f(v))).unwrap()
}

fn main() {
    let x: RefCell<Result<u32, ()>> = RefCell::new(Ok(5));
    simplified_map_ref(x.borrow(), |r| &r.ok());
}
a.rs:11:41: 11:47 error: borrowed value does not live long enough
a.rs:11     simplified_map_ref(x.borrow(), |r| &r.ok());
                                                ^~~~~~
note: in expansion of closure expansion
a.rs:11:36: 11:47 note: expansion site
a.rs:11:40: 11:47 note: reference must be valid for the anonymous lifetime #1 defined on the block at 11:39...
a.rs:11     simplified_map_ref(x.borrow(), |r| &r.ok());
                                               ^~~~~~~
a.rs:11:40: 11:47 note: ...but borrowed value is only valid for the block at 11:39
a.rs:11     simplified_map_ref(x.borrow(), |r| &r.ok());
                                               ^~~~~~~
error: aborting due to previous error

@sfackler
Copy link
Member

I like the idea of using static methods on Arc etc instead of free functions in the arc module. We should make sure that future UFCS work won't make those methods callable with the dot syntax though.

@SimonSapin
Copy link
Contributor Author

We should make sure that future UFCS work won't make those methods callable with the dot syntax though.

If we decide to do this I don’t see an obstacle to this: we can tell such functions apart from methods by them not using self.

@SimonSapin
Copy link
Contributor Author

As discussed on IRC with @alexcrichton, I modified the PR to have both map functions that take &T -> &U closures, and filter_map that take &T -> Option<&U>. The naming is similar to Iterator methods.

#[unstable(feature = "core", reason = "recently added")]
#[inline]
pub fn map<U: ?Sized, F>(orig: RefMut<'b, T>, f: F) -> RefMut<'b, U>
where F: FnOnce(&mut T) -> &mut U {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistically we tend to format functions like this as:

pub fn foo()
    where ...
{
    // ...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@SimonSapin
Copy link
Contributor Author

Here is an attempt at generalizing filter_map:

impl<'b: 'c, 'c, T: ?Sized> Ref<'b, T> {
    pub fn generalized_map<U: ?Sized, F, R>(orig: Ref<'b, T>, f: F) -> R
    where F: FnOnce(&T, &FnOnce(&'c U) -> Ref<'c, U>) -> R {
        f(orig._value, &move |new| Ref {
            _value: new,
            _borrow: orig._borrow,
       })
    }
}

But using it doesn’t borrow-check:

    let x: RefCell<Result<u32, ()>> = RefCell::new(Ok(5));
    let b: Option<Ref<u32>> = Ref::generalized_map(x.borrow(), |r, new_ref| match *r {
        Ok(ref value) => Some(new_ref(value)),
        Err(_) => None,
    });
    assert_eq!(*b.unwrap(), 5);
}

I’m not sure if I’m doing something wrong or if this is not possible to express in current Rust. Regardless, a function that takes a closure that takes another closure is quite convoluted, not a great API.

/// This is an associated function that needs to be used as `Ref::clone(...)`.
/// A `Clone` implementation or a method would interfere with the widespread
/// use of `r.borrow().clone()` to clone the contents of a `RefCell`.
#[unstable(feature = "core",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these features switch to something other than "core"? Something like cell_extras would be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@alexcrichton
Copy link
Member

This API looks good to me, and I also believe the implementation is sound, so I'm all good with this!

I'd like to hear more opinions about Type::foo as opposed to std::type::foo as well as how fancy we want to get with the extra methods on Ref{,Mut}.

@aturon
Copy link
Member

aturon commented May 28, 2015

+1 to the basic idea here.

@alexcrichton

I'd like to hear more opinions about Type::foo as opposed to std::type::foo

This one is tricky. In the limit, with full inherent associated items, many "single type" modules could be flattened to just the type, and this is one step further in that direction. The main advantage is that you're usually importing the type already, so this saves you an additional import of the module/function. To be honest, I think it's inevitable that APIs will drift in this direction, and don't see much reason to fight it in std.

(In the long run, this means that true free functions are predominantly going to be code that is not clearly associated with any particular type.)

as well as how fancy we want to get with the extra methods on Ref{,Mut}.

I think the level of fanciness in the current PR is appropriate.

@alexcrichton
Copy link
Member

Ok, thanks again for the PR @SimonSapin! r=me with a squash

@SimonSapin SimonSapin changed the title Add std::cell::map_ref Add std::cell::Ref::map and friends May 28, 2015
…::Ref

... and generalize the bounds on the value type.
@SimonSapin
Copy link
Contributor Author

Squashed into two commits.

@alexcrichton
Copy link
Member

@bors: r+ 64e72b0

@bors
Copy link
Contributor

bors commented May 28, 2015

⌛ Testing commit 64e72b0 with merge b597347...

@bors
Copy link
Contributor

bors commented May 29, 2015

💔 Test failed - auto-linux-64-nopt-t

@SimonSapin
Copy link
Contributor Author

Oh noes.

Should be fixed, with this diff before squashing:

diff --git a/src/libcoretest/lib.rs b/src/libcoretest/lib.rs
index 78c7215..3d14b3f 100644
--- a/src/libcoretest/lib.rs
+++ b/src/libcoretest/lib.rs
@@ -24,6 +24,7 @@
 #![feature(step_by)]
 #![feature(slice_patterns)]
 #![feature(float_from_str_radix)]
+#![feature(cell_extras)]

 extern crate core;
 extern crate test;

@alexcrichton
Copy link
Member

@bors: r+ d0afa6e

@bors
Copy link
Contributor

bors commented May 29, 2015

⌛ Testing commit d0afa6e with merge 25fc917...

bors added a commit that referenced this pull request May 29, 2015
For slightly complex data structures like `rustc_serialize::json::Json`, it is often convenient to have helper methods like `Json::as_string(&self) -> Option<&str>`  that return a borrow of some component of `&self`.

However, when `RefCell`s are involved, keeping a `Ref` around is required to hold a borrow to the insides of a `RefCell`. But `Ref` so far only references the entirety of the contents of a `RefCell`, not a component. But there is no reason it couldn’t: `Ref` internally contains just a data reference and a borrow count reference. The two can be dissociated.

This adds a `map_ref` function that creates a new `Ref` for some other data, but borrowing the same `RefCell` as an existing `Ref`.

Example:

```rust
struct RefCellJson(RefCell<Json>);

impl RefCellJson {
    fn as_string(&self) -> Option<Ref<str>> {
        map_ref(self.borrow(), |j| j.as_string())
    }
}
```

r? @alexcrichton
@bors bors merged commit d0afa6e into rust-lang:master May 29, 2015
@SimonSapin SimonSapin deleted the map_ref branch May 29, 2015 12:10
XMPPwocky pushed a commit to XMPPwocky/rust that referenced this pull request May 29, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants