Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BTree: remove Ord bound from new #88040

Merged
merged 1 commit into from
Sep 1, 2021
Merged

BTree: remove Ord bound from new #88040

merged 1 commit into from
Sep 1, 2021

Conversation

nbdd0121
Copy link
Contributor

@nbdd0121 nbdd0121 commented Aug 15, 2021

K: Ord bound is unnecessary on BTree{Map,Set}::new and their Default impl. No elements exist so there are nothing to compare anyway, so I don't think "future proof" would be a blocker here. This is analogous to HashMap::new not having a K: Eq + Hash bound.

#79245 originally does this and for some reason drops the change to new and Default. I can see why changes to other methods like entry or symmetric_difference need to be careful but I couldn't find out any reason not to do it on new.

Removing the bound also makes the stabilisation of const fn new not depending on const trait bounds.

cc @steffahn who suggests me to make this PR.

r? @dtolnay

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 15, 2021
Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No elements exist so there are nothing to compare anyway

I don't think "future proof" would be a blocker here

I couldn't find out any reason not to do it

This is not as clear cut as the PR description implies.

I am hesitant about this because this API change makes it impossible to selectively tackle some of the code bloat caused by monomorphizations inside the internals of these collections. According to https://www.stroustrup.com/SCARY.pdf such optimizations deliver 1.2x to 2.1x faster performance and 1x to 25x smaller code size in real applications.

I pasted below a sketch of the kind of restructure that this change hinders.

Clearly we still have bounds available via the insert and contains methods etc, so a comparator could be passed down from there as a function argument to every places that needs it, but the extra argument passing could negate a chunk of the performance benefit relative to stashing a comparator at suitable places inside of the data structure up front.

@rust-lang/libs-api @rust-lang/libs

use self::private::{Element, TypeErasedSet};
use std::cmp::Ordering;
use std::marker::PhantomData;
use std::mem;

pub struct Set<T> {
    imp: TypeErasedSet,
    marker: PhantomData<T>,
}

impl<T> Set<T> {
    pub fn new() -> Self
    where
        T: Ord,
    {
        unsafe fn erased_cmp<T: Ord>(a: &dyn Element, b: &dyn Element) -> Ordering {
            T::cmp(&*(a as *const _ as *const _), &*(b as *const _ as *const _))
        }
        Set {
            imp: unsafe { TypeErasedSet::new(erased_cmp::<T>) },
            marker: PhantomData,
        }
    }

    pub fn insert(&mut self, value: T)
    where
        T: Ord,
    {
        let boxed = Box::new(value);
        unsafe {
            let erased = mem::transmute::<Box<dyn Element>, Box<dyn Element + 'static>>(boxed);
            self.imp.insert(erased)
        }
    }

    pub fn contains(&self, value: &T) -> bool
    where
        T: Ord,
    {
        unsafe { self.imp.contains(value) }
    }
}

mod private {
    use std::cmp::Ordering;

    pub(super) trait Element {}
    impl<Sized> Element for Sized {}

    type Comparator = unsafe fn(&dyn Element, &dyn Element) -> Ordering;

    pub(super) struct TypeErasedSet {
        // Obviously you could build a more efficient representation than this.
        // Just illustrating that the concept works without a monomorphized T.
        repr: Vec<Box<dyn Element>>,
        cmp: Comparator,
    }

    impl TypeErasedSet {
        pub(super) unsafe fn new(cmp: Comparator) -> Self {
            TypeErasedSet {
                repr: Vec::new(),
                cmp,
            }
        }

        pub(super) unsafe fn insert(&mut self, value: Box<dyn Element>) {
            match self.find(&*value) {
                Ok(present) => self.repr[present] = value,
                Err(absent) => self.repr.insert(absent, value),
            }
        }

        pub(super) unsafe fn contains(&self, value: &dyn Element) -> bool {
            self.find(value).is_ok()
        }

        unsafe fn find(&self, value: &dyn Element) -> Result<usize, usize> {
            self.repr
                .binary_search_by(|elem| unsafe { (self.cmp)(&**elem, value) })
        }
    }
}

fn main() {
    let mut set = Set::new();
    set.insert(5);
    set.insert(2);
    set.insert(9);
    set.insert(4);
    set.insert(9);
    println!("{}", set.contains(&4));
    println!("{}", set.contains(&3));
}

@nbdd0121
Copy link
Contributor Author

Clearly we still have bounds available via the insert and contains methods etc, so a comparator could be passed down from there as a function argument to every places that needs it, but the extra argument passing could negate a chunk of the performance benefit relative to stashing a comparator at suitable places inside of the data structure up front.

I suppose this wouldn't be a big deal for BTreeMap, since when searching the tree we currently only take NodeRef without having the references to the BTreeMap itself. So:

  • If the comparator is to be stored in BTreeMap but not within the nodes, then extra argument is needed anyway. This would mean that it doesn't make sense to store comparator in the BTreeMap.
  • If the comparator is to be stored within nodes, then the comparator is not needed to create an empty BTreeMap because it starts with no nodes (root = None).

@steffahn
Copy link
Member

steffahn commented Aug 15, 2021

Storing a comparator in the data structure might kill covariance and ignoring that issue might be unsound. I will investigate more on that later.

Passing a comparator with every call (to insert, etc) seems more reasonable to me. Especially since you'd need to "pass down" the function anyway if it's stored at top level. Whether it originates from a read operation or from a compile-time constant passed down from one layer above in the call hierarchy doesn't seem like the most significant difference. But I don't even know where the proclaimed performance benefit in that "SCARY" paper comes from (it's a bit lengthy, havent read through much of it yet) so I can't be sure to have the full picture. (The code size reduction is obvious on the other hand.)

Passing the comparator every call also allows for more flexible API, where you could offer a version of BTreeMap (using the same underlying erased data structure) that supports a comparator dependent on short-term borrows of another data structure. E. g. a BTreeSet of indices into another Vec would be possible. (Sorted by the Order of the Vec elements).

@steffahn
Copy link
Member

steffahn commented Aug 15, 2021

@dtolnay

[…] ignoring that issue might be unsound. I will investigate more on that later.

Here’s a demonstration using your code example.

use std::any::Any;
use once_cell::sync::OnceCell;

fn main() {
    let s = Set::<Wrapper<for<'a> fn(&'a str)>>::new();
    let mut s: Set<Wrapper<fn(&'static str)>> = s;
    static S: OnceCell<&'static str> = OnceCell::new();
    let f = |s| {
        let _ = S.try_insert(s);
    };
    s.insert(Wrapper(f));
    s.contains(&Wrapper(f));
    println!("{}", S.get().unwrap());
}

// for the record, BTreeSet is indeed covariant, too
use std::collections::BTreeSet;
fn _b_tree_set_is_covariant<'a: 'b, 'b, T>(x: BTreeSet<&'a T>) -> BTreeSet<&'b T> {
    x
}

struct Wrapper<T>(T);

impl<T: 'static> Ord for Wrapper<T> {
    fn cmp(&self, _other: &Self) -> Ordering {
        if let Some(f) = (self as &dyn Any).downcast_ref::<Wrapper<for<'a> fn(&'a str)>>() {
            f.0(&String::from("Hello world!"));
        }
        Ordering::Equal
    }
}
impl<T: 'static> PartialOrd<Self> for Wrapper<T> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl<T: 'static> Eq for Wrapper<T> {}
impl<T: 'static> PartialEq<Self> for Wrapper<T> {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other).is_eq()
    }
}

(in the playground)
Example output:

���)

This unsoundness is not a small bug in your code, but an inherent problem of storing a comparison function in a covariant collection. Since BTreeSet will never lose its covariance, I don’t think your argument against removing K: Ord bounds from BTreeMap::new can be salvaged.

@Amanieu
Copy link
Member

Amanieu commented Aug 15, 2021

Incidentally hashbrown already does something similar internally (passing the comparator as an argument) but for different reasons:

  • It makes RawTable independent of the details of how a value is hashed or compared.
  • It makes the RawTable API more flexible since the comparator closure can contain borrowed data that isn't tied to the lifetime of the table.

@steffahn
Copy link
Member

steffahn commented Aug 15, 2021

Incidentally hashbrown already does something similar internally (passing the comparator as an argument)

It’s a very slight similarity only when considering the motivation discussed here (i.e. less monomorphization), since the hasher arguments are generic (impl Fn(&T) -> u64), so at run-time there’s no “passing” but instead still full monomorphization if the calls come from a wrapping HashMap (the passed function items are zero-sized types, not function pointers).

@steffahn

This comment has been minimized.

@dtolnay dtolnay added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Aug 15, 2021
@dtolnay
Copy link
Member

dtolnay commented Aug 15, 2021

@rfcbot fcp merge

#88040 (comment) is persuasive as far as the Ord impl which exists during new being totally useless at any point after new has returned, so I am on board with this.

@rfcbot
Copy link

rfcbot commented Aug 15, 2021

Team member @dtolnay has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Aug 15, 2021
@steffahn
Copy link
Member

@ anyone reviewing this, FYI:

this suggestion

cc @steffahn who suggests me to make this PR.

happened in a (short) topic on Zulip which can be found here (or in the archive).

@nbdd0121
Copy link
Contributor Author

nbdd0121 commented Aug 15, 2021

Surprised to see that covariance are not asserted, filed #88058

EDIT: It is asserted just using a different name from the other containers.

@bors
Copy link
Contributor

bors commented Aug 17, 2021

☔ The latest upstream changes (presumably #88094) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Copy link
Contributor

bors commented Aug 18, 2021

☔ The latest upstream changes (presumably #86808) made this pull request unmergeable. Please resolve the merge conflicts.

@rfcbot
Copy link

rfcbot commented Aug 18, 2021

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Aug 18, 2021
@stlankes
Copy link
Contributor

stlankes commented Aug 26, 2021

Naive question... Would this a solution for issue #88244 ?

@nbdd0121
Copy link
Contributor Author

Naive question... Would this a solution for issue #88244 ?

Yes

@steffahn
Copy link
Member

steffahn commented Aug 27, 2021

I don’t think that this PR has anything to do with resolving or not resolving #88244. While it makes the particular code example compile again, there’s no reason why other trait bounds on other const functions shouldn’t have the same problem (w.r.t. usability in associated consts for trait impls). Especially the fact that it’s a change/regression in compiler behavior means that either #88244 would be closed as “working as intended” (breaking changes to unstable features are okay) or it needs a bugfix.

@rfcbot rfcbot added the finished-final-comment-period The final comment period is finished for this PR / Issue. label Aug 28, 2021
@rfcbot rfcbot added to-announce Announce this issue on triage meeting and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Aug 28, 2021
@rfcbot
Copy link

rfcbot commented Aug 28, 2021

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

@m-ou-se
Copy link
Member

m-ou-se commented Aug 31, 2021

@bors r+

@bors
Copy link
Contributor

bors commented Aug 31, 2021

📌 Commit f33f266 has been approved by m-ou-se

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 31, 2021
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 1, 2021
Rollup of 9 pull requests

Successful merges:

 - rust-lang#86376 (Emit specific warning to clarify that `#[no_mangle]` should not be applied on foreign statics or functions)
 - rust-lang#88040 (BTree: remove Ord bound from new)
 - rust-lang#88053 (Fix the flock fallback implementation)
 - rust-lang#88350 (add support for clobbering xer, cr, and cr[0-7] for asm! on OpenPower/PowerPC)
 - rust-lang#88410 (Remove bolding on associated constants)
 - rust-lang#88525 (fix(rustc_typeck): produce better errors for dyn auto trait)
 - rust-lang#88542 (Use the return value of readdir_r() instead of errno)
 - rust-lang#88548 (Stabilize `Iterator::intersperse()`)
 - rust-lang#88551 (Stabilize `UnsafeCell::raw_get()`)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 5878780 into rust-lang:master Sep 1, 2021
@rustbot rustbot added this to the 1.56.0 milestone Sep 1, 2021
@nbdd0121 nbdd0121 deleted the btreemap branch September 4, 2021 02:53
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Sep 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.