Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Distribution for Range(Inclusive) #821

Closed
wants to merge 3 commits into from

Conversation

Ralith
Copy link
Contributor

@Ralith Ralith commented Jun 10, 2019

This enables the convenient syntax of rng.sample(0..10), and hence provides a non-disruptive fix to #744. If people think this is a good idea I should update the documentation to mention sample_single_inclusive more, but I figured I'd get feedback first.

@vks
Copy link
Collaborator

vks commented Jun 11, 2019

I agree that this idea is an elegant fix for #744. I have the following comments:

  • This PR does not use sample_single_inclusive to provide more efficient implementations. Does it make sense to add it to the API surface if we don't use it? Personally, I think we should only add this if we implement more efficient code like for sample_single.
  • If we want to add this convenience-only API, it should be documented more prominently.
  • I could see this replacing gen_range, allowing us to deprecate it.

@dhardy
Copy link
Member

dhardy commented Jun 11, 2019

I'll add a few more comments:

  • This makes some ranges support Distribution, thus code like the following becomes legal, but with poor documentation, e.g. nothing to indicate that this will be slower than Uniform for several samples:
let distr = 0..20;
rng.sample(distr);
  • Since it makes Range and RangeInclusive support Distribution, it begs the question whether the other ranges should be supported. (Note that not all support Iterator and behaviour varies a bit anyway.)
    • RangeFull could be a synonym for Standard
    • RangeTo and RangeToInclusive could become the gen_index(ubound) syntax some people previously asked for (note: these are not usable as iterators since "they have no starting point" even though usage with slices always means "start from 0")
    • RangeFrom is a questionable thing to support, given that semantically it appears to have no upper bound but obviously we do have one (though see docs regarding multiple possible behaviours on overflow of iterators)

It's certainly an intriguing idea and as @vks says provides a path to deprecate gen_range.

@dhardy
Copy link
Member

dhardy commented Jun 11, 2019

In a nutshell, it means we could support things like the example below, which is probably easy to learn for newcomers (and existing users updating to future versions), but has the drawback that we don't have a very good place for documentation. @steveklabnik you must have had plenty of exposure to documentation of things like slice syntax; what's your take on this?

use rand::Rng;
let rng = rand::thread_rng();
let c: char = rng.sample(..);  // equivalent to rng.gen() and rng.sample(Standard), so redundant
let die_roll = rng.sample(1..=6);  // instead of rng.gen_range(1, 7)
let seq = /* some slice/array */;
let i = rng.sample(..seq.len());  // random index; note that we already support seq.choose(&mut rng)

@Ralith
Copy link
Contributor Author

Ralith commented Jun 11, 2019

This PR does not use sample_single_inclusive to provide more efficient implementations. Does it make sense to add it to the API surface if we don't use it? Personally, I think we should only add this if we implement more efficient code like for sample_single.

It's nice to have a consistent API wrt. the non-inclusive case, although if defaulting means the method can be added in the future without breaking semver I have no strong opinion here. If banging out more efficient impls based on the existing ones is straightforward I could have a go.

If we want to add this convenience-only API, it should be documented more prominently.

Agreed, I'm happy to do this if the change is otherwise desired.

nothing to indicate that this will be slower than Uniform for several samples

I think this is the main drawback. However, that applies equally to gen_range or any other hypothetical replacement of it, and I think the documentation would be clear if hung off sample.

RangeFull could be a synonym for Standard

This seems confusing. RangeFull specifically represents all values, and Standard does not. Additionally, gen is already a more concise way to sample Standard.

RangeTo and RangeToInclusive could become the gen_index(ubound) syntax some people previously asked for

What would the type constraint be in this case? Would we need a new trait for types with a lower bound? Bearing in mind that that lower bound is often nonzero, though I'm not sure if there's much of a use case for generating e.g. a value in i32::min_value()..4.

RangeFrom is a questionable thing to support

I think this is symmetrical with RangeTo, and whatever decision is made should be apply to both.

@dhardy
Copy link
Member

dhardy commented Jun 11, 2019

If banging out more efficient impls based on the existing ones is straightforward I could have a go.

It should be easy enough but we probably won't merge this PR soon in any case since 0.7 has seen enough changes; it could make a patch release after 0.7 though. Implementation should consider potential refactoring from deprecating and later removing gen_range. Possibly we would replace sample_single with a generic fn over RangeBounds, though also possibly not.

I think the documentation would be clear if hung off sample.

That's the best I can think of too, however it doesn't cover everything (e.g. (a..b).sample(&mut rng)) uses the other sample fn.

What would the type constraint be in this case?

To reduce confusing it makes sense only to support unsigned ints with RangeTo, if we do support it.

I think this is symmetrical with RangeTo

RangeFrom supports usage as an Iterator while RangeTo doesn't, so I don't agree that it is symmetrical.

@vks
Copy link
Collaborator

vks commented Jun 11, 2019

@dhardy

let die_roll = rng.gen(1..=6); // instead of rng.gen_range(1, 7)

I think you meant rng.sample(1..=6);.

This is an important example, since it prominently shows up in the Rust book.

@Ralith
Copy link
Contributor Author

Ralith commented Jun 11, 2019

Ranges with open endpoints could in general be supported gracefully by the introduction of something like

trait Bounded {
    const MIN: Self;
    const MAX: Self;
}

with impls for all the obvious things, but this would be a significant expansion of the scope of the change.

@dhardy dhardy added the P-postpone Waiting on something else label Jun 16, 2019
src/distributions/uniform.rs Outdated Show resolved Hide resolved
@Ralith
Copy link
Contributor Author

Ralith commented Jul 1, 2019

Rebased and added a little documentation.

@vks
Copy link
Collaborator

vks commented Jul 1, 2019

@dhardy I think we can merge this now? In my opinion, only the currently not really used sample_single_inclusive is a bit controversial.

Copy link
Member

@dhardy dhardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, @vks. I'm still not 100% sure on whether we should go this route — it looks reasonable, but:

  • If we just remove gen_range some users will likely struggle to find its replacement
  • Keeping gen_range too is redundant; I think ultimately we don't want both?
  • Simply adjusting gen_range may be better: users will have to adjust code on update, but will see a clear error message which should make this easy. Additionally this gives us a better place to document usage of sample_single vs Uniform::new.

src/distributions/uniform.rs Outdated Show resolved Hide resolved
B2: SampleBorrow<Self::X> + Sized
{
let uniform: Self = UniformSampler::new_inclusive(low, high);
uniform.sample(rng)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may make more sense to convert sample_single into sample_single_inclusive, then implement sample_single via sample_single_inclusive. At any rate, we should have an optimised implementation of this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implement sample_single via sample_single_inclusive

Is there a way to do this without introducing an awkward where B2: From<u8> + Sub<B2> constraint?

src/distributions/uniform.rs Outdated Show resolved Hide resolved
@Ralith
Copy link
Contributor Author

Ralith commented Jul 2, 2019

If we just remove gen_range some users will likely struggle to find its replacement

This can be addressed with an (arbitrarily long) deprecation period, with a helpful message explaining what to do instead.

@dhardy
Copy link
Member

dhardy commented Jul 2, 2019

only the currently not really used sample_single_inclusive is a bit controversial.

I think we should try adapting sample_single to this as suggested, but benchmark (with half-exclusive ranges) to test the impact. If really necessary we could have two implementations.

@vks
Copy link
Collaborator

vks commented Jul 2, 2019

If we just remove gen_range some users will likely struggle to find its replacement
Keeping gen_range too is redundant; I think ultimately we don't want both?

Isn't this solved by deprecation, as @Ralith suggested? I'm not so worried about documentation, we can make the sample(0..10) example very prominent, and it will even be part of the Rust book eventually.

Simply adjusting gen_range may be better: users will have to adjust code on update, but will see a clear error message which should make this easy. Additionally this gives us a better place to document usage of sample_single vs Uniform::new.

I think the biggest drawback of this approach is that it does not allow a deprecation period.

@dhardy
Copy link
Member

dhardy commented Jul 2, 2019

I'm not so worried about documentation, we can make the sample(0..10) example very prominent

I mean documentation of the alternative: Uniform::new

I think the biggest drawback of this approach is that it does not allow a deprecation period.

True. I'm not hugely worried about this given that an upgrade often requires compile-fixes anyway and replacing gen_range(a, b) with gen_range(a..b) via regex is not hard.

I think we should give more weight to arguments about the final design than to arguments about the migration.

@vks
Copy link
Collaborator

vks commented Jul 2, 2019

I think we should give more weight to arguments about the final design than to arguments about the migration.

Fair enough. Another advantage of gen_range(a..b) over sample(a..b) is that we don't have to introduce a suboptimal distribution that is redundant to Uniform.

@dhardy
Copy link
Member

dhardy commented Jul 2, 2019

To what suboptimal distribution are you referring? This PR doesn't include any new distribution objects, and we need sample_single either way (and probably do want to add sample_single_inclusive).

@vks
Copy link
Collaborator

vks commented Jul 2, 2019

This PR doesn't include any new distribution objects

I think it does: impl<T: SampleUniform> Distribution<T> for Range<T>

@Ralith
Copy link
Contributor Author

Ralith commented Jul 9, 2019

Another advantage of gen_range(a..b) over sample(a..b) is that we don't have to introduce a suboptimal distribution that is redundant to Uniform.

Aren't these more efficient than Uniform if only a single sample is taken? Admittedly that may not be in the spirit of the Distribution trait, but they're not strictly redundant.

@dhardy
Copy link
Member

dhardy commented Jul 9, 2019

Rng::gen_range is just a wrapper around UniformSampler::sample_single at the moment. sample_single is (at for some types) more efficient for single samples, yes.

@dhardy dhardy mentioned this pull request Sep 16, 2019
19 tasks
@dhardy
Copy link
Member

dhardy commented Mar 9, 2020

Lets make another stab at resolving this in time for the 0.8 release.

Goals

  • support rng.something(a..b) syntax
  • support rng.something(..b) syntax for types T: Zero (which Zero trait we use is off-topic for now)
  • support rng.something(a..=b) and rng.something(..=b) variants

Approach one: gen_range

Change gen_range to take Range* arguments.

  • this is a breaking change: although the fix is a simple regex, this affects a lot of code

Approach two: sample

Support rng.sample(a..b) etc. (this PR).

  • let distr = a..b; let x = rng.sample(distr); is legal since a..b is a distribution, however...
  • use of Range and RangeTo as distribution objects is sub-optimal for multiple samples; usually we try to support generation of an optimal distribution object where used (although this has exceptions, e.g. existence of two WeightedIndex implementations)
  • it's hard to document the use of a..b as a distribution, e.g.
    • that it is sub-optimal for many samples
    • that rng.sample(..=i32::MAX) has lower-bound of 0 (alternatively, we could only support RangeTo for unsigned types)

Variant A is that we keep gen_range, even though it is redundant.

Variant B is that we deprecate gen_range. This has an issue that it impairs discoverability of range-generation after final removal of this method.

Approach three: a new method

Add a new method, and possibly deprecate gen_range later. This is a viable option, though it is preferable to keep the gen_range name.

Approach four: reject

We don't need to satisfy the above goals, though it's certainly desirable.


My conclusion after re-reading this is still that adjusting gen_range is the preferred option. We can facilitate migration to 0.8 with an update guide and include a Regex — there are a number of editors/tools supporting project-wide search-replace with regex.

@Ralith are you still interested in this? @cramertj, @TheIronBorn, @hmble also commented on #744. In case of mixed feedback or low feedback I will advertise the question elsewhere.

@vks
Copy link
Collaborator

vks commented Mar 9, 2020

There is another approach that is already supported: rng.sample(Uniform::from(a..b))

I agree that approach 1 is preferable, since brevity is the reason for the existence of gen_range. We could provide the regex to fix the code to help a bit with the churn.

@dhardy
Copy link
Member

dhardy commented Apr 6, 2020

Well, lets go with approach 1 then. To do:

  • adjust sample_single to use an inclusive range (maybe?)
  • implement new Rng::gen_range behaviour
  • document changes in the book, in particular in the migration guide

Possibly this list should go in a tracking issue, but it should be okay here. I won't start the work myself immediately, so if someone else wants to, feel free.

@vks
Copy link
Collaborator

vks commented Jul 27, 2020

I updated this PR in #1003. However, this implements approach 2. Should I change it to approach 1?

@dhardy
Copy link
Member

dhardy commented Jul 29, 2020

It seems that you and I seem to agree that 1 is, ultimately, the best option (although I suspect there will be complaints). There haven't been many other people adding their opinion, so lets just go with it.

As for budget for breaking changes, 0.7 has been stable for a long time now and for the most part users won't have compelling reasons to need to update, so a few more breaking changes probably aren't a big deal.

@vks
Copy link
Collaborator

vks commented Jul 30, 2020

Alright, #1003 now implements approach 1.

@vks vks closed this in #1003 Aug 1, 2020
@vks vks added this to the 0.8 release milestone Aug 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P-postpone Waiting on something else
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants