-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move sampling methods to Enumerable #10129
Move sampling methods to Enumerable #10129
Conversation
Co-authored-by: Johannes Müller <johannes.mueller@smj-fulda.org>
Oh, needs a conflict resolution, but after that I'll merge it. Thank you! |
…evil/crystal into feature/enumerable-sample
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I would suggest to mention somewhere in docs about reservoir sampling. Currently I need to get into the issue discussion to understand why the equal probability statement holds.
@bcardiff Do you mean add a doc comment, or just an internal comment about reservoir sampling? The old code was also using it but it wasn't mentioned. I would mention it as an internal comment in case it needs to be updated, but I wouldn't expose this fact. You don't need to know this to use the method; if you are curious you can dig into the code; there are no alternatives; and reading "reservoir sampling" might confuse users more than help them (for instance, Ruby docs never mention algorithms). |
I was thinking of internal docs. I missed that the same algorithm was used already. |
Thinking if we could implement Algorithm L on That algorithm seems to require that |
If you're worried about it returning zero, then there's always the strategy of just trying again if it gives you zero. It's definitely correct and not a bad strategy overall, it's already being used elsewhere. |
Maybe the optimization can be undone and moved to a separate PR? Because now this needs 2 reviews again. |
a75ad77
to
bf8fd37
Compare
Thank you |
Resolves #10118. There are two caveats here:
Indexable
's optimization calls the RNG exactly once, it will produce different results than types that aren'tIndexable
, most notably the collections' own external iterators:Range(Float, Float)#sample(random)
doesn't mean the same thing as the other sampling methods, because in this case it represents a continuous uniform distribution rather than a discrete collection of items. I allowed it anyway for consistency withRange(Int, Int)
. The#sample(n, random)
overload still won't work as it requiresFloat#succ
(c.f. Float-based ranges #9339).