Rng::iter #275

Closed
wants to merge 3 commits

Conversation

@dhardy (Member) commented Mar 1, 2018

Add pseudo-iterator over RNGs

This is basically a copy from my old branch, plus some doc/test tweaks and an extra comment about why we can't iterate over reference types directly without redundant referencing (yes, I tried).

  • where do the extra warnings come from?
  • add some kind of alphanumeric distribution and deprecate gen_ascii_chars — simple but requires deciding what kind of such distributions we want (e.g. something similar to regex range types?), which didn't happen yet.
  • benchmarks

@pitdicker (Contributor)

TODO 1: where do the extra warnings come from?

Which warnings?

TODO 2: add some kind of alphanumeric distribution and deprecate gen_ascii_chars — simple but requires deciding what kind of such distributions we want (e.g. something similar to regex range types?), which didn't happen yet.

We had an issue about it, dhardy#73, but not a lot of discussion. Personally I would not add any more character distributions besides Uniform (which we have now) and Alphanumeric: I see limited use, but considerable complexity because of Unicode.

@pitdicker (Contributor)

To be honest I don't follow most of the code here; I know too little about iterators yet. @burdges, I think you are much better placed to review this.

I imagined an iterator to do nothing more than repeatedly calling rng.next_u32 or rng.next_u64. In what situations is this 'iterator' more convenient to use than the functions we already have?

@dhardy (Member, Author) commented Mar 2, 2018

The old rng.gen_iter() is essentially the new rng.iter().map(|rng| rng.gen()), so for that use case it is less convenient. But the new iterator is more flexible, allowing things like sampling from a given distribution (the "die" example) or applying a custom function to the RNG. The new code also enables other iterator tricks, e.g. flat_map.

I guess we could make gen_iter() a wrapper around the new code and keep both, if preferred.

One significant advantage is that this "feels" like a lower-level iterator, i.e. an iterator over the RNG itself, even though we can't implement that directly. Alternatively, we might be able to implement a streaming iterator.
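
The shape of this pseudo-iterator can be sketched as follows. This is a hypothetical reconstruction, not the PR's actual code: a tiny LCG stands in for a real RNG, and `rng_iter` plays the role of `rng.iter().map(...)`.

```rust
// Minimal stand-in RNG (hypothetical; constants from Knuth's MMIX LCG).
struct Lcg(u64);

impl Lcg {
    fn next_u32(&mut self) -> u32 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 32) as u32
    }
}

// A real `Iterator` cannot yield `&mut Lcg` (the item would borrow from the
// iterator itself), so the sketch instead takes a sampling closure and
// yields its results, mirroring `rng.iter().map(|rng| ...)`.
fn rng_iter<'a, T: 'a, F>(rng: &'a mut Lcg, mut f: F) -> impl Iterator<Item = T> + 'a
where
    F: FnMut(&mut Lcg) -> T + 'a,
{
    std::iter::repeat_with(move || f(&mut *rng))
}

fn main() {
    let mut rng = Lcg(42);
    // The "die" example: map a custom sampling function over the RNG.
    let rolls: Vec<u32> = rng_iter(&mut rng, |r| r.next_u32() % 6 + 1)
        .take(10)
        .collect();
    assert!(rolls.iter().all(|&d| (1..=6).contains(&d)));
    println!("{:?}", rolls);
}
```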

@burdges (Contributor) commented Mar 2, 2018

I'm dubious about this approach of reimplementing the standard iterator methods.

I'm also confused about what role this plays: what are you trying to achieve? Do you want an Iterator that produces Rngs for some reason? Or does your iterator use an Rng to return samples?

@burdges (Contributor) commented Mar 2, 2018

If you want an iterator that creates values of a type T: Rand or whatever, then iter::repeat(()).map(|()| rng.gen()) works, right? Adding any variation on Rand and gen works too, right? In this, the closure mutably borrows rng for the duration of the iterator since Iterator::map takes an FnMut.

Do you want an iterator that returns Rngs for some reason? I think that requires first deciding whether your code should ever behave predictably. If so, then each iteration must spawn a new ChaChaRng or similar. If not, then just use either Rc<RefCell<R>> or Arc<Mutex<R>> where R: Rng, and then either wrap it in an iterator or forget about iteration entirely.
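
The `iter::repeat(()).map(...)` pattern suggested above can be sketched like this, with a stand-in LCG so the snippet is self-contained (the rand crate itself is not assumed):

```rust
// Hypothetical stand-in RNG, not the rand crate's API.
struct Lcg(u64);

impl Lcg {
    fn next_u32(&mut self) -> u32 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 32) as u32
    }
}

fn main() {
    let mut rng = Lcg(1);
    // `Iterator::map` takes an `FnMut`, so the closure holds the mutable
    // borrow of `rng` for the whole lifetime of the iterator, exactly as
    // described above.
    let samples: Vec<u32> = std::iter::repeat(())
        .map(|()| rng.next_u32())
        .take(8)
        .collect();
    assert_eq!(samples.len(), 8);
    println!("{:?}", samples);
}
```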

@dhardy (Member, Author) commented Mar 2, 2018

The role this plays is to replace gen_ascii_chars and possibly gen_iter, which are less flexible; though maybe the latter is convenient enough to be worth keeping. No, I have no interest in producing RNGs via an iterator.

Good point about use of iter::repeat(()).map...; maybe we can just remove this instead.

@dhardy (Member, Author) commented Mar 3, 2018

Rebased on top of #279.

@pitdicker (Contributor)

I like the iter::repeat(()).map... approach more to be honest.

But I am curious whether there is a meaningful difference in the code they generate. Could you maybe run some benchmarks using both methods and a few different distributions (Uniform, Range, Alphanumeric)?

@dhardy (Member, Author) commented Mar 6, 2018

Benchmarks show a significant improvement over the old gen_iter and a minor improvement over iter::repeat (no idea why), but still fall a long way short of the optimal fill_bytes. This is partly inevitable: iteration calls the distribution/generator independently for each value, and must do so for any non-trivial distribution. It is also partly because fill_bytes doesn't do the allocation the other benches do.

test gen_1k_fill                     ... bench:         286 ns/iter (+/- 17) = 3580 MB/s
test gen_1k_gen_iter                 ... bench:       1,368 ns/iter (+/- 7) = 748 MB/s
test gen_1k_iter1                    ... bench:       1,100 ns/iter (+/- 8) = 930 MB/s
test gen_1k_iter2                    ... bench:       1,099 ns/iter (+/- 23) = 931 MB/s
test gen_1k_iter_repeat              ... bench:       1,133 ns/iter (+/- 26) = 903 MB/s

@dhardy (Member, Author) commented Mar 6, 2018

Wait... using Rng::fill I also get approx 900 MB/s.

I think the benchmark above is wrong: transmuting a slice presumably keeps the same length counter, but since u8 is a quarter the size of u32, the resulting byte slice covers only a quarter of the buffer (which brings the results for gen_1k_fill above in line with the other results).

Now I'm only confused about why my iterator code is slightly faster than everything else, including fill_bytes; not very significant, however.
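
The suspected slice-length bug can be illustrated as follows (a hypothetical reconstruction of the reinterpretation, not the actual benchmark code):

```rust
fn main() {
    let data = [0u32; 256]; // 1 KiB of u32s

    // Reinterpreting the u32 slice as bytes while keeping the element
    // count gives a slice of only 256 bytes, not 1024. A throughput
    // benchmark over such a slice touches a quarter of the buffer,
    // inflating the reported MB/s by 4x.
    let wrong: &[u8] =
        unsafe { std::slice::from_raw_parts(data.as_ptr() as *const u8, data.len()) };
    assert_eq!(wrong.len(), 256);

    // The correct byte length must scale by size_of::<u32>():
    let right: &[u8] = unsafe {
        std::slice::from_raw_parts(
            data.as_ptr() as *const u8,
            data.len() * std::mem::size_of::<u32>(),
        )
    };
    assert_eq!(right.len(), 1024);
}
```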

@dhardy (Member, Author) commented Mar 6, 2018

Updated benchmarks (here fill is the only variant that does not allocate a new Vec each round; no idea what's up with the performance; also, sometimes all iterator variants are approx. 6% slower without the fill bench being affected):

test gen_1k_fill                     ... bench:       1,116 ns/iter (+/- 8) = 917 MB/s
test gen_1k_gen_iter                 ... bench:       1,343 ns/iter (+/- 9) = 762 MB/s
test gen_1k_iter1                    ... bench:       1,099 ns/iter (+/- 6) = 931 MB/s
test gen_1k_iter2                    ... bench:       1,099 ns/iter (+/- 6) = 931 MB/s
test gen_1k_iter_repeat              ... bench:       1,133 ns/iter (+/- 5) = 903 MB/s

@pitdicker (Contributor)

One issue that may cause the lower numbers is rust-lang/rust#47062 (and rust-lang/rust#47665), so I set export RUSTFLAGS="-C codegen-units=1" before running the benchmarks.

That fill_bytes is not faster than the other methods seems reasonable to me, because the implementation of fill_bytes for Xorshift does little more than repeatedly call next_u32, plus some logic to handle odd-length byte slices.

Benchmarks from my PC:

test gen_bytes_xorshift              ... bench:   1,132,738 ns/iter (+/- 9,645) = 904 MB/s
test gen_u32_xorshift                ... bench:       1,375 ns/iter (+/- 11) = 2909 MB/s
test gen_u64_xorshift                ... bench:       2,639 ns/iter (+/- 30) = 3031 MB/s

test gen_1k_fill                     ... bench:       1,134 ns/iter (+/- 12) = 902 MB/s
test gen_1k_gen_iter                 ... bench:         845 ns/iter (+/- 11) = 1211 MB/s
test gen_1k_iter1                    ... bench:       1,119 ns/iter (+/- 13) = 915 MB/s
test gen_1k_iter2                    ... bench:       1,122 ns/iter (+/- 16) = 912 MB/s
test gen_1k_iter_repeat              ... bench:       1,076 ns/iter (+/- 10) = 951 MB/s

And the same, but now with HC-128 (because there fill_bytes is faster):

test gen_bytes_hc128                 ... bench:     455,387 ns/iter (+/- 2,369) = 2248 MB/s
test gen_u32_hc128                   ... bench:       2,827 ns/iter (+/- 10) = 1414 MB/s
test gen_u64_hc128                   ... bench:       4,379 ns/iter (+/- 2) = 1826 MB/s

test gen_1k_fill                     ... bench:         451 ns/iter (+/- 3) = 2270 MB/s
test gen_1k_gen_iter                 ... bench:       1,212 ns/iter (+/- 2) = 844 MB/s
test gen_1k_iter1                    ... bench:         937 ns/iter (+/- 4) = 1092 MB/s
test gen_1k_iter2                    ... bench:         964 ns/iter (+/- 6) = 1062 MB/s
test gen_1k_iter_repeat              ... bench:         762 ns/iter (+/- 3) = 1343 MB/s

@dhardy (Member, Author) commented Mar 7, 2018

Very similar results with f32 output:

# weak_rng (Xorshift):
test gen_1k_f32_gen_iter             ... bench:       1,440 ns/iter (+/- 71) = 711 MB/s
test gen_1k_f32_iter                 ... bench:       1,186 ns/iter (+/- 51) = 863 MB/s
test gen_1k_f32_iter_repeat          ... bench:       1,219 ns/iter (+/- 55) = 840 MB/s
test gen_1k_fill                     ... bench:       1,131 ns/iter (+/- 22) = 905 MB/s
test gen_1k_gen_iter                 ... bench:       1,422 ns/iter (+/- 152) = 720 MB/s
test gen_1k_iter                     ... bench:       1,180 ns/iter (+/- 46) = 867 MB/s
test gen_1k_iter_repeat              ... bench:       1,216 ns/iter (+/- 40) = 842 MB/s

# StdRng (HC-128):
test gen_1k_f32_gen_iter             ... bench:       2,332 ns/iter (+/- 117) = 439 MB/s
test gen_1k_f32_iter                 ... bench:       1,924 ns/iter (+/- 128) = 532 MB/s
test gen_1k_f32_iter_repeat          ... bench:       1,838 ns/iter (+/- 117) = 557 MB/s
test gen_1k_fill                     ... bench:         441 ns/iter (+/- 30) = 2321 MB/s
test gen_1k_gen_iter                 ... bench:       2,359 ns/iter (+/- 125) = 434 MB/s
test gen_1k_iter                     ... bench:       2,158 ns/iter (+/- 76) = 474 MB/s
test gen_1k_iter_repeat              ... bench:       1,939 ns/iter (+/- 104) = 528 MB/s

I guess this means we should deprecate gen_iter and document usage of iter::repeat.
