switch sample_single int to O'neill's modulo method & update gen_rang… #1154

TheIronBorn · 2021-07-27T23:56:28Z

Fixes #1145 by using the modulo-widening-multiplication method while shortcutting modulo in many cases. Specifically the two shortcuts proposed by O'Neill here.

See this chart for 16-bit rejection/acceptance probabilities and modulo calculation chance:

At some ranges below one third the full size the rejection probabilities are equal and so the modulo may slow things down but the fewer rejections of other low ranges more than make up for it. Note that above one-third of the range size no modulo is ever performed and the rejection chances are equal. Note also that we use 32-bits internally for 8/16 bits so this chart isn't comparable for those.

This is around 2.5x faster than the previous method for lower ranges and about 0.92x slower for high ranges, discounting 8/16 bits. It seems the the extra branches slow things down slightly when using the same rejection chance. The 128-bit modulo also seems to slow things down a bit. For 8/16 bits the high ranges are not so affected due to the 32-bit internals.

Benchmarks:

 name       lz_apx ns/iter          shortcut_modulo ns/iter   diff ns/iter   diff %  speedup
-i128_high  2,365,025 (676 MB/s)    2,889,420 (553 MB/s)           524,395   22.17%   x 0.82
+i128_low   1,771,940 (902 MB/s)    1,297,330 (1233 MB/s)         -474,610  -26.78%   x 1.37
+i16_high   683,815 (292 MB/s)      281,753 (709 MB/s)            -402,062  -58.80%   x 2.43
+i16_low    676,730 (295 MB/s)      253,068 (790 MB/s)            -423,662  -62.60%   x 2.67
+i32_high   1,305,030 (306 MB/s)    1,302,990 (306 MB/s)            -2,040   -0.16%   x 1.00
+i32_low    839,545 (476 MB/s)      238,871 (1674 MB/s)           -600,674  -71.55%   x 3.51
-i64_high   1,327,230 (602 MB/s)    1,414,850 (565 MB/s)            87,620    6.60%   x 0.94
+i64_low    849,788 (941 MB/s)      242,690 (3296 MB/s)           -607,098  -71.44%   x 3.50
+i8_high    675,245 (148 MB/s)      241,050 (414 MB/s)            -434,195  -64.30%   x 2.80
+i8_low     675,745 (147 MB/s)      236,577 (422 MB/s)            -439,168  -64.99%   x 2.86

I also changed the gen_range_high benchmarks to use a more sensible region of the integer, just above half where both rejection chances are highest. The old ones didn't seem to have any pattern to them. For 8/16 bits we could try to use ranges even closer to the 8/16 bit max for more rejections thanks to 32-bit internals.

This breaks value stability of course. I wasn't sure if I should update those tests with new values.

And here's a Criterion benchmark as well:

…e_high benchmarks doc shortcutting sample_single int method mention which methods

vks · 2021-07-28T09:20:16Z

This breaks the value-stability tests, so they would need to be updated.

The performance gains look good! How does it compare to just using Uniform?

newpavlov · 2021-07-28T10:00:12Z

Broken value stability means that we should release it only in rand v0.9, no? It looks like we don't have an explicit value stability policy yet, so we probably should add it as well.

vks · 2021-07-28T10:44:03Z

@newpavlov Here is our policy. I don't quite understand what is meant by "portable item", but it sounds like this PR would be considered a breaking change.

newpavlov · 2021-07-28T11:00:55Z

@vks
Ah, yes. I thought I've seen it somewhere, but didn't get as far as the book. We probably should mention it in the rand's README.

I don't quite understand what is meant by "portable item"

I think it was an attempt to introduce two classes of deterministic algorithms: value stable and not. For example, into the latter class probably should fall algorithms working with floats, since we can not provide bit-level value stability for them. Floats being floats and all. But it's not clear where exactly we define class for an algorithm (maybe we should introduce yet another marker trait?).

dhardy · 2021-07-28T12:44:54Z

I don't quite understand what is meant by "portable item"

It's not well worded I agree. Possibly "item" should be replaced with "method".

But, yes, this would be considered a breaking change, and yes the test results need to be updated.

@newpavlov see the last section on that page:

... However, we strive to make them as portable as possible.

TheIronBorn · 2021-07-28T18:09:12Z

Compared to just Uniform

 name       Uniform ns/iter       gen_range ns/iter    diff ns/iter   diff %  speedup
+i128_high  6,775,080 (236 MB/s)  3,057,767 (523 MB/s)   -3,717,313  -54.87%   x 2.22
+i128_low   5,291,480 (302 MB/s)  1,287,980 (1242 MB/s)  -4,003,500  -75.66%   x 4.11
+i16_high   691,000 (289 MB/s)    286,856 (697 MB/s)       -404,144  -58.49%   x 2.41
+i16_low    701,163 (285 MB/s)    262,461 (762 MB/s)       -438,702  -62.57%   x 2.67
+i32_high   1,748,010 (228 MB/s)  1,390,462 (287 MB/s)     -357,548  -20.45%   x 1.26
+i32_low    674,240 (593 MB/s)    227,620 (1757 MB/s)      -446,620  -66.24%   x 2.96
+i64_high   2,415,940 (331 MB/s)  1,478,100 (541 MB/s)     -937,840  -38.82%   x 1.63
+i64_low    1,245,925 (642 MB/s)  255,013 (3137 MB/s)      -990,912  -79.53%   x 4.89
+i8_high    703,913 (142 MB/s)    247,025 (404 MB/s)       -456,888  -64.91%   x 2.85
+i8_low     713,518 (140 MB/s)    244,256 (409 MB/s)       -469,262  -65.77%   x 2.92

dhardy

Looks fine to me. Thanks @TheIronBorn.

We should hold off on merging until we're preparing to release 0.9 (or 1.0 as the case may be).

dhardy

[Ignore]

dhardy

This modifies sample_single_inclusive but not sample.

@TheIronBorn

This is based on @TheIronBorn's work (#1154, #1172), with some changes.

vks · 2022-02-22T17:56:03Z

@dhardy Can we fix sample and merge this, or should we go for #1196 instead?

dhardy · 2022-02-22T18:56:41Z

I'm (still) working on #1196, which will replace this. I'd like to leave it open for now as a reference (though I think we won't end up using this code for any of the variants).

@TheIronBorn

This is based on @TheIronBorn's work (#1154, #1172), with some changes.

@TheIronBorn

This is based on @TheIronBorn's work (#1154, #1172), with some changes.

@TheIronBorn

This is based on @TheIronBorn's work (#1154, #1172), with some changes.

switch sample_single int to O'neill's modulo method & update gen_rang…

e783cb0

…e_high benchmarks doc shortcutting sample_single int method mention which methods

dhardy added the B-value Breakage: changes output values label Jul 28, 2021

update stability test values

c5f0150

dhardy approved these changes Jul 29, 2021

View reviewed changes

remove reference to old benchmarks

a69d6cc

dhardy approved these changes Aug 7, 2021

View reviewed changes

vks mentioned this pull request Aug 24, 2021

Tracker: rand 0.9 #1165

Open

23 tasks

vks added this to the 0.9 release milestone Aug 24, 2021

This was referenced Aug 24, 2021

New publications on integer range sampling and shuffling #570

Closed

new UniformInt bound checks #592

Open

dhardy mentioned this pull request Sep 3, 2021

Look into this alleged optimal algorithm for bounded random integers #1172

Closed

dhardy approved these changes Sep 28, 2021

View reviewed changes

dhardy requested changes Sep 28, 2021

View reviewed changes

dhardy added a commit that referenced this pull request Oct 20, 2021

Impl ONeill, Canon, Canon-Lemire and Bitmask methods for integer types

11c9f2a

This is based on @TheIronBorn's work (#1154, #1172), with some changes.

dhardy mentioned this pull request Oct 20, 2021

WIP: Investigate Canon's uniform-integer sampling method #1196

Closed

dhardy added a commit that referenced this pull request Feb 24, 2022

Impl ONeill, Canon, Canon-Lemire and Bitmask methods for integer types

0584f2c

This is based on @TheIronBorn's work (#1154, #1172), with some changes.

dhardy added a commit that referenced this pull request Feb 24, 2022

Impl ONeill, Canon, Canon-Lemire and Bitmask methods for integer types

00869d7

This is based on @TheIronBorn's work (#1154, #1172), with some changes.

dhardy added a commit that referenced this pull request Feb 28, 2022

UniformInt(single): add ONeill, Canon, Canon-Lemire and Bitmask methods

a87f4e9

This is based on @TheIronBorn's work (#1154, #1172), with some changes.

dhardy mentioned this pull request Feb 17, 2023

Uniform sampling: use Canon's method #1287

Merged

dhardy closed this Mar 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

switch sample_single int to O'neill's modulo method & update gen_rang… #1154

switch sample_single int to O'neill's modulo method & update gen_rang… #1154

TheIronBorn commented Jul 27, 2021

vks commented Jul 28, 2021 •

edited

Loading

newpavlov commented Jul 28, 2021

vks commented Jul 28, 2021

newpavlov commented Jul 28, 2021

dhardy commented Jul 28, 2021 •

edited

Loading

TheIronBorn commented Jul 28, 2021

dhardy left a comment

dhardy left a comment •

edited

Loading

dhardy left a comment

vks commented Feb 22, 2022

dhardy commented Feb 22, 2022

switch sample_single int to O'neill's modulo method & update gen_rang… #1154

switch sample_single int to O'neill's modulo method & update gen_rang… #1154

Conversation

TheIronBorn commented Jul 27, 2021

vks commented Jul 28, 2021 • edited Loading

newpavlov commented Jul 28, 2021

vks commented Jul 28, 2021

newpavlov commented Jul 28, 2021

dhardy commented Jul 28, 2021 • edited Loading

TheIronBorn commented Jul 28, 2021

dhardy left a comment

Choose a reason for hiding this comment

dhardy left a comment • edited Loading

Choose a reason for hiding this comment

dhardy left a comment

Choose a reason for hiding this comment

vks commented Feb 22, 2022

dhardy commented Feb 22, 2022

vks commented Jul 28, 2021 •

edited

Loading

dhardy commented Jul 28, 2021 •

edited

Loading

dhardy left a comment •

edited

Loading