Page MenuHomeFreeBSD

lib/libc/gen: use Lemire's algorithm for arc4random_uniform().
ClosedPublic

Authored by fuz on Nov 18 2024, 12:53 PM.
Tags
None
Referenced Files
F106892498: D47659.id146647.diff
Tue, Jan 7, 1:24 AM
Unknown Object (File)
Mon, Jan 6, 2:30 PM
Unknown Object (File)
Mon, Jan 6, 1:14 PM
Unknown Object (File)
Mon, Jan 6, 1:03 PM
Unknown Object (File)
Mon, Jan 6, 11:58 AM
Unknown Object (File)
Mon, Jan 6, 7:41 AM
Unknown Object (File)
Mon, Jan 6, 7:14 AM
Unknown Object (File)
Mon, Jan 6, 3:26 AM
Subscribers

Details

Summary

Daniel Lemire has published a more efficient range reduction algorithm
for finding a random number in a given range without bias:
https://arxiv.org/pdf/1805.10941

This algorithm is already in use by the Go standard library:
https://cs.opensource.google/go/go/+/refs/tags/go1.23.3:src/math/rand/rand.go;l=161

Reimplement arc4random_uniform using this method.
A microbenchmark shows that performance is improved by around 22% on my Haswell box:

os: FreeBSD
arch: amd64
cpu: Intel(R) Core(TM) i7-4910MQ CPU @ 2.90GHz
                   │ benchmark.out │
                   │    sec/op     │
Arc4random_uniform     56.53n ± 0%
Fast_uniform           44.00n ± 0%
geomean                49.87n

A new unit test is added to validate that the range reduction works correctly.
We cannot currently validate that there is no bias however.

The paper is referenced in arc4random(3).

Test Plan

passes newly added unit test.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

fuz requested review of this revision.Nov 18 2024, 12:53 PM

Some context on the Lemire idea if others haven't seen it:

https://lemire.me/blog/2019/06/06/nearly-divisionless-random-integer-generation-on-various-systems/

https://shufflesharding.com/posts/dissecting-lemire

https://sts10.github.io/2020/10/10/lemire-neaarly-divisionless-random.html

For bias validation -- you could do some kind of primitive statistical test (histogram). It can't be perfectly reliable but could give a little confidence it isn't totally broken.

lib/libc/gen/arc4random_uniform.c
14–15

I think the original alphabetical include order was correct.

18–19

I have to wonder how bad the 64-bit multiplication is on 32-bit architectures.

22–25

I think we could at least keep this comment describing the purpose of the function.

fuz planned changes to this revision.Nov 18 2024, 3:00 PM

I'll look into adding a statistical test.

lib/libc/gen/arc4random_uniform.c
14–15

Note that this is a complete rewrite—I didn't look at the old code at all as to avoid accidentally copying from it. Neverthless, I agree and will correct the include order.

18–19

We currently support two 32 bit architectures. i386 has a 32×32→64 bit multiplication with the mull instruction, which performs well. Likewise, armv7 has the same with the umull instruction. Running the same benchmark, I get a 15% speedup on i386 and a 70% speedup on armv7 (division is really expensive there as we cannot guarantee the presence of the udiv instruction).

22–25

As I did not start with the original code, there was nothing to keep. I believe the purpose of the function is well described in the man page already.

  • lib/libc: arc4random_uniform include order fix

I was unfortunately not able to construct a statistical test
that is able to distinguish an obviously biased implementation like

uint32_t arc4random_uniform(uint32_t thresh) { return arc4random() % thresh; }

from a proper implementation. Maybe someone with more knowledge of
statistics could help here.

The C code in Lemire's paper uses a uint64_t bound and __uint128_t variable. Maybe it is worth a small note to this effect in the commit message (of course, 32/64 or 64/128 doesn't matter for the algorithm itself).

From @cem's comment I think the point is that the bias can be easily observed with a simple approach like a histogram of results by choosing an appropriate upper bound. For instance, choose an upper bound of 2³²✕⅔; with the naive arc4random() % 2863311531, 1000000 iterations and a 16-bin histogram I get:

0: 125241
1: 124677
2: 124899
3: 125077
4: 125090
5: 83651
6: 62995
7: 62451
8: 61890
9: 62520
10: 41509
11: 0
12: 0
13: 0
14: 0
15: 0

The bias is easily visible. When I perform the experiment with arc4random_uniform(2863311531) the histogram looks as we'd expect:

0: 94402
1: 93591
2: 93493
3: 93299
4: 93506
5: 94024
6: 94089
7: 94252
8: 93504
9: 93060
10: 62780
11: 0
12: 0
13: 0
14: 0
15: 0

But note that bias is apparent when I try the proposed change here -- I get:

0: 125090
1: 124805
2: 124952
3: 124626
4: 124852
5: 83803
6: 62237
7: 62913
8: 62071
9: 62763
10: 41888
11: 0
12: 0
13: 0
14: 0
15: 0

Yeah, you can do a simple bias test with an upper limit of 3 for example. Bucket 0 is more likely than buckets 1 and 2 under a simplistic rand32() % 3 implementation. Or something like that. It is probably easier to measure with a smaller power of 2 generator.

emaste requested changes to this revision.Nov 20 2024, 11:00 PM
This revision now requires changes to proceed.Nov 20 2024, 11:00 PM
lib/libc/gen/arc4random_uniform.c
34
In D47659#1087607, @cem wrote:

Yeah, you can do a simple bias test with an upper limit of 3 for example. Bucket 0 is more likely than buckets 1 and 2 under a simplistic rand32() % 3 implementation. Or something like that. It is probably easier to measure with a smaller power of 2 generator.

Quite on the contrary; with the biased modulo reduction and a limit of 3, 1431655765 numbers each become 1 or 2, but 1431655766 numbers become 0. The bias is very tiny and requires many billion observations to eek out.

IIRC the best way would be to use a threshold around (2/3)*UINT32_MAX, though even that still requires at least 5 observations per bucket for statistical significance of the chi-square test, requiring many billion observations in total. This does not seem suitable for a unit test that is part of a larger test suite.

Yes, yes. It is easier to analyze with e.g. rand2() % 3. But you and Ed have both mentioned a way to measure this for rand32 more tractably -- great!

I am not suggesting a bias test has to be a unit test.

  • arc4random: fix wrong variable name

I'll see that I can write a bias test outside of our test suite. It'll probably take quite a while to run.

lib/libc/gen/arc4random_uniform.c
34

Thank you for catching this. This typo was introduced when I changed the variable names from the concise ones in the paper to more memorable names. Speaks for having a unit test, really...

lib/libc/gen/arc4random_uniform.c
8–10

You might want to use 0BSD instead: https://spdx.org/licenses/0BSD.html

Here is a draft bias test for arc4random_uniform(). It samples the distribution of arc4random_uniform() but doesn't yet have code to do a chi-square test or similar.

https://gist.github.com/clausecker/1a4e0d0358f26ef5a6fd01d74b187799

Using this test, we get for a biased implementation arc4random_uniform(uint32_t threshold) { return arc4random() % theshold; }:

distribution properties:
        threshold:      3221225472
        observations:   21474836480
        least common:   6719 (0 observations)
        most common:    83532884 (35 observations)
        median:         1073740653 (expected 1610612736)
        average:        1342174728.876399 (expected 1610612735.500000)
        deviation:      942708369.486011 (expected 929887696.689840)

sample properties:
        median:         6
        average:        6.666667
        deviation:      3.496032 (expected 2.581989)

for Lemire's algorithm as proposed in this DR:

distribution properties:
	threshold:	3221225472
	observations:	21474836480
	least common:	353 (0 observations)
	most common:	647634042 (29 observations)
	median:		1610608903 (expected 1610612736)
	average:	1610614338.364706 (expected 1610612735.500000)
	deviation:	929886996.823639 (expected 929887696.689840)

sample properties:
	median:		6
	average:	6.666667
	deviation:	2.582016 (expected 2.581989)

and for the currently shipped implementation from OpenBSD:

distribution properties:
	threshold:	3221225472
	observations:	21474836480
	least common:	1683 (0 observations)
	most common:	1512057592 (28 observations)
	median:		1610621068 (expected 1610612736)
	average:	1610616491.083907 (expected 1610612735.500000)
	deviation:	929889833.268859 (expected 929887696.689840)

sample properties:
	median:		6
	average:	6.666667
	deviation:	2.581999 (expected 2.581989)

As you can see, the numbers are pretty much dead on for the implementation we ship and the one from OpenBSD, but have a high discrepancy for a known biased implementation. I hope that shows that the code yields unbiased results.

  • lib/libc/gen: change arc4random_uniform license to CC0, comment variable name changes
  • tools: add arc4random_uniform bias test

As discussed on IRC with @emaste.

@cem Do you think this statistical test instils sufficient confidence to commit this change set?

This revision is now accepted and ready to land.Dec 1 2024, 5:12 PM