Since I prefer Xorshift64* for its robust state, I would use the following initialization and entropy-adding functions myself:
// SPDX-License-Identifier: CC0-1.0
#include <stdint.h>

typedef union {
    uint64_t u64;
    uint32_t u32[2];
    uint16_t u16[4];
    uint8_t  u8[8];
    int64_t  i64;
    int32_t  i32[2];
    int16_t  i16[4];
    int8_t   i8[8];
    unsigned char uc[8];
    signed char   sc[8];
    char          c[8];
} word64;
static word64 prng_state;
void prng_initialize(const uint32_t id[3]) {
    static const uint32_t fixed[3] = {
        // Your favourite nonzero 32-bit constants here
        0xDEADBEEF, 0xF00DCAFE, 0x0600DD06,
    };
    word64 seed = { .u32 = { 0, 0 } };
    for (int_fast8_t w = 2; w >= 0; w--) {
        // Skip the XOR if it would clear this 32-bit half to zero.
        if (seed.u32[0] != id[w])
            seed.u32[0] ^= id[w];
        if (seed.u32[1] != fixed[w])
            seed.u32[1] ^= fixed[w];
        // Mix: 16 Xorshift64 steps per seed word.
        for (int_fast8_t n = 16; n > 0; n--) {
            seed.u64 ^= seed.u64 >> 12;
            seed.u64 ^= seed.u64 << 25;
            seed.u64 ^= seed.u64 >> 27;
        }
    }
    // TODO: the following store should be atomic
    prng_state.u64 = seed.u64;
}
void prng_step(void) {
    // TODO: the following load should be atomic
    uint64_t tmp = prng_state.u64;
    tmp ^= tmp >> 12;
    tmp ^= tmp << 25;
    tmp ^= tmp >> 27;
    // TODO: the following store should be atomic
    prng_state.u64 = tmp;
}
uint32_t prng_u32(void) {
    // TODO: the following load should be atomic
    uint64_t tmp = prng_state.u64;
    tmp ^= tmp >> 12;
    tmp ^= tmp << 25;
    tmp ^= tmp >> 27;
    // TODO: the following store should be atomic
    prng_state.u64 = tmp;
    return (tmp * UINT64_C(2685821657736338717)) >> 32;
}
void prng_perturb_by_u32(const uint32_t value) {
    // TODO: the following load should be atomic
    word64 tmp = prng_state;
    for (int_fast8_t w = 1; w >= 0; w--) {
        // Skip the XOR if it would clear this 32-bit half to zero.
        if (tmp.u32[w] != value)
            tmp.u32[w] ^= value;
        // Mix: 5 Xorshift64 steps after each half.
        for (int_fast8_t r = 5; r > 0; r--) {
            tmp.u64 ^= tmp.u64 >> 12;
            tmp.u64 ^= tmp.u64 << 25;
            tmp.u64 ^= tmp.u64 >> 27;
        }
    }
    // TODO: the following store should be atomic
    prng_state.u64 = tmp.u64;
}
void prng_perturb_by_u8(const uint8_t value) {
    // TODO: the following load should be atomic
    word64 tmp = prng_state;
    for (int_fast8_t w = 7; w >= 0; w--) {
        // Skip the XOR if it would clear this byte to zero.
        if (tmp.u8[w] != value)
            tmp.u8[w] ^= value;
        // Mix: 3 Xorshift64 steps after each byte.
        for (int_fast8_t r = 3; r > 0; r--) {
            tmp.u64 ^= tmp.u64 >> 12;
            tmp.u64 ^= tmp.u64 << 25;
            tmp.u64 ^= tmp.u64 >> 27;
        }
    }
    // TODO: the following store should be atomic
    prng_state.u64 = tmp.u64;
}
The SPDX-License-Identifier: CC0-1.0 tag tells you this code is in the public domain, per Creative Commons Zero v1.0 Universal.
The notes about atomic loads and stores mean that if you want to call these functions from an interrupt context, or if you use an RTOS kernel, the loads and stores need to be atomic or otherwise guarded. On ARM, for example, you might save the interrupt mask, disable interrupts, do the load/store, and finally restore the saved mask. The right approach depends on your use case, development environment, and the hardware architecture you use.
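For example, on a Cortex-M target with the CMSIS headers available, one possible sketch is a PRIMASK-based critical section around the 64-bit load and store; the helper names and the CMSIS assumption are mine, not part of the code above:
// Sketch only, assuming ARM Cortex-M with CMSIS intrinsics.
// These helpers would replace the loads/stores marked TODO above.
static inline uint64_t prng_state_load(void) {
    const uint32_t primask = __get_PRIMASK();  // remember the current interrupt mask
    __disable_irq();                           // enter critical section
    const uint64_t value = prng_state.u64;
    __set_PRIMASK(primask);                    // restore the previous mask
    return value;
}

static inline void prng_state_store(const uint64_t value) {
    const uint32_t primask = __get_PRIMASK();
    __disable_irq();
    prng_state.u64 = value;
    __set_PRIMASK(primask);
}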
The 16 iterations in the initialization, and the 5 and 3 iterations in the perturbations, are not magic numbers. For initialization I recommend anything at or above a dozen (12); for the perturbations any positive number of iterations should work, although I'd be suspicious of fewer than two. (I did not turn them into named preprocessor macros, because making them larger does not produce any better results. They only exist to ensure that consecutive initializations or state perturbations lead to completely unrelated sequences.)
Because prng_step() only generates but discards a single number, to properly add entropy you should use prng_perturb_by_u32(value) or prng_perturb_by_u8(value). The former XORs the 32-bit value into each half of the state, advancing the PRNG by 5 steps after each half; the counts are arbitrarily chosen, with no deeper meaning than ensuring that even prng_perturb_by_u32(1) has a significant effect on the generated sequence. The latter does the same with an 8-bit value, except on each byte of the 64-bit state, advancing by 3 steps each time. How often you call these should not affect the randomness of the generated sequence, but I wouldn't bother doing it more than about once per second. There is nothing magic in the perturb functions: they are safe only because Xorshift64* has no weak or bad states other than all zeros, and they avoid that one by never clearing the 32-bit/8-bit part of the state they are about to modify. The same approach is not safe with all PRNGs! It is safe here exactly because we know there are no "poor" states, only the forbidden all-zero one.
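As a usage sketch (the read_device_id() and read_adc_noise() hooks are hypothetical placeholders for whatever unique ID and noisy measurement your hardware provides):
// Hypothetical platform hooks, for illustration only.
extern void     read_device_id(uint32_t id[3]);  // e.g. a factory-programmed unique ID
extern uint32_t read_adc_noise(void);            // e.g. a noisy ADC or sensor sample

void rng_setup(void) {
    uint32_t id[3];
    read_device_id(id);
    prng_initialize(id);   // seed once at boot
}

void rng_housekeeping(void) {
    // Call at most about once per second, per the note above.
    prng_perturb_by_u32(read_adc_noise());
}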
Note that when generating pseudorandom values for a specific range, the standard ((prng_u32() % range) + base) actually biases the resulting distribution toward the smallest values whenever range is not a power of two. The weight of the bias is small for small ranges, but grows up to half for 2³¹ < range < 2³². With very fast PRNGs like Xorshift64* and PCG, the mask/shift-and-exclude method is superior. It does mean that on average up to two numbers are generated for each value in range, and on some calls more than that, so the runtime is occasionally longer; but these PRNGs are so fast that it really does not matter.
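For a concrete illustration of the modulo bias: with range = 3, 2³² = 3·1431655765 + 1, so the result 0 is produced by 1431655766 of the 2³² possible generator values while 1 and 2 are each produced by 1431655765; that is negligible, but with range just above 2³¹, the lower half of the range becomes roughly twice as likely as the upper half.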
In practice, I use a structure that defines the range generated:
typedef struct {
    int32_t  base;   // Minimum value generated.
    uint32_t mask;   // limit ? (0xFFFFFFFFu >> __builtin_clz(limit)) : 0
    uint32_t limit;  // Maximum value generated is base + limit.
} i32_range_spec;
where mask is effectively limit with all bits below its most significant set bit also set; you can compute it with
uint32_t u32_mask_for(uint32_t limit) {
    limit |= limit >> 1;
    limit |= limit >> 2;
    limit |= limit >> 4;
    limit |= limit >> 8;
    limit |= limit >> 16;
    return limit;
}
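If it helps, a spec covering an inclusive range [min, max] can be filled in as follows; the helper name is my own and not part of the original code:
// Illustration only: build a spec for min..max inclusive (requires min <= max).
i32_range_spec i32_range_spec_for(const int32_t min, const int32_t max) {
    const uint32_t limit = (uint32_t)max - (uint32_t)min;  // number of values minus one
    const i32_range_spec r = { .base = min, .mask = u32_mask_for(limit), .limit = limit };
    return r;
}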
The generator function itself is
int32_t prng_i32_range(const i32_range_spec r) {
    while (1) {
        uint32_t tmp = prng_u32() & r.mask;
        if (tmp <= r.limit)
            return r.base + tmp;
    }
}
Because r.limit <= r.mask < 2*r.limit (for nonzero r.limit), on average fewer than two iterations are needed. If r.limit == 0, then r.mask == 0, and the function always returns r.base. As long as prng_u32() produces a sequence that includes zero (or any value whose masked part is at most r.limit), the loop is guaranteed to terminate.
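For example, a spec for a six-sided die (uniform in 1 to 6) could look like this; roll_d6 is just an illustrative name:
// Illustration only: uniform values 1..6.
int32_t roll_d6(void) {
    const i32_range_spec die = {
        .base  = 1,
        .mask  = u32_mask_for(5),  // 5 -> 0x7, so the low 3 bits are kept
        .limit = 5,                // six values: base + 0 .. base + 5
    };
    return prng_i32_range(die);    // masked values 6 and 7 are rejected, so no bias
}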
Fixed-time variants are more difficult, although there are a number of approaches. One is to scale the range and redistribute the bias: take bias = prng_u32(), then compute base + (((((uint64_t)bias) << 32) + (uint64_t)range * (uint64_t)prng_u32()) >> 32) - bias. Scaling alone biases individual values at regular intervals, but biasing the range before the division causes the biased values to vary randomly, thus distributing the bias across the entire range more or less uniformly. Note that the 64-bit multiplication is really a 32×32→high-32 multiply, natively supported by e.g. ARMv7E-M, so it is quite fast too, with exactly two 32-bit pseudorandom numbers used for each result. This isn't perfect, and I haven't run Xorshift64* (taking the high 32 bits) through BigCrush or Dieharder to see exactly how good it is, but it is definitely better than the modulo method. (The main reason I haven't is that one needs to run it for a number of ranges, and I only have an aging i5-7200U laptop to number-crunch. I can only dream of a Threadripper...)