Author Topic: Resolving contention - create a unique/random sequence from a 96bit ID? use CRC?  (Read 3880 times)

0 Members and 1 Guest are viewing this topic.

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3313
  • Country: ca
You don't need AES, you don't need any cryptography at all. The very simplest 32-bit random number generator will do well for your purpose:

x(n+1) = x(n)*a + b

With a good choice of a and b, it'll produce very good sequences, which will have much better uniform distribution than you need for your tasks.

You can get an initial seed (that is x(0)) from uninitialized RAM which will be random at power-up. Xor it with your unique id if you will.

 
The following users thanked this post: ksjh

Offline Sacodepatatas

  • Regular Contributor
  • *
  • Posts: 116
  • Country: es
Smaller STM32 devices don't have RNG. (Mine doesn't!)

The STM32G030F6P6 has one TRNG, but It is undocumented, you need to use the datasheet of the STM32G041x6/x8 as a guide.
 

Offline ksjh

  • Regular Contributor
  • *
  • Posts: 63
  • Country: de
I definitely agree that there are significantly cheaper, simpler, and more efficient methods - I just lack the knowledge, familiarity, etc. to be very confident that they are worth the additional effort.

Computationally, the effort is much much lower with all presented non-crypto methods (i.e., they are much faster). I would say that they are also very easy to implement. I am not sure why some of the criteria you now mentioned are required for your use-case at all. For example: Why would it matter if you can predict the next pseudo-random number? For the application you described, I would see this as a benefit, not a drawback.

If you want to get an insight into the underlying mathematical foundation (for LFSRs, CRCs etc.), I can highly recommend the book "Finite Fields" by Rudolf Lidl and Harald Niederreiter, https://doi.org/10.1017/CBO9780511525926
 

Online peter-h

  • Super Contributor
  • ***
  • Posts: 4518
  • Country: gb
  • Doing electronics since the 1960s...
Quote
You can get an initial seed (that is x(0)) from uninitialized RAM which will be random at power-up

That's quite an assumption.

The advantage of using a crypto algorithm (which could be a strong hash too) is that ciphertext is as random as you could ever want. If it wasn't then the cryptosystem is crap. And you need to look no further...
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 9551
  • Country: fi
As ksjh says, I also see a repeated pattern as positive, not negative feature. It allows you to simulate the behavior and get the same result in real life. And if there ever is a problem with sequencing, it will be repeatable. Conversely, absence of a problem is also repeatable.
 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 4074
  • Country: us
You can get an initial seed (that is x(0)) from uninitialized RAM which will be random at power-up. Xor it with your unique id if you will.

I wouldn't do that unless I had no other choice.  It's not reliably random nor reliably predictable.  And after a non-power-on  reset it won't be randomized at all, just hold whatever was in the memory before.   Using uninitialized ram is fine if you are building a complex RNG that mixes entropy from multiple sources, since at least it won't hurt.  But I wouldn't use it directly as a seed.

If having the same sequence every startup for a given processor is acceptable using the MCU unique ID, hashed down to the target word size is a good option.  If you have an Ethernet interface, the MAC address can be used in the same way.

If you want randomized seeds there are better sources of randomness such as timer events.  Or as I said you can combine multiple entropy sources such that if one of them is bad it doesn't break your system.

If you need secure randomnesss then that is a whole different ballgame.  Almost all sources of easily available randomness are in principle subject to outside interference.  So there is not really any point to discussing without clarifying your threat model.
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 21737
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
You can get an initial seed (that is x(0)) from uninitialized RAM which will be random at power-up.

Everything I have seen leads me to doubt that. Do you have a good source for that assertion?
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Online peter-h

  • Super Contributor
  • ***
  • Posts: 4518
  • Country: gb
  • Doing electronics since the 1960s...
Quote
If you have an Ethernet interface, the MAC address can be used in the same way.

In most embedded systems, the MAC is set by the software :)

Well, you can buy a block of MAC numbers, even chips containing them...
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 4074
  • Country: us
Depends.  You can buy i2c eeproms with pre programmed unique MAC addresses.  I've used them when I needed a persistent but unique MAC address.  For other applications where I had a good RNG and didn't need persistent addresses I have used randomized addresses.
 

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 172
  • Country: us
  • ASCII > UTF8
Something that I imagine I might get from AES* is that I could take any N bits of its output (for any value of N at any position in the block) and 'feel pretty confident' that it's going to have good properties** no matter what the MCU ID happens to be.

Suppose my application had 20 parameters of different sizes that I want to be random-ish (sequence unique to that device that varies over time). With something like AES I would take the 16 byte block that it generates and slice it up into bitfields, one per parameter. This would let me generate (what appears to be) a totally unique sequence of 'sets of parameters' that update each tick.

What I perceive (correct me if I am wrong) is that this is a low (programmer/designer) effort way to get lots unique sequences of random-ish numbers of different sizes. All of the parameters update at a uniform rate, and probably very little effort has to go into understanding or verifying it's "suitability" since its AES.

*AES128 in counter mode using the 96-bit MCU ID as the key and starting from a zeroed buffer/initial values and copying the output block back into the input.
**As a unique sequence numbers that vary in a non-trival fashion from tick to tick
« Last Edit: February 01, 2025, 07:54:46 pm by incf »
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3313
  • Country: ca
The advantage of using a crypto algorithm (which could be a strong hash too) is that ciphertext is as random as you could ever want. If it wasn't then the cryptosystem is crap. And you need to look no further...

The advantage of Ferrari's motor is that it's hard enough for cracking nuts on its surface. Look no further.
 
The following users thanked this post: mskeete

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3313
  • Country: ca
You can get an initial seed (that is x(0)) from uninitialized RAM which will be random at power-up. Xor it with your unique id if you will.

I wouldn't do that unless I had no other choice.  It's not reliably random nor reliably predictable. 

Unpredictable is exactly what you want. Otherwise, use a fixed seed.

And after a non-power-on  reset it won't be randomized at all, just hold whatever was in the memory before.

Which is a random value from your previous run which will work just fine.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3313
  • Country: ca
You can get an initial seed (that is x(0)) from uninitialized RAM which will be random at power-up.

Everything I have seen leads me to doubt that. Do you have a good source for that assertion?

Yes, every time I used it it was random. I have also seen research on the subject. You can search the Internet and I'm sure you'll find some. What is that you have seen which makes you doubt that?

You wouldn't use it for encryption as it may be accessed/stuffed by the attacker. But there's no security risk in the OP's case.

In the OP's cases, where consumers fight for a resource, they should be able to back off if there's too much load. Thus, they can tell when there was a peak load. For that reason, it would be better to provide a regular pattern (as opposed to random) which can be deducted by other units. Such as, if the pub is very busy every Friday, why don't I go to the pub on Thursday.
 

Offline voltsandjolts

  • Supporter
  • ****
  • Posts: 2655
  • Country: gb
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 21737
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
You can get an initial seed (that is x(0)) from uninitialized RAM which will be random at power-up.

Everything I have seen leads me to doubt that. Do you have a good source for that assertion?

Yes, every time I used it it was random. I have also seen research on the subject. You can search the Internet and I'm sure you'll find some. What is that you have seen which makes you doubt that?

Personal observation, coupled with noting that manufacturers make zero guarantees about uninitialized RAM contents.

Plus you don't give any indication whether your observations are random/identical/different for one device each time it is initialized, ditto batch of devices.

Quote
In the OP's cases, where consumers fight for a resource, they should be able to back off if there's too much load. Thus, they can tell when there was a peak load. For that reason, it would be better to provide a regular pattern (as opposed to random) which can be deducted by other units. Such as, if the pub is very busy every Friday, why don't I go to the pub on Thursday.

There's a general principle... If you know how to optimise, optimise. If you don't know how to optimise, randomise.
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3313
  • Country: ca
Plus you don't give any indication whether your observations are random/identical/different for one device each time it is initialized, ditto batch of devices.

It has nothing to do with initialization. At power up, each bit of SRAM randomly assumes either '0' or '1'. When you turn the power off, the value is preserved for some time, but eventually get lost, so the value at new power up assumes a new value which is independent of the previous one. If you wish you can get a real MCU or FPGA and experiment with that.
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 21737
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Plus you don't give any indication whether your observations are random/identical/different for one device each time it is initialized, ditto batch of devices.

It has nothing to do with initialization. At power up, each bit of SRAM randomly assumes either '0' or '1'. When you turn the power off, the value is preserved for some time, but eventually get lost, so the value at new power up assumes a new value which is independent of the previous one. If you wish you can get a real MCU or FPGA and experiment with that.

Er, that is initialisation. Not by software, but initialisation  nonetheless.

Your argument presumes there is no imbalance in the memory cells. Any imbalance will bias the initial value to be one way or the other. There is usually some physical parameter that is imbalanced, be it registration of the various masks and layers, or number of doping atoms. The latter is to be expected; current FET gates have hit a physical limit at five atoms thick.

IMNSNO it would be unwise to go into production of a device where correct operation depends on undefined behaviour. Corporate reputation and product recalls are both expensive!
« Last Edit: February 01, 2025, 10:50:51 pm by tggzzz »
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7468
  • Country: fi
    • My home page and email address
I would simply initialize a Xorshift64* PRNG with the 96-bit ID.

If you can spare an input pin or two, you can also connect a physical oscillator or a white noise generator circuit, to produce random bits you can mix to the PRNG.

As ksjh mentioned, PCG is another PRNG one can use.  The main difference is the state transition function:
Code: [Select]
#include <stdint.h>

uint64_t  xorshift64_state, pcg_state, pcg_inc;

void xorshift64_step(void) {
    uint64_t  tmp = xorshift64_state;
    tmp ^= tmp >> 12;
    tmp ^= tmp << 25;
    tmp ^= tmp >> 27;
    xorshift64_state = tmp;
}

void pcg_step(void) {
    uint64_t  tmp = pcg_state;
    tmp = tmp * UINT64_C(6364136223846793005) + (pcg_inc | 1);
    pcg_state = tmp;
}
I recommend using e.g. godbolt.org to compile them for your target architecture (example is for ARMv7e-m, specifically Cortex-M4), to see if one is superior for your target architecture.  Xorshift64* uses multi-bit shifts, which for some architectures are costly (although on many 8-bit architectures, they simplify to 4-bit, 1-bit, and 3-bit shifts, as entire bytes can be skipped); PCG uses a 64×64=64-bit multiply, which can be extremely costly on some architectures (namely, ARMv6-m/Cortex-M0/M0+, but not so on ARMv7e-m/Cortex-M3/M4/M7 and later).

I do prefer Xorshift64*, because it has no "bad seeds" except for zero, which is not allowed.

Even if the unique 96-bit IDs were sequential or otherwise perfectly correlated, all you need to do is advance the PRNG states a dozen steps or more, and the sequences will be completely different.  Sequences starting from consecutive seeds are NOT similar.  These two PRNG's outputs are also linearly independent, so skipping every N'th number in the sequence will not make it any less random.  If one wants to manipulate the state directly, then lots of warnings do apply: you do need to avoid "bad states" – which, for Xorshift64*, is the single forbidden zero state.
 
The following users thanked this post: ksjh, incf

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7468
  • Country: fi
    • My home page and email address
Since I prefer Xorshift64* for its robust state, I would use the following initialization and entropy-adding functions myself:
Code: [Select]
// SPDX-License-Identifier: CC0-1.0
#include <stdint.h>

typedef union {
    uint64_t  u64;
    uint32_t  u32[2];
    uint16_t  u16[4];
    uint8_t   u8[8];
    int64_t   i64;
    int32_t   i32[2];
    int16_t   i16[4];
    int8_t    i8[8];
    unsigned char  uc[8];
    signed char    sc[8];
    char           c[8];
} word64;

static word64  prng_state;

void prng_initialize(const uint32_t  id[3]) {
    static const uint32_t  fixed[3] = {
        // Your favourite nonzero 32-bit constants here
        0xDEADBEEF, 0xF00DCAFE, 0x0600DD06,
    };
    word64  seed = { .u32 = { 0, 0 } };

    for (int_fast8_t w = 2; w >= 0; w--) {
        if (seed.u32[0] != id[w])
            seed.u32[0] ^= id[w];
        if (seed.u32[1] != fixed[w])
            seed.u32[1] ^= fixed[3];
        for (int_fast8_t n = 16; n >= 0; n--) {
            seed.u64 ^= seed.u64 >> 12;
            seed.u64 ^= seed.u64 << 25;
            seed.u64 ^= seed.u64 >> 27;
        }
    }

    // TODO: the following store should be atomic
    prng_state.u64 = seed.u64;
}

void prng_step(void) {

    // TODO: the following load should be atomic
    uint64_t  tmp = prng_state.u64;

    tmp ^= tmp >> 12;
    tmp ^= tmp << 25;
    tmp ^= tmp >> 27;

    // TODO: the following store should be atomic
    prng_state.u64 = tmp;
}

uint32_t  prng_u32(void) {

    // TODO: the following load should be atomic
    uint64_t  tmp = prng_state.u64;

    tmp ^= tmp >> 12;
    tmp ^= tmp << 25;
    tmp ^= tmp >> 27;

    // TODO: the following store should be atomic
    prng_state.u64 = tmp;

    return (tmp * UINT64_C(2685821657736338717)) >> 32;
}

void prng_perturb_by_u32(const uint32_t  value) {

    // TODO: the following load should be atomic
    word64  tmp = prng_state;

    for (int_fast8_t  w = 1; w >= 0; w--) {
        if (tmp.u32[w] != value)
           tmp.u32[w] ^= value;
        for (int_fast8_t  r = 5; r >= 0; r--) {
            tmp.u64 ^= tmp.u64 >> 12;
            tmp.u64 ^= tmp.u64 << 25;
            tmp.u64 ^= tmp.u64 >> 27;
        }
    }

    // TODO: the following store should be atomic
    prng_state.u64 = tmp.u64;
}

void prng_perturb_by_u8(const uint8_t  value) {

    // TODO: the following load should be atomic
    word64  tmp = prng_state;

    for (int_fast8_t  w = 7; w >= 0; w--) {
        if (tmp.u8[w] != value)
           tmp.u8[w] ^= value;
        for (int_fast8_t  r = 3; r >= 0; r--) {
            tmp.u64 ^= tmp.u64 >> 12;
            tmp.u64 ^= tmp.u64 << 25;
            tmp.u64 ^= tmp.u64 >> 27;
        }
    }

    // TODO: the following store should be atomic
    prng_state.u64 = tmp.u64;
}
SPDX-License-Identifier: CC0-1.0 tells you this code is in Public Domain, per Creative Commons Zero v1.0 Universal.

The notes about atomic loads and stores means that if you want to call these functions from an interrupt context, or if you use an RTOS kernel, the loads and stores need to be atomic/guarded.  On ARM, for example, you might wish to save the interrupt mask, disable interrupts, do the load/store, and finally restore the saved mask.  It depends on your use case, development environment, and the hardware architecture you use.

The 16 iterations in initialization, and 5 and 3 iterations in the perturbations, are not magic numbers.  For initialization, I do recommend anything at or above a dozen (12), but for the perturbations, any positive number of iterations should work well, although I'd be suspicious of less than two iterations.  (I did not put them as named preprocessor macros, because making them larger does not produce any better results.  They only exist to ensure consecutive initializers or state perturbations lead to completely unrelated sequences.)

Because prng_step() only generates but discards a single number, to properly add entropy you can use prng_perturb_by_u32(value) or prng_perturb_by_u8(value).  The former modifies the state twice by 32-bit value (once shifted left by 32 bits), and advances the PRNG both times by 5 steps (arbitrarily chosen, no deeper meaning but to ensure even prng_perturb(1) has significant effect on the sequence generated).  The latter does the same with an 8-bit value, except on each byte of the 64-bit state, advancing by 3 steps each time.  How often you call these should not affect the randomness of the sequence generated, but I wouldn't bother doing it more than once per second or so.  There is nothing magic in the perturb functions: they are safe only because Xorshift64* does not have weak/bad states, except for all zeros, and these avoid it by never clearing the 32-bit/8-bit seed part they are about to modify.  The same approach is not safe to do with all PRNGs!  It is safe here exactly because we know that there are no "poor" states, only the all-zero forbidden one.

Note that when generating pseudorandom values for a specific range, the standard ((prng_u32() % range) + base) actually biases the resulting distribution for the smallest values, whenever range is not a power of two.  The weight of the bias is small for small ranges, but up to half for 2³¹ < range < 2³².  With very fast PRNGs like Xorshift64* and PCG, the mask/shift and exclude method is superior.  It does mean that on average, up to two numbers are generated for each value in range, but these PRNGs are so fast it really does not matter at all; and on some calls, somewhat more numbers may be generated and thus runtime be occasionally longer.

In practice, I use a structure that defines the range generated:
Code: [Select]
typedef struct {
    int32_t   base;   // Minimum value generated.
    uint32_t  mask;   // 0xFFFFFFFF >> ((limit) ? __builtin_clz(limit) : 32)
    uint32_t  limit;  // Maximum value generated is base + limit.
} i32_range_spec;
where mask is effectively limit with all bits less than the most significant bit set, set; you can use
Code: [Select]
uint32_t  u32_mask_for(uint32_t  limit) {
    limit |= limit >> 1;
    limit |= limit >> 2;
    limit |= limit >> 4;
    limit |= limit >> 8;
    limit |= limit >> 16;
    return limit;
}
The generator function itself is
Code: [Select]
int32_t  prng_i32_range(const i32_range_spec r) {
    while (1) {
        uint32_t  tmp = prng_u32() & r.mask;
        if (tmp <= r.limit)
            return r.base + tmp;
    }
}
Because r.limit <= r.mask < 2*r.limit, on average less than two iterations are needed.  If r.limit==0, then r.mask==0, and the function will always return 0.  If prng_u32() returns a random sequence that includes zero, the loop is quaranteed to terminate.

Fixed-time variants are more difficult, although there are a number of approaches.  One is to scale the range and redistribute the bias, i.e. bias = prng_u32(), then base + (((((uint64_t)bias) << 32) + (uint64_t)range * (uint64_t)prng_u32()) >> 32) - bias.  Scaling alone will bias individual values at regular intervals, but biasing the range before the division causes the biased values to vary randomly, thus distributing the bias across the entire range more or less uniformly.  Note that the 64-bit multiplication is actually 32×32=high-32, something natively supported by e.g. ARMv7e-m, so it is quite fast, too, with exactly two 32-bit pseudorandom numbers used for each result.  This isn't perfect, and I haven't run Xorshift64*-high-32bits through BigCrunch or Dieharder to see exactly how good it is, but it is definitely better than the modulo method.  (Main reason is that one needs to run it for a number of ranges, and I only have an aging i5-7200U laptop to number-crunch.  I can only dream of a ThreadRipper..)
« Last Edit: February 02, 2025, 07:31:32 am by Nominal Animal »
 
The following users thanked this post: SiliconWizard, ksjh, incf

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7468
  • Country: fi
    • My home page and email address
If one has an entropy source, a CMOS/TTL -level output one can sample in say the 1ms timer tick interrupt or similar, and use to perturb the PRNG state, you would not even need unique seeds at all.  A good option would be to collect 32 or 64 bits to form a word, then use the word to perturb the Xorshift64* state, for example with exclusive-or or addition (using modulo 2⁶⁴ arithmetic, as C does when using unsigned integer types; I do suggest addition here).  The only check needed for Xorshift64* is to ensure the state does not become zero.  (I haven't looked at PGC to see how sensitive its seeds are.)

For a source of entropy, a good option would be an ATtiny in SOT23-6, or perhaps Padauk PMS150C-U06 (SOT23-6, one-time programmable using Free PDK), clocked from its own RC oscillator, generating a suitable LFSR bit sequence.  The entropy would then come from the differences in the two MCU's timings.  You could also use a yet separate RC oscillator as an input to the SOT23-6 MCU, to get even more entropy (from the timing differences between the two MCUs).

If there is free board space, I'd have the MCU drive a charge pump via PWM to 12-18V for a supply to a standard transistor base-emitter avalanche breakdown white noise generator, amplified and buffered to 2.5V to 5V, sampled by the SOT23-6 MCU (either via a comparator, GPIO pin, or ADC), digitally filtered ("whitened") and combined to 8/32-bit words, and used to perturb the LFSR/Xorshift64* state.

(Sincere apologies to those who hate my wall-of-text posts.  Still working on trying to be less verbose.)

« Last Edit: February 02, 2025, 08:57:47 am by Nominal Animal »
 
The following users thanked this post: ksjh, incf

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 16126
  • Country: fr
As ksjh says, I also see a repeated pattern as positive, not negative feature. It allows you to simulate the behavior and get the same result in real life. And if there ever is a problem with sequencing, it will be repeatable. Conversely, absence of a problem is also repeatable.

Exactly. That, this ability to check the behavior on a very large number of successive values and seeds, is why I suggested doing this on computer first. Use C, exact-width integers, and the result will be exactly the same.

I was apparently told (if that was for me) that you don't have the UIDs on a computer. Uh. You just generate a bunch of them randomly (making sure those are unique, by stuffing a table with them and inserting only non-duplicates), and there you go. You can make it analyze statistics on that with a few extra lines of code. Very handy, and a good practice in general (implementing your algorithms first on computer), that some of us recommend on a regular basis.

If you directly code this kind of stuff on the (small) target, you will have a lot of difficulties achieving the same level of verification in any kind of reasonable amount of time, that will look more like shooting in the dark.

My personal "default" for a PRNG is currently Xoshiro256++, which is very good and lightweight enough. It's on par with Xorshift64* IIRC, so either would be fine.

Note: you can use this website to pick and/or test a PRNG: http://www.cacert.at/cgi-bin/rngresults
(you need to generate long sequences for this to be relevant).
« Last Edit: February 02, 2025, 10:23:04 am by SiliconWizard »
 
The following users thanked this post: ksjh

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7468
  • Country: fi
    • My home page and email address
My personal "default" for a PRNG is currently Xoshiro256++, which is very good and lightweight enough. It's on par with Xorshift64* IIRC, so either would be fine.
Yes, noting that Xorshift64* is only "fully random" in its 32 to 40 most significant bits (which is what my suggestions use), and that you are referring to the following 64-bit PRNG that passes all the randomness tests:
Code: [Select]
#include <stdint.h>

typedef union {
    uint64_t     u64[4];
    int64_t      i64[4];
    uint32_t     u32[8];
    int32_t      i32[8];
    uint16_t     u16[16];
    int16_t      i16[16];
    uint8_t       u8[32];
    int8_t        i8[32];
    unsigned char uc[32];
    signed char   sc[32];
    char           c[32];
} word256;

static inline uint64_t u64_rol23(const uint64_t value) { return (value << 23) | (value >> 41); }
static inline uint64_t u64_rol45(const uint64_t value) { return (value << 45) | (value >> 19); }

uint64_t prng_u64(word256 *const state) {
    // TODO: Lock state
    const uint64_t  result = state->u64[0] + u64_rol23(state->u64[0] + state->u64[3]);
    const uint64_t  tmp = state->u64[1] << 17;

    state->u64[2] ^= state->u64[0];
    state->u64[3] ^= state->u64[1];
    state->u64[1] ^= state->u64[2];
    state->u64[0] ^= state->u64[3];

    state->u64[2] ^= tmp;
    state->u64[3] = u64_rol45(state->u64[3]);

    // TODO: Unlock state
    return result;
}
It has a very large period, 2²⁵⁶-1, and all states except zero are valid.  After modifying the state, a dozen iterations suffices to get it to produce good sequences; see fig. 7 in Vigna's paper.

That is, after whatever modifications you do to the state (besides generating 64-bit pseudo-random numbers), do 12 or more iterations discarding the result, and the rest of the sequence will be good.  No need to worry about bad states, except avoid the all-zeros state like for Xorshift64*.

It is extremely interesting for architectures without a fast multiplication.  Even on 8-bit ones, u64_rol23() simplifies to a rotation of the eight bytes with one shift-right-through-carry (and set most significant bit if carry set); u64_rol45() to a rotation of the eight bytes with three shift-right-through-carry (+set MSB if carry) over the eight bytes.  The downside is that for multitasking or use in interrupt context, you either need to lock the 256-bit/16-byte structure for the whole generation, or you need to use 16 to 24 bytes of temporary memory or registers.  I definitely need to investigate this for AVR, especially for use in ATtiny's, for random bit sequence generation (2²⁶²-1 bit period) as I described earlier.
 
The following users thanked this post: ksjh

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3313
  • Country: ca
Imagine you have an X intersection of 2 roads where you must put through N cars every hour in each direction, and you want to minimize the number of collisions by controlling the time every individual car crosses the intersection, but without knowing when the cars in opposite direction cross. This is akin of the task OP wants to solve.

If you try to  estimate the probability of a collision, you'll soon realize that it does depend on the number of cars per hour, but is completely independent of the timing of car crossing - in other words, for any given number of cars per hour, it plainly doesn't matter what you do - the probability of a collision is the same.  Hence you have no way to influence the average number of collisions (unless you find a way to pass information from one road to the other).

Then why in the world would you need a random number generator with superb statistical properties?
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7468
  • Country: fi
    • My home page and email address
Then why in the world would you need a random number generator with superb statistical properties?
Because if you simulate the same situation with correlated sequences of car intervals, you'll discover how to maximize the probability of a traffic jam.

Add even one random interval in the mix, and the probability instantly shrinks, for as long as that interval is within the observable range.

In other words, the problem is not when the random number generator is poor, it is when its output is periodic with respect to other related generators: when the incoming car interval sequences correlate, synchronize.  And when you want to avoid that, why not go for a well-known good, extremely fast PRNG?

If your comment was to my suggestion of using the exclusion method instead of modulo operation, the bias indeed is irrelevant for traffic control.  It just happens that unless you have fast hardware division, or the range generated is a constant that the compiler can turn into multiplication by fractional reciprocal, the exclusion method is faster.  The scaling method (on architectures with "mulhi", 32×32=32 high bits of result multiplication) can be even faster.  As usual, I tried to explain what the problem is, and what the alternatives are, so you can choose for yourself.
« Last Edit: February 02, 2025, 03:18:36 pm by Nominal Animal »
 
The following users thanked this post: ksjh, incf

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 172
  • Country: us
  • ASCII > UTF8
I don't think NorthGuy's analogy is quite right.

Roughly speaking, there is a system comprised of a variable power source, and dozens of variable loads.
There are hundreds of these systems. Each has unique properties that cannot be predicted and do change over time.
 
The supplied voltage (and impedance/current limit) will suddenly rise or drop due to external factors - this process is entirely random from the perspective of the loads. Think solar panels.
Supply voltage/impedance It also can suddenly rise or drop due to changes in loading. Loading changes suddenly and randomly due to external factors outside my control or visibility.
My units have to use all available extra power all of the time up to the limit supported and each load (and the source).
This dictates certain total current which changes drastically over time.

I am seeking to avoid are three conditions [for which I have made up arbitrary names]: oscillation, crashing (voltage goes low enough to cause hard reset/fault/'panic stop'), and lockout ('getting stuck in low efficiency regimes of operation').

As Nominal Animal points out, there are some similar conditions that are commonly observed in traffic jams and other similar systems. "The divers are trying to go as fast as possible on an icy road in traffic, but not so fast that they run into the car in front of them and cause a 30 car pile-up"


If all of the units behave identically (or just too similarly, periodically, etc.) some combination of oscillation, crashing, and lockout is practically guaranteed.

There are many ways to deal with it, and everything that has been posted can be made to work with some effort.

-
addendum: the units use 'perturb and observe' type control loops. If they all perturb (or observe) at the same time at the same time or by the same amount problems happen. If they all 'panic stop' (or resume) at the same low voltage problems happen. If cycles occur due to stopping/starting of units issues will happen. etc. etc.

There is a limit to how 'gentle' / 'overdamped' I can make the system.

This some of why I am enamored by the idea of having ~20 independent randomized parameter sequences that are unique per unit. Bad combinations of parameters can still occasionally occur, but only temporarily.

It's a crutch for me feeling that it's not feasible to fully understand or accurately model every possible permutation of the system.



« Last Edit: February 02, 2025, 05:21:48 pm by incf »
 


Share me

Digg  Facebook  SlashDot  Delicious  Technorati  Twitter  Google  Yahoo
Smf