It shows how long it takes to reach an optimal entropy state vector from a low entropy state vector for seven PRNGs. With a low entropy state, the random numbers generated are likely to be of suspect quality. To avoid this, many advocate a warm up by 'burning' the early random numbers via rejection.
It is clear that xorshift64* does not need a warm up. WELL19937a looks like it needs about 130 samples to be burnt before it reaches an optimal state vector. The Mersenne Twister is horrendous.
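As a sanity check on the xorshift64* claim, here is a minimal sketch (my own, not the article's test harness) that steps Vigna's xorshift64* state transition from a deliberately poor seed of Hamming weight 1 and tracks the state's HW. The state diffuses within a handful of steps, and the output multiplier scrambles the output even before it does - presumably why no warm up is needed.

```python
MASK64 = (1 << 64) - 1

def xorshift64star_step(x):
    """One state transition of Vigna's xorshift64* (state must be non-zero)."""
    x ^= x >> 12
    x ^= (x << 25) & MASK64
    x ^= x >> 27
    return x

state = 1  # deliberately poor seed: HW 1
weights = []
for _ in range(10):
    state = xorshift64star_step(state)
    weights.append(bin(state).count("1"))
    # the output would be (state * 0x2545F4914F6CDD1D) & MASK64
print(weights)  # HW climbs quickly from 2 towards the centre of the distribution
```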
This is a bit of a black art, which is why I considered forcing an initial state vector to have an optimal entropy, avoiding the need to warm up. A non-optimal entropy is one with either a low Hamming weight (HW) or a high HW. Forcing an initial state vector to have an optimal entropy works.
However, the initial state vector is only half of the story. Neither Vigna nor, for that matter, anybody else considers what happens to the Hamming weight of the state vector during a generator session. It turns out we have a binomial distribution - a discrete analogue of the normal distribution. With a 64-bit state vector, one PRNG had about 90% of the HWs in the range 32 ± 6; 10% were therefore outside this range. Being random, it would be difficult, if not impossible, to know when we drift from the central 'bell' of the distribution into a tail. How long we remain in a tail is easy to estimate, as it will reflect the time taken to reach an optimal entropy state vector from a low entropy initial state vector - the worst-case warm up period.
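For context, the 'about 90%' figure agrees with what the binomial distribution predicts for a uniformly random 64-bit state. A quick check, using only the standard library:

```python
from math import comb

# Mass of Binomial(64, 1/2) on 26..38, i.e. the probability that the HW
# of a uniformly random 64-bit state lies in 32 +/- 6.
p = sum(comb(64, k) for k in range(26, 39)) / 2 ** 64
print(f"P(26 <= HW <= 38) = {p:.3f}")  # about 0.9
```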
From "Good Practice in (Pseudo) Random Number Generation for Bioinformatics Applications" we have:
Warm up your generator first

If you are not able to generate good quality random seeds e.g. using /dev/random or if you allow the user to specify the seed values via command-line parameters, then it is often wise to “warm up” your generator before using the random numbers it generates. In other words, set the seeds to your desired starting values and then before your program starts its real work, generate a few hundred or even a few thousand random numbers and discard them (as a rule - the longer the period of the RNG the more initial numbers you should discard at the beginning). This is important if your seed values have very low entropy (or more simply they have too many runs of 0 or 1 bits in their binary representation) – some otherwise good RNGs (especially shift-register-based generators such as MT) are not very random from their initial state when certain seed values are used. The warm up trick is generally good advice for many RNGs, particularly those with very long periods.

That was written in 2010, which justifies my Hamming Method approach, but it does not address the issue of drifting into a distribution's tail later on.
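The quoted burn-in advice amounts to a trivial wrapper. A minimal sketch - the generator, its constants and the burn count are illustrative assumptions, not anybody's published code:

```python
MASK64 = (1 << 64) - 1

def make_lcg(seed):
    """A plain 64-bit LCG using Knuth's MMIX constants - illustrative only."""
    state = seed & MASK64
    def nxt():
        nonlocal state
        state = (state * 6364136223846793005 + 1442695040888963407) & MASK64
        return state
    return nxt

def warmed_up(next_random, burn=1000):
    """Generate and discard the first `burn` outputs, then hand back the generator."""
    for _ in range(burn):
        next_random()
    return next_random

rng = warmed_up(make_lcg(12345), burn=500)  # first 500 outputs burnt
```

How large `burn` should be is exactly the 'black art' complained about above.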
When we drift into a tail, whenever that happens, there is at the moment nothing that we can do about it. First, we would need to know when we are in a tail, and then provide a remedy. Coding to cover both of those events would probably see a generator's throughput take a massive hit.
This is where the 'U turn' comes in. If we use a generator which does not need a warm up, then we do not need to force an initial state vector to have an optimal entropy, and we do not need to consider HW at all.
So, what is the solution to drifting into a tail of the distribution during a generator session? Simple: don't use a generator which requires a warm up.
A linear congruential generator (LCG) does not need a warm up. PCG32's (permuted congruential generator) state transition is Donald Knuth's 64-bit LCG. Its output is then permuted - the bits are rearranged by shift and rotate operations to eliminate the weaknesses of LCGs - and the result of that permutation is the generator's output. PCG32 therefore does not need a warm up. This was tested and found to be true. Another test considered the HW of the next state vector following a drift into a tail. That HW was back inside the 'bell shape' immediately after entering the tail. Needless to say, PractRand will not spot this - though it might if we spent some time in the tail.
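That recovery test can be sketched as follows, assuming the standard PCG32 (XSH-RR) formulation: Knuth's 64-bit LCG multiplier for the state transition, with the output permutation applied to the old state. The increment chosen here is an arbitrary odd value, an assumption for illustration:

```python
MASK64 = (1 << 64) - 1
MULT = 6364136223846793005  # Knuth's 64-bit LCG multiplier

def pcg32_step(state, inc):
    """One PCG32 (XSH-RR) step: returns (new_state, 32-bit output)."""
    new_state = (state * MULT + inc) & MASK64
    # Output permutation on the old state: xorshift-high, then a random rotate.
    xorshifted = (((state >> 18) ^ state) >> 27) & 0xFFFFFFFF
    rot = state >> 59
    output = ((xorshifted >> rot) | (xorshifted << (32 - rot))) & 0xFFFFFFFF
    return new_state, output

state = 1                   # deep in the low-HW tail
inc = 1442695040888963407   # arbitrary odd increment (assumption)
state, out = pcg32_step(state, inc)
print(bin(state).count("1"))  # back near the centre of the 'bell' in one step
```

The multiply by a dense 64-bit constant is what drags a low-HW state straight back into the bell shape.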
Bernard Widynski's Middle Square Weyl Sequence RNG (MsWs) appears not to need a warm up either, but his paper suggests otherwise. He reckons that we should 'burn' the first five or so random numbers, unless we choose the constant 's' so that it does not have a low entropy. My MsWs only uses one 's', taken from the website, and that has a HW of 35.
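For reference, a sketch of MsWs following the code in Widynski's paper. The constant below is the one he publishes, and its HW is indeed 35; whether it is the same value taken from the website is an assumption on my part:

```python
MASK64 = (1 << 64) - 1
S = 0xB5AD4ECEDA1CE2A9  # Widynski's published Weyl constant; HW = 35

def msws_stream(s=S, n=5):
    """Yield the first n 32-bit outputs of the Middle Square Weyl Sequence RNG."""
    x = w = 0
    for _ in range(n):
        x = (x * x) & MASK64                  # middle square...
        w = (w + s) & MASK64                  # ...plus a Weyl sequence
        x = (x + w) & MASK64
        x = ((x >> 32) | (x << 32)) & MASK64  # swap halves to expose the middle
        yield x & 0xFFFFFFFF

print(list(msws_stream()))
```

Note that with x = w = 0 initially, the early outputs are driven almost entirely by 's' - which is why a low entropy 's' would call for burning the first few numbers.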
So PCG32 is better than I thought, as is MsWs with a quality 's'.
I suppose looking at SFC32 and RomuTrio will be a little more complicated, as their state vectors have more than one element. I will have a look.
CryptoRndII, as with all CPRNGs, does not have a state vector in the above sense and therefore does not need warming up, nor any involvement with HW.
The work I did on HW has not been a waste of time, as I got here because of it.
Overall conclusion:
Avoid generators which may require a warm up, especially a long one. The Hamming approach is better than 'burning' as regards the initial state vector, but it is of no help if we drift into a tail later on. It should be noted that drifting into a tail where the entropy is poor is an unlikely event, but imagine drifting into a tail with the Mersenne Twister; we could be getting poor quality random numbers for quite a while.
This is contrary to the consensus on warming up. I am saying that if we need to warm up a generator, especially with a long warm up, then don't use that generator - period.