U turn on Hamming method

General FreeBASIC programming questions.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

U turn on Hamming method

Post by deltarho[1859] »

Here is a snapshot from page 19 of 'An experimental exploration of Marsaglia’s xorshift generators, scrambed' by Sebastiano Vigna in July 2016.

Image

It shows how long it takes to reach an optimal entropy state vector from a low entropy state vector for seven PRNGs. With a low entropy, the random numbers generated are likely to be suspect from a randomness perspective. To avoid this, many advocate a warm up by 'burning' the early random numbers via rejection.

It is clear that xorshift64* does not need a warm up. WELL19937a looks like it needs about 130 samples to be burnt to get to an optimal state vector. The Mersenne Twister is horrendous.

This is a bit of a black art and why I considered forcing an initial state vector to have an optimal entropy, avoiding the need to warm up. A non-optimal entropy would be one with either a low Hamming Weight (HW) or a high HW. Forcing an initial state vector to have an optimal entropy works.

However, the initial state vector is only half of the story. Vigna, and nobody else for that matter, considers what happens with the Hamming Weight of the state vector during a generator session. It turns out we have a binomial distribution - a discrete normal distribution. With a 64-bit state vector one PRNG had about 90% of the HW in the range 32 ± 6. 10% therefore were outside this range. Being random, it would be difficult, if not impossible, to know when we drift from the centre area of the distribution's 'bell shape' to a tail of the distribution. How long we remain in the tail is easy to determine, as it will reflect the time to reach an optimal entropy state vector from a low entropy initial state vector - the worst warm up period.

From Good Practice in (Pseudo) Random Number Generation for Bioinformatics Applications we have:
Warm up your generator first

If you are not able to generate good quality random seeds e.g. using /dev/random or if you allow the
user to specify the seed values via command-line parameters, then it is often wise to “warm up”
your generator before using the random numbers it generates. In other words, set the seeds to your
desired starting values and then before your program starts its real work, generate a few hundred or
even a few thousand random numbers and discard them (as a rule - the longer the period of the
RNG the more initial numbers you should discard at the beginning). This is important if your seed
values have very low entropy (or more simply they have too many runs of 0 or 1 bits in their binary
representation) – some otherwise good RNGs (especially shift-register-based generators such as MT)
are not very random from their initial state when certain seed values are used. The warm up trick is
generally good advice for many RNGs, particularly those with very long periods.
That was written in 2010 which justifies my Hamming Method approach, but it does not address the issue of drifting into a distribution's tail later on.

When we drift into a tail, whenever that is, there is, at the moment, nothing that we can do about it. Firstly, we need to know when we are in a tail and then provide a remedy. Coding to cover both of those events will probably see a generator's throughput take a massive hit.

This is where the 'U turn' comes in. If we use a generator which does not need a warm up, then we do not need to force an initial state vector to have an optimal entropy. We do not need to consider HW.

So, what is the solution to drifting into a tail of the distribution during a generator session. Simple: Don't use a generator which requires a warm up.

A linear congruential generator (LCG) does not need a warm up. PCG32 (Permuted congruential generator) state vector is Donald Knuth's 64-bit LCG. Its output is then permuted where the order is changed by shifting operations to eliminate the weakness' of LCGs and the result of that is the generator's output. PCG32 then does not need a warm up. This was tested and found to be true. Another test considered the HW of the next state vector following a drift into a tail. That HW was back into the 'bell shape' immediately after entering the tail. Needless to say, PractRand will not spot this - it may do if we spent sometime in the tail.

Bernard Widynski's Middle Square Weyl Sequence RNG (MsWs) appears not to need a warm up either, but his paper suggests otherwise. He reckons that we should 'burn' the first five of so random numbers, unless we choose the constant 's' to not have a low entropy state. My MsWs only uses one 's', taken from the website, and that has a HW of 35.

So PCG32 is better than I thought and MsWs with a quality 's'.

I suppose looking at SFC32 and RomuTrio will be a little more complicated, as they do not use a state vector with only one element. I will have a look.

CryptoRndII, as with all CPRNGs, does not have a state vector and therefore does not need warming up or getting involved with HW.

The work I did on HW has not been a waste of time, as I got here because of it.

Overall conclusion:

Avoid generators which may require a warm up, especially a long warm up. The Hamming approach is better than 'burning' regarding the initial state vector, but it is of no help if we drift into a tail later on. It should be noted that drifting into a tail where the entropy is poor is an unlikely event, but imagine drifting into a tail with Mersenne Twister; we could be getting poor quality random numbers for quite a while.

This is contrary to the consensus of warming up. I am saying if we need to warm up a generator, especially a long warm up, then don't use that generator - period. Image
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: U turn

Post by deltarho[1859] »

RomuTrio

Does not need a warm up.

From the author:
The state variables must be seeded such that at least one bit of state is non-zero.
Romu generators are forgiving about seeding.
I also tried a few poor seeds, with low entropies, and we were back into the 'bell shape' of the distribution immediately so drifting into a tail is not an issue.

Another plus for RomuTrio. Image
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: U turn

Post by deltarho[1859] »

SFC32

The first three iterations are poor, given a low entropy initial state, and being in a tail for three iterations may not be problematic. I would rather come out of a tail immediately, as with PCG32 and RomuTrio.

The SFC32.bas code forces an optimal HW initially but does nothing regarding a drift into a tail.
adeyblue
Posts: 299
Joined: Nov 07, 2019 20:08

Re: U turn

Post by adeyblue »

Do you ever post these things in the places where the maths and theory people live? You'd probably have far more edifying conversations about these things amongst people who find it equally stimulating.

Heck, I think you have a site, does it have a blog? These posts are obviously all connected to each other in one way or another, and there's a process and iteration of varying degrees linking your posts but if they just live on the forum, nobody has a chance of following along with your ojourney or methods in future since the posts aren't connected or grouped in any way.

This isn't in any way saying don't post these things, just, you're clearly passionate about this and the audience in this forum, well, isn't. It's also a little sad, in a shouting into the void sense, to see you just posting these lengthy posts only to be ignored most of the time when there must be some community who love theorycrafting and testing RNGs as much as you somewhere.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: U turn

Post by deltarho[1859] »

Wow!

Many years ago at the PowerBASIC forums a cryptography guy, who marketed a suite of routines, gave me a tip on some code that I was working on. My code was not crypto' related, but the tip was. The tip fascinated me enough to check it out. I ended up at the Internet Engineering Task Force (IETF) website and became totally hooked on cryptography. The guy above became a mentor, and there was one other who had a site called Crypto Archives. The second guy left PB some years ago and the first guy very rarely visits. His software died when his country effectively put a massive tax on crypto' exports. As the years went by, I knocked out more crypto' procedures than you can shake a stick at.

One thread was about a dozen posts long with only me posting, and one 'Smart Alec' asked if I liked talking to myself. I have since learnt that many are using my procedures, but hardly anyone told me. One guy posted some internet crypto work only recently and his code mentioned 'by David Roberts' about half a dozen times. I remember thinking to myself: "Well, I never."

So, I have got used to 'rambling on' with, apparently, no one being interested. My 'A fast CPRNG' thread is about 4½ years old now and has had 15,416 views. That is many views by an uninterested audience. I wouldn't say that I was passionate about this stuff, but I am drawn into what seems to be an endless opportunity for analysis, and I do have a MSc in Numerical Analysis. I have no idea what attracted me to Numerical Analysis, but it was the subject in my maths BSc that I could not get enough of.

I am a member of a couple of cryptography forums but found them not particularly welcoming and, oh boy, are they a boring lot of individuals totally lacking in any sense of humour. A sense of humour seems to be lacking in 'heavy duty' subjects. There are exceptions, the most notable being Richard Feynman who was quite a character and got himself a Nobel Prize with a few other guys.
there must be some community who love theorycrafting and testing RNGs as much as you somewhere.
I am not so sure that is true.

Most, if not all, of PRNG work seems to be done by individuals who are not members of a community. If I created a forum for them to join, I have a feeling that very few, if any, would sign up. I wouldn't be surprised to learn that many of them are what Carl Jung would describe as being INTP, and they are certainly not team member material. I know that because I am an INTP. Feynman was an exception, being an extrovert if ever there was one.
nobody has a chance of following along with your ojourney or methods in future since the posts aren't connected or grouped in any way.
Perhaps I should knock out a pdf entitled "Ramblings on PRNGs by a non-cryptographer" and put it on my website. The only problem with that is that very few visit my website and the downloads are very few and far between. I mentioned Encrypternet to one of the UK computer magazines, but got no response. I even mentioned it to Bruce Schneier and got back: "Thanks, and good luck". It is a pity because there is nothing like it on the internet and is ideal for people working from home exchanging sensitive material with their employer; something which became common when the pandemic struck. It is not as if I am charging much - it is freeware.

Anyway, adeyblue, I appreciate your post. It was good of you to take the time out and write what you did. It restored my faith in human nature, which has been waning in my twilight years. I am serious - it has done. Thanks again. Image
dodicat
Posts: 7976
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: U turn

Post by dodicat »

There are plenty of responses to deltarho's posts.
Quite a few of us now have the practrand test suite and entropy and chi squared tests.
The algorithms are getting faster and faster, and they are not all deltarho's.
You may notice that the posts usually come rolling in a few days after his first treatise of a subject.
This is probably because it takes a day or two to digest, and it takes a little mental strain to leap into the abyss.
...
I would say that at least one of the freebasic rnd algorithms should return a ulong, and not go through the bother of dividing by MAXULONG to return double, because having to re multiply by MAXULONG (for many purposes) again slows them all down.
Regarding warm ups I cheated with my last contribution VIZ:

Code: Select all


namespace rand64
dim as ulongint a,b,c,d,e
const max=18446744073709551615ull

function rndU  as  ulongint
   e = a - ((b shl 7) or (b shr (57)))
   a = b xor ((c shl 13) or (c shr (51)))
   b = c + ((d shl 37) or (d shr (27)))
   c = d + e
   d = e + a
   return d
end function

sub autoinit() constructor
var n=timer
a=n:b=n:c=n:d=n
    for m as long=n to n+2
        a+=m
    print rndU()
next
print "automatic warm up"
end sub
#define rndf  rand64.rndU/rand64.max
#define rangeI(f,l) clngint((rand64.rndU() mod (((l)-(f))+(1))) + (f))
#define rangeF(f,l) Rndf * ((l) - (f)) + (f)
end namespace

sub init(n as long)
    rand64.a=n:rand64.b=n:rand64.c=n:rand64.d=n
    for m as long=n to n+2
        rand64.a+=m
    rand64.rndU()
    next
end sub

'bla
'bla
sleep
 
Nobody needs to be an expert in any branch of maths to contribute.
Sometimes Lay people come up with the most novel solutions when the expert is bogged down with theory and analyses.
- and what's wrong with a little maths in freebasic?
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: U turn

Post by jj2007 »

adeyblue wrote:Do you ever post these things in the places where the maths and theory people live? You'd probably have far more edifying conversations
That's true, but posting it here does no harm, either. I must admit that I've given up the attempt to understand what David posts (it would cost me too much time to keep track), but I still read his posts, and I like that he insists to find the Holy Grail. Keep up the good work, David!
TJF
Posts: 3809
Joined: Dec 06, 2009 22:27
Location: N47°, E15°
Contact:

Re: U turn

Post by TJF »

dodicat wrote:- and what's wrong with a little maths in freebasic?
Nothing is wrong with a little math ...
... but the theme should be clear. The subject "U turn" isn't really descriptive, and the topic isn't well sorted under "Programming -> General". It's more like a "Community Discussion".

Some people here are interested in getting as less of entropy from their code as possible. I'd say it's the natural programmers interest to make the SW outcome predictable. And when a normal programmer comes to that thread and it takes a notable time - especially for non-native speakers like jj2007 or myself - to find out, that there's a further topic that should get ignored, since it's against the "natural" interests, then I understand that adeyblue gets annoyed.

@deltarho[1859]: write whatever you want to write. But try to find a clear subject, put one topic in one thread and place it in the matching section. Be mindful in wasting the time of your readers.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: U turn

Post by deltarho[1859] »

TJF wrote:put one topic in one thread and place it in the matching section.
I agree that should have been done.

I am reminded of the taxes acts here in the UK. In a chancellor's budget, a few changes may be made to the tax laws. A budget or two later, and further changes may be made. Eventually, in considering a tax law, quite a few amendments may have to be consulted, and the whole thing gets unwieldy. So, every few years or so a consolidation taxes act is introduced putting everything under one roof.

In my case, I should have started a thread simply entitled 'Random number generators' and added to it as new ideas 'popped up'. Of course when I started this journey I had no idea that fours years down the line I would still be at it. The journey actually started at PowerBASIC when I was looking at George Masaglia's Multiply-with-carry with a view to using that as opposed to PowerBASIC's RND. That ended up a collaborative effort and was great fun. Later, on my own, Complementary-Multiply-with-carry with lag code was developed and CMWC4096 ported to FreeBASIC.

What I have tended to do is 'write a paper' via a thread with new ideas. This is OK if the papers were put into a folder - a folder being a master thread or a website. As is, the 'papers' are scattered about this forum. Having said that, there are papers published with very long reference sections citing other papers scattered all over the place. I could, of course, have used a reference section citing earlier threads.

Talking off scatter, I should tell you that I am a scatterbrain, which may explain things. 9 Signs That You're A Scatterbrain almost describes me to a tee; I definitely don't do #1.

A friend of mine and I visited his girlfriend one day. She asked: "David, why have you two different coloured socks on?" I looked down, looked back at her and said: "What's wrong with that - I have another pair just like them at home". She could not stop laughing. I did not think it funny - I was just stating a fact. I often find myself entertaining people without any intention of doing so. Image
speedfixer
Posts: 606
Joined: Nov 28, 2012 1:27
Location: CA, USA moving to WA, USA
Contact:

Re: U turn

Post by speedfixer »

I have ZERO qualifications as an expert relating to math, programming, random number theory - any of that stuff. I AM an expert and retired professional on solving problems. Literally: any kind of problem.

Several of your threads have very few, if any, contributions by other forum members.
Almost always, the actual read count is well above any average thread.

Personally, as a problem solver, I always look for the best way to perform tasks that solve some problem.

I view these posts as the same thing: try to solve a set of 'problems.'

What is the <best><fastest><cheapest><smallest><pick a qualifier> way to get acceptably random numbers?
How do we know we achieved that goal?

--

When I try, I understand most of what you have written. I usually don't care to understand: I'm too old to follow along, but trust what you are doing, even when you err. You will revisit the issue to correct yourself - and therefore earned that trust.

Doesn't matter.

You have produced a lot of excellent code and improvements for FB. I'm sure everyone appreciates that.

I really, really enjoy your writing. It is clear, concise, and travels smoothly from point A to point B with occasional entertaining side trips. You stick to traveling to your goal. I have no intent to ever spend deep thought in these topics, but have taken so many educational side trips chasing hints in your citations that I am always entertained.


Even if you don't get a lot of people commenting on your posts, a LOT of people read them. Some people just aren't flexible enough to appreciate what you do. Valuable stuff. Keep it up!

Thank you.




david
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: U turn

Post by deltarho[1859] »

speedfixer wrote:I AM an expert and retired professional on solving problems. Literally: any kind of problem.
Oh boy, you must have some stories to tell. No doubt you will have some which you will not be allowed to tell. Image
Several of your threads have very few, if any, contributions by other forum members.
That may be because I have explained something so well there is no need for comment. I take full responsibility for that. On the other hand, I may have explained something so badly some may be reluctant to admit that they do not understand. I take full responsibility for that as well. Without comment, I cannot tell which is occurring.

[Richard Feynman] truly believed that if you couldn't explain something simply, you didn't understand it. Leonard Susskind.

Sometimes I don't understand something and since I am up to my armpits in the subject what chance do others have?

If I don't understand something which someone has written, I have no problem in saying so and will ask for another explanation. However, not everyone can do that. I sometimes think few people can, and I cannot take responsibility for that.
Almost always, the actual read count is well above any average thread.
I know that and why I keep chugging away.

What gave me the most pleasure in your post, David, was: "I really, really enjoy your writing." Whilst I strive to explain something simply I am aware that I sometimes fail but, I don't know why, more importantly is that I have managed to provide a good read. I remember doing a 'heavy duty' post at PowerBASIC on encryption, and one guy came back with something like: “I didn't understand a word of that, David, but it was an enjoyable read.” That made me smile.
jj2007 wrote:Keep up the good work, David!
speedfixer wrote:Keep it up!
Thanks, guys, I will.

Your cheques are in the post by the way. Image
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: U turn on Hamming method

Post by deltarho[1859] »

@dodicat

Since d is the Ulongint generator return, I am calling a, b, c, e the state vector. I think that 256 bits of state vector is a bit overkill for a 2^64 period generator.

Anyway, forcing a poor entropy state vector, that is with no warm up, it looks like we need three samples at least to achieve an optimal state vector. Forcing an optimal initial state vector would remove the need for a warm up, but we would be faced with residing in a tail, if we drifted into one, for three iterations. Not a big issue, and I doubt that PractRand would complain; and you haven't had a complaint.

With have a similar situation to SFC32 where I would prefer to come out of the tail immediately. Since we have three which do not stay in the tail, PCG32, MsWs with a good 's', and RomuTrio then I would give your method a miss.

It is worth noting that since the state vector is random and falling into the tail is an unlikely event, then staying in the tail longer than one iteration is of itself telling us that something is wrong.

I still feel that dodicat's method is quite an achievement for a rank amateur. Maybe one day I shall attempt a PRNG myself but, after all these years, I am still not ready to have a go.

PS I changed the title of this thread.
speedfixer
Posts: 606
Joined: Nov 28, 2012 1:27
Location: CA, USA moving to WA, USA
Contact:

Re: U turn on Hamming method

Post by speedfixer »

No doubt you will have some which you will not be allowed to tell.
Like the several 2AM calls getting me out of bed from generals at the Pentagon, basically asking for insurance/assurance about something that is to happen during the next hour?
Or when I was on the phone with JPL when 911 was happening?
Or sitting in front of and explaining to the FAA what was going on when planes were in the air for too long between LAX and SFO?

Yeah, I have a few of those stories.
I AM allowed to talk about the ghosts I have chased! (Hint: they weren't ghosts.)
Or why I had to explain the reason someone died in the wrong place.
Or how chasing the ghosts answered a 10 year long tech mystery affecting a very large part of southern California. (I implemeted an engineering fix then, but big business doesn't care: fault still exists.)
That may be because I have explained something so well there is no need for comment.
I have repeatedly found that trying to explain something to someone else helps MY understanding of a subject. Maybe hearing the nonsense in my head before I speak is the reason I reconsider what I know!

david
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: U turn on Hamming method

Post by deltarho[1859] »

speedfixer wrote:I have repeatedly found that trying to explain something to someone else helps MY understanding of a subject.
Very true. I have taught myself quite a bit when writing Help files. Image

Open a text editor and write a Help topic. If you are not happy with it, then tweak it, again and again. Eventually, when you are happy with it, you may find that you have a much better understanding yourself. With a better understanding, ideas start flooding in. If I was a psychologist, I may have a reason why that works. Not unlike a thought experiment, which Einstein used quite a lot. I compose off-line, as I am doing now, and do it all the time.
dafhi
Posts: 1641
Joined: Jun 04, 2005 9:51

Re: U turn on Hamming method

Post by dafhi »

rho, loved your socks story!

i love pixels, statistics, and the idea of emergent systems. i discovered a sort of DNA for fuzzy object representation which i'm REALLY excited about completing. paramount is a 2 part rng which atm is the bane of my existence xD
Post Reply