need particular prng

General FreeBASIC programming questions.
dodicat
Posts: 7976
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: need particular prng

Post by dodicat »

For fun, I have sewn up the 32 bit functions from fbmath.bi via OOP.

Code: Select all



#ifndef __FBMATH_BI__ 
#define __FBMATH_BI__

# if __FB_LANG__ = "qb"
# error not supported in qb dialect
# endif

Namespace FbmathBItest

Type genrand Extends Object
      Declare abstract Function Rnd() As Double
      Declare abstract Function rnd32() As Ulong
      Declare abstract Function range(f As Long,l As Long) As Long
End Type


'' "FAST" PRNG from 'Numerical recipes in C' chapter 7.1
''
Type RndFAST32 Extends genrand
      iseed As Ulong = 327680
      Declare virtual Function Rnd() As Double override
      Declare virtual Function rnd32() As Ulong override
      Declare virtual Function range(f As Long,l As Long) As Long override
End Type

Private Function RndFAST32.rnd32() As Ulong
      this.iseed = this.iseed * 1664525 + 1013904223
      Return this.iseed
End Function

Private Function RndFAST32.rnd() As Double
      Return this.rnd32()/Cdbl(4294967296ull)
End Function

Private Function RndFAST32.range(f As Long,l As Long) As Long
      Return (this.rnd32() Mod ((l-f)+1)) + f
End Function

''
'' Middle Square Weyl Sequence PRNG / Bernard Widynski / 20 May 2020
'' https://arxiv.org/abs/1704.00358v5
'' 
''
Type RndMSWS32 Extends genrand
      s As Ulongint = &hb5ad4eceda1ce2a9ull
      w As Ulongint
      x As Ulongint
      Declare virtual Function Rnd() As Double override
      Declare virtual Function rnd32() As Ulong override
      Declare virtual Function range(f As Long,l As Long) As Long override
End Type

Private Function RndMSWS32.rnd32() As Ulong
      With This
            .x *= .x
            .w += .s
            .x += .w
            .x = ( .x Shl 32 ) Or ( .x Shr 32 )
            Return .x
      End With
End Function

Private Function RndMSWS32.rnd() As Double
      Return this.rnd32()/Cdbl(4294967296ull)
End Function

Private Function RndMSWS32.range(f As Long,l As Long) As Long
      Return (this.rnd32() Mod ((l-f)+1)) + f
End Function


''
'' Squares: A Fast Counter Based PRNG / Bernard Widynski / 5 May 2020
'' https://arxiv.org/abs/2004.06278
''
Type RndSquares32 Extends genrand
      key As Ulongint = &hb5ad4eceda1ce2a9ull
      ctr As Ulongint
      Declare virtual Function Rnd() As Double override
      Declare virtual Function rnd32() As Ulong override
      Declare virtual Function range(f As Long,l As Long) As Long override
End Type

Private Function RndSquares32.rnd32() As Ulong
      Dim As Ulongint x = this.key * this.ctr
      Dim As Ulongint y = x
      Dim As Ulongint z = y + key
      x = x * x + y
      x = ( x Shr 32 ) Or ( x Shl 32 )
      x = x * x + z
      x = ( x Shr 32 ) Or ( x Shl 32 )
      this.ctr += 1
      Return (x * x + y) Shr 32
End Function

Private Function RndSquares32.rnd() As Double
      Return this.rnd32()/Cdbl(4294967296ull)
End Function

Private Function RndSquares32.range(f As Long,l As Long) As Long
      Return (this.rnd32() Mod ((l-f)+1)) + f
End Function


''
'' *Really* minimal PCG32 code / (c) 2014 M.E. O'Neill / pcg-random.org
'' Licensed under Apache License 2.0 (NO WARRANTY, etc. see website)
''
Type RndPCG32 Extends genrand
      state As Ulongint=1000000
      inc_ As Ulongint
      Declare virtual Function Rnd() As Double override
      Declare virtual Function rnd32() As Ulong override
      Declare virtual Function range(f As Long,l As Long) As Long override
End Type

Private Function RndPCG32.rnd32() As Ulong
      With This
            Var oldstate = this.state
            '' Advance internal state
            .state = oldstate * 6364136223846793005ULL + ( .inc_ Or 1 )
            '' Calculate output function (XSH RR), uses old state for max ILP
            Dim As Ulong xorshifted = ((oldstate Shr 18u) xor oldstate) Shr 27u
            Dim As Ulong rot = oldstate Shr 59u
            Return (xorshifted Shr rot) Or (xorshifted Shl ((-rot) And 31))
      End With
End Function

Private Function RndPCG32.rnd() As Double
      Return this.rnd32()/Cdbl(4294967296ull)
End Function

Private Function RndPCG32.range(f As Long,l As Long) As Long
      Return (this.rnd32() Mod ((l-f)+1)) + f
End Function


Function randomF(Byref method As genrand) As Double
      If method Is RndFAST32 Then
            Static As rndfast32 k
            Return k.rnd()
      End If
      If method Is RndMSWS32 Then
            Static As RndMSWS32  k
            Return k.rnd()
      End If
      If method Is RndSquares32 Then
            Static As RndSquares32  k
            Return k.rnd()
      End If 
      If method Is RndPCG32 Then
            Static As RndPCG32  k
            Return k.rnd()
      End If 
End Function

Function randomI(Byref method As genrand) As Ulong
      If method Is RndFAST32 Then
            Static As rndfast32 k
            Return k.rnd32()
      End If
      If method Is RndMSWS32 Then
            Static As RndMSWS32  k
            Return k.rnd32()
      End If
      If method Is RndSquares32 Then
            Static As RndSquares32  k
            Return k.rnd32()
      End If 
      If method Is RndPCG32 Then
            Static As RndPCG32  k
            Return k.rnd32()
      End If 
End Function

Function rangeI(Byref method As genrand,f As Long,l As Long) As Long
      If method Is RndFAST32 Then
            Static As rndfast32 k
            Return k.range(f,l)
      End If
      If method Is RndMSWS32 Then
            Static As RndMSWS32  k
            Return k.range(f,l)
      End If
      If method Is RndSquares32 Then
            Static As RndSquares32  k
            Return k.range(f,l)
      End If 
      If method Is RndPCG32 Then
            Static As RndPCG32  k
            Return k.range(f,l)
      End If 
End Function



End Namespace
#endif '' __FBMATH_BI__
'===================================================================
Using FbmathBItest
Print "Warmup"
For k As Long=1 To 4
      Print randomf(RndFAST32),randomi(RndFAST32),rangei(RndFAST32,-5,5),"RndFAST32"
      Print randomf(RndMSWS32),randomi(RndMSWS32),rangei(RndMSWS32,-5,5),"RndMSWS32"
      Print randomf(RndSquares32),randomi(RndSquares32),rangei(RndSquares32,-5,5),"RndSquares32"
      Print randomf(RndPCG32),randomi(RndPCG32),rangei(RndPCG32,-5,5),"RndPCG32"
      Print
Next k
Print "Buckets"
Print "RndFAST32","RndMSWS32","RndSquares32","RndPCG32"
Dim As Long a(1 To 5),b(1 To 5),c(1 To 5),d(1 To 5)
Dim As Long lim=1000000
For n As Long=1 To lim
      a(rangei(RndFAST32,1,5))+=1
      b(rangei(RndMSWS32,1,5))+=1
      c(rangei(RndSquares32,1,5))+=1
      d(rangei(RndPCG32,1,5))+=1
Next

For n As Long=1 To 5
      Print a(n),b(n),c(n),d(n)
Next
sleep


 
conclusion:
They all seem to be working.
I see that RndPCG32 needs a warmup.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: need particular prng

Post by deltarho[1859] »

jj2007 wrote:Why do we still need the others? PCG32II, for example?
Of the four, PCG32, Middle Square Weyl Sequence, Squares, and xoroshiro128**, the last three are fairly recent whereas the first has been with us a few years now.

Bernard Widynski's Squares (14 Apr 2020), three rounds, seemed OK but someone spotted a statistical weakness resulting in a revision (23 Nov 2020), four rounds. The first paper has been updated (8 July 2021) to include the four rounds version. The statistical weakness of the three rounds version is no longer mentioned. As far as I know, Middle Square Weyl Sequence and xoroshiro128** have not received any revisions. Melissa O'Neills PCG32 paper has received some criticism for not being sufficiently mathematical, unlike George Marsaglia's papers, but, as far as I know, the generator has not been questioned.

So, PCG32 is still a good bet - having passed the test of time - so to speak.

If a generator performs slowly, then that may impact on an application's performance, so a faster generator is needed. However, replacing the faster generator with an even faster one may be of little benefit. We may not notice any improvement in our applications by replacing PCG32 with SFC32. If there is no improvement, then there is no justification for replacing, especially since PCG32 has a proven record. There may be occasions where an ultra-fast generator is beneficial compared with a fast generator, but I reckon those occasions will be few and far between.

I should point out that my generator speed tests remove the For/Next overhead to give a 'raw' speed. The MHz results will be greater than not taking the For/Next overhead into account. I am also now of the opinion that using gcc with optimization may not be the best method for comparing generators. If algorithms are optimized differently, then we end up comparing apples with oranges. To compare algorithms, it may be better to forget about throughput and use gas. Dodicat is getting good results with his generator. Using gas SFC32 is faster, with 53-bit granularity, than dodicat in 32-bit mode but slower in 64-bit mode. SFC32 does not benefit from 64-bit mode. Of course, many will want to use gcc, but it will be anyone's guess what gcc will do with our code.

Another issue is quality of randomness. PractRand is good at separating the wheat from the chaff. LCGs fail quickly. However, PractRand has its limitations: It cannot describe the quality of the wheat. All four of the generators above pass PractRand for masses of data, but we are being told no evidence can be found to suggest non-randomness, we are not told which has the better randomness. One way of separating them is with distribution uniformity, but hardly any people give that a second thought and there are no decent tools for checking that.

In conclusion, I doubt that anyone would 'get the sack' for using PCG32, but they may get a few raised eyebrows if they used Mersenne Twister in a new application.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: need particular prng

Post by deltarho[1859] »

OK, so I looked at using gas for comparison purposes and removed the For/Next overhead. At one time, I had issues using 'Asm nop' in 64-bit mode. This is no longer true, and I'm not sure when it became OK. I am guessing the problem was with fbc only. So, we now have a bare-bones method of comparison with a 'raw' throughput and without interference from gcc optimization.

dodicat's venture into PRNG authoring passes PractRand to at least 4TB so warrants inclusion in this test.

32-bit mode (32-bit granularity)

The fastest by a long way was CryptoRndII. This is not surprising given that the Ulongs are got from RAM and the only arithmetic is converting to floats, [0,1). The downside is that it is Windows only. If we use fxm's threadpooling the speed is about the same, but we are no longer dependent upon Windows. However, users will now have to acquaint themselves with fxm's threadpooling, a bit of a task for some I should imagine.

The next fastest was SFC32 and no one gets an honourable mention. It is blistering for a PRNG.

32-bit mode (53-bit granularity)

Most of the generators use two 32-bit values, so their throughput is reduced. CryptoRndII loads the rax register and is faster than its 32-bit granularity counterpart, so gets the gold medal here as well. dodicat's method would have come in at second, but is still beaten by SFC32 using two 32-bit values.

64-bit mode (32-bit granularity). Needless to say I am now using gas64.

CryptoRndII reigns supreme here too.

SFC32 benefits from 64-bit but not as much as PCG32II/MsWs (Middle Square Weyl Sequence) where their 64-bit boost is astonishing, no idea why. We can toss a coin which to use.

64-bit mode (53-bit granularity)

dodicat wins here. xoroshiro128**/xoshiro256**, both 64-bit generators, are very close to dodicat. They did poorly in the 32-bit tests.

I am guessing, but I reckon most users will be wanting 32-bit granularity over 53-bit. For general purpose, Windows/Linux, in 32-bit mode SFC32 is a clear winner. In 64-bit mode we have a choice between PCG32II or MsWs with SFC32 getting an honourable mention. So PCG32II still has a role to play. I would favour PCG32II simply because MsWs was published on May 20, 2020 and PCG32II has been with us a few years now. In addition, although MsWs can use different sequences they are not easily determined, so I only use one. PCG32II can use 2^63 different sequences and are easily determined.

When switching to gcc with optimization, things may get a little complicated. For a graphics application requiring a hundred or so random numbers during initialization, speed will not be an issue, nor will top quality randomness be an issue either. However, if a fair number are required during the life of an application both aspects may be an issue, and we may need to look further than the conclusions above. Just looking at an algorithm and second guessing how gcc will handle it is probably not possible.

At the end of PCG32II I have:

Code: Select all

Dim Shared as PCG32 pcg
#undef Rnd
#define Rnd pcg.randse
So, when I use Rnd in my code, pcg.randse will be used. We can do something similar with my other generators. With SFC32 I have changed 'Declare Function Rnd()' to 'Declare Function RndS()'; using Rnd was problematic.

At the head of our main application we can have a bunch of includes; with full paths if needed.

'#include "PCG32II.bas"
'#include "MsWs.bas"
'#include "CryptoRndII.bas"
'#include "xoroshiro128.bas"
'#include "xoshiro256.bas"
'#include "drSquares4.bas"
'#include "SFC32.bas"
'#include "dodicat.bas"

Working our way through that lot should allow us to fine tune our selection.

You will not catch me doing that very regularly. In fact, I have not done it yet because my back has not been up against the wall, requiring the fastest generator that I can lay my hands on. It is a case of 'if needs be' and only you will know that.

For fun I looked at a plotting application which executes 14 rounds of:

Code: Select all

For i = 1 to 1280*1024
  PSet (Int(Rnd*1280), Int(Rnd*1024))
Next
using gcc -O3 in 32-bit mode. That is over 36 million Rnds.

SFC32 was the winner and CryptoRndII did not do well.

I should add that after 14 rounds it was rare to see any original pixels standing; occasionally with just one or two. Any more than that and we would have an issue with the generator.

In 64-bit mode, xoshiro256** and drSquares4 where neck and neck; and xoshiro256** is a 64-bit generator. I did not see that coming. Therein lies the problem with gcc. With real-world code, I doubt if anyone could guess which would be the fastest. The spread between the fastest and slowest was about 20%. That is significant.

I am afraid that it is a bit of a 'black art' with gcc. 9.3SJLJ will see off 5.2 most of the time but, every now and again, 5.2 will come up trumps.

PS This a good place to mention dodicat in dispatches. Taking on the likes of Melissa O'Neill and Sebastiano Vigna, both Professors of Computer Science, by a hobbyist programmer using a hobbyist BASIC, is no mean feat. Yet here we have a 64-bit generator passing PractRand to at least 4TB with a competitive throughput. However, it needs a name - no PRNG author uses their name in their PRNGs. dodicat should come back with a name for his PRNG. Image
dafhi
Posts: 1640
Joined: Jun 04, 2005 9:51

Re: need particular prng

Post by dafhi »

@del - your enthusiasm is infectious. i'm glad you have a winner. is yours crypto-safe? the crypto-safe algo i may check out more is Melgo. also, i'll have to look at fxm's thread thing. i don't know what thread safe refers to, but with my new algo i get a glimpse into streams

Code: Select all

hash.ini seed '' this is the sub that i wanted low warmup overhead
i = hash.valu(seed2) '' different value each time
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: need particular prng

Post by deltarho[1859] »

dafhi wrote:is yours crypto-safe?
No. 'Crypto-safe' means impossible to predict. PRNGs are ultimately predictable; some are easy, some are not.

If the author of Melgo is the same Ribeiro Alvo that I know, then he is a member of the PowerBASIC forum, from 2010. I try hard not to make exaggerated claims. Ribeiro, on the other hand, does not seem to try as hard as I do. That is as good as my diplomacy gets. Image Melgo is not a CPRNG.
i don't know what thread safe refers to
If we use the same location for the state vector then more than one generator will result in collisions with reduction in throughput for each generator to say nothing of the sequences being fragmented. Thread safety ensures no collisions, so throughput is not encumbered. All the generators above are thread safe except dodicat's generator, but that can be made thread safe.

Sebastiano Vigna found that seeds with a high percentage of ones or zeros had a poor quality of randomness in the early stages of generation. By 'burning' the early requests, the ones and zeros tend toward an even balance and the quality of randomness improves. The larger the state vector, the longer the warm-up is needed. For 32 bits, for example, a population count of 16 ones, a Hamming weight of 16, is the ideal state for a seed. I used to use random seeds and burn them an arbitrary number of times. What I now do is poll the computer's time stamp until an ideal Hamming weight is returned. The time stamp is incrementing at a fair rip, so it should be difficult, if not impossible, to predict the value returned. This will take a variable time but, in practice, I have found that only a handful of micro-seconds is required. To make prediction even more difficult because the time stamp is sequential, the seed's bytes are subject to a Knuth shuffle. Imagine a normal distribution. If we initially land in a tail, we have to wait with a warm-up to see us move toward the centre of the bell shape. With the Hamming weight/Knuth shuffle method we land, almost immediately, in the centre of the bell. Shuffling has no effect on the Hamming weight. We don't have to experiment with generators to determine the best warm-up period for each. The Hamming method puts us in the best place for any PRNG. Using the time stamp is not my idea; Melissa O'Neill mentions it in one of her blogs. Forcing a balanced Hamming weight and shuffling is, as far as I know, down to me. Of course, it would not have occurred to me without Vigna's research. See HammingSeed() and ShuffleUlong() in SFC32.bas above. The Union in ShuffleUlong() is dodicat's idea.
dafhi
Posts: 1640
Joined: Jun 04, 2005 9:51

Re: need particular prng

Post by dafhi »

deltarho[1859] wrote:No. 'Crypto-safe' means impossible to predict. PRNGs are ultimately predictable; some are easy, some are not.
my bad. i knew better. I swear I saw one that looked like dodicat's years ago when i discovered melgo.

soon after i saw MsWs a = a * a
I began my tossed salad technique and eventually found the nearly flat generator

Code: Select all

a *= a
a xor= b
b += 1 '' 1 5 9 etc..  (3 7 11 miss half values)
that one would be awesome for thread-safeness

fast forward to today and different algorithm, I found that I can reduce the warm up with some seed-based multiplies and a gen cycle
dafhi
Posts: 1640
Joined: Jun 04, 2005 9:51

Re: need particular prng

Post by dafhi »

looking for the generator that i remember looking like dodicat's,
i found something else https://www.romu-random.org/
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: need particular prng

Post by jj2007 »

dafhi wrote:i found something else https://www.romu-random.org/
It looks a bit fishy, like publicity for pills that make you lose weight. Look at the images under "Top Quality"...
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: need particular prng

Post by deltarho[1859] »

dafhi wrote:i remember looking like dodicat's
RomuTrio looks like some of Sebestiano Vigna's generators, which feature ROTL. ROTL is expensive so we need to use it sparingly; one or two instances at most. It may not look like it but dodicat's method uses three which slows it down a bit. dodicat's saving grace is in not using multiplication, just a few additions and a XOR.

Here is dodicat's method with rotl defined:

Code: Select all

Function rndU ByRef As Ulongint
#define rotl(x,k) ( (x Shl k) Or (x Shr(64-k)) )
  e = a - rotl(b,7)
  a = b xor rotl(c,13)
  b = c + rotl(d,37)
  c = d + e
  d = e + a
  Return d
End Function 
RomuTrio also looks like Melissa O'Neill's PCG32 where she uses Donald Knuth's 64-bit LCG multiplier and permutes it.

RomuTrio is then a mix of Vigna and O'Neill.

What has me scratching my head is that 15241094284759029579 is not a prime number; it factorizes as 5080364761586343193 * 3.

Anyway, I will kick it around a bit.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: need particular prng

Post by deltarho[1859] »

The first test that I ever do is the average of 10^8 [0,1). I have just got 0.5000197498556336, so its distribution uniformity appears OK.

Speedwise PCG32II is about 2.6 times faster. SFC32 leaves it standing.

RomuTrio is no slouch, but it is not breaking news either. The Romu website talks about why it is fast but neglects to compare speed with other generators, as Bernard Widynski does in his papers.

I'll do more tests.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: need particular prng

Post by deltarho[1859] »

Just out of interest this is what we get with a rotl(x,k)

Code: Select all

| <-     x     -> |
| k |     x-k     |
|     x-k     | k |
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: need particular prng

Post by deltarho[1859] »

Instead of 15241094284759029579 it seems to me that we can use Donald Knuth's prime of 6364136223846793005.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: need particular prng

Post by deltarho[1859] »

I need to check what I am doing, but I am getting a PractRand catastrophic failure at 2MB. Image
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: need particular prng

Post by deltarho[1859] »

My mistake. This is a 64-bit generator - I was piping Ulong instead of Ulongint. I have just gone past 256GB.

Speedwise I was not comparing like with like being a 64-bit generator. In 64-bit mode it looks like it is about 17% faster than dodicat the winner in the '64-bit mode (53-bit granularity)' test above. I'll do more work on the speed tests.
dafhi
Posts: 1640
Joined: Jun 04, 2005 9:51

Re: need particular prng

Post by dafhi »

i like the fact that you can consistently do tests. my interests shift frequently based upon how much debugging i find myself doing xD

jj2007 i appreciate what you say. i wouldn't necessarily know at this point

still haven't found mysterious generator; it was VERY similar to dodicat's. dodicat may be super-psychic is all i can tell. it had at least 4 internal state variables with a lot of a=e, e=f and at least 2 rotates

i am liking RoMu. haven't tested it much but the idea that period isn't (greatly, or at all? i forget) affected by seed is super good
Post Reply