Oh no, not another PRNG

General FreeBASIC programming questions.
deltarho[1859]
Posts: 4313
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Oh no, not another PRNG

Post by deltarho[1859] »

Yep. I keep an eye on what Melissa O'Neill and Sebastian Vigna are up to.

At the end of last year Vigna attacked O'Neill's PCG generators in no uncertain terms. He effectively concluded with "don't touch them". Vigna and O'Neill have been at each other's throats for a few years now, they are both professors. O'Neill is fairly easy going and boxes Queensbury rules whereas Vigna gets a bit hot under the collar and is a bare fist fighter.

However, Vigna was attacking O'Neill's 128-bit generators and made no reference to PCG32. PCG32 gets to 1TB with PractRand whereas Vigna's generators, for the most part, fail catastrophically at 64GB. We have a bit of 'sour grapes' afoot here.

O'Neill has settled with her PCGs whereas Vigna keeps coming out with new versions. He uses a variety of names, and they may end with +, *, ++ and **. In the last few years I have lost count.

Still, like I said I keep an eye on what is happening. At his site he had a 'Shootout' going back to 2014. His latest 'Shootout' is last year. I looked at C code (64 bits) with xoroshiro128++ and xoroshiro128**. The first one gave me an average in [0,1) of 0.583 instead of 0.5. Blimey, I thought, he cannot even get that right and there was no point in getting PractRand to look at it. The second one is a different story - that gave me 0.5000008010909391. WHAT! I have never seen that accuracy before. So, at least we know we have a uniform distribution if nothing else.

Is this worth pursuing? Yes, it is because it outputs 64 bits. All my other generators output 32 bits. If we wanted a 53-bit granularity double with PCG32, for example, we have to request two 32-bit outputs which, obviously, slows down the throughput. With xoroshiro128** it defaults to a 64-bit output and has a period of 2^128.

PCG32II knocked out 10^9 53-bit doubles in 7.5968s (3 tests). xoroshiro128** knocked out 10^9 53-bit doubles in 3.2221s (3 tests). In fact, PCG32II takes 3.2933s (3 tests) to knockout 32-bit doubles so xoroshiro128** is marginally faster than PCG32II.

I reckon that a period of 2^64 is fine for most purposes. A few years ago Vigna reckoned that a period of 2^128 was better and there wasn't a real need for anything greater. OK, I will buy that. A few months later he started to publish generators with periods of 2^256. What do we do with a bloke like that?

Whilst I have been writing this PractRand has been giving xoroshiro128** a workout. It is in the early hours in the UK and my CPU is getting a bit warm, even with a liquid cooler.

<Yawn>

I have just hit 1TB with only one small anomaly at 16GB. The test took 2.6 hours. I am using a warm up of 32KB and started a new run with PractRand initially using 1KB. I stopped it at 1GB without any anomalies in the early generations.

This is top drawer stuff: PractRand to 1TB, marginally faster than PCG32II, outputs 53-bit doubles and has a period of 2^128. 53-bit doubles is great for Monte Carlo work.

Sorry, O'Neill but it looks like Vigna has come up trumps with xoroshiro128**.

I have a bit of work to do yet - I only have a 64-bit output and a float output. I aim to provide the same level of functionality as PCG32II.

I used a simple test program.

Code: Select all

'#Console On
'#include "PCG32II.bas"
'#include "xoroshiro128.bas"
 
Dim As Ulong i
Dim As Double t, x
 
t = Timer
For i = 1 to 10^9
  x = RND
Next
t = Timer - t
Print t
 
Sleep
With both includes commented we are using Mersenne Twister (MT). xoroshiro128** is about 3.7 times faster than MT.
deltarho[1859]
Posts: 4313
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Oh no, not another PRNG

Post by deltarho[1859] »

The above timings were for 32-bit.

For 64-bit PCG32II 32-bit doubles took 1.0276s and xoroshiro128** 53-bit doubles took 0.8507s - that is 20.7% faster. xoroshiro128** is about 7.7 times faster than MT. In 64-bit mode xoroshiro128** really belts along.
dafhi
Posts: 1645
Joined: Jun 04, 2005 9:51

Re: Oh no, not another PRNG

Post by dafhi »

thanks for the update

this works great for path tracing (mont carlo 3d graphics)

Code: Select all

a *= a
a xor= b
b += 1
not so much for cryptography. my math background is high school with a some exposure to 'the beyond' in college.

I suspect that MsWs warms up pretty fast. My own gen which breezes pract to 64T is a combination of 1 xorshift and 1 weyl sequence, but is horribly difficult to find good values for. One day I'd like to sit down and really get to know relative primes. Otherwise, like Vigna, 'back to the drawing' defines my existence. I suspect that inspiration will strike soon
deltarho[1859]
Posts: 4313
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Oh no, not another PRNG

Post by deltarho[1859] »

64T! I thought PractRand only went to 32TB.

Anyway, how long did a 64TB test take. That would take about 7 days on my machine.
dafhi
Posts: 1645
Joined: Jun 04, 2005 9:51

Re: Oh no, not another PRNG

Post by dafhi »

practrand can test well into the exabyte range, if I remember.

I think it took 4 days just for the 64T pass .. I pushed for 128T but I think it froze. Waited 4 extra days beyond the expected report time.

edit: my bad. I hit 32T. I pushed for 64
edit: yep, practrand testing limit. Part 2 Section D
deltarho[1859]
Posts: 4313
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Oh no, not another PRNG

Post by deltarho[1859] »

dafhi wrote:Waited 4 extra days beyond the expected report time.
You must have the patience of a saint - I don't.

32TB on my machine would take about 3.4 days. PractRand''s author, Chris Doty-Humphrey, reckons that if we get to 1TB in one piece then we probably have a winner. That is a heck of a lot of floats.

I've finished coding but want to 'kick it around' before publishing, probably tomorrow.

I thought I'd be using PCG32II for a few years - I got that wrong. Image
dafhi
Posts: 1645
Joined: Jun 04, 2005 9:51

Re: Oh no, not another PRNG

Post by dafhi »

I'd put my money on Melissa. If she were to release a new generator, it will put all others to shame.

MsWs is pretty awesome
deltarho[1859]
Posts: 4313
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Oh no, not another PRNG

Post by deltarho[1859] »

I made a mistake with my PractRand test. I've got so used to testing 32-bit outputs I used stdin32 instead of stdin64. I didn't expect much difference and there wasn't - I got to 1TB with no early generation anomalies and only two small anomalies. I have seen two small anomalies with Intel's RdRand.
dafhi wrote:I'd put my money on Melissa. If she were to release a new generator, it will put all others to shame.
The competition between Vigna and O'Neill is certainly not doing us any harm. I've just been to pcg-random.org and O'Neill has added a few more swipes at Vigna's generators in her blog section but not xoroshiro128**.

MsWs is so very different. We have been well spoilt this last few years - kept me off the streets - so to speak. Image
srvaldez
Posts: 3379
Joined: Sep 25, 2005 21:54

Re: Oh no, not another PRNG

Post by srvaldez »

Hi deltarho[1859]
I translated xoroshiro128starstar.c but am not sure how you would use it, I guess that there are special values that one would assign to s(0) and s(1)
but what are the purposes of the jump subs and why/when would you use them?

Code: Select all

s(0)=1
s(1)=2
jump()
for i as integer = 1 to 10
	print next_()
next
sleep
deltarho[1859]
Posts: 4313
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Oh no, not another PRNG

Post by deltarho[1859] »

s[0] and s[1] is the state vector. The jump is for parallel computing - I don't use it. I'll be pushing some code out shortly. I mentioned: "I've finished coding but want to 'kick it around' before publishing, probably tomorrow."
deltarho[1859]
Posts: 4313
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Oh no, not another PRNG

Post by deltarho[1859] »

Here is xoroshiro128.bas using Sebastian Vigna's xoroshiro128**. It should be noted that is was written with David Blackman. As a 'by the way' O'Neill and Blackman get on quite well. Image

Benefits:
Passes PractRand to 1TB - will test beyond that as time permits.
Outputs Doubles with 53-bit granularity.
It is marginally faster in 32-bit mode than PCG32II which outputs with 32-bit granularity and much faster in 64-bit mode.
Has a sequence repeat, period, after 2^128 randm number requests.
Is thread safe.

If we did nothing but request random numbers then with my machine, running at 3.9GHz, the sequence would repeat after 2.833 trillion x The estimated current age of the universe (13.5 billion years). Even with a 64-bit period we get 659 x The estimated current age of the universe. PowerBASIC's Rnd has a period of 32-bits and will repeat after 54 minutes.

The state vector is two Ulongints. MyRandomize("") gives a random state vector and MyRandomize( <Val(String)> ) is used for fixed seeding; MyRandomize("123456789") for example. Don't worry about the quality of fixed seeding as the generator is warmed up using 8K of 64-bit values, in any event, which should prove more than adequate. On my machine warming up takes 27 microseconds in 32-bit mode and 7 microseconds in 64-bt mode.

The function rand returns a Ulongint. There is no randse, which my other generators use, because the output is 64-bit giving 53-bit granularity and called randD.

There are two range functions: One is discrete outputting Longs and the other is continuous outputting Doubles. The discrete function uses dodicat's world famous MOD method.

We have GetSnapshot and SetSnapshot with regard the state vector. The Constructor employs MyRandomize( "" ) followed GetSnapshot. We can then repeat a sequence from the outset, after the warm up, without calling GetSnapshot. These are very useful procedures and PCG32II has had them for a while. I never got around to including them with my other generators.

At the bottom of xoroshiro128.bas is: ... [1]

Code: Select all

Dim Shared As xoroshiro128 x128
#undef Rnd
#define Rnd x128.randD
so we will use x128 in our code.

On a Copy&Paste from the forum any use of Rnd will get replaced with x128.randD when we include xoroshiro128.bas. For example, UEZ's Fireworks has had ' #include "E:\FreeBASIC\xoroshiro128\xoroshiro128.bas" ' put at the head - job done. The Fireworks experience has not been enhanced, I don't think, but I wanted to test the above idea and, of course, I did it because I could. Having said that because Fireworks uses dodicat's Regulate procedure then Fireworks may have a little more idle time. We can, of course, develop using Rnd and that will also get replaced with x128.randD when we include xoroshiro128.bas. A random range is often used in applications and Rnd's use in them will also get replaced but it is better to use one of the range functions in this package; especially the discrete version.

[1] Note that the above tip assumes a primary thread only - you will need to use another 'Dim Shared As xoroshiro128 ...' otherwise you will not have thread safety; the x128's will collide.

Following xoroshiro128.bas is a usage example and typical output.

xoroshiro128.bas

Code: Select all

' http://prng.di.unimi.it/xoroshiro128starstar.c
 
/'
static inline uint64_t rotl(const uint64_t x, int k) {
	return (x << k) | (x >> (64 - k));
}
 
static uint64_t s[2];
 
uint64_t next(void) {
	const uint64_t s0 = s[0];
	uint64_t s1 = s[1];
	const uint64_t result = rotl(s0 * 5, 7) * 9;
 
	s1 ^= s0;
	s[0] = rotl(s0, 24) ^ s1 ^ (s1 << 16); // a, b
	s[1] = rotl(s1, 37); // c
 
	return result;
}
'/
#define Make64Bit ( Cast( Ulongint, Rnd*(2^32) ) Shl 32) Or Cast( Ulongint, Rnd*(2^32) )
#define rotl(x,k) ( (x Shl k) Or (x Shr(64-k)) ) ' Note the extra parentheses
 
#macro Engine
  s0 = This.s(0)
  s1 = This.s(1)
  result = rotl(s0*5, 7)*9
  s1 Xor= s0
  This.s(0) =  rotl(s0,24) Xor s1 Xor (s1 shl 16)
  This.s(1) = rotl(s1,37)
#endmacro
 
' Generator is Sebastian Vigna's xoroshiro128**
 
Type xoroshiro128
  Public:
  Declare Constructor
  Declare Sub MyRandomize( seed as string )
  Declare Function rand() As Ulongint
  Declare Function randD() As Double
  Declare Function range overload( Byval One As Long, Byval Two As Long ) As Long
  Declare Function range overload ( byval One as double, Byval Two as double ) as double
  Declare Sub GetSnapshot()
  Declare Sub SetSnapshot()
  Private:
  As Ulongint s(0 To 1)
  As Ulongint snapshot(0 to 1)
End Type
 
Private Function Get64Bit() As Ulongint
  Dim As Ulongint dummy = Make64Bit
  Return rotl(dummy, 16)
End Function
 
Private Sub xoroshiro128.MyRandomize( seed as string )
Dim As Ulong i
  ' If seed = "" then use cryptographic random numbers else use Mersenne Twister
  If seed = "" Then Randomize , 5 Else Randomize Val(seed), 3
  This.s(0) = Get64Bit
  This.s(1) = Get64Bit
  ' Warm up generator - essential
  For i = 1 To 8192
    this.rand()
  Next i
End Sub
 
Private Function xoroshiro128.rand() As ulongint
  Dim As Ulongint s0, s1, result
  Engine
  Return result
End Function
 
Private Function xoroshiro128.randD() As Double
Dim As ulongint s0, s1, result
  Engine
  Return result/2^64
End Function
 
Private Function xoroshiro128.range( Byval One As Long, Byval Two As Long ) As Long
  Dim As ulongint s0, s1, result
  Engine
  ' Edit: Changed 'result' to Culng(result) - MOD is slow when working with 64-bit.
  Return Clng(Culng(result) Mod ( Two-One+1 )) + One ' By dodicat
End Function
 
Private Function xoroshiro128.range( Byval One As Double, Byval Two As Double ) As Double
  Dim As Ulongint s0, s1, result
  Engine
  Return result/2^64*( Two - One ) + One
End Function
 
Private Sub xoroshiro128.GetSnapshot()
  This.snapshot(0) = This.s(0)
  This.snapshot(1) = This.s(1)
End Sub
 
Private Sub xoroshiro128.SetSnapshot()
  This.s(0) = This.snapshot(0)
  This.s(1) = This.snapshot(1)
End Sub
 
Constructor xoroshiro128
  This.MyRandomize( "" )
  This.GetSnapshot
End Constructor
 
Dim Shared As xoroshiro128 x128
#undef Rnd
#define Rnd x128.randD
Usage example:

Code: Select all

'#Console On
#Include "xoroshiro128.bas"

Dim As ulong i
Dim As Double tot

Print "Random floats [0,1) (53-bit granularity)"
Print
For i = 1 To 6
  Print Rnd
Next
Print
x128.SetSnapshot
Print "SetSnapshot executed - no GetSnapshot executed by user"
Print
For i = 1 To 6
  Print Rnd
Next
Print
For i = 1 To 10^8
  tot += Rnd
Next
Print "Average [0,1)";
Print tot/10^8
Print
Print "Random Ulongints"
Print
For i = 1 To 6
  Print x128.rand
Next
Print

Print "Discrete range, 5 to 20"
Print
For i  = 1 To 6
  Print x128.range(5,20)
Next
Print

Print "Continuous range, 1. to 10."
Print
For i  = 1 To 6
  Print x128.range(1.,10.)
Next
Print

Print "Randomize - random seed - GetSnapshot executed"
Print
x128.MyRandomize( "" )
x128.GetSnapshot
For i = 1 To 6
  Print Rnd
Next
Print
x128.SetSnapshot
Print "SetSnapshot executed"
Print
For i = 1 To 6
  Print Rnd
Next
Print

Print "Randomize - fixed seed - GetSnapshot executed"
Print
x128.MyRandomize( "123456" )
x128.GetSnapshot
For i = 1 To 6
  Print Rnd
Next
Print
x128.SetSnapshot
Print "SetSnapshot executed"
Print
For i = 1 To 6
  Print Rnd
Next
Print

Print "        Done"

Sleep
Typical output:

Code: Select all

Random floats [0,1) (53-bit granularity)

 0.4049058391992926
 0.002008626534225989
 0.456443287069138
 0.04185532376693672
 0.2414107387053619
 0.7075873397474699

SetSnapshot executed - no GetSnapshot executed by user

 0.4049058391992926
 0.002008626534225989
 0.456443287069138
 0.04185532376693672
 0.2414107387053619
 0.7075873397474699

Average [0,1) 0.5000014459389598

Random Ulongints

1601524761088817538
12336677007064841681
14695406431682103560
12754510237835497810
4165814302991093027
17288837923636644936

Discrete range, 5 to 20

 9
 14
 19
 10
 11
 20

Continuous range, 1. to 10.

 8.902743816548306
 5.189807367551428
 4.003102644336549
 9.649630648651838
 7.006182164074057
 6.712164673793058

Randomize - random seed - GetSnapshot executed

 0.1005232132552173
 0.8435787285616101
 0.214314369648843
 0.2620323823819769
 0.7587725589312045
 0.389518875631241

SetSnapshot executed

 0.1005232132552173
 0.8435787285616101
 0.214314369648843
 0.2620323823819769
 0.7587725589312045
 0.389518875631241

Randomize - fixed seed - GetSnapshot executed

 0.2030120460316923
 0.7836776707789495
 0.06840766063610897
 0.1860217678276363
 0.8058282182269503
 0.1017877019893167

SetSnapshot executed

 0.2030120460316923
 0.7836776707789495
 0.06840766063610897
 0.1860217678276363
 0.8058282182269503
 0.1017877019893167

        Done
Last edited by deltarho[1859] on Jul 09, 2020 11:45, edited 1 time in total.
srvaldez
Posts: 3379
Joined: Sep 25, 2005 21:54

Re: Oh no, not another PRNG

Post by srvaldez »

thank you
deltarho[1859]
Posts: 4313
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Oh no, not another PRNG

Post by deltarho[1859] »

I have a bit more time to describe the 'jump'.

Imagine a sequence being represented as a circle. We enter the circle at some point, via a seed, work our way around until we get back to our entry point. Thereafter, we repeat numbers.

Consider 2^128 = 2^125 x 2^3 and break our circle into 8 (2^3) segments each having a period of 2^125. If we enter a segment and stay within it, we now have 8 sequences with no possibility of an overlap. We can always enter the same segment by employing an offset value. This is done with the 'jump' code.

If we now cut the 8 arcs from our circle we could create 8 smaller circles. This is what we do with PCG32II and MsWs. With PCG32II we have 2^63 small circles each with a period of 2^64. Adding all the sequences we get 2^127 numbers. We can only have 2^63 smaller circles because PCG32II's algorithm cannot accommodate even number sequences. PCG32II and MsWs are easier to use as we only supply a sequence/circle number. With the larger circle approach we have to use the 'jump' code. The first jump function, effectively, does what PCG32II and MsWs does. The second jump function, the long jump, gives us 2^96 sequences with periods of 2^32. With Vigna's 256-bit generators we could have 2^128 sequences each with 2^128 periods. My mind is now starting to boggle.

If we have a generator with an exceptionally large period there is very little chance of an overlap but those generators tend to have issues like large state vectors.
Last edited by deltarho[1859] on Jul 09, 2020 10:44, edited 4 times in total.
deltarho[1859]
Posts: 4313
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Oh no, not another PRNG

Post by deltarho[1859] »

With the usage code some of you may wonder why I use SetSnapshot with the fixed seed rather than initializing again; SetSnapshot is very much faster.
srvaldez
Posts: 3379
Joined: Sep 25, 2005 21:54

Re: Oh no, not another PRNG

Post by srvaldez »

@deltarho[1859]
regarding the jump subs, I guess that you would only call it once or occasionally and thereafter just call next?
btw, well done, there's a small penalty for overloading but your use of macros should minimize it and it's well worth it in ease of use.
Post Reply