I used my PCG32II and SFC32 code as templates, allowing me to write the following in short order.
At Mark Overton's Romu website in his blog, Apr 8, 2020, he mentions that seeding via SplitMix64 or SplitMix32 is now recommended. I have gone with my Hamming weight/Knuth shuffle' approach to ensure good quality seeds without the need for warming up.
There is no provision for fixed seeding - the state vector, three Ulongints, are all random. However, as with my latest generators, a GetSnapshot procedure and SetSnapshot procedure are provided. GetSnapshot is invoked during the Constructor.
The Initialize procedure is also invoked in the Constructor, and there is no need for us to use it directly.
In regard to initialize, I played around with the delay in reading the timestamp twice and settled on 'mov ecx, 1000'. I am reading the lower 32 bits twice because the upper 32 bits don't change much. With
Code: Select all
For i As Ulong = 1 to 3
y = HammingSeed64 : ShuffleUlongint(y)
Print y;" ";Bin( y, 64 )
Next
Code: Select all
8011899340525116528 0110111100101111111111000000000001011000001011111001110001110000
14976171395718786844 1100111111010110000011001000100110001001001011110010111100011100
11631811118190534959 1010000101101100011111101001010110000010001011111010000100101111
I am not using Overton's multiplier of 15241094284759029579 but 6364136223846793005 which is what Donald Knuth uses with his 64-bit LCG and Melissa O'Neill's PCG32. I got to 2TB with PractRand without any anomalies using that. It may not matter.
In 32-bit mode RomuTrio is nothing special. However, in 64-bit mode it blossoms, especially with gcc. Here are some 64-bit generator performance figures using gcc 9.3SJLJ/-O3 and gcc 5.2/-O3. Nine tests were done and the median noted. The For/Next loop overhead was taken into account to give a 'raw' throughput.
Code: Select all
9.3 5.2
RomuTrio 323MHz 220MHz
xoshiro256** 279MHz 201MHz
xoroshiro128** 258MHz 205MHz
dodicat 236MHz 198MHz
Remember that RomuTrio outputs 64-bit so the [0,1) double has a granularity of 53-bit making it ideal, along with its speed, for Monte Carlo work. Talking of Monte Carlo work top quality distribution uniformity is necessary and John Walker's Ent.exe is good for checking that, so I dumped 100MB into a file and got this:
Code: Select all
Entropy = 7.999998 bits per byte.
Optimum compression would reduce the size
of this 104857600 byte file by 0 percent.
Chi square distribution for 104857600 samples is 255.88, and randomly
would exceed this value 47.28 percent of the times.
Arithmetic mean value of data bytes is 127.5201 (127.5 = random).
Monte Carlo value for Pi is 3.140844617 (error 0.02 percent).
Serial correlation coefficient is 0.000140 (totally uncorrelated = 0.0).
Following the source code is a Usage example.
RomuTrio.bas ( Latest version at the end of this page )
Code: Select all
#define rotl(x,k) ( (x Shl k) Or (x Shr(64-k)) )
#macro engine
Dim As Ulongint xp = StateVector(0), yp = StateVector(1), zp = StateVector(2)
StateVector(0) = 6364136223846793005ull * zp
StateVector(1) = yp - xp : StateVector(1) = rotl(StateVector(1),12)
StateVector(2) = zp - yp: StateVector(2) = rotl(StateVector(2),44)
#endmacro
Type RomuTrio
Public:
Declare Constructor
Declare Sub Initialize()
Declare Function rand() As Ulongint
Declare Function rnd01() As Double
Declare Function range overload( Byval One As Long, Byval Two As Long ) As Long
Declare function range overload ( byval One as double, Byval Two as double ) as double
Declare Sub GetSnapshot()
Declare Sub SetSnapShot()
Private:
As Ulongint StateVector(0 to 2)
As Ulongint snapStateVector(0 to 2)
End Type
Function HammingSeed64() As Ulongint
Dim As Ulong seed0, seed1, numBits = 0
Dim As Ulongint Seed, CopySeed
While numBits <> 32
Asm
rdtsc
mov Dword Ptr [seed0], eax
mov ecx, 1000 ' Let time stamp spin a little
0:
dec ecx
jnz 0b
rdtsc
mov Dword Ptr [seed1], eax
End Asm
Cast(Ulong Ptr,@Seed)[0]=seed0
Cast(Ulong Ptr,@Seed)[1]=seed1
CopySeed = Seed : numBits = 0
While CopySeed <> 0
CopySeed = CopySeed And CopySeed - 1
numBits = numBits + 1
Wend
Wend
Return Seed
End Function
Sub ShuffleUlongint( ByRef x As Ulongint )
#define range(f,l) Int(Rnd*(((l)+1)-(f))+(f)) ' By dodicat
Union localUDT
As Ulongint ul
As Byte b(7)
End Union
Dim As localUDT l
l.ul = x
For i As Ulong = 0 to 6
Swap l.b(i), l.b(range(i, 7))
Next
x = l.ul ' <- without this x will not change
End Sub
Private Function RomuTrio.rand() As Ulongint
Engine
Return xp
End Function
Private Function RomuTrio.rnd01() As double
Engine
Return xp/2^64
End Function
Private Function RomuTrio.range( Byval One As Long, Byval Two As Long ) As Long
Engine
Return Clng( xp Mod ( Two-One+1 ) ) + One ' By dodicat
End Function
Private Function RomuTrio.range( Byval One As Double, Byval Two As Double ) As Double
Engine
Return xp/2^64*( Two - One ) + One
End Function
Sub RomuTrio.GetSnapshot( )
For i As Ulong = 0 to 2
snapStateVector(i) = StateVector(i)
Next
End Sub
Private Sub RomuTrio.SetSnapshot( )
For i As Ulong = 0 To 2
StateVector(i)= snapStateVector(i)
Next
End Sub
Sub RomuTrio.Initialize( )
StateVector(0) = HammingSeed64 : ShuffleUlongint( StateVector(0) )
StateVector(1) = HammingSeed64 : ShuffleUlongint( StateVector(1) )
StateVector(2) = HammingSeed64 : ShuffleUlongint( StateVector(2) )
End Sub
Constructor RomuTrio
Initialize
GetSnapshot
End Constructor
Code: Select all
#Include Once "RomuTrio.bas"
Dim As RomuTrio RT
Dim As Ulong i
Dim As Double tot, k
Print "6 Random Ulongs" : Print
For i = 1 to 6
Print RT.Rand
Next
Print
Print "6 Random floats ( 53-bit granularity )" : Print
For i = 1 To 6
Print RT.Rnd01
Next
Print
Print "Continue random floats" : Print
For i = 1 to 4
Print RT.Rnd01
Next
Print
RT.GetSnapshot
Print "Random second stage (GetSnapshot before)" : Print
For i = 1 to 6
Print RT.Rnd01
Next
Print
RT.SetSnapshot
Print "Repeat second stage (SetSnapshot before)" : Print
For i = 1 to 6
Print RT.Rnd01
Next
Print
Print "Accuracy over 10^8 bounded integers ( 0, 255 ) Theoretical result = 127.5" : Print
For i = 1 to 10^8
k += RT.range( 0, 255 )
Next
Print k/10^8
Print
Print "Accuracy over 10^8 continuous floats ( 0, 255 ) Theoretical result = 127.5" : Print
For i = 1 to 10^8
tot += RT.range( 0., 255. )
Next
Print tot/10^8
k = 0 : tot = 0
Print
Print "Accuracy over 10^8 bounded integers ( -10, 10 ) Theoretical result = 0" : Print
For i = 1 to 10^8
k += RT.range( -10, 10 )
Next
Print k/10^8
Print
Print "Accuracy over 10^8 continuous floats ( -10, 10 ) Theoretical result = 0" : Print
For i = 1 to 10^8
tot += RT.range( -10., 10. )
Next
Print tot/10^8
Sleep