RomuTrio package

General FreeBASIC programming questions.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

RomuTrio package

Post by deltarho[1859] »

Following on from the work done in need particular prng where dafhi introduced us to Romu here is some working code.

I used my PCG32II and SFC32 code as templates, allowing me to write the following in short order.

At Mark Overton's Romu website in his blog, Apr 8, 2020, he mentions that seeding via SplitMix64 or SplitMix32 is now recommended. I have gone with my Hamming weight/Knuth shuffle' approach to ensure good quality seeds without the need for warming up.

There is no provision for fixed seeding - the state vector, three Ulongints, are all random. However, as with my latest generators, a GetSnapshot procedure and SetSnapshot procedure are provided. GetSnapshot is invoked during the Constructor.

The Initialize procedure is also invoked in the Constructor, and there is no need for us to use it directly.

In regard to initialize, I played around with the delay in reading the timestamp twice and settled on 'mov ecx, 1000'. I am reading the lower 32 bits twice because the upper 32 bits don't change much. With

Code: Select all

For i As Ulong = 1 to 3
  y = HammingSeed64 : ShuffleUlongint(y)
  Print y;" ";Bin( y, 64 )
Next 
I got this:

Code: Select all

8011899340525116528 0110111100101111111111000000000001011000001011111001110001110000
14976171395718786844 1100111111010110000011001000100110001001001011110010111100011100
11631811118190534959 1010000101101100011111101001010110000010001011111010000100101111
Each Bin has 32 '1's, the target Hamming weight. I figured that would do.

I am not using Overton's multiplier of 15241094284759029579 but 6364136223846793005 which is what Donald Knuth uses with his 64-bit LCG and Melissa O'Neill's PCG32. I got to 2TB with PractRand without any anomalies using that. It may not matter.

In 32-bit mode RomuTrio is nothing special. However, in 64-bit mode it blossoms, especially with gcc. Here are some 64-bit generator performance figures using gcc 9.3SJLJ/-O3 and gcc 5.2/-O3. Nine tests were done and the median noted. The For/Next loop overhead was taken into account to give a 'raw' throughput.

Code: Select all

                9.3    5.2
RomuTrio       323MHz 220MHz
xoshiro256**   279MHz 201MHz
xoroshiro128** 258MHz 205MHz
dodicat        236MHz 198MHz
The gcc 9.3 figures are remarkable compared with gcc 5.2. We should not expect this to be the case in general. It just so happens that gcc 9.3 is doing a remarkable job with the above tasks. With 5.2 there isn't much in it and dodicat's algorithm is giving the Vigna generators a run for their money. 9.3 is struggling to optimize dodicat's algorithm compared with the other three. There is probably a good reason for that, but it is beyond my pay grade to fathom that one out.

Remember that RomuTrio outputs 64-bit so the [0,1) double has a granularity of 53-bit making it ideal, along with its speed, for Monte Carlo work. Talking of Monte Carlo work top quality distribution uniformity is necessary and John Walker's Ent.exe is good for checking that, so I dumped 100MB into a file and got this:

Code: Select all

Entropy = 7.999998 bits per byte.
 
Optimum compression would reduce the size
of this 104857600 byte file by 0 percent.
 
Chi square distribution for 104857600 samples is 255.88, and randomly
would exceed this value 47.28 percent of the times.
 
Arithmetic mean value of data bytes is 127.5201 (127.5 = random).
Monte Carlo value for Pi is 3.140844617 (error 0.02 percent).
Serial correlation coefficient is 0.000140 (totally uncorrelated = 0.0).
The Chi square value is one of the best that I have ever seen. The arithmetic mean and Monte Carlo value could be better. The Serial correlation coefficient is fine. No grounds for concern there.

Following the source code is a Usage example.

RomuTrio.bas ( Latest version at the end of this page )

Code: Select all

#define rotl(x,k) ( (x Shl k) Or (x Shr(64-k)) )

#macro engine
  Dim As Ulongint xp = StateVector(0), yp = StateVector(1), zp = StateVector(2)
  StateVector(0) = 6364136223846793005ull * zp
  StateVector(1) = yp - xp : StateVector(1) = rotl(StateVector(1),12)
  StateVector(2) = zp - yp: StateVector(2) = rotl(StateVector(2),44)
#endmacro

Type RomuTrio
  Public:
  Declare Constructor
  Declare Sub Initialize()
  Declare Function rand() As Ulongint
  Declare Function rnd01() As Double
  Declare Function range overload( Byval One As Long, Byval Two As Long ) As Long
  Declare function range overload ( byval One as double, Byval Two as double ) as double
  Declare Sub GetSnapshot()
  Declare Sub SetSnapShot()
  Private:
  As Ulongint StateVector(0 to 2)
  As Ulongint snapStateVector(0 to 2)
End Type

Function HammingSeed64() As Ulongint
Dim As Ulong seed0, seed1, numBits = 0
Dim As Ulongint Seed, CopySeed
While numBits <> 32
  Asm
    rdtsc
    mov Dword Ptr [seed0], eax
    mov ecx, 1000 ' Let time stamp spin a little
    0:
    dec ecx
    jnz 0b
    rdtsc
    mov Dword Ptr [seed1], eax
  End Asm
  Cast(Ulong Ptr,@Seed)[0]=seed0
  Cast(Ulong Ptr,@Seed)[1]=seed1
  CopySeed = Seed : numBits = 0
  While CopySeed <> 0
    CopySeed = CopySeed And CopySeed - 1
    numBits = numBits + 1
  Wend
Wend
Return Seed
End Function

Sub ShuffleUlongint( ByRef x As Ulongint )
#define range(f,l) Int(Rnd*(((l)+1)-(f))+(f)) ' By dodicat
Union localUDT
  As Ulongint ul
  As Byte b(7)
End Union
Dim As localUDT l
l.ul = x
For i As Ulong = 0 to 6
  Swap l.b(i), l.b(range(i, 7))
Next
x = l.ul ' <- without this x will not change
End Sub

Private Function RomuTrio.rand() As Ulongint
  Engine
  Return xp
End Function

Private Function RomuTrio.rnd01() As double
  Engine
  Return xp/2^64
End Function

Private Function RomuTrio.range( Byval One As Long, Byval Two As Long ) As Long 
  Engine
  Return Clng( xp Mod ( Two-One+1 ) ) + One ' By dodicat
End Function

Private Function RomuTrio.range( Byval One As Double, Byval Two As Double ) As Double
  Engine
  Return xp/2^64*( Two - One ) + One
End Function

Sub RomuTrio.GetSnapshot( )
  For i As Ulong = 0 to 2
    snapStateVector(i) = StateVector(i)
  Next
End Sub
 
Private Sub RomuTrio.SetSnapshot( )
  For i As Ulong = 0 To 2
    StateVector(i)= snapStateVector(i)
  Next
End Sub

Sub RomuTrio.Initialize( )
  StateVector(0) = HammingSeed64 : ShuffleUlongint( StateVector(0) )
  StateVector(1) = HammingSeed64 : ShuffleUlongint( StateVector(1) )
  StateVector(2) = HammingSeed64 : ShuffleUlongint( StateVector(2) )
End Sub
 
Constructor RomuTrio
  Initialize
  GetSnapshot
End Constructor
Usage example

Code: Select all

#Include Once "RomuTrio.bas"
 
Dim As RomuTrio RT
 
Dim As Ulong i
Dim As Double tot, k

Print "6 Random Ulongs" : Print
For i = 1 to 6
  Print RT.Rand
Next
Print
Print "6 Random floats ( 53-bit granularity )" : Print
For i = 1 To 6
  Print RT.Rnd01
Next
Print
Print "Continue random floats" : Print
For i = 1 to 4
  Print RT.Rnd01
Next
Print
RT.GetSnapshot
Print "Random second stage (GetSnapshot before)" : Print
For i = 1 to 6
  Print RT.Rnd01
Next
Print
RT.SetSnapshot
Print "Repeat second stage (SetSnapshot before)" : Print
For i = 1 to 6
  Print RT.Rnd01
Next
Print
Print "Accuracy over 10^8 bounded integers ( 0, 255 ) Theoretical result = 127.5" : Print
For i = 1 to 10^8
  k += RT.range( 0, 255 )
Next
Print k/10^8
Print
Print "Accuracy over 10^8 continuous floats ( 0, 255 ) Theoretical result = 127.5" : Print
For i = 1 to 10^8
  tot += RT.range( 0., 255. )
Next
Print tot/10^8
k = 0 : tot = 0
Print
Print "Accuracy over 10^8 bounded integers ( -10, 10 ) Theoretical result = 0" : Print
For i = 1 to 10^8
  k += RT.range( -10, 10 )
Next
Print k/10^8
Print
Print "Accuracy over 10^8 continuous floats ( -10, 10 ) Theoretical result = 0" : Print
For i = 1 to 10^8
  tot += RT.range( -10., 10. )
Next
Print tot/10^8

Sleep
Last edited by deltarho[1859] on Aug 05, 2021 5:48, edited 2 times in total.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: RomuTrio package

Post by jj2007 »

Works like a charm, with Gas, Gcc and FB64!
deltarho[1859] wrote:Conditions to post here (in this thread) are: See list. Image
Very fanny ;-)
dodicat
Posts: 7976
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: RomuTrio package

Post by dodicat »

Forgot a few regionals.
Nowt - North England.
Love-Wimbledon
Feck All -Irish (Father Ted)
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: RomuTrio package

Post by deltarho[1859] »

nowt is in both lists.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: RomuTrio package

Post by deltarho[1859] »

A moderator has objected to my 'Conditions to post here (in this thread) are:'. There has been no objection to 'I forgot to mention ...'. Both are inflammatory, so I have removed them.

Added: I need to learn how to 'turn the other cheek' but being dragged up in Middlesbrough, England does not lend itself to such behaviour and teaching old dogs new tricks does not come easy. However, for the community, I can see the merit. It does, of course, rely on the moderators taking immediate and appropriate action against those who have little or no regard for the Forum Policy. A word in the offender's shell-like or a slap on the wrists is not good enough.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: RomuTrio package

Post by deltarho[1859] »

Reviewing seeding.

The reason why we pass rdtsc twice, with a delay, is because the uppermost 32-bit of the full 64-bit timestamp will be pretty much empty and only gets incremented about once every 1.2 seconds on my machine. However, with the delay we don't get another four bytes of information, but only two bytes with the other two bytes duplicated, so six bytes of information in total. Because of the shuffling, it doesn't really matter whether we have four bytes of information and four duplicated or six bytes of information with two duplicated. All we want is 64-bits with a Hamming weight of 32.

So, we use one pass of rdtsc and put the 32-bit into both the lower and upper 32-bit of a Ulongint.

When it comes to the next Ulongint we can copy the first one and shuffle it. With the next Ulongint we can copy the second one and shuffle that.

So, we get three distinct Ulongint with only one pass of rdtsc.

This is a typical initial state vector:

Code: Select all

853589853589DEDE
3535DEDE85898589
DE853589358589DE
That will do. Image

We could just use the first one three times, but having three sits better with me. On what grounds do I say that? Erm, absolutely none - it just sits better. Image

Ten million passes of Initialize were executed, and the longest time to populate the state vector was 235 micro-seconds. Even if it took four times as long, on my machine, which, I doubt, would happen, we are talking less than one millisecond to populate the state vector with each element having a Hamming weight of 32.

ShuffleUlongint remains the same, but HammingSeed64 and Initialize need replacing with:

Code: Select all

Function HammingSeed64() As Ulongint
Dim As Ulong tscSeed, numBits = 0
Dim As Ulongint Seed, CopySeed
  While numBits <> 32
    Asm
      rdtsc
      mov Dword Ptr [tscSeed], eax
    End Asm
    Seed = (Cast( Ulongint, tscSeed ) Shl 32) Or Cast( Ulongint, tscSeed)
    CopySeed = Seed : numBits = 0
    While CopySeed <> 0
      CopySeed = CopySeed And CopySeed - 1
      numBits += 1
    Wend
  Wend
  Return Seed
End Function
 
Sub RomuTrio.Initialize( )
  Randomize ' for ShuffleUlongint
  StateVector(0) = HammingSeed64 : ShuffleUlongint( StateVector(0) )
  StateVector(1) = StateVector(0) : ShuffleUlongint( StateVector(1) )
  StateVector(2) = StateVector(1) : ShuffleUlongint( StateVector(2) )
End Sub
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: RomuTrio package

Post by deltarho[1859] »

With 32-bit PRNGs if we wanted 53-bit granularity we have to request two random Ulongs. With 64-bit PRNGs we only have to request one random Ulongint.

What I am looking at here is the opposite. With a 64-bit output, it is converted into two Ulongs which are then returned as Doubles with 32-bit granularity. I had no idea what to expect.

A Sub is employed as GetDuo( ByRef As Double, ByRef As Double ). I have seen a few graphics programs using pairs of random [0,1).

With my 'raw throughput' benchmark test considering the For/Next overhead, the Duo is faster than PCG32II but not as fast as SFC32. However, with my 'real world' plotting program it is faster than SFC32.

I am not going to mention MHz because you would not believe me. With 64-bit mode we are now well into GHz territory. With the latest generators it is like requesting constant variables rather than random numbers.

Here it is in action with a Monte Carlo estimating PI using gcc 9.3SJLJ/-O3, 64-bit

Code: Select all

#include Once "RomuTrio.bas"
 
Dim As Ulong n = 10^9, lCount
Dim As Double t, x, y
Dim as RomuTrio rt
 
t = Timer
For i As Ulong = 1 to n
  rt.GetDuo( x, y )
  If Sqr( x*x + y*y) <= 1 Then lCount += 1
Next
t = Timer - t
Print 4*lCount/n;" ";t
 
Sleep
which gave, on one run, 3.1415665 3.605163645101129.

That is five and a half significant figures in 3.6 seconds on a loop with one billion passes.

Add to RomuTrio.bas 'Declare Sub GetDuo( ByRef as Double, ByRef as Double )' in the RomuTrio Type and

Code: Select all

Private Sub RomuTrio.GetDuo( ByRef One as Double, ByRef Two as Double)
  Engine
  One = Cast(Ulong Ptr,@xp)[0]/2^32 ' MrSwiss
  Two = Cast(Ulong Ptr,@xp)[1]/2^32
End Sub
dafhi
Posts: 1641
Joined: Jun 04, 2005 9:51

Re: RomuTrio package

Post by dafhi »

it's weird; i noticed my monte carlo results seemed high

this fixed it

Code: Select all

  var c = 3e6
  const   iMax = 1 / ( 2 ^ len8 - .64 )
  for i = 1 to c
    var x = rng * iMax
i jump the gun frequently but here, i'm calling it. a few more digits may help. also there's another technique to improving mc accuracy: stratification, which i haven't studied
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: RomuTrio package

Post by deltarho[1859] »

@dafhi

Image
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: RomuTrio package

Post by deltarho[1859] »

Just found this, by dodicat, on my system. I cannot remember copying it.

Code: Select all

#include "crt.bi"
 
screen 19
dim as double t=timer
for n as long=1 to 10000000
   ' if rand > 32767 then beep 'short range
    pset (rand mod 800,rand mod 600),15
next
print timer-t;"   Press a key"
sleep
 
 t=timer
for n as long=1 to 10000000
    pset (rnd*800,rnd*600),0
next
print timer-t,"   Done"
sleep
Anyway I ran it, gcc 9.3SJLJ/-O3 64-bit, and got:

Code: Select all

0.735... Press a key
 
0.538... Done
Rewriting to use GetDuo

Code: Select all

#include "RomuTrio.bas"
Dim As RomuTrio rt
 
screen 19
dim as double t=timer
for n as long=1 to 10000000
  pset (rt.rand mod 800,rt.rand mod 600),15
next
print timer-t;"   Press a key"
sleep
 
Dim As Double x, y
 t=timer
for n as long=1 to 10000000
  rt.GetDuo( x, y )
  pset (x*800,y*600),0
next
print timer-t,"   Done"
sleep
I get

Code: Select all

0.496... Press a key
0.452... Done
The interesting thing about random numbers is that the more complex/slow a statement is, the less impact a random number generator has on the statement's performance. Using a generator which is very much faster may only slightly increase a statement's performance. Having said that, RomuTrio's rand is 33% faster with [0,1) being 16% faster. I wrote GetDuo for fun, but it looks powerful.

Using 'for n as long=1 to 800*600*11' RomuTrio had a handful more pixels left to 'sit on' than crt.bi. crt.bi may be using a LCG generator which have an excellent distribution uniformity but a poor quality randomness, that is they fail PractRand.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: RomuTrio package

Post by deltarho[1859] »

The Constructor employs GetSnapshot. At any time, we can SetSnapshot to return the initial state. However, if we employ GetSnapshot then the initial state will be overwritten by the current state. I had a situation where I did not want to lose the initial state.

So, we now have GetInitialState in the Private section of 'Type RomuTrio' and that will populate the InitialState vector. We will get an illegal access if we go anywhere near GetInitialState - it is for the Constructor. We do not have a corresponding SetInitialState. What we do have is a Public RestoreInitialState.

We can use the GetSnapshot/SetSnapshot combination but at anytime we can return to the application's 'Start Up' position. The InitialState vector, once populated, is carved in stone and will persist for the application session.

Of course, if we use SetSnapshot and we have not used GetSnapshot then we will load an empty vector because the Constructor used to use GetSnapshot. So, a short circuit check is made of the snapStateVector and if all elements are found empty the InitialState vector will get loaded otherwise snapStateVector will be loaded. In this sense then, it is 'business as usual'.

Here is some example code using the various snapshot procedures.

Code: Select all

#include "RomuTrio.bas"
Dim As RomuTrio rt
Print "Start Up" : Print
For i As Ulong = 1 to 6
  Print rt.rnd01
Next
Print
Print "GetSnapshot invoked" : Print
rt.GetSnapshot
For i As Ulong = 1 to 6
  Print rt.rnd01
Next
Print
Print "SetSnapshot invoked" : Print
rt.SetSnapshot
For i As Ulong = 1 to 6
  Print rt.rnd01
Next
Print
print "RestoreInitialState invoked - basically a generator reset to 'Start Up'" : print
rt.RestoreInitialState
For i As Ulong = 1 to 6
  Print rt.rnd01
Next

Sleep
Here is a typical output:

Code: Select all

Start Up
 
 0.9107384032843766
 0.2793494384781189
 0.1549023524711499
 0.5955807374375624
 0.6831861970776139
 0.1662708707981613
 
GetSnapshot invoked
 
 0.4754528975504551
 0.2244936143302844
 0.8109375036604107
 0.4563857991705063
 0.4440687828308343
 0.5536293010789357
 
SetSnapshot invoked
 
 0.4754528975504551
 0.2244936143302844
 0.8109375036604107
 0.4563857991705063
 0.4440687828308343
 0.5536293010789357
 
RestoreInitialState invoked - basically a generator reset to 'Start Up'
 
 0.9107384032843766
 0.2793494384781189
 0.1549023524711499
 0.5955807374375624
 0.6831861970776139
 0.1662708707981613
Considering the recent additions, here is the latest version of RomuTrio.bas.

RomuTrio.bas

Code: Select all

#define rotl(x,k) ( (x Shl k) Or (x Shr(64-k)) )
#macro engine
  Dim As Ulongint xp = StateVector(0), yp = StateVector(1), zp = StateVector(2)
  StateVector(0) = 6364136223846793005ull * zp
  StateVector(1) = yp - xp : StateVector(1) = rotl(StateVector(1),12)
  StateVector(2) = zp - yp: StateVector(2) = rotl(StateVector(2),44)
#endmacro
 
Type RomuTrio
  Public:
  declare Constructor
  Declare Sub Initialize()
  Declare Function rand() As Ulongint
	Declare Function rnd01() As Double
  Declare Sub GetDuo( byref as Double, byref as Double )
  Declare Function range overload( Byval As Long, Byval As Long ) As Long
  declare function range overload ( byval as double, Byval as double ) as double
  Declare Sub GetSnapshot()
  Declare Sub SetSnapShot()
  Declare Sub RestoreInitialState()
  Private:
  Declare Sub GetInitialState()
  As Ulongint StateVector(0 to 2)
  As Ulongint snapStateVector(0 to 2)
  As Ulongint InitialState(0 To 2)
End Type
 
Function HammingSeed64() As Ulongint
Dim As Ulong tscSeed, numBits = 0
Dim As Ulongint Seed, CopySeed
While numBits <> 32
  Asm
    rdtsc
    mov Dword Ptr [tscSeed], eax
  End Asm
  Seed = (Cast( Ulongint, tscSeed ) Shl 32) Or Cast( Ulongint, tscSeed)
  CopySeed = Seed : numBits = 0
  While CopySeed <> 0
    CopySeed = CopySeed And CopySeed - 1
    numBits += 1
  Wend
Wend
Return Seed
End Function
 
Sub ShuffleUlongint( ByRef x As Ulongint )
#define range(f,l) Int(Rnd*(((l)+1)-(f))+(f)) ' By dodicat
Union localUDT
  As Ulongint ul
  As Byte b(7)
End Union
Dim As localUDT l
l.ul = x
For i As Ulong = 0 to 6
  Swap l.b(i), l.b(range(i, 7))
Next
x = l.ul ' <- without this x will not change
End Sub
 
Private Function RomuTrio.rand() As Ulongint
  Engine
  Return xp
End Function
 
Private Function RomuTrio.rnd01() As double
  Engine
  Return xp/2^64
end function
 
Private sub RomuTrio.GetDuo( ByRef One As Double, ByRef Two as Double )
  Engine
  One = cast(ulong ptr,@xp)[0]/2^32 ' MrSwiss
  Two = cast(ulong ptr,@xp)[1]/2^32
end sub
 
Private Function RomuTrio.range( Byval One As Long, Byval Two As Long ) As Long
  Engine
  return clng( xp Mod ( Two-One+1 ) ) + One ' By dodicat
End Function
 
Private Function RomuTrio.range( Byval One As Double, Byval Two As Double ) As Double
  Engine
  Return xp/2^64*( Two - One ) + One
End Function
 
Sub RomuTrio.GetSnapshot()
  For i As Ulong = 0 to 2
    snapStateVector(i) = StateVector(i)
  Next
End Sub
 
Private Sub RomuTrio.SetSnapshot()
  If snapStateVector(0)=0 andalso  snapStateVector(1)=0 andalso snapStateVector(2)=0 then
    For i As Ulong = 0 To 2
      StateVector(i)= InitialState(i)
    Next
  Else
    For i As Ulong = 0 To 2
      StateVector(i)= snapStateVector(i)
    Next
  End If
End Sub
 
Sub RomuTrio.GetInitialState()
  For i As Ulong = 0 to 2
    InitialState(i) = StateVector(i)
  Next
End Sub
 
Private Sub RomuTrio.RestoreInitialState()
  For i As Ulong = 0 To 2
    StateVector(i)= InitialState(i)
  Next
End Sub
 
Sub RomuTrio.Initialize( )
  Randomize ' for ShuffleUlongint
  StateVector(0) = HammingSeed64 : ShuffleUlongint( StateVector(0) )
  StateVector(1) = StateVector(0) : ShuffleUlongint( StateVector(1) )
  StateVector(2) = StateVector(1) : ShuffleUlongint( StateVector(2) )
End Sub
 
Constructor RomuTrio
  Initialize
  GetInitialState
End Constructor
The eagle-eyed among you may notice some procedure placeholders. I am looking at a Load/Save Current/Initial vector states facility and whether to make it manual or automatic, and we need to be careful not to end up with the initial state existing in perpetuity. Image
Last edited by deltarho[1859] on Aug 07, 2021 4:47, edited 1 time in total.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: RomuTrio package

Post by deltarho[1859] »

I have just removed the procedure placeholders above. What I had in mind will work with one generator but not two or more because it looks like we cannot associate a particular generator with its saved data.

So, that is me done with RomuTrio. I doubted that it would be a runner when I started, but it looks like the best generator for 64-bit mode so far. Ironic considering that all my work is in 32-bit. Image
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: RomuTrio package

Post by deltarho[1859] »

it looks like we cannot associate a particular generator with its saved data.
fxm came up with a good idea, but I am now thinking my idea of save/load state vectors was 'barmy'.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: RomuTrio package

Post by deltarho[1859] »

A couple of weeks ago I dropped an email to Mark Overton regarding considering using a multiplier of 6364136223846793005 as opposed to 15241094284759029579 as I got no PractRand anomalies to 2TB using the former.

His response did not address that, but he wrote the following.
Thank you for the kind email about Romu. I'm glad you like it.

PractRand has a subtle bug that causes the tails of its statistical distributions to be too high, resulting in false anomalies. A work-around is to run PractRand two or three times with different seeds to determine whether the anomaly is due to the PRNG or PractRand. Have you tried multiple runs? Another check is to run the job twice as long (to 4 TB in your case) to see whether the anomaly persists.

In my PractRand runs, all the Romu generators reached 2^50 bytes without trouble, although PractRand reported a couple of false anomalies along the way.
I am not so sure about a 'subtle bug'. If we call an optimal Hamming weight a warm area and a low entropy Hamming weight a cool area then I reckon that during a generator's session, we momentarily drift into a cool area and PractRand spots this and issues a small anomaly report and that this is an inherent weakness of PRNGs. However, I think that this a small trade-off for using a PRNG compared to using a CPRNG given that the former are significantly faster. If I am correct then if we are close to a cool area, on seeding, a 'warm up' may push us further into a cool area. So, always using a 'warm up' may be a bad idea. This strengthens my Hamming/Knuth seeding method, which ensures that we always initialize a generator in a warm area.

Anyway, I accept his advice on multiple PractRand tests and running to 4TB.
Overton wrote:reached 2^50 bytes without trouble
That is a lot - 1024TB. On my machine, we are talking about a PractRand run of about three months. PractRand defaults to 2^45 bytes (32TB).
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: RomuTrio package

Post by deltarho[1859] »

I have returned to HammingSeed64.

With a Knuth Shuffle on eight bytes, although unlikely, it is possible to not get a shuffle since any byte can shuffle with itself. We could use Sattolo's algorithm, where any byte cannot shuffle with itself. In any event, the 'quality' of the shuffle will vary.

It could be argued that for four bytes the Intel instruction bswap would be better. For example, with four bytes a, b, c, and d we get d, c, b, and a after a bswap. With regard rdtsc the fastest changing byte is d. If we read the time stamp quickly in succession only d, and perhaps c, would change. After a bswap the two 'readings' would be very different.

With regard 64-bit, I have looked at reading the time stamp twice with a delay between the readings. It is not easy to come up with an appropriate delay and, of course, we are dependent upon the speed of the CPU. The reason for using a Hamming Weight is to avoid the need to warm up a generator - we jump to an optimal weighting rather than burn numbers to get us to an optimal weight. We can employ a similar strategy with the time stamp. Remembering that if x is a random number then so is 'x + c' where c is a constant. The lower 32 bits of the time stamp will rotate every 2^32 increments. If we add 2^30, 2^32/4, to our first reading of the time stamp, then we are effectively jumping a quarter of a rotation without having to wait. On my machine, this is equivalent to waiting just over 300 milliseconds. The speed of the CPU is now academic.

We no longer need the ShuffleUlongint Sub.

Here are half a dozen 3 x HammingSeed64 calls using the above technique. No breathing space was used - they are running flat out. We are getting a good spread.

Code: Select all

AC0B47F3AC0B4733
46E171F346E17133
79017CF379017C33
 
E1AA89F3E1AA8933
0A3793F30A379333
B8C29CF3B8C29C33
 
E270AAF3E270AA33
5F01B4F35F01B433
3083BDF33083BD33
 
6A18CBF36A18CB33
CD07E0F3CD07E033
8481F7F38481F733
 
DEF108F4DEF10834
DF8912F4DF891234
7B231CF47B231C34
 
D9CA29F4D9CA2934
664D33F4664D3334
B3E03CF4B3E03C34
Ten million passes of a new Initialize were executed, and the longest time to populate the state vector was 300 milliseconds. This is very much slower than the Hamming/Knuth method, but the spread is always good. Previously the time stamp was only read once followed by two Knuth Shuffles. It is worth noting that we, on average, take 100 milliseconds to blink our eyes, so we are talking three eye blinks. I like to put things in perspective. Image Can you live with 300 milliseconds? I can.

Here is the new HammingSeed64.

HammingSeed64

Code: Select all

Function HammingSeed64() As Ulongint
Dim As Ulong tscSeed0, tscSeed1, numBits
Dim As Ulongint Seed, CopySeed
While numBits <> 32
  Asm
    rdtsc
    mov ecx, eax
    bswap eax ' fast moving to upper bits
    mov Dword Ptr [tscSeed0], eax
    add ecx, 1073741824 ' A quarter of a spin
    bswap ecx ' fast moving to upper bits
    mov Dword Ptr [tscSeed1], ecx
  End Asm
  Seed = (Cast( Ulongint, tscSeed0 ) Shl 32) Or Cast(Ulongint, tscSeed1)
  CopySeed = Seed : numBits = 0
  While CopySeed <> 0
    CopySeed = CopySeed And CopySeed - 1
    numBits += 1
  Wend
Wend
Return Seed
End Function
Delete ShuffleUlongint()

Here is the new Initialize.

Initialize

Code: Select all

Sub RomuTrio.Initialize( )
  StateVector(0) = HammingSeed64
  StateVector(1) = HammingSeed64
  StateVector(2) = HammingSeed64
End Sub
Last edited by deltarho[1859] on Aug 21, 2021 9:51, edited 1 time in total.
Post Reply