Rnd6 for Windows

deltarho[1859]
Posts: 4550
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Rnd6 for Windows

Post by deltarho[1859] »

Yes, putting Rnd6 into a static library was a bad move.

Throughput in MHz (gcc -O3). In each pair, the first figure is for the included source code and the second is for the static library.

Code:

              32-bit    64-bit
Rnd6         301/200   312/188
Range Int    302/130   303/131
Range float  315/ 74   318/ 76

The source code is faster than the library on every measure. The floating-point range is hit hardest.

There is little difference between 32-bit and 64-bit whether source code or library.

The throughput with FB #5 in 32-bit mode is 0.5MHz.

I am now going to drop the library approach, so here is the source code.

Although Rnd6 is now a viable generator in its own right, it is really an FB #5 replacement. CryptoRndII, with its twin buffering and thread pooling, has twice the throughput. Rnd6 is ideal for initializing applications where only a small number of [0,1) or range values are required. Rnd6 is now 3.75 times faster than Mersenne Twister.

I wondered whether the 40*1024*4 and 40*1024-1 would be replaced by 163840 and 40959 respectively. They were, but not by gcc. They were replaced by fbc in the emitted C file.

A deltarho/dodicat production. :D

Rnd6.bas

Code:

#Include Once "windows.bi"
#Inclib "bcrypt"
#Include Once "win/wincrypt.bi"

' Handle to the system RNG provider; opened in on_init, closed in on_exit
Dim Shared As BCRYPT_ALG_HANDLE ptr hRand

' Uniform Double in [0,1), served from a buffer of 40*1024 Uint32 values
Function Rnd6() As Double
  Static As Uint32 count
  Static As Uint32 Buffer(40*1024-1)
  ' Refill all 40*1024*4 bytes whenever the buffer has been used up
  If count = 0 then BCryptGenRandom(hRand, Cast(Puchar, @Buffer(0)), 40*1024*4, 0)
  Function = Buffer(count)/2^32
  count += 1
  If count = 40*1024 Then count = 0 ' wrap only after the last element (index 40*1024-1) has been used
End Function

' Resize a() to BufferSize bytes and fill it with random bytes
Sub Rnd6Buffer( a() As UInt8, BufferSize as UInt32 )
  redim a(0 to BufferSize - 1) as UInt8
  BCryptGenRandom(hRand, @a(0), BufferSize, 0)
End Sub

' Long range, First to Last inclusive
Function Rnd6range Overload( First As Int32, Last As Int32 ) As Int32
  Static As Uint32 count
  Static As Uint32 Buffer(40*1024-1)
  If count = 0 then BCryptGenRandom(hRand, Cast(Puchar, @Buffer(0)), 40*1024*4, 0)
  Function = CLng( Buffer(count) Mod (Last-First+1)) + First ' Mod by dodicat
  count += 1
  If count = 40*1024 Then count = 0 ' wrap only after the last element has been used
End Function

' Floating point range, First <= result < Last
Function Rnd6range Overload( First As Double, Last As Double ) As Double
  Static As Uint32 count
  Static As Uint32 Buffer(40*1024-1)
  If count = 0 then BCryptGenRandom(hRand, Cast(Puchar, @Buffer(0)), 40*1024*4, 0)
  Function = Buffer(count)/2^32 * ( Last - First ) + First
  count += 1
  If count = 40*1024 Then count = 0 ' wrap only after the last element has been used
End Function

' Runs automatically before the main program starts
Sub on_init( ) Constructor
  BCryptOpenAlgorithmProvider(@hRand, BCRYPT_RNG_ALGORITHM, 0, 0)
End Sub

' Runs automatically at program exit
Sub on_exit( ) Destructor
  If hRand Then BCryptCloseAlgorithmProvider(hRand, 0)
End Sub
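
A short usage sketch for the four procedures, assuming the listing is saved as Rnd6.bas next to the test program. The variable name key and the values chosen are only for illustration. The arguments are typed explicitly so that the intended overload of Rnd6range is selected.

```freebasic
#Include "Rnd6.bas"

' One uniform Double in [0,1)
Print Rnd6

' Integer overload: a die roll in 1..6 (CLng makes both arguments 32-bit)
Print Rnd6range( CLng(1), CLng(6) )

' Floating point overload: literals with a decimal point are Doubles
Print Rnd6range( -1.0, 1.0 )

' 16 random bytes; Rnd6Buffer resizes the dynamic array itself
Dim As UInt8 key()
Rnd6Buffer( key(), 16 )
For i As Long = 0 To UBound( key )
  Print Hex( key(i), 2 );
Next
Print
Sleep
```

No explicit set-up or clean-up call is needed, because the Constructor and Destructor procedures open and close the provider.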

Post by deltarho[1859] »

I was never sold on dodicat's timing code, so I looked at it from a different angle.

Since we are using only a single thread, we cannot read any values from the buffer while it is being refreshed. It follows that the buffer refresh method will introduce a stutter. The stutter time will depend on the buffer size employed.

If the buffer is 40*1024 bytes as above, then the stutter time on my machine is about 100µs. I will not perceive that.

I looked at the MHz throughput for a buffer size from 1000 bytes to 40000 bytes and found that it was virtually the same for all buffer sizes.

That was not expected.

However, the explanation is simple. If we double the buffer size, the stutter time will double. At the same time, the number of buffer refreshes will be halved. The throughput will be unaltered. If we halve the buffer size, the stutter time will be halved. At the same time, the number of buffer refreshes will be doubled. Again, the throughput will be unaltered.

So there isn't a 'sweet spot' buffer size. The 300MHz throughput that I am getting is down to the speed of my CPU and nothing else.
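
The argument above can be written as a small model. This is a sketch under stated assumptions, not a measurement: the symbols N (values drawn), B (values per buffer), t_r (time to read one value), t_f (time to generate one value during a refill) and c (fixed overhead per refill call) are introduced here only for illustration.

```latex
T(B) \;=\; \frac{N}{N\,t_r + \dfrac{N}{B}\,\bigl(c + B\,t_f\bigr)}
     \;=\; \frac{1}{t_r + t_f + c/B}
```

If the refill time is strictly proportional to the buffer size (c = 0), B cancels and the throughput is 1/(t_r + t_f) for every buffer size, which matches the flat result reported above. Any fixed per-call cost would only show up for very small buffers, where c/B is no longer negligible.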

So what buffer size should we use? With a buffer size of 40*1024, if I always request fewer values than that from Rnd6 or a range function, then I will never see a buffer refresh and therefore never have a stutter. I cannot see myself ever needing to request more than 40*1024 Rnd6 or range values, so I will keep 40*1024. Even if I requested more, each refresh costs only about 100µs, so it would take about a thousand refreshes to accumulate 0.1 seconds of stutter; the blink of an eye, which is when I would start to perceive it.

So regarding a buffer size, it is your choice.
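
To try other buffer sizes on your own machine, something along these lines could be used. It is a minimal sketch and not the test code behind the figures above: the names hAlg, SweepMHz and sink, the step of 3000 bytes and the count of 50 million values are all assumptions, and the loop reads the buffer inline rather than through Rnd6, so the absolute numbers will differ.

```freebasic
#Include Once "windows.bi"
#Inclib "bcrypt"
#Include Once "win/wincrypt.bi"

Dim Shared As BCRYPT_ALG_HANDLE hAlg   ' hypothetical handle name

' Draw 'total' Doubles through a buffer of 'n' 32-bit values; return MHz
Function SweepMHz( ByVal n As ULong, ByVal total As ULong ) As Double
  Dim As ULong buf( 0 To n - 1 )       ' sized at run time
  Dim As ULong count = 0
  Dim As Double sink = 0
  Dim As Double t = Timer
  For i As ULong = 1 To total
    ' Same refill-on-zero scheme as Rnd6, but with a variable size
    If count = 0 Then BCryptGenRandom( hAlg, Cast( PUCHAR, @buf(0) ), n * 4, 0 )
    sink += buf(count) / 2^32
    count += 1
    If count = n Then count = 0
  Next
  t = Timer - t
  If sink < 0 Then Print sink          ' never true; keeps the sum in use
  Return total / t / 1e6
End Function

BCryptOpenAlgorithmProvider( @hAlg, BCRYPT_RNG_ALGORITHM, 0, 0 )
For bytes As ULong = 1000 To 40000 Step 3000
  Print bytes; " bytes: "; SweepMHz( bytes \ 4, 50000000 ); " MHz"
Next
BCryptCloseAlgorithmProvider( hAlg, 0 )
Sleep
```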

:)

Post by deltarho[1859] »

It gets more intriguing. If your CPU is twice as fast as mine, then for a given buffer size the stutter time will be halved. We will then see the number of buffer refreshes doubled. So the throughput of about 300MHz will be the same no matter how fast your CPU is.

:)

Post by deltarho[1859] »

Give this a run (gcc -O3)

Code:

#include "Rnd6.bas"
Dim As Double t = Timer
For i As Ulong = 1 to 10^8
  Rnd6 ' result discarded; only the call is timed
Next
t = Timer - t
Print 100/t ' 10^8 calls in t seconds is 100/t MHz
Sleep

I've just got 320MHz with my 3.9GHz turbo.
srvaldez
Posts: 3505
Joined: Sep 25, 2005 21:54

Post by srvaldez »

deltarho[1859], the results vary from run to run; with -O3 I get between 680 and 698 MHz.

Post by deltarho[1859] »

Thanks, srvaldez.

Well, I got that wrong.

Yes, the stutter time is halved and the number of buffer refreshes is doubled, but the buffer is 'burned up' much faster as well.

"I get between 680 and 698"

That is only a 2.6% difference - we are talking PCs here.
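
To put a number on the run-to-run spread, the timing loop can be repeated and the extremes compared. This is a sketch only: the constants RUNS and N and the variable names best and worst are assumptions, not part of the test above.

```freebasic
#Include "Rnd6.bas"

Const RUNS = 5            ' hypothetical number of repeats
Const N = 100000000       ' 10^8 calls per run, as in the test above

Dim As Double best = 0, worst = 1e300
For r As Long = 1 To RUNS
  Dim As Double t = Timer
  For i As ULong = 1 To N
    Rnd6
  Next
  t = Timer - t
  Dim As Double mhz = N / t / 1e6
  If mhz > best Then best = mhz
  If mhz < worst Then worst = mhz
Next
Print "min MHz: "; worst; "  max MHz: "; best
Print "spread %: "; 100 * ( best - worst ) / best
Sleep
```

With the figures quoted, 100 * (698 - 680) / 698 comes to about 2.6.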

You have some beast there. :)

Post by deltarho[1859] »

I reckon that I am done with this thread now, save to say Rnd6 is not to be treated as a general-purpose generator; PCG32II, MsWsII, CryptoRndII and others are very much faster and have more functionality.

Having said that, Rnd6 is faster than FB #1 to FB #5. It could be used instead of FB #2 and FB #3, which are probably the most used generators. It is also a CPRNG rather than a PRNG and has two range procedures built in.

As usual, a lack of feedback has been noted.

:)

Post by srvaldez »

👍😁
yes, sadly the lack of participation is dismal, especially when you go out of your way to help someone 😒
hhr
Posts: 238
Joined: Nov 29, 2019 10:41

Post by hhr »

I often had the impression that the optimisation on my computer would be of no use.

gas32: 14 MHz
gcc32: 13.8 MHz
gcc32 -O 2: 14.5 MHz
gcc32 -O 3: 17 MHz

gas64: 18.5 MHz
gcc64: 19.7 MHz
gcc64 -O 2: 20.5 MHz
gcc64 -O 3: 21.5 MHz

Pentium Dual-Core CPU E5300 @ 2.60GHz 2.60GHz
RAM 4.00 GB
Windows7 64 Bit

With gas32 and FBrng:
Rnd6 : 7.3 s; 13.7 MHz
Rnd#5: 117.5 s; 0.85 MHz
Rnd#3: 2.0 s; 49.6 MHz

Post by deltarho[1859] »

Thanks, hhr.

I was surprised to see Rnd#3 being faster than Rnd6.

The point of this thread, of course, was to replace Rnd#5 and CryptGenRandom. dodicat's buffer refresh is a bonus.