PCG32II Help file

deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK

Post by deltarho[1859] »

@Provoni

Your query has come in at an opportune moment because dodicat has come up with a method for generating an integer in a range that is 62% faster in 32-bit mode and 25% faster in 64-bit mode than PCG32II's published method. It still may not be fast enough for you, but 595MHz in 32-bit mode is blindingly fast and as fast as generating a random float in [0,1). I have further testing to do, but if all is well I will publish a new PCG32II.bas and let you know.
Provoni
Posts: 513
Joined: Jan 05, 2014 12:33
Location: Belgium

Post by Provoni »

hey deltarho[1859],

An integer is needed, not a float. I am thinking it may be the overhead of the function call also and I am trying to convert the code so that it can be used locally in the thread.

You say that it is running at 300 to 600 MHz. My program does about 100,000,000 iterations per second (100 MHz) with the Lehmer generator, where one random number is needed per iteration. The Lehmer generator is a small part of the code and does not influence the speed of the program much. After replacing the Lehmer with "random_number=pcg.Range(1,s)" the speed drops to 20,000,000 iterations per second. If it is running at 300 to 600 MHz then it should not slow down my program by a factor of 5, so I do not understand what is going on.

Actually, the same thing happens when calling any of the standard FreeBASIC generators, and that's why I had to resort to a local generator.

I replaced:

Code: Select all

state=48271*state and 2147483647
random_number=1+s*state shr 31
With:

Code: Select all

random_number=pcg.Range(1,s)
I don't understand how to convert this.state or this.sequence to local variables in the following code:

Code: Select all

Function pcg32.range( Byval One As Double, Byval Two As Double ) As Double
  Dim TempVar As Ulong
  Dim As Ulongint oldstate = this.state
  this.state = oldstate * 6364136223846793005ULL + this.sequence
  Dim As Ulong xorshifted = ((oldstate Shr 18u) xor oldstate) Shr 27u
  Dim As Ulong rot = oldstate Shr 59u
  TempVar = (xorshifted Shr rot) Or (xorshifted Shl ((-rot) And 31))
  Return TempVar/4294967296.0*( Two - One ) + One
end function
Thanks

Post by Provoni »

Your "pcg.Range(1,s)" does about 300 MHz for me on a single thread. Obviously the multi-threading is causing a massive slowdown somehow.

Post by Provoni »

Here's some (very bad) code to illustrate the problem. It generates (300,000,000 / threads) random numbers per single thread. Change threads from 1 to 4 to see the effect.

- At 1 thread PCG32II and FreeBASIC rng 4 take 1 second for the 300,000,000 random numbers.
- At 4 threads PCG32II takes 3.3 seconds and rng 4 about 7 seconds for the 300,000,000. This kind of slowdown should not occur, since the 300,000,000 numbers are divided among the threads.
- The custom rnd shows that the overhead from sleep 10 and threadwait is minimal in my (very bad) code.

From 2017: viewtopic.php?f=3&t=25603&p=231065&hili ... ng#p231065

Code: Select all

screenres 640,480,32

randomize timer,4

#include "PCG32II.bas"
Dim shared pcg as pcg32

declare sub freebasic_rnd(byval nopointer as any ptr)
declare sub pcg32_rnd(byval nopointer as any ptr)
declare sub custom_rnd(byval nopointer as any ptr)

dim as integer i,j,k
dim shared as integer threads=4 'set to 1, then 4, and compare timings
dim as any ptr thread_ptr(threads)

dim as double t=timer
for i=1 to threads
   thread_ptr(i)=threadcreate(@freebasic_rnd,0)
   sleep 10
next i
for i=1 to threads
   threadwait(thread_ptr(i))
next i
print "FreeBASIC rnd timing: "+str(timer-t)

sleep 100

t=timer
for i=1 to threads
   thread_ptr(i)=threadcreate(@pcg32_rnd,0)
   sleep 10
next i
for i=1 to threads
   threadwait(thread_ptr(i))
next i
print "PCG32II   rnd timing: "+str(timer-t)

sleep 100

t=timer
for i=1 to threads
   thread_ptr(i)=threadcreate(@custom_rnd,0)
   sleep 10
next i
for i=1 to threads
   threadwait(thread_ptr(i))
next i
print "Custom    rnd timing: "+str(timer-t)

sleep

sub freebasic_rnd(byval nopointer as any ptr)
   
   dim as longint i,j
   
   for i=1 to 300000000/threads   
      j+=rnd*123   
   next i
   
end sub

sub pcg32_rnd(byval nopointer as any ptr)
   
   dim as longint i,j,s=123
   
   for i=1 to 300000000/threads
      j+=pcg.range(1,s)
   next i
   
end sub

sub custom_rnd(byval nopointer as any ptr)
   
   dim as longint i,j,m=123
   
   for i=1 to 300000000/threads
      m=(214013*m+2531011)mod 2147483648
      j+=((m shr 16)/32768)*123 '32767
   next i
   
end sub

Post by deltarho[1859] »

The Lehmer random number generator is a type of linear congruential generator (LCG) and they are blindingly fast. However, the 32-bit versions fail the PractRand test very shortly after they start to generate numbers. The 64-bit versions are better, but they fail PractRand not long after the 32-bit versions. LCGs are OK if the quality of randomness is not an issue or when just a few KB are needed before their lack of randomness manifests itself.
Provoni wrote: Obviously the multi-threading is causing a massive slowdown somehow.
Hmmm, it shouldn't because PCG32II is thread safe. PCG32 was not thread safe. I have tested PCG32II with multi-threading and the throughput was the same for all threads.
Provoni wrote: If it is running at 300 to 600 MHz then it should not slow down my program by a factor of 5, so I do not understand what is going on.
Neither do I.
Provoni wrote: I don't understand how to convert this.state or this.sequence to local variables
You have posted the float version.

PCG32 used a function and if called from different threads we got massive collisions as would happen with all of FreeBASIC's generators because none of them are thread safe.

What you want is a macro and I will look into that.
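
As a rough idea of what such a macro could look like, here is a minimal sketch that assumes each thread keeps its own state and sequence in local variables. The macro name PCG_RANGE and all variable names are made up for illustration and are not part of PCG32II.bas. The generator step is the one from the function quoted above; the integer mapping is a plain multiply-and-shift, which carries a small bias and is not necessarily the range method PCG32II uses.

```freebasic
' Hypothetical macro, not part of PCG32II.bas.
' st, sq  : the caller's own ULongInt state and sequence (sq should be odd)
' minv, maxv : inclusive integer bounds; result receives the number.
' The last line maps the 32-bit output onto minv..maxv by multiply-and-shift.
#macro PCG_RANGE( st, sq, minv, maxv, result )
Scope
  Dim As ULongInt oldstate_ = st
  st = oldstate_ * 6364136223846793005ULL + sq
  Dim As ULong xorshifted_ = ((oldstate_ Shr 18) Xor oldstate_) Shr 27
  Dim As ULong rot_ = oldstate_ Shr 59
  Dim As ULong r_ = (xorshifted_ Shr rot_) Or (xorshifted_ Shl ((-rot_) And 31))
  result = (minv) + CLng( (CULngInt( r_ ) * CULngInt( (maxv) - (minv) + 1 )) Shr 32 )
End Scope
#endmacro

' Each thread would declare its own pair of variables like these
Dim As ULongInt state = 42, sequence = 55
Dim As Long random_number, s = 123

For i As Integer = 1 To 5
  PCG_RANGE( state, sequence, 1, s, random_number )
  Print random_number
Next
```

Because the macro expands in place, there is no call overhead and nothing is shared between threads as long as state and sequence are local to each thread's Sub.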

Post by deltarho[1859] »

Your last post came in whilst I was composing my last post. I'll have a look at it.

Post by deltarho[1859] »

I have not read your code yet but have run it.

One thread:
FreeBASIC 4.86
PCG32II 0.61
Custom 5.47

Four threads:
FreeBASIC 6.23
PCG32II 0.23
Custom 1.59

This seems to contradict your results.

I will now read your code.

Post by Provoni »

Without optimizations on FreeBASIC-1.07.1-win64-gcc-5.2.0:

1 thread:

FreeBASIC rng 4: 10.65
PCG32II: 3.34
Custom: 10.22

4 threads:

FreeBASIC rng 4: 13.39
PCG32II: 7.28
Custom: 2.81

8 threads:

FreeBASIC rng 4: 13.32
PCG32II: 8.28
Custom: 1.58

The PCG32II was downloaded from your source a few days ago.

Post by Provoni »

Sorry to cause all this stir, deltarho[1859], but PCG32II is performing up to par now! I was simply not using a unique generator for every thread.

Post by deltarho[1859] »

OK, I have spotted your problem.

Dim shared pcg as pcg32

All your threads are sharing the same generator so you will get collisions.

For four threads you need
Dim shared as pcg32 pcgA, pcgB, pcgC, pcgD.

pcgA should be given to the primary thread and the others to the other threads.

Not quite sure how to introduce them to your code yet but you may know how to do that since you are better acquainted with it.

The principle, though, is that each thread should invoke its own generator and that way we do not get collisions.

If you look in the Help file, about a third down in the 'Usage examples' section, you will see a multi-threading example.
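
To make the principle concrete, here is a minimal stand-alone sketch, not the Help file's example. It assumes a small stand-in generator type (TinyGen, a made-up name) so that it compiles without PCG32II.bas; in real code the array would hold pcg32 objects instead. The thread index is passed through ThreadCreate's user-data pointer so that each thread only ever touches its own array element.

```freebasic
' Stand-in generator so the sketch compiles on its own (hypothetical type,
' not part of PCG32II.bas).
Type TinyGen
  As ULongInt state
  As ULongInt sequence
  Declare Function NextULong() As ULong
End Type

Function TinyGen.NextULong() As ULong
  Dim As ULongInt oldstate = this.state
  this.state = oldstate * 6364136223846793005ULL + this.sequence
  Dim As ULong xorshifted = ((oldstate Shr 18) Xor oldstate) Shr 27
  Dim As ULong rot = oldstate Shr 59
  Return (xorshifted Shr rot) Or (xorshifted Shl ((-rot) And 31))
End Function

Const NUM_THREADS = 4
Dim Shared As TinyGen gen( 1 To NUM_THREADS )     ' one generator per thread
Dim Shared As ULongInt total( 1 To NUM_THREADS )  ' one result slot per thread

Sub worker( ByVal userdata As Any Ptr )
  Dim As Integer id = Cast( Integer, userdata )   ' thread index, passed by value
  Dim As ULongInt sum = 0
  For i As Long = 1 To 1000000
    sum += (gen( id ).NextULong() Mod 123) + 1    ' only this thread uses gen(id)
  Next
  total( id ) = sum
End Sub

Dim As Any Ptr handle( 1 To NUM_THREADS )
For i As Integer = 1 To NUM_THREADS
  gen( i ).state = 1000 + i
  gen( i ).sequence = (i Shl 1) Or 1              ' odd and different per thread
  handle( i ) = ThreadCreate( @worker, Cast( Any Ptr, i ) )
Next
For i As Integer = 1 To NUM_THREADS
  ThreadWait( handle( i ) )
  Print "thread"; i; " sum ="; total( i )
Next
```

The Mod 123 mapping is only there to keep the sketch short. The point is the indexing: no two threads read or write the same generator state.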

Post by Provoni »

Hey deltarho[1859],

I read your comment in the other thread and

Code: Select all

Dim shared pcg(threads) as pcg32
has fixed the speed issue for me. But is that okay to do?

Post by deltarho[1859] »

You keep posting whilst I am composing my posts.

Anyway, PCG32II's reputation has been restored and seems to be coming up trumps in your code.
Provoni wrote: But is that okay to do?
Yep

The amazing thing is that with four generators working your PC is actually generating four times as many random numbers, aren't threads wonderful?

Post by Provoni »

Hey deltarho[1859],
deltarho[1859] wrote: You keep posting whilst I am composing my posts.
Haha, sorry.

Probably a noob question, but how does your PCG32II know whether to use Function pcg32.range( Byval One As Long, Byval Two As Long ) As Long or Function pcg32.range( Byval One As Double, Byval Two As Double ) As Double? I just want to make sure that my program is using the integer version. Thanks.
deltarho[1859] wrote: The amazing thing is that with four generators working your PC is actually generating four times as many random numbers, aren't threads wonderful?
How about 30. :)

Post by deltarho[1859] »

Provoni wrote: Probably a noob question

Code: Select all

Declare Function range Overload( Byval One As Long, Byval Two As Long ) As Long
Declare Function range Overload ( Byval One As double, Byval Two As Double ) as Double
in the 'Type pcg32' declaration.
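
With Overload, the compiler picks the version whose parameter types match the arguments. A small stand-alone sketch of that, using a made-up function name (which) that is not part of PCG32II: arguments already declared As Long, or wrapped in CLng, are an exact match for the Long version, and Double arguments are an exact match for the Double version.

```freebasic
' Hypothetical stand-alone functions; they only report which overload ran.
Declare Function which Overload ( ByVal One As Long, ByVal Two As Long ) As String
Declare Function which Overload ( ByVal One As Double, ByVal Two As Double ) As String

Function which ( ByVal One As Long, ByVal Two As Long ) As String
  Return "Long version"
End Function

Function which ( ByVal One As Double, ByVal Two As Double ) As String
  Return "Double version"
End Function

Dim As Long a = 1, b = 123
Dim As Double x = 1.0, y = 123.0
Dim As LongInt s = 123

Print which( a, b )                  ' both arguments are Long
Print which( x, y )                  ' both arguments are Double
Print which( CLng( 1 ), CLng( s ) )  ' CLng makes both arguments Long
```

If there is any doubt about which version a call reaches, declaring the arguments As Long or wrapping them in CLng removes it.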
Provoni wrote: How about 30. :)
You are joking, right?

I would like to see an Intel Core i9-9900 with 8 cores/16 threads doing its stuff.

Post by Provoni »

Nope, my program (AZdecrypt) can use up to 65,536 threads (current artificial limit), and the problems it solves can be parallelized by dividing the restarts over the number of threads. Here is the project page: http://www.zodiackillersite.com/viewtop ... =81&t=3198

I am currently testing whether the improved randomness from PCG32II over the Lehmer is worth taking. If so I will let you know and quote your PCG32II as being used in the readme file. Thanks for your help so far!

Code: Select all

Declare Function range Overload( Byval One As Long, Byval Two As Long ) As Long
Declare Function range Overload ( Byval One As double, Byval Two As Double ) as Double
I do not understand the Overload functionality. If I specify pcg.range(1,123) will it then invoke the Long version?