Fastest PRNG around, the XORSHIFT

Post your FreeBASIC source, examples, tips and tricks here. Please don’t post code without including an explanation.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Fastest PRNG around, the XORSHIFT

Post by deltarho[1859] »

MrSwiss wrote:It isn't: 2^32 but: 2^32-1
I agree that the maximum value of Ulong is 2^32-1 but there are 2^32 values including zero, so we must divide by 2^32. This is why we have floating point returns of [0,1) since (2^32-1)/2^32 < 1.

If you look at Greg Lyon's FB version of the Mersenne Twister C code we have

Code: Select all

'/* generates a random number on [0,1)-real-interval */
'double genrand_real2(void)
'{
'    return genrand_int32()*(1.0/4294967296.0);
'    /* divided by 2^32 */
'}
Note the 4294967296.0.

This is from a recent paper, Middle Square Weyl Sequence RNG by Bernard Widynski, where he writes "We can demonstrate uniformity by showing that the probability of any given output is 1/2^32. That is because there are 2^32 possible outcomes.

I could go on but you will find that all 32-bit generators divide by 2^32 for a floating-point value.

Paul Doe uses

Code: Select all

'' To get a value in the 0 <= n < 1 range
? ( mswsrnd() / ulongMax )
where ulongMax is from

Code: Select all

const _
  ulongMax => culng( -1 ), _
  ulongIntMax => culngint( -1 )
Unfortunately that is wrong because that would give a value in the range 0 <= n <= 1 introducing a bias; albeit a small one. It must be small otherwise it would not pass PractRand to 8TB.
(Btw. if you feel the need to get angry at someone, taget yourself this time.)
No comment. Image
MrSwiss
Posts: 3910
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: Fastest PRNG around, the XORSHIFT

Post by MrSwiss »

I'm referring to the use of 2^32 in the #Define (where it is wrong) and not the division to get a Double [0 .. 1):

Code: Select all

#define Get64Bit (Cast( Ulongint, Rnd*(2^32) ) Shl 32) Or Cast( Ulongint, Rnd*(2^32) )
' should be ...
#define Get64Bit ( (CULngInt(Rnd() * (2^32-1)) Shl 32) + (Rnd() * (2^32-1)) )
Btw. the cast is only in FBC 32 needed! (and, only when shifting > 31)

By using the Const, we can easily simplify:

Code: Select all

#define Get64Bit ( (CULngInt(Rnd() * ULmax) Shl 32) + (Rnd() * ULmax) )
coderJeff
Site Admin
Posts: 4313
Joined: Nov 04, 2005 14:23
Location: Ontario, Canada
Contact:

Re: Fastest PRNG around, the XORSHIFT

Post by coderJeff »

Perhaps, compare to a single decimal digit:
- minimum value = 0
- maximum value = 9
- number of possible values = 10
- expected probability of any digit = 1/10th

rnd() function returns [0,1), i.e. 0.000... to 0.999..., but never 1.0. (The number of decimal places is limited by hardware, so no, it's not the mathematical identity 1 = 0.999... in this case)

So what happens when multiplying by the maximum digit value 9?

If rounding, as in the case of 'int( rnd() * 9 )', then the bias is introduced that the digits '0' and '9' will only appear half as often as the other digits.

And in the case 'fix( rnd() * 9 )', then the bias is introduced that digit '9' will never be a result.

In general the mapping becomes 'int( rnd() * N ) mod N', where N is the number of possible values and not the maximum value. For built-in power of 2 integer types, like long, longint, etc, the int/fix/mod comes for free when the full number of possible values is used and the result is stuffed in a fixed size integer type.

Basically, what deltarho said.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Fastest PRNG around, the XORSHIFT

Post by deltarho[1859] »

MrSwiss wrote:You are also inducing unnecessary 'number crunching' which causes speed loss. Use a once defined Const instead ...
Get64Bit is not used during the random number generation it is only used in MyRandomize which is invoked by the Constructor; If we saved a few microseconds it is not going to make a blind bit of difference to an application session.

It is worth noting that an optimizing compiler would not leave 2^32 unattended. From the Middle Square Weyl Sequence paper.
In random number generation, a floating-point number in the range of 0 to 1 is often required. In this section, we address the generation of floating-point numbers. For the msws generator, one can simply divide by 2^32 as shown below.

#define two32 4294967296.
double r;
r = msws() / two32;

This will produce a 32-bit resolution floating-point number in [0,1). A good compiler will convert a division by a constant to a multiplication. If your compiler does not do this then you would multiply by 2.3283064365386962890625e-10 instead of dividing by 2^32.
In my tests dividing by 2^32 or a Const has no effect on the throughput. In the old days we had to do the optimizing.
MrSwiss
Posts: 3910
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: Fastest PRNG around, the XORSHIFT

Post by MrSwiss »

deltarho[1859] wrote:In my tests dividing by 2^32 or a Const has no effect on the throughput. In the old days we had to do the optimizing.
No, this needs conversion to: multiplication (division is slower, uses more CPU/FPU cycles)
and this one, is used on every single call (as in the quoted article!).

Code: Select all

Const As ULong ULmax = &hFFFFFFFF
Const As Double DmulF = 1.0 / (ULmax + 1ull)  ' multiply factor for [0,1) range generation
Says it all, no optimizer dependency, e.g. FBC Win 32 (standalone) -gen gas (without GCC installed).
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Fastest PRNG around, the XORSHIFT

Post by deltarho[1859] »

OK, I will buy that.

In the above generator I replaced /2^32 with *DmulF.

For [0,1) calculations the throughput increased by 5% using gas. For gcc optimized with -Wc -02 there was no difference.

The throughput for gcc was over 10x greater than gas. I don't use gas, I prefer electric. Image
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Fastest PRNG around, the XORSHIFT

Post by deltarho[1859] »

Come to think of it a simpler version is:

Code: Select all

Const As Double DmulF = 1.0/2^32
Test

Code: Select all

Const As ULong ULmax = &hFFFFFFFF
Const As Double DmulF = 1.0 / (ULmax + 1ull)
Const As Double DmulX = 1.0/2^32
print DmulF
print DmulX
Sleep
gives

Code: Select all

 2.328306436538696e-010
 2.328306436538696e-010
MrSwiss
Posts: 3910
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: Fastest PRNG around, the XORSHIFT

Post by MrSwiss »

deltarho[1859] wrote:Come to think of it a simpler version is:
I was expecting such a comment from somebody, that only speaks decimal.
For those speaking binary too, as well as using both constants in code, my
version makes probably more sense.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Fastest PRNG around, the XORSHIFT

Post by deltarho[1859] »

MrSwiss wrote:I was expecting such a comment from somebody, that only speaks decimal.
You have a rather awkward way of communicating with people which is probably why your posts get a poor response.
using both constants in code, my version makes probably more sense.
But if only the second is needed then the first is obligatory.

Better sense for me is
Const As ULong ULmax = &hFFFFFFFF
Const As Double DmulF = 1.0/2^32

where we can use either or both.

Image
MrSwiss
Posts: 3910
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: Fastest PRNG around, the XORSHIFT

Post by MrSwiss »

deltarho[1859] wrote:But if only the second is needed then the first is obligatory.
Well, as it is in my code I need LUmax, even before ever touching DmulF.
In the initializer procedures of the PRNG.
Therefore, the use, in the second constant's definition, comes 'for free'.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Fastest PRNG around, the XORSHIFT

Post by deltarho[1859] »

In your case, yes, as for the forum I'll let members decide - I will wager that the majority will agree with me.
MrSwiss
Posts: 3910
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: Fastest PRNG around, the XORSHIFT

Post by MrSwiss »

I'll give you two possible solutions of 'Pythagoras' aka: 'distance', between two 2D vectors,
often used in conjunction, with fbGFX coding. (Aka: hypotenuse)

This one is used, by the more advanced coders:

Code: Select all

#Define distance(hd, vd)	( Sqr((hd)*(hd) + (vd)*(vd)) )	' vd = vertical distance | hd = horizontal dist.
This one is not that often seen:

Code: Select all

#Define distance(hd, vd)	( Sqr((hd)^2 + (vd)^2) )
Why do you think, that that is so? (rhetoric question)

Answer: the first method is faster, than the second (the result is the same).
The 'power of' operator is very costly, in terms of speed, compared to 'mul'.
(that's my reason to avoid it, whenever possible)
You where the one, that once stated: "speed, speed, speed", remember?
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Fastest PRNG around, the XORSHIFT

Post by deltarho[1859] »

MrSwiss wrote:Answer: the first method is faster, than the second (the result is the same).
The 'power of' operator is very costly, in terms of speed, compared to 'mul'.
(that's my reason to avoid it, whenever possible)
I already knew that Operator ^ (Exponentiate) is very costly - I did not start programming last Wednesday. In one of my Monte Carlo applications I use 'If Sqr( x*x + y*y) <= 1 Then lCount += 1' and not 'If Sqr( x^2 + y^2) <= 1 Then lCount += 1'.

When distance(hd, vd) is encountered in the source code it is replaced by the body of the defintion. In the first case * will be calculated and in the second case ^ will be calculated for each and every instance of distance(hd,vd) in the source code.

When we use /2^32 in the value of Const it is only determined once and not every time the symbol name is encountered in the source code.
coderJeff
Site Admin
Posts: 4313
Joined: Nov 04, 2005 14:23
Location: Ontario, Canada
Contact:

Re: Fastest PRNG around, the XORSHIFT

Post by coderJeff »

MrSwiss wrote:Answer: the first method is faster, than the second (the result is the same).
Careful with integers if the calculation might overflow the integer type.

Code: Select all

dim x as ulongint = 4000000000
dim y as ulongint = 3000000000

print sqr( x ^ 2 + y ^ 2 ) '' 5000000000 -- correct
print sqr( x * x + y * y ) '' 2559932797.22153 -- wrong
For integer operands, power operator (x^y) implicitly converts to floating point (cdbl(x)^cdbl(y))

And for floating point (single/double) operands. fbc will optimize (x^2) to (x*x).
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Fastest PRNG around, the XORSHIFT

Post by deltarho[1859] »

That is interesting. If MrSwiss' vd and hd are integers the second example will find itself in the float environment which is slower than the integer environment. On the other hand if vd and hd are floats then the two examples should perform equally. The Monte Carlo app that I referred to was originally written in PowerBASIC and that would compile x^2 as given because that is a straight BASIC to asm to native compiler, like gas.
Post Reply