Threadsafe RANDOMIZE and RND
I merged changes to fbc 1.08 to make RANDOMIZE and RND thread-safe.
pull request: https://github.com/freebasic/fbc/pull/264
bug report: https://sourceforge.net/p/fbc/bugs/914/
In the end, I added a new mutex to the rtlib to serialize access to the single instance global state for the random number generators.
- No change in performance for single-threaded (i.e. non-multi-threaded) programs
- You might find RND to perform a little slower in multi-threaded programs due to the mutex
- I had considered using thread local storage (TLS), but it adds complexity and extra pointer look-ups and would still have the overhead of the generic RANDOMIZE and RND interface.
- I think using the mutex is the best we can do with the generic & simple RANDOMIZE + RND design & interface
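To make the mutex idea concrete, here is a minimal sketch in C (the language the rtlib is written in). It is not the rtlib code: the names `g_state`, `g_rnd_mutex`, `step_nolock` and `rnd32_locked` are made up for illustration, the generator is a plain 32-bit LCG chosen only for brevity, and POSIX threads are assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the single global RNG state. */
static uint32_t g_state = 1u;
static pthread_mutex_t g_rnd_mutex = PTHREAD_MUTEX_INITIALIZER;

/* The math part only: advances the shared state, no locking. */
static uint32_t step_nolock(void)
{
    g_state = g_state * 1664525u + 1013904223u;
    return g_state;
}

/* Thread-safe entry point: lock, advance the shared state, unlock. */
uint32_t rnd32_locked(void)
{
    int rc = pthread_mutex_lock(&g_rnd_mutex);
    assert(rc == 0);
    (void)rc;
    uint32_t r = step_nolock();
    pthread_mutex_unlock(&g_rnd_mutex);
    return r;
}
```

Every caller pays for one lock/unlock pair per number, which is where the extra cost in multi-threaded programs would come from.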
To expose some internals I also added a new header: ./inc/fbc-int/math.bi
Current version of full ./fbc-int/math.bi can be seen at https://github.com/freebasic/fbc/blob/m ... nt/math.bi
I'll show some examples following this post soon that might help explain what's going on in the rtlib.
I think FB's RANDOMIZE & RND are OK for general use. But if maximum random number bitrate is the goal, then FB's built-in RND won't be the best solution. To make fbc's RND as fast as possible, I think it would need a different design and API than what's being done with the current rtlib.
Re: Threadsafe RANDOMIZE and RND
coderJeff wrote: "...To make fbc's RND fast-as-possible, I think would need a different design and API than what's being done with current rtlib."
PCG32? Middle-Square Weyl Sequence? Squares RND? These are some implementations that can be used without needing to alter the API too much (if at all), and in the two latter cases their implementation is trivial...
Re: Threadsafe RANDOMIZE and RND
Hi paul doe,
Indeed, it would be very straightforward to add additional RNGs with the current API.
What I'm talking about in design and API is:
- Current RANDOMIZE & RND operate on a single global state
- runtime selectable RNG (i.e. RANDOMIZE selects an RNG to access through RND) - a virtue and a curse
- function RND( arg as single = 1.0 ) takes parameter, typically unused
- rtlib RND functions have an 'if' statement that checks 'arg' to return last number
- RND converts a 32-bit ulong to a double
- Overall, the overhead is multiple function pointer look-ups, plus a typically unused parameter, an if statement, and a conversion.
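The overhead listed above can be sketched in C roughly as follows. This is only an illustration of the shape of a generic interface, not the rtlib source: `generic_rnd`, `current_rng`, `last_value` and the LCG behind them are hypothetical names and choices.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*rng32_fn)(void);

/* One of possibly several generators (a simple LCG, for illustration only). */
static uint32_t lcg_state = 1u;
static uint32_t lcg_next32(void)
{
    lcg_state = lcg_state * 1664525u + 1013904223u;
    return lcg_state;
}

/* Selected at run time, the way RANDOMIZE picks an algorithm. */
static rng32_fn current_rng = lcg_next32;
static double last_value = 0.0;

/* Generic entry point: parameter, 'if', indirect call, conversion. */
double generic_rnd(float arg)
{
    if (arg == 0.0f)                        /* arg = 0 repeats the last number */
        return last_value;
    uint32_t r = current_rng();             /* call through a function pointer */
    last_value = (double)r / 4294967296.0;  /* 32-bit integer to double in [0,1) */
    return last_value;
}
```

None of these steps is expensive on its own, but together they can cost more than the generator math itself when the generator is small.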
For "Fastest":
- local automatic storage (no need for mutex locks)
- directly instance a specific RNG and state (i.e. a template or class)
- if RNG math is 'small', inline the function as an expression (fbc doesn't do inline - can only fake it with a macro)
The example programs for testing the "performance" of RNGs tend to call the function and do pretty much nothing with the result.
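As a rough sketch of the "Fastest" points, here is a caller-owned state with an inlinable step function, written in C. The generator is Marsaglia's xorshift32, picked only because it is tiny. It is not one of the fbc generators, and the type and function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Caller-owned state: can live in a local variable, so no mutex is needed. */
typedef struct { uint32_t s; } xorshift32_state;

static inline void xorshift32_seed(xorshift32_state *st, uint32_t seed)
{
    st->s = seed ? seed : 1u;   /* the state must never be zero */
}

/* Small enough for the compiler to inline at the call site. */
static inline uint32_t xorshift32_next(xorshift32_state *st)
{
    uint32_t x = st->s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    st->s = x;
    return x;
}
```

Each thread can declare its own `xorshift32_state`, so there is no shared data and nothing to lock.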
Re: Threadsafe RANDOMIZE and RND
Current version of full ./fbc-int/math.bi can be seen at https://github.com/freebasic/fbc/blob/m ... nt/math.bi
The examples following are based on the header as of the first addition.
Example #1: RANDOMIZE & RND in the FBC namespace, plus enum FB_RND_ALGORITHMS
1) The 'fbc' namespace is used for names relating to fbc internals. If the interface is ever formally published, names should go in the 'fb' namespace. The '#undef' statements remove RANDOMIZE & RND from the global namespace.
2) On Windows, fbc automatically adds the "advapi32" import library if the built-in RANDOMIZE & RND are used. Because we are removing the built-ins from the global namespace, we need to do it manually for the fbc namespace.
3) Enumeration of 'FB_RND_ALGORITHMS': I tend to prefer named things rather than magic constants. This formalizes the selection of the random number generator.
4) The extern "rtlib" block puts the RANDOMIZE & RND functions into the fbc namespace, where they can be called using 'fbc.RANDOMIZE' or 'fbc.RND' respectively.
EDIT: FB.FB_RND_ALGORITHMS is defined in fbmath.bi
Code: Select all
'' move built-ins out of the global namespace
#undef rnd
#undef randomize
#if defined( __FB_CYGWIN__) or defined(__FB_WIN32__)
    #inclib "advapi32"
#endif
namespace FBC
    enum FB_RND_ALGORITHMS
        FB_RND_AUTO
        FB_RND_CRT
        FB_RND_FAST
        FB_RND_MTWIST
        FB_RND_QB
        FB_RND_REAL
    end enum
    extern "rtlib"
        declare sub randomize alias "fb_Randomize" ( byval seed as double = -1.0, byval algorithm as long = FB_RND_AUTO )
        declare function rnd alias "fb_Rnd" ( byval n as single = 1.0 ) as double
    end extern
end namespace
Code: Select all
#include once "fbc-int/math.bi"
'' fbc-int/math.bi automatically includes fbmath.bi
fbc.randomize , fb.FB_RND_FAST
for i as integer = 1 to 10
print fbc.RND()
next
Last edited by coderJeff on Nov 07, 2020 13:11, edited 1 time in total.
Re: Threadsafe RANDOMIZE and RND
Example #2: FBC.RND32() returns 32-bit random number
With FBC.RND32():
- still thread safe; the mutex is used if multithreaded
- does not expect any extra parameter
- does not perform any extra if statement internally
- does not do any conversion to double
- returns a 32-bit ulong only
With this rtlib entry point, you can avoid some of the overhead of RND() while still remaining threadsafe.
Code: Select all
#include once "fbc-int/math.bi"
for i as integer = 1 to 10
print fbc.rnd32()
next
Re: Threadsafe RANDOMIZE and RND
Example #3: Examining Internals with info = fbc.RndGetState()
The fbc.RndGetState() function retrieves some internal information about the random number generator state.
This example uses the function and displays the internal information in what is hopefully a human readable format. The code is followed by sample output on win64.
Out of all the internal RNG functions exposed, this structure is the one most likely to change if there are additional updates to the RANDOMIZE & RND internals.
Code: Select all
#include once "fbmath.bi"
#include once "fbc-int/math.bi"
function RndAlgoToStr( byval algo as fb.FB_RND_ALGORITHMS ) as string
static algos(0 to 5) as zstring ptr = _
{ _
@"FB_RND_AUTO", _
@"FB_RND_CRT", _
@"FB_RND_FAST", _
@"FB_RND_MTWIST", _
@"FB_RND_QB", _
@"FB_RND_REAL" _
}
if( (algo >=0) and (algo <= 5) ) then
function = *algos( algo) & " (" & algo & ")"
else
function = "Unknown" & " (" & algo & ")"
end if
end function
'' MAIN
dim info as fbc.FB_RNDSTATE ptr
'' get the internal state
info = fbc.RndGetState( )
print "RNDSTATE Address : " & hex( cuint(info), sizeof(any ptr)*2 ) & " (ulong ptr)"
'' Select a random number generator
fbc.RANDOMIZE , fb.FB_RND_MTWIST
print "Algorithm : " & RndAlgoToStr( info->algorithm )
print "FB_RND_MTWIST, FB_RND_REAL:"
print " State Length : " & info->length & " (# of bytes)"
print "Interface:"
print " RND(single) as double : " & hex( cuint(info->rndproc), sizeof(any ptr)*2 ) & " (procptr)"
print " RND32() as ulong : " & hex( cuint(info->rndproc32), sizeof(any ptr)*2 ) & " (procptr)"
print "FB_RND_MTWIST, FB_RND_REAL:"
print " State address : " & hex( cuint(@(info->state32(0))), sizeof(any ptr)*2 ) & " (ulong ptr)"
print " State index : " & hex( cuint(info->index32), sizeof(any ptr)*2 ) & " (ulong ptr)"
print "FB_RND_FAST, FB_RND_QB :"
print " State value (iseed64) : " & hex( culngint(info->iseed64), sizeof(ulongint)*2 ) & " (ulongint)"
print " State value (iseed32) : " & hex( culng(info->iseed32), sizeof(ulong)*2 ) & " (ulong)"
Code: Select all
RNDSTATE Address : 00000000004090C0 (ulong ptr)
Algorithm : FB_RND_MTWIST (3)
FB_RND_MTWIST, FB_RND_REAL:
State Length : 624 (# of bytes)
Interface:
RND(single) as double : 0000000000402B95 (procptr)
RND32() as ulong : 0000000000402955 (procptr)
FB_RND_MTWIST, FB_RND_REAL:
State address : 00000000004090E8 (ulong ptr)
State index : 0000000000409AA8 (ulong ptr)
FB_RND_FAST, FB_RND_QB :
State value (iseed64) : 0000000000000000 (ulongint)
State value (iseed32) : 00000000 (ulong)
Last edited by coderJeff on Nov 07, 2020 14:24, edited 1 time in total.
Reason: 'fbc.RndGetInternals( @info )' change to 'info = fbc.RndGetState()'
Re: Threadsafe RANDOMIZE and RND
Unless I am wrong it seems to me that a mutex is used to ensure that the random number generator state, that is the state vector, is not being accessed by more than one thread at a time whilst using the one state vector location.
coderJeff wrote: "- You might find RND to perform a little slower on multi-threaded programs due the mutex"
Suppose that RND drops to 80%, say, then two threads will see 160%. That is impossible using just one state vector location. With two threads, the best would be roughly 50% each.
I looked at Mersenne Twister a little while ago using two threads and found 16%/16% because of collisions. Getting 50%/50% is a vast improvement but getting a total exceeding 100% is not possible, I reckon, using the one state vector location.
With thread safety and one state vector location we get what I would call sequence sharing, that is some generated numbers would go to one thread and the others would go to the other thread. The quality of randomness would not be compromised. I would suggest that the quality of randomness may be improved as the serial correlation coefficient may be smaller for the child generations compared with the parent.
With PCG32II, for example, we get thread safety not from using a mutex, or whatever, but by using separate locations for the state vector. With no collisions, the throughput for each generator is the same as with a single instance. Not only that, we can have each generator using its own sequence.
Now whilst making thread safety available to the FreeBASIC generators is laudable, especially with fbc.rnd32() and fbc.RndGetInternals( @info ), it seems to me to be a little late in the day given that they are outshone by modern generators from Melissa O'Neill, David Blackman & Sebastiano Vigna, and Bernard Widynski. These generators are much faster, have a much better quality of randomness, and offer 32-bit ulong output and access to the state vector. With my latest Vigna generators we get 64-bit ulongint output.
Having said that, a graphics program which uses a few hundred RNDs, at most, as part of its initialization could probably get away with using any of the FreeBASIC generators, except for #5. There are no tools available that I am aware of which can give a reasonable assessment, from a randomness perspective, of a few hundred random numbers. However, if speed and/or quality of randomness are issues then the FreeBASIC generators simply no longer pass muster. At PowerBASIC there are quite a few RND diehards who reckon that PowerBASIC's RND is 'good enough' and provide no evidence to support their claim. Once shown where PowerBASIC's RND is not good enough, they cease posting. Of course, that is true for anyone wearing blinkers.
Re: Threadsafe RANDOMIZE and RND
Hi deltarho. Correct, the mutex is to ensure only one thread accesses the single state vector.
fbc.rnd() & fbc.rnd32() automatically lock/unlock the mutex when getting the next value.
However, fbc-int/math.bi also exposes fbc.MathLock() and fbc.MathUnlock(). And fbc.RndGetState() returns a state structure holding function pointers to the random number generator procedures, which just do the math part, with no locking or unlocking. This makes it possible to lock, generate a bunch of numbers at once, and then unlock.
Also, at the suggestion of adeyblue, when FB_RND_REAL is used (which uses the crypto API), the Mersenne Twister buffer is used to buffer 624 ulongs at one time. As he put it, reading a single value is criminal. On my PC, FB_RND_REAL is about 20 times slower than the other PRNGs.
Example #4: Explicit Lock, RndProc32, Unlock
Code: Select all
#include "fbgfx.bi"
#include "fbmath.bi"
#include "fbc-int/math.bi"
dim shared info as fbc.FB_RNDSTATE ptr
sub FillImageRnd32( byval image as fb.image ptr )
    '' example is for 4 bytes per pixel only
    assert( image->bpp = 4)
    fbc.MathLock()
    dim dst as ulong ptr = cast( ulong ptr, image + 1 )
    for y as integer = 0 to image->height - 1
        for x as integer = 0 to image->width-1
            dst[ y * (image->pitch \ 4) + x] = info->rndproc32()
        next
    next
    fbc.MathUnlock()
end sub
'' for demonstration only FB_RND_REAL is much slower than all other PRNGs
fbc.randomize , fb.FB_RND_REAL
info = fbc.RndGetState( )
screenres 640, 480, 32
dim as fb.image ptr image = ImageCreate( 128, 128 )
do
    FillImageRnd32( image )
    put( 0, 0 ), image, pset
loop until inkey <> ""
ImageDestroy( image )
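The lock-once pattern of Example #4 can also be sketched in C. This is a stand-alone illustration, not the rtlib: `math_mutex`, `next32_nolock` and `fill_rnd32` are hypothetical names, the generator is a xorshift32 step used only as a placeholder, and POSIX threads are assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

static pthread_mutex_t math_mutex = PTHREAD_MUTEX_INITIALIZER;
static uint32_t shared_state = 1u;   /* single shared state, must be non-zero */

/* Math only (xorshift32 step on the shared state): caller must hold the lock. */
static uint32_t next32_nolock(void)
{
    uint32_t x = shared_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    shared_state = x;
    return x;
}

/* Lock once, generate n values, unlock once. */
void fill_rnd32(uint32_t *dst, size_t n)
{
    assert(dst != NULL || n == 0);
    pthread_mutex_lock(&math_mutex);
    for (size_t i = 0; i < n; i++)
        dst[i] = next32_nolock();
    pthread_mutex_unlock(&math_mutex);
}
```

The cost of the mutex is paid once per batch instead of once per number, at the price of holding the lock longer while other threads wait.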
Last edited by coderJeff on Nov 07, 2020 14:31, edited 1 time in total.
Reason: changes fbc-int.bi
Re: Threadsafe RANDOMIZE and RND
deltarho[1859] wrote: "With PCG32II, for example, we get thread safety not from using a mutex, or whatever, but by using separate locations for the state vector. With no collisions, the throughput for each generator is the same as with a single instance. Not only that we can have each generator using its own sequence.
"Now whilst making thread safety available to the FreeBASIC generators is laudable, especially with fbc.rnd32() and fbc.RndGetInternals( @info ), it seems to me to be a little late in the day given that they are out shined by modern generators from Melissa O'Neill, David Blackman & Sebastiano Vigna, and Bernard Widynski. These generators are much faster and have a much better quality of randomness and can avail themselves to 32-bit ulong output and access to the state vector. With my latest Vigna generators we get 64-bit ulongint output."
I debated with myself for about a week if I should even bother with this update. I expected that it would be received as lacking.
For a new PRNG to be added:
- adding it to RANDOMIZE & RND is straight-forward.
- For PCG32II or any other PRNG, I need an implementation that I can reference and use to code in C.
- Whatever source code I create or use needs to be releasable under LGPL 2.1 with our linking exception.
- This part is quite easy if someone can point me to the right algorithm to use.
To create individual state vectors:
- need something different from RANDOMIZE & RND only
- internally, https://github.com/freebasic/fbc/blob/m ... math_rnd.c needs a rewrite
- math_rnd.c needs to be rewritten so that a user allocated state vector can be specified
- but still need to have a global thread safe state vector by default for backwards source compatibility
- it would be nice to split up the PRNGs into separate modules, making it possible to link to a specific PRNG and not bloat the executable with unused code.
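One possible shape for such an interface, sketched in C: the caller may pass its own state, and passing NULL falls back to a mutex-protected global state so existing code keeps working. This is a design sketch only. `rng_state`, `rng_next32` and the xorshift32 step are hypothetical and not taken from math_rnd.c, and POSIX threads are assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t s; } rng_state;   /* must be seeded non-zero */

/* Default global state, kept for backwards source compatibility. */
static rng_state default_state = { 1u };
static pthread_mutex_t default_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Math only (xorshift32 step); works on whichever state it is given. */
static uint32_t rng_step(rng_state *st)
{
    uint32_t x = st->s;
    assert(x != 0u);
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    st->s = x;
    return x;
}

/* st != NULL: caller-owned state, no locking.
   st == NULL: shared default state, serialized by the mutex. */
uint32_t rng_next32(rng_state *st)
{
    if (st != NULL)
        return rng_step(st);
    pthread_mutex_lock(&default_mutex);
    uint32_t r = rng_step(&default_state);
    pthread_mutex_unlock(&default_mutex);
    return r;
}
```

With something like this, threads that own a state never touch the mutex, while plain RANDOMIZE & RND style calls would map to the NULL case.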
Re: Threadsafe RANDOMIZE and RND
Here's the test code I've been using to get an idea of the timings. Example output for 1 thread and for 4 threads follows the code.
With 4 threads, a lot of extra time is spent waiting on the mutex, but the no-lock timings are about the same.
Code: Select all
#include once "fbc-int/math.bi"
const outputfile = "rnd-results.txt"
const MAX_COUNT = 10000000
const MAX_TRIALS = 3
sub PrintOut( byref text as const string, byval dontlog as boolean = false )
print text;
if( dontlog = false ) then
open outputfile for append as #1
print #1, text;
close #1
end if
end sub
#if __FB_MT__
#ifdef NTHREADS
const MAX_THREADS = NTHREADS
#else
const MAX_THREADS = 4
#endif
#else
const MAX_THREADS = 1
#endif
type TESTINFO
n as ulongint
rndinfo as FBC.FB_RNDINTERNALS
end type
sub do_rnd(byval p as any ptr)
dim arg as TESTINFO ptr = p
dim d as double
for i as integer = 1 to arg->n
d += fbc.rnd()
next
end sub
sub do_rnd32(byval p as any ptr)
dim arg as TESTINFO ptr = p
dim d as ulongint
for i as integer = 1 to arg->n
d += fbc.rnd32()
next
end sub
sub do_rnd_nolock(byval p as any ptr)
dim arg as TESTINFO ptr = p
dim d as double
for i as integer = 1 to arg->n
d += arg->rndinfo.rndproc()
next
end sub
sub do_rnd32_nolock(byval p as any ptr)
dim arg as TESTINFO ptr = p
dim d as ulongint
for i as integer = 1 to arg->n
d += arg->rndinfo.rndproc32()
next
end sub
function PerformTest _
( _
title as string, _
thread as sub(byval arg as any ptr), _
threads as integer, _
count as ulongint _
) as double
printout( title, true )
dim as double t = timer
dim arg as TESTINFO
arg.n = count
fbc.rndGetInternals( @(arg.rndinfo) )
#if __FB_MT__
dim as any ptr thread_ptr(threads-1)
for i as integer = 0 to threads-1
thread_ptr(i)=threadcreate(thread, @arg)
sleep 10
next i
for i as integer = 0 to threads-1
threadwait(thread_ptr(i))
next i
#else
for i as integer = 0 to threads-1
thread( @arg )
next
#endif
t = timer - t
printout( ": " & cuint(t*1000) & " msec" & !"\n", true )
function = t
end function
type TESTPROC
title as zstring ptr
proc as sub (byval arg as any ptr)
nolocks as boolean
end type
dim testprocs(0 to ...) as TESTPROC = _
{ _
( @"rnd" , @do_rnd , false ), _
( @"rnd32" , @do_rnd32 , false ), _
( @"rndproc (nolock)" , @do_rnd_nolock , true ), _
( @"rndproc32 (nolock)", @do_rnd32_nolock , true ) _
}
type GENERATOR
title as zstring ptr
index as integer
iterations as integer
end type
dim generators(0 to ... ) as GENERATOR = _
{ _
( @"CRT" , 1, MAX_COUNT ), _
( @"FAST" , 2, MAX_COUNT ), _
( @"MTWIST" , 3, MAX_COUNT ), _
( @"QB" , 4, MAX_COUNT ), _
( @"REAL" , 5, MAX_COUNT\20 ) _
}
type RESULT
avg_t as double
end type
dim results(0 to ubound( testprocs ), 0 to ubound(generators) ) as RESULT
for trial as integer = 1 to MAX_TRIALS
for test as integer = 0 to ubound( testprocs )
for gen as integer = 0 to ubound(generators)
fbc.randomize , generators( gen ).index
dim title as string = ""
dim as integer threads = MAX_THREADS
dim as integer iterations = generators( gen ).iterations
#if __FB_MT__
title &= "MT "
#else
title &= "ST "
#endif
title &= "trial #" & trial
if( testprocs( test ).nolocks ) then
threads = 1
title &= ", threads=" & 1 & " (main only)"
else
title &= ", threads=" & threads
end if
title &= ", N=" & iterations
title &= " gen=" & *generators( gen ).title
title &= " test=" & *testprocs( test ).title
var t = PerformTest( title, testprocs( test ).proc, threads, iterations\threads )
'' update the running average over trials (nanoseconds per call)
with results( test, gen )
.avg_t = (.avg_t * cdbl(trial-1) + t*1000000000/iterations ) / cdbl(trial)
end with
next
next
next
printout( !"\n" )
#ifdef __FB_64BIT__
printout( "64-bit - " )
#else
printout( "32-bit - " )
#endif
printout( __FB_BACKEND__ & !"\n" )
#if __FB_MT__
printout( "Threads : " & MAX_THREADS & !"\n" )
printout( "Threads (nolock) : " & 1 & !" (main only)\n" )
#else
printout( "Single threaded" & !"\n" )
#endif
printout( "Average of " & MAX_TRIALS & " trials (in nanosec)" & !"\n" )
printout( space(20) )
for col as integer = 0 to ubound(generators)
printout( right( space(10) & *generators(col).title, 10 ) )
next
printout( !"\n" )
for test as integer = 0 to ubound(testprocs)
printout( left( *testprocs(test).title & space(20), 20 ) )
for gen as integer = 0 to ubound(generators)
printout( right( space(10) & cuint( results( test, gen ).avg_t ), 10 ) )
next
printout( !"\n" )
next
OUTPUT EXAMPLE:
By compiling for 1 thread, I can compare what I think is the overhead of locking/unlocking against non-locking.
$ fbc-win32 dotest.bas -gen gas -exx -d NTHREADS=1 -mt
Code: Select all
32-bit - gas
Threads : 1
Threads (nolock) : 1 (main only)
Average of 3 trials (in nanosec)
                           CRT      FAST    MTWIST        QB      REAL
rnd                        252       232       301       229      7435
rnd32                      187       129       209       129      7365
rndproc (nolock)           166       140       213       135      7242
rndproc32 (nolock)          99        44       118        49      7247
$ fbc-win32 dotest.bas -gen gas -exx -d NTHREADS=4 -mt
Code: Select all
32-bit - gas
Threads : 4
Threads (nolock) : 1 (main only)
Average of 3 trials (in nanosec)
                           CRT      FAST    MTWIST        QB      REAL
rnd                        574       508       625       577      8855
rnd32                      415       545       547       548      8652
rndproc (nolock)           167       140       212       135      7145
rndproc32 (nolock)          97        44       118        50      7272
Re: Threadsafe RANDOMIZE and RND
coderJeff wrote:
Example #4: Explicit Lock, RndProc32, Unlock
Code: Select all
#include "fbgfx.bi"
#include "fbc-int/math.bi"
dim shared info as fbc.FB_RNDINTERNALS
sub FillImageRnd32( byval image as fb.image ptr )
    '' example is for 4 bytes per pixel only
    assert( image->bpp = 4)
    fbc.MathLock()
    dim dst as ulong ptr = cast( ulong ptr, image + 1 )
    for y as integer = 0 to image->height - 1
        for x as integer = 0 to image->width-1
            dst[ y * (image->pitch \ 4) + x] = info.rndproc32()
        next
    next
    fbc.MathUnlock()
end sub
fbc.randomize , fbc.FB_RND_REAL
fbc.RndGetInternals( @info )
screenres 640, 480, 32
dim as fb.image ptr image = ImageCreate( 128, 128 )
do
    FillImageRnd32( image )
    put( 0, 0 ), image, pset
loop until inkey <> ""
ImageDestroy( image )
Code to compile with the '-mt' option.
Re: Threadsafe RANDOMIZE and RND
coderJeff wrote: "As he [adeyblue] put it, reading a single value is criminal. On my PC, FB_RND_REAL is about 20 times slower than the other PRNGs."
That is what I do with CryptoRndII; I use 128KB buffers when using BCryptGenRandom and 32KB buffers when using Intel RdRand. There is a bit more to it than that: Two buffers are used and both filled initially. When the first buffer is exhausted we switch to the second buffer and then start filling the first buffer again, and so on. There is a bit more: Each buffer is split into two and each half is populated with a separate thread of execution. In practice the likelihood of waiting for a buffer to be filled before it can be used is almost nil.
It is worth noting that CryptGenRandom, used with generator #5 in Windows, was designed to fill a buffer.
With regard to the timings, you forgot to remove '-exx', which normally knocks the stuffing out of the performance.
I have found that gas is diabolically slow with my generators, so I use gcc with -O2 optimization.
coderJeff wrote: "a lot of extra time is spent waiting on the mutex"
If that is true, then I dislike the sound of that. The FreeBASIC generators are already slow compared with modern generators. Of course, with some applications speed is not an issue.
I am a dissenting voice here. I would be interested in what others have to say.
Re: Threadsafe RANDOMIZE and RND
I get results tighter than yours:
Code: Select all
32-bit - gas
Single threaded
Average of 3 trials (in nanosec)
                           CRT      FAST    MTWIST        QB      REAL
rnd                         81        38        50        15      3687
rnd32                       52        10        19         9      3604
rndproc (nolock)            83        44        53        17      3708
rndproc32 (nolock)          60        14        24        15      3589
Code: Select all
32-bit - gas
Threads : 4
Threads (nolock) : 1 (main only)
Average of 3 trials (in nanosec)
                           CRT      FAST    MTWIST        QB      REAL
rnd                        374       306       345       213      3839
rnd32                      339       204       248       198      3883
rndproc (nolock)            83        44        63        20      3760
rndproc32 (nolock)          61        15        24        14      3647
Re: Threadsafe RANDOMIZE and RND
Keep in mind that using TLS itself increases execution time a bit because it induces successive indirections for all accesses to these thread-local static variables.
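For comparison, a thread-local variant might look like the following C sketch. It assumes a C11 compiler for `_Thread_local`. The names and the xorshift32 step are illustrative only and are not how the rtlib is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Each thread gets its own copy of the state; no mutex is involved. */
static _Thread_local uint32_t tls_state = 1u;

void tls_seed(uint32_t seed)
{
    tls_state = seed ? seed : 1u;   /* the state must never be zero */
}

uint32_t tls_next32(void)
{
    uint32_t x = tls_state;   /* every access goes through the thread's TLS block */
    assert(x != 0u);
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    tls_state = x;
    return x;
}
```

Each thread would have to seed its own copy, otherwise all threads start from the same initial state and produce the same sequence.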
Re: Threadsafe RANDOMIZE and RND
fxm wrote: "I get results tighter than yours:"
Using the same code and command line?