Oh no, not another PRNG

General FreeBASIC programming questions.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Oh no, not another PRNG

Post by jj2007 »

deltarho[1859] wrote:
jj2007 wrote:And that would take how much time?
You already gave me a figure of 1200 years - that has not changed.
1200 years is the worst case. On average it should be around 300 years then - still ok, though ;-)
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Oh no, not another PRNG

Post by deltarho[1859] »

This may be a daft question but instead of zipping and then encrypting why not encrypt and then zip. We don't get a giveaway in the head of the encrypted file.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Oh no, not another PRNG

Post by jj2007 »

deltarho[1859] wrote:This may be a daft question but instead of zipping and then encrypting why not encrypt and then zip. We don't get a giveaway in the head of the encrypted file.
There is no point in zipping the encrypted file because it is not compressible. Try it out, or look at this:Image
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Oh no, not another PRNG

Post by deltarho[1859] »

Of course, the encrypted file is random. Image My home-brew doesn't XOR with a random number.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Oh no, not another PRNG

Post by jj2007 »

deltarho[1859] wrote:Of course, the encrypted file is random. Image My home-brew doesn't XOR with a random number.
Yep, it's pretty random. Which technique do you use for your home-brew?
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Oh no, not another PRNG

Post by deltarho[1859] »

It uses a random addition/subtraction.

With xoroshiro128** using one round a 1MB file gets encrypted in 10.6ms, two rounds in 21.2ms, three rounds in 32.0ms and so on.

I see at MASM " a 20MB file encrypts and decrypts in about 150 milliseconds". Is that using one 32-bit PRNG or two?

Code: Select all

'#Console On
#Include "xoroshiro128.bas"
#Define CrLf Chr(10, 13)
 
' ------------------------------------------------------------
Sub EncDec( message As String, ByVal Seed As String, ByVal Rounds As ulong, ByVal flag As Long )
' flag = 1 for encryption and -1 for decryption
Dim As Long i, j, k = Len(message), temp
x128.MyRandomize( Seed )
For i = 1 To Rounds
  For j = 1 To k
    temp = x128.Range(0,255)
    message[j-1] = Asc(message, j) + temp*IIf( (temp <= 128), flag, -flag) ' Random addition/subtraction
  Next
Next
End Sub
' ------------------------------------------------------------
 
' Example usage
Dim s As String
Dim i As Long
Dim As Double t
 
's = "The time has come the walrus said" + CrLf
's += "to talk of many things" + CrLf
's += "of shoes, and ships, and sealing wax," + CrLf
's += "of cabbages, and kings," + CrLf
'S += "and why the sea is boiling hot," + CrLf
's += "and whether pigs have wings."
 
open "Test.txt" for binary as #1
s = String(LOF(1), 0)
get #1, , s
close #1
 
'Print s + CrLf
t = Timer
EncDec( s, "1844674407370955161", 3, 1 )
t = Timer - t
'Print s + CrLf
'EncDec( s, "1844674407370955161", 3, -1 )
'Print s
'Print
Print t*1000
 
Sleep
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Oh no, not another PRNG

Post by deltarho[1859] »

For a 20MB file
1 round 211ms
2 rounds 422.5ms
3 rounds 635.7ms

I reckon 3 rounds would be headache to break - but, according to my comments above, what do I know?
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Oh no, not another PRNG

Post by jj2007 »

deltarho[1859] wrote:It uses a random addition/subtraction.
That should yield similar results as an xor, indeed.
I see at MASM " a 20MB file encrypts and decrypts in about 150 milliseconds". Is that using one 32-bit PRNG or two?
Two. I just tested it with a 288,536,365 bytes file: just under 5 seconds, 18ms/MB. Another one runs at 617ms/42MB=14.7ms/MB. Which is much slower than the 150ms/20MB=7.5ms/MB, probably due to cache effects.

Here are some typical ENT results for MasmBasic Rand() - "matchPos" is the period:

Code: Select all

############ ENT results Rand(), seed=Sc03, matchPos=3.87 GB, ct=0
Entropy = 7.999988 bits per byte.

Optimum compression would reduce the size
of this 16777216 byte file by 0 percent.

Chi square distribution for 16777216 samples is 283.67, and randomly
would exceed this value 25.00 percent of the times.

Arithmetic mean value of data bytes is 127.5063 (127.5 = random).
Monte Carlo value for Pi is 3.142549787 (error 0.03 percent).
Serial correlation coefficient is 0.000062 (totally uncorrelated = 0.0).

############ ENT results Rand(), seed=Sc04, matchPos=3.87 GB, ct=0
Entropy = 7.999990 bits per byte.

Optimum compression would reduce the size
of this 16777216 byte file by 0 percent.

Chi square distribution for 16777216 samples is 242.89, and randomly
would exceed this value 50.00 percent of the times.

Arithmetic mean value of data bytes is 127.5034 (127.5 = random).
Monte Carlo value for Pi is 3.142116342 (error 0.02 percent).
Serial correlation coefficient is 0.000113 (totally uncorrelated = 0.0).
We did a lot of testing, google site:masm32.com "ent" "rand" to see some threads. Unfortunately, with PractRand Rand() fails at 128kB.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Oh no, not another PRNG

Post by deltarho[1859] »

Two rules for PRNGs.

Rule 1:
Top quality distribution uniformity is a necessary condition for top quality randomness but it is not sufficient.

Rule 2:
Top quality randomness is a sufficient condition for top quality distribution uniformity but it is not necessary.

With Monte Carlo work if we took the random numbers used and sorted them in ascending order or descending order the Monte Carlo result would be the same. So Monte Carlo doesn't care how the random numbers come in. However, Monte Carlo would suffer badly if the distribution has poor uniformity. The reason why we use a generator of top quality randomness is not because of its randomness but because of Rule 2.

@jj2007

I don't think that your home-brewed encryption requires top quality randomness. I am not even sure that top quality distribution uniformity is required either but I would think that good quality would be preferred over a biased distribution.

So, the fact that your generator fails PractRand at 128KB is immaterial. The first four metrics of ENT tells us that we have good distribution uniformity and since it is "almost 4 times as fast as PCG32" then it seems to me to be an ideal partner for your home-brewed encryption. It may also be useful for Monte Carlo work.

Failing PractRand at 128KB tells us that it was good to 64KB. That gives us 16K of dword floats. So if we needed more than 16K of dword floats then we should be considering a different generator; if randomness is an issue.

Added: The first for metrics of ENT are about Rule 1. If one or more of them is poor there is no point in moving on to PractRand. FreeBASIC's generator #4 has an amazing distribution uniformity but has poor quality randomness.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Oh no, not another PRNG

Post by jj2007 »

Thanks, David. I also think that my encryption does not require the ultimate PractRand champion to work. For the ENT suite, it's a really good PRNG, though - and it's fast ;-)

Still, I'd be curious to know what kind of application does require top quality randomness. Obviously, there is a need for this, otherwise there wouldn't be that tough competition between Vigna, O'Neill and others...
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Oh no, not another PRNG

Post by deltarho[1859] »

The internet is awash with people saying that Chi-square is a good test for randomness. No, it is not, it is a good test for non-randomness by testing its distribution uniformity. If your observed data was sorted then the chi-square value would be the same because it concerns itself with the relationship between expected values and observed values. Chi-squared is not concerned with randomness. The other four metrics of ENT, Entropy, Arithmetic Mean and Monte Carlo Value for Pi are also concerned with distribution uniformity. At Randomness we have: "Monte Carlo methods, which rely on random input (such as from random number generators or pseudorandom number generators), are important techniques in science.". No. Monte Carlo methods are not concerned with randomness. However, they would suffer if the distribution uniformity was poor; as mentioned above. PractRand also uses tests for distribution uniformity. Failing any of them will see PractRand abort. ENT uses Serial Correlation Coefficient which is concerned with randomness. PractRand does countless tests looking at previous bytes, words, dwords, qwords and so on dependent upon what we ask it to look at. George Marsaglia's Die Hard suite also considers 'randomness' tests, in addition to uniformity tests, but nothing like as many as PractRand.

Your ENT chi-square results give 25.00% and 50.00% telling us that it cannot find any evidence of poor distribution uniformity. According to my 'Rule 1' we need more than that to draw any conclusions with regard randomness.
For the ENT suite, it's a really good PRNG
No, it isn't because it failed PractRand at 128KB. No doubt it was doing quite well on the distribution uniformity tests but went down on the randomness tests.

Unfortunately, many people think that distribution uniformity and randomness are one and the same thing: They are not.
Still, I'd be curious to know what kind of application does require top quality randomness.
Most, if not all, of the graphics programs published on the forum use random numbers. I very much doubt that any of them require top quality randomness. If they use dodicat's Regulate function then speed is probably not an issue either. However, if the main loop has too much work then asking Regulate for 60fps may fail as 54 fps, say, may be as good as it can get. In that case a faster random number generator may help. In addition if we request a Sleep of less than 15.625ms then we get 15.625ms whether we like it or not. It may be worthwhile to use 'timeBeginPeriod"(As Ulong=1)' so if we request a Sleep of 5ms, say, then we will get a Sleep between 5ms and 6ms. I wonder how many people give any thought to how much work is being done in the main loop. Come to think of it your "almost 4 times as fast as PCG32" generator may do for our graphic's programmers using random numbers. Start a new thread introducing your generator.

OK, back to "top quality randomness". Off the top of my head I could not list what actually needs "top quality randomness". You would better served by using a search engine. I've just done a quick search and there are quite a few situations requiring "top quality randomness". There may well be members of this forum who insist on "top quality randomness".
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Oh no, not another PRNG

Post by jj2007 »

deltarho[1859] wrote:
For the ENT suite, it's a really good PRNG
No, it isn't because it failed PractRand at 128KB. No doubt it was doing quite well on the distribution uniformity tests but went down on the randomness tests.
I wrote explicitly "for the ENT suite". And for that suite, my values are really not so bad:

https://cdn.hackaday.io/files/397517006 ... iteria.txt

Code: Select all

Entropy
	7.7 and above passes.
Compression
	1% or less passes.
Chi-Square “exceed this value” percent.
	Less than 1% or greater than 99% fails.
	Between 1% and 5% or between 95% and 99% is "Suspect".
	Between 5% and 10% or between 90% and 95% is "Almost Suspect".
	Between 10% and 90% passes.
Arithmetic Mean
	Between 126 and 129 passes.
Monte Carlo - Pi
	Error less than 0.07% passes.
Serial Correlation
	Correlation smaller than 0.005 (plus or minus) passes.
MasmBasic Rand() values:

Code: Select all

Entropy = 7.999988 bits per byte.

Optimum compression would reduce the size of this 16777216 byte file by 0 percent.

Chi square distribution for 16777216 samples is 283.67, and randomly
would exceed this value 25.00 percent of the times.

Arithmetic mean value of data bytes is 127.5063 (127.5 = random).
Monte Carlo value for Pi is 3.142549787 (error 0.03 percent).
Serial correlation coefficient is 0.000062 (totally uncorrelated = 0.0).
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Oh no, not another PRNG

Post by deltarho[1859] »

I repeat: "Unfortunately, many people think that distribution uniformity and randomness are one and the same thing: They are not."

At ENTROPY Source for true random numbers, the second link at https://cdn.hackaday..., we have, with regard the ENT suite, "The single most important value this program produces is the Chi Square test results which measure how uniform the distribution of the test data is.".

That is what I have been saying. A Chi square failure indicates poor distribution uniformity which, in turn, implies poor quality of randomness. A Chi square success does not imply quality randomness; for that we need randomness tests. ENT only has one randomness test: Serial Correlation Coefficient. https://cdn.hackaday... reckons "Correlation smaller than 0.005 (plus or minus) passes." I reckon not and would be happier with 0.0005.

The Entropy link mentions George Marsaglia's 'Die Hard' and 'Die Harder' suites. If you look at the tests the vast majority are randomness tests. TestU01 has been used for quite a few years and was used by Melissa O'Neill. O'Neill had never heard of PractRand when she developed PCG32 and now uses PractRand. Vigna also used TestU01 and came a cropper when PractRand started being used. The NIST test suite is poor. The Entropy link does not mention PractRand.

See Testing the quality of PRNGs and the answer, written in 2014, which starts "There are several statistical tests suites available." That answer was written by Chris Doty-Humphrey, the author of PractRand. He reckons that only PractRand, TestU01, and possibly RaBiGeTe are worth using. If you look at the "found bias in" figures at the top of his answer you will see why. I have RaBiGeTe but disliked using it, it has a chronic interface. I never managed to go get TestU01 working properly. PractRand is easy to use.

LCGs look quite good with ENT. PowerBASIC's RND looks good with ENT. Both fail PractRand at 4KB.

If your only interest is in distribution uniformity then ENT is OK. However, if your interest is in randomness ENT will only tell us when distribution uniformity is poor, with the first four metrics, and, therefore, randomness is poor. To test for randomness quality we need randomness tests and PractRand has plenty of them.

If we knocked out some data with a generator which passes PractRand to 1TB we would get good quality distribution uniformity. If we then sorted the data, ascending or descending, and gave that data to PractRand it would fail in a split second. Give that same sorted data to ENT and the first four metrics would give us the thumbs up; the Serial Correlation Coefficient would not.

I repeat my Rule 1: Top quality distribution uniformity is a necessary condition for top quality randomness but it is not sufficient.

O'Neill, Vigna, Doty-Humphrey and many others understand that but many folk on the internet do not. When I first got interested in random numbers I didn't either.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Oh no, not another PRNG

Post by jj2007 »

David,

You are right in all points, and I don't want to argue with our top expert about top quality randomness. My only point here was this one:
Me: For the ENT suite, it's a really good RNG
You: No, it isn't because it failed PractRand

Can you see the logical flaw? If you had written "It's not a good RNG", no problem, we could argue about a serial correlation coefficient of 0.00006 being good or not good but that's it. However: for ENT it's good. PractRand comes to another result, and that's OK - different tools have different purposes, and produce different results, as you have explained above. Thanks again, David.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Oh no, not another PRNG

Post by deltarho[1859] »

Jochen, there is no logical flaw. The ENT suite, for the first four metrics, does not tell us whether we have a good RNG or not - it only tells us when we haven't. The first paragraph of the ENT website says the program "is useful for evaluating pseudorandom number generators for encryption and statistical sampling applications". Statistical sampling applications are one of the areas requiring top drawer randomness. Top drawer distribution uniformity is not enough. I reckon that ENT's author, John Walker, believes ENT is good for randomness tests. It is not - it is good for distribution uniformity tests. John Walker wrote ENT in 2008 and I believe that at the time he did not appreciate Rule 1.

As I mentioned, the Wikipedia article on randomness says "Monte Carlo methods, which rely on random input ...". No, they don't, they rely on top drawer distribution uniformity and not "random input". Here we are in 2020 and people are still getting it wrong.

Marsaglia includes randomness tests in his 'DieHard' suite, and he wrote that in 1995.

You may come back and say: "Hold on there, David, are you saying you are right and John Walker is wrong". That is exactly what I am saying. It is worth noting that O'Neill and Vigna both used to use just TestU01 and now use PractRand in addition. I have never seen either of them mention ENT.

With regard the Serial Correlation Coefficient that is only one way to test randomness but bear in mind ENT examines a stream of bytes - PractRand examines a stream of Dwords with your 32-bit output generator and Qwords with the two Vigna generators above.

I should add that Steve Hutchesson swears by ENT. I had a similar discussion with him a few years back, but I was unable to get him to budge. He also reckoned that top drawer distribution uniformity was a sufficient condition for randomness. Grrrrr! Image
Post Reply