Update

2025-05-09 22:58:17 +02:00 · 2025-05-09 22:58:17 +02:00 · 1343c90ee8
commit 1343c90ee8
parent 64fd120266
19 changed files with 2055 additions and 1950 deletions
--- a/pseudorandomness.md
+++ b/pseudorandomness.md
@ -134,6 +134,7 @@ However the core of a pseudorandom generator is the quality of the sequence itse
 - **Try to [compress](compression.md) the sequence**: Truly random data should be basically impossible to compress -- you can exploit this fact and try to compress your sequence with some compression programs. It is ideal if the compression programs end up enlarging the file.
 - **Statistical tests**: Here you use objective mathematical tests -- there exist many very advanced tests, we'll only mention the very simple ones.
  - **[Histogram](histogram.md)**: Generate many numbers (but not more than the generator's period) and make a histogram (i.e. for every number count how many times it appeared) -- all numbers should appear roughly with the same frequency. If you make a nice generator, you should even see exactly the same count for every value generated -- this is explained above. Also count 1 and 0 bits in the whole sequence -- again there should be about the same number of them (exactly the same if you do it correctly). But keep in mind this only checks if you have correct frequencies of numbers, it says nothing about their distribution. Even a sequence 1, 2, 3, 4, 5, .... will pass this.
+  - **Streaks**: check the distribution of streak lengths, i.e. how many times there are runs of *n* same bits in a row, and at which position they occur. You can compute the theoretical probability distribution and it should look similar.
  - **Check statistical properties on (non-short) subintervals of the series**: This will already take into account where the numbers appear in the sequence. For example check if the average value over some smaller intervals are always close to middle value, which should hold in a series of random numbers; also the minimum and maximum value, histogram (distribution of the values) and other statistical measures should basically be close to being the same on smaller intervals as they are over the whole series if the intervals aren't very short -- i.e. just be careful about not picking too short intervals -- the smaller interval you pick, the more likely it will be (even in the ideal random sequence) that a statistical property will diverge from its expected value, i.e. don't test intervals smaller than let's say 10000 values.
  - **[Fourier transform](fourier_transform.md)** (and similar methods that give you the spectrum) -- the spectrum of the data should have equal amount of all frequencies, just like white noise.
  - **[Correlation](correlation.md) coefficients**: This is kind of the real proof of randomness, ideally no values should be correlated in your data, so you can try to compute some correlation coefficients, for example try to compute how much correlation there is between consecutive numbers (it's similar to plotting the data as coordinates and seeing if they form a line or not) -- you should ideally find no significant correlations. [Autocorrelation](autocorrelation.md) may be a good test (basically you keep performing [dot product](dot_product.md) of the series with a shifted version of the same series -- this mustn't diverge too much from 0; ideal white noise has a high peak for 0 shift and then 0 values elsewhere).