This commit is contained in:
Miloslav Ciz 2023-05-21 15:09:23 +02:00
parent a7b086a309
commit 7a4c69819f
14 changed files with 59 additions and 18 deletions


@ -8,7 +8,7 @@ TODO
WORK IN PROGRESS { Also I'm not too good at statistics lol. ~drummyfish }
Here is a sequence of 1000 bits which we most definitely could consider truly random as it was generated by physical coin tosses:
{ The method I used to generate this: I took a plastic bowl and 10 coins, then for each round I threw the coins into the bowl, shook them (without looking, just in case), then rapidly turned it around and smashed it against the ground. I took the bowl up and wrote the ten generated bits by reading the coins kind of from "top left to bottom right" (heads being 1, tails 0). ~drummyfish }
@ -105,4 +105,11 @@
```
number:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
count:  18 14 19 18 23 15 18 11 11 14  9 10 13 20 18 19
```
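The frequency count above can be sketched in code. Since the article's coin-toss bits aren't reproduced here, the sketch below uses Python's pseudorandom generator as a stand-in (hypothetical data, not the original sequence); it reads the bit sequence in non-overlapping groups of four and tallies each 4-bit value:

```python
import random

# Stand-in data: 1000 pseudorandom bits instead of the article's coin tosses.
random.seed(123)
bits = [random.randint(0, 1) for _ in range(1000)]

# Count how often each 4-bit value (0..15) appears when the sequence
# is read in non-overlapping groups of four bits (250 groups total).
counts = [0] * 16
for i in range(0, len(bits) - 3, 4):
    value = bits[i] * 8 + bits[i + 1] * 4 + bits[i + 2] * 2 + bits[i + 3]
    counts[value] += 1

print(counts)       # for random data we expect a roughly flat distribution
print(sum(counts))  # -> 250
```

For truly random bits each of the 16 values should appear about 250/16 ≈ 15.6 times, which roughly matches the counts in the table.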
TODO: see how much some compression program can compress it? Visualize it somehow to reveal correlations?
Another way to test data randomness is to **try to [compress](compression.md) it**, since compression is essentially based on removing regularities and redundancy, leaving only randomness. A compression algorithm exploits [correlations](correlation.md) in the input data and removes whatever can later be reasoned out from what remains; in completely random data nothing is correlated, no part can be reasoned out from any other part, so compression has nothing to remove and it generally shouldn't be possible to compress completely random data (though of course there is a non-zero probability that in rare cases random data will happen to have a regular structure and will compress). Let us perform this test with the `lz4` compression utility -- we convert our 1000 random bits to 125 random bytes and try to compress them. Then we try to compress another sequence of 125 bytes, this time a non-random one -- a repeated alphabet in ASCII (`abcdefghijklmnopqrstuvwxyzabcdef...`). Here are the results:
| sequence (125 bytes) | compressed size |
| -------------------- | ---------------- |
| our random bits | 144 (115.20%) |
| `abcdef...` | 56 (44.80%) |
We see that while the algorithm compressed the non-random sequence to less than half its original size, it wasn't able to compress our data at all -- it actually made it bigger! This supports the data being truly random. Of course it would be good to test multiple compression algorithms and see whether any of them finds some regularity in the data, but the general idea has been presented.
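The compression test can be reproduced as a small sketch. The code below uses Python's standard-library `zlib` instead of the `lz4` utility, and pseudorandom bytes as a stand-in for the article's 125 coin-toss bytes (both substitutions are assumptions; the exact sizes will differ from the table above, but the qualitative result is the same):

```python
import random
import zlib

# Stand-in for the 125 coin-toss bytes: pseudorandom bytes.
random.seed(0)
random_bytes = bytes(random.randrange(256) for _ in range(125))

# The non-random comparison sequence: repeated lowercase alphabet, 125 bytes.
alphabet = (b"abcdefghijklmnopqrstuvwxyz" * 5)[:125]

compressed_random = zlib.compress(random_bytes, 9)
compressed_alpha = zlib.compress(alphabet, 9)

# Random data gains a little size (format overhead, nothing to remove),
# while the regular sequence shrinks substantially.
print(len(compressed_random), len(compressed_alpha))
```

Running this shows the same pattern as the `lz4` experiment: the "compressed" random bytes end up slightly larger than 125 bytes, while the alphabet sequence compresses to a fraction of its size.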