This commit is contained in:
Miloslav Ciz 2024-03-14 23:30:14 +01:00
parent 6ac96e5e81
commit a8a438148b
33 changed files with 1816 additions and 1753 deletions

View file

@ -2,9 +2,9 @@
*Not to be confused with [pseudorandomess](pseudorandomness.md).*
Randomness means unpredictability, lack of patterns and/or behavior without cause. Random events can only be predicted imperfectly using [probability](probability.md), because there is something present that's subject to chance, something we don't know; events may be random to us either because they are inherently random (i.e. they really have no cause, pattern etc.) or because we just lack knowledge or practical ability to perfectly predict the events. Randomness is one of the most basic, yet also one of the most difficult concepts to understand about our [Universe](universe.md) -- it's a phenomenon of immense practical importance, we encounter it every second of our daily lives, but it's also of no lesser interest to science, philosophy, art and religion. Whole libraries could be filled just with books about this topic, here we will be able to only scratch the surface of it by taking a look at the very basics of randomness, mostly as related to [programming](programming.md) and [math](math.md).
Randomness means unpredictability, lack of patterns, and/or behavior without cause. Random events can only be predicted imperfectly using [probability](probability.md) because there is something present that's subject to chance, something we don't know; events may be random to us either because they are inherently random (i.e. they really have no cause, pattern etc.) or because we just lack knowledge or practical ability to perfectly predict the events. Randomness is one of the most basic, yet also one of the most difficult concepts to understand about our [Universe](universe.md) -- it's a phenomenon of uttermost practical importance, we encounter it every second of our daily lives, but it's also of no lesser interest to science, philosophy, art and religion. Whole libraries could be filled just with books about this topic, here we will be able to only scratch the surface of it by taking a look at the very basics of randomness, mostly as related to [programming](programming.md) and [math](math.md).
As with similarly wide terms, the word *randomness* and *random* may be defined in different ways and change meaning slightly depending on context, for example sometimes we have to distinguish between "true" randomness, such as that we encounter in [quantum mechanics](quantum.md) or that present in nondeterministic mathematical models, and [pseudorandomness](pseudorandomness.md) (what as a programmer you'll be probably dealing with), i.e. imitating this true randomness with [deterministic](determinism.md) ("non-randomly behaving") systems, e.g. sequences of numbers that are difficult to [compress](compression.md). Other times we call random anything at all that just deviates from usual order, as in "someone started randomly spamming me in chat". Let's briefly review a few terms related to this topic:
As with similarly wide spanning terms the word *randomness* and *random* may be defined in different ways and change meaning slightly depending on context, for example sometimes we have to distinguish between "true" randomness, such as that we encounter in [quantum mechanics](quantum.md) or that present in nondeterministic mathematical models, and [pseudorandomness](pseudorandomness.md) (what as a programmer you'll be probably dealing with), i.e. imitating this true randomness with [deterministic](determinism.md) ("non-randomly behaving") systems, e.g. sequences of numbers that are difficult to [compress](compression.md). Other times we call random anything at all that just deviates from usual order, as in "someone started randomly spamming me in chat". Let's briefly review a few terms related to this topic:
- **randomness**: The wide term meaning great unpredictability, which may be inherent or just apparent. We usually divide it to:
- **true randomness**: Randomness that is caused by inherently unpredictable behavior of a system, i.e. behavior that truly has no cause and is decided purely by chance, without ever being able to be perfectly predicted, even just theoretically; this is contrasted with pseudorandomness. A typical example given is [quantum physics](quantum.md) in which true randomness seems to be present in things such as some properties of elementary particles of the Universe -- though in fact this can never be proven with certainty, there is so much evidence of us not being able to predict quantum phenomena that we just mostly take it for the closest thing to true randomness in real world. However we can also see some purely mathematical models to have true randomness, simply because they define it so, e.g. a nondeterministic [Turing machine](turing_machine.md) is simply defined to sometimes make purely random decisions.
@ -16,7 +16,7 @@ As with similarly wide terms, the word *randomness* and *random* may be defined
- **[stochasticity](stochastic.md)**: Basically mathematics that deals with randomness and probability in some way, the term is often used as an attribute of a mathematical model, i.e. stochastic model is that which is somehow described in terms of probabilities.
- **[entropy](entropy.md)**: A measure related to randomness, saying how much [information](information.md) (in [bits](bit.md)) we can extract from given message -- the higher the randomness (unpredictability), the higher the entropy because this randomness may be used to carry information.
Keep in mind **there are different "amounts" of randomness**, i.e. you should consider that **[probability distributions](probability_distribution.md)** exist and that some processes may be random only a little. It is not like there are only completely predictable and completely unpredictable systems, oftentimes we just have some small elements of chance or can at least estimate which outcomes are more likely. We see absolute randomness (i.e. complete unpredictability) only with uniform probability distribution, i.e. in variables in which all outcomes are equally likely -- for example rolling a dice. However in real life variables some values are usually more likely than others -- e.g. with adult human male height values such as 175 cm will be much more common than 200 cm; great many real life values actually have [normal distribution](normal_distribution.md) -- the one in which values around some center value are most common.
Keep in mind **there are different "amounts" of randomness** -- that is to say you should consider that **[probability distributions](probability_distribution.md)** exist and that some processes may be random only a little. It is not like there are only completely predictable and completely unpredictable systems, oftentimes we just have some small elements of chance or can at least estimate which outcomes are more likely. We see absolute randomness (i.e. complete unpredictability) only with uniform probability distribution, i.e. in variables in which all outcomes are equally likely -- for example rolling a dice. However in real life variables some values are usually more likely than others -- e.g. with adult human male height values such as 175 cm will be much more common than 200 cm; great many real life values actually have [normal distribution](normal_distribution.md) -- the one in which values around some center value are most common.
**What do random numbers look like?** This is a tricky question. Let's now consider uniform probability distribution, i.e. "absolute randomness". When we see sequences of numbers such as [1, 2, 3, 4, 5, 6, 7], [0, 0, 0, 0, 0, 0, 0, 0] or [9, 1, 4, 7, 8, 1, 5], which are "random" and which not? Intuitively we would say the first two are not random because there is a clear pattern, while the third one looks pretty random. However consider that under our assumption of uniform probability distribution all of these sequences are equally likely to occur! It is just that there are only very few sequences in which we recognize a common pattern compared to those that look to have no pattern, so we much more commonly see these sequences without a pattern coming out of random number generators and therefore we think the first two patterns are very unlikely to have come from a random source. Indeed they are, but the third, "random looking" sequence is equally unlikely (if you bet the numbers in lottery, you are still very unlikely to win), it just has great many weird looking siblings. You have to be careful, things around probability are great many times very unintuitive and tricky (see e.g. the famous [Monty Hall problem](monty_hall.md)).
@ -32,7 +32,7 @@ One of the most basic is the **[chi-squared test](chi_squared_test.md)** whose d
{ The following is a method I came up with wrote about here (includes some code): https://codeberg.org/drummyfish/my_writings/src/branch/master/randomness.md, I haven't found what this is called, it probably already exists. If you know what this method is called, please send me a mail. ~drummyfish }
**Cool randomness test**: this test tries to measure the unpredictability, the inability to predict what binary digit will follow. As an input to the test we suppose a binary sequence *S* of length *N* bits that's repeating forever (for example for *N = 2* a possible sequence is 10 meaning we are really considering an infinite sequence 1010101010...). We suppose an observer knows the sequence and that it's repeating (consider he has for example been watching us broadcast it for a long time and he noticed we are just repeating the same sequence over and over), then we ask: if the observer is given a random (and randomly long) subsequence *S2* of the main sequence *S*, what's the average probability he can correctly predict the bit that will follow? This average probability is our measured randomness *r* -- the lower the *r*, the "more random" the sequence *S* is according to this test. For different *N* there are different minimum possible values of *r*, it is for example not possible to achieve *r < 0.7* for *N = 3* etc. The following table shows this test's most random sequences for given *N*, along with their count and *r*.
**Cool randomness test**: this test attempts to measure the unpredictability, the inability to predict what binary digit will follow. As an input to the test we suppose a binary sequence *S* of length *N* bits that's repeating forever (for example for *N = 2* a possible sequence is 10 meaning we are really considering an infinite sequence 1010101010...). We suppose an observer knows the sequence and that it's repeating (consider he has for example been watching us broadcast it for a long time and he noticed we are just repeating the same sequence over and over), then we ask: if the observer is given a random (and randomly long) subsequence *S2* of the main sequence *S*, what's the average probability he can correctly predict the bit that will follow? This average probability is our measured randomness *r* -- the lower the *r*, the "more random" the sequence *S* is according to this test. For different *N* there are different minimum possible values of *r*, it is for example not possible to achieve *r < 0.7* for *N = 3* etc. The following table shows this test's most random sequences for given *N*, along with their count and *r*.
| seq. len. | most random looking sequences |count| min. r |
| --------- | --------------------------------------------------------------------------------------------- | --- | ------ |