Miloslav Ciz 2024-02-17 10:47:29 +01:00
parent d52544f211
commit 2c60dc3a2e
22 changed files with 1722 additions and 1663 deletions

@ -2,7 +2,27 @@
*Not to be confused with [pseudorandomness](pseudorandomness.md).*
TODO
Randomness means unpredictability, lack of patterns and/or behavior without cause. Random events can only be predicted imperfectly using [probability](probability.md), because there is something present that's subject to chance, something we don't know; events may be random to us either because they are inherently random (i.e. they really have no cause, pattern etc.) or because we just lack knowledge or practical ability to perfectly predict the events. Randomness is one of the most basic, yet also one of the most difficult concepts to understand about our [Universe](universe.md) -- it's a phenomenon of immense practical importance, we encounter it every second of our daily lives, but it's also of no lesser interest to science, philosophy, art and religion. Whole libraries could be filled just with books about this topic, here we will be able to only scratch the surface of it by taking a look at the very basics of randomness, mostly as related to [programming](programming.md) and [math](math.md).
As with similarly wide terms, the words *randomness* and *random* may be defined in different ways and change meaning slightly depending on context. For example, sometimes we have to distinguish between "true" randomness, such as that we encounter in [quantum mechanics](quantum.md) or that present in nondeterministic mathematical models, and [pseudorandomness](pseudorandomness.md) (what you as a programmer will probably be dealing with), i.e. imitating this true randomness with [deterministic](determinism.md) ("non-randomly behaving") systems, e.g. by generating sequences of numbers that are difficult to [compress](compression.md). Other times we call random anything at all that just deviates from usual order, as in "someone started randomly spamming me in chat". Let's briefly review a few terms related to this topic:
- **randomness**: The wide term meaning great unpredictability, which may be inherent or just apparent. We usually divide it into:
  - **true randomness**: Randomness that is caused by inherently unpredictable behavior of a system, i.e. behavior that truly has no cause and is decided purely by chance, without ever being able to be perfectly predicted, even just theoretically; this is contrasted with pseudorandomness. A typical example given is [quantum physics](quantum.md) in which true randomness seems to be present in things such as some properties of elementary particles of the Universe -- though this can never be proven with certainty, there is so much evidence of us not being able to predict quantum phenomena that we mostly take it for the closest thing to true randomness in the real world. However some purely mathematical models also have true randomness, simply because they define it so, e.g. a nondeterministic [Turing machine](turing_machine.md) is simply defined to sometimes make purely random decisions.
  - **[pseudorandomness](pseudorandomness.md)**: Randomness that's at its basic level generated by a completely deterministic system, i.e. something (e.g. a sequence of numbers) that practically looks like it was generated by a truly random system but in fact stems from something completely non-random (e.g. a computer program). This is contrasted with true randomness. Chaotic systems are mostly used to implement pseudorandomness. Pseudorandomness is used to imitate true randomness e.g. in computers, because it is mostly [good enough](good_enough.md) and true randomness is difficult to achieve.
- **non[determinism](determinism.md)**: Attribute of a [system](system.md), such as a mathematical model or a physics theory, of involving true randomness.
- **[chaos](chaos.md)**: Behavior that is deterministic (i.e. without true randomness) but which, due to its mathematical properties, is practically impossible to predict as there is no "nice" equation for it, so it practically has the same implications as true randomness. Chaotic behavior is predictable in theory but not in practice as it basically just requires "brute force" simulation, and so we often treat chaotic systems the same as completely random ones, with statistics and probability.
- **[probability](probability.md)**: Mathematical theory examining randomness, it formally models systems that include randomness and reasons about them, it gives us equations, for example it says how we infer the exact probability of something happening knowing probabilities of some individual events etc. It is a theoretical area and stresses deductive reasoning, i.e. it starts by defining a system and reasons about what such system will do.
- **[statistics](statistics.md)**: Applying probability theory to examining [data](data.md) -- like probability it is a mathematical discipline, however it is applied (rather than purely theoretical) and stresses inductive reasoning, i.e. it works "in the other direction" than probability theory; statistics starts with having some data and then tries to find a probabilistic model that would likely produce such data, potentially revealing what system really lies underneath the data.
- **[stochasticity](stochastic.md)**: Basically mathematics that deals with randomness and probability in some way, the term is often used as an attribute of a mathematical model, i.e. a stochastic model is one that is somehow described in terms of probabilities.
- **[entropy](entropy.md)**: A measure related to randomness, saying how much [information](information.md) (in [bits](bit.md)) we can extract from a given message -- the higher the randomness (unpredictability), the higher the entropy because this randomness may be used to carry information.
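To make the pseudorandomness entry above a bit more concrete, here is a minimal sketch of a linear congruential generator, one common and very simple kind of pseudorandom generator. The function name `lcg` and the seeds are just illustrative choices, and the constants are the often quoted ones from *Numerical Recipes*; nothing here is meant as a recommended generator. The point is only that the output looks messy while being completely determined by the seed.

```python
def lcg(seed, count, a=1664525, c=1013904223, m=2**32):
    """Return `count` pseudorandom integers in the range 0 .. m-1.

    Each value is computed from the previous one by a fixed formula,
    so the whole sequence is determined by the seed alone.
    """
    state = seed
    values = []
    for _ in range(count):
        state = (a * state + c) % m  # the deterministic update step
        values.append(state)
    return values


first = lcg(seed=42, count=5)
second = lcg(seed=42, count=5)
other = lcg(seed=43, count=5)

print(first)
print(first == second)  # same seed: exactly the same "random" numbers
print(first == other)   # different seed: a different sequence
```

Anyone who knows the formula and the seed can predict every number, which is exactly what separates this from true randomness.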
Keep in mind **there are different "amounts" of randomness**, i.e. you should consider that **[probability distributions](probability_distribution.md)** exist and that some processes may be random only a little. It is not like there are only completely predictable and completely unpredictable systems, oftentimes we just have some small elements of chance or can at least estimate which outcomes are more likely. We see absolute randomness (i.e. complete unpredictability) only with a uniform probability distribution, i.e. in variables in which all outcomes are equally likely -- for example rolling a fair die. However in real-life variables some values are usually more likely than others -- e.g. with adult human male height, values such as 175 cm will be much more common than 200 cm; a great many real-life values actually have an approximately [normal distribution](normal_distribution.md) -- the one in which values around some center value are the most common.
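The difference between the two distributions can be seen in a small simulation. The following is only a sketch using Python's standard `random` module: the die is modeled with `randint`, the heights with `gauss`, and the standard deviation of 7 cm is purely an assumption made for illustration (only the 175 cm center comes from the text above).

```python
import random
from collections import Counter

rng = random.Random(1)  # fixed seed, only so that the sketch is repeatable
SAMPLES = 60000

# Uniform distribution: every face of a fair die is equally likely.
rolls = Counter(rng.randint(1, 6) for _ in range(SAMPLES))

# Normal distribution: values cluster around the center (175 cm);
# the spread of 7 cm is an assumed, illustrative value.
heights = [rng.gauss(175, 7) for _ in range(SAMPLES)]
near_175 = sum(1 for h in heights if 170 <= h < 180)
near_200 = sum(1 for h in heights if 195 <= h < 205)

for face in sorted(rolls):
    print("face", face, "came up", rolls[face], "times")
print("heights within 5 cm of 175 cm:", near_175)
print("heights within 5 cm of 200 cm:", near_200)
```

The die faces should come out roughly equally often, while heights near 175 cm should vastly outnumber those near 200 cm.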
**What do random numbers look like?** This is a tricky question. Let's now consider uniform probability distribution, i.e. "absolute randomness". When we see sequences of numbers such as `1, 2, 3, 4, 5, 6, 7`, `0, 0, 0, 0, 0, 0, 0` or `9, 1, 4, 3, 9, 1, 5`, which are "random" and which are not? Intuitively we would say the first two are not random because there is a clear pattern, while the third one looks pretty random. However consider that under our assumption of uniform probability distribution all of these sequences (each being seven numbers long) are equally likely to occur! It is just that there are only very few sequences in which we recognize a pattern compared to those that seem to have no pattern, so what we see coming out of random number generators are almost always sequences without a pattern, and therefore we think the first two sequences are very unlikely to have come from a random source. Indeed they are, but the third, "random looking" sequence is equally unlikely (if you bet these numbers in a lottery, you are still very unlikely to win), it just has a great many weird looking siblings. You have to be careful, things around probability are very often unintuitive and tricky (see e.g. the famous [Monty Hall problem](monty_hall.md)).
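The argument above can be checked with a bit of counting. The following sketch assumes each number is an independent, uniformly chosen decimal digit (0 to 9) and uses three sequences of length 7 in the spirit of the ones above. The helper `looks_patterned` is a deliberately crude, made-up stand-in for "a human would see a pattern": it only accepts constant runs and runs going up or down by one.

```python
from fractions import Fraction
from itertools import product


def sequence_probability(seq, outcomes=10):
    """Probability of one exact sequence of independent uniform draws."""
    return Fraction(1, outcomes) ** len(seq)


def looks_patterned(seq):
    """Crude pattern test: all steps between neighbors are the same
    and equal to -1, 0 or +1 (e.g. 3,3,3 or 1,2,3 or 7,6,5)."""
    steps = {b - a for a, b in zip(seq, seq[1:])}
    return len(steps) == 1 and steps <= {-1, 0, 1}


examples = [(1, 2, 3, 4, 5, 6, 7), (0,) * 7, (9, 1, 4, 3, 9, 1, 5)]
for seq in examples:
    # The probability is identical for all three, only the "look" differs.
    print(seq, sequence_probability(seq), looks_patterned(seq))

# Count how rare the "patterned" sequences are among all digit
# sequences of a shorter length (kept short so this runs quickly).
length = 5
patterned = sum(
    1 for seq in product(range(10), repeat=length) if looks_patterned(seq)
)
print(patterned, "patterned sequences out of", 10 ** length)
```

Every individual sequence is equally improbable, but the patterned ones form a tiny minority, which is why a generator almost never seems to produce them.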
Of course we cannot say just from the sequence alone whether it was generated randomly or not, the sequences above may have been generated by true randomness or by a pseudorandom generator -- we even see this is kind of a stupid thing to ask. We should rather think about what we actually mean by asking whether the sequence is "random" -- to get meaningful answers we have to specify this first. If we formulate the question precisely, we may get precise answers. Sometimes we are looking for lack of patterns -- this can be tested by programs that look for patterns, e.g. [compression](compression.md) programs; number sequences that have regularities in them can be compressed well. We may examine the sequence's [entropy](entropy.md) to say something about its "randomness". Mathematicians often like to ask "how likely is it that a sequence with these properties was generated by this model?", i.e. for example when listening to signals from space and capturing some numeric sequence, we may compute its properties, such as the distribution of values in it, and then ask how likely it is that such a sequence was generated by some natural source such as an exploding star or a black hole. If we conclude this is very unlikely, we may say the signal was probably not generated randomly and may e.g. come from intelligent lifeforms.
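Two of the tests mentioned above, compression and entropy, can be sketched in a few lines. This is only an illustration under some assumptions: `zlib` stands in for "a compression program", the "noisy" data comes from Python's own pseudorandom generator, and the entropy is estimated naively from symbol frequencies alone (the function names are made up for this example).

```python
import math
import random
import zlib
from collections import Counter


def compression_ratio(data):
    """Compressed size divided by original size (lower = more regular)."""
    return len(zlib.compress(data, 9)) / len(data)


def shannon_entropy(symbols):
    """Entropy in bits per symbol, estimated from symbol frequencies only."""
    counts = Counter(symbols)
    total = len(symbols)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


rng = random.Random(0)  # fixed seed, only so that the sketch is repeatable
patterned = bytes([1, 2, 3, 4, 5, 6, 7] * 1000)         # obvious regularity
noisy = bytes(rng.randrange(256) for _ in range(7000))  # no obvious pattern

print("patterned ratio:", compression_ratio(patterned))
print("noisy ratio:    ", compression_ratio(noisy))

print("entropy of eight zeros:", shannon_entropy([0] * 8))
print("entropy of 1..7:       ", shannon_entropy([1, 2, 3, 4, 5, 6, 7]))
```

Note that the frequency based entropy of `1, 2, 3, 4, 5, 6, 7` comes out high even though the sequence has an obvious pattern, because this simple estimate ignores the order of values. This is one more reminder that what we get depends on precisely what question we ask.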
TODO: moar
## Randomness Tests