**[Venn diagrams](venn_diagram.md) are excellent for visualizing probabilities**. Imagine the space of all possibilities as a circle with area equal to 1 and then events as other, smaller circles inside this circle. The area occupied by each circle is the corresponding event's probability. Now imagine performing an experiment as choosing at random a point in the big circle, for example by blindly throwing a dart. It's clear that the larger an event's circle is, the higher chance it has of being hit. Events with non-overlapping circles are mutually exclusive as there is no way the dart could ever land simultaneously in two non-overlapping areas. It's also clear that the probability of one of several mutually exclusive events occurring is the sum of the corresponding circles' areas, just as stated by the equation above. Overlapping circles represent events allowed to happen simultaneously. Should events *x* and *y* overlap, then the conditional probability *P(x|y)* is the proportion of *x*'s area inside *y* to the whole area of *y*. And so on.
**Probability distribution [functions](function.md)**: until now we've implicitly assumed that all possible outcomes (events) of an experiment are equally likely to occur, i.e. that for instance each marble in a box has the same likelihood of being picked etc. In real life scenarios this frequently doesn't hold, however: for example the likelihood of a human being born with red hair is lower than that of being born with dark hair (considering we don't have further information about the parents etc.). This is modeled by a so called *probability distribution function* -- this function says how likely each possible outcome is. For a finite number of discrete outcomes, such as possible hair colors, the function may simply state the probability directly, e.g. *p_hair_color(black) = 0.75*, *p_hair_color(red) = 0.01* etc. For continuous values, such as human height, the situation gets slightly more complicated: the function cannot directly state a probability of a single value, only a probability of a value falling within a certain INTERVAL. Consider e.g. asking about the probability of a human being exactly 1.75 meters tall -- it's essentially 0, because anyone getting even very close to said height will always be at least a few micrometers off. So we should rather ask what's the probability of someone being between 1.75 and 1.76 meters tall, and this already makes good sense. For this reason continuous values are rather described by so called **probability density functions**, which must be [integrated](integral.md) over a given interval in order to obtain a direct probability. There further exist equivalent kinds of functions such as cumulative distribution functions that say the probability of the value being *x* or lower, but we won't delve into these now.
The most basic distribution is **uniform**, one under which all events are equally likely, i.e. the one that was our default assumption so far. It is kind of the "most random" distribution in the sense that we just lack any clue about what to expect. There is not much more to add here.
**Normal** distribution is probably the second one to mention as it's very common and describes plenty of variables measured in [real life](irl.md), such as [IQ](iq.md), height of trees in a forest etc. It's a continuous distribution and has two parameters: mean and standard deviation. The mean says the "center", "average" value, for example 100 for IQ. The curve has the bell shape, a kind of "hill" that's centered on the mean value and whose width depends on the standard deviation parameter. In essence this distribution says that the most likely values to be measured are the ones around the center (e.g. IQ 100), and values further and further away from the center (e.g. very low or high IQ) get progressively less likely to be observed. Normal distribution is so common in nature because it is what we get when we average many variables with uniform distribution. Consider for example that we let a computer generate 3 random numbers in the range 0 to 10 -- the likelihood of the average of these numbers being close to the middle value, 5, is quite high because there are MANY WAYS to obtain such an average (0, 5, 10; 5, 1, 9; 5, 5, 5; ...); however the likelihood of obtaining the average of 10 is very low because there is only one way to get it (10, 10, 10).
**Binomial distribution** is another useful one -- a discrete distribution telling us the probability of seeing exactly *x* successful experiments if we perform *n* experiments in total. Given success probability *p*, it is computed as:
*Bi(n,p,x) = binomial(n,x) * p^x * (1 - p)^(n - x)*