
Probability

Probability (coming from Latin probabilitas, credibility; colloquially synonymous with chance) is a mathematical measure expressing how likely something is to be true (i.e. credibility, plausibility, degree of belief, ...), vitally important to all science and many fields of intellectual endeavor such as statistics, games, economy, computational art, simulations and others, also being of high interest to non-exact fields such as philosophy. Probability can also be thought of as a generalization of definitive statements about truthfulness (i.e. deciding between a few discrete options such as true, untrue and unknown) to enabling an infinite continuum of possible degrees of confidence about truthfulness of propositions, i.e. expressing "how strongly we believe something is true" using a real number value between 0 and 1 (allowing both bounds), where 1 signifies absolute certainty of the proposition being true, 0 absolute certainty of it being untrue and 0.5 absolute lack of knowledge about whether it might be true or false. Common people more often express probability as a percentage or "1 in X" value, i.e. instead of 0.25 our common speech rather uses 25% or 1 in 4, but mathematicians prefer the "0 to 1" value. Examples of probabilistic statements are: "The probability of dying in a car accident is approximately 0.01.", "A perfectly fair coin flip has a 50% probability of landing heads." etc.

Probability is the most essential concept in mathematical models exhibiting unpredictability, be it truly random systems (inherently unpredictable by their very nature, such as Markov chains) or just highly chaotic ones (deterministic and predictable in theory but unpredictable in practical terms, e.g. the game of life). Indeed, most aspects of our life are more or less subject to chance and even the more complex mathematical models are eventually unpredictable to some degree, even if only due to high computational cost; there is hardly ever any absolute certainty of anything, and so the idea of probability is eventually unavoidable when applying mathematics to any practically encountered problem.

Simple calculations with probabilities, ones typically met by ordinary programmers for instance, are usually not overly difficult, but there's a caveat: everyone must be aware of the dangers lurking in more complex probabilistic problems, because probabilities are notoriously tricky and difficult, not just because the more complex problems naturally necessitate more complicated equations, but especially because of frequently appearing unintuitive results, perhaps best exemplified by the famous Monty Hall paradox, and further continuing to extremely divisive philosophical questions such as the Sleeping Beauty paradox and various plays on the anthropic principle, which split the opinions even of experts. Probabilistic calculations may, through combinatorics, easily bring in extremely large numbers, ones that are themselves unintuitive, hard to tame even by computers and still largely theoretically unexplored. Another complication is that errors in probability calculations are difficult to detect, as spotting a disparity between predicted and observed probability requires a large number of experiments. Once probability gets involved, a whole plethora of new concepts and mechanisms pours into our models and their analysis, such as statistical significance, p-values, probability distributions and real numbers, bringing additional headaches and room for errors. Furthermore probability is also difficult to understand from a philosophical point of view and opinions on the definition and interpretation of the term probability itself may differ, adding further noise to the debates. Is probability a concept inherent to our Universe through quantum mechanics, or is it just a construct of the human mind? Is probability a fixed measure, or does it change based on the observer and his knowledge? What level of confidence is high enough to consider a hypothesis "proven"? Etc. For these reasons we sometimes liken probability to the "quantum theory of mathematics", a concept surrounded by magic, misunderstanding and counterintuitive behavior similar to that seen in quantum mechanics (which itself stems from introducing probabilities into the fundamental model of our Universe).

Simple examples of the unintuitive nature of probability: flipping a coin 5 times in a row, is it more likely to see the series heads, tails, tails, heads, tails, or 5 heads in a row? Indeed anyone familiar with basic math knows the probabilities are the same, but people never confronted with the question almost universally see 5 heads in a row as less likely. And this gets trickier and trickier with more complex examples, for instance: rolling two dice simultaneously, what's the probability of rolling the same value on both dice? One might say it is 6 (the number of ways to roll the same value on both) divided by 6 x 6 (the total number of outcomes), i.e. 1/6; however someone else might count the total number of possibilities as combinations with repetition, Cr(6,2) = 21, arriving at the result 6/21 = 2/7, which actually makes some sense and is correct under a certain assumption (namely that each unordered pair of values is itself equally likely), but with respect to the real world we actually wanted the first result.
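
For the skeptical, the two dice question can also be checked empirically. The following is a minimal C sketch (the number of simulated experiments, the fixed seed and the use of the standard rand() function are arbitrary choices made just for illustration) that rolls two virtual dice many times and counts how often they show the same value; the printed estimate should come out close to 1/6 (about 0.167) rather than 2/7 (about 0.286):

```
#include <stdio.h>
#include <stdlib.h>

#define N 1000000 // number of simulated experiments

int main(void)
{
  srand(123); // fixed seed so the run is repeatable

  unsigned long same = 0; // how many times both dice showed the same value

  for (unsigned long i = 0; i < N; ++i)
  {
    int die1 = rand() % 6 + 1, // first die, 1 to 6 (ignoring tiny modulo bias)
        die2 = rand() % 6 + 1; // second die, 1 to 6

    if (die1 == die2)
      same++;
  }

  printf("estimated probability: %f (1/6 = %f)\n",
    (double) same / N, 1.0 / 6.0);

  return 0;
}
```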

NOTE on terminology: a term central to probabilistic mathematics is event -- something which may or may not occur as a result of what's usually called an "experiment" (also "sample", "data point", "observation", "instance" and so on). The likelihood of the event's occurrence is designated by the event's probability. An example of an event may be "coin landing heads" or "life existing on Mars". Although the term event is traditionally the most common one, we may also call it an outcome, condition etc., which may however bring in confusion (considering e.g. terms such as conditional probability and so on).

To address the mysteriousness and obscurity we poked at above, let's start by asking the most basic question: What IS probability really? If we claim that "Probability of life existing on Mars is X.", what do we really mean? Was this probability different 100 years ago when we had much less knowledge about the planet? Is the exact value subject to opinion? Is there a completely objective, unquestionable way to state such probability? This is all quite messy and non-mathematical talk, so what to do now? Of course different definitions of the term "probability" exist, but which one is the most reasonable and appropriate in a given situation? Well, firstly notice that as long as we're just performing formal mathematical calculations, guided by the traditional "shut up and calculate" mindset, we have no problem once we simply adopt a definition. Considering a set of options (e.g. future events), exactly one of which will occur with absolute randomness (i.e. without us having any further clue about how likely each one is to occur), we define the probability of condition (event) x being true as:

P(x) = (count of options satisfying x) / (count of all possible options)

i.e. the ratio of the size of the subset satisfying condition x to the size of the whole set of options. Considering for example a die roll and setting the condition (event) x to "rolling an odd number", we proceed to compute P(x) = 3 / 6 = 1/2. This nevertheless still leaves unanswered the original question of what exactly we mean by "probability of life existing on Mars", as we somehow have to connect our mathematical definition to the real world. What are all the possible options here? Which are the options satisfying our condition? There either is life on Mars or there isn't, and only the former satisfies our proposition, but does this imply the probability is exactly 1/2? Not if we consider that the two options themselves may have different probabilities, each of which further depends on our knowledge etc. So now what? Here comes the more opinionated, philosophical part of probability interpretation (note the similarity to quantum mechanics interpretations), i.e. relating the concept of probability to the real world. We now begin to unravel that the aura of mystery is in nature similar to that of, let's say, the concept of infinity. It's there due to most people generally sharing a vague intuitive idea of what the concept means, which works well enough in everyday communication and problem solving, but which won't suffice in the realms of deeper thinking about very abstract problems, and which breaks apart in such situations unless we make the concept clearer with a more precise definition, which is in this case done by the mentioned probability interpretations. We will now simplify giant volumes of literature to just a division into two main probability interpretations:
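
As a tiny illustration of the "shut up and calculate" approach, here is a minimal C sketch (the event "rolling an odd number" is just the example from the text, the helper function name is made up) that applies the above definition by brute force, enumerating all possible options and counting those that satisfy the condition:

```
#include <stdio.h>

// returns 1 if the event "rolled an odd number" holds for given roll
int isOdd(int roll)
{
  return roll % 2 == 1;
}

int main(void)
{
  int satisfying = 0, total = 0;

  for (int roll = 1; roll <= 6; ++roll) // enumerate all possible options
  {
    total++;

    if (isOdd(roll))
      satisfying++;
  }

  printf("P(odd) = %d/%d = %f\n", satisfying, total,
    (double) satisfying / total);

  return 0;
}
```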

  • frequentist: As the name suggests, here probability is equated to frequency of occurrence, i.e. nothing "magical" or supernatural, literally just the ratio of how many times a positive outcome is (on average) observed per N experiments. This is great as no debates about opinions can take place, we simply gather data and compute the probability, and everyone knows what the number means. For example to compute the probability of dying in a car accident, a frequentist will gather data about a million cases of deaths and sum up the ones that happened due to a car accident -- finding 10000 cases per million, he will conclude the probability is 0.01. Obviously we'll start facing issues when we can't perform many experiments, e.g. when asking about life on Mars, as there is only one Mars and we still can't tell whether there is life on it or not. Normally it is also impossible to compute the EXACT probability this way because that would only be achieved by performing infinitely many experiments, so we are always just converging to the true value with some margin of error that's only lowered by conducting more and more experiments, but staying forever just shy of the exact value (a small simulation of this converging behavior is sketched below, after this list).
  • evidential (also Bayesian, ...): Here probability is viewed as a basically subjective degree of confidence or belief, based on EVIDENCE (i.e. knowledge) we have. The subjectivity here is in the fact that each "observer" of the world may come up with a different probability of the same event due to having different knowledge than others -- an all-knowing god would always know the exact value of either 1 or 0. E.g. asking about life on Mars, we might say that, having found no liquid water on Mars along with the strong belief that liquid water is a prerequisite for life, we estimate the probability of life existing on Mars is let's say 0.01. And somebody else -- even us in the future, having new knowledge -- might come up with a different probability, and this is fine because obviously the TRUE answer is either 1 (there is life on Mars) or 0 (no life on Mars), we are only estimating which EXPECTATION is more rational (and by how much). Here we enter the realm of speculation and more esoteric, harder to grasp values, but we also expand the horizon of what we can apply probability to.
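
The following minimal C sketch (estimating the probability of rolling a 6, with the sample sizes and the fixed random seed being arbitrary illustration choices) shows the frequentist estimate in action, converging towards the true value 1/6 as the number of experiments grows, without ever being guaranteed to hit it exactly:

```
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  srand(123);

  unsigned long hits = 0, rolls = 0;

  // print the running frequentist estimate at increasing sample sizes
  for (unsigned long target = 10; target <= 10000000; target *= 10)
  {
    while (rolls < target)
    {
      if (rand() % 6 + 1 == 6) // did we roll a 6?
        hits++;

      rolls++;
    }

    printf("%8lu experiments: estimate = %f (true value = %f)\n",
      rolls, (double) hits / rolls, 1.0 / 6.0);
  }

  return 0;
}
```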

Basics

This section describes absolute basics of probability math.

As said above, probability is expressed with a real number between 0 and 1. The probability of event x is written as P(x) and is defined as the ratio of the number of outcomes satisfying x to the number of all possible outcomes. The numbers of possibilities are computed with the help of a closely related discipline called combinatorics.

Given MUTUALLY EXCLUSIVE (i.e. impossible to occur simultaneously) events x, y, z, ..., the probability of one of them occurring is the sum of their individual probabilities:

P(x OR y OR z OR ...) = P(x) + P(y) + P(z) + ...

For example the probability of a randomly picked individual having either blond or red hair is the probability of having blond hair plus the probability of having red hair. Notice that applying this rule to all the possible events has to result in the sum of probabilities being exactly 1. For example assuming a box with marbles, of which 1 is red, 2 are green and 3 are blue, and drawing one at random, the probability of picking the red marble is P(red) = 1/6, of picking a green one P(green) = 2/6 = 1/3 and of picking a blue one P(blue) = 3/6 = 1/2; adding these together (i.e. computing the probability of the marble being either red, green or blue) gives P(red OR green OR blue) = 1/6 + 1/3 + 1/2 = 1. That's quite logical and obvious. This fact can be conveniently exploited as many times it's easier to compute the probability of an event NOT happening than vice versa -- so we may compute this and then obtain our desired probability as 1 minus the computed probability. For example the probability of drawing either a red or blue marble is equal to the probability of NOT drawing a green marble, i.e. P(red OR blue) = P(NOT green) = 1 - P(green) = 1 - 1/3 = 2/3.
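
The marble example can again be checked by simulation. A minimal C sketch follows (the encoding of colors as characters and the number of draws are arbitrary choices for illustration); it estimates P(red OR blue) both directly and as 1 - P(green), both of which should come out close to 2/3:

```
#include <stdio.h>
#include <stdlib.h>

#define N 1000000

// the box: 1 red ('r'), 2 green ('g') and 3 blue ('b') marbles
const char marbles[6] = {'r', 'g', 'g', 'b', 'b', 'b'};

int main(void)
{
  srand(123);

  unsigned long redOrBlue = 0, green = 0;

  for (unsigned long i = 0; i < N; ++i)
  {
    char drawn = marbles[rand() % 6]; // draw one marble at random

    if (drawn == 'g')
      green++;
    else
      redOrBlue++;
  }

  printf("P(red OR blue) directly: %f\n", (double) redOrBlue / N);
  printf("as 1 - P(green):         %f\n", 1.0 - (double) green / N);
  printf("exact value 2/3:         %f\n", 2.0 / 3.0);

  return 0;
}
```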

Similarly given INDEPENDENT events x, y, z, ... (events allowed to occur simultaneously, with the occurrence of one not influencing the probability of the others), the probability of all of them occurring simultaneously is the product of their individual probabilities:

P(x AND y AND z AND ...) = P(x) * P(y) * P(z) * ...

For example rolling a die and flipping a coin at the same time, the probability of simultaneously rolling an odd number AND the coin landing heads is P(odd AND heads) = 1/2 * 1/2 = 1/4.
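
This can once more be confirmed by enumerating all the possibilities; below is a minimal C sketch (the encoding of the coin as 0/1 is just an arbitrary choice) counting the joint outcomes of a die roll and a coin flip:

```
#include <stdio.h>

int main(void)
{
  int satisfying = 0, total = 0;

  // enumerate all combinations of die roll (1..6) and coin flip
  // (0 = tails, 1 = heads)
  for (int die = 1; die <= 6; ++die)
    for (int coin = 0; coin <= 1; ++coin)
    {
      total++;

      if (die % 2 == 1 && coin == 1) // odd number AND heads
        satisfying++;
    }

  printf("P(odd AND heads) = %d/%d = %f\n", satisfying, total,
    (double) satisfying / total);

  return 0;
}
```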

Conditional probability is the probability of an event UNDER THE ASSUMPTION that some other event has definitely occurred. This may be a bit confusing, but it's nothing complicated really (using Venn diagrams may aid the understanding, see below). The probability of event x occurring under the assumption that y occurred is written as P(x|y) and is computed as:

P(x|y) = P(x AND y) / P(y)

This shows it's nothing more than simple normalization by the probability of y, i.e. saying what portion of y's space is occupied by the part of x that overlaps into it. For example the probability of a plane crashing is very low, as is the probability of a plane engine failure, but under the assumption that an engine HAS indeed already failed, the probability of a crash will be higher than it normally is.
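
A minimal C sketch of the formula in action follows, using the die roll events "rolled a 6" (x) and "rolled an even number" (y) purely as made-up illustration events; it computes P(y) and P(x AND y) by enumeration and then divides them, giving P(x|y) = (1/6) / (1/2) = 1/3:

```
#include <stdio.h>

// event x: the die shows a 6
int eventX(int roll) { return roll == 6; }

// event y: the die shows an even number
int eventY(int roll) { return roll % 2 == 0; }

int main(void)
{
  int countY = 0, countXandY = 0, total = 6;

  for (int roll = 1; roll <= total; ++roll)
  {
    if (eventY(roll))
      countY++;

    if (eventX(roll) && eventY(roll))
      countXandY++;
  }

  double pY = (double) countY / total,         // P(y) = 1/2
phantom
    pXandY = (double) countXandY / total;      // P(x AND y) = 1/6

  // P(x|y) = P(x AND y) / P(y) = (1/6) / (1/2) = 1/3
  printf("P(x|y) = %f\n", pXandY / pY);

  return 0;
}
```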

Venn diagrams are excellent for visualizing probabilities. Imagine the space of all possibilities as a circle with area equal to 1 and then events as other smaller circles inside this circle. The area occupied by each circle is the corresponding event's probability. Now imagine performing an experiment as choosing at random a point in the big circle, for example by blindly throwing a dart. It's clear that the larger an event's circle is, the higher chance it has of being hit. Events with non-overlapping circles are mutually exclusive as there is no way the dart could ever land simultaneously in two non-overlapping areas. It's clear that the probability of one of several mutually exclusive events occurring is the sum of the corresponding circles' areas, just as stated by the equation above. Overlapping circles represent events allowed to happen simultaneously. Should events x and y overlap, then the conditional probability P(x|y) is the proportion of x's area inside y to the whole area of y. And so on.
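
The dart throwing idea can be turned directly into a small Monte Carlo experiment. Below is a minimal C sketch (the unit square "universe" and the two overlapping circle events, including their positions and radii, are completely made up just for illustration) that throws random points and estimates P(x), P(y) and the conditional probability P(x|y) as the proportion of hits in y that also landed in x:

```
#include <stdio.h>
#include <stdlib.h>

#define N 1000000

// is point (px,py) inside the circle with center (cx,cy) and radius r?
int inCircle(double px, double py, double cx, double cy, double r)
{
  double dx = px - cx, dy = py - cy;
  return dx * dx + dy * dy <= r * r;
}

double randomCoord(void) // random number between 0 and 1
{
  return (double) rand() / RAND_MAX;
}

int main(void)
{
  srand(123);

  unsigned long hitsX = 0, hitsY = 0, hitsBoth = 0;

  for (unsigned long i = 0; i < N; ++i)
  {
    // throw a dart at a random point of the unit square (area = 1)
    double px = randomCoord(), py = randomCoord();

    int x = inCircle(px, py, 0.4, 0.5, 0.2),  // event x: dart landed in circle x
        y = inCircle(px, py, 0.6, 0.5, 0.25); // event y: dart landed in circle y

    hitsX += x;
    hitsY += y;
    hitsBoth += x && y;
  }

  printf("P(x)   = %f\n", (double) hitsX / N);
  printf("P(y)   = %f\n", (double) hitsY / N);
  printf("P(x|y) = %f\n", (double) hitsBoth / hitsY);

  return 0;
}
```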

Probability distribution functions: until now we've implicitly assumed that all possible outcomes (events) of an experiment are equally likely to occur, i.e. that for instance each marble in a box has the same likelihood of being picked etc. In real life scenarios this frequently doesn't hold however, for example the likelihood of a human being born with red hair is lower than that of being born with dark hair (considering we don't have further information about the parents etc.). This is modeled by a so called probability distribution function -- this function says how likely each possible outcome is. For a finite number of discrete outcomes, such as possible hair colors, the function may simply state the probability directly, e.g. p_hair_color(black) = 0.75, p_hair_color(red) = 0.01 etc. For continuous values, such as human height, the situation gets slightly more complicated: the function cannot directly state a probability of a single value, only a probability of the value falling within a certain INTERVAL. Consider e.g. asking about the probability of a human being exactly 1.75 meters tall: it's essentially 0 because anyone coming even very close to said height will always be at least a few micrometers off. So we should rather ask what's the probability of someone being between 1.75 and 1.76 meters tall, and this already makes good sense. For this reason continuous values are rather described by so called probability density functions, which must be integrated over a given interval in order to obtain a direct probability. There further exist equivalent kinds of functions such as cumulative distribution functions, which state the probability of the value being x or lower, but we won't delve into these now. The most important probability distributions are uniform (all events are equally likely) and normal, which has the bell curve shape and which describes many variables in nature, for example IQ distribution or the height of trees in a forest.
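
To make the "integrate the density over an interval" part concrete, below is a minimal C sketch (modeling human height with a normal distribution of mean 1.75 m and standard deviation 0.1 m is a completely made-up assumption, and the crude rectangle rule is used just for simplicity) that numerically integrates the normal density to get the probability of a height between 1.75 and 1.76 meters (compile with the math library, e.g. -lm):

```
#include <stdio.h>
#include <math.h>

#define PI   3.14159265358979
#define MEAN 1.75 // made-up mean height in meters
#define SD   0.1  // made-up standard deviation in meters

// probability density function of the normal distribution
double normalPDF(double x)
{
  double z = (x - MEAN) / SD;
  return exp(-0.5 * z * z) / (SD * sqrt(2 * PI));
}

int main(void)
{
  double from = 1.75, to = 1.76, // the interval we're asking about
    step = 0.0001,
    probability = 0;

  // numerically integrate the density over the interval (rectangle rule)
  for (double x = from; x < to; x += step)
    probability += normalPDF(x) * step;

  printf("P(height between %.2f and %.2f m) ~= %f\n", from, to, probability);

  return 0;
}
```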

See Also