Update

2024-12-10 20:41:11 +01:00 · 2024-12-10 20:41:11 +01:00 · 63ed9985ce
commit 63ed9985ce
parent c5b4bdd5dc
20 changed files with 1896 additions and 1861 deletions
--- a/elo.md
+++ b/elo.md
@ -1,20 +1,26 @@
 # Elo

-The Elo system (named after Arpad Elo, NOT an [acronym](acronym.md)) is a mathematical system for rating the relative strength of players of a certain game, most notably and widely used in [chess](chess.md) but also elsewhere (video games, table tennis, ...). Based on number of wins, losses and draws against other Elo rated opponents, the system computes a number (rating) for each player that highly [correlates](correlation.md) with that player's current strength/skill; as games are played, ratings of players are constantly being updated to reflect changes in their strength. The numeric rating can then be used to predict the probability of a win, loss or draw of any two players in the system, as well as e.g. for constructing ladders of current top players and matchmaking players of similar strength in online games. For example if player A has an Elo rating of 1700 and player B 1400, player A is expected to win in a game with player B with the [probability](probability.md) of 85%. Besides Elo there exist alternative and improved systems, notably e.g. the [Glicko](glicko.md) system (which further adds e.g. confidence intervals).
+The Elo system (named after Arpad Elo, NOT an [acronym](acronym.md)) is a mathematical system for rating the relative strength of players in a certain competitive [game](game.md), most notably and widely used in [chess](chess.md) but also elsewhere (video games, table tennis, ...). { I have seen a cool video where someone computed the Elo of all NPC players in the Pokemon games. ~drummyfish } Based on number of wins, losses and draws against other Elo rated opponents, the system computes a [number](number.md) (**rating**) for each player that highly [correlates](correlation.md) with that player's current strength/skill; as games are played, ratings of players are constantly being updated to reflect changes in their strength. The numeric rating can then be used to predict the [probability](probability.md) of a win, loss or draw of any two players in the system, as well as doing all kinds of other nice things such as tracking player's improvement over time, constructing ladders of current top players and matchmaking players of similar strength in online games. For example if player *A* has an Elo rating of 1700 and player *B* 1400, *A* is expected to win in a game with player *B* with the [probability](probability.md) of 85%. The system is designed in a very clever way -- it uses the ability to estimate the outcome of a game between two players and then corrects the ratings of the players based on whether they do better or worse than expected. This way the ratings change and converge to stop near the value that reflects the player's true strength. Elo is a system designed in a smart way but still remains mathematically a pretty "[keep it simple](kiss.md)" one -- this means it has a few flaws and shortcomings (mentioned below), which keep being addressed by alternative rating systems such as [Glicko](glicko.md) (which further adds e.g. confidence intervals). However the simplicity of Elo has also shown to be a big advantage, it does a great job for a very small "price" and this quality to price ratio so far seems to be uncontested. Elo is [good enough](good_enough.md) for most practical uses without requiring too complex mathematics or large amounts of data constantly being available. For this it remains in wide use despite other systems being objectively more accurate in predictions: usually the high complexity of the competing systems shows only [diminishing returns](diminishing_returns.md).

-The Elo system was created specifically for chess (even though it can be applied to other games as well, it doesn't rely on any chess specific rules) and described by Arpad Elo in his 1978 book called *The Rating of Chessplayers, Past and Present*, by which time it was already in use by FIDE. It replaced older rating systems, most notably the [Harkness](harkness.md) system. Despite more "advanced" systems being around nowadays, Elo remains the most widely used one.
+What we call a "game" here need not always be a typical game, Elo rating may be used for example in a video sharing platform to help the recommendation [algorithm](algorithm.md) by letting videos compete for attention and then assigning them rating. When the site recommends the user two videos at once, they are effectively playing a game to win attention: whichever gets clicked wins the game, and this way we may find out which videos are the most popular AND also how popular each one is relative to others.

-**Elo rates only RELATIVE performance**, not absolute, i.e. the rating number of a player says nothing in itself, it is only the DIFFERENCE in rating points between two players that matters, so  in an extreme case two players rated 300 and 1000 in one rating pool may in another one be rated 10300 and 11000 (the difference of 700 is the only thing that stays the same, mean value can change freely). This may be influenced by initial conditions and things such as **rating inflation** (or deflation) -- if for example a [chess](chess.md) website assigns some start rating to new users which tends to overestimate an average newcomer's abilities, newcomers will come to the site, play a few games which they will lose, then they [ragequit](ragequit.md) but they've already fed their points to the good players, causing the average rating of a good player to grow over time.
+The Elo system was created specifically for chess (even though it can be applied to other games as well, it doesn't rely on any chess specific rules) and described by Arpad Elo in his 1978 book called *The Rating of Chessplayers, Past and Present*, by which time it was already in use by FIDE. It replaced older rating systems, most notably the [Harkness](harkness.md) system.
+
+**Elo rates only RELATIVE performance**, not absolute, i.e. the rating number of a player says nothing in itself, it is only the DIFFERENCE in rating points between two players that matters, so  in an extreme case two players rated 300 and 1000 in one rating pool may in another one be rated 10300 and 11000 (the difference of 700 is the only thing that stays the same, mean value can change freely). This may be influenced by initial conditions and things such as **rating inflation** (or deflation) -- if for example a [chess](chess.md) website assigns some start rating to new users which tends to overestimate an average newcomer's abilities, newcomers will come to the site, play a few games which they will lose, then they [ragequit](ragequit.md) but they've already fed their points to the good players, causing the average rating of a good player to grow over time (it's basically like an economy where the rating points are the currency, new overrated players have the same effect as printing money). This is one of several issues the Elo system has to deal with. Other issues include for example [magic constants](magic_constant.md): the constant *K* (change rate) and the initial rating of new players have to somehow be set, and the system itself doesn't say what the ideal values are.
+
+Yet another shortcoming is that **ratings (including relative differences) depend on the order of games**. I.e. when several games are played between N players and we update the ratings after each game, then the ratings of all the players (and their differences, i.e. predictions the system will make) at the end will depend on the order in which the games were played -- playing the games with exact same results but in different order will generally result in different ratings. This also holds for grouping: we may update ratings after each game or group several games together and count them as one match, outcome of which will be the average outcome of all the games -- and this may affect ratings too. So the rating partially depends on something that has nothing to do with the player's skill. This may not be such a huge problem in practice, tiny differences and fluctuations are usually ignored, but eventually this IS an undesirable property of the system. Some other systems address this by always computing every player's rating based on whole history of games he ever played, which fixes the issue but also brings in more computational complexity (imagine having to recompute everything from scratch after every single game, AND having to keep the record of complete history of all games).
+
+It also must be said that **Elo is a [simplification](approximation.md) of reality**, as is any attempt at capturing skill with a single number -- even though it is a very good predictor of something akin a "skill" and outcomes of games, trying to capture "skill" with a single number is similar to trying to capture such a multidimensional attribute as intelligence with a single dimensional [IQ](iq.md) number. For example due to psychology, many different areas of the game to be mastered and different playstyles [transitivity](transitivity.md) may be broken in reality: it may happen that player *A* mostly beats player *B*, player *B* mostly beats player *C* and player *C* mostly beats player *A*, which Elo won't capture. However this is not an issue of the Elo system specifically but rather of our simplified model of reality -- any other system that tries to capture skill as a one dimensional number, no matter how advanced, will suffer the same flaw.
+
+Besides mathematical inaccuracies Elo (as well as other systems in general) also comes with more potential practical problems such as creating focus on grinding (players strategically choosing weaker opponents to maximize their rating), players refusing to play in order to not lose points, removing [fun](fun.md) from games by implementing super effective matchmaking that just maximizes number of draws etcetc. Despite all the described flaws however it must be held that Elo is pretty nice and very useful, it's usually just its wrong application (for example in the mentioned matchmaking) where it starts to create trouble.

 Elo rating can also be converted to (or from) alternative measures such as material or time advantage, i.e. given let's say two chess players with known ratings, we may be able to say how big of a handicap the stronger player must suffer in order for the two to be on par. However the relationship will probably not be simple, we can't say "this much Elo difference equals this many pawns in handicap" -- having a two pawn material advantage in a beginner game hardly makes a difference but on the absolute top level losing two pawns is decisively also a lost game (despite this some approximations were given, e.g. Fisher and Kannan estimated that in computer chess 100 Elo was roughly equivalent to one pawn).

-**Keep in mind Elo is a big simplification of reality**, as is any attempt at capturing skill with a single number -- even though it is a very good predictor of something akin a "skill" and outcomes of games, trying to capture a "skill" with a single number is similar to e.g. trying to capture such a multidimensional thing as intelligence with a single dimensional [IQ](iq.md) number. For example due to many different areas of a game to be mastered and different playstyles [transitivity](transitivity.md) may be broken in reality: it may happen that player A mostly beats player B, player B mostly beats player C and player C mostly beats player A, which Elo won't capture.
-
 ## How It Works

 Initial rating of players is not specified by Elo, each rating organization applies its own method (e.g. assign an arbitrary value of let's say 1000 or letting the player play a few unrated games to estimate his skill).

-Suppose we have two players, player 1 with rating *A* and player 2 with rating *B*. In a game between them player 1 can either win, i.e. score 1 point, lose, i.e. score 0 points, or draw, i.e. score 0.5 points.
+Suppose we have two players, player 1 with rating *A* and player 2 with rating *B*. In a game between them player 1 can either win, i.e. score 1 point, lose, i.e. score 0 points, or draw, i.e. score 0.5 points. (Some games may allow to give more possible outcomes besides win/loss/draw, some wins may be "stronger" than others for example -- this is still compatible with Elo as long as we can map the outcome to the range between 0 and 1.)

 The expected score *E* of a game between the two players is computed using a [sigmoid function](sigmoid.md) (400 is just a [magic constant](magic_constant.md) that's usually used, it makes it so that a positive difference of 400 points makes a player 10 times more likely to win):