# Elo

The Elo system (named after Arpad Elo, NOT an [acronym](acronym.md)) is a mathematical system for rating the relative strength of players of a certain game, most notably and widely used in [chess](chess.md) but also elsewhere (video games, table tennis, ...). Based on the number of wins, losses and draws against other Elo rated opponents, the system computes a number (rating) for each player that highly [correlates](correlation.md) with that player's current strength/skill; as games are played, ratings of players are constantly being updated to reflect changes in their strength. The numeric rating can then be used to predict the expected outcome of a game between any two players in the system, as well as e.g. for constructing ladders of current top players and matchmaking players of similar strength in online games. For example if player A has an Elo rating of 1700 and player B 1400, player A is expected to score about 0.85 points per game against player B, i.e. to win with the [probability](probability.md) of roughly 85% if we ignore draws. Besides Elo there exist alternative and improved systems, notably e.g. the [Glicko](glicko.md) system (which further adds e.g. confidence intervals).

The Elo system was created specifically for chess (even though it doesn't rely on any chess specific rules and so can be applied to other games as well) and was described by Arpad Elo in his 1978 book called *The Rating of Chessplayers, Past and Present*, by which time it was already in use by FIDE. It replaced older rating systems, most notably the [Harkness](harkness.md) system. Despite more "advanced" systems being around nowadays, Elo remains the most widely used one.

**Elo rates only RELATIVE performance**, not absolute, i.e. the rating number of a player says nothing in itself, it is only the DIFFERENCE in rating points between two players that matters, so in an extreme case two players rated 300 and 1000 in one rating pool may in another one be rated 10300 and 11000 (the difference of 700 is the only thing that stays the same, the mean value can change freely). The mean may be influenced by initial conditions and things such as **rating inflation** (or deflation) -- if for example a [chess](chess.md) website assigns new users a start rating which tends to overestimate an average newcomer's abilities, newcomers will come to the site, play a few games which they will lose and then [ragequit](ragequit.md), but by then they've already fed their points to the good players, causing the average rating of a good player to grow over time.

**Keep in mind Elo is a big simplification of reality**, as is any attempt at capturing skill with a single number -- even though it is a very good predictor of something akin to "skill" and of outcomes of games, trying to capture "skill" with a single number is similar to e.g. trying to capture such a multidimensional thing as intelligence with a single dimensional [IQ](iq.md) number. For example due to many different areas of a game to be mastered and different playstyles, [transitivity](transitivity.md) may be broken in reality: it may happen that player A mostly beats player B, player B mostly beats player C and player C mostly beats player A, which Elo won't capture.

## How It Works
Initial rating of players is not specified by Elo, each rating organization applies its own method (e.g. assigning an arbitrary value of let's say 1000 or letting the player play a few unrated games to estimate his skill).

Suppose we have two players, player 1 with rating *A* and player 2 with rating *B*. In a game between them player 1 can either win, i.e. score 1 point, lose, i.e. score 0 points, or draw, i.e. score 0.5 points.

The expected score *E* of player 1 in a game between the two players is computed using a [sigmoid function](sigmoid.md) (400 is just a [magic constant](magic_constant.md) that's usually used, it makes it so that an advantage of 400 points gives the stronger player an expected score 10 times that of the weaker one, i.e. odds of 10:1):

*E = 1 / (1 + 10^((B - A)/400))*

For example if we set the ratings *A = 1700* and *B = 1400*, we get a result *E ~= 0.85*, i.e. in a series of many games player 1 will get an average of about *0.85* points per game, which can mean that out of 100 games he wins 85 times and loses 15 times (but it can also mean that out of 100 games he e.g. wins 70 times and draws 30). Computing the same formula from the player 2 perspective gives *E ~= 0.15* which makes sense as the numbers of points the two players are expected to gain have to add up to 1 (the formula says in what ratio the two players split the 1 point of the game).

After playing a game the ratings of the two players are adjusted depending on the actual outcome of the game. The player who did better than expected takes some amount of rating points from the other one (i.e. one player loses the same amount of points the other one gains, which means the total number of points in the system doesn't change as a result of games being played). The new rating of player 1, *A2*, is computed as:

*A2 = A + K * (R - E)*

where *R* is the outcome of the game for player 1 (1 for a win, 0 for a loss, 0.5 for a draw) and *K* is the change rate which affects how quickly the ratings will change (it can be set to e.g. 30 but may be different e.g. for new or low rated players). So with e.g. *K = 25*, if for our two players the game ends up being a draw, we get a change of *25 * (0.5 - 0.85) ~= -8.7* for player 1, i.e. after rounding player 2 takes 9 points from player 1 (*A2 = 1691*, *B2 = 1409*; note that drawing a weaker player is below the expected result).
## Some Code
Here is a [C](c.md) program that simulates players of different skills playing games and being rated with Elo. Keep in mind the example is simple, it uses the potentially imperfect `rand` function etc., but it shows the principle quite well. At the beginning each player is assigned an Elo of 1000 and a random skill which is approximately [normally distributed](normal_distribution.md). A game between two players consists of each player drawing a random number in range from 0 to his skill number, the player that draws the bigger number wins (i.e. a player with higher skill is more likely to win) and equal numbers mean a draw.

```
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define PLAYERS 101
#define GAMES 10000
#define K 25 // Elo K factor

typedef struct
{
  unsigned int skill;
  int elo; // signed, so that subtracting points can't wrap around
} Player;

Player players[PLAYERS];

double eloExpectedScore(int elo1, int elo2)
{
  return 1.0 / (1.0 + pow(10.0,((((double) elo2) - ((double) elo1)) / 400.0)));
}

int eloPointGain(double expectedResult, double result)
{
  return K * (result - expectedResult); // truncated to whole points
}

int main(void)
{
  srand(100);

  for (int i = 0; i < PLAYERS; ++i)
  {
    players[i].elo = 1000; // give everyone initial Elo of 1000

    // approx. normally distributed skill in range 0-99 (average of 8 rolls):
    players[i].skill = 0;

    for (int j = 0; j < 8; ++j)
      players[i].skill += rand() % 100;

    players[i].skill /= 8;
  }

  for (int i = 0; i < GAMES; ++i) // play games
  {
    unsigned int player1 = rand() % PLAYERS,
                 player2 = rand() % PLAYERS;

    // let players draw numbers, bigger number wins
    unsigned int number1 = rand() % (players[player1].skill + 1),
                 number2 = rand() % (players[player2].skill + 1);

    double gameResult = 0.5; // result from player1's view, draw by default

    if (number1 > number2)
      gameResult = 1.0;
    else if (number2 > number1)
      gameResult = 0.0;

    int pointGain = eloPointGain(eloExpectedScore(
      players[player1].elo,
      players[player2].elo),gameResult);

    // zero-sum update: what one player gains the other one loses
    players[player1].elo += pointGain;
    players[player2].elo -= pointGain;
  }

  for (int i = PLAYERS - 2; i >= 0; --i) // bubble-sort by Elo, descending
    for (int j = 0; j <= i; ++j)
      if (players[j].elo < players[j + 1].elo)
      {
        Player tmp = players[j];
        players[j] = players[j + 1];
        players[j + 1] = tmp;
      }

  for (int i = 0; i < PLAYERS; i += 5) // print every 5th player
    printf("#%d: Elo: %d (skill: %u%%)\n",i,players[i].elo,players[i].skill);

  return 0;
}
```
The code may output e.g.:

```
#0: Elo: 1134 (skill: 62%)
#5: Elo: 1117 (skill: 63%)
#10: Elo: 1102 (skill: 59%)
#15: Elo: 1082 (skill: 54%)
#20: Elo: 1069 (skill: 58%)
#25: Elo: 1054 (skill: 54%)
#30: Elo: 1039 (skill: 52%)
#35: Elo: 1026 (skill: 52%)
#40: Elo: 1017 (skill: 56%)
#45: Elo: 1016 (skill: 50%)
#50: Elo: 1006 (skill: 40%)
#55: Elo: 983 (skill: 50%)
#60: Elo: 974 (skill: 42%)
#65: Elo: 970 (skill: 41%)
#70: Elo: 954 (skill: 44%)
#75: Elo: 947 (skill: 47%)
#80: Elo: 936 (skill: 40%)
#85: Elo: 927 (skill: 48%)
#90: Elo: 912 (skill: 52%)
#95: Elo: 896 (skill: 35%)
#100: Elo: 788 (skill: 22%)
```
We can see that Elo quite nicely correlates with the player's real skill.