less_retarded_wiki/elo.md
2024-10-25 00:41:24 +02:00

9.5 KiB

Elo

The Elo system (named after Arpad Elo, NOT an acronym) is a mathematical system for rating the relative strength of players of a certain game, most notably and widely used in chess but also elsewhere (video games, table tennis, ...). Based on number of wins, losses and draws against other Elo rated opponents, the system computes a number (rating) for each player that highly correlates with that player's current strength/skill; as games are played, ratings of players are constantly being updated to reflect changes in their strength. The numeric rating can then be used to predict the probability of a win, loss or draw of any two players in the system, as well as e.g. for constructing ladders of current top players and matchmaking players of similar strength in online games. For example if player A has an Elo rating of 1700 and player B 1400, player A is expected to win in a game with player B with the probability of 85%. Besides Elo there exist alternative and improved systems, notably e.g. the Glicko system (which further adds e.g. confidence intervals).

The Elo system was created specifically for chess (even though it can be applied to other games as well, it doesn't rely on any chess specific rules) and described by Arpad Elo in his 1978 book called The Rating of Chessplayers, Past and Present, by which time it was already in use by FIDE. It replaced older rating systems, most notably the Harkness system. Despite more "advanced" systems being around nowadays, Elo remains the most widely used one.

Elo rates only RELATIVE performance, not absolute, i.e. the rating number of a player says nothing in itself, it is only the DIFFERENCE in rating points between two players that matters, so in an extreme case two players rated 300 and 1000 in one rating pool may in another one be rated 10300 and 11000 (the difference of 700 is the only thing that stays the same, mean value can change freely). This may be influenced by initial conditions and things such as rating inflation (or deflation) -- if for example a chess website assigns some start rating to new users which tends to overestimate an average newcomer's abilities, newcomers will come to the site, play a few games which they will lose, then they ragequit but they've already fed their points to the good players, causing the average rating of a good player to grow over time.

Elo rating can also be converted to (or from) alternative measures such as material or time advantage, i.e. given let's say two chess players with known ratings, we may be able to say how big of a handicap the stronger player must suffer in order for the two to be on par. However the relationship will probably not be simple, we can't say "this much Elo difference equals this many pawns in handicap" -- having a two pawn material advantage in a beginner game hardly makes a difference but on the absolute top level losing two pawns is decisively also a lost game (despite this some approximations were given, e.g. Fisher and Kannan estimated that in computer chess 100 Elo was roughly equivalent to one pawn).

Keep in mind Elo is a big simplification of reality, as is any attempt at capturing skill with a single number -- even though it is a very good predictor of something akin a "skill" and outcomes of games, trying to capture a "skill" with a single number is similar to e.g. trying to capture such a multidimensional thing as intelligence with a single dimensional IQ number. For example due to many different areas of a game to be mastered and different playstyles transitivity may be broken in reality: it may happen that player A mostly beats player B, player B mostly beats player C and player C mostly beats player A, which Elo won't capture.

How It Works

Initial rating of players is not specified by Elo, each rating organization applies its own method (e.g. assign an arbitrary value of let's say 1000 or letting the player play a few unrated games to estimate his skill).

Suppose we have two players, player 1 with rating A and player 2 with rating B. In a game between them player 1 can either win, i.e. score 1 point, lose, i.e. score 0 points, or draw, i.e. score 0.5 points.

The expected score E of a game between the two players is computed using a sigmoid function (400 is just a magic constant that's usually used, it makes it so that a positive difference of 400 points makes a player 10 times more likely to win):

E = 1 / (1 + 10^((B - A)/400))

For example if we set the ratings A = 1700 and B = 1400, we get a result E ~= 0.85, i.e in a series of many games player 1 will get an average of about 0.85 points per game, which can mean that out of 100 games he wins 85 times and loses 16 times (but it can also mean that out of 100 games he e.g. wins 70 times and draws 30). Computing the same formula from the player 2 perspective gives E ~= 0.15 which makes sense as the number of points expected to gain by the players have to add up to 1 (the formula says in what ratio the two players split the 1 point of the game).

After playing a game the ratings of the two players are adjusted depending on the actual outcome of the game. The winning player takes some amount of rating points from the loser (i.e. the loser loses the same amount of point the winner gains which means the total number of points in the system doesn't change as a result of games being played). The new rating of player 1, A2, is computed as:

A2 = A + K * (R - E)

where R is the outcome of the game (for player 1, i.e. 1 for a win, 0 for loss, 0.5 for a draw) and K is the change rate which affects how quickly the ratings will change (can be set to e.g. 30 but may be different e.g. for new or low rated players). So with e.g. K = 25 if for our two players the game ends up being a draw, player 2 takes 9 points from player 1 (A2 = 1691, B2 = 1409, note that drawing a weaker player is below the expected result).

How to compute Elo difference from a number of games? This is useful e.g. if we have a chess engine X with Elo EX and a new engine Y whose Elo we don't know: we may let these two engines play 1000 games, note the average result E and then compute the Elo difference of the new engine against the first engine from this formula (derived from the above formula by solving for Elo difference B - A):

B - A = log10(1 / E - 1) * 400

Some Code

Here is a C code that simulates players of different skills playing games and being rated with Elo. Keep in mind the example is simple, it uses the potentially imperfect rand function etc., but it shows the principle quite well. At the beginning each player is assigned an Elo of 1000 and a random skill which is normally distrubuted, a game between two players consists of each player drawing a random number in range from from 1 to his skill number, the player that draws a bigger number wins (i.e. a player with higher skill is more likely to win).

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define PLAYERS 101
#define GAMES 10000
#define K 25          // Elo K factor

typedef struct
{
  unsigned int skill;
  unsigned int elo;
} Player;

Player players[PLAYERS];

double eloExpectedScore(unsigned int elo1, unsigned int elo2)
{
  return 1.0 / (1.0 + pow(10.0,((((double) elo2) - ((double) elo1)) / 400.0)));
}

int eloPointGain(double expectedResult, double result)
{
  return K * (result - expectedResult);
}

int main(void)
{
  srand(100);

  for (int i = 0; i < PLAYERS; ++i)
  {
    players[i].elo = 1000; // give everyone initial Elo of 1000

    // normally distributed skill in range 0-99:
    players[i].skill = 0;

    for (int j = 0; j < 8; ++j)
      players[i].skill += rand() % 100;

    players[i].skill /= 8;
  }

  for (int i = 0; i < GAMES; ++i) // play games
  {
    unsigned int player1 = rand() % PLAYERS,
                 player2 = rand() % PLAYERS;

    // let players draw numbers, bigger number wins
    unsigned int number1 = rand() % (players[player1].skill + 1),
                 number2 = rand() % (players[player2].skill + 1);

    double gameResult = 0.5;

    if (number1 > number2)
      gameResult = 1.0;
    else if (number2 > number1)
      gameResult = 0.0;
  
    int pointGain = eloPointGain(eloExpectedScore(
      players[player1].elo,
      players[player2].elo),gameResult);

    players[player1].elo += pointGain;
    players[player2].elo -= pointGain;
  }

  for (int i = PLAYERS - 2; i >= 0; --i) // bubble-sort by Elo
    for (int j = 0; j <= i; ++j)
      if (players[j].elo < players[j + 1].elo)
      {
        Player tmp = players[j];
        players[j] = players[j + 1];
        players[j + 1] = tmp;
      }

  for (int i = 0; i < PLAYERS; i += 5) // print
    printf("#%d: Elo: %d (skill: %d\%)\n",i,players[i].elo,players[i].skill);

  return 0;
}

The code may output e.g.:

#0: Elo: 1134 (skill: 62%)
#5: Elo: 1117 (skill: 63%)
#10: Elo: 1102 (skill: 59%)
#15: Elo: 1082 (skill: 54%)
#20: Elo: 1069 (skill: 58%)
#25: Elo: 1054 (skill: 54%)
#30: Elo: 1039 (skill: 52%)
#35: Elo: 1026 (skill: 52%)
#40: Elo: 1017 (skill: 56%)
#45: Elo: 1016 (skill: 50%)
#50: Elo: 1006 (skill: 40%)
#55: Elo: 983 (skill: 50%)
#60: Elo: 974 (skill: 42%)
#65: Elo: 970 (skill: 41%)
#70: Elo: 954 (skill: 44%)
#75: Elo: 947 (skill: 47%)
#80: Elo: 936 (skill: 40%)
#85: Elo: 927 (skill: 48%)
#90: Elo: 912 (skill: 52%)
#95: Elo: 896 (skill: 35%)
#100: Elo: 788 (skill: 22%)

We can see that Elo quite nicely correlates with the player's real skill.