You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

7.0 KiB

Distance

Distance is a measure of how far away from each other two points are. Most commonly distance refers to physical separation in space, e.g. as in distance of planets from the Sun, but more generally distance can refer to any kind of parameter space and in any number of dimensions, e.g. the distance of events in time measured in seconds (1D distance) or distance of two text strings as the amount of their dissimilarity (Levenshtein distance). Distances are very important in computer science and math as they allow us to do such things as clustering, path searching, physics simulations, various comparisons, sorting etc.

Distance is similar/related to length, the difference is that distance is computed between two points while length is the distance of one point from some implicit origin.

There are many ways to define distance within given space. Most common and implicitly assumed distance is the Euclidean distance (basically the "straight line from point A to point B" whose length is computed with Euclidean Theorem), but other distances are possible, e.g. the taxicab distance (length of the kind of perpendicular path taxis take between points A and B in Manhattan, usually longer than straight line). Mathematically a space in which distances can be measured are called metric spaces, and a distance within such space can be any function dist (called a distance or metric function) that satisfies these axioms:

  1. dist(p,p) = 0 (distance from identical point is zero)
  2. Values given by dist are never negative.
  3. dist(p,q) = dist(q,p) (symmetry, distance between two points is the same in both directions).
  4. dist(a,c) <= dist(a,b) + dist(b,c) (triangle inequality)

Approximations

Computing Euclidean distance requires multiplication and most importantly square root which is usually a pretty slow operation, therefore many times we look for simpler approximations. Note that a possible approach here may also lead through computing the distance normally but using a fast approximation of the square root.

Two very basic and rough approximations of Euclidean distance, both in 2D and 3D, are taxicab (also Manhattan) and Chebyshev distances. Taxicab distance simply adds the absolute coordinate differences along each principal axis (dx, dy and dz) while Chebyshev takes the maximum of them. In C (for generalization to 3D just add one coordinate of course):

int distTaxi(int x0, int y0, int x1, int y1)
{
  x0 = x1 > x0 ? x1 - x0 : x0 - x1; // dx
  y0 = y1 > y0 ? y1 - y0 : y0 - y1; // dy
  
  return x0 + y0;
}

int distCheb(int x0, int y0, int x1, int y1)
{
  x0 = x1 > x0 ? x1 - x0 : x0 - x1; // dx
  y0 = y1 > y0 ? y1 - y0 : y0 - y1; // dy
  
  return x0 > y0 ? x0 : y0;
}

Both of these distances approximate a circle in 2D with a square or a sphere in 3D with a cube, the difference is that taxicab is an upper estimate of the distance while Chebyshev is the lower estimate. For speed of execution (optimization) it may also be important that taxicab distance only uses the operation of addition while Chebyshev may result in branching (if) in the max function which is usually not good for performance.

A bit more accuracy can be achieved by averaging the taxicab and Chebyshev distances which in 2D approximates a circle with an 8 segment polygon and in 3D approximates a sphere with 24 sided polyhedron. The integer-only C code is following:

int dist8(int x0, int y0, int x1, int y1)
{
  x0 = x1 > x0 ? x1 - x0 : x0 - x1; // dx
  y0 = y1 > y0 ? y1 - y0 : y0 - y1; // dy
    
  return (x0 + y0 + (x0 > y0 ? x0 : y0)) / 2;
}

{ The following is an approximation I came up with when working on tinyphysicsengine. While I measured the average and maximum error of the taxi/Chebyshev average in 3D at about 16% and 22% respectively, the following gave me 3% and 12% values. ~drummyfish }

Yet more accurate approximation of 3D Euclidean distance can be made with a 48 sided polyhedron. The principle is following: take absolute values of all three coordinate differences and order them by magnitude so that dx >= dy >= dz >= 0. This gets us into one of 48 possible slices of space (the other slices have the same shape, they just differ by ordering or signs of the coordinates but the distance in them is of course equal). In this slice we'll approximate the distance linearly, i.e. with a plane. We do this by simply computing the distance of our point from a plane that goes through origin and whose normal is approximately {0.8728,0.4364,0.2182} (it points in the direction that goes through the middle of space slice). The expression for the distance from this plane simplifies to simply 0.8728 * dx + 0.4364 * dy + 0.2182 * dz. The following is an integer-only implementation in C (note that the constants above have been converted to allow division by 1024 for possible optimization of division to a bit shift):

int32_t dist48(
  int32_t x0, int32_t y0, int32_t z0,
  int32_t x1, int32_t y1, int32_t z1)
{
  x0 = x1 > x0 ? x1 - x0 : x0 - x1; // dx
  y0 = y1 > y0 ? y1 - y0 : y0 - y1; // dy
  z0 = z1 > z0 ? z1 - z0 : z0 - z1; // dz
 
  if (x0 < y0) // order the coordinates
  {
    if (x0 < z0)
    {
      if (y0 < z0)
      { // x0 < y0 < z0
        int32_t t = x0; x0 = z0; z0 = t;
      }
      else
      { // x0 < z0 < y0
        int32_t t = x0; x0 = y0; y0 = t;
        t = z0; z0 = y0; y0 = t;
      }
    }
    else
    { // z0 < x0 < y0
      int32_t t = x0; x0 = y0; y0 = t;
    }
  }
  else
  {
    if (y0 < z0)
    {
      if (x0 < z0)
      { // y0 < x0 < z0
        int32_t t = y0; y0 = z0; z0 = t;
        t = x0; x0 = y0; y0 = t;  
      }
      else
      { // y0 < z0 < x0
        int32_t t = y0; y0 = z0; z0 = t;
      }
    }
  }
    
  return (893 * x0 + 446 * y0 + 223 * z0) / 1024;
}

A similar approximation for 2D distance is (from a 1984 book Problem corner) this: sqrt(dx^2 + dy^2) ~= 0.96 * dx + 0.4 * dy for dx >= dy >= 0. The error is <= 4%. This can be optionally modified to use the closest power of 2 constants so that the function becomes much faster to compute, but the maximum error increases (seems to be about 11%). C code with fixed point follows (commented out line is the faster, less accurate version):

int dist2DApprox(int x0, int y0, int x1, int y1)
{
  x0 = x0 > x1 ? (x0 - x1) : (x1 - x0);
  y0 = y0 > y1 ? (y0 - y1) : (y1 - y0);
  
  if (x0 < y0)
  {
    x1 = x0; // swap
    x0 = y0;
    y0 = x1;
  }
  
  return (123 * x0 + 51 * y0) / 128; // max error = ~4%
  //return x0 + y0 / 2;              // faster, less accurate  
}

TODO: this https://www.flipcode.com/archives/Fast_Approximate_Distance_Functions.shtml