less_retarded_wiki/fixed_point.md
Miloslav Ciz eb91ebfbc9 Update
2022-06-18 22:58:09 +02:00

6.4 KiB

Fixed Point

Fixed point arithmetic is a simple and often good enough representation of fractional (non-integer) numbers, as opposed to floating point which we consider a bad, bloated alternative (in most cases). Probably in 99% cases when you think you need floating point, fixed point is actually what you need.

Fixed point has at least these advantages over floating point:

  • It doesn't require a special hardware coprocessor for efficient execution and so doesn't introduce a dependency. Programs using floating point will run extremely slowly on systems without float hardware support as they have to emulate the complex hardware in software, while fixed point will run just as fast as integer arithmetic. For this reason fixed point is very often used in embedded computers.
  • It is easier to understand and better predictable, less tricky, KISS, suckless. (Float's IEEE 754 standard is 58 pages long, the paper What Every Computer Scientist Should Know About Floating-Point Arithmetic has 48 pages.)
  • Is easier to implement and so supported in many more systems. Any language or format supporting integers also supports fixed point.
  • Isn't ugly and doesn't waste values (positive and negative zero, denormalized numbers, ...).

How It Works

Fixed point uses a fixed (hence the name) number of digits (bits in binary) for the integer part and the rest for the fractional part (whereas floating point's fractional part varies in size). I.e. we split the binary representation of the number into two parts (integer and fractional) by IMAGINING a radix point at some place in the binary representation. That's basically it.

So, we can just use an integer data type as a fixed point data type, there is no need for libraries or special hardware support. We can also perform operations such as addition the same way as with integers. For example if we have a binary integer number represented as 00001001, 9 in decimal, we may say we'll be considering a radix point after let's say the sixth place, i.e. we get 000010.01 which we interpret as 2.25 (2^2 + 2^(-2)). The binary value we store in a variable is the same (as the radix point is only imagined), we only INTERPRET it differently.

We may look at it this way: we still use integers but we use them to count smaller fractions than 1. For example in a 3D game where our basic spatial unit is 1 meter our variables may rather contain the number of centimeters (however in practice we should use powers of two, so rather 1/128ths of a meter). In the example in previous paragraph we count 1/4ths (we say our scaling factor is 1/4), so actually the number represented as 00000100 is what in floating point we'd write as 1.0 (00000100 is 4 and 4 * 1/4 = 1), while 00000001 means 0.25.

This has just one consequence: we have to normalize results of multiplication and division (addition and subtraction work just as with integers, we can normally use the + and - operators). I.e. when multiplying, we have to divide the result by the inverse of the fractions we're counting, i.e. by 4 in our case (1/(1/4) = 4). Similarly when dividing, we need to MULTIPLY the result by this number. This is because we are using fractions as our units and when we multiply two numbers in those units, the units multiply as well, i.e. in our case multiplying two numbers that count 1/4ths give a result that counts 1/16ths, we need to divide this by 4 to get the number of 1/4ths back again (this works the same as e.g. units in physics, multiplying number of meters by number of meters gives meters squared.) For example the following integer multiplication:

00001000 * 00000010 = 00010000 (8 * 2 = 16)

in our system has to be normalized like this:

(000010.00 * 000000.10) / 4 = 000001.00 (2.0 * 0.5 = 1.0)

With this normalization we also have to think about how to bracket expressions to prevent rounding errors and overflows, for example instead of (x / y) * 4 we may want to write (x * 4) / y; imagine e.g. x being 00000010 (0.5) and y being 00000100 (1.0), the former would result in 0 (incorrect, rounding error) while the latter correctly results in 0.5. The bracketing depends on what values you expect to be in the variables so it can't really be done automatically by a compiler or library (well, it might probably be somehow handled at runtime, but of course, that will be slower).

The normalization is basically the only thing you have to think about, apart from this everything works as with integers. Remember that this all also works with negative number in two's complement, so you can use a signed integer type without any extra trouble.

Remember to always use a power of two scaling factor -- this is crucial for performance. I.e. you want to count 1/2th, 1/4th, 1/8ths etc., but NOT 1/10ths, as might be tempting. Why are power of two good here? Because computers work in binary and so the normalization operations with powers of two (division and multiplication by the scaling factor) can easily be optimized by the compiler to a mere bit shift, an operation much faster than multiplication or division.

Implementation Example

The following is an example of a simple C program using fixed point with 10 fractional bits, computing square roots of numbers from 0 to 10.

#include <stdio.h>

typedef int Fixed;

#define UNIT_FRACTIONS 1024 // 10 fractional bits, 2^10 = 1024

#define INT_TO_FIXED(x) ((x) * UNIT_FRACTIONS)

Fixed fixedSqrt(Fixed x)
{
  // stupid brute force square root

  int previousError = -1;
  
  for (int test = 0; test <= x; ++test)
  {
    int error = x - (test * test) / UNIT_FRACTIONS;

    if (error == 0)
      return test;
    else if (error < 0)
      error *= -1;

    if (previousError > 0 && error > previousError)
      return test - 1;

    previousError = error;
  }

  return 0;
}

void fixedPrint(Fixed x)
{
  printf("%d.%03d",x / UNIT_FRACTIONS,
    ((x % UNIT_FRACTIONS) * 1000) / UNIT_FRACTIONS);
}

int main(void)
{
  for (int i = 0; i <= 10; ++i)
  {
    printf("%d: ",i);
    
    fixedPrint(fixedSqrt(INT_TO_FIXED(i)));
    
    putchar('\n');
  }
  
  return 0;
}

The output is:

0: 0.000
1: 1.000
2: 1.414
3: 1.732
4: 2.000
5: 2.236
6: 2.449
7: 2.645
8: 2.828
9: 3.000
10: 3.162