master
Miloslav Ciz 2 years ago
parent fd5a853250
commit d31380f06b

@ -1,17 +1,17 @@
# Fixed Point
Fixed point arithmetic is a simple and often [good enough](good_enough.md) representation of [fractional](rational_number.md) (non-integer) numbers, as opposed to [floating point](float.md) which we consider a bad, [bloated](bloat.md) alternative (in most cases). Probably in 99% cases when you think you need floating point, fixed point is actually what you need.
Fixed point arithmetic is a simple and often [good enough](good_enough.md) method of computer representation of [fractional](rational_number.md) numbers (i.e. numbers with higher precision than [integers](integer.md), e.g. 4.03), as opposed to [floating point](float.md) which is a more complicated way of doing this which in most cases we consider a worse, [bloated](bloat.md) alternative. Probably in 99% cases when you think you need floating point, fixed point will do just fine.
Fixed point has at least these advantages over floating point:
- **It doesn't require a special hardware coprocessor** for efficient execution and so doesn't introduce a [dependency](dependency.md). Programs using floating point will run extremely slowly on systems without float hardware support as they have to emulate the complex hardware in software, while fixed point will run just as fast as integer arithmetic. For this reason fixed point is very often used in [embedded](embedded.md) computers.
- It is **easier to understand and better predictable**, less tricky, [KISS](kiss.md), [suckless](sukless.md). (Float's IEEE 754 standard is 58 pages long, the paper *What Every Computer Scientist Should Know About Floating-Point Arithmetic* has 48 pages.)
- Is easier to implement and so **supported in many more systems**. Any language or format supporting integers also supports fixed point.
- Isn't ugly and doesn't waste values (positive and negative zero, denormalized numbers, ...).
- Isn't ugly and **doesn't waste values** (unlike IEEE 754 with positive and negative zero, denormalized numbers, many [NaNs](nan.md) etc.).
## How It Works
Fixed point uses a fixed (hence the name) number of digits (bits in binary) for the integer part and the rest for the fractional part (whereas floating point's fractional part varies in size). I.e. we split the binary representation of the number into two parts (integer and fractional) by IMAGINING a radix point at some place in the binary representation. That's basically it.
Fixed point uses a fixed (hence the name) number of digits (bits in binary) for the integer part and the rest for the fractional part (whereas floating point's fractional part varies in size). I.e. we split the binary representation of the number into two parts (integer and fractional) by IMAGINING a radix point at some place in the binary representation. That's basically it. Fixed point therefore spaces numbers [uniformly](uniformity.md), as opposed to floating point whose spacing of numbers is non-uniform.
So, **we can just use an integer data type as a fixed point data type**, there is no need for libraries or special hardware support. We can also perform operations such as addition the same way as with integers. For example if we have a binary integer number represented as `00001001`, 9 in decimal, we may say we'll be considering a radix point after let's say the sixth place, i.e. we get `000010.01` which we interpret as 2.25 (2^2 + 2^(-2)). The binary value we store in a variable is the same (as the radix point is only imagined), we only INTERPRET it differently.

@ -0,0 +1,54 @@
# Floating Point
Floating point arithmetic (normally just *float*) is a method of computer representation of [fractional](rational_number.md) numbers, i.e. numbers with higher than [integer](integer.md) precision (such as 5.13), which is more complex than e.g. [fixed point](fixed_point.md). The core idea of it is to use a radix point that's not fixed but can move around so as to allow representation of both very small and very big values. Nowadays floating point is the standard way of [approximating](approximation.md) [real numbers](real_number.md) in computers, basically all of the popular [programming languages](programming_language.md) have a floating point [data type](data_type.md) that adheres to the IEEE 754 standard, all personal computers also have the floating point hardware unit (FPU) and so it is widely used in all [modern](modern.md) programs. However most of the time a simpler representation of fractional numbers, such as the mentioned [fixed point](fixed_point.md), suffices, and weaker computers (e.g. [embedded](embedded.md)) may lack the hardware support so floating point operations are emulated in software and therefore slow -- for these reasons we consider floating point [bloat](bloat.md) and recommend the preference of fixed point.
Is floating point literal evil? Well, of course not, but it is extremely overused. You may need it for precise scientific simulations, e.g. [numerical integration](numerical_integration.md), but as our [small3dlib](small3dlib.md) shows, you can comfortably do even [3D rendering](3d_rendering.md) without it. So always consider whether you REALLY need float.
## How It Works
Floats represent numbers by representing two main parts: the *base* -- actual encoded digits, called **mantissa** (or significand etc.) -- and the position of the radix point. The position of radix point is called the **exponent** because mathematically the floating point works similarly to the scientific notation of extreme numbers that use exponentiation. For example instead of writing 0.0000123 scientists write 123 * 10^-7 -- here 123 would be the mantissa and -7 the exponent.
Though various numeric bases can be used, in [computers](computer.md) we normally use [base 2](binary.md), so let's consider it from now on. So our numbers will be of format:
*mantissa * 2^exponent*
Note that besides mantissa and exponent there may also be other parts, typically there is also a sign bit that says whether the number is positive or negative.
Let's now consider an extremely simple floating point format based on the above. Keep in mind this is an EXTREMELY NAIVE inefficient format that wastes values. We won't consider negative numbers. We will use 6 bits for our numbers:
- 3 leftmost bits for mantissa: This allows us to represent 2^3 = 8 base values: 0 to 7 (including both).
- 3 rightmost bits for exponent: We will encode exponent in [two's complement](twos_complement.md) so that it can represent values from -4 to 3 (including both).
So for example the binary representation `110011` stores mantissa `110` (6) and exponent `011` (3), so the number it represents is 6 * 2^3 = 48. Similarly `001101` represents 1 * 2^-3 = 1/8 = 0.125.
Note a few things: firstly our format is [shit](shit.md) because some numbers have multiple representations, e.g. 0 can be represented as `000000`, `000001`, `000010`, `000011` etc., in fact we have 8 zeros! That's unforgivable and formats used in practice address this (usually by prepending an implicit 1 to mantissa).
Secondly notice the non-uniform distribution of our numbers: while we have a nice resolution close to 0 (we can represent 1/16, 2/16, 3/16, ...) but low resolution in higher numbers (the highest number we can represent is 56 but the second highest is 48, we can NOT represent e.g. 50 exactly). Realize that obviously with 6 bits we can still represent only 64 numbers at most! So float is NOT a magical way to get more numbers, with integers on 6 bits we can represent numbers from 0 to 63 spaced exactly by 1 and with our floating point we can represent numbers spaced as close as 1/16th but only in the region near 0, we pay the price of having big gaps in higher numbers.
Also notice that thing like simple addition of numbers become more difficult and time consuming, you have to include conversions and [rounding](rounding.md) -- while with fixed point addition is a single machine instruction, same as integer addition, here with software implementation we might end up with dozens of instructions (specialized hardware can perform addition fast but still, not all computer have that hardware).
Rounding errors will appear and accumulate during computations: imagine the operation 48 + 1/8. Both numbers can be represented in our system but not the result (48.125). We have to round the result and end up with 48 again. Imagine you perform 64 such additions in succession (e.g. in a loop): mathematically the result should be 48 + 64 * 1/8 = 56, which is a result we can represent in our system, but we will nevertheless get the wrong result (48) due to rounding errors in each addition. So the behavior of float can be **non intuitive** and dangerous, at least for those who don't know how it works.
## Standard Float Format: IEEE 754
IEEE 754 is THE standard that basically all computers use for floating point nowadays -- it specifies the exact representation of floating point numbers as well as rounding rules, required operations applications should implement etc. However note that the standard is **kind of [shitty](shit.md)** -- even if we want to use floating point numbers there exist better ways such as **[posits](posit.md)** that outperform this standard. Nevertheless IEEE 754 has been established in the industry to the point that it's unlikely to go anytime soon. So it's good to know how it works.
Numbers in this standard are signed, have positive and negative zero (oops), can represent plus and minus [infinity](infinity.md) and different [NaNs](nan.md) (not a number). In fact there are thousands to billions of different NaNs which are basically wasted values. These inefficiencies are addressed by the mentioned [posits](posit.md).
Briefly the representation is following (hold on to your chair): leftmost bit is the sign bit, then exponent follows (the number of bits depends on the specific format), the rest of bits is mantissa. In mantissa implicit `1.` is considered (except when exponent is all 0s), i.e. we "imagine" `1.` in front of the mantissa bits but this 1 is not physically stored. Exponent is in so called biased format, i.e. we have to subtract half (rounded down) of the maximum possible value to get the real value (e.g. if we have 8 bits for exponent and the directly stored value is 120, we have to subtract 255 / 2 = 127 to get the real exponent value, in this case we get -7). However two values of exponent have special meaning; all 0s signify so called denormalized (also subnormal) number in which we consider exponent to be that which is otherwise lowest possible (e.g. -126 in case of 8 bit exponent) but we do NOT consider the implicit 1 in front of mantissa (we instead consider `0.`), i.e. this allows storing [zero](zero.md) (positive and negative) and very small numbers. All 1s in exponent signify either [infinity](infinity.md) (positive and negative) in case mantissa is all 0s, or a [NaN](nan.md) otherwise -- considering here we have the whole mantissa plus sign bit unused, we actually have many different NaNs ([WTF](wtf.mf)), but usually we only distinguish two kinds of NaNs: quiet (qNaN) and signaling (sNaN, throws and [exception](exception.md)) that are distinguished by the leftmost bit in mantissa (1 for qNaN, 0 for sNaN).
The standard specifies many formats that are either binary or decimal and use various numbers of bits. The most relevant ones are the following:
| name |M bits|E bits| smallest and biggest number | precision <= 1 up to |
| --------------------------------- | ---- | ---- | --------------------------------------- | -------------------- |
|binary16 (half precision) | 10 | 5 |2^(-24), 65504 | 2048 |
|binary32 (single precision, float) | 23 | 8 |2^(-149), 2127 * (2 - 2^-23) ~= 3 * 10^38| 16777216 |
|binary64 (double precision, double)| 52 | 11 |2^(-1074), ~10^308 | 9007199254740992 |
|binary128 (quadruple precision) | 112 | 15 |2^(-16494), ~10^4932 | ~10^34 |
**Example?** Let's say we have float (binary34) value `11000000111100000000000000000000`: first bit (sign) is 1 so the number is negative. Then we have 8 bits of exponent: `10000001` (129) which converted from the biased format (subtracting 127) gives exponent value of 2. Then mantissa bits follow: `11100000000000000000000`. As we're dealing with a normal number (exponent bits are neither all 1s nor all 0s), we have to imagine the implicit `1.` in front of mantissa, i.e. our actual mantissa is `1.11100000000000000000000` = 1.875. The final number is therefore -1 * 1.875 * 2^2 = -7.5.
## See Also
- [posit](posit.md)
- [fixed point](fixed_point.md)

@ -15,10 +15,11 @@ Advantages of UBI:
- **Suffering of many people will be lowered or eliminated**. Simple as that.
- **We may actually save money** because the system will simplify a lot. Nowadays we have complex bureaucracy and commissions that judge who can get social welfare, who can get disability pensions etc. If everyone gets the money, we can save on this bureaucracy, commissions, on doctor examinations, caring about the homeless, maintaining special laws etc. If people become less stressed, mental health will also improve and we will save money on treatment of mentally ill people. Money may also be saved on organization of worker unions as they may become much less important.
- **People will become less stressed**, happier, will have security and as a result perhaps even become more "productive" (this has been confirmed by some experiments).
- **Criminality will greatly decrease** as it is directly linked to poverty, this will of course further save money on police, lawyers, medical bills etc.
- **People will become more equal** which will shift us closer to the [ideal society](ideal_society.md).
- **People will be able to do more important things than work if needed**, for example they may choose to focus on education for a few years, which will make the population better educated and therefore better.
- **Social security will suppress fear in people** and therefore make them less xenophobic, less militant etc.
- **Society will get nicer without homeless people shitting in the streets**.
- **Homelessness will greatly decrease**.
Disadvantages of UBI:

Loading…
Cancel
Save