# Floating Point
Floating point arithmetic (normally just *float*) is a method of computer representation of [fractional](rational_number.md) numbers and of approximating [real numbers](real_number.md), i.e. numbers with higher than [integer](integer.md) precision (such as 5.13); it is more complex than e.g. [fixed point](fixed_point.md). The core idea is to use a radix ("decimal") point that's not fixed but can move around so as to allow representation of both very small and very big values. Nowadays floating point is the standard way of [approximating](approximation.md) [real numbers](real_number.md) in computers (floating point types are called *real* in some programming languages, even though they represent only [rational numbers](rational_number.md); floats can't e.g. represent [pi](pi.md) exactly): basically all of the popular [programming languages](programming_language.md) have a floating point [data type](data_type.md) that adheres to the IEEE 754 standard, all personal computers also have a floating point hardware unit (FPU), and so floats are widely used in almost all [modern](modern.md) programs. However most of the time a simpler representation of fractional numbers, such as the mentioned [fixed point](fixed_point.md), suffices, and weaker computers (e.g. [embedded](embedded.md)) may lack hardware support, so that floating point operations have to be emulated in software and are therefore slow -- for these reasons we consider floating point [bloat](bloat.md) and recommend preferring fixed point.
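To see the approximation in practice, consider this small [C](c.md) sketch: printing a value with more decimal digits than a *float* can actually hold reveals the representation error (the exact digits shown assume typical IEEE 754 behavior).

```c
#include <stdio.h>

int main(void)
{
  float tenth = 0.1f; /* 0.1 has no finite representation in base 2 */

  printf("%.20f\n", tenth);                  /* typically prints 0.10000000149011611938 */
  printf("%.20f\n", 3.14159265358979323846); /* pi is approximated even as a double */

  return 0;
}
```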
Is floating point literally evil? Well, of course not, but it is extremely overused. You may need it for precise scientific simulations, e.g. [numerical integration](numerical_integration.md), but as our [small3dlib](small3dlib.md) shows, you can comfortably do even [3D rendering](3d_rendering.md) without it. So always consider whether you REALLY need float. You mostly do not.
## How It Works
The very basic idea is the following: we have digits in memory and in addition we have a position of the radix point among these digits, i.e. both the digits and the position of the radix point can change. The fact that the radix point can move is reflected in the name *floating point*. In the end any number stored in a float can be written with a finite number of digits and a radix point, e.g. 12.34. Notice that any such number can also always be written as a simple fraction of two integers (e.g. 12.34 = 1 * 10 + 2 * 1 + 3 * 1/10 + 4 * 1/100 = 617/50), i.e. any such number is always a rational number. This is why we say that floats represent fractional numbers and not true real numbers (real numbers such as [pi](pi.md), [e](e.md) or the square root of 2 can only be approximated).
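The principle can be sketched in [C](c.md) with a toy decimal representation (purely hypothetical, real floats are encoded differently, as detailed below): we store the digits as an integer together with a radix point position, and the represented value is digits / base^position.

```c
#include <stdio.h>
#include <math.h>

/* toy decimal "float": value = digits / 10^point (hypothetical illustration) */
typedef struct
{
  int digits; /* the stored digits, e.g. 1234 */
  int point;  /* how many digits lie right of the radix point */
} ToyFloat;

double toyToDouble(ToyFloat t)
{
  return t.digits / pow(10, t.point);
}

int main(void)
{
  ToyFloat small = { 1234, 7 };  /* 0.0001234 */
  ToyFloat big   = { 1234, -3 }; /* 1234000: same digits, point moved the other way */

  printf("%.7f\n", toyToDouble(small));
  printf("%.7f\n", toyToDouble(big));
  return 0;
}
```

The same digits 1234 here represent both a tiny and a huge value, depending only on where the point sits -- that's the whole trick.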
More precisely, floats represent a number by its two main parts: the *base* -- the actual encoded digits, called the **mantissa** (or significand etc.) -- and the position of the radix point. The position of the radix point is called the **exponent** because mathematically floating point works similarly to the scientific notation of extreme numbers, which uses exponentiation. For example instead of writing 0.0000123 scientists write 123 * 10^-7 -- here 123 would be the mantissa and -7 the exponent.
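For a concrete peek at how this looks in memory, here's a small [C](c.md) sketch that pulls the sign, exponent and mantissa out of a *float*, assuming the common IEEE 754 single precision layout (1 sign bit, 8 exponent bits biased by 127, 23 mantissa bits with an implicit leading 1).

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
  float f = 12.34f;
  uint32_t bits;

  memcpy(&bits, &f, sizeof bits); /* reinterpret the float's bits as an integer */

  unsigned sign     = bits >> 31;          /* 1 sign bit */
  unsigned exponent = (bits >> 23) & 0xff; /* 8 bits, biased by 127 */
  unsigned mantissa = bits & 0x7fffff;     /* 23 bits, implicit leading 1 */

  printf("sign: %u, exponent: %d, mantissa: 0x%x\n",
    sign, (int) exponent - 127, mantissa);

  return 0;
}
```

For 12.34 this should report a true exponent of 3, since 12.34 = 1.5425 * 2^3.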
Though various numeric bases can be used, in [computers](computer.md) we normally use [base 2](binary.md), so let's consider base 2 from now on. Our numbers will then be of the following format: