Update
This commit is contained in:
parent
f1848a0f4b
commit
879a37fd2b
14 changed files with 55 additions and 19 deletions
2
float.md
2
float.md
|
@ -1,6 +1,6 @@
|
|||
# Floating Point
|
||||
|
||||
Floating point arithmetic (normally just *float*) is a method of computer representation of [fractional](rational_number.md) numbers and approximating [real numbers](real_number.md), i.e. numbers with higher than [integer](integer.md) precision (such as 5.13), which is more complex than e.g. [fixed point](fixed_point.md). The core idea of it is to use a radix ("decimal") point that's not fixed but can move around so as to allow representation of both very small and very big values. Nowadays floating point is the standard way of [approximating](approximation.md) [real numbers](real_number.md) in computers (floating point types are called *real* in some programming languages, even though they represent only [rational numbers](rational_number.md), floats can't e.g. represent [pi](pi.md) exactly), basically all of the popular [programming languages](programming_language.md) have a floating point [data type](data_type.md) that adheres to the IEEE 754 standard, all personal computers also have the floating point hardware unit (FPU) and so it is widely used in all [modern](modern.md) programs. However most of the time a simpler representation of fractional numbers, such as the mentioned [fixed point](fixed_point.md), suffices, and weaker computers (e.g. [embedded](embedded.md)) may lack the hardware support so floating point operations are emulated in software and therefore slow -- for these reasons we consider floating point [bloat](bloat.md) and recommend the preference of fixed point.
|
||||
Floating point arithmetic (colloquially just *float*) is a method of computer representation of [fractional](rational_number.md) numbers and approximating [real numbers](real_number.md), i.e. numbers with higher than [integer](integer.md) precision (such as 5.13), which is more complex than e.g. [fixed point](fixed_point.md). The core idea of it is to use a radix ("decimal") point that's not fixed but can move around so as to allow representation of both very small and very big values. Nowadays floating point is the standard way of [approximating](approximation.md) [real numbers](real_number.md) in computers (floating point types are called *real* in some programming languages, even though they represent only [rational numbers](rational_number.md), floats can't e.g. represent [pi](pi.md) exactly), basically all of the popular [programming languages](programming_language.md) have a floating point [data type](data_type.md) that adheres to the IEEE 754 standard, all personal computers also have the floating point hardware unit (FPU) and so it is widely used in all [modern](modern.md) programs. However most of the time a simpler representation of fractional numbers, such as the mentioned [fixed point](fixed_point.md), suffices, and weaker computers (e.g. [embedded](embedded.md)) may lack the hardware support so floating point operations are emulated in software and therefore slow -- for these reasons we consider floating point [bloat](bloat.md) and recommend the preference of fixed point.
|
||||
|
||||
**Floating point is tricky**, it works most of the time but a danger lies in programmers relying on this kind of [magic](magic.md) too much, some new generation programmers may not even be very aware of how float works. Even though the principle is not so hard, the emergent complexity of the math is really complex. One floating point expression may evaluate differently on different systems, e.g. due to different rounding settings. One possible pitfall is working with big and small numbers at the same time -- due to differing precision at different scales small values simply get lost when mixed with big numbers and sometimes this has to be worked around with tricks (see e.g. [this](http://the-witness.net/news/2022/02/a-shader-trick/) devlog of The Witness where a float time variable sent into [shader](shader.md) is periodically reset so as to not grow too large and cause the mentioned issue). Another famous trickiness of float is that you shouldn't really be comparing them for equality with a normal `==` operator as small rounding errors may make even mathematically equal expressions unequal (i.e. you should use some range comparison instead).
|
||||
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue