Update

2025-05-07 21:16:44 +02:00 · 2025-05-07 21:16:44 +02:00 · 8b530b5952
commit 8b530b5952
parent 4d545b6845
20 changed files with 206 additions and 24 deletions
--- a/float.md
+++ b/float.md
@@ -4,10 +4,12 @@ In programming floating point (colloquially just *float*) is a way of representi

 Back in the earlier days of personal computers -- like the early [90s](90s.md) -- hardware accelerated floating point still wasn't completely common, for example Intel 80286 didn't have a built-in FPU, it had to be bought extra, and that was usually done only by professionals like engineers and scientists, games didn't really use floating point. Integrated FPU became standard only later on.

-**Floating point is tricky**, it works most of the time but a danger lies in programmers relying on this kind of [magic](magic.md) too much, some new generation programmers may not even be very aware of how float works. Even though the principle is not difficult to understand, the emergent behavior of the math can get really complex and practical problems of implementation and standardization don't help either. One floating point expression may evaluate differently on different systems, for example due to different rounding settings. Floating point can introduce [chaotic](chaos.md) behavior into linear systems as it inherently makes rounding errors and so becomes a nonlinear system (source: http://foldoc.org/chaos). One common pitfall of float is working with big and small numbers at the same time -- due to differing precision at different scales small values simply get lost when mixed with big numbers and sometimes this has to be worked around with tricks (see e.g. [this](http://the-witness.net/news/2022/02/a-shader-trick/) devlog of The Witness where a float time variable sent into [shader](shader.md) is periodically reset so as to not grow too large and cause the mentioned issue). Another famous trickiness of float is that you shouldn't really be comparing floats for equality with a normal `==` operator as small rounding errors may make even mathematically equal expressions unequal (i.e. you should use some range comparison instead).
+**Floating point is tricky**, it works most of the time but a danger lies in programmers relying on this kind of [magic](magic.md) too much, some new generation programmers may not even be very aware of how floats internally work. Even though the principle is not difficult to understand, the emergent behavior of the math can get really complex and practical problems of implementation and standardization don't help either. One floating point expression may evaluate differently on different systems, for example due to different rounding settings. Floating point can introduce [chaotic](chaos.md) behavior into linear systems as it inherently makes rounding errors and so becomes a nonlinear system (source: http://foldoc.org/chaos). One common pitfall of float is working with big and small numbers at the same time -- due to differing precision at different scales small values simply get lost when mixed with big numbers and sometimes this has to be worked around with tricks (see e.g. [this](http://the-witness.net/news/2022/02/a-shader-trick/) devlog of The Witness where a float time variable sent into [shader](shader.md) is periodically reset so as to not grow too large and cause the mentioned issue). Another famous trickiness of float is that you shouldn't really be comparing floats for equality with a normal `==` operator as small rounding errors may make even mathematically equal expressions unequal (i.e. you should use some range comparison instead).

 And there is more: floating point behavior really depends on the language you're using (and possibly even the compiler, its settings etc.) and it may not always be completely defined/specified, leading to possible [nondeterministic](determinism.md) behavior which can cause real trouble e.g. in physics engines. This may also lead to nasty bugs and trouble with [portability](portability.md) (i.e. assuring the exact same behavior on all platforms).

+There is also a bit of an unfortunate situation with standardization. The widely adopted IEEE 754 standard is far from flawless in design, it's actually kind of bad, but it has become so widely established and supported in practically all hardware that it's immensely difficult to replace, even with objectively better ways of handling floating point numbers, for example [posits](posit.md).
+
 { Really as I'm now getting down the float rabbit hole I'm seeing what a huge mess it all is, I'm not nearly an expert on this so maybe I've written some BS here, which just confirms how messy floats are. Anyway, from the articles I'm reading even being an expert on this issue doesn't seem to guarantee a complete understanding of it :) Just avoid floats if you can. ~drummyfish }

 For starters consider the following snippet (let's now assume the standard 32 bit IEEE float etc.):
@@ -120,3 +122,4 @@ The following table shows approximate resolution (i.e. distance to next represen

 - [posit](posit.md)
 - [fixed point](fixed_point.md)
+- [conum](conum.md)
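
Note on the float.md hunk above: the two pitfalls it mentions (small values getting lost next to big ones, and `==` failing on mathematically equal expressions) can be illustrated with a small C sketch. This is not part of the commit and not the article's own snippet. It assumes `float` is the 32 bit IEEE 754 type, and the names `sum_tenths`, `is_absorbed` and `approx_equal`, as well as any tolerance values passed to them, are made up for illustration.

```c
/* Sketch only: assumes float is the 32 bit IEEE 754 type. All function
   names here are hypothetical, they come from no library. */

/* Adds 0.1f ten times. Mathematically the result is 1, but every addition
   rounds. volatile forces each partial sum to be stored as a 32 bit float
   (some platforms would otherwise keep extra precision in registers). */
float sum_tenths(void)
{
  volatile float sum = 0.0f;

  for (int i = 0; i < 10; ++i)
    sum = sum + 0.1f;

  return sum;
}

/* Returns 1 if adding small to big leaves big unchanged, i.e. the small
   value got completely lost in rounding. */
int is_absorbed(float big, float small)
{
  volatile float sum = big + small;
  return sum == big;
}

/* Range comparison instead of ==: a and b count as equal if they differ
   by at most abs_tol, or by at most rel_tol times the larger magnitude.
   Suitable tolerances depend on the application. */
int approx_equal(float a, float b, float rel_tol, float abs_tol)
{
  float diff = a > b ? a - b : b - a;
  float abs_a = a < 0.0f ? -a : a;
  float abs_b = b < 0.0f ? -b : b;
  float limit = rel_tol * (abs_a > abs_b ? abs_a : abs_b);

  if (limit < abs_tol)
    limit = abs_tol;

  return diff <= limit;
}
```

Under the stated assumption `sum_tenths() == 1.0f` is false, while `approx_equal(sum_tenths(), 1.0f, 1e-5f, 1e-6f)` is true. `is_absorbed(16777216.0f, 1.0f)` is true because 2^24 + 1 is not representable in a 32 bit float and rounds back to 2^24.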