This commit is contained in:
Miloslav Ciz 2024-03-12 21:31:37 +01:00
parent fdc9159bf7
commit 0370ebebbc
18 changed files with 1936 additions and 1728 deletions


The standard specifies many formats that are either binary or decimal and use various numbers of bits. The most relevant ones are the following:

| name                              |M bits|E bits| smallest positive and biggest finite number | precision <= 1 (all integers exact) up to |
| --------------------------------- | ---- | ---- | ------------------------------------------- | ----------------------------------------- |
|binary16 (half precision)          | 10   | 5    |2^(-24), 65504                               | 2048                                      |
|binary32 (single precision, float) | 23   | 8    |2^(-149), 2^127 * (2 - 2^-23) ~= 3 * 10^38   | 16777216                                  |
|binary64 (double precision, double)| 52   | 11   |2^(-1074), ~10^308                           | 9007199254740992                          |
|binary128 (quadruple precision)    | 112  | 15   |2^(-16494), ~10^4932                         | ~10^34                                    |

**Example?** Let's say we have the float (binary32) value `11000000111100000000000000000000`. The first bit (sign) is 1, so the number is negative. Then come 8 bits of exponent: `10000001` (129). Converting this from the biased format (subtracting 127) gives the exponent value 2. The mantissa bits follow: `11100000000000000000000`. We're dealing with a normal number (the exponent bits are neither all 1s nor all 0s), so we have to imagine the implicit `1.` in front of the mantissa. Our actual mantissa is therefore `1.11100000000000000000000` (in binary) = 1.875. The final number is -1 * 1.875 * 2^2 = -7.5.
## See Also
- [posit](posit.md)
- [fixed point](fixed_point.md)