less_retarded_wiki/byte.md


2023-08-22 20:36:49 +02:00
# Byte
TODO
**Byte frequency/probability**: it may be [interesting](interesting.md) and/or useful (e.g. for [compression](compression.md)) to know how often different byte values appear in the data we process with computers. Of course this always DEPENDS on the kind of data: if we are working with plain [ASCII](ascii.md) text, we will never encounter values above 127; on the other hand, if we are processing photos from a polar expedition, we will likely encounter mostly byte values of 255 (as snow will cause most pixels to be completely white). In general we may expect values such as [0](zero.md), 255, [1](one.md) and [2](two.md) to be the most frequent, as these are often assigned special meanings in data encodings, serve as cutoff values etc. Here is a table of measured byte frequencies in real data:
{ Measured by me. ~drummyfish }
| type of data               | least common | 2nd least common | 3rd least common | 3rd most common | 2nd most common | most common     |
| -------------------------- | ------------ | ---------------- | ---------------- | --------------- | --------------- | --------------- |
| GNU/Linux 64bit executable | 0x9e (0%)    | 0xb2 (0%)        | 0x9a (0%)        | 0x48 (2%)       | 0xff (3%)       | 0x00 (32%)      |
| bare metal ARM executable  | 0xcf (0%)    | 0xb7 (0%)        | 0xa7 (0%)        | 0xff (2%)       | 0x01 (3%)       | 0x00 (15%)      |
| UTF8 English txt book      | 0x00 (0%)    | 0x01 (0%)        | 0x02 (0%)        | 0x74 (`t`, 6%)  | 0x65 (`e`, 8%)  | 0x20 (` `, 14%) |
| C source code              | 0x00 (0%)    | 0x01 (0%)        | 0x02 (0%)        | 0x31 (`1`, 6%)  | 0x20 (` `, 12%) | 0x2c (`,`, 16%) |
| raw 24bit RGB photo image  | 0x07 (0%)    | 0x09 (0%)        | 0x08 (0%)        | 0xdd (0%)       | 0x00 (1%)       | 0xff (25%)      |
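
A measurement like the one in the table above can be reproduced with a few lines of code. The following is only a minimal sketch in Python, NOT the program that produced the table (how those numbers were measured isn't stated here); the function names `byte_frequencies` and `summarize` are made up for this example, and ties between equally frequent values are broken by byte value, which is an assumption.

```python
from collections import Counter

def byte_frequencies(data):
    """Return a list of (byte value, share in percent) for all 256
    possible byte values, sorted from least to most common."""
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    total = len(data)
    if total == 0:
        return [(value, 0.0) for value in range(256)]
    table = [(value, 100.0 * counts[value] / total) for value in range(256)]
    # sort() is stable, so equally frequent values stay ordered by byte value
    table.sort(key=lambda entry: entry[1])
    return table

def summarize(data, n=3):
    """Return the n least common and the n most common entries."""
    table = byte_frequencies(data)
    return table[:n], table[-n:]

# small demonstration on an in-memory byte string
least, most = summarize(b"hello world")

for value, share in most:
    print(f"0x{value:02x} ({share:.0f}%)")
# prints:
# 0x77 (9%)
# 0x6f (18%)
# 0x6c (27%)
```

To measure a real file, its content can be passed in as bytes, e.g. `summarize(pathlib.Path("some_file").read_bytes())`, where `some_file` is just a placeholder name. Note that percentages rounded to whole numbers will show rare values as 0% even if they do occur a few times.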