2023-08-22 20:36:49 +02:00


Byte

TODO

Byte frequency/probability: it may be interesting and/or useful to know how often different byte values appear in the data we process with computers, e.g. for compression, which can assign shorter codes to more frequent values. Of course this always DEPENDS on the data: if we are working with plain ASCII text, we will never encounter values above 127, while if we are processing photos from a polar expedition, we will likely mostly encounter the value 255 (as snow makes most pixels completely white). In general we may expect values such as 0, 255, 1 and 2 to be the most frequent, as these are often assigned special meanings in data encodings (e.g. 0 used for padding or as a string terminator), they are common cutoff (saturation) values etc. Here is a table of byte frequencies measured in real data:

{ Measured by me. ~drummyfish }

| type of data | least common | 2nd least common | 3rd least common | 3rd most common | 2nd most common | most common |
| --- | --- | --- | --- | --- | --- | --- |
| GNU/Linux 64 bit executable | 0x9e (0%) | 0xb2 (0%) | 0x9a (0%) | 0x48 (2%) | 0xff (3%) | 0x00 (32%) |
| bare metal ARM executable | 0xcf (0%) | 0xb7 (0%) | 0xa7 (0%) | 0xff (2%) | 0x01 (3%) | 0x00 (15%) |
| UTF-8 English plain text book | 0x00 (0%) | 0x01 (0%) | 0x02 (0%) | 0x74 (t, 6%) | 0x65 (e, 8%) | 0x20 (space, 14%) |
| C source code | 0x00 (0%) | 0x01 (0%) | 0x02 (0%) | 0x31 (1, 6%) | 0x20 (space, 12%) | 0x2c (comma, 16%) |
| raw 24 bit RGB photo image | 0x07 (0%) | 0x09 (0%) | 0x08 (0%) | 0xdd (0%) | 0x00 (1%) | 0xff (25%) |
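A measurement like the one in the table above can be done simply by counting occurrences of each of the 256 possible byte values and sorting them. Below is a minimal sketch in Python; it is NOT necessarily how the table above was made, and the names `byte_frequencies`, `describe` and `bytefreq.py` are made up for illustration. Ties (e.g. many values that never appear) are broken by the byte value itself, which is an arbitrary choice.

```python
import sys
from collections import Counter

# Hypothetical helper names, for illustration only.
SPECIAL_NAMES = {0x20: "space", 0x2c: "comma"}


def byte_frequencies(data: bytes):
    """Return (byte value, relative frequency) pairs for all 256 values,
    sorted from least to most common (ties broken by the byte value)."""
    counts = Counter(data)     # iterating bytes yields ints 0..255
    total = len(data) or 1     # avoid division by zero on empty input
    freqs = [(value, counts[value] / total) for value in range(256)]
    freqs.sort(key=lambda item: (item[1], item[0]))
    return freqs


def describe(value, freq):
    """Format one entry the way the table does, e.g. '0x74 (t, 6%)'."""
    if value in SPECIAL_NAMES:
        label = SPECIAL_NAMES[value] + ", "
    elif 0x20 < value < 0x7f:  # printable ASCII character
        label = chr(value) + ", "
    else:
        label = ""
    return f"0x{value:02x} ({label}{100 * freq:.0f}%)"


def main():
    if len(sys.argv) != 2:
        print("usage: bytefreq.py FILE", file=sys.stderr)
        return
    with open(sys.argv[1], "rb") as f:
        freqs = byte_frequencies(f.read())
    # Same column order as the table: least, 2nd least, 3rd least,
    # then 3rd most, 2nd most, most common.
    print("least:", ", ".join(describe(v, p) for v, p in freqs[:3]))
    print("most: ", ", ".join(describe(v, p) for v, p in freqs[-3:]))


if __name__ == "__main__":
    main()
```

Running e.g. `python3 bytefreq.py book.txt` prints one line with the three least common and one line with the three most common byte values of the given file.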