Byte
TODO
Byte frequency/probability: it may be interesting and/or useful (e.g. for compression) to know how often different byte values appear in the data we process with computers. Of course this always DEPENDS on the data: if we are working with plain ASCII text, we will never encounter values above 127, while if we are processing photos from a polar expedition, we will likely mostly encounter the value 255 (as snow will make most pixels completely white). In general we may expect values such as 0, 255, 1 and 2 to be the most frequent, as these are often assigned special meanings in data encodings, serve as cutoff values etc. Here is a table of measured byte frequencies in real data:
{ Measured by me. ~drummyfish }
type of data | least common | 2nd least common | 3rd least common | 3rd most common | 2nd most common | most common |
---|---|---|---|---|---|---|
GNU/Linux 64bit executable | 0x9e (0%) | 0xb2 (0%) | 0x9a (0%) | 0x48 (2%) | 0xff (3%) | 0x00 (32%) |
bare metal ARM executable | 0xcf (0%) | 0xb7 (0%) | 0xa7 (0%) | 0xff (2%) | 0x01 (3%) | 0x00 (15%) |
UTF-8 English txt book | 0x00 (0%) | 0x01 (0%) | 0x02 (0%) | 0x74 ('t', 6%) | 0x65 ('e', 8%) | 0x20 (space, 14%) |
C source code | 0x00 (0%) | 0x01 (0%) | 0x02 (0%) | 0x31 ('1', 6%) | 0x20 (space, 12%) | 0x2c (',', 16%) |
raw 24bit RGB photo image | 0x07 (0%) | 0x09 (0%) | 0x08 (0%) | 0xdd (0%) | 0x00 (1%) | 0xff (25%) |
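The columns of such a table can be reproduced for any file with a short script. Below is a minimal sketch in Python; the article doesn't say what tool the measurement above was done with, so this is NOT that tool, just one possible way. It counts every byte value, sorts the 256 values by their share and prints the three least and three most common ones as a row in the table's format. The function names and the script name `byte_freq.py` are hypothetical, and breaking ties among equally frequent values by taking the lower byte value first is an assumption (it would explain why values like 0x00, 0x01, 0x02 show up as "least common" for text, but the original tie rule is unknown).

```python
import sys
from collections import Counter


def byte_frequencies(data: bytes) -> list[tuple[int, float]]:
    """Return (byte value, share in %) for all 256 byte values, sorted from
    least to most common. Values that never occur get a share of 0."""
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    total = len(data) or 1  # avoid division by zero on empty input
    freqs = [(value, 100.0 * counts[value] / total) for value in range(256)]
    # Sort by share; ties are broken by the lower byte value (an assumption,
    # the original measurement does not say how ties were resolved).
    freqs.sort(key=lambda item: (item[1], item[0]))
    return freqs


def format_entry(value: int, share: float) -> str:
    """Format one cell like the table above, e.g. '0x00 (32%)'."""
    return f"0x{value:02x} ({share:.0f}%)"


def table_row(label: str, data: bytes) -> str:
    """Build one table row: 3 least common, then 3 most common byte values."""
    freqs = byte_frequencies(data)
    # freqs is ascending, so [:3] is least, 2nd least, 3rd least and
    # [-3:] is 3rd most, 2nd most, most, matching the column order.
    cells = [format_entry(v, s) for v, s in freqs[:3] + freqs[-3:]]
    return label + " | " + " | ".join(cells) + " |"


def main() -> None:
    # Print one row per file given on the command line (none = no output).
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(table_row(path, f.read()))


if __name__ == "__main__":
    main()
```

Running e.g. `python3 byte_freq.py book.txt main.c` should print one row per file. Percentages are rounded to whole numbers, so a value shown as 0% may still occur a few times in the data.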