This commit is contained in:
Miloslav Ciz 2023-08-04 22:23:02 +02:00
parent d765507e6a
commit ecd9c2f991
8 changed files with 22 additions and 6 deletions

View file

@ -4,6 +4,8 @@ Compression means encoding [data](data.md) (such as images or texts) in a differ
{ There is a cool compressing competition known as Hutter Prize that offers 500000 pounds to anyone who can break the current record for compressing [Wikipedia](wikipedia.md). Currently the record is at compressing 1GB down to 115MB. See http://prize.hutter1.net for more. ~drummyfish }
{ [LMAO](lmao.md) retard [patents](patent.md) are being granted on impossible compression algorithms, see e.g. http://gailly.net/05533051.html. See also [Sloot Digital Coding System](sloot.md), a miraculous compression algorithm that "could store a whole movie in 8 KB" lol. ~drummyfish }
Let's keep in mind compression is not applied just to files on hard drives, it can also be used e.g. in RAM to utilize it more efficiently.
Why don't we compress everything? Firstly because compressed data is slow to work with, it requires significant CPU time to compress and decompress data, it's a kind of a space-time tradeoff (we gain more storage space for the cost of CPU time). Secondly compressed data is more prone to [corruption](corruption.md) because redundant information (which can help restoring corrupted data) is removed from it -- in fact we sometimes purposefully do the opposite of compression and make our data bigger to protect it from corruption (see e.g. [error correcting](error_correction.md) codes, [RAID](raid.md) etc.). And last but not least, many data can hardly be compressed or are so small it's not even worth it.
@ -47,6 +49,8 @@ The following is an overview of some most common compression techniques.
**[Predictor](predictor.md) compression** is based on making a *predictor* that tries to guess following data from previous values (which can be done e.g. in case of pictures, sound or text) and then only storing the difference against such a predicted result. If the predictor is good, we may only store the small amount of the errors it makes.
A famous family of dictionary compression algorithms are **Lempel-Ziv (LZ)** algorithms -- these two guys first proposed [LZ77](lz77.md) in (1977, sliding window) and ([LZ78](lz78.md), explicitly stored dictionary, 1978). These were a basis for improved/remix algorithms, most notably [LZW](lzw.md) (1984, Welch). Additionally these algorithms are used and combined in other algorithms, most notably [gif](gif.md) and [DEFLATE](deflate.md) (used e.g. in gzip and png).
An approach similar to the predictor may be trying to find some general mathematical [model](model.md) of the data and then just find and store the right parameters of the model. This may for example mean [vectorizing](vector_graphics.md) a bitmap image, i.e. finding geometrical shapes in an image composed of pixels and then only storing the parameters of the shapes -- of course this may not be 100% accurate, but again if we want to preserve the data accurately, we may additionally also store the small amount of errors the model makes. Similar approach is used in [vocoders](vocoder.md) used in cellphones that try to mathematically model human speech (however here the compression is lossy), or in [fractal](fractal.md) compression of images. A nice feature we gain here is the ability to actually "increase the resolution" (or rather generate detail) of the original data -- once we fit a model onto our data, we may use it to tell us values that are not actually present in the data (i.e. we get a fancy [interpolation](interpotation.md)/[extrapolation](extrapolation.md)).
Another property of data to exploit may be its sparsity -- if for example we have a huge image that's prevalently white, we may say white is the implicit color and we only somehow store the pixels of other colors.