Miloslav Ciz 2025-06-05 22:37:58 +02:00
parent 6f8eee7efa
commit ff6c15886f
11 changed files with 2009 additions and 1991 deletions

We should also mention that compression isn't applied just to files on hard drives; it may just as well be used, let's say, in [RAM](ram.md) to utilize it more efficiently. [OpenGL](opengl.md), for instance, offers the option to compress textures uploaded to the [GPU](gpu.md) to save space.
As for computational [complexity](complexity.md), it is mostly safe to assume that compression will be more demanding than decompression in terms of resources, and sometimes it's possible to dedicate more resources (time, memory, electricity, ...) to achieve a better compression ratio, i.e. we can "try harder" to "compress the file more". While a compressed file can only be decoded in one way (to obtain the original file) and decompression is normally quite fast and straightforward (e.g. "replace symbols with words from a dictionary"), a file can often be compressed in many different ways, some of which are better (smaller) than others, and there is seldom a way to find the best one other than [brute force](brute_force.md). This asymmetry in the cost of compression and decompression can actually be advantageous, considering typical scenarios such as distributing compressed video over the Internet: we have to dedicate a lot of [CPU](cpu.md) time to compress the video well, but only once. The video is then distributed to many clients, we benefit from saved bandwidth on every single copy we transfer, and thanks to the simplicity of decompression the clients (of which there are many) aren't burdened nearly as much -- the total cost we collectively pay is much smaller than if compression were cheap and decompression expensive.
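To make the asymmetry concrete, here is a minimal sketch of a toy LZ77-style compressor in Python. The function names, the `(offset, length, next_char)` token format and the `window` parameter are illustrative assumptions, not any real format such as DEFLATE. The encoder brute-forces every position in a search window of `window` characters looking for the longest match, so a bigger window makes it work harder and, on repetitive input, produce fewer tokens; the decoder is the same cheap linear loop no matter how hard the encoder tried.

```python
def lz77_compress(data, window):
    """Toy LZ77: encode the string `data` as (offset, length, next_char) tokens.

    The search for a match is brute force over the last `window`
    characters, so the cost of compression grows with `window`.
    """
    tokens = []
    i = 0
    while i < len(data):
        best_off, best_len = 0, 0
        # Try every candidate match start inside the search window.
        for j in range(max(0, i - window), i):
            length = 0
            # Stop one character early so there is always a next_char to emit.
            # A match may run past position i (overlap); the decoder copes with that.
            while i + length < len(data) - 1 and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        tokens.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return tokens


def lz77_decompress(tokens):
    """Rebuild the original string in one cheap pass, independent of `window`."""
    out = []
    for off, length, ch in tokens:
        start = len(out) - off
        # Copy one character at a time so overlapping matches work.
        for k in range(length):
            out.append(out[start + k])
        out.append(ch)
    return "".join(out)


text = "abc" * 16  # hypothetical sample input, 48 characters
for w in (2, 8):
    tokens = lz77_compress(text, w)
    assert lz77_decompress(tokens) == text
    print("window", w, "->", len(tokens), "tokens")
```

With this input the window of 2 finds no match at all (every character differs from the two before it), so all 48 characters end up as literal tokens, while the window of 8 finds the period-3 repetition and needs only 4 tokens. Real compressors avoid this naive quadratic search with structures such as hash chains, but the trade-off between search effort and compression ratio is the same idea.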
Why don't we compress everything? Firstly because compressed data is slow to work with: compressing and decompressing requires significant [CPU](cpu.md) time, so it's a kind of space-time tradeoff (we gain storage space at the cost of CPU time). Compression also [obscures](obscurity.md) data; for example, a compressed text file will typically no longer be human readable, and any code wanting to work with such data will have to include nontrivial decompression code. Compressed data is also more sensitive to [corruption](corruption.md), as a single damaged bit may ruin a large part of the decompressed output, because redundant information (which can help restore corrupted data) has been removed from it -- in fact we sometimes purposefully do the opposite of compression and make our data bigger to protect it from corruption (see e.g. [error correcting](error_correction.md) codes, [RAID](raid.md) etc.). And last but not least, a lot of data can hardly be compressed at all, or is so small that compressing it isn't even worth it.
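The points about corruption and incompressible data can be illustrated with a minimal sketch of a toy byte-oriented run-length encoding (RLE) in Python. The `(count, value)` pair format and the function names are assumptions made up for this example, not a standard format. Input with no repeated neighbouring bytes doubles in size, and changing a single byte of the compressed stream damages far more than one byte of the output.

```python
def rle_encode(data):
    """Toy RLE: store each run of equal bytes as a (count, value) pair, count 1..255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(data):
    """Expand (count, value) pairs back into the original bytes."""
    out = bytearray()
    for k in range(0, len(data), 2):
        out += bytes((data[k + 1],)) * data[k]
    return bytes(out)


runs = b"A" * 100 + b"B" * 100  # redundant input, 200 bytes
noise = bytes(range(200))       # no two neighbouring bytes equal, 200 bytes
print(len(rle_encode(runs)), len(rle_encode(noise)))

# Simulate corruption of one byte of the compressed stream: count 100 -> 10.
packed = bytearray(rle_encode(runs))
packed[0] = 10
damaged = rle_decode(bytes(packed))
# 90 bytes of output are lost and the 'B's now start 90 positions too early.
print(len(damaged), damaged.index(b"B"))
```

The first line printed is `4 400`: the redundant input shrinks to 4 bytes while the "noise" grows to 400. The second is `110 10`: one altered byte cost us 90 bytes of output and moved all the following data, whereas the same change in the uncompressed data would have damaged exactly one byte.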
The basic division of compression methods is into: