This commit is contained in:
Miloslav Ciz 2023-12-26 14:46:41 +01:00
parent 0d8f2a5fea
commit b35242f273
17 changed files with 58 additions and 31 deletions

View file

@ -1,6 +1,6 @@
# Compression
Compression means encoding [data](data.md) (such as images or texts) in a different way so that the data takes less space (memory) while keeping all the important [information](information.md), or, in plain terms, it usually means "making files smaller". Compression is pretty important so that we can utilize memory or bandwidth well -- without it our hard drives would be able to store just a handful of videos, internet would be slow as hell due to the gigantic amount of transferred data and our [RAM](ram.md) wouldn't suffice for things we normally do. There are many [algorithms](algorithm.md) for compressing various kinds of data, differing by their complexity, performance, efficiency of compression etc. The reverse process to compression (getting the original data back from the compressed data) is called **decompression**. The ratio of the compressed data size to the original data size is called **compression ratio** (the lower, the better). The science of data compression is truly huge and complicated AF, here we'll just mention some very basics. Also watch out: compression algorithms are often a [patent](patent.md) mine field.
Compression means encoding [data](data.md) (such as images or texts) in a different way so that the data takes less space (memory) while keeping all the important [information](information.md), or, in plain terms, it usually means "making files smaller". Compression is pretty important so that we can utilize memory or [bandwidth](bandwidth.md) well -- without it our hard drives would be able to store just a handful of videos, internet would be slow as hell due to the gigantic amount of transferred data and our [RAM](ram.md) wouldn't suffice for things we normally do. There are many [algorithms](algorithm.md) for compressing various kinds of data, differing by their complexity, performance, efficiency of compression etc. The reverse process to compression (getting the original data back from the compressed data) is called **decompression**. The ratio of the compressed data size to the original data size is called **compression ratio** (the lower, the better). The science of data compression is truly huge and complicated AF, here we'll just mention some very basics. Also watch out: compression algorithms are often a [patent](patent.md) mine field.
{ I've now written a tiny LRS compression library/utility called [shitpress](shitpress.md), check it out at https://codeberg.org/drummyfish/shitpress. It's fewer than 200 LOC, so simple it can nicely serve educational purposes. The principle is simple, kind of a dictionary method, where the dictionary is simply the latest output 64 characters; if we find a long word that occurred recently, we simply reference it with mere 2 bytes. It works relatively well for most data! ~drummyfish }
@ -10,7 +10,7 @@ Compression means encoding [data](data.md) (such as images or texts) in a differ
Let's keep in mind compression is not applied just to files on hard drives, it can also be used e.g. in RAM to utilize it more efficiently.
Why don't we compress everything? Firstly because compressed data is slow to work with, it requires significant CPU time to compress and decompress data, it's a kind of a space-time tradeoff (we gain more storage space for the cost of CPU time). Secondly compressed data is more prone to [corruption](corruption.md) because redundant information (which can help restoring corrupted data) is removed from it -- in fact we sometimes purposefully do the opposite of compression and make our data bigger to protect it from corruption (see e.g. [error correcting](error_correction.md) codes, [RAID](raid.md) etc.). And last but not least, many data can hardly be compressed or are so small it's not even worth it.
Why don't we compress everything? Firstly because compressed data is slow to work with, it requires significant CPU time to compress and decompress data, it's a kind of a space-time tradeoff (we gain more storage space for the cost of CPU time). Compression also [obscures](obscurity.md) data, for example compressed text file will typically no longer be human readable, any code wanting to work with such data will have to include the nontrivial decompression code. Compressed data is also more prone to [corruption](corruption.md) because redundant information (which can help restoring corrupted data) is removed from it -- in fact we sometimes purposefully do the opposite of compression and make our data bigger to protect it from corruption (see e.g. [error correcting](error_correction.md) codes, [RAID](raid.md) etc.). And last but not least, many data can hardly be compressed or are so small it's not even worth it.
The basic division of compression methods is to: