Update

2024-05-27 22:48:10 +02:00 · 2024-05-27 22:48:10 +02:00 · 6b6fe66cd4
commit 6b6fe66cd4
parent 412b7489cc
13 changed files with 1849 additions and 1789 deletions
--- a/compression.md
+++ b/compression.md
@ -78,7 +78,7 @@ For **images** we usually exploit the fact that human sight is less sensitive to

 In **video** compression we may reuse the ideas from image compression and further employ exploiting temporal redundancy, i.e. the fact that consecutive video frames look similar, so we may only encode some kind of delta (change) against the previous (or even next) frame. The most common way is to fully record only one key frame in some time span (so called I-frame, further compressed with image compression methods), then divide it to small blocks and estimate the movement of those blocks so that they approximately make up the following frames -- we then record only the motion vectors of the blocks. This is why videos look "blocky". In the past [interlacing](interlacing.md) was also used -- only half of each frame was recorded, every other row was dropped; when playing, the frame was interlaced with the previous frame. Another cool idea is keyframe [superresolution](superresolution.md): you store only some keyframes in full resolutions and store the rest of them in smaller size; during decoding you can use the nearby full scale keyframes to upscale the low res keyframes (search for matching subblocks in the low res image and match them to those in the big res image).

-In **audio** we usually straight remove frequencies that humans can't hear (usually said to be above 20 kHz), for this we again convert the audio from spatial to frequency domain (using e.g. [Fourier transform](fourier_transform.md)). Furthermore it is very inefficient to store sample values directly -- we rather use so called *differential PCM*, a lossless compression that e.g. stores each sample as a difference against the previous sample (which is usually small and doesn't use up many bits). This can be improved by a predictor, which tries to predict the next values from previous values and then we only save the difference against this prediction. *Joint stereo coding* exploits the fact that human hearing is not so sensitive to the direction of the sound and so e.g. instead of recording both left and right stereo channels in full quality rather records the sum of both and a ratio between them (which can get away with fewer bits). *Psychoacoustics* studies how humans perceive sound, for example so called *masking*: certain frequencies may for example mask nearby (both in frequency and time) frequencies (make them unhearable for humans) so we can drop them. See also [vocoders](vocoder.md).
+In **audio** we usually straight remove frequencies that humans can't hear (usually said to be above 20 kHz), for this we again convert the audio from spatial to frequency domain (using e.g. [Fourier transform](fourier_transform.md)). Furthermore it is very inefficient to store sample values directly -- we rather use so called *differential PCM*, a lossless compression that e.g. stores each sample as a difference against the previous sample (which is usually small and doesn't use up many bits). This can be improved by a predictor, which tries to predict the next values from previous values and then we only save the difference against this prediction. *Joint stereo coding* exploits the fact that human hearing is not so sensitive to the direction of the sound and so e.g. instead of recording both left and right stereo channels in full quality rather records the sum of both and a ratio between them (which can get away with fewer bits). *Psychoacoustics* studies how humans perceive sound, for example so called *masking*: certain frequencies may for example mask nearby (both in frequency and time) frequencies (make them unhearable for humans) so we can drop them. See also [vocoders](vocoder.md). For specific kinds of audio we may further employ more detailed knowledge, for example with instrumental [music](music.md) we can just store the notes that are being played plus instruments that play them, for example with [MIDI](midi.md) -- this format was not made for compression per se, but it does allow us to store music in much smaller size than directly storing audio.

 TODO: LZW, DEFLATE etc.