less_retarded_wiki/wavelet_transform.md

36 lines
4.9 KiB
Markdown
Raw Permalink Normal View History

2022-09-19 08:56:30 +02:00
# Wavelet Transform
*Good luck trying to understand the corresponding [Wikipedia](wikipedia.md) article.*
2024-08-15 12:41:06 +02:00
Wavelet transform is a [mathematical](math.md) operation, not much different from let's say [Fourier transform](fourier_transform.md), that takes a [signal](signal_processing.md) (e.g. audio or image) and outputs information about the [frequencies](frequency.md) contained in that signal AS WELL as the locations of those frequencies. This comes very handy should we want to analyze and manipulate frequencies in the signal -- for example [JPEG 2000](jpeg_2000.md) exploits wavelet transforms for [compressing](compression.md) images by discarding certain frequencies in them that our eyes are not so sensitive to.
2022-09-19 08:56:30 +02:00
2022-12-26 15:23:35 +01:00
The main advantage over [Fourier transform](fourier_transform.md) (and similar transforms such as [cosine transform](cosine_transform.md)) is that wavelet transform shows us not only the frequencies, but ALSO their locations (i.e. for example time at which these frequencies come into play in an audio signal). This allows us for example to locate specific sounds in audio or apply compression only to certain parts of an image. While localizing frequencies is also possible with Fourier transform with tricks such as [spectrograms](spectrogram.md), wavelet transforms are a more elegant, natural and continuous way of doing so. Note that due to [Heisenberg's uncertainty principle](uncertainty_principle.md) it is mathematically IMPOSSIBLE to know both frequencies and their locations exactly, there always has to be a tradeoff -- the input signal itself tells us everything about location but nothing about frequencies, Fourier transform tells us everything about frequencies but nothing about their locations and wavelet transform is a **midway** between the two -- it tells us something about frequencies and their approximate locations.
2022-09-19 08:56:30 +02:00
Of course there is always an inverse transform for a wavelet transform so we can transform the signal, then manipulate the frequencies and transform it back.
Wavelet transforms use so called **[wavelets](wavelet.md)** (*tiny waves*) as their basis function, similarly to how Fourier transform uses [sine](sin.md)/[cosine](cos.md) functions to analyze the input signal. A wavelet is a special function (satisfying some given properties) that looks like a "short wave", i.e. while a sine function is an infinite wave (it goes on forever), a wavelet rises up in front of 0, vibrates for a while and then gradually disappears again after 0. Note that there are many possible wavelet functions, so there isn't a single wavelet transform either -- wavelet transforms are a **family of transforms** that each uses some kind of wavelet as its basis. One possible wavelet is e.g. the [Morlet](morlet.md) wavelet that looks something like this:
```
_
: :
.' '.
: :
.'. : : .'.
: : : : : :
.' : : : : '.
____ .. : : : : : : .. ___
'' '. .' : : : : '. .' ''
'_' : : : : '_'
: : : :
: : : :
: : : :
'.' '.'
```
The wavelet is in fact a [complex](complex_number.md) function, what's shown here is just its real part (the [imaginary](imaginary_number.md) part looks similar and swings in a perpendicular way to real part). The transform can somewhat work even just with the real part, for understanding it you can for start ignore complex numbers, but working with complex numbers will eventually create a nicer output (we'll effectively compute an envelope which is what we're interested in).
The output of a wavelet transform is so called **scalogram** (similar to [spectrum](spectrum.md) in Fourier transform), a multidimensional function that for each location in the signal (e.g. time in audio signal or pixel position in an image) and for each frequency gives "strength" of influence of that frequency on that location in the signal. Here the "influence strength" is basically similarity to the wavelet of given frequency and shift, similarity meaning basically a [dot product](dot_product.md) or [convolution](convolution.md). Scalogram can be computed by [brute force](brute_force.md) simply by taking each possible frequency wavelet, shifting it by each possible offset and then convolving it with the input signal.
For big brains, similarly to Fourier transform, wavelet transform can also be seen as transforming a point in high dimensional space -- the input function -- to a different orthogonal [vector basis](vector_basis.md) -- the set of basis vectors represented by the possible scaled/shifted wavelets. I.e. we literally just transform the function into a different coordinate system where our coordinates are frequencies and their locations rather than locations and amplitudes of the signal (the original coordinate system).
TODO