Wavelet Transform
Good luck trying to understand the corresponding Wikipedia article.
Wavelet transform is a mathematical operation, similar to e.g. Fourier transform, that takes a signal (e.g. audio or an image) and outputs information about the frequencies contained in that signal AS WELL as the locations of those frequencies. This is of course extremely useful when we want to analyze and manipulate frequencies in our signal -- for example JPEG 2000 uses wavelet transforms for compressing images by discarding certain frequencies in them that our eyes are not so sensitive to.
The main advantage over Fourier transform (and similar transforms such as cosine transform) is that wavelet transform shows us not only the frequencies, but ALSO their locations (for example the time at which these frequencies come into play in an audio signal). This allows us for example to locate specific sounds in audio or apply compression only to certain parts of an image. While localizing frequencies is also possible with Fourier transform with tricks such as spectrograms (computing the transform over short sliding windows), wavelet transforms are a more elegant, natural and continuous way of doing so. Note that due to the uncertainty principle (the same mathematics that stands behind Heisenberg's uncertainty principle in physics) it is mathematically IMPOSSIBLE to know both frequencies and their locations exactly, so there always has to be a tradeoff -- the input signal itself tells us everything about location but nothing about frequencies, Fourier transform tells us everything about frequencies but nothing about their locations, and wavelet transform is a midway between the two -- it tells us something about frequencies and their approximate locations.
As long as the wavelet satisfies certain conditions (which the commonly used ones do, at least approximately), there is an inverse transform for a wavelet transform, so we can transform the signal, then manipulate the frequencies and transform it back.
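The transform, manipulate, transform back loop can be sketched in a few lines. Note the assumptions: this uses the Haar wavelet (the simplest discrete wavelet, not the Morlet wavelet discussed in this article), only a single level of the transform, and the names `haar_forward`, `haar_inverse` and the threshold value are made up for this illustration.

```python
import math

def haar_forward(signal):
    """One level of the Haar wavelet transform (length must be even).

    Returns (approx, detail): scaled pairwise sums (the coarse shape)
    and scaled pairwise differences (the local fine detail)."""
    r = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / r for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / r for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Exact inverse of haar_forward."""
    r = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / r)
        out.append((a - d) / r)
    return out

signal = [4.0, 4.2, 8.0, 7.9, 1.0, 5.0, 3.0, 3.1]
approx, detail = haar_forward(signal)

# Without any manipulation the inverse gives back the input signal.
restored = haar_inverse(approx, detail)

# Manipulation: throw away small details (assumed threshold), keep big ones.
threshold = 0.5
compressed_detail = [d if abs(d) >= threshold else 0.0 for d in detail]
smoothed = haar_inverse(approx, compressed_detail)

print("restored:", [round(x, 3) for x in restored])
print("smoothed:", [round(x, 3) for x in smoothed])
```

Here the one big jump in the signal (from 1 to 5) survives the manipulation because its detail coefficient is large, while the tiny wiggles elsewhere get flattened. That is the location-aware behavior described above: each detail coefficient belongs to a specific place in the signal.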
Wavelet transforms use so-called wavelets (tiny waves) as their basis functions, similarly to how Fourier transform uses sine/cosine functions to analyze the input signal. A wavelet is a special function (satisfying some given properties) that looks like a "short wave", i.e. while a sine function is an infinite wave (it goes on forever), a wavelet rises up shortly before 0, vibrates for a while and then gradually disappears again after 0. Note that there are many possible wavelet functions, so there isn't a single wavelet transform either -- wavelet transforms are a family of transforms that each use some kind of wavelet as their basis. One possible wavelet is e.g. the Morlet wavelet that looks something like this:
[ASCII sketch of the Morlet wavelet: a tall central peak with a deep trough on each side, followed by progressively smaller oscillations that flatten out to zero towards both ends.]
The wavelet is in fact a complex function; what's shown here is just its real part (the imaginary part looks similar but swings 90 degrees out of phase with the real part, i.e. it crosses zero where the real part peaks). The transform can somewhat work even with just the real part, and for understanding it you can ignore complex numbers to start with, but working with complex numbers will eventually create a nicer output (we'll effectively compute an envelope, which is what we're interested in).
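To make the real part, imaginary part and envelope concrete, here is a small sketch of a simplified Morlet wavelet: a complex sinusoid multiplied by a Gaussian bell. The function name `morlet`, the parameter `w0` (how fast the wave oscillates under the bell) and its default value are assumptions of this example, and the normalization constant and small correction term of the full definition are left out.

```python
import cmath
import math

def morlet(t, w0=5.0):
    """Simplified complex Morlet wavelet at position t.

    Gaussian envelope exp(-t^2 / 2) times the complex wave exp(i * w0 * t).
    Normalization and the correction term are omitted (assumption)."""
    return math.exp(-t * t / 2.0) * cmath.exp(1j * w0 * t)

# Print a few samples: the real part is what the picture above shows,
# the absolute value is the smooth envelope around the oscillation.
for i in range(-8, 9):
    t = i / 2.0
    w = morlet(t)
    print(f"t={t:5.1f}  real={w.real:8.4f}  imag={w.imag:8.4f}  envelope={abs(w):7.4f}")
```

At t = 0 the real part is at its maximum while the imaginary part is zero, which is the 90 degree phase difference in action. The absolute value does not oscillate at all, it is just the bell shape, which is why the complex version gives the nicer output.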
The output of a wavelet transform is a so-called scalogram (similar to the spectrum in Fourier transform), a multidimensional function that for each location in the signal (e.g. time in an audio signal or pixel position in an image) and for each frequency gives the "strength" of influence of that frequency on that location in the signal. Here the "influence strength" is basically the similarity to the wavelet of given frequency and shift, similarity meaning basically a dot product (doing this for all shifts at once amounts to a convolution). A scalogram can be computed by brute force simply by taking each possible frequency wavelet, shifting it by each possible offset and then taking its dot product with the input signal.
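The brute force approach just described can be sketched as follows. This is a minimal illustration under stated assumptions, not an efficient or definitive implementation: it uses a simplified Morlet wavelet without normalization constant, the names `morlet` and `scalogram`, the parameter `w0` and the 1/sqrt(scale) weighting are choices made for this example, and the test signal is made up.

```python
import cmath
import math

def morlet(t, w0=5.0):
    """Simplified complex Morlet wavelet (no normalization, assumption)."""
    return math.exp(-t * t / 2.0) * cmath.exp(1j * w0 * t)

def scalogram(signal, scales, w0=5.0):
    """Brute force scalogram: result[scale_index][position] = strength."""
    n = len(signal)
    result = []
    for s in scales:                 # each wavelet stretch (~ frequency)
        row = []
        for shift in range(n):       # each possible offset of the wavelet
            acc = 0j
            for k in range(n):       # dot product with the input signal
                acc += signal[k] * morlet((k - shift) / s, w0).conjugate()
            row.append(abs(acc) / math.sqrt(s))
        result.append(row)
    return result

# Made-up test signal: a slow wave (period 32 samples) in the first half,
# a fast wave (period 8 samples) in the second half.
n = 128
signal = [math.sin(2 * math.pi * k / 32) if k < n // 2
          else math.sin(2 * math.pi * k / 8) for k in range(n)]

# A wavelet stretched by s oscillates with period 2 * pi * s / w0 samples,
# so these two scales are tuned to periods 8 and 32.
w0 = 5.0
scales = [w0 * 8 / (2 * math.pi), w0 * 32 / (2 * math.pi)]
sg = scalogram(signal, scales, w0)

for pos in (32, 96):
    print(f"position {pos}: period-8 strength {sg[0][pos]:.3f}, "
          f"period-32 strength {sg[1][pos]:.3f}")
```

In the middle of the first half the period-32 row dominates, in the middle of the second half the period-8 row does, so the scalogram tells us not just which frequencies are present but also where. The triple loop is slow (number of scales times the signal length squared); it is written this way only to mirror the description.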
For big brains: similarly to Fourier transform, wavelet transform can also be seen as transforming a point in high dimensional space -- the input function -- to a different vector basis -- the set of basis vectors represented by the possible scaled/shifted wavelets. This set is orthogonal for some discrete wavelet families, while continuous transforms (such as the one using the Morlet wavelet) use a redundant, non-orthogonal set. I.e. we literally just transform the function into a different coordinate system where our coordinates are frequencies and their locations rather than locations and amplitudes of the signal (the original coordinate system).
TODO