I am fascinated by the Fourier transform; specifically: its ability to decompose a wave into a series of fundamental waves. Mathematicians might prefer to see it described it more formally, but here I'll be trying to make it understandable.
Obviously, for a wave to have motion, time needs to pass. If we freeze time then our wave cannot oscillate. To say a wave has frequency, we need first to observe a period in which the wave has motion. The longer we observe its movement the more we can mathematically extract from it, but for now let's simply say that we need to acquire a range of samples measured as the wave oscillates. From this point, armed with our array of floating point numerical samples, we menacingly approach the rabbit hole.
The remainder of this post will focus on a single practical aspect of the Fourier transform: how to interpret the result, especially its magnitude. I'm using the
Note: you need to specifically pass
Since I grew up in an age when Compact Discs were seen as technological marvels, I am going to use the standard 44.1 kHz as my sample rate for the following example. A full second's worth of samples (44100) might seem a reasonable place to start, but as we'll see later, it's by no means a requirement. The chosen sample rate theoretically limits the upper frequency to 22050 Hz, and even though we pass in 44100 samples to
It's probably important to note that we haven't changed the sample rate: that's still 44100 and it still means we cannot get any higher frequencies than 22050 Hz. What we've done is alter the resolution. With \(\frac{4}{5}\) of the original resolution each frequency bin would be \(\frac{5}{4}\) as wide so that we could still accommodate all 22050 Hz, but in fewer distinct bins. It's not a net loss, though; it's a trade off: we've lost resolution in the frequency domain but we've gained resolution in the time domain (there will be \(\frac{5}{4}\) as many Fourier transforms in any given time period). Of course, this now means we cannot simply read off the frequency from the bin number; now we need to scale the bin number by our resolution factor. Staying with the \(\frac{4}{5}\) example: 800 Hz would now be found in bin 640, which is \(\frac{4}{5}\) of 800.
So... why did we choose to use the
Finally, how did we get 22050 for the maximum
Obviously, for a wave to have motion, time needs to pass. If we freeze time then our wave cannot oscillate. To say a wave has frequency, we need first to observe a period in which the wave has motion. The longer we observe its movement the more we can mathematically extract from it, but for now let's simply say that we need to acquire a range of samples measured as the wave oscillates. From this point, armed with our array of floating point numerical samples, we menacingly approach the rabbit hole.
The remainder of this post will focus on a single practical aspect of the Fourier transform: how to interpret the result, especially its magnitude. I'm using the
MathNet.Numerics
package from Nuget, which provides the handy method Fourier.Forward(samples, FourierOptions.NoScaling)
.Note: you need to specifically pass
FourierOptions.NoScaling
if you want control of the output.Since I grew up in an age when Compact Discs were seen as technological marvels, I am going to use the standard 44.1 kHz as my sample rate for the following example. A full second's worth of samples (44100) might seem a reasonable place to start, but as we'll see later, it's by no means a requirement. The chosen sample rate theoretically limits the upper frequency to 22050 Hz, and even though we pass in 44100 samples to
Fourier.Forward
we get back a symmetrical result with the first 22050 data points carrying the frequencies up to 22050 Hz, and the second 22050 carrying them all in again reverse. We can safely ignore the second half of the output for this post. If our input signal (encoded in the 44100 samples) was a pure sinusoidal signal (of any frequency up to 22050 Hz) oscillating between +1 and -1, it will contribute 22050 units to the frequency bin associated with the signal. Those units can be found in the Complex32.Magnitude
property. Multiplying the amplitude of the input signal by any constant A will scale the magnitude to A x 22050. However, if we pass in a fraction of the full second's worth of samples - let's say \(\frac{4}{5}\), which is 35280 instead of 44100 - then we see the number of units fall to \(\sqrt{\frac{4}{5}}A\frac{samplerate}{2}\). And yes, the number would rise if the fraction was greater than one.It's probably important to note that we haven't changed the sample rate: that's still 44100 and it still means we cannot get any higher frequencies than 22050 Hz. What we've done is alter the resolution. With \(\frac{4}{5}\) of the original resolution each frequency bin would be \(\frac{5}{4}\) as wide so that we could still accommodate all 22050 Hz, but in fewer distinct bins. It's not a net loss, though; it's a trade off: we've lost resolution in the frequency domain but we've gained resolution in the time domain (there will be \(\frac{5}{4}\) as many Fourier transforms in any given time period). Of course, this now means we cannot simply read off the frequency from the bin number; now we need to scale the bin number by our resolution factor. Staying with the \(\frac{4}{5}\) example: 800 Hz would now be found in bin 640, which is \(\frac{4}{5}\) of 800.
So... why did we choose to use the
Complex32.Magnitude
property, and what does it mean? Each of the data points we get as output from Fourier.Forward
is a complex number, having both a Complex32.Real
and an Complex32.Imaginary
part. Besides these two orthogonal vectors pointing in the real and imaginary axes, another way to interpret them is as a single vector that has two properties: Complex32.Magnitude
and Complex32.Phase
. Together, these allow us to imagine the frequency bin as a clock with a single spinning hand of variable length. If the hand is long: there's more signal; if it's short: there is less. And the angle of the hand shows us the offset of the signal (relative to convention).Input Shift | Re | Im | Mag | Phase |
+0 | 0 | -22050 | 22050 | \(-\frac{π}{2}\) |
+\(\frac{π}{2}\) | 22050 | 0 | 22050 | 0 |
+π | 0 | 22050 | 22050 | \(\frac{π}{2}\) |
+\(\frac{3π}{2}\) | -22050 | 0 | 22050 | π |
Finally, how did we get 22050 for the maximum
Complex32.Magnitude
given a maximum amplitude of 1 (and minimum amplitude of -1) for our input signal? Well, that's how the Fourier transform works: it essentially adds up the 22050 vector results of pairing the input signal with the known pure sinusoidal signal.
No comments:
Post a Comment