
FFT - Spectrum analyser - math illiterate!


paulca:
I was never very good with anything beyond basic trig, so I'm fumbling about trying to do this "layman" style with only partial reference to what others have done.

What I have working:
An input audio stream being offloaded to a second ESP32 core for calculating the FFT and updating the LED array.  Excess data is dropped.

FFT is a simple "ReadFFT" taking 4096 single channel samples @44.1kHz, producing 2048 bins across a 20kHz range.
I truncate this range to 14kHz as I'm not interested in the top end treble.

I then took my frequency range and the number of bands on the analyzer (32) and, by experimenting in a spreadsheet, worked out an exponent value that gives me the range 30Hz to 14kHz in 32 bands.  It turns out to be about 1.2.

Q1:  I'm not actually sure what to do with the magnitudes.  I have tried ignorantly adding them up, and also averaging them with various methods; each produces different results, but not exactly what I'm after.

Then, to compress the dynamic range of the magnitudes (from a noise floor of about 1000 to well over 11 million), I tried to create the same kind of "log spaced bins" for amplitude.  Again this works, but it's a nightmare to tune.

If I use something like log3 or log6 to get the totalMagnitude->amplitude levels for the 8 pixels, it produces something that probably looks quite accurate: a curve dropping gently from the low end to mid treble, with its top 2 or 3 LEDs flickering.  It reminds me of the "spectrum analyser" display on a cheap HiFi.

What I need to figure out is how to dynamically compress the "interesting" section of magnitudes in each band.  I'd like to see the bass end "bouncing" on the beat, for example.  If I lift the noise floor way, way up to mid amplitude, then I get the "bouncing" effect, but... only for the highest bands; the rest stay at 0.  So I need some way to determine the best amplitude window and scale for each band.... I think.

Brain is fried.

I could just go and look at someone else's code and copy it, but...  I want to figure this out without just blindly stealing it.  (I did steal the FFT!)

paulca:
What I think the thing needs to save all this messing around is a set of knobs.

NoiseFloor
Peak
Gain
TopBandFreq

Then I can tweak it as much as I please.

jonpaul:
Easily available SW, some free, for RTA, 1/3 octave, and hi-res FFT.

Apps for Windows and Mac.

Some free.

j

Nominal Animal:

--- Quote from: paulca on November 07, 2022, 04:08:35 pm ---FFT is a simple "ReadFFT" taking 4096 single channel samples @44.1kHz, producing 2048 bins across a 20kHz range.
I truncate this range to 14kHz as I'm not interested in the top end treble.

--- End quote ---
What is "ReadFFT"?  What is the sample format: float, or N-bit signed or unsigned integers?  What is the FFT bin format: complex single-precision floating-point (two floats per bin), or something else?


--- Quote from: paulca on November 07, 2022, 04:08:35 pm ---I think took my frequency range and the number of bands on the analyzer (32) and inferred by experiment in a spreadsheet a exp value that will give me the range 30Hz to 14kHz in 32 bands.  Turns out to be about 1.2
--- End quote ---
Okay, so what you want to achieve is a digital 32-channel VU meter, right?

What are your vertical units?  A real spectrum analyzer displays the amplitude spectrum (or power spectrum, which is amplitude squared).  Volume or loudness or similar?

Sound loudness is a complex issue, because it involves not only the amplitude (sound pressure) but also the duration: shorter sounds (below 200ms or so) sound less loud than longer sounds at the same frequency and amplitude.  Many bargraphs are simple peak meters with a basically fixed decay rate.

As to the amplitude (or better, the magnitude) of the signal: If FFT bin \$k\$ contains complex value \$z = x + i y\$, then the magnitude is \$\sqrt{x^2 + y^2}\$.  (The phase would be \$\operatorname{atan2}(y, x)\$.)

Averaging or summing the magnitudes in the FFT isn't really useful.  If the band is wide, two sinusoidal signals at maximum amplitude give twice the sum (or average) of a single sinusoidal signal at maximum amplitude, but as perceived, the difference is much smaller (because of the logarithmic sensitivity of hearing).

Energy, or power if we consider fixed-size slices in time, is more useful.  I suggest you take a look at the Wikipedia spectral density article – also for proper terms, since I'm being rather lax in my use here.  We can define power as the square of the magnitude of a specific sinusoidal signal, for example as \$x^2 + y^2\$ of an FFT bin.

Within a given frequency band (across multiple FFT bins), using the maximum magnitude among those bins gets you a peak meter.  It describes the highest amplitude of a sinusoidal signal within that band.  Not very useful, but some use it nevertheless.

Summing the squared magnitudes gives you the power within the band (across those FFT bins).
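The difference between the peak-meter and band-power approaches fits in a few lines of C. This is a sketch, assuming mag[] already holds per-bin magnitudes (not squared) and that a band is the inclusive bin range lo..hi; both function names are hypothetical.

```c
#include <stddef.h>

/* Peak meter: the largest magnitude among bins lo..hi (inclusive).
   Magnitudes are non-negative, so 0 is a safe starting value. */
static float band_peak(const float *mag, size_t lo, size_t hi)
{
    float peak = 0.0f;
    for (size_t k = lo; k <= hi; k++)
        if (mag[k] > peak)
            peak = mag[k];
    return peak;
}

/* Band power: the sum of squared magnitudes over bins lo..hi (inclusive). */
static float band_power(const float *mag, size_t lo, size_t hi)
{
    float sum = 0.0f;
    for (size_t k = lo; k <= hi; k++)
        sum += mag[k] * mag[k];
    return sum;
}
```

With two equal-amplitude tones inside one band, band_power() doubles while band_peak() stays the same.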

If you sum over all FFT bins, you get the total power in the signal (within the FFT window).  You can also get this by calculating the statistical variance of the samples the FFT was taken of (per Parseval's theorem; again see the Wikipedia spectral density article).  Depending on the implementation, the two differ by a constant factor: it depends on the FFT scale factor (a power of \$N\$ for an unnormalized FFT), on whether the number of bins is half the number of samples or not (a factor of \$2\$), and some conventions also introduce \$2 \pi\$.  If you have the original samples the FFT is taken of, you can calculate their mean (average, the DC bias) and variance (signal power) in a single loop, at the cost of two additions and one multiplication per sample, plus a couple of divisions, one multiplication, and one subtraction at the end.

If we look at the amplitude of a sinusoidal signal that humans perceive as equally loud, as a function of frequency (the equal-loudness contours), a good multi-band VU meter should scale band width and band power according to that perception, with the narrowest and most sensitive (large-scaled) bands at the most sensitive portions of human hearing.  For human purposes, the bands in a good VU meter can have varying frequency widths and different weighting factors.  Note the logarithmic frequency on the horizontal axis in the diagram, and remember that in an FFT, all bins are a constant frequency apart.

The vertical axis for the power can be linear (useful for a power spectrum analyser) or logarithmic (for a human VU meter).


If the FFT window size is \$N\$ samples, and the sample rate is \$F\$, the difference in frequency between consecutive bins is \$f = F/N\$.  The very first bin corresponds to the DC bias (the average of the signal, up to the FFT scale factor), the second bin to frequency \$f\$, and in general the bin at index \$k\$ to frequency \$k f\$.

At 4096 samples per FFT and a 44100 Hz sample rate, the frequency difference between consecutive bins (and thus the frequency of bin 1, the first one after the DC bias bin) is \$44100 \text{Hz} / 4096 \approx 10.7666 \text{Hz}\$.

Using standard A440 pitch, middle C frequency is approximately 261.63 Hz.  If we use \$A\$ for the frequency of A above middle C (440 Hz in A440), then the frequency of a note \$x\$ semitones below (negative) or above (positive) that A is
$$f = A ~ C^x, \quad C = 2^{1/12} \approx 1.059463094359295$$

If you want \$B\$ frequencies (\$k = 0 \dots B-1\$) that are spaced the same number of semitones apart, from frequency \$f_\min\$ to \$f_\max\$ (using whatever units, as long as you use the same units for all frequency variables here), then the frequency \$k\$ is \$f_k\$,
$$f_k = C^{n + k}, \quad n = (B-1)\frac{\log f_\min}{\log f_\max - \log f_\min}, \quad C = f_\min^{1/n}$$
For \$f_\min = 30\$, \$f_\max = 14000\$, and \$B = 32\$, \$n \approx 17.15647904044056\$ and \$C \approx 1.219261871741125\$, with the frequencies being approximately 30.000, 36.578, 44.598, 54.377, 66.299, 80.836, 98.561, ..., 7723.908, 9417.466, 11482.357, and 14000 Hz.

These do not fall nicely onto your FFT bins, so at minimum some hand-tuning of the FFT bins per band is needed.  It is mostly an issue at the lower end, where the bands are narrower than the 10.77 Hz bin spacing (30 Hz to 36.6 Hz, for example).  At the other end, the last band corresponds to bins 1066..1300 or so.


You can do arbitrary bands by filtering the original signal (with a low-pass, high-pass, or band-pass FIR or IIR filter, for example) and then calculating the statistical variance of the filtered signal; this corresponds to the signal power in that band.

Unfortunately, this can be even more computational work than the FFT.  The FFT has \$O(N \log N)\$ computational complexity with a window of \$N\$ samples.  If the FIR/IIR filters have a sum total of \$M\$ taps, the total complexity is \$O(N M)\$, so it really depends on how many taps you need to implement each band-pass filter.  On the other hand, each band can have its own window width and update interval: the lowest-frequency band gets the widest window and is updated least often, and the highest-frequency band gets the narrowest window and is updated most often.
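A rough operation count for comparison, assuming \$N = 4096\$ and, purely for illustration, 32 bands each implemented as one biquad with five coefficients, so \$M = 160\$.  Constant factors are ignored, so this only indicates the order of magnitude:
$$N \log_2 N = 4096 \times 12 = 49152, \qquad N M = 4096 \times 160 = 655360, \qquad \frac{N M}{N \log_2 N} = \frac{160}{12} \approx 13.3$$
In that example the filter bank costs roughly an order of magnitude more per window, unless the low bands run with longer update intervals as described above.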

Or perhaps I'm just blabbing about uselessly here :-//

paulca:
Wow.  Lots to read.  Thanks.

I got it working, by hacking and hacking and scratching my beard and hacking some more.

The issue now isn't that it's producing rubbish or that it's in any way wrong; it's just that it's not as visual an effect as I was after.

I tried to grab a video of it, but, alas every time I hit record the phone cuts the music... which is driving the Bluetooth sink on the LED matrix.

Anyway, on a few of your points...

The FFT (RealFFT) returns an array of bins with magnitudes and frequencies.  I assume the magnitudes are already squared, as they are huge!  I don't fancy squaring 65 million!  No, I have no idea what units the magnitudes are in.

Samples are 16-bit, single channel, over I2S.  Not 100% sure of the codec format, but the FFT figures it out anyway.

Spectral density..  I hacked that with an exponent value that gets me from 30Hz to 14kHz in 32 bands: 1.22.  So band 2 is 1.22 times band 1, and so on.  This suits the fact that most of the interesting stuff is in the low range, so the low bands get fewer bins each.  The first few bands only really have 1 or 2 bins; by the time you get to the last 2 bands, they have over 100 each.

A realisation was that if I average the magnitudes in each band instead of adding them all up, I effectively undo the amplitude balancing that the above frequency distribution achieves.  When I put that back to just summing them, the top end lit up again!

I think it is actually doing rather well.  It's a little inaccurate at the lower end: when I give it a pure sine tone (off the internet), the displayed signal lags behind the real one quite a bit and only catches up by 1kHz.

As to making it more visually appealing, I am considering dropping it down to 16 bands.

I also need to figure out a scaling algorithm that tries to find the "energy" levels to keep all 8 pixels in action.  For example, listening to a dance hit, I would expect to see the bottom 1/3 of the LEDs bouncing up and down.  At the moment they look a lot more compressed.  Given that I am listening to overproduced dance pop off YouTube, it is very likely it's been compressed to hell.
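For the per-band scaling, one common idea (a sketch, not taken from anyone's code in this thread) is to let each band track its own recent maximum and minimum level in dB, with an instant attack and a slow release, and spread the 8 LEDs between them. release and min_span are hypothetical tuning parameters: release is in dB per frame (0.5 at 30 fps is about 15 dB per second), and min_span (which must be greater than 0) stops a quiet band from stretching pure noise across all 8 LEDs.

```c
#include <math.h>

#define LEDS 8

typedef struct {
    float peak_db;    /* tracked recent maximum level, dB */
    float floor_db;   /* tracked recent minimum level, dB */
} band_range;

/* level_db: current band level in dB. Returns the number of lit LEDs. */
static int auto_range_leds(band_range *r, float level_db,
                           float release, float min_span)
{
    /* Peak: jumps up instantly, falls by 'release' dB per frame. */
    if (level_db > r->peak_db) r->peak_db = level_db;
    else                       r->peak_db -= release;
    /* Floor: drops instantly, rises by 'release' dB per frame. */
    if (level_db < r->floor_db) r->floor_db = level_db;
    else                        r->floor_db += release;

    /* Keep the display window at least min_span dB wide. */
    if (r->peak_db - r->floor_db < min_span)
        r->floor_db = r->peak_db - min_span;

    float t = (level_db - r->floor_db) / (r->peak_db - r->floor_db);
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    return (int)(t * LEDS + 0.5f);
}
```

Initialise each band with peak_db very low and floor_db very high (for example -1000 and 1000) so the first frame sets both. A slower release keeps the display steadier; a faster one makes the bass columns bounce more with the beat.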

I've tried various things to select which magnitude values turn on which LEDs, but haven't found the right setup yet.

I still think I should do what others have done and make some of the tuning parameters manual with POTs on the front :)

I'm doing this on an ESP32 with dual cores, and you would be inclined to think it should be pretty speedy... what with 2 cores.  The trouble is that half of "my" time to do stuff is taken up waiting on data to transfer via an RTOS stream buffer.

I have no delays, and with 1 core doing the buffer transfer in, processing the magnitudes, and rendering the LEDs, I get a frame rate of only 25-30fps.  That's pants.  I'm pretty sure an STM32F4 with proper DMA and an FPU could actually do that faster!
