Author Topic: Making a sound camera (Read 10093 times)

MasterT · « **Reply #25 on:** May 01, 2018, 11:40:13 pm »

Quote from: daslolo on May 01, 2018, 11:03:03 pm

Yes that one. They all use logarithmic spiral. Anyone knows why?

@MasterT good thing the complexity reinforces my decision of going all analog.
By the way, speaking of ADCs, are there ADCs out there that convert all channels at once? Then I wouldn't have to spend time in the MCU and counter-offset the signals.

Anyway I got the FFT running, and having two cores is sweet, so I might as well use programming to make this thing.

The logarithmic spiral is there to impress potential customers, to show that a sound camera is indeed a very complex and consequently costly feature. Btw, I should say that I have built a sound localizer/camera twice in the recent past, once with an Arduino Leonardo and once with an Arduino DUE.
You don't have to spend 120 usec or so waiting for an ADC conversion to complete. There are two ways to save time. On the 8-bit ATmega controllers you enable the ADC conversion-complete interrupt and enter the interrupt routine when a sample is ready. That costs about 6-10 usec, compared to 120 usec in a busy-wait loop. I'm talking about the ATmega328/ATmega32U4 etc.
The Arduino DUE has DMA, so you enter the interrupt routine only once, when the complete data array is ready, and sampling at 1 Msps is quite easy to do without CPU involvement.

Nice that you have the FFT running. Now you should check the timing, to see whether you can run the FFT on a stream of data (4 mics at 10 kHz each = 40 ksps or so). Some optimization may be required. The Arduino DUE, for example, is capable of processing data via FFT at 250-300 ksps, so optimization is not required even for hi-fi sound at 48 kHz * 4 = 192 ksps (under 200 ksps). But I did optimize the processing on the Arduino Leonardo, interleaving the processing for the left/right and then the top/bottom mics, as the Leonardo is good for about 20 ksps.

Sparker · « **Reply #26 on:** May 01, 2018, 11:59:39 pm »

Quote from: MasterT on May 01, 2018, 07:10:31 pm

Cross correlation is the same thing as DFT. Has very little or zero practical value, since FFT about thousands time faster.

I think we can indeed get the cross correlation of two signals from their DFTs and thus save time, you are right. What I mean is that getting the cross correlation by any means should be the main method here, because it can directly tell us the time difference between two signals. Or not? If not then I don't understand how the FFT approach should work.

By clipping, you are essentially applying non-linear distortion to your signal. Do an experiment: observe the spectrum of one sinewave and then the spectrum of a clipped sinewave. Now do the same thing with two sinewaves. You should notice that the peaks at frequencies w1 and w2 have multiplied, and that you now have peaks at frequencies like w1+w2, 2*w1+w2 and so on.
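
The experiment Sparker describes can be tried without any hardware. The following is only a minimal sketch: the tone bins `F1`, `F2`, the clip level `CLIP` and the 0.01 detection threshold are made-up values, and a naive DFT stands in for whatever FFT you actually use. Note that it clips symmetrically, so the new peaks it shows are odd-order products such as 2*w1+w2 and 2*w1-w2; even-order ones like w1+w2 would need an asymmetric clip.

```python
import cmath
import math


def dft_amplitudes(x):
    """Single-sided amplitude of bins 0..N/2-1 using a naive O(N^2) DFT."""
    n = len(x)
    amps = []
    for k in range(n // 2):
        acc = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        amps.append(2 * abs(acc) / n)
    return amps


def peaks(x, threshold=0.01):
    """Bins whose amplitude exceeds the (hypothetical) threshold."""
    return [k for k, a in enumerate(dft_amplitudes(x)) if a > threshold]


N = 256
F1, F2 = 10, 13   # tone frequencies expressed in bins (assumed values)
CLIP = 0.5        # symmetric clip level (assumed value)

two_tone = [math.cos(2 * math.pi * F1 * i / N) + math.cos(2 * math.pi * F2 * i / N)
            for i in range(N)]
clipped = [max(-CLIP, min(CLIP, s)) for s in two_tone]

clean_peaks = peaks(two_tone)      # only the two original tones
clipped_peaks = peaks(clipped)     # original tones plus distortion products

print("clean  :", clean_peaks)
print("clipped:", clipped_peaks)
```

In the clipped list, look for bins `2*F1 + F2` and `2*F1 - F2`: those are the intermodulation products that would later be mistaken for real sound sources.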

MasterT · « **Reply #27 on:** May 02, 2018, 01:27:35 am »

Quote from: Sparker on May 01, 2018, 11:59:39 pm

Quote from: MasterT on May 01, 2018, 07:10:31 pm
Cross correlation is the same thing as DFT. Has very little or zero practical value, since FFT about thousands time faster.
I think we can indeed get the cross correlation of two signals from their DFTs and thus save time, you are right. What I mean is that getting the cross correlation by any means should be the main method here, because it can directly tell us the time difference between two signals. Or not? If not then I don't understand how the FFT approach should work.

Right, cross-correlation outputs the time difference. The FFT outputs the phase difference, but since we know the frequency and the speed of sound in air, it's a piece of cake to translate phase to time and back, whichever is more practical for the final result. In direction finding, the angle to the sound source is what has to be determined, so the time difference or phase must be translated to an angle, based on the distance between the two mics.
The second reason against cross-correlation is that it's a single summary report for the whole bandwidth. If there are two or more sound sources (always the case in the real world), CC gives meaningless data. At the same time, the FFT can easily distinguish hundreds of sound sources, as long as they don't overlap in bandwidth, or don't overlap completely. By cutting 78 Hz slices out of the 0-10 kHz band and sorting out the sound pattern of each noise source, you can identify a narrow band that is specific to each of them and find the right direction.
The same applies to standing waves, reverberation and echoes, especially in an indoor environment: only the FFT is capable of sorting out and throwing away the part of the data pool that is most distorted/corrupted, and still solving the trigonometric equation.
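
For reference, the phase -> time -> angle translation mentioned above boils down to two formulas: delay = phase / (2*pi*f) and sin(angle) = c * delay / d. The sketch below assumes a far-field source, a single mic pair, an angle measured from broadside, and 343 m/s for the speed of sound; the function names and the example numbers (90 degrees at 1 kHz, 10 cm spacing) are made up for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 C (assumed)


def phase_to_delay(phase_diff_rad, freq_hz):
    """Time difference that corresponds to a phase difference at one frequency."""
    return phase_diff_rad / (2 * math.pi * freq_hz)


def delay_to_angle(delay_s, mic_spacing_m, c=SPEED_OF_SOUND):
    """Arrival angle in degrees from broadside for a far-field source."""
    s = c * delay_s / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # guard against noise pushing |sin| above 1
    return math.degrees(math.asin(s))


delay = phase_to_delay(math.pi / 2, 1000.0)   # 90 degrees at 1 kHz
angle = delay_to_angle(delay, 0.10)           # mics 10 cm apart
print(f"delay = {delay * 1e6:.0f} us, angle = {angle:.1f} deg")
```

The clamp before `asin` matters in practice: with real data the measured delay can slightly exceed d / c.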

Marco · « **Reply #28 on:** May 02, 2018, 01:34:47 am »

Quote from: MasterT on May 01, 2018, 07:10:31 pm

Cross correlation is the same thing as DFT. Has very little or zero practical value, since FFT about thousands time faster.

FFT is a form of DFT; discrete cross correlation can be accelerated with FFT. Preferably a real-FFT so you don't have to combine signals for efficiency.

MasterT · « **Reply #29 on:** May 02, 2018, 03:52:53 am »

Quote from: Marco on May 02, 2018, 01:34:47 am

Preferably a real-FFT so you don't have to combine signals for efficiency.

What does that mean, real-FFT? Does a non-real part exist as well? And is "combine signals" a new mathematics term? Sorry for my limited vocabulary, I know + - * /, but I've never heard of combine.

daslolo · « **Reply #30 on:** May 02, 2018, 04:09:10 am »

Quote from: MasterT on May 01, 2018, 11:40:13 pm

Quote from: daslolo on May 01, 2018, 11:03:03 pm
Yes that one. They all use logarithmic spiral. Anyone knows why?

@MasterT good thing the complexity reinforces my decision of going all analog.
By the way, speaking of ADCs, are there ADCs out there that convert all channels at once? Then I wouldn't have to spend time in the MCU and counter-offset the signals.

Anyway I got the FFT running, and having two cores is sweet, so I might as well use programming to make this thing.
The logarithmic spiral is there to impress potential customers, to show that a sound camera is indeed a very complex and consequently costly feature. Btw, I should say that I have built a sound localizer/camera twice in the recent past, once with an Arduino Leonardo and once with an Arduino DUE.
You don't have to spend 120 usec or so waiting for an ADC conversion to complete. There are two ways to save time. On the 8-bit ATmega controllers you enable the ADC conversion-complete interrupt and enter the interrupt routine when a sample is ready. That costs about 6-10 usec, compared to 120 usec in a busy-wait loop. I'm talking about the ATmega328/ATmega32U4 etc.
The Arduino DUE has DMA, so you enter the interrupt routine only once, when the complete data array is ready, and sampling at 1 Msps is quite easy to do without CPU involvement.

Nice that you have the FFT running. Now you should check the timing, to see whether you can run the FFT on a stream of data (4 mics at 10 kHz each = 40 ksps or so). Some optimization may be required. The Arduino DUE, for example, is capable of processing data via FFT at 250-300 ksps, so optimization is not required even for hi-fi sound at 48 kHz * 4 = 192 ksps (under 200 ksps). But I did optimize the processing on the Arduino Leonardo, interleaving the processing for the left/right and then the top/bottom mics, as the Leonardo is good for about 20 ksps.

The Spiral does look scienty and the housing looks desirably expensive.
I'd love to see your sound camera in action, do you have a link to your project? I didn't know the Arduino boards could push 1 Msps! Is this per channel? Through the Arduino dev IDE or are you poking at registers directly?

I don't know how to calculate sps but maybe you're talking about the sampling frequency. In that case I'm sampling at 10 kHz.
As for the FFT timing, on the ESP32 one 512-bin FFT takes 14 ms and the capture takes 10 ms @ 10 kHz; 40 kHz is the sampling limit using a delayed loop of analogRead.
Someone bypassed the register safety lock and managed 25 MHz output, so fast input sampling must be possible, but I must say I don't yet understand how it's done or what the impact is on using a second core for capturing (which is what I'm doing).
I haven't used DMA, but I read that the ESP32 has DMA as well; I don't know if it's the same kind of access as on your DUE though.

hamster_nz · « **Reply #31 on:** May 02, 2018, 04:52:36 am »

The spiral looks like the one used as the core of the Square Kilometre Array.

https://www.skatelescope.org/layout/

Quote

The spiral layout design has been chosen after detailed study by scientists into how best optimise the configuration to get the best possible results.

This spiral configuration gives many different lengths (baselines) and angles between antennas resulting in very high-resolution imaging capability.

The perfect layout would be a random arrangement that maximises the number of different baselines and angles between antennas. However, the practicalities of construction as well as linking the antennas together with cables mean that the spiral configuration is the best trade off between image resolution and cost.
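
The "many different baselines" argument is easy to check numerically. This is a rough sketch, not the SKA or any commercial layout: the single-arm spiral parameters (`r0`, `growth`, `step_rad`) and the 5 cm grid pitch are invented, and "distinct" just means different after rounding to a micrometre. It counts how many different mic-to-mic distances 16 mics give on a square grid versus on a logarithmic spiral.

```python
import cmath
from itertools import combinations


def distinct_baselines(points, decimals=6):
    """Number of different pairwise distances (rounded, in metres)."""
    return len({round(abs(p - q), decimals) for p, q in combinations(points, 2)})


def log_spiral(count, r0=0.02, growth=1.15, step_rad=2.4):
    """Mic positions on a logarithmic spiral: radius grows geometrically per step."""
    return [r0 * growth**k * cmath.exp(1j * step_rad * k) for k in range(count)]


def square_grid(side, pitch=0.05):
    """Mic positions on a regular side x side grid."""
    return [complex(ix * pitch, iy * pitch) for ix in range(side) for iy in range(side)]


spiral_count = distinct_baselines(log_spiral(16))
grid_count = distinct_baselines(square_grid(4))
print("spiral:", spiral_count, "grid:", grid_count, "pairs:", 16 * 15 // 2)
```

The regular grid repeats the same few distances over and over, while the spiral spends almost every one of its 120 pairs on a different baseline.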

mikeselectricstuff · « **Reply #32 on:** May 02, 2018, 07:05:31 am »

Someone posted here a while ago that they had done something similar. They used digital-output MEMS microphone modules and an FPGA to make a pretty cheap system.
I don't recall how far they got in producing visual output.

mikeselectricstuff · « **Reply #33 on:** May 02, 2018, 07:17:30 am »

I think this was it
https://hackaday.com/2016/07/01/1024-pixel-sound-camera-treats-eyes-to-real-time-audio/

mikeselectricstuff · « **Reply #34 on:** May 02, 2018, 07:25:06 am »

Quote from: daslolo on May 01, 2018, 11:03:03 pm

Yes that one. They all use logarithmic spiral. Anyone knows why?

My guess would be something to do with standing waves and/or having a large number of different mic-to-mic distances while maintaining symmetry of sensitivity

Digital-output MEMS mics with I2S or pulse-modulation schemes are very cheap, so it may make sense to trade off number of mics versus processing complexity

Marco · « **Reply #35 on:** May 02, 2018, 07:37:45 am »

Quote from: MasterT on May 02, 2018, 03:52:53 am

What does that mean, real-FFT? Does a non-real part exist as well? And is "combine signals" a new mathematics term? Sorry for my limited vocabulary, I know + - * /, but I've never heard of combine.

Most FFT implementations are complex<->complex, what we are generally interested in in DSP is a real->complex FFT and a complex->real iFFT. You can use a complex FFT to do two real-FFTs, but it's a headache.
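
For anyone wondering what "use a complex FFT to do two real-FFTs" looks like, here is a small sketch. It packs one real signal into the real part and the other into the imaginary part, runs a single complex transform, and separates the two spectra using conjugate symmetry. The naive `dft` is only a stand-in for a real FFT routine, and the two test signals are arbitrary.

```python
import cmath
import math


def dft(z):
    """Naive O(N^2) complex DFT, standing in for a complex FFT routine."""
    n = len(z)
    return [sum(z[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]


def two_real_dfts(x, y):
    """Spectra of two real signals from a single complex transform."""
    n = len(x)
    packed = dft([complex(a, b) for a, b in zip(x, y)])  # z = x + j*y
    x_spec, y_spec = [], []
    for k in range(n):
        mirrored = packed[(n - k) % n].conjugate()
        x_spec.append((packed[k] + mirrored) / 2)    # conjugate-symmetric part is X[k]
        y_spec.append((packed[k] - mirrored) / 2j)   # remainder, divided by j, is Y[k]
    return x_spec, y_spec


N = 16
mic_a = [math.sin(2 * math.pi * 3 * i / N) for i in range(N)]
mic_b = [float(i % 4) for i in range(N)]
spec_a, spec_b = two_real_dfts(mic_a, mic_b)
print(abs(spec_a[3]), abs(spec_b[4]))
```

The unpacking loop is the "headache" part; a dedicated real->complex FFT avoids it, which is the point being made above.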

MasterT · « **Reply #36 on:** May 02, 2018, 12:48:50 pm »

Quote from: daslolo on May 02, 2018, 04:09:10 am

I'd love to see your sound camera in action, do you have a link to your project? I didn't know the Arduino boards could push 1 Msps! Is this per channel? Through the Arduino dev IDE or are you poking at registers directly?

I don't know how to calculate sps but maybe you're talking about the sampling frequency. In that case I'm sampling at 10 kHz.
As for the FFT timing, on the ESP32 one 512-bin FFT takes 14 ms and the capture takes 10 ms @ 10 kHz; 40 kHz is the sampling limit using a delayed loop of analogRead.
Someone bypassed the register safety lock and managed 25 MHz output, so fast input sampling must be possible, but I must say I don't yet understand how it's done or what the impact is on using a second core for capturing (which is what I'm doing).
I haven't used DMA, but I read that the ESP32 has DMA as well; I don't know if it's the same kind of access as on your DUE though.

I don't have a video, it's lost. Visualization was done on an Android tablet; I used BT to transfer the processed, filtered and sorted data. The 1 megasample per second was done via direct register programming; the Arduino IDE is sloppy. One ADC, 1 Msps across all channels.

A stream sample rate only makes sense with real-time data processing, where the ADC conversion runs continuously in the background using DMA or interrupts. The FFT has to be computed faster than the samples arrive: 14 msec for a 512-point FFT means you could theoretically sustain 512 / 0.014 s = 36571 Hz sampling, not counting the overhead for data management.
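
The budget calculation above is simple enough to script. A minimal sketch, assuming one FFT per block of fresh samples, no overlap, and a core that does nothing else; the function names are invented, the numbers are the ones from this thread (512 points, 14 ms, 4 mics at 10 kHz).

```python
def max_sample_rate(fft_size, fft_time_s):
    """Highest aggregate sample rate at which the FFT still keeps up."""
    return fft_size / fft_time_s


def keeps_up(fft_size, fft_time_s, per_channel_rate_hz, channels):
    """True if every channel's blocks can be transformed in real time."""
    return max_sample_rate(fft_size, fft_time_s) >= per_channel_rate_hz * channels


budget = max_sample_rate(512, 0.014)
print(int(budget))                       # 36571
print(keeps_up(512, 0.014, 10_000, 4))   # False: 40 ksps needed
print(keeps_up(512, 0.014, 10_000, 3))   # True: 30 ksps needed
```

So with these timings the ESP32 falls just short of four channels at 10 kHz on one core, before counting any overhead.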

Quote from: Marco on May 02, 2018, 07:37:45 am

Quote from: MasterT on May 02, 2018, 03:52:53 am
What does that mean, real-FFT? Does a non-real part exist as well? And is "combine signals" a new mathematics term? Sorry for my limited vocabulary, I know + - * /, but I've never heard of combine.
Most FFT implementations are complex<->complex, what we are generally interested in in DSP is a real->complex FFT and a complex->real iFFT. You can use a complex FFT to do two real-FFTs, but it's a headache.

It's an old myth that we could save time by not doing math on the empty imaginary part of the input data. I did some research, and it turns out that the CPU clock savings come with only half the frequency resolution, so all this theory to define a real->FFT is complete BS. There are many proven optimization techniques that don't sacrifice frequency resolution, like using a higher radix or split-radix, but it all depends on the specific uCPU and its instruction set, the availability of MAC, MULT vs ADD performance etc.

Marco · « **Reply #37 on:** May 02, 2018, 01:25:34 pm »

Quote from: MasterT on May 02, 2018, 12:48:50 pm

It's an old myth

No, multiplying by 0 is always a waste of time.

Quote

and it turns out that the CPU clock savings come with only half the frequency resolution

If you fill the imaginary input of a complex FFT with 0s, the output is symmetric ... that the index for the frequency bins counts up higher is irrelevant, half the information is completely redundant.

A real-FFT goes up to Nyquist, just like the real DFT, that's obviously sufficient.
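
The symmetry claim can be verified in a few lines. This sketch feeds a real signal (imaginary part zero) into a naive complex DFT and measures how far the upper half of the output is from being the mirrored complex conjugate of the lower half. The test signal is arbitrary; the bin spacing is fs/N either way, so dropping the mirrored half does not change the frequency resolution.

```python
import cmath
import math


def dft(z):
    """Naive O(N^2) complex DFT, standing in for a complex FFT routine."""
    n = len(z)
    return [sum(z[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]


N = 16
signal = [math.cos(2 * math.pi * 3 * i / N) + 0.5 * math.sin(2 * math.pi * 5 * i / N)
          for i in range(N)]
spectrum = dft(signal)

# For real input, X[N-k] should equal conj(X[k]) for every k = 1..N-1
max_mismatch = max(abs(spectrum[N - k] - spectrum[k].conjugate()) for k in range(1, N))
unique_bins = N // 2 + 1   # bins 0..N/2 carry all the information

print(max_mismatch, unique_bins)
```

The mismatch comes out at rounding-noise level, which is why a real-FFT only needs to return bins 0 to N/2.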

MasterT · « **Reply #38 on:** May 02, 2018, 01:35:39 pm »

Quote from: Marco on May 02, 2018, 01:25:34 pm

Quote from: MasterT on May 02, 2018, 12:48:50 pm
It's an old myth
No, multiplying by 0 is always a waste of time.
Quote
and it turns out that the CPU clock savings come with only half the frequency resolution
If you fill the imaginary input of a complex FFT with 0s, the output is symmetric ... that the index for the frequency bins counts up higher is irrelevant, half the information is completely redundant.

.
Show your code, then talk.

Marco · « **Reply #39 on:** May 02, 2018, 02:16:56 pm »

ogden · « **Reply #40 on:** May 02, 2018, 03:07:21 pm »

Quote from: MasterT on May 02, 2018, 01:35:39 pm

Quote from: Marco on May 02, 2018, 01:25:34 pm
If you fill the imaginary input of a complex FFT with 0s, the output is symmetric ... that the index for the frequency bins counts up higher is irrelevant, half the information is completely redundant.
.
Show your code, then talk.

Code can be 3rd party as well

https://www.keil.com/pack/doc/CMSIS/DSP/html/group__RealFFT.html

Most likely you are interested in just the DSP package:

https://github.com/ARM-software/CMSIS_5/releases/tag/5.3.0

Sparker · « **Reply #41 on:** May 02, 2018, 04:00:03 pm »

Quote from: MasterT on May 02, 2018, 01:27:35 am

Right, cross-correlation outputs the time difference. The FFT outputs the phase difference, but since we know the frequency and the speed of sound in air, it's a piece of cake to translate phase to time and back, whichever is more practical for the final result. In direction finding, the angle to the sound source is what has to be determined, so the time difference or phase must be translated to an angle, based on the distance between the two mics.
The second reason against cross-correlation is that it's a single summary report for the whole bandwidth. If there are two or more sound sources (always the case in the real world), CC gives meaningless data. At the same time, the FFT can easily distinguish hundreds of sound sources, as long as they don't overlap in bandwidth, or don't overlap completely. By cutting 78 Hz slices out of the 0-10 kHz band and sorting out the sound pattern of each noise source, you can identify a narrow band that is specific to each of them and find the right direction.
The same applies to standing waves, reverberation and echoes, especially in an indoor environment: only the FFT is capable of sorting out and throwing away the part of the data pool that is most distorted/corrupted, and still solving the trigonometric equation.

I've just tested the cross-correlation in MATLAB and it can actually work nicely with multiple noise-like signals that have a narrow autocorrelation and low cross-correlations. But the method works badly with human speech, because speech is not as random as I initially assumed.

DFT is the choice here indeed.
What I don't understand is, if both sound sources occupy exactly the same frequency, like the car control panel in the video in this thread, how can the system differentiate them?

Marco · « **Reply #42 on:** May 02, 2018, 04:58:36 pm »

It's all fine and well to say "use the DFT", but it doesn't really mean anything. A bunch of phase differences between microphones at given frequencies aren't that easy to convert to delay, the phases are modulo 2pi after all.

If you want to pick out part of the spectrum, just multiply the FFT transformed microphone signals with a Fourier domain representation of a minimum phase bandpass filter, before multiplying them with each other and doing the inverse FFT. It's still cross correlation, just of band limited versions of the microphone signals.

PS. don't forget to zero pad the microphone signals before doing the FFT.
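
A sketch of the recipe above, under some loudly stated assumptions: a naive DFT replaces the FFT, the band selection is a plain zero-phase bin mask (`keep_bins`) instead of the minimum-phase bandpass Marco suggests, and the two "microphone" signals are just seeded random noise with a made-up 5-sample delay. The zero padding to twice the length is what keeps the circular correlation from wrapping around.

```python
import cmath
import math
import random


def dft(z, inverse=False):
    """Naive O(N^2) DFT / inverse DFT, standing in for an FFT routine."""
    n = len(z)
    sign = 1 if inverse else -1
    out = [sum(z[i] * cmath.exp(sign * 2j * math.pi * k * i / n) for i in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def xcorr_lag(a, b, keep_bins=None):
    """Lag in samples by which b trails a (negative if b leads)."""
    n = len(a)
    size = 2 * n                                   # zero pad to avoid wrap-around
    spec_a = dft(list(a) + [0.0] * n)
    spec_b = dft(list(b) + [0.0] * n)
    cross = [sb * sa.conjugate() for sa, sb in zip(spec_a, spec_b)]
    if keep_bins is not None:                      # optional band selection
        lo, hi = keep_bins
        cross = [c if lo <= min(k, size - k) <= hi else 0
                 for k, c in enumerate(cross)]
    corr = dft(cross, inverse=True)                # cross-correlation vs lag
    best = max(range(size), key=lambda m: corr[m].real)
    return best if best < n else best - size       # upper half holds negative lags


rng = random.Random(1)
N, DELAY = 64, 5
mic_a = [rng.uniform(-1, 1) for _ in range(N)]
mic_b = [0.0] * DELAY + mic_a[:N - DELAY]          # same noise, 5 samples later
lag = xcorr_lag(mic_a, mic_b)
print(lag)
```

Calling `xcorr_lag(mic_a, mic_b, keep_bins=(4, 40))` restricts the correlation to that range of bins; the narrower the band, the broader and more oscillatory the correlation peak becomes.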

ogden · « **Reply #43 on:** May 02, 2018, 05:49:36 pm »

Quote from: MasterT on May 02, 2018, 05:29:01 pm

It's crappy Radix-2, that is good for undeveloped third world tribes numba-umba.

Here we go

Quote

And for whoever picking any BS posted on wiki pages, and than posting on this thread, that was started by OP who acknowledged his lack of software skills in first message. I don't see a point to continue this dispute here, should we start another fft related thread?.

Yes, please. Share your wizdom. I am especially interested if discussion can result in faster and/or smaller Cortex-M optimized complex FFT code than that in CMSIS DSP lib.

MasterT · « **Reply #44 on:** May 02, 2018, 10:16:16 pm »

https://www.eevblog.com/forum/microcontrollers/fft-processing-using-ucpu/

daslolo · « **Reply #45 on:** May 04, 2018, 12:05:38 am »

Quote from: hamster_nz on May 02, 2018, 04:52:36 am

The spiral looks like the one used as the core of the Square Kilometre Array.

https://www.skatelescope.org/layout/

Quote
The spiral layout design has been chosen after detailed study by scientists into how best optimise the configuration to get the best possible results.

This spiral configuration gives many different lengths (baselines) and angles between antennas resulting in very high-resolution imaging capability.

The perfect layout would be a random arrangement that maximises the number of different baselines and angles between antennas. However, the practicalities of construction as well as linking the antennas together with cables mean that the spiral configuration is the best trade off between image resolution and cost.

OK good to see there is a very good reason. Reminds me of what I read about Viktor Schauberger years ago. I wonder what else we can lift from his eminent work on water.

Quote from: mikeselectricstuff on May 02, 2018, 07:25:06 am

Quote from: daslolo on May 01, 2018, 11:03:03 pm
Yes that one. They all use logarithmic spiral. Anyone knows why?
My guess would be something to do with standing waves and/or having a large number of different mic-to-mic distances while maintaining symmetry of sensitivity

Digital-output MEMS mics with I2S or pulse-modulation schemes are very cheap, so it may make sense to trade off number of mics versus processing complexity

What do you mean it has to do with standing waves?
Also, can you explain the trade-off of number of mics vs processing complexity? Seems to me that more mics = more FFTs to churn.

You guys lost me at radix until I saw it's synonymous with "base". I'll ask why that matters in your other thread, it's interesting.

Can anyone give me an explanation of why that jambalaya with FFT will give me inter-mic delays? I understand that frequency domain graph will give me a sort of instant signature of the sound but I don't understand how that'll detect that one sound arrived a few ms on this mic vs that mic...

ogden · « **Reply #46 on:** May 04, 2018, 12:33:19 am »

Quote from: daslolo on May 04, 2018, 12:05:38 am

Can anyone give me an explanation of why that jambalaya with FFT will give me inter-mic delays?

The output of a complex FFT is an array of complex numbers, which means that you get not only magnitude but also phase information for each frequency bin. Using simple trigonometry you can calculate the phase angle difference between the bins of two FFTs. Knowing the frequency of the bin and the phase angle difference, you can calculate the signal delay time. This of course is not a straightforward operation, because at high frequencies the phase angle difference will most likely exceed 360 degrees.

[edit] Real FFT is just spectrum analyzer. Complex FFT is more than that.

Quote

I understand that frequency domain graph will give me a sort of instant signature of the sound but I don't understand how that'll detect that one sound arrived a few ms on this mic vs that mic...

In just 1 ms sound travels a whopping 0.343 m. Your mic array would have to be gigantic to get a few ms of delay between mics.
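
To put numbers on both points (tiny delays, and phase that wraps at high frequencies), here is a back-of-envelope sketch. It assumes 343 m/s, a single pair of mics and a hypothetical 10 cm spacing. The largest possible delay is d / c, and the phase difference stays within +/-180 degrees only while the spacing is at most half a wavelength, i.e. for f <= c / (2 * d).

```python
SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 C (assumed)


def max_delay_s(mic_spacing_m, c=SPEED_OF_SOUND):
    """Largest possible inter-mic delay (sound arriving along the mic axis)."""
    return mic_spacing_m / c


def max_unambiguous_freq_hz(mic_spacing_m, c=SPEED_OF_SOUND):
    """Above this frequency the phase difference can wrap past +/-180 degrees."""
    return c / (2 * mic_spacing_m)


def max_delay_samples(mic_spacing_m, sample_rate_hz, c=SPEED_OF_SOUND):
    """The same maximum delay expressed in sample periods."""
    return max_delay_s(mic_spacing_m, c) * sample_rate_hz


spacing = 0.10  # metres, hypothetical
print(f"max delay     : {max_delay_s(spacing) * 1e3:.3f} ms")
print(f"wrap-free up to: {max_unambiguous_freq_hz(spacing):.0f} Hz")
print(f"at 10 kHz      : {max_delay_samples(spacing, 10_000):.1f} samples")
```

With 10 cm between mics the whole delay range spans only a few samples at 10 kHz, which is why the phase of the FFT bins (sub-sample resolution) is attractive here.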

MasterT · « **Reply #47 on:** May 04, 2018, 01:05:24 am »

Quote from: daslolo on May 04, 2018, 12:05:38 am

1. Also, can you explain the trade-off of number of mics vs processing complexity? Seems to me that more mics = more FFTs to churn.

2. You guys lost me at radix until I saw it's synonymous with "base". I'll ask why that matters in your other thread, it's interesting.

3. Can anyone give me an explanation of why that jambalaya with FFT will give me inter-mic delays? I understand that frequency domain graph will give me a sort of instant signature of the sound but I don't understand how that'll detect that one sound arrived a few ms on this mic vs that mic...

1. Correct. Use your analytical skills to sort out rubbish.
2 & 3. Take a crash course http://www.dspguide.com/ch8/8.htm
Equation 8.6. is the key to find a phase.

You didn't provide a link to the FFT library you are using. If the output data set has Re[] and Im[], then you know what to do.

daslolo · « **Reply #48 on:** May 04, 2018, 01:43:52 am »

Alright, I returned all my electronic gizmos. Going back to basics, I fired up MATLAB and got this:

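Code:

% Setup assumed here; the original post did not show t, N or f
fs = 1000;                % sample rate in Hz (assumption)
N = 1000;                 % number of samples, gives 1 Hz bins (assumption)
t = (0:N-1)/fs;           % time axis in seconds
f = (-N/2:N/2-1)*fs/N;    % frequency axis matching fftshift
x=.5*cos(2*pi*20*t+20*pi/180) +.1*cos(2*pi*54*t+-60*pi/180);
x2=.5*cos(2*pi*30*t+20*pi/180) +.1*cos(2*pi*54*t+-50*pi/180);
X = 1/N*fftshift(fft(x,N));
X2 = 1/N*fftshift(fft(x2,N));
% phase of every bin, in degrees
phase=atan2(imag(X),real(X))*180/pi;
phase2=atan2(imag(X2),real(X2))*180/pi;
plot(f,phase-phase2);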

which makes 2 curves x and x2 offset by a phase of 10 (10 what?), and after calculating phase and phase2 I get this funky curve when subtracting the two phases... I was expecting to get that delta of 10, but maybe there is more massaging I need to do.
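
One likely reason for the funky curve: the phase of a bin that contains almost no energy is just the angle of rounding noise, so subtracting phases across the whole spectrum is mostly noise minus noise. A sketch of the extra "massaging", redone in Python with the same two signals (the sample rate `FS`, length `N` and the 0.01 magnitude threshold are assumptions): only compare bins where both spectra have real energy, and take the difference as the angle of X * conj(X2) so it is wrapped automatically. The "10" is in degrees, because the phases were entered as degrees times pi/180.

```python
import cmath
import math

FS = 200   # sample rate in Hz (assumed)
N = 200    # samples, so each bin is 1 Hz wide (assumed)


def tone_pair(f1, ph1_deg, f2, ph2_deg):
    """0.5*cos at f1 plus 0.1*cos at f2, phases given in degrees."""
    return [0.5 * math.cos(2 * math.pi * f1 * i / FS + math.radians(ph1_deg))
            + 0.1 * math.cos(2 * math.pi * f2 * i / FS + math.radians(ph2_deg))
            for i in range(N)]


def half_spectrum(x):
    """Bins 0..N/2-1 of a naive DFT, scaled by 1/N as in the MATLAB snippet."""
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / N) for i in range(N)) / N
            for k in range(N // 2)]


def phase_diff_deg(a, b):
    """Phase of a minus phase of b, wrapped to (-180, 180] degrees."""
    return math.degrees(cmath.phase(a * b.conjugate()))


X = half_spectrum(tone_pair(20, 20, 54, -60))
X2 = half_spectrum(tone_pair(30, 20, 54, -50))

THRESHOLD = 0.01  # ignore bins that are essentially empty (assumed value)
shared = {k * FS / N: phase_diff_deg(X[k], X2[k])
          for k in range(N // 2)
          if abs(X[k]) > THRESHOLD and abs(X2[k]) > THRESHOLD}
print(shared)
```

Only the 54 Hz bin survives the mask, since the other tone sits at 20 Hz in x but at 30 Hz in x2, and its phase difference is -60 - (-50) = -10 degrees.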

daslolo · « **Reply #49 on:** May 04, 2018, 01:53:19 am »

Quote from: MasterT on May 04, 2018, 01:05:24 am

You didn't provide a link to the FFT library you are using. If the output data set has Re[] and Im[], then you know what to do.

https://github.com/kosme/arduinoFFT

