Author Topic: Making a sound camera  (Read 10014 times)


Offline dasloloTopic starter

  • Regular Contributor
  • *
  • Posts: 63
  • Country: fr
  • I saw wifi signal once
Making a sound camera
« on: April 30, 2018, 11:27:42 pm »
Hello,
Wouldn't it be nice to see sound, like this?
The common way to do that is with an FPGA computing an FFT for each microphone and calculating phase deltas between them.
Is it possible to do that without resorting to programming, maybe even using analog components?

In other words:
M-M-M
|   |   |
M-M-M
|   |   |
M-M-M
for each M, determine the phase shift, relative to its neighbor Ms, of the frequency with the highest amplitude
light up the LED on the other side of the PCB behind each M with brightness = (phase shift, if it's positive, meaning the sound reaches this mic ahead of its neighbors) * amplitude

EDIT: changed the title, since we're still figuring out how to make this thing.
« Last Edit: May 01, 2018, 11:04:31 pm by daslolo »
nine nine nein
 

Offline Sparker

  • Regular Contributor
  • *
  • Posts: 56
  • Country: ru
The dumbest solution to this problem coming to my mind is:
If you have a sinewave with frequency w and a delayed copy of it with phase shift phi, you multiply them:
cos(w*t)*cos(w*t + phi) = 0.5*cos(2*w*t + phi) + 0.5*cos(phi)
then you low-pass filter the product to get rid of the high-frequency part, and the remaining DC term depends only on the phase difference of the sinewaves  :-+.
Now here comes a problem: for different frequencies the same phase shift means a different distance traveled: delta-phi = 2*pi*distance/wavelength, so you must have some way to separate the different frequencies. Really this thing is just asking for DSP and an FFT.  :)
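A minimal numeric sketch of that multiply-then-low-pass idea (plain C++; `detect_phase` is a hypothetical helper name, and averaging over whole periods stands in for the low-pass filter):

```cpp
#include <cmath>

// Sketch of the multiply-then-low-pass phase detector described above.
// Averaging the mixer output over whole periods acts as the low-pass
// filter: the 2*w term averages to zero, leaving 0.5*cos(phi).
double detect_phase(double w, double phi, double fs, int n) {
    double acc = 0.0;
    for (int i = 0; i < n; ++i) {
        double t = i / fs;
        acc += std::cos(w * t) * std::cos(w * t + phi);  // analog mixer output
    }
    double dc = acc / n;         // "low-pass": mean of the product
    return std::acos(2.0 * dc);  // invert 0.5*cos(phi); the sign of phi is lost
}
```

For example, a 1 kHz tone sampled at 48 kHz for one second recovers phi = 0.5 rad to well within a milliradian. Note the acos only recovers |phi|: a single multiplier cannot tell a leading mic from a lagging one.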
« Last Edit: April 30, 2018, 11:55:13 pm by Sparker »
 

Offline dasloloTopic starter

  • Regular Contributor
  • *
  • Posts: 63
  • Country: fr
  • I saw wifi signal once
I like your solution, it sings.
I'd really like not to use chips that I have to program, so maybe have each microphone go through N barely overlapping filters before going to their own phase detectors.
M-F(n)-PHD(n)            \
   -F(n+1)-PHD(n+1)   - combine to form RGB - LED
   -F(n+2)-PHD(n+2)   /
    ....
nine nine nein
 

Online MasterT

  • Frequent Contributor
  • **
  • Posts: 785
  • Country: ca
M-M-M
for each M determine the phase shift with the neighbor M of the frequency with the highest amplitude
The problem is that each M has its own highest-magnitude frequency. FFT is the only solution, though not necessarily an FPGA: a small MCU board like an STM32, SAM3X, or ATmega328 would be sufficient.
 

Offline dasloloTopic starter

  • Regular Contributor
  • *
  • Posts: 63
  • Country: fr
  • I saw wifi signal once
M-M-M
for each M determine the phase shift with the neighbor M of the frequency with the highest amplitude
The problem that, each M has it's own highest magnitude frequency.  FFT is the only solution, not necessarily FPGA - small uCPU board like stm32 , sam3x or atmega328 would be sufficient
In my experience with the ESP32, ADC conversion takes time: 1 µs on the spec sheet, way more if accessed via the Arduino lib (1k samples take 10 ms!). And I think (I haven't figured out how to test this) that if I read one pin's ADC and then another, the second one will be sampled 1 µs after the first, so there will already be a phase shift caused by the sampling itself.
Now I may be wrong, and maybe there is a way to freeze all the ADC sample-and-hold buckets at the same time.
nine nine nein
 

Offline Sparker

  • Regular Contributor
  • *
  • Posts: 56
  • Country: ru
You probably need N ADCs, or an ADC that can sample many channels simultaneously and then return the data serially. They must be clocked simultaneously, but once the conversion is over you can use whatever method to read the data out before the next conversion starts. Maybe their serial interfaces can be daisy-chained? Maybe they have parallel interfaces and you can connect them in parallel to a read port of your MCU and use the chip select of a specific ADC to read its data.  :-// So many possibilities and many kinds of ADCs.
If you only care about the 0...10 kHz frequency band, you can sample at about 20 kHz, which is 50 µs between samples. Probably even an AVR can handle that data rate.
Just store the data in a buffer, as many samples as you can, then do whatever you like with it. I'd offload it to matlab/octave and try to process it there first.
 

Online MasterT

  • Frequent Contributor
  • **
  • Posts: 785
  • Country: ca
if I ADC one pin, then another one, the other one will be sampled 1uS after the first one, so there will be already phase shift caused by sampling.
Now I may be wrong and maybe there is a way to freeze all the ADC buckets at the same time.
You don't need the sampling to be synchronous. Sure, with one ADC there will be a phase offset, but it's a CONSTANT defined by the sampling rate. Just subtract it out of the phase difference.
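In code form, that subtraction might look like the sketch below (`true_phase` is a hypothetical helper name; it assumes the inter-channel skew dt is known, e.g. 1 µs for back-to-back conversions):

```cpp
#include <cmath>

// Sketch of subtracting the constant sampling skew, as described above.
// A multiplexed ADC samples channel 2 a fixed dt after channel 1, which
// adds a known phase offset of 2*pi*f*dt at frequency f.
double true_phase(double measured_phase, double f, double dt_skew) {
    const double PI = 3.14159265358979323846;
    double phi = measured_phase - 2.0 * PI * f * dt_skew;
    while (phi <= -PI) phi += 2.0 * PI;  // wrap back into (-pi, pi]
    while (phi >   PI) phi -= 2.0 * PI;
    return phi;
}
```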
 
The following users thanked this post: Sparker, werediver

Offline dasloloTopic starter

  • Regular Contributor
  • *
  • Posts: 63
  • Country: fr
  • I saw wifi signal once
if I ADC one pin, then another one, the other one will be sampled 1uS after the first one, so there will be already phase shift caused by sampling.
Now I may be wrong and maybe there is a way to freeze all the ADC buckets at the same time.
You don't need sampling to be  synchronous. Sure, with one adc there will be phase offset, but it's a CONSTANT defined by sampling rate.  Just subtract it out of the phase difference.
This dawned on me as I was driving to buy chocolate! :D
How do I evaluate the phase shift?
nine nine nein
 

Offline Sparker

  • Regular Contributor
  • *
  • Posts: 56
  • Country: ru
Indeed, you don't need to sample synchronously. :palm: It just must be periodic sampling, obviously.
For the phase shift, check the "Shift in time" DFT property: https://en.wikipedia.org/wiki/Discrete-time_Fourier_transform
So at the DFT output the phases of the frequency bins will be shifted proportionally to the frequency of the bin and to the time shift.
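As a numeric sketch of that shift property (naive O(N²) DFT, hypothetical helper names): delaying a signal by m samples rotates bin k's phase by -2*pi*k*m/N, so comparing bin phases between two channels recovers the delay.

```cpp
#include <cmath>
#include <complex>
#include <vector>

// Naive DFT of one bin (fine for a demo; use an FFT library in practice).
std::complex<double> dft_bin(const std::vector<double>& x, int k) {
    const double PI = 3.14159265358979323846;
    const int N = (int)x.size();
    std::complex<double> acc(0.0, 0.0);
    for (int n = 0; n < N; ++n)
        acc += x[n] * std::polar(1.0, -2.0 * PI * k * n / N);
    return acc;
}

// Phase of x minus phase of y at bin k, wrapped into (-pi, pi].
// If y is x delayed by m samples, this returns +2*pi*k*m/N.
double bin_phase_delta(const std::vector<double>& x,
                       const std::vector<double>& y, int k) {
    const double PI = 3.14159265358979323846;
    double d = std::arg(dft_bin(x, k)) - std::arg(dft_bin(y, k));
    while (d <= -PI) d += 2.0 * PI;
    while (d >   PI) d -= 2.0 * PI;
    return d;
}
```

E.g. a bin-4 tone in a 64-point record, delayed by 2 samples, shows a phase delta of 2*pi*4*2/64 = pi/4.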
« Last Edit: May 01, 2018, 02:29:44 am by Sparker »
 

Offline dasloloTopic starter

  • Regular Contributor
  • *
  • Posts: 63
  • Country: fr
  • I saw wifi signal once
Is the phase of an FFT the imaginary component?
Could using N microphones that output PWM instead of analog help lift some load off the uC?
nine nine nein
 

Offline Sparker

  • Regular Contributor
  • *
  • Posts: 56
  • Country: ru
No, the phase is the angle between the bin's vector and the real axis, i.e. atan2(imaginary, real), like the phase of a complex amplitude or a complex impedance.

Quote
Could using N microphones that output PWM instead of analog help lift some load off the uC?
What do you mean?

Also I don't understand how the whole signal processing from NI works. Maybe someone could explain? :)
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Are you sure you are talking phase delta? That would be different for different frequencies....

Don't you just want delay, which will be constant for all frequencies? (The ~ 3 ms per meter of difference in distance travelled)

Or am I confused? ( I often am)


Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 
The following users thanked this post: The Soulman

Offline dasloloTopic starter

  • Regular Contributor
  • *
  • Posts: 63
  • Country: fr
  • I saw wifi signal once
Are you sure you are talking phase delta? That would be different for different frequencies....

Don't you just want delay, which will be constant for all frequencies? (The ~ 3 ms per meter of difference in distance travelled)

Or am I confused? ( I often am)

Phase sounds cool but maybe you are right.
The end goal is to show a blob at the mic that gets hit by the sound first on the array.
nine nine nein
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Hum. First guess.

Look for a distinctive sound (e.g. a sudden loudness) on one center channel.

Window it (fade either end) so you get the pattern you are looking for. Make the window smaller than the FFT size (e.g. half the FFT size).

FFT it, so you get the complex spectrum signature you are looking for.

Take all the channels. FFT them to get a frequency spectrum for each channel.

Multiply each by the complex conjugate of the signature's spectrum (the conjugate is what makes this a correlation rather than a convolution).

Inverse FFT each of the resulting spectra. That will give you a time series of how well the signal matches the signature pattern. You should see a spike in each channel (and maybe echoes).

Identify the principal spike in each of the channels. Use the relative delays to triangulate the source.

(I used this sort of technique to find direct-sequence spread-spectrum signals once.)
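A time-domain sketch of the matching step in that recipe (`peak_lag` is a hypothetical helper name; brute force is practical only for small buffers, which is exactly why the FFT route is used): correlate each channel against the signature and take the lag of the biggest spike.

```cpp
#include <vector>

// Slide the signature along the channel and score the match at each lag;
// the lag of the peak is that channel's relative delay in samples.
// (The FFT recipe computes the same correlation, just much faster.)
int peak_lag(const std::vector<double>& channel,
             const std::vector<double>& signature) {
    int best_lag = 0;
    double best = -1e300;
    const int max_lag = (int)channel.size() - (int)signature.size();
    for (int lag = 0; lag <= max_lag; ++lag) {
        double score = 0.0;
        for (int i = 0; i < (int)signature.size(); ++i)
            score += channel[lag + i] * signature[i];  // correlation at this lag
        if (score > best) { best = score; best_lag = lag; }
    }
    return best_lag;
}
```

Comparing `peak_lag` across channels gives the relative delays used for triangulation.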

Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline Sparker

  • Regular Contributor
  • *
  • Posts: 56
  • Country: ru
I guess that since we want to measure time delays between channels, the cross-correlation function better fits this need.
Imagine you have two mics, both hit by the same signal with an unknown delay. You record the two signals and cross-correlate them. The peak of the cross-correlation output will be at the time offset between them  :-+.
Now add a second dimension orthogonal to the first, multiply the two cross-correlation outputs by each other (one horizontal, the other vertical), and you will have a 2D image with a peak rotated towards the sound source. Pass it through some 2D filter (like a Gaussian smoothing filter) and, I guess, you should get the same kind of result as the guys in the video.  ::)
Probably three mics will be sufficient for that: one in the middle, a second X centimeters above it, and a third X centimeters to the right of it.  :-/O
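A sketch of the two-mic case described above (`tdoa_samples` is a hypothetical helper name; brute force over lags): the lag that maximizes the cross-correlation is the sample delay of one mic relative to the other.

```cpp
#include <vector>

// Brute-force cross-correlation over lags in [-max_lag, +max_lag].
// Returns the lag (in samples) at which b best matches a; if b is a
// delayed by d samples, the peak lands at +d.
int tdoa_samples(const std::vector<double>& a,
                 const std::vector<double>& b, int max_lag) {
    int best_lag = 0;
    double best = -1e300;
    for (int lag = -max_lag; lag <= max_lag; ++lag) {
        double acc = 0.0;
        for (int i = 0; i < (int)a.size(); ++i) {
            int j = i + lag;
            if (j >= 0 && j < (int)b.size())
                acc += a[i] * b[j];  // overlap-limited correlation
        }
        if (acc > best) { best = acc; best_lag = lag; }
    }
    return best_lag;
}
```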
« Last Edit: May 01, 2018, 12:59:55 pm by Sparker »
 

Offline dasloloTopic starter

  • Regular Contributor
  • *
  • Posts: 63
  • Country: fr
  • I saw wifi signal once
@Sparker and @hamster_nz you're both describing the same thing in your own words. I have no idea how to do that at the moment, but it looks like the FFT is the key anyway. The good thing is I don't need much visual resolution in terms of displaying the frequencies, so a 128-bin FFT should be enough. And that is very fast, about 3 ms on an ESP32.

And earlier I was thinking of using I2S MEMS mics to bypass the ADC conversion time... it seemed like a good idea, but how long will transferring the I2S stream take? I think it'll take longer than an instantaneous voltage read, which equates to moving the time delay from conversion to data transfer. Maybe someone can confirm?
nine nine nein
 

Online MasterT

  • Frequent Contributor
  • **
  • Posts: 785
  • Country: ca
Don't know about the ESP32, but as with any digital bus, I2C transfer time is defined by clock rate * number of bits. The I2C standard AFAIK supports 100k, 400k, and 3.4M clocks. See what clock your ESP32 driver uses. For myself, I usually measure Arduino speed performance using any spare digital pin and a scope, setting it high/low at the start and end of the time-critical function.
millis() does the same thing, in software.
Regarding the size of the FFT, it depends on the lower end of the frequency range and the bandwidth. For example, 10 k sampling and 128 bins gives just 10000/128 = 78.125 Hz frequency resolution. That means you can't measure below this value, and, more importantly, the relative error is 100% at 78 Hz and 10% or so at 780 Hz (not sure it's linear, but you get the idea; do the math to verify), which may not be acceptable.

Cross-correlation computed directly is equivalent to the DFT. It has very little or zero practical value, since the FFT is about a thousand times faster.
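The bin-width arithmetic above, as a sketch (hypothetical helper names; worst-case error taken as one full bin, matching the 100%-at-78-Hz figure):

```cpp
#include <cmath>

// Frequency resolution of an N-point FFT at sample rate fs, and the
// worst-case relative error when reading a tone's frequency straight off
// a bin index (one bin of uncertainty, as in the 10 kHz / 128-bin example).
double bin_width_hz(double fs, int n_bins) {
    return fs / n_bins;
}
double worst_rel_error(double f, double fs, int n_bins) {
    return bin_width_hz(fs, n_bins) / f;  // 1.0 (100%) when f equals one bin width
}
```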
 

Online Marco

  • Super Contributor
  • ***
  • Posts: 6723
  • Country: nl
Is it possible to do that without resorting to programming, maybe even using analog components?
If you limit the frequency of the signals and then clip them, you could probably do a phase detector with a 4046 ... as long as the delay is smaller than the inverse of the highest frequency, it can determine the delay. That would only work in a tiny cone though. Doing more than that discretely would quickly get very hard AFAICS.
 

Offline ogden

  • Super Contributor
  • ***
  • Posts: 3731
  • Country: lv
Don't know about esp32, but as any digital bus I2C trasfer time is defined by clock rate * number of bits. I2C standard AFAIK supports 100k, 400k, and 3.4M clock.

I am afraid that you confused I2S with I2C, which is a totally different animal:

And earlier I was thinking of using i2s mems to bypass ADC conversion time ... it seemed like a good idea but how long will transfering the i2s stream take?

Any ADC will delay the signal by some amount, so it does not matter whether it is an I2S ADC or not. MEMS microphones with I2S will be fine indeed.
« Last Edit: May 01, 2018, 08:53:04 pm by ogden »
 

Offline ogden

  • Super Contributor
  • ***
  • Posts: 3731
  • Country: lv
Is it possible to do that without resorting to programming, maybe even using analog components?
If you limit the frequency of the signals and then clip them, you could probably do a phase detector a 4046 ... as long as the delay is smaller than the inverse of the highest frequency it can determine the delay. That would only work in a tiny cone though. Doing more than that discretely would quickly get very hard AFAICS.

Such a clipping approach would only work in a textbook example with a single signal source emitting a clean sine tone, not in a real-world application where various complex sounds come from multiple sources. This application indeed needs DSP processing, and thus programming.
 

Online MasterT

  • Frequent Contributor
  • **
  • Posts: 785
  • Country: ca
Don't know about esp32, but as any digital bus I2C trasfer time is defined by clock rate * number of bits. I2C standard AFAIK supports 100k, 400k, and 3.4M clock.

I am afraid that you confused I2S with I2C which is totally different animal:
Ooops, mea culpa, misreading. I didn't know the ESP32 could support I2S. I interfaced an Arduino DUE to a WM8731 over I2S/DMA, but that's for 1 CODEC, 2 channels in/out only. Bringing I2S into the picture would make the project much harder to build/program.
Here is a piece of code, to give an overview of the complexity:
Code: [Select]
void DMAC_Handler(void)
{
  uint32_t dma_status;

//  digitalWrite( test_pin, HIGH);
  dma_status = DMAC->DMAC_EBCISR;
  // BUG : ISR = if (dma_status & (DMAC_EBCIER_CBTC0 << SSC_DMAC_RX_CH))
  if (dma_status & (DMAC_EBCISR_BTC0 << SSC_DMAC_RX_CH)) {
    flag_adcv = 1;
    }

  if (dma_status & (DMAC_EBCISR_BTC0 << SSC_DMAC_TX_CH)) {
    flag_dacv = 1;
    }
  if(flag_adcv && (!flag_dacv))  adcdac_first = 1; 
  if((!flag_adcv) && flag_dacv)  adcdac_first = 2; 
//  digitalWrite( test_pin,  LOW);
}

void ssc_dma_cfg()
{
  desc[0].ul_source_addr      = (uint32_t)(&SSC->SSC_RHR);
  desc[0].ul_destination_addr = (uint32_t) inp[0];
  desc[0].ul_ctrlA = DMAC_CTRLA_BTSIZE(INP_BUFF) |/* Set Buffer Transfer Size */
                     DMAC_CTRLA_SRC_WIDTH_WORD |  /* Source transfer size is set to 32-bit width */
                     DMAC_CTRLA_DST_WIDTH_WORD;   /* Destination transfer size is set to 32-bit width */

  desc[0].ul_ctrlB = DMAC_CTRLB_SRC_DSCR_FETCH_DISABLE |
                     DMAC_CTRLB_DST_DSCR_FETCH_FROM_MEM |           
                     DMAC_CTRLB_FC_PER2MEM_DMA_FC |                 
                     DMAC_CTRLB_SRC_INCR_FIXED |             
                     DMAC_CTRLB_DST_INCR_INCREMENTING;       
  desc[0].ul_descriptor_addr = (uint32_t) &desc[1];

  desc[1].ul_source_addr      = (uint32_t)(&SSC->SSC_RHR);
  desc[1].ul_destination_addr = (uint32_t) inp[1];
  desc[1].ul_ctrlA = DMAC_CTRLA_BTSIZE(INP_BUFF) |   
                     DMAC_CTRLA_SRC_WIDTH_WORD |                 
                     DMAC_CTRLA_DST_WIDTH_WORD;                   

  desc[1].ul_ctrlB = DMAC_CTRLB_SRC_DSCR_FETCH_DISABLE |
                     DMAC_CTRLB_DST_DSCR_FETCH_FROM_MEM |           
                     DMAC_CTRLB_FC_PER2MEM_DMA_FC |                 
                     DMAC_CTRLB_SRC_INCR_FIXED |             
                     DMAC_CTRLB_DST_INCR_INCREMENTING;           
  desc[1].ul_descriptor_addr = (uint32_t) &desc[0];       
 
  DMAC->DMAC_CH_NUM[SSC_DMAC_RX_CH].DMAC_SADDR = desc[0].ul_source_addr;
  DMAC->DMAC_CH_NUM[SSC_DMAC_RX_CH].DMAC_DADDR = desc[0].ul_destination_addr;
  DMAC->DMAC_CH_NUM[SSC_DMAC_RX_CH].DMAC_CTRLA = desc[0].ul_ctrlA;
  DMAC->DMAC_CH_NUM[SSC_DMAC_RX_CH].DMAC_CTRLB = desc[0].ul_ctrlB;
  DMAC->DMAC_CH_NUM[SSC_DMAC_RX_CH].DMAC_DSCR  = desc[0].ul_descriptor_addr;

  DMAC->DMAC_CHER = DMAC_CHER_ENA0 << SSC_DMAC_RX_CH;
  ssc_enable_rx(SSC);

// transmit
  desc[2].ul_source_addr      = (uint32_t) out[0];
  desc[2].ul_destination_addr = (uint32_t) (&SSC->SSC_THR);
  desc[2].ul_ctrlA = DMAC_CTRLA_BTSIZE(INP_BUFF) |
                     DMAC_CTRLA_SRC_WIDTH_WORD | 
                     DMAC_CTRLA_DST_WIDTH_WORD;   

  desc[2].ul_ctrlB = DMAC_CTRLB_SRC_DSCR_FETCH_FROM_MEM |
                     DMAC_CTRLB_DST_DSCR_FETCH_DISABLE |       
                     DMAC_CTRLB_FC_MEM2PER_DMA_FC |                 
                     DMAC_CTRLB_SRC_INCR_INCREMENTING |             
                     DMAC_CTRLB_DST_INCR_FIXED;                       
  desc[2].ul_descriptor_addr = (uint32_t) &desc[3];       

  desc[3].ul_source_addr      = (uint32_t) out[1];
  desc[3].ul_destination_addr = (uint32_t) (&SSC->SSC_THR);
  desc[3].ul_ctrlA = DMAC_CTRLA_BTSIZE(INP_BUFF) |
                     DMAC_CTRLA_SRC_WIDTH_WORD |               
                     DMAC_CTRLA_DST_WIDTH_WORD;               

  desc[3].ul_ctrlB = DMAC_CTRLB_SRC_DSCR_FETCH_FROM_MEM |
                     DMAC_CTRLB_DST_DSCR_FETCH_DISABLE |       
                     DMAC_CTRLB_FC_MEM2PER_DMA_FC |                 
                     DMAC_CTRLB_SRC_INCR_INCREMENTING |             
                     DMAC_CTRLB_DST_INCR_FIXED;                       
  desc[3].ul_descriptor_addr = (uint32_t) &desc[2];       

  DMAC->DMAC_CH_NUM[SSC_DMAC_TX_CH].DMAC_SADDR = desc[2].ul_source_addr;
  DMAC->DMAC_CH_NUM[SSC_DMAC_TX_CH].DMAC_DADDR = desc[2].ul_destination_addr;
  DMAC->DMAC_CH_NUM[SSC_DMAC_TX_CH].DMAC_CTRLA = desc[2].ul_ctrlA;
  DMAC->DMAC_CH_NUM[SSC_DMAC_TX_CH].DMAC_CTRLB = desc[2].ul_ctrlB;
  DMAC->DMAC_CH_NUM[SSC_DMAC_TX_CH].DMAC_DSCR  = desc[2].ul_descriptor_addr;

  DMAC->DMAC_CHER = DMAC_CHER_ENA0 << SSC_DMAC_TX_CH;
  ssc_enable_tx(SSC);
}

void init_dma() {
  uint32_t ul_cfg;

  pmc_enable_periph_clk(ID_DMAC);
  DMAC->DMAC_EN &= (~DMAC_EN_ENABLE);
  DMAC->DMAC_GCFG = (DMAC->DMAC_GCFG & (~DMAC_GCFG_ARB_CFG)) | DMAC_GCFG_ARB_CFG_ROUND_ROBIN;
  DMAC->DMAC_EN = DMAC_EN_ENABLE;

  ul_cfg = 0;
  ul_cfg = DMAC_CFG_SRC_PER(SSC_DMAC_RX_ID) |
           DMAC_CFG_SRC_H2SEL |
           DMAC_CFG_SOD_DISABLE | //SOD: Stop On Done
           DMAC_CFG_FIFOCFG_ALAP_CFG;

  DMAC->DMAC_CH_NUM[SSC_DMAC_RX_CH].DMAC_CFG = ul_cfg;
  DMAC->DMAC_CHDR = DMAC_CHDR_DIS0 << SSC_DMAC_RX_CH;

//transmit
  ul_cfg = 0;
  ul_cfg = DMAC_CFG_DST_PER(SSC_DMAC_TX_ID) |
           DMAC_CFG_DST_H2SEL |
           DMAC_CFG_SOD_DISABLE | //SOD: Stop On Done
           DMAC_CFG_FIFOCFG_ALAP_CFG;

  DMAC->DMAC_CH_NUM[SSC_DMAC_TX_CH].DMAC_CFG = ul_cfg;
  DMAC->DMAC_CHDR = DMAC_CHDR_DIS0 << SSC_DMAC_TX_CH;
//

  NVIC_EnableIRQ(DMAC_IRQn);

  DMAC->DMAC_EBCIER = DMAC_EBCIER_BTC0 << SSC_DMAC_RX_CH;

  DMAC->DMAC_EBCIER = DMAC_EBCIER_BTC0 << SSC_DMAC_TX_CH;

  DMAC->DMAC_EBCISR;
}

void init_ssc() {
  clock_opt_t        rx_clk_option;
  data_frame_opt_t   rx_data_frame_option;
  clock_opt_t        tx_clk_option;
  data_frame_opt_t   tx_data_frame_option;

  memset((uint8_t *)&rx_clk_option,        0, sizeof(clock_opt_t));
  memset((uint8_t *)&rx_data_frame_option, 0, sizeof(data_frame_opt_t));
  memset((uint8_t *)&tx_clk_option,        0, sizeof(clock_opt_t));
  memset((uint8_t *)&tx_data_frame_option, 0, sizeof(data_frame_opt_t));

  pmc_enable_periph_clk(ID_SSC);
  ssc_reset(SSC);

  rx_clk_option.ul_cks               = SSC_RCMR_CKS_RK;
  rx_clk_option.ul_cko               = SSC_RCMR_CKO_NONE;
  rx_clk_option.ul_cki               = SSC_RCMR_CKI;
  //1 = The data inputs (Data and Frame Sync signals)
  //    are sampled on Receive Clock rising edge. 
  rx_clk_option.ul_ckg               = SSC_RCMR_CKG_NONE; // bylo 0;
  rx_clk_option.ul_start_sel         = SSC_RCMR_START_RF_RISING;
  rx_clk_option.ul_period            = 0;
  rx_clk_option.ul_sttdly            = 1;

  rx_data_frame_option.ul_datlen     = BIT_LEN_PER_CHANNEL - 1;
  rx_data_frame_option.ul_msbf       = SSC_RFMR_MSBF;
  rx_data_frame_option.ul_datnb      = 1;//stereo
  rx_data_frame_option.ul_fslen      = 0;
  rx_data_frame_option.ul_fslen_ext  = 0;
  rx_data_frame_option.ul_fsos       = SSC_RFMR_FSOS_NONE;
  rx_data_frame_option.ul_fsedge     = SSC_RFMR_FSEDGE_POSITIVE;

  ssc_set_receiver(SSC, &rx_clk_option, &rx_data_frame_option);
  ssc_disable_rx(SSC);
//  ssc_disable_interrupt(SSC, 0xFFFFFFFF);

  tx_clk_option.ul_cks               = SSC_TCMR_CKS_RK;
  tx_clk_option.ul_cko               = SSC_TCMR_CKO_NONE;
  tx_clk_option.ul_cki               = 0; //example Atmel. bylo SSC_TCMR_CKI;
  //1 = The data outputs (Data and Frame Sync signals)
  //    are shifted out on Transmit Clock rising edge.
  tx_clk_option.ul_ckg               = SSC_TCMR_CKG_NONE; // bylo 0.
  tx_clk_option.ul_start_sel         = SSC_TCMR_START_RF_RISING;
  tx_clk_option.ul_period            = 0;
  tx_clk_option.ul_sttdly            = 1;

  tx_data_frame_option.ul_datlen     = BIT_LEN_PER_CHANNEL - 1;
  tx_data_frame_option.ul_msbf       = SSC_TFMR_MSBF;
  tx_data_frame_option.ul_datnb      = 1;
  tx_data_frame_option.ul_fslen      = 0; // :fsden=0
  tx_data_frame_option.ul_fslen_ext  = 0;
  tx_data_frame_option.ul_fsos       = SSC_TFMR_FSOS_NONE; //input-slave
  tx_data_frame_option.ul_fsedge     = SSC_TFMR_FSEDGE_POSITIVE;

  ssc_set_transmitter(SSC, &tx_clk_option, &tx_data_frame_option);
  ssc_disable_tx(SSC);
  ssc_disable_interrupt(SSC, 0xFFFFFFFF);
}

void pio_B_SSC(void)
{
// DUE: PA15(B)-D24, PA16(B)-A0, PA14(B)-D23 = DACLRC, DACDAT, BCLK 
  PIOA->PIO_PDR   = PIO_PA14B_TK;
  PIOA->PIO_IDR   = PIO_PA14B_TK;
  PIOA->PIO_ABSR |= PIO_PA14B_TK;
 
  PIOA->PIO_PDR   = PIO_PA15B_TF;
  PIOA->PIO_IDR   = PIO_PA15B_TF;
  PIOA->PIO_ABSR |= PIO_PA15B_TF;

  PIOA->PIO_PDR   = PIO_PA16B_TD;
  PIOA->PIO_IDR   = PIO_PA16B_TD;
  PIOA->PIO_ABSR |= PIO_PA16B_TD;
}
 
The following users thanked this post: daslolo

Online Marco

  • Super Contributor
  • ***
  • Posts: 6723
  • Country: nl
Such clipping approach would work only in textbook example for single signal source of clean sine tone, not real world application where various complex sounds are coming from multiple sources. This application indeed needs DSP processing, thus programming.
He talked about using 9 LEDs ... something which only works under limited circumstances doesn't seem like a huge problem to me. This is clearly more science fair than product.
 

Offline ogden

  • Super Contributor
  • ***
  • Posts: 3731
  • Country: lv
Such clipping approach would work only in textbook example for single signal source of clean sine tone, not real world application where various complex sounds are coming from multiple sources. This application indeed needs DSP processing, thus programming.
He talked about using 9 LEDs ...  something which only works under limited circumstances doesn't seem like a huge problem to me. This is clearly more science fair than product.

It does not matter whether you use 3x3 LEDs or an HD display: the clipping approach will work only in theory.

The Siemens LMS Sound Camera is indeed a product:


 

Offline dasloloTopic starter

  • Regular Contributor
  • *
  • Posts: 63
  • Country: fr
  • I saw wifi signal once
Yes, that one. They all use a logarithmic spiral. Does anyone know why?
@ogden What's the problem with the clipping approach?
@MasterT good thing: that complexity reinforces my decision to go all analog.
By the way, speaking of ADCs, are there ADCs out there that do a mass conversion of all channels at once? Then I wouldn't have to spend time in the MCU counter-offsetting the signals.

Anyway I got the FFT running, and having two cores is sweet, so I might as well use programming to make this thing.
https://github.com/laurentopia/M5-Signal-Multimeter
« Last Edit: May 01, 2018, 11:12:50 pm by daslolo »
nine nine nein
 

Offline ogden

  • Super Contributor
  • ***
  • Posts: 3731
  • Country: lv
Yes that one. They all use logarithmic spiral. Anyone knows why?

IMHO the reason is marketing. A spiral just looks better than straight beams.

Quote
@ogden What's the problem with the clipping approach?

With clipping you lose the signal amplitude information. Also, even a quiet hiss riding on top of the sound will add so much noise and "jitter" to the phase measurements that it will not work without averaging. And if you need averaging, which more or less is DSP processing, then why not do proper DSP from the very beginning? Why reduce the resolution to 1 bit only to struggle to get it back later?

Quote
@MasterT good thing the complexity reinforce my decision of going all analog.

I would love to see a working, fully analog system :)

Quote
By the way, speaking of ADC, are there ADC out there that do mass conversion all at once? Then I wouldn't have to spend time in the mCU and counter-offset the signals.

There are flash ADCs, which are very fast. But again, the processing will add delay, and sound propagation is not instant after all. So it does not actually matter whether you find the sound direction with 1 ms, 2 ms, or 10 ms of delay.
 

Online MasterT

  • Frequent Contributor
  • **
  • Posts: 785
  • Country: ca
Yes that one. They all use logarithmic spiral. Anyone knows why?

@MasterT good thing the complexity reinforce my decision of going all analog.
By the way, speaking of ADC, are there ADC out there that do mass conversion all at once? Then I wouldn't have to spend time in the mCU and counter-offset the signals.

Anyway I got the FFT running, and having two cores is sweet, so I might as well use programming to make this thing.
The logarithmic spiral is there to impress the potential customer: to suggest that the sound camera is indeed a very complex, and consequently costly, product. Btw, I should say that I built a sound localizer/camera twice in the recent past, once with an Arduino Leonardo and once with an Arduino DUE.
You don't have to spend 120 µs or so waiting for an ADC conversion to complete. There are two ways to save time. On the 8-bit ATmega controllers you activate the ADC conversion-complete interrupt and enter the interrupt subroutine when the sample is ready. That's about 6-10 µs, compared to 120 µs spent busy-waiting. I'm talking ATmega328/ATmega32U4 etc.
The Arduino DUE has DMA, so you enter the interrupt subroutine only once, when the complete data array is ready, and sampling at 1 Msps is quite easy to do without CPU involvement.

Nice that you have the FFT running; now you should check the timing, to see if you can run the FFT on the data stream (4 mics at 10 k each = 40 ksps or so). Some optimization may be required. The Arduino DUE, for example, is capable of processing data through an FFT at 250-300 ksps, so optimization isn't required even for hi-fi sound at 48 kHz * 4 = 192 ksps. But I did optimize the processing on the Arduino Leonardo, interleaving processing for the left/right then top/bottom mics, as the Leonardo is only good for about 20 ksps.
 

Offline Sparker

  • Regular Contributor
  • *
  • Posts: 56
  • Country: ru
Cross correlation is the same thing as DFT. Has very little or zero practical value, since FFT about thousands time faster.
I think we can indeed get the cross-correlation of two signals from their DFTs and thus save time; you are right. What I mean is that getting the cross-correlation, by whatever means, should be the main method here, because it directly tells us the time difference between the two signals. Or not? If not, then I don't understand how the FFT approach is supposed to work.  :-//

By clipping you are essentially applying non-linear distortion to your signal. Do an experiment: observe the spectrum of one sinewave, then the spectrum of a clipped sinewave. Now do the same with a sum of two sinewaves. You should notice that the peaks at frequencies w1 and w2 have effectively been multiplied together: intermodulation peaks appear at combinations like w1+w2, 2*w1+w2, and so on.
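A numeric sketch of that experiment (naive single-bin DFT, hypothetical helper names). One detail worth noting: a symmetric clipper is an odd nonlinearity, so the strong products land at odd-order combinations such as 2*w2-w1 and 2*w1+w2, and a clean two-tone has nothing at those frequencies.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <vector>

// Magnitude of one DFT bin (naive; fine for a demo).
double bin_mag(const std::vector<double>& x, int k) {
    const double PI = 3.14159265358979323846;
    std::complex<double> acc(0.0, 0.0);
    for (int n = 0; n < (int)x.size(); ++n)
        acc += x[n] * std::polar(1.0, -2.0 * PI * k * n / (double)x.size());
    return std::abs(acc);
}

// Two equal tones at bins 10 and 13 of a 256-point record, optionally
// hard-clipped at +/-0.7. Clipping creates energy at intermod bins
// (e.g. 2*13-10 = 16) where the clean signal has exactly none.
std::vector<double> two_tone(bool clipped) {
    const double PI = 3.14159265358979323846;
    std::vector<double> x(256);
    for (int n = 0; n < 256; ++n) {
        double v = std::sin(2.0 * PI * 10 * n / 256.0)
                 + std::sin(2.0 * PI * 13 * n / 256.0);
        if (clipped) v = std::max(-0.7, std::min(0.7, v));  // hard clip
        x[n] = v;
    }
    return x;
}
```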
 

Online MasterT

  • Frequent Contributor
  • **
  • Posts: 785
  • Country: ca
Cross correlation is the same thing as DFT. Has very little or zero practical value, since FFT about thousands time faster.
I think we can indeed get the cross correlation of two signals from their DFTs and thus save time, you are right. What I mean is that getting the cross correlation by any means should be the main method here, because it can directly tell us the time difference between two signals. Or not? If not then I don't understand how the FFT approach should work.  :-//
Right, cross-correlation outputs the time difference. FFT outputs the phase difference, but since we know the frequency and the speed of sound in air, it's a piece of cake to translate phase to time and back, whichever is more practical for the final result. In direction finding, the angle to the sound source is what has to be determined, so the time difference or phase must be translated to an angle based on the distance between the two mics.
A second reason against cross-correlation is that it produces a single summary over the whole bandwidth. If there are two or more sound sources (as always in the real world), CC gives meaningless data. At the same time, FFT can easily distinguish hundreds of sound sources, as long as they don't overlap in bandwidth, or don't overlap completely. Cutting a 78 Hz slice out of the 0-10 kHz band and sorting out the sound patterns of each noise source, you could identify the narrow band specific to each of them and find the right direction.
The same applies to standing waves, reverberation, and echoes, especially indoors: only FFT is capable of sorting out and throwing away the part of the data pool that is most distorted/corrupted and still solving the trigonometry.
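The translations mentioned above, sketched in code (hypothetical helper names; assumes a plane wave, speed of sound c = 343 m/s, and mic spacing d in meters):

```cpp
#include <cmath>

// Phase difference phi (radians) at frequency f (Hz) -> time difference (s).
double phase_to_delay(double phi, double f) {
    const double PI = 3.14159265358979323846;
    return phi / (2.0 * PI * f);
}

// Time difference (s) across mic spacing d (m) -> arrival angle in radians
// off broadside. Only valid while |c * dt| <= d.
double delay_to_angle(double dt, double d, double c = 343.0) {
    return std::asin(c * dt / d);
}
```

For example, a quarter-cycle (pi/2) phase lag at 1 kHz is a 0.25 ms delay, and across a 10 cm spacing a delay of half of d/c puts the source 30 degrees off broadside.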
 

Online Marco

  • Super Contributor
  • ***
  • Posts: 6723
  • Country: nl
Cross correlation is the same thing as DFT. Has very little or zero practical value, since FFT about thousands time faster.
FFT is a fast algorithm for the DFT; discrete cross-correlation can be accelerated with it. Preferably a real-FFT, so you don't have to combine two real signals into one complex transform for efficiency.
 

Online MasterT

  • Frequent Contributor
  • **
  • Posts: 785
  • Country: ca
Preferably a real-FFT so you don't have to combine signals for efficiency.
What does that mean, real-FFT? Does a non-real FFT exist as well? And is "combine signals" a new mathematics term? Sorry for my limited vocabulary; I know + - * /, but I've never heard "combine".
 

Offline dasloloTopic starter

  • Regular Contributor
  • *
  • Posts: 63
  • Country: fr
  • I saw wifi signal once
Yes that one. They all use logarithmic spiral. Anyone knows why?

@MasterT good thing the complexity reinforce my decision of going all analog.
By the way, speaking of ADC, are there ADC out there that do mass conversion all at once? Then I wouldn't have to spend time in the mCU and counter-offset the signals.

Anyway I got the FFT running, and having two cores is sweet, so I might as well use programming to make this thing.
The logarithmic spiral is there to impress potential customers: a sound camera is indeed a very complex, and consequently costly, feature. Btw, I should say that I built a sound localizer/camera twice in the recent past, once with an Arduino Leonardo and once with an Arduino DUE.
You don't have to spend 120 usec or so waiting for an ADC conversion to complete. There are two ways to save time. On the 8-bit ATmega controllers you activate the ADC conversion-complete interrupt and enter the interrupt subroutine when the sample is ready. That takes about 6-10 usec, compared to 120 usec spent in a waiting state. I'm talking ATmega328/ATmega32U4 etc.
The Arduino DUE has DMA, so you enter the interrupt subroutine only once, when the complete data array is ready, and sampling at 1 Msps is quite easy to do without CPU involvement.

Nice that you have the FFT running; now you should check the timing, i.e. whether you can run the FFT on the stream of data (4 mics x 10 ksps = 40 ksps or so). Some optimization may be required. The Arduino DUE, for example, can process data through the FFT at 250-300 ksps, so optimization isn't required even for hi-fi sound (48 kHz x 4 =< 200 ksps). But I did optimize the processing on the Arduino Leonardo, interleaving processing for the left/right then top/bottom mics, as the Leonardo is only good for about 20 ksps.

The spiral does look science-y, and the housing looks desirably expensive.
I'd love to see your sound camera in action, do you have a link to your project? I didn't know the Arduino boards could push 1 Msps! Is this per channel? Through the Arduino IDE, or are you poking at registers directly?

I don't know how to calculate sps, but maybe you're talking about the sampling frequency. In that case I'm sampling at 10 kHz.
As for the FFT timing, on the ESP32 one 512-bin FFT takes 14 ms, the capture takes 10 ms at 10 kHz, and 40 kHz is the sampling limit using a delayed loop of analogRead.
Someone bypassed the register safety lock and managed 25 MHz output, so fast input sampling must be possible, but I must say I don't yet understand how it's done or what the impact is on using a second core for capturing (which is what I'm doing).
I haven't used DMA, but I read that the ESP32 has DMA as well; I don't know if this is the same kind of access as your DUE though.
nine nine nein
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Making a sound camera
« Reply #31 on: May 02, 2018, 04:52:36 am »
The spiral looks like the one used as the core of the Square Kilometer Array.



https://www.skatelescope.org/layout/

Quote
The spiral layout design has been chosen after detailed study by scientists into how best optimise the configuration to get the best possible results.

This spiral configuration gives many different lengths (baselines) and angles between antennas resulting in very high-resolution imaging capability.

The perfect layout would be a random arrangement that maximises the number of different baselines and angles between antennas. However, the practicalities of construction as well as linking the antennas together with cables mean that the spiral configuration is the best trade off between image resolution and cost.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 
The following users thanked this post: ogden

Online mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 13748
  • Country: gb
    • Mike's Electric Stuff
Re: Making a sound camera
« Reply #32 on: May 02, 2018, 07:05:31 am »
Someone posted here a while ago that they had done something similar. They used digital-output MEMS microphone modules and an FPGA to make a pretty cheap system.
I don't recall how far they got in producing visual output.
Youtube channel:Taking wierd stuff apart. Very apart.
Mike's Electric Stuff: High voltage, vintage electronics etc.
Day Job: Mostly LEDs
 

Online mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 13748
  • Country: gb
    • Mike's Electric Stuff
Yes, that one. They all use logarithmic spirals. Does anyone know why?
My guess would be something to do with standing waves, and/or having a large number of different mic-to-mic distances while maintaining symmetry of sensitivity.

Digital-output MEMS mics with I2S or pulse-modulation schemes are very cheap, so it may make sense to trade off the number of mics against processing complexity.

Youtube channel:Taking wierd stuff apart. Very apart.
Mike's Electric Stuff: High voltage, vintage electronics etc.
Day Job: Mostly LEDs
 

Online Marco

  • Super Contributor
  • ***
  • Posts: 6723
  • Country: nl
What does that mean, real-FFT? Does a non-real part exist as well? And is "combine signals" a new mathematics term? Sorry for my limited vocabulary; I know + - * /, I never heard "combine".
Most FFT implementations are complex<->complex; what we are generally interested in in DSP is a real->complex FFT and a complex->real iFFT. You can use one complex FFT to do two real FFTs, but it's a headache.
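The redundancy behind the real-FFT idea is easy to check numerically. A quick sketch (Python/NumPy for brevity, not anyone's production code from this thread): for real input, the upper half of a complex FFT is the conjugate mirror of the lower half, and a real-FFT routine returns only the non-redundant bins.

```python
import numpy as np

# Real-valued test signal: a tone plus a little noise.
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 5 * np.arange(64) / 64) + 0.1 * rng.standard_normal(64)

full = np.fft.fft(x)   # complex FFT: 64 bins
half = np.fft.rfft(x)  # real FFT: 33 bins (DC .. Nyquist)

# For real input, bin N-k of the complex FFT is the conjugate of bin k,
# so the upper half carries no new information.
assert np.allclose(full[1:32], np.conj(full[-1:-32:-1]))

# rfft returns exactly the non-redundant bins 0 .. N/2 of the full FFT.
assert np.allclose(half, full[:33])
```

The real-FFT still covers DC up to Nyquist, so no frequency range is lost; only the mirrored half is dropped.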
 

Online MasterT

  • Frequent Contributor
  • **
  • Posts: 785
  • Country: ca
I'd love to see your sound camera in action, do you have a link to your project? I didn't know the Arduino boards could push 1 Msps! Is this per channel? Through the Arduino IDE, or are you poking at registers directly?

I don't know how to calculate sps, but maybe you're talking about the sampling frequency. In that case I'm sampling at 10 kHz.
As for the FFT timing, on the ESP32 one 512-bin FFT takes 14 ms, the capture takes 10 ms at 10 kHz, and 40 kHz is the sampling limit using a delayed loop of analogRead.
Someone bypassed the register safety lock and managed 25 MHz output, so fast input sampling must be possible, but I must say I don't yet understand how it's done or what the impact is on using a second core for capturing (which is what I'm doing).
I haven't used DMA, but I read that the ESP32 has DMA as well; I don't know if this is the same kind of access as your DUE though.
I don't have a video; it's lost. Visualization was done on an Android tablet; I used BT to transfer the processed/filtered and sorted data. 1 megasample per second via direct register programming; the Arduino IDE is sloppy. One ADC, 1 Msps shared across all channels.

Stream sample rate only makes sense with real-time data processing, when the ADC conversion runs continuously in the background using DMA or interrupts. The FFT must be computed faster than the sampling; 14 ms for a 512-point FFT means you could theoretically sustain 36571 Hz sampling, not counting overhead for data management.
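That 36571 Hz figure is just the transform size divided by the transform time; a trivial check (Python, using the ESP32 numbers quoted above):

```python
fft_points = 512    # samples per FFT (from the post above)
fft_time = 0.014    # 14 ms per 512-point FFT on the ESP32
max_rate = fft_points / fft_time  # fastest stream the FFT can keep up with
print(round(max_rate))  # 36571 Hz
```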

What does that mean, real-FFT? Does a non-real part exist as well? And is "combine signals" a new mathematics term? Sorry for my limited vocabulary; I know + - * /, I never heard "combine".
Most FFT implementations are complex<->complex; what we are generally interested in in DSP is a real->complex FFT and a complex->real iFFT. You can use one complex FFT to do two real FFTs, but it's a headache.
It's an old myth that we can save time by not doing math on the empty imaginary part of the input data. I did some research, and it turns out that the CPU-clock savings come with only half the frequency resolution, so all this theory defining a real->FFT is complete BS. There are many proven optimization techniques that don't sacrifice frequency resolution, like using a higher radix or split-radix, but it all depends on the specific uCPU and its instruction set, MAC availability, MULT vs ADD performance, etc.
 

Online Marco

  • Super Contributor
  • ***
  • Posts: 6723
  • Country: nl
It's an old myth
No, multiplying by 0 is always a waste of time.
Quote
and it's turns out that cpu clock savings comes with only half frequency resolution
If you fill the imaginary input of a complex FFT with 0s, the output is symmetric ... that the index for the frequency bins counts up higher is irrelevant, half the information is completely redundant.

A real-FFT goes up to Nyquist, just like the real DFT, that's obviously sufficient.
« Last Edit: May 02, 2018, 01:29:42 pm by Marco »
 

Online MasterT

  • Frequent Contributor
  • **
  • Posts: 785
  • Country: ca
It's an old myth
No, multiplying by 0 is always a waste of time.
Quote
and it's turns out that cpu clock savings comes with only half frequency resolution
If you fill the imaginary input of a complex FFT with 0s, the output is symmetric ... that the index for the frequency bins counts up higher is irrelevant, half the information is completely redundant.
.
Show your code, then talk.
 

Online Marco

  • Super Contributor
  • ***
  • Posts: 6723
  • Country: nl
Re: Making a sound camera
« Reply #39 on: May 02, 2018, 02:16:56 pm »
NOU.
 

Offline ogden

  • Super Contributor
  • ***
  • Posts: 3731
  • Country: lv
If you fill the imaginary input of a complex FFT with 0s, the output is symmetric ... that the index for the frequency bins counts up higher is irrelevant, half the information is completely redundant.
.
Show your code, then talk.

Code can be 3rd party as well :)

https://www.keil.com/pack/doc/CMSIS/DSP/html/group__RealFFT.html

Most likely you are interested in just the DSP package:

https://github.com/ARM-software/CMSIS_5/releases/tag/5.3.0
 

Offline Sparker

  • Regular Contributor
  • *
  • Posts: 56
  • Country: ru
Right, cross-correlation outputs the time difference. FFT outputs the phase difference, but since we know the frequency and the speed of sound in air, it's a piece of cake to translate phase to time and back, whichever is more practical for the final result. In direction finding, an angle to the sound source is what has to be determined, so the time difference or phase must be translated to an angle based on the distance between the two mics.
The second reason against cross-correlation is that it returns a summary over the whole bandwidth. If there are two or more sound sources (always, in the real world), CC gives meaningless data. At the same time, FFT can easily distinguish hundreds of sound sources, as long as they don't overlap in bandwidth, or don't overlap completely. Cutting a 78 Hz slice out of the 0-10 kHz band and sorting out the sound patterns of each noise source, you can identify a narrow band specific to each of them and find the right direction.
The same applies to standing waves, reverberation and echoes, especially indoors: only FFT can sort out and throw away the part of the data pool that is most distorted/corrupted and still resolve the trigonometry.
I've just tested cross-correlation in Matlab, and it can actually work nicely with multiple noise-like signals that have narrow autocorrelation and low cross-correlation. But the method works badly with human speech, because it's not as random as I initially assumed.  :(
DFT is the choice here indeed.
What I don't understand is: if two sound sources occupy exactly the same frequency, like the car control panel in the video in this thread, how can the system differentiate them?
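The noise-like case is easy to reproduce. A sketch (Python/NumPy, with a made-up 37-sample delay): white noise has a sharp autocorrelation, so cross-correlating the two mic signals gives one unambiguous peak at the true lag.

```python
import numpy as np

rng = np.random.default_rng(1)
n, true_lag = 1024, 37

# A white-noise "source" picked up by two mics, the second delayed by
# true_lag samples (circularly, for simplicity).
s = rng.standard_normal(n)
mic1 = s
mic2 = np.roll(s, true_lag)

# Full cross-correlation; the peak index gives the lag between the mics.
xc = np.correlate(mic2, mic1, mode="full")
lag = np.argmax(xc) - (n - 1)
print(lag)  # 37: the noise autocorrelation is sharp, so the peak is unambiguous
```

With a narrowband or strongly periodic source (speech pitch, a hum), the correlation shows a comb of near-equal peaks instead of one spike, which is exactly the ambiguity described above.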
 

Online Marco

  • Super Contributor
  • ***
  • Posts: 6723
  • Country: nl
Re: Making a sound camera
« Reply #42 on: May 02, 2018, 04:58:36 pm »
It's all fine and well to say "use the DFT", but that doesn't really mean anything by itself. A bunch of phase differences between microphones at given frequencies isn't easy to convert to a delay; the phases are modulo 2pi, after all.

If you want to pick out part of the spectrum, just multiply the FFT-transformed microphone signals by a Fourier-domain representation of a minimum-phase bandpass filter before multiplying them with each other and doing the inverse FFT. It's still cross-correlation, just of band-limited versions of the microphone signals.

PS. Don't forget to zero-pad the microphone signals before doing the FFT.
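This recipe can be sketched in Python/NumPy (the brick-wall band weights and the bin range below are arbitrary stand-ins for a properly designed minimum-phase band-pass filter): zero-pad, transform, weight the spectrum, multiply one transform by the conjugate of the other, inverse-transform.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 512, 21
s = rng.standard_normal(n)
# Zero-pad both signals to 2n so the FFT's circular correlation is linear.
mic1 = np.concatenate([s, np.zeros(n)])
mic2 = np.concatenate([np.zeros(d), s, np.zeros(n - d)])  # mic1 delayed by d

F1 = np.fft.rfft(mic1)
F2 = np.fft.rfft(mic2)

# Crude band-pass weighting in the Fourier domain: keep only bins 10..99
# of the 2n-point spectrum (a placeholder for the minimum-phase filter).
band = np.zeros_like(F1)
band[10:100] = 1.0

# Cross-spectrum of the band-limited signals, then inverse FFT gives the
# cross-correlation of the band-limited microphone signals.
xc = np.fft.irfft(F2 * np.conj(F1) * band)
lag = np.argmax(xc)
print(lag)  # 21: the delay survives the band limiting
```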
« Last Edit: May 02, 2018, 05:05:57 pm by Marco »
 

Offline ogden

  • Super Contributor
  • ***
  • Posts: 3731
  • Country: lv
It's crappy Radix-2, that is good for undeveloped third world  tribes numba-umba.

Here we go 8)

Quote
And for whoever picks up any BS posted on wiki pages and then posts it in this thread, which was started by an OP who acknowledged his lack of software skills in the first message: I don't see a point in continuing this dispute here. Should we start another FFT-related thread?

Yes, please. Share your wisdom. I am especially interested in whether the discussion can result in faster and/or smaller Cortex-M-optimized complex FFT code than that in the CMSIS DSP lib.
 

Offline dasloloTopic starter

  • Regular Contributor
  • *
  • Posts: 63
  • Country: fr
  • I saw wifi signal once
Re: Making a sound camera
« Reply #45 on: May 04, 2018, 12:05:38 am »
The spiral looks like the one used as the core of the Square Kilometer Array.



https://www.skatelescope.org/layout/

Quote
The spiral layout design has been chosen after detailed study by scientists into how best optimise the configuration to get the best possible results.

This spiral configuration gives many different lengths (baselines) and angles between antennas resulting in very high-resolution imaging capability.

The perfect layout would be a random arrangement that maximises the number of different baselines and angles between antennas. However, the practicalities of construction as well as linking the antennas together with cables mean that the spiral configuration is the best trade off between image resolution and cost.
OK good to see there is a very good reason. Reminds me of what I read about Viktor Schauberger years ago. I wonder what else we can lift from his eminent work on water.

Yes, that one. They all use logarithmic spirals. Does anyone know why?
My guess would be something to do with standing waves, and/or having a large number of different mic-to-mic distances while maintaining symmetry of sensitivity.

Digital-output MEMS mics with I2S or pulse-modulation schemes are very cheap, so it may make sense to trade off the number of mics against processing complexity.

What do you mean, it has to do with standing waves?
Also, can you explain the trade-off between the number of mics and processing complexity? It seems to me that more mics = more FFTs to churn.

You guys lost me at radix until I saw it's synonymous with "base"; I'll ask why that matters in your other thread, it's interesting.

Can anyone give me an explanation of why that jambalaya with FFT will give me the inter-mic delays? I understand that a frequency-domain graph gives me a sort of instant signature of the sound, but I don't understand how that detects that a sound arrived a few ms earlier on this mic vs that mic...
nine nine nein
 

Offline ogden

  • Super Contributor
  • ***
  • Posts: 3731
  • Country: lv
Re: Making a sound camera
« Reply #46 on: May 04, 2018, 12:33:19 am »
Can anyone give me an explanation of why that jambalaya with FFT will give me inter-mic delays?

The output of a complex FFT is an array of complex numbers, which means you get not only the magnitude but also the phase for each frequency bin. Using simple trigonometry you can calculate the phase-angle difference between corresponding bins of two FFTs. Knowing the bin's frequency and the phase-angle difference, you can calculate the signal delay time. This of course is not a straightforward operation, because at high frequencies the phase-angle difference will most likely exceed 360 degrees :)

[edit] A real FFT is just a spectrum analyzer. A complex FFT is more than that.

Quote
I understand that a frequency-domain graph will give me a sort of instant signature of the sound, but I don't understand how that'll detect that one sound arrived a few ms earlier on this mic vs that mic...

During just 1 ms, sound travels a whopping 0.343 m. Your mic array would have to be gigantic to get a few ms of delay between mics :)
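Both points, phase-to-delay and the wrap-around problem, fit in a few lines (Python; the 100 µs delay and the tone frequencies are made-up numbers):

```python
import numpy as np

c = 343.0        # speed of sound in air, m/s
f = 1000.0       # tone frequency, Hz
delay = 100e-6   # 100 us inter-mic delay
path = c * delay # = 0.0343 m of extra path to the far mic

# Phase difference the 1 kHz bin would show for this delay: about 36 degrees,
# comfortably unambiguous, and it inverts back to the exact delay.
dphi = 2 * np.pi * f * delay
t_est = dphi / (2 * np.pi * f)
assert np.isclose(t_est, delay)

# At 12 kHz the same delay is 1.2 cycles; atan2 only ever reports the
# wrapped remainder, which aliases to a much smaller apparent delay.
f_hi = 12000.0
wrapped = np.angle(np.exp(1j * 2 * np.pi * f_hi * delay))
print(wrapped / (2 * np.pi * f_hi))  # ~16.7 us instead of 100 us
```

This is why the phase-unwrapping (or a time-domain cross-correlation) step matters once the mic spacing exceeds a wavelength.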
« Last Edit: May 04, 2018, 12:46:05 am by ogden »
 

Online MasterT

  • Frequent Contributor
  • **
  • Posts: 785
  • Country: ca
Re: Making a sound camera
« Reply #47 on: May 04, 2018, 01:05:24 am »

1. Also, can you explain the trade-off between the number of mics and processing complexity? It seems to me that more mics = more FFTs to churn.

2. You guys lost me at radix until I saw it's synonymous with "base"; I'll ask why that matters in your other thread, it's interesting.

3. Can anyone give me an explanation of why that jambalaya with FFT will give me the inter-mic delays? I understand that a frequency-domain graph gives me a sort of instant signature of the sound, but I don't understand how that detects that a sound arrived a few ms earlier on this mic vs that mic...

1. Correct. Use your analytical skills to sort out the rubbish.
2 & 3. Take a crash course: http://www.dspguide.com/ch8/8.htm
Equation 8.6 is the key to finding the phase.
 
You didn't provide a link to the FFT library you are using; if the output data set has Re[] and Im[], then you know what to do.
 

Offline dasloloTopic starter

  • Regular Contributor
  • *
  • Posts: 63
  • Country: fr
  • I saw wifi signal once
Re: Making a sound camera
« Reply #48 on: May 04, 2018, 01:43:52 am »
Alright, I returned all my electronic gizmos and went back to basics: I fired up Matlab and got this:

Code: [Select]
N=256; fs=320; t=0:1/fs:2-1/fs; % setup (fc=10; fs=32*fc) missing from this snippet
x=.5*cos(2*pi*20*t+20*pi/180) + .1*cos(2*pi*54*t-60*pi/180);
x2=.5*cos(2*pi*30*t+20*pi/180) + .1*cos(2*pi*54*t-50*pi/180);
X = 1/N*fftshift(fft(x,N));
X2 = 1/N*fftshift(fft(x2,N));
%phase thingy
phase=atan2(imag(X),real(X))*180/pi;
phase2=atan2(imag(X2),real(X2))*180/pi;
f=(-N/2:N/2-1)*fs/N; % ordered frequency axis, also missing from the snippet
plot(f,phase-phase2);
which makes two curves x and x2 offset by a phase of 10 (10 what?), and after calculating phase and phase2 I get this funky curve when subtracting the two... I was expecting to get that delta of 10, but maybe there is more massaging I need to do  :-DD
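Part of what makes that curve funky: atan2 returns a phase for every bin, including bins holding nothing but leakage and rounding noise (and in this script the first tones are 20 vs 30 Hz, so they don't even share a bin). A sketch (Python/NumPy, with both tones shared between the signals as in the follow-up script later in the thread; the 0.1 magnitude threshold is an arbitrary choice) that masks out the empty bins before differencing:

```python
import numpy as np

fs, N = 320, 256
t = np.arange(0, 2, 1 / fs)
# Two mics hearing the same two tones, each tone arriving 10 degrees later
# at the second mic.
x  = 0.5*np.cos(2*np.pi*20*t + np.deg2rad(20)) + 0.1*np.cos(2*np.pi*54*t - np.deg2rad(60))
x2 = 0.5*np.cos(2*np.pi*20*t + np.deg2rad(30)) + 0.1*np.cos(2*np.pi*54*t - np.deg2rad(50))

X, X2 = np.fft.rfft(x, N), np.fft.rfft(x2, N)

# Phase is only meaningful where there is energy; everywhere else atan2
# returns the phase of leakage and numerical noise.
mask = np.abs(X) > 0.1 * np.abs(X).max()
dphi = np.rad2deg(np.angle(X2[mask]) - np.angle(X[mask]))
print(np.round(dphi))  # about 10 degrees at each tone bin
```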


nine nine nein
 

Offline dasloloTopic starter

  • Regular Contributor
  • *
  • Posts: 63
  • Country: fr
  • I saw wifi signal once
Re: Making a sound camera
« Reply #49 on: May 04, 2018, 01:53:19 am »
You didn't provide a link to the FFT library you are using; if the output data set has Re[] and Im[], then you know what to do.
https://github.com/kosme/arduinoFFT
nine nine nein
 

Online Marco

  • Super Contributor
  • ***
  • Posts: 6723
  • Country: nl
Re: Making a sound camera
« Reply #50 on: May 04, 2018, 02:20:05 am »
FFT is mostly useful as a fast way to implement convolution, and thus cross-correlation. I don't know why cross-correlation didn't work for you in Matlab... but it should; look at the example in their documentation. As you noticed, and as I said, just having a bunch of different phases for the different microphones doesn't give you much useful information; the cross-correlation gives you something useful in the time domain. As I also said, when using the FFT to implement cross-correlation you have to zero-pad the signals to twice the length (because FFT-based convolution is circular).
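The zero-padding point can be verified directly. A sketch (Python/NumPy; `np.correlate` is the O(n^2) ground truth): padding both signals to twice the length makes the FFT's circular correlation contain the full linear cross-correlation.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 256, 40
a = rng.standard_normal(n)
b = np.roll(a, d)  # a delayed copy to correlate against

# Direct (O(n^2)) linear cross-correlation as ground truth, lags -(n-1)..(n-1).
direct = np.correlate(b, a, mode="full")

# FFT version: zero-pad both signals to 2n so the circular correlation the
# FFT computes has room for every linear lag without wrap-around.
F_a = np.fft.rfft(a, 2 * n)
F_b = np.fft.rfft(b, 2 * n)
circ = np.fft.irfft(F_b * np.conj(F_a), 2 * n)

# Reorder the circular output into lags -(n-1)..(n-1).
fft_version = np.concatenate([circ[-(n - 1):], circ[:n]])

assert np.allclose(direct, fft_version)  # identical up to rounding
```

Without the padding, correlation energy from the end of the buffer wraps around and contaminates the other lags.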
« Last Edit: May 04, 2018, 02:21:54 am by Marco »
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Making a sound camera
« Reply #51 on: May 04, 2018, 03:02:43 am »
Can anyone give me an explanation of why that jambalaya with FFT will give me the inter-mic delays? I understand that a frequency-domain graph gives me a sort of instant signature of the sound, but I don't understand how that detects that a sound arrived a few ms earlier on this mic vs that mic...

A short answer...

If you multiply the samples in one signal by the other, then integrate over a number
of samples, you get a measure of how 'alike' the two signals are.

If you want to match signals at different offsets, you have to do this at each offset.
The result is a time series that shows how much the signals match at different alignments.

Testing a 16k-sample signature against a set of 16k samples at each alignment will require
268,435,456 multiply-add operations. That is a lot of math: O(N^2), if you like big-O complexity notation.

FFT is an effective way to do this, but with less work.

You split the signature and test data into their frequency and phase components using FFT.

You multiply the components in one signature by the ones in the other. If the frequency
is only in one of them, it will be removed (as zero times anything = zero). If the frequency is in both, they will be enhanced (and the phase changed).

You are then left with the set of amplitudes indicating which frequencies are common to both signals, with some phase information (remember, phase information = timing information).

By running the Inverse FFT on this result, where things are 'in sync' the magnitudes of the various components add up, where they are 'out of sync' on average they cancel out.

The result is a time series that shows how much the signals match at different alignments
 - exactly the same numbers (except rounding!) as found by the hard method.

However, because the FFT and IFFT are each O(N*logN) in complexity, this is far more practical for large values of N.

So for 16k samples, the hard way takes 268,435,456 units of work. The FFT way takes roughly 3 x 16k x log2(16k) = about 688,000 units of work for the two forward transforms plus the inverse (however, the size of a unit of work differs between the two processes, so you can't just divide one number by the other to get the speed-up).

This also brings up another point. To get good timing information from such a process, your signature must have a good mix of frequencies in it; a sine wave will match another sine wave of the same frequency whenever they are in phase, giving ambiguous results. The signature must also have structure to it - one bit of pure random noise is much like any other bit of random noise.

This is one of the reasons that dolphins use 'clicks' for echolocation and radar uses 'chirps'. This technique will not let you accurately pinpoint the location of that annoying hum unless you have lots of microphones to remove the ambiguity.
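The whole recipe above, FFT both signals, multiply one spectrum by the conjugate of the other (the conjugate plays the role of the flipped signature), inverse FFT, take the peak, fits in a few lines of Python/NumPy (16k samples and a made-up 1234-sample offset):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1 << 14                   # 16k samples, as in the work estimate above
sig = rng.standard_normal(n)  # a signature with a good mix of frequencies
# The "data": a delayed (circularly shifted) copy of the signature plus noise.
data = np.roll(sig, 1234) + 0.5 * rng.standard_normal(n)

# O(n log n) cross-correlation: conjugating one spectrum is equivalent to
# flipping that signal before a convolution.
xc = np.fft.irfft(np.fft.rfft(data) * np.conj(np.fft.rfft(sig)))
print(np.argmax(xc))  # 1234: the alignment where the signals match best
```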
« Last Edit: May 09, 2018, 04:39:01 am by hamster_nz »
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline dasloloTopic starter

  • Regular Contributor
  • *
  • Posts: 63
  • Country: fr
  • I saw wifi signal once
Re: Making a sound camera
« Reply #52 on: May 08, 2018, 03:51:58 am »
Alright @hamster_nz, I understand that the sound needs harmonics for the FFT to find a match; fortunately, hums have harmonics, which is what makes them bothersome (something to do with the brain latching onto the pattern). Thanks to your explanation I now understand that multiplying the spectra of the various mics will give me the matching frequencies.
I don't understand how the iFFT would give me the offset between the signals, which I think is what I need to triangulate the source.
I don't understand that thing you said about multiplying phases either, so I tried doing it in Matlab. The phase delta is 10 degrees between x and x2, and the tone is at 20 Hz, so the offset between x and x2 should be 1/20 * 10/360 = 0.0014 s (right?)
This graph shows 2, so I'm doing something wrong somewhere.

Code: [Select]
fc=10;
fs=32*fc;
t=0:1/fs:2-1/fs;
x=.5*cos(2*pi*20*t+20*pi/180) + .1*cos(2*pi*54*t-60*pi/180); %time domain signal with phase shift
x2=.5*cos(2*pi*20*t+30*pi/180) + .1*cos(2*pi*54*t-50*pi/180);
N=256; %FFT size
X = 1/N*fftshift(fft(x,N));%N-point complex DFT
X2 = 1/N*fftshift(fft(x2,N));%N-point complex DFT

%phase delta
X3=X.*X2;
phase3=atan2(imag(X3),real(X3))*180/pi; %phase
df=fs/N; %frequency resolution
sampleIndex = -N/2:N/2-1; %ordered index for FFT plot
f=sampleIndex*df; %x-axis index converted to ordered frequencies
plot(f,phase3.*real(X3));
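One likely bug in the script above: `X3=X.*X2` adds the phases of the two signals, while a phase difference needs a complex conjugate, i.e. `X2.*conj(X)` (the cross-spectrum). A minimal check (Python/NumPy, keeping only the 20 Hz tone, which falls exactly on bin 16 at this fs and N):

```python
import numpy as np

fs = 320
t = np.arange(0, 2, 1 / fs)
x  = 0.5 * np.cos(2 * np.pi * 20 * t + np.deg2rad(20))
x2 = 0.5 * np.cos(2 * np.pi * 20 * t + np.deg2rad(30))

N = 256
X, X2 = np.fft.rfft(x, N), np.fft.rfft(x2, N)

bin20 = int(20 * N / fs)  # 20 Hz falls exactly on bin 16: no leakage
sum_ph  = np.angle(X[bin20] * X2[bin20])           # X.*X2: phases add -> 50 deg
diff_ph = np.angle(X2[bin20] * np.conj(X[bin20]))  # conjugate: phases subtract -> 10 deg
print(np.degrees(sum_ph), np.degrees(diff_ph))
```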

« Last Edit: May 08, 2018, 04:06:37 am by daslolo »
nine nine nein
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Making a sound camera
« Reply #53 on: May 08, 2018, 12:03:18 pm »
Sorry for the large swath of code, but it is everything (although using a DFT, not an FFT):

Code: [Select]
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define POINTS   (128)
#define FFT_SIZE (POINTS/2+1)
#define PI       (3.1415926535897932384626433)

double samples1[POINTS],       samples2[POINTS], samples1flipped[POINTS];
double spectrum1_r[FFT_SIZE], spectrum1_i[FFT_SIZE];
double spectrum2_r[FFT_SIZE], spectrum2_i[FFT_SIZE];
double spectrumX_r[FFT_SIZE], spectrumX_i[FFT_SIZE];
double samplesX[POINTS];


/**********************************************************************/
void generate_signature(double *samples) {
   int i;
   for(i = 0; i < POINTS; i++)
      samples[i] = 0;

   for(i =  0; i <  5; i++) samples[i] = -1.0;
   for(i =  5; i < 10; i++) samples[i] = 1.0;
   for(i = 10; i < 15; i++) samples[i] = -1.0;
   for(i = 15; i < 20; i++) samples[i] = 1.0;
}

/**********************************************************************/
void flip_samples(double *s1, double *s2) {
   int i;
   for(i = 0; i < POINTS; i++)
     s2[i] = s1[POINTS-1-i];
}
/**********************************************************************/
void offset_sample(double *samples1, double *samples2, int offset) {
   int i;
   for(i = 0; i < POINTS; i++)  /* C's % can be negative, so bias the index */
      samples2[i] = samples1[((i-offset)%POINTS + POINTS)%POINTS];
}

/**********************************************************************/
void complex_mult(double *r1, double *i1,
                  double *r2, double *i2,
                  double *r3, double *i3) {
   int i;
   for(i = 0; i < FFT_SIZE; i++) {
      r3[i] = (r1[i] * r2[i]) - (i1[i] * i2[i]);
      i3[i] = (r1[i] * i2[i]) + (r2[i] * i1[i]);
   }
}

/**********************************************************************/
void add_noise(double *samples, double level) {
   int i;
   for(i = 0; i < POINTS; i++) {
      samples[i] += (rand()*level)/RAND_MAX;
   }
}

/**********************************************************************/
void print_max(double *samples) {
   int    max_i = 0;
   double max = samples[0];
   int i;
   for(i = 1; i < POINTS; i++) {
      if(samples[i] > max) {
        max_i = i;
        max   = samples[i];
      }
   }

   if(max_i >= POINTS/2)
      max_i -= POINTS;
   printf("Max is at %i (%10.4f)\n", max_i, max);
}

/**********************************************************************/
void dft(double *samples, double *r, double *i) {
  int b,p;
  for(b = 0; b < FFT_SIZE; b++)
  {
    r[b] = 0.0;
    i[b] = 0.0;
    for(p = 0; p <POINTS; p++)
    {
      double angle = 2*PI*b*p/POINTS;
      r[b] += samples[p] * cos(angle);
      i[b] -= samples[p] * sin(angle);
    }
  }
}

/******************************************************/
void idft(double *r, double *i, double *samples) {
  int b,p;
  double rs[FFT_SIZE];
  double is[FFT_SIZE];

  for(b = 0; b < FFT_SIZE; b++) {
     rs[b] = r[b]/(POINTS/2);
     is[b] = -i[b]/(POINTS/2);
  }
  rs[0]          = r[0]/POINTS;
  rs[FFT_SIZE-1] = r[FFT_SIZE-1]/POINTS;

  for(p = 0; p < POINTS; p++)
  {
    for(b = 0; b < FFT_SIZE; b++)
    {
      double angle = 2*PI*b*p/POINTS;
      samples[p] += (rs[b] * cos(angle)) + (is[b] * sin(angle));
    }
  }
}
/******************************************************/
int main(int argc, char *argv[]) {
   int i;
   /* Prepare the data */
   generate_signature(samples1);
   offset_sample(samples1,samples2, 50);
   add_noise(samples2, 0.8);

   /* Flip the samples to turn convolution it to correlation */
   flip_samples(samples1,samples1flipped);

   /* Take the DFT */
   dft(samples1flipped,spectrum1_r,spectrum1_i);
   dft(samples2,spectrum2_r,spectrum2_i);

   /* Multiply the spectrums */
   complex_mult(spectrum2_r, spectrum2_i,
                spectrum1_r, spectrum1_i,
                spectrumX_r, spectrumX_i);

   /* Inverse DFT to find how well they match */
   idft(spectrumX_r,spectrumX_i,samplesX);

   /* Output the results */
   for(i = 0; i < POINTS; i++) {
      printf("%4i, %10.7f, %10.7f, %10.7f\n",i, samples1[i], samples2[i], samplesX[i]);
   }
   print_max(samplesX);
   return 0;
}


In the picture below:

- The first graph is the 'signature' being looked for.

- The second is the data to look for the signature in.

- The third graph is the output of the FFT correlation; the peak shows where the second signal is most like the first. Note that the peak is at the offset you need to align to.

I will be the first to admit that this is just a hack; you will need to develop and test it...
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 
The following users thanked this post: daslolo

Online Marco

  • Super Contributor
  • ***
  • Posts: 6723
  • Country: nl
Re: Making a sound camera
« Reply #54 on: May 08, 2018, 01:36:20 pm »
The signature must also have structure to it - one bit of pure random noise is much like any other bit of random noise.

It's the opposite: each bit of noise is unique. A white noise source will give a really clear spike when you cross-correlate two microphones receiving it.
 

Offline ebastler

  • Super Contributor
  • ***
  • Posts: 6510
  • Country: de
Anyway I got the FFT running, and having two cores is sweet, so I might as well use programming to make this thing.
https://github.com/laurentopia/M5-Signal-Multimeter

What's the hardware in that picture? (The little screen plus processor device to the left?) Looks neat!
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Making a sound camera
« Reply #56 on: May 09, 2018, 04:38:10 am »
The signature must also have structure to it - one bit of pure random noise is much like any other bit of random noise.

It's the opposite, each bit of noise is unique. A white noise source will give a really clear spike when you cross correlate two microphones receiving it.

Yes, you are right.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline dasloloTopic starter

  • Regular Contributor
  • *
  • Posts: 63
  • Country: fr
  • I saw wifi signal once
Re: Making a sound camera
« Reply #57 on: May 09, 2018, 05:31:24 am »
Anyway I got the FFT running, and having two cores is sweet, so I might as well use programming to make this thing.
https://github.com/laurentopia/M5-Signal-Multimeter

What's the hardware in that picture? (The little screen plus processor device to the left?) Looks neat!
M5Stack
nine nine nein
 
The following users thanked this post: ebastler

Offline dasloloTopic starter

  • Regular Contributor
  • *
  • Posts: 63
  • Country: fr
  • I saw wifi signal once
Re: Making a sound camera
« Reply #58 on: May 09, 2018, 11:18:14 pm »
I will be the first to admit that this is just a hack, you will need to develop and test it...

Thank you so much: it does what I need. I like that you didn't bother with the FFT, and I never realized how slow the DFT is until now :)
Why is the FFT size half the sample size?
In terms of CPU cost, with 5 microphones will this need 5! = 120x iFFT + 5x FFT + 5x flipped FFT?
You flip X1 before doing a complex mult; is this equivalent to a convolution? If so, could the entire thing be done by applying an NxN kernel to both signals?

If anyone wants to try it real quick on their MCU, here is the Arduino code:

Code: [Select]
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "Adafruit_GFX.h"// Hardware-specific library
#include <MCUFRIEND_kbv.h>
MCUFRIEND_kbv tft;

#define POINTS   (128)
#define FFT_SIZE (POINTS/2+1)
#define PI       (3.1415926535897932384626433)

double samples1[POINTS], samples2[POINTS], samples1flipped[POINTS];
double spectrum1_r[FFT_SIZE], spectrum1_i[FFT_SIZE];
double spectrum2_r[FFT_SIZE], spectrum2_i[FFT_SIZE];
double spectrumX_r[FFT_SIZE], spectrumX_i[FFT_SIZE];
double samplesX[POINTS];


/**********************************************************************/
void generate_signature(double *samples)
{
int i;
for (i = 0; i < POINTS; i++)
samples[i] = 0;

for (i = 0; i < 5; i++) samples[i] = -1.0;
for (i = 5; i < 10; i++) samples[i] = 1.0;
for (i = 10; i < 15; i++) samples[i] = -1.0;
for (i = 15; i < 20; i++) samples[i] = 1.0;
}

/**********************************************************************/
void flip_samples(double *s1, double *s2)
{
int i;
for (i = 0; i < POINTS; i++)
s2[i] = s1[POINTS - 1 - i];
}
/**********************************************************************/
void offset_sample(double *samples1, double *samples2, int offset)
{
int i;
for (i = 0; i < POINTS; i++) // C's % can be negative, so bias the index
samples2[i] = samples1[((i - offset) % POINTS + POINTS) % POINTS];
}

/**********************************************************************/
void complex_mult(double *r1, double *i1,
double *r2, double *i2,
double *r3, double *i3)
{
int i;
for (i = 0; i < FFT_SIZE; i++)
{
r3[i] = (r1[i] * r2[i]) - (i1[i] * i2[i]);
i3[i] = (r1[i] * i2[i]) + (r2[i] * i1[i]);
}
}

/**********************************************************************/
void add_noise(double *samples, double level)
{
int i;
for (i = 0; i < POINTS; i++)
{
samples[i] += (rand()*level) / RAND_MAX;
}
}

/**********************************************************************/
void print_max(double *samples)
{
int    max_i = 0;
double max = samples[0];
int i;
for (i = 1; i < POINTS; i++)
{
if (samples[i] > max)
{
max_i = i;
max = samples[i];
}
}

if (max_i >= POINTS / 2)
max_i -= POINTS;
printf("Max is at %i (%10.4f)\n", max_i, max);
 tft.drawLine(max_i, 50, max_i,150, 0b111111111100000);
tft.setCursor(0, 170);
tft.print("Offset = ");
 tft.print(max_i);
}

/**********************************************************************/
void dft(double *samples, double *r, double *i)
{
int b, p;
for (b = 0; b < FFT_SIZE; b++)
{
r[b] = 0.0;
i[b] = 0.0;
for (p = 0; p < POINTS; p++)
{
double angle = 2 * PI*b*p / POINTS;
r[b] += samples[p] * cos(angle);
i[b] -= samples[p] * sin(angle);
}
}
}

/******************************************************/
void idft(double *r, double *i, double *samples)
{
int b, p;
double rs[FFT_SIZE];
double is[FFT_SIZE];

for (b = 0; b < FFT_SIZE; b++)
{
rs[b] = r[b] / (POINTS / 2);
is[b] = -i[b] / (POINTS / 2);
}
rs[0] = r[0] / POINTS;
rs[FFT_SIZE - 1] = r[FFT_SIZE - 1] / POINTS;

for (p = 0; p < POINTS; p++)
{
for (b = 0; b < FFT_SIZE; b++)
{
double angle = 2 * PI*b*p / POINTS;
samples[p] += (rs[b] * cos(angle)) + (is[b] * sin(angle));
}
}
}
/******************************************************/
void setup(void)
{
Serial.begin(115200);
  uint16_t ID = tft.readID();
  tft.begin(ID);
  tft.setRotation(1);
  tft.fillScreen(0x0000);

int i;
/* Prepare the data */
generate_signature(samples1);
offset_sample(samples1, samples2, 50);
add_noise(samples2, 0.8);

 double timer = micros();

/* Flip the samples to turn convolution it to correlation */
flip_samples(samples1, samples1flipped);

/* Take the DFT */
dft(samples1flipped, spectrum1_r, spectrum1_i);
dft(samples2, spectrum2_r, spectrum2_i);

/* Multiply the spectrums */
complex_mult(spectrum2_r, spectrum2_i,
spectrum1_r, spectrum1_i,
spectrumX_r, spectrumX_i);

/* Inverse DFT to find how well they match */
idft(spectrumX_r, spectrumX_i, samplesX);

tft.setCursor(0,0);
  tft.println(micros()-timer);

for (i = 0; i < POINTS; i++)
{
tft.drawLine(i, 100 - 10 * samples1[i - 1], i, 100 - 10 * samples1[i], 0b0000000000011111);
tft.drawLine(i, 100 - 10 * samples2[i - 1], i, 100 - 10 * samples2[i], 0b0000010000011111);
tft.drawLine(i, 100 - 2 * samplesX[i - 1], i, 100 - 2 * samplesX[i],   0b1111100000000000);
}
print_max(samplesX);
}

void loop(void) {}
« Last Edit: May 10, 2018, 12:06:04 am by daslolo »
nine nine nein
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Making a sound camera
« Reply #59 on: May 10, 2018, 04:19:41 am »
Quote from: daslolo
Thank you so much: it does what I need. I like that you didn't bother with FFT, and I never realized how slow DFT is until now :)

You could easily make it a bit faster by not calling sin() and cos() so often: build a lookup table instead, at a cost in memory. That way sin() and cos() become something like:

Code: [Select]
sin = sin_table[(index*band)%table_size];
cos = sin_table[(table_size/4 + index*band)%table_size];

You have to 'flip' (actually mirror in time) the needle that you are looking for in the haystack. My code is slightly wrong and gives you a peak whose offset is off by one: item 0 should not move, but items [1] and [n-1] should swap, and so on.

I am not sure you will need all 120 IFFTs. You may only need to do a handful to get enough information that you can locate the source.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline dasloloTopic starter

  • Regular Contributor
  • *
  • Posts: 63
  • Country: fr
  • I saw wifi signal once
Re: Making a sound camera
« Reply #60 on: May 11, 2018, 01:30:52 am »
Oh yeah, just 20x faster  ;D
When I get the Raspberry Pi tomorrow I'll get to work, for real.
Is anyone using bare-metal environments like https://ultibo.org/?
nine nine nein
 

