Author Topic: SPDIF RX - async clocks question (SPDIFRX in STM32: anyone tried?) (Read 6351 times)

Yansi · « **on:** May 25, 2016, 03:45:42 pm »

Hi!

Preface:

I am building a somewhat more complex spare-time audio project with a few outputs, a few inputs and an SPDIF input. Errr... an AES/EBU (AES3 balanced) input, to be exact.

Part of the design works like a simple SPDIF soundcard (AES3 really, but that doesn't matter; the two are compatible up to a point). Let's suppose my local DACs (and ADCs) run from a synchronous, low-jitter precision clock source. Now I want to bring data into the system over an SPDIF interface. But there's a catch: the SPDIF clock is asynchronous to my local system, because it is defined by the transmitter (which may be almost anything, from a PC to professional AES3-equipped studio gear).

The "sort-of-standard" hobby soundcard designs use something like the DIR9001 chip from Texas Instruments, slap a DAC on it, and done. But that is not my case, as my system also does real-time processing on other data sources (ADCs) and needs to "cross the clock domains".

The problem:

I would like to use the integrated SPDIFRX peripheral inside the STM32. The peripheral does not have any clock recovery feature I could use to switch the whole system from the local clock to the recovered SPDIF clock. So how do I do the "clock domain crossing"? If I leave my DAC running from my local clock and feed it with the data received over SPDIF, overflows or underflows will naturally occur, as the two clock domains are not synchronous. The more the two clocks deviate from each other, the more often samples will go missing or pile up in the buffers.
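To put a rough number on that, here is a small C sketch of the arithmetic. The 100 ppm mismatch and the function names are illustrative assumptions, not measurements from any real transmitter.

```c
/* Samples gained or lost per second when the source clock is `ppm`
   parts per million away from the local clock at nominal rate `fs_hz`. */
static double slips_per_second(double fs_hz, double ppm)
{
    return fs_hz * ppm * 1e-6;
}

/* Seconds until a buffer with `headroom` spare samples on either side
   of its starting level over- or underflows. */
static double seconds_to_slip(double fs_hz, double ppm, double headroom)
{
    return headroom / slips_per_second(fs_hz, ppm);
}
```

Under these assumptions, a 48 kHz stream that is 100 ppm off gains or loses 4.8 samples per second, so a buffer with 24 samples of headroom lasts about 5 seconds before something has to give.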

How to solve this issue? It seems like the SPDIFRX peripheral is half-unusable, but still I don't want to believe that, as there must be some solution to it. I have seen some (but veeery little) examples from other people, that just copy the incoming data to the DAC. Like this one: http://test.openstm32.org/forumthread921 No visible effort to synchronize the stream. But yet it somehow works. Wtf? How?

I have tried to look for information about asynchronous resampling algorithms, but fuckin' hell! I have found a lot of stuff, but mostly useless university papers, as those were some very fancy algorithms that used half the computational grunt of an adult SHARC DSP and yet it was called an efficient algorithm. Wtf 2.0?

Has anybody tried the SPDIFRX in the STM32? What is your experience with it? How did you solve the synchronization issue? Or am I missing the point completely and looking for problems where there aren't any?

Thank you for help,
Yan

EDIT: It seems that realtime ASRC is doable even on Cortex M3: http://www.dspconcepts.com/products/dsp-library/sample-rate-converter
But HOW? :-O Haven't found any barely useful info till now.

Buriedcode · « **Reply #1 on:** May 25, 2016, 10:30:54 pm »

I have never used the STM32, but I have done some work on clock recovery from serial streams (8/16/32-bit devices, Manchester coding, etc.).

You're right that such systems are usually clocked by the incoming stream, often by a good ol' PLL, which is why bad SPDIF connections *can possibly* cause jitter (assuming they get the data at all). USB DACs have a similar problem, but deal with it by either skipping/doubling samples to keep the buffer at a reasonable level, or telling the host their sample rate relative to the USB frame tick so the host adjusts how much it sends (asynchronous mode), or using an on-board PLL to lock the DAC clock to the USB frames or to the incoming data rate (synchronous and adaptive modes). All of these run over isochronous transfers.

I'm pretty sure the STM32 has PLLs with fractional dividers that can generate a wide range of frequencies. Perhaps you could sample the incoming stream, get a 'rough' idea of what the sample rate is (assuming it's a standard one, so you'll only have to check for a few possibilities), set your PLL for that, and tune it on the fly. You could either check the buffer level for underrun/overrun and adjust accordingly, or have the stream increment one counter and your system clock increment another, and periodically read the difference. Basically a digital PLL.
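A minimal sketch of the two-counter idea in C, assuming both counters are read at the same instant after a common measurement window. The function name and the ppm scaling are illustrative choices, and nothing here touches real STM32 registers.

```c
#include <stdint.h>

/* Offset of the incoming rate from the local rate, in parts per million.
   in_frames:    S/PDIF frames counted during the measurement window
   local_frames: local sample periods counted during the same window
   A positive result means the source runs faster than the local clock. */
static int32_t rate_offset_ppm(uint32_t in_frames, uint32_t local_frames)
{
    if (local_frames == 0u)
        return 0;                      /* nothing measured yet */
    int64_t diff = (int64_t)in_frames - (int64_t)local_frames;
    return (int32_t)((diff * 1000000) / (int64_t)local_frames);
}
```

The result could then be fed into whatever adjusts the clock (or a resampler ratio); a longer window gives finer resolution at the cost of slower tracking.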

There will be 'jitter' on the output as the oscillator adjusts every now and then, but it probably won't be audible if the PLL has half-decent resolution. Perhaps some USB audio examples using isochronous transfers also tweak the I2S PLL on the fly.

I probably should pick up a discovery board as everyone raves about them, whilst I'm still writing a USB soundcard for the PIC24...

Scrts · « **Reply #2 on:** May 26, 2016, 02:49:17 pm »

Well, it seems a simple problem to solve with an FPGA, but since you want an STM32, you will end up controlling the frequencies. In the video world there's a technique called genlock: you write the data to memory, then check the buffer fill level. If it's getting towards full -> increase the read frequency, if it's getting empty -> decrease it. So you'd need to control the internal PLL if it's accurate enough, or otherwise use an external clock/PLL with a control input. E.g. SiLabs have PLLs with an I2C interface.

Yansi · « **Reply #3 on:** May 26, 2016, 03:36:15 pm »

I don't see any easier solution for it in an FPGA. The clocks will drift the same way, FPGA or MCU. So what simple solution did you have in mind? I don't think there is a simple one.

For real-time audio, there are only two possible solutions: a PLL locked to the incoming asynchronous stream, so the whole system is clocked from the recovered clock, or an ASRC algorithm.

With multiple asynchronous data inputs (like multiple SPDIF/AES inputs at once), ASRC is the only solution.

And as there seems to be next to no information on the internet on how to cook up a DIY ASRC, I have just slapped a DIR9001 on my board. Luckily, I have only one SPDIF/AES input to deal with. Funnily enough, I will still try to decode the SPDIF data inside the STM32 instead of the DIR9001, just because the SPDIFRX peripheral seems way more convenient to use (DMA, automatic assembly of the U and C bits into words, ...). The DIR9001 will be used only for clock recovery. It's not too expensive to be used like that. In case I fail to get the SPDIFRX peripheral working, I have also connected the I2S bus from the DIR9001 as a backup.

Buriedcode: The SPDIFRX peripheral provides a means of accurately measuring the input frame rate. There is a bit which toggles on each frame, and this bit can be routed to one of the timers to measure the exact frequency. Unfortunately, the PLLs inside the MCU are quite crap for such accurate and fine adjustment. There are only a few integer dividers, and the coefficients cannot be modified on the fly (which is sick).

hamster_nz · « **Reply #4 on:** May 26, 2016, 09:32:14 pm »

Maybe a polyphase filter, where the 'virtual sample rate' is high enough to meet your jitter needs.

Conceptually it takes an incoming 48,005 S/s signal, up-samples it by 1024 to 49,157,120 S/s, then decimates it by picking one sample out of every 1024 × 48,005/48,000 ≈ 1024.1 samples of the 49,157,120 S/s stream, which gives 48,000 S/s.

To do this you will need 1024 different FIR filters of maybe 15 taps each (pulling numbers out of thin air), and you select the filter based on the relative phase between the two signals. CPU usage is 15 fixed-point MACs per output sample, or 720,000 integer multiply/accumulates per second per channel at 48 kS/s. You can reduce the filter size to meet the CPU budget, but that will also reduce the quality of the filtering.
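The filter selection can be sketched with a fixed-point phase accumulator. This is only one possible way to track the relative phase; the Q32.32 format, the struct and the function names are assumptions for illustration, and the 1024 coefficient sets themselves are not shown.

```c
#include <stdint.h>

#define PHASE_BITS 10   /* 2^10 = 1024 sub-filters, as in the post */

typedef struct {
    uint64_t pos;   /* read position in input samples, Q32.32 fixed point */
    uint64_t step;  /* input samples per output sample, Q32.32 */
} resampler_pos;

static void pos_init(resampler_pos *p, uint32_t fs_in, uint32_t fs_out)
{
    p->pos = 0;
    p->step = ((uint64_t)fs_in << 32) / fs_out;
}

/* Integer part: which input sample the filter window is anchored to. */
static uint32_t pos_index(const resampler_pos *p)
{
    return (uint32_t)(p->pos >> 32);
}

/* Top 10 bits of the fraction: which of the 1024 sub-filters to use. */
static uint32_t pos_phase(const resampler_pos *p)
{
    return (uint32_t)((p->pos & 0xFFFFFFFFu) >> (32 - PHASE_BITS));
}

/* Call once per output sample. */
static void pos_advance(resampler_pos *p)
{
    p->pos += p->step;
}
```

For each output sample you would run the 15-tap sub-filter number `pos_phase()` over the input samples around `pos_index()`, then call `pos_advance()`. In a real system `step` would be updated continuously from the measured rate ratio rather than fixed at start-up.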

Yansi · « **Reply #5 on:** May 26, 2016, 11:13:25 pm »

15 fixed-point MACs sound like nothing... I will have spare time for that (... in that Cortex-M4F)! Seems interesting, but I know little to nothing about that kind of filter (I only barely know that polyphase filters exist).

I will definitely try to read something about that, but now I have to finish the PCB so I can have at least some piece of HW to play with.

Scrts · « **Reply #6 on:** May 27, 2016, 06:25:26 pm »

Well, recent FPGAs have fractional PLLs, so it's easy to adjust the frequency according to the buffer level.

Yansi · « **Reply #7 on:** May 28, 2016, 02:09:40 pm »

That's still not much of a valid solution, because it will only work for a system with a single asynchronously clocked data input.

The polyphase filter seems much closer to something I could use, but I still don't understand it much (or at all), nor how I could apply it to an asynchronous input stream.

Scrts · « **Reply #8 on:** May 31, 2016, 07:40:09 pm »

Quote from: Yansi on May 28, 2016, 02:09:40 pm

That's still not much of a valid solution, because it will only work for a system with a single asynchronously clocked data input.

The polyphase filter seems much closer to something I could use, but I still don't understand it much (or at all), nor how I could apply it to an asynchronous input stream.

This would work with as many input sources as clock inputs in the FPGA (which is usually a lot).

Yansi · « **Reply #9 on:** May 31, 2016, 08:04:31 pm »

No it won't !!

Brain please!

Or how will you clock a single DAC from multiple PLLs, fractional or not? It is nonsense. We are talking here about synchronizing asynchronous streams together, not about making a heap of asynchronous receivers, which could be done easily without any PLLs, just with enough oversampling (as is done in the SPDIFRX in the STM32).

Scrts · « **Reply #10 on:** May 31, 2016, 08:35:40 pm »

Quote from: Yansi on May 31, 2016, 08:04:31 pm

No it won't !! Brain please!

Or how will you clock a single DAC from multiple PLLs, fractional or not? It is nonsense. We are talking here about synchronizing asynchronous streams together, not about making a heap of asynchronous receivers, which could be done easily without any PLLs, just with enough oversampling (as is done in the SPDIFRX in the STM32).

Ok, maybe I understand the problem differently.
What I've done before on different systems with asynchronous clocks was a custom-made FIFO, where writes are done with the incoming clock and reads with the local clock. I always tracked the write and read pointer addresses, so I could keep the write pointer ahead of the read pointer by half of the buffer. If the write pointer gets faster and the buffer heads towards overflow, I increase the clock frequency of the read side, and vice versa if the buffer is getting empty.
I am not sure you can use this for SPDIF, because this method is mostly for packetized streams.
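A minimal sketch of that fill-level tracking in C, assuming a power-of-two FIFO and a purely proportional correction. `FIFO_SIZE`, the gain parameter and the ppm output are hypothetical; how the trim is applied to a real clock source is left out.

```c
#include <stdint.h>

#define FIFO_SIZE 256u   /* hypothetical size; must be a power of two */

/* Samples waiting in the FIFO, computed from the write and read indices. */
static uint32_t fifo_fill(uint32_t wr, uint32_t rd)
{
    return (wr - rd) & (FIFO_SIZE - 1u);
}

/* Proportional trim for the read clock in ppm. Positive means "read
   faster" because the FIFO is above half full. gain_ppm_per_sample is
   a made-up tuning constant. */
static int32_t read_clock_trim_ppm(uint32_t wr, uint32_t rd,
                                   int32_t gain_ppm_per_sample)
{
    int32_t error = (int32_t)fifo_fill(wr, rd) - (int32_t)(FIFO_SIZE / 2u);
    return error * gain_ppm_per_sample;
}
```

A real loop would normally low-pass or integrate the error so the read clock does not chase every burst of writes.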

Yansi · « **Reply #11 on:** May 31, 2016, 09:44:43 pm »

I think you cannot use that, packetized data or not. If you have multiple asynchronously clocked input streams, tweaking any frequency (the FIFO read clock, in this case) does not help at all. Against which asynchronous input would you tweak the output? You will synchronize the output with one of the inputs, then what about the other inputs?

There might be a slight confusion: I think you are missing the part where I have to process all the data synchronously (mix, filter, combine, multiply, whatever...) at a single output rate. There is only one output DAC and it can accept only one clock signal.

The only valid solution is to use some kind of asynchronous resampling algorithm, like the polyphase filter which was suggested. Unfortunately, I can't say much about these, which is the reason I've been asking here in the first place.

hamster_nz · « **Reply #12 on:** June 01, 2016, 04:04:04 am »

A brief introduction on how the Polyphase filter works.

Say you had raw samples, at a rate of 1kHz, and you want to move them to 1.5kHz.

To do this you can
1. 'Upsample' to a 3kHz sample rate
2. 'Downsample' to 1.5kHz.

This is at the same time easier than it sounds and harder than it sounds.

The second step is easy. If the 3kHz stream of samples contains no frequencies over 750Hz, then we can just discard every other sample and not lose information.

So the hard bit is how to upsample.... and it isn't too hard either

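We can just put two zeros between every sample, then run that through a low-pass filter and we have our 3kHz stream. The cut-off has to be below 750Hz for the decimation step, and in practice below 500Hz, the Nyquist limit of the original 1kHz stream, so that the spectral images created by the zero-stuffing are removed as well.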

Getting the ideal low-pass FIR filter kernel is very complex, but here's one I made earlier:

Code:

  n   filter[n]
-12    0.0517
-11    0.0641
-10    0.0460
 -9    0.0000
 -8   -0.0575
 -7   -0.1008
 -6   -0.1034
 -5   -0.0490
 -4    0.0612
 -3    0.2067
 -2    0.3527
 -1    0.4604
  0    0.5000
  1    0.4604
  2    0.3527
  3    0.2067
  4    0.0612
  5   -0.0490
  6   -0.1034
  7   -0.1008
  8   -0.0575
  9    0.0000
 10    0.0460
 11    0.0641
 12    0.0517

It is not an ideal filter, just the spreadsheet formula =SIN(n*PI()/4.5)/(n*PI()/4.5)*0.5 (with the n = 0 entry set to its limit of 0.5), but it is close enough.

To apply the kernel you multiply the stream of samples by the filter coefficients and sum the results, e.g.

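Code:

 /* sample_in[] is the zero-stuffed 3kHz stream; filter[] holds the 25
    coefficients for n = -12..12, so tap f lives at filter[f + 12] */
 float total = 0.0f;
 for (int f = -12; f <= 12; f++) {
   total += sample_in[n + f] * filter[f + 12];
 }
 sample_out[n] = total;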

So that is how to get a 3kHz sample stream, and then just throw away every other sample.

Great, this is all standard FIR DSP. So what is a polyphase filter?

It is just an optimization of that process: you don't calculate the values of sample_out[n] that you will throw away during decimation, and you exploit the fact that, due to the zero-stuffing during upsampling, most of the input samples are zero.

Here are the MACs used to generate one of the output samples:

Code:

filter[n] * sample_in
 0.0517 * 0
 0.0641 * 0
 0.0460 * 1.0000
 0.0000 * 0
-0.0575 * 0
-0.1008 * 0.8660
-0.1034 * 0
-0.0490 * 0
 0.0612 * -0.8660
 0.2067 * 0
 0.3527 * 0
 0.4604 * -1.0000
 0.5000 * 0
 0.4604 * 0
 0.3527 * -0.8660
 0.2067 * 0
 0.0612 * 0
-0.0490 * 0.8660
-0.1034 * 0
-0.1008 * 0
-0.0575 * 1.0000
 0.0000 * 0
 0.0460 * 0
 0.0641 * 0.8660
 0.0517 * 0

Because the input is upsampled by a factor of three, there are three different phases, each using its own subset of eight or nine of the 25 coefficients, so each output value needs only around 9 MAC operations.

If we were not using a polyphase filter, it would require 25 MACs per sample, and we would calculate twice as many samples before decimation (so 50 MACs per output sample). As the upsampling ratio goes up, using polyphase gets more and more efficient.

When the input and output sample rates are very close to each other e.g. 48,000Hz and 48,005Hz you can fudge it to be very close to ideal (e.g. with maybe 1024 different phase alignments to choose from).
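The 1kHz to 1.5kHz example can be sketched in C as follows, using the 25 coefficients from the table above. The function names are made up for illustration, and samples outside the input array are treated as zero, which is an assumption about edge handling rather than part of the explanation above.

```c
#include <stddef.h>

#define TAPS 25
#define HALF 12

/* filter[n] for n = -12..12, copied from the table in the post */
static const float h[TAPS] = {
     0.0517f,  0.0641f,  0.0460f,  0.0000f, -0.0575f, -0.1008f, -0.1034f,
    -0.0490f,  0.0612f,  0.2067f,  0.3527f,  0.4604f,  0.5000f,  0.4604f,
     0.3527f,  0.2067f,  0.0612f, -0.0490f, -0.1034f, -0.1008f, -0.0575f,
     0.0000f,  0.0460f,  0.0641f,  0.0517f
};

/* Value of the zero-stuffed 3x stream at index j (0 outside the input). */
static float upsampled(const float *x, size_t n, long j)
{
    if (j < 0 || j % 3 != 0 || (size_t)(j / 3) >= n) return 0.0f;
    return x[j / 3];
}

/* Direct form: all 25 MACs at upsampled index m. */
static float fir_direct(const float *x, size_t n, long m)
{
    float total = 0.0f;
    for (int f = -HALF; f <= HALF; f++)
        total += upsampled(x, n, m + f) * h[f + HALF];
    return total;
}

/* Polyphase form: visit only the taps that line up with a real input
   sample, i.e. every third tap, starting at a phase that depends on m. */
static float fir_polyphase(const float *x, size_t n, long m)
{
    /* smallest tap offset f >= -HALF with (m + f) divisible by 3 */
    long f = -HALF + ((3 - ((m - HALF) % 3 + 3) % 3) % 3);
    float total = 0.0f;
    for (; f <= HALF; f += 3) {
        long i = (m + f) / 3;
        if (i >= 0 && (size_t)i < n)
            total += x[i] * h[f + HALF];
    }
    return total;
}

/* 1 kHz -> 1.5 kHz: output sample k sits at upsampled index 2k, so the
   odd upsampled indices are never computed at all. */
static float resample_3_2(const float *x, size_t n, size_t k)
{
    return fir_polyphase(x, n, (long)(2 * k));
}
```

`fir_direct` and `fir_polyphase` produce the same value for any index; the polyphase version just skips the multiplications by zero, and `resample_3_2` skips the samples that decimation would discard.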

NiHaoMike · « **Reply #13 on:** June 01, 2016, 06:29:26 am »

Depending on how much latency is acceptable, one trick you can do is buffer the incoming data and then duplicate or drop samples during silent periods. That way, there would not be any noticeable glitching. You can make the threshold adaptive in that how far the buffer is about to overflow/underflow changes how low the envelope has to be before samples get duplicated or dropped, with higher levels becoming more acceptable as the situation gets more "desperate". Obviously, that will only work with small rate mismatches.
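A rough sketch of such an adaptive decision in C. It uses the magnitude of the current 16-bit sample as a stand-in for a proper envelope follower, and `max_threshold` is a hypothetical tuning value; both are assumptions for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

typedef enum { SLIP_KEEP, SLIP_DROP, SLIP_DUPLICATE } slip_action;

/* fill/size: current FIFO level and capacity; sample: current sample.
   The allowed amplitude grows as the level moves away from half full,
   so a nearly full or nearly empty buffer slips even in louder material. */
static slip_action slip_decision(int32_t fill, int32_t size,
                                 int16_t sample, int32_t max_threshold)
{
    int32_t half = size / 2;
    int32_t error = fill - half;          /* >0: too full, <0: too empty */
    int32_t urgency = abs(error);         /* 0 .. half */
    int32_t threshold = (max_threshold * urgency) / half;

    if (urgency == 0 || abs((int32_t)sample) > threshold)
        return SLIP_KEEP;
    return (error > 0) ? SLIP_DROP : SLIP_DUPLICATE;
}
```

With a 256-sample buffer at 192 samples and `max_threshold` of 1000, a sample of magnitude 100 would be dropped while one of magnitude 600 would be kept until the buffer gets fuller.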

Yansi · « **Reply #14 on:** June 01, 2016, 06:42:12 am »

hamster_nz: Thank you for your detailed explanation! I will read through it carefully when I'm home in the evening.

NiHaoMike: I've read that a large enough circular buffer can smooth out the clock variations, as you say. But I'm targeting a real-time audio application and want the latency below 1 ms (a buffer of a few tens of samples won't help much). But sure, for playback only this might be good enough!

Marco · « **Reply #15 on:** June 01, 2016, 09:04:54 am »

I don't think you want to use polyphase filters to solve clock drift between two unsynchronized sources. They are a good way to go from, say, a ~44.1 kHz to a ~48 kHz source, but not ideal for aligning two sources, which requires non-integer-ratio interpolation. You could toggle between two integer-ratio polyphase filters, I guess, to get the non-integer ratio, but I doubt it would perform well.

Interpolation with >96 dB signal-to-noise ratio can be done with a filter of 32 taps, with coefficients derived by linear interpolation from a lookup table consisting of 2048 samples of a Kaiser-windowed sinc function. So ~100 MACs (not that that's a very useful metric on architectures which can't dual-issue with the MACs).
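A minimal sketch of the table lookup in C. The table contents (the Kaiser-windowed sinc) are not generated here, and the layout, with `stride` table entries per input sample period (for example 2048 / 32 = 64), is an assumption about how such a table might be organised.

```c
#include <stddef.h>

/* Linearly interpolate a filter coefficient from a sampled prototype.
   table: prototype impulse response with `len` entries (e.g. 2048)
   pos:   fractional index into the table, 0 <= pos <= len - 1 */
static float coeff_lerp(const float *table, size_t len, float pos)
{
    size_t i = (size_t)pos;
    if (i >= len - 1) return table[len - 1];
    float frac = pos - (float)i;
    return table[i] + frac * (table[i + 1] - table[i]);
}

/* One output sample: `taps` coefficients spaced `stride` table entries
   apart, starting at `frac_offset` (0 <= frac_offset < stride), which
   encodes the current fractional delay between the two clocks. */
static float interpolate_sample(const float *in, const float *table,
                                size_t len, size_t taps, float stride,
                                float frac_offset)
{
    float acc = 0.0f;
    for (size_t t = 0; t < taps; t++)
        acc += in[t] * coeff_lerp(table, len, frac_offset + (float)t * stride);
    return acc;
}
```

Each tap costs one coefficient interpolation plus one MAC, which is roughly where a figure of about 100 operations for 32 taps would come from.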

jahonen · « **Reply #16 on:** June 01, 2016, 09:33:28 am »

I think the easiest way is to use an external SPDIF receiver with ASRC, like the SRC4392, and let it handle the sample rate conversion: just provide it with the correct master clock and serial audio sync signals from your µC, and use a serial audio input to feed the audio data to the MCU. Of course it is not a neat single-chip solution, but it works.

Another way would be to fine-tune the MCU master clock frequency via some external circuit (software-controlled DAC + varicap or something like that) so that the MCU runs exactly synchronous to the incoming SPDIF data, but that is another can of worms.

Regards,
Janne

