Hello,
Preface:
I am trying to understand how an ASRC (Asynchronous Sample Rate Converter) works. The intention is to be able to design and write a piece of code that solves the very common problem of a signal crossing between two clock domains with asynchronous clocks. A typical example is receiving audio from an asynchronously clocked source (via Ethernet, S/PDIF, etc.) and feeding it to a DAC that runs from a local clock, which is of course not synchronous to the source clock.
So the usual situation is that I have a receive circular buffer (with DMA) and an output circular buffer (possibly also with DMA). Even though the source and sink sides both have a nominal sample rate of, say, 48 kHz, the clocks can (and in the real world definitely will) drift and have some jitter. So one of the buffers will eventually overflow (or underflow), depending on which of the clocks is faster. I would then have to drop or duplicate samples, which is a big NO, as that results in very audible clicks and signal distortion.
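To put a rough number on the drift problem, here is a back-of-the-envelope sketch. The function name and the example figures (100 ppm clock mismatch, 480 samples of buffer headroom) are made up for illustration and are not measurements from any real hardware.

```python
def seconds_until_slip(nominal_rate_hz, ppm_offset, headroom_samples):
    """Time until a buffer with the given headroom over- or underflows.

    ppm_offset is the assumed relative mismatch between the source and
    sink clocks, in parts per million.
    """
    # Surplus (or deficit) of samples accumulated every second.
    surplus_per_second = nominal_rate_hz * ppm_offset * 1e-6
    return headroom_samples / surplus_per_second
```

With these assumed figures, `seconds_until_slip(48000, 100, 480)` comes out at about 100 seconds: 4.8 samples pile up per second, so a slip is a matter of minutes at most, not a rare corner case.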
According to my research, the solution to this is an ASRC. Even though I have hoarded quite a bunch of articles, books and other PDFs, I have found it difficult to find information about specific implementations (examples or more detailed descriptions, other than plain theory).
What I've learnt:
I understand the basic concept of an SSRC (Synchronous Sample Rate Converter), where the clocks are synchronous. For example, to interpolate by L = 4, I insert three zero samples between each pair of input samples and then use a band-limiting (low-pass) filter to get rid of the image signals. A FIR filter is typically used for the band limiting. Because 3/4 of the samples are zero, a more efficient structure called a polyphase FIR filter is used. I understand the concept of the polyphase FIR filter and how it reduces the workload by splitting the filter into L shorter sub-filters.
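To check my understanding of the polyphase idea, here is a minimal pure-Python sketch. The function names are my own, and `h` stands for any FIR prototype designed for the upsampled rate (no particular filter design is assumed). The first function does the zero stuffing literally, the second computes the same output using only the sub-filter `h[p::L]` for output phase `p`.

```python
def upsample_direct(x, h, L):
    """Reference: insert L-1 zeros after each sample, then run the full FIR."""
    stuffed = []
    for s in x:
        stuffed.append(s)
        stuffed.extend([0.0] * (L - 1))
    y = []
    for n in range(len(stuffed)):
        acc = 0.0
        for k, c in enumerate(h):
            if n - k >= 0:
                acc += c * stuffed[n - k]
        y.append(acc)
    return y


def upsample_polyphase(x, h, L):
    """Same result, but no multiplications by the stuffed zeros."""
    # Sub-filter p holds h[p], h[p + L], h[p + 2L], ...
    phases = [h[p::L] for p in range(L)]
    y = []
    for m in range(len(x)):          # one input sample ...
        for p in range(L):           # ... yields L output samples
            acc = 0.0
            for j, c in enumerate(phases[p]):
                if m - j >= 0:
                    acc += c * x[m - j]
            y.append(acc)
    return y
```

Both functions return `L * len(x)` samples, and the polyphase version needs roughly 1/L of the multiplications per output sample.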
What I do not fully understand is how to utilize the polyphase FIR in an ASRC to produce a dynamically variable output length.
What the implementation needs is a function that is fed a constant number of input samples and outputs a variable number of samples (as required) each time it is called. (Or the other way round: feeding in a variable number of samples and outputting a constant number on each call.)
From what I have found, ASRCs are implemented almost the same way as SSRCs [1], except that the output polyphase FIR has as many as 2^20 phases [2]. But I am not sure how having a very long output FIR (many phases) allows me to occasionally generate a few more or fewer samples. Could anyone please enlighten me as to how that works? Even though [2], for example, gives some explanation of how the coefficient pointer hops around in the large table, the explanation seems a bit odd to me (see section 3.2, page 18 in [2]).
The only explanation I could come up with myself is to have a rather large table of different coefficient sets for the output polyphase FIR (for slightly different ratios) and to switch between the sets as necessary, so that the number of output samples will, on average, give the required sample rate ratio.
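For comparison, a different reading of the "pointer hopping in a large table" idea is the generic fractional-position scheme sketched below. This is only a sketch under my own assumptions and not necessarily what [1] or [2] implement: the class name, the 8-tap Hann-windowed sinc and the 256 phases are all placeholders. A read position advances by `step` = f_in / f_out input samples per output sample; its integer part selects the input samples and its fractional part selects one row (phase) of the table. Changing `step` slightly changes how many outputs fit into one fixed-size input block.

```python
import math


def make_phase_table(num_phases, taps):
    """One row of `taps` coefficients per fractional delay p / num_phases."""
    half = taps // 2                      # taps is assumed to be even
    table = []
    for p in range(num_phases):
        frac = p / num_phases
        row = []
        for j in range(taps):
            d = (j - half + 1) - frac     # tap offset from the output instant
            sinc = 1.0 if d == 0 else math.sin(math.pi * d) / (math.pi * d)
            hann = 0.5 * (1.0 + math.cos(math.pi * d / half))
            row.append(sinc * hann)
        gain = sum(row)
        table.append([c / gain for c in row])   # unity DC gain for every phase
    return table


class AsrcSketch:
    """Hypothetical fixed-in / variable-out resampler (illustration only)."""

    def __init__(self, num_phases=256, taps=8):
        self.num_phases = num_phases
        self.half = taps // 2
        self.table = make_phase_table(num_phases, taps)
        self.buf = [0.0] * (self.half - 1)   # zero history before first sample
        self.pos = float(self.half - 1)      # read position, in input samples

    def process(self, block, step):
        """Consume one input block, return as many outputs as currently fit.

        step = input samples advanced per output sample (f_in / f_out).
        """
        self.buf.extend(block)
        out = []
        while True:
            i = int(self.pos)
            if i + self.half >= len(self.buf):
                break                         # window would need future input
            frac = self.pos - i
            phase = min(int(frac * self.num_phases), self.num_phases - 1)
            coeffs = self.table[phase]
            start = i - self.half + 1
            out.append(sum(c * self.buf[start + j]
                           for j, c in enumerate(coeffs)))
            self.pos += step
        # Discard input samples that no future output window can reach.
        drop = min(int(self.pos) - self.half + 1, len(self.buf))
        if drop > 0:
            del self.buf[:drop]
            self.pos -= drop
        return out
```

Calling `process()` repeatedly with 48-sample blocks and an exaggerated `step` of 0.99 returns a mix of 48 and 49 output samples per call, which is the variable output length I am after. In a real system `step` would presumably be steered by a control loop watching the buffer fill level, and the position would be kept in fixed point rather than in a float; both are left out here.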
In the end, I will attempt to write some code and test the ideas above.
[1] lib_src user guide by XMOS,
https://www.xmos.ai/download/lib_src-[userguide](1.0.0rc2).pdf
[2] Sample rate conversion in DSPs, Marian Forster, Graz TU Austria,
https://www2.spsc.tugraz.at/www-archive/downloads/1_SRC_Marian_Forster_1031275.pdf