After a bit of a head-banging session with the STM32 HAL libraries:
https://www.eevblog.com/forum/microcontrollers/stm32-i2s-w-external-master-clock/

I managed to get an "atypical" signal path up and running with the topology:
WM8786 ADC (Slave 16bit 48K) -i2s-> STM32 (Master/Master) -i2s-> PCM5102 (Slave 16bit 48K)
Additionally, I added other tasks just to test the water on how picky the timing is. It looks like about a 25% load duty cycle just copying the buffers int by int. That sounds bad, but it's only 250µs on a single frame buffer.
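As a sanity check on that figure, here's the back-of-envelope arithmetic (the 1ms buffer period is my assumption; the 250µs copy time is the measured number above):

```python
# Back-of-envelope check of the ~25% copy load quoted above.
# Assumption (mine): the frame buffer holds 1 ms of audio, so the
# copy has to complete once per millisecond.

BUFFER_MS = 1.0      # one frame buffer's worth of audio
COPY_US = 250.0      # measured time to copy that buffer, int by int

duty = COPY_US / (BUFFER_MS * 1000.0)
print(f"copy duty cycle: {duty:.0%}")   # -> copy duty cycle: 25%
```

Which also shows why doubling the buffer size doesn't help the duty cycle: the copy time scales with the buffer.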
See:
https://www.eevblog.com/forum/beginners/breadboards-and-mhz/

Giving it a beefier task, like calculating and rendering a peak level indicator, I had to be much more careful and spread the work out around the "super loop" rather than doing it all in one cycle. I even ended up making the "super loop" "ordered" and the audio buffer processing mutually exclusive, so audio always gets priority. Even with that, rendering the single-channel meter on a TFT screen (a bar 20 pixels by 320 pixels) took longer than a single or double frame buffer; I had to go to a whole 4ms buffer time.
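The "ordered" super loop with mutually exclusive, always-first audio can be modelled like this (a Python sketch of what would be a C loop on the MCU; the task names are placeholders of mine, not the project's code):

```python
# Model of the "ordered" super loop: audio buffer processing is mutually
# exclusive with everything else and always wins; the remaining tasks run
# one-per-pass in a fixed order, so no single pass can starve the audio path.

class SuperLoop:
    def __init__(self, tasks):
        self.tasks = tasks          # ordered list of background task names
        self.audio_ready = False    # on the MCU this is set from the I2S DMA ISR
        self.next_task = 0

    def step(self):
        """One pass of the loop. Returns what ran, for inspection."""
        if self.audio_ready:
            self.audio_ready = False
            return "audio"          # service_audio() would run here, alone
        name = self.tasks[self.next_task]
        self.next_task = (self.next_task + 1) % len(self.tasks)
        return name                 # exactly one background task per pass

loop = SuperLoop(["meter_calc", "meter_render", "poll_encoder"])
loop.audio_ready = True
print(loop.step())   # -> audio   (pre-empts the rotation)
print(loop.step())   # -> meter_calc   (rotation resumes where it left off)
```

The point of the fixed order plus one-task-per-pass is that the worst-case gap between two audio services is bounded by the single longest task, not the sum of all of them.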
Firstly though, the TFT driver is not using DMA, just interrupts, so it's pretty invasive; a DMA driver might be better, but... it's not a valid use case really. No audio-processing MCU in this project is going to be made to deal with anything more than a tiny OLED. There will hopefully be a 4" TFT touch screen, but the MCU 'owning' it will not have audio-time-critical tasks. UI/HID inputs are incredibly slow in comparison and can be queued, made to wait, and updates sent/received when time allows.
I'm studying up to see if I can pick up enough terminology to use/abuse and tune a filter library for the EQ stage. In parallel, besides continuing the breadboard prototypes, I am exploring the architecture design.
There are still questions surrounding the USB "front ends", aka inputs. I hope to use high-speed SPI for the internal streams. I only need to couple to the audio clocks at the front and rear ends, where an I2S ADC, DAC or USB interface has to actually clock the audio in or out. For everything in the middle I do not need to slave to the audio clock and can send many I2S streams over a synchronous, time-sliced SPI/DMA bus at 48 Mbit/s+.
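Rough capacity numbers for that idea (my own arithmetic, assuming 16-bit stereo frames at 48kHz and zero protocol overhead):

```python
# How many 16-bit / 48 kHz stereo I2S streams fit on a 48 Mbit/s SPI bus?
# Raw payload only - no framing, CS turnaround, or headers assumed.

FRAME_BITS = 2 * 16            # one stereo frame: left + right, 16 bits each
SAMPLE_RATE = 48_000
BUS_BPS = 48_000_000

stream_bps = FRAME_BITS * SAMPLE_RATE     # bits/s needed per stream
max_streams = BUS_BPS // stream_bps
print(f"{stream_bps/1e6:.3f} Mbit/s per stream, {max_streams} streams max")
# -> 1.536 Mbit/s per stream, 31 streams max
```

So even with generous overhead, a handful of channels uses a small fraction of the bus.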
I have started a Google sheet with timings for various packet/buffer sizes. I need to form a complete path analysis and work out the total end-to-end latency.
Ideally, unless I implement endpoint feedback on the USB (which will primarily be the interface live audio comes in on), I have set the optimistic latency budget to under 20ms. If I intended to connect an instrument to it, it would need to be under 10ms; luckily I don't. However, by 50ms (2 HD video frames) you might just start to notice A/V sync issues, and by 200ms they will be obvious and annoying.
For a "worst" case scenario, assuming 16-bit @ 48k, it works out at something like a 0.25ms latency to write 4 channels of I2S data to a 48 Mbit/s SPI. i.e., the longest a channel has to wait on its next data is about a quarter of a millisecond. Obviously, that goes up if you need to buffer for any reason, and if a stream misses its CS slot and has to wait on the next one, it could be over twice that long. Double the buffer size, double the latency. However, it tells me that SPI will not be the problem, as long as I implement a "fair" contention policy or stick to fixed "round robin" time slicing on the bus.
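My reading of that worst case as arithmetic (assumptions mine: 16-bit stereo at 48kHz, 4 streams, 2ms of audio carried per CS slot, and an ideal bus with no inter-slot gaps):

```python
# Worst-case wait on a round-robin SPI slot: one full rotation of all
# channels. Assumes 16-bit stereo @ 48 kHz, 2 ms of audio per slot,
# a 48 Mbit/s bus, and no gaps between slots.

CHANNELS = 4
SLOT_MS = 2.0                        # audio carried in one CS slot
STREAM_BPS = 2 * 16 * 48_000         # 1.536 Mbit/s per stereo stream
BUS_BPS = 48_000_000

slot_bits = STREAM_BPS * SLOT_MS / 1000.0    # bits sent in one slot
slot_us = slot_bits / BUS_BPS * 1e6          # wire time for one slot
rotation_us = slot_us * CHANNELS             # worst case: wait a full rotation
print(f"slot: {slot_us:.0f} us, full rotation: {rotation_us:.0f} us")
# -> slot: 64 us, full rotation: 256 us  (the quarter-millisecond figure)
```

It also makes the "double the buffer, double the latency" rule explicit: `SLOT_MS` scales the whole result linearly.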
So... say I have a 4ms input buffer to allow for basic input processing: calculating the peak audio level left/right, periodically updating an OLED with the input level in %, periodically responding to a rotary encoder, plus things like dead-time detection, buffer under/overruns, and anti-pop/zipper/buffer-crash detection. Its I2S stream then gets sent on the SPI bus when its CS line is pulled.
All 4 channels dump their streams onto a single SPI bus. The bus mastering will be handled by one of a pair of STM32 bus processors, primary and secondary. These are where the mixing, routing and EQing happen. So all inputs are SPI slaves; the primary bus processor (SPI master) gives them permission, round robin, to offload I2S frames onto the bus for an agreed time. The other bus processor is a slave and is free to "spy on the SPI", such that if it's been set up to process channel 3, it will watch the bus CS lines for channel 3 to start and buffer off its I2S frames for processing onward. Similarly, the primary bus processor, the master, can buffer off whatever streams it is set up to process.
Hence the architecture thus far is sort of:
n Inputs -> SPIBus-> 2 processors -> SPIBus -> nOutputs.
Maybe, 4->1->2->1->4.
Other "channels" to consider are the HID controls for the processors. I haven't really spent much time thinking about this, other than the realisation that the realtime nature of audio means the HID part will need to be decoupled out of the critical path... That's going to involve at least one more "admin/status/control" bus, although that would be a low-speed, bursty, low-priority bus. Messages like errors ("Output 4 BUF_UR") causing a warning indicator LED or TFT panel to light up momentarily, plus more routine updates for channel levels, CPU load on the processors, that kind of thing. Obviously the ability to see the input channels, select them, set the mix levels and EQ settings, and assign the output channel all has to be designed into a UI.
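For that admin/status bus, something as small as a fixed 4-byte frame would probably do. A hypothetical encoding (every field choice here is mine, purely illustrative, not a designed protocol):

```python
import struct

# Hypothetical 4-byte status frame for the low-speed admin bus:
#   byte 0: source id (e.g. output channel 4)
#   byte 1: event code (e.g. buffer underrun)
#   bytes 2-3: 16-bit payload (level, load %, etc.), big-endian
EV_BUF_UR = 0x01   # "buffer underrun" - code assignment is illustrative

def pack_status(source: int, event: int, value: int) -> bytes:
    return struct.pack(">BBH", source, event, value)

def unpack_status(frame: bytes):
    return struct.unpack(">BBH", frame)

msg = pack_status(4, EV_BUF_UR, 0)        # the "Output 4 BUF_UR" example
print(msg.hex(), unpack_status(msg))      # -> 04010000 (4, 1, 0)
```

Fixed-size frames keep the receiver trivial: no parsing state, just read 4 bytes and dispatch on the event code.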
As a programmer though, I know I'm going to write a command-line UART interface first, just so I don't have to get bogged down in painstaking UI design. As a programmer, I can also see that UART command interface being far too permanently temporary!
Open Issues:
* Fix/implement endpoint feedback and isochronous sync for the USB audio on STM32... or explore the hardware availability of USB audio bridges like the PCM270x.
* HAL driver bugs are irritating; every time I regenerate my code from MX I have to go and fix their bugs again. I need a more workable solution without leaving MX behind completely.
* Implement a working stream drop/pad for reclocking broken streams or fixing buffer over/under runs.
* IC availability issues, and exploring AliExpress sources to see if they work. If a part works as it should, I might not care if it's a fake at this stage.
Questions:
* Is going to a 24-bit internal audio format, using the extra 8 bits as pure digital headroom before limiting/compressing back to 16-bit at the output stage, going to be worth the extra processing overhead?
* How will SPI slave timing work in terms of the critical CS lines? If a slave is not immediately ready to send when its line is pulled, this has to be detected somehow. I will read up, but I expect this can be solved by allowing either end control over the CS line: the bus master pulls the line low for a very short time at the start of the slave's window, and the slave has to immediately respond by pulling the CS line low again, then transmitting. The issue I see here is that the bus master is handing control to the slave; if the slave gets "naughty" and hogs the bus, it's a bug, but a nasty one to fix. There are also electrical considerations in asking whether the master can pull the line harder than the slave to "cut it off"... though I don't think that would matter; if something is stomping on MOSI/MISO, it's irrelevant what the CS lines are saying. So the bus has to be "consensual" and "fair" on both master and slave sides. If that can be maintained, and I am the single programmer writing both slave and master, then I can implement a dynamic bus-mastering protocol which offers a slot to a slave but revokes it if the slave does not immediately signal interest by holding its CS line low. The master then trusts the slave not to hog the bus, allows it to hold the line low until it's finished, then round-robins to the next "fairest" slave.
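On the first question: the final 24-to-16-bit step itself is cheap if it's just saturation; the real cost is the limiter/compressor running ahead of it. A minimal sketch of that output-stage clamp, assuming the 16-bit samples sit in the low bits and the extra 8 bits above them are pure mixing headroom:

```python
# Reduce samples carrying 24-bit digital headroom back to 16-bit at the
# output stage. Hard saturation only - a real limiter/compressor would
# have acted on the signal before it reaches this point.

I16_MAX, I16_MIN = 32767, -32768

def to_16bit(sample: int) -> int:
    """Clamp a headroom-carrying accumulator back into 16-bit range."""
    return max(I16_MIN, min(I16_MAX, sample))

mixed = 30000 + 15000        # two hot channels summed: 45000 overflows 16-bit
print(to_16bit(mixed))       # -> 32767 (hard-limited, no wrap-around)
print(to_16bit(1234))        # -> 1234  (in-range samples pass through)
```

The clamp is a couple of instructions per sample, so the overhead question really comes down to the dynamics processing, not the format.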
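And on the CS handshake question: the offer/claim/revoke idea can be modelled as a little arbitration loop (pure simulation, all names mine), which at least pins down the intended behaviour before touching hardware:

```python
# Model of the "consensual" dynamic bus-mastering handshake: the master
# offers a slot by pulsing CS low; the slave must immediately claim it by
# holding CS low itself, or the master revokes the offer and round-robins
# to the next slave.

def arbitrate(slaves, responses):
    """One rotation of the bus. `responses[name]` True means that slave
    claimed its offer by holding CS low in time. Returns the grant log."""
    log = []
    for name in slaves:                    # fixed round-robin order
        if responses.get(name, False):
            log.append((name, "granted"))  # slave held CS low: it transmits
        else:
            log.append((name, "revoked"))  # no response: offer withdrawn
    return log

slaves = ["ch1", "ch2", "ch3", "ch4"]
print(arbitrate(slaves, {"ch1": True, "ch3": True}))
# -> [('ch1', 'granted'), ('ch2', 'revoked'), ('ch3', 'granted'), ('ch4', 'revoked')]
```

The model deliberately has no hog protection: once granted, a slave holds the line until it finishes, which matches the "trust plus single programmer" assumption above. A timeout per grant would be the obvious hardening step.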