Topic: STM32 DMA FIFO bursts and timing tolerances

paulca · « **on:** January 17, 2023, 01:45:40 pm »

I'm getting fed up with circular array buffers. To be honest they cause more problems than they solve.

STM32 DMA supports multi-buffer mode where you can specify two distinct unrelated buffers instead of halving one buffer. However the STM32 HAL I2S driver does not support the use of this. I put it on the back burner to fix that by reimplementing my own I2S driver by borrowing and butchering the HAL code.

In the interim I discovered another technique which might work as well.

DMA Burst Mode and FIFOs. The idea I'm hoping to achieve is DMA'ing a buffer to the I2S peripheral's FIFO, hoping that when the last DMA transfer completes into the FIFO the TX_Cmplt interrupt will fire and I will have just enough time to point it to its new buffer before the FIFO runs out.

It would seem however the FIFO is only 16 octets in size. So even a 16bit I2S stream only has 8 samples (4 stereo pairs). At 48K that's about 100us which should be fine. EDIT: Not sure on that number now. However there are other delays in the DMA/I2S peripheral process pipeline end to end, the DMA cores themselves have FIFOs etc. As Peter pointed out in a post, when a DMA transfer reports being complete it may not actually mean the peripheral has completely finished the last transaction, just that it's ready for a new one.
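
A quick way to sanity-check the FIFO headroom figure is to compute how long a full FIFO lasts at a given frame rate. The sketch below is only arithmetic: the 16-byte depth is the figure quoted in the post and has not been checked against the reference manual, and `fifo_drain_us` is a made-up helper name.

```c
#include <stdint.h>

/* Hypothetical helper: microseconds until a full TX FIFO runs dry,
 * ignoring any extra buffering in the DMA or the shift register. */
double fifo_drain_us(uint32_t fifo_bytes, uint32_t bytes_per_sample,
                     uint32_t channels, uint32_t sample_rate_hz)
{
    /* Whole frames (one sample per channel) that fit in the FIFO. */
    uint32_t frames = fifo_bytes / (bytes_per_sample * channels);
    return (double)frames * 1e6 / (double)sample_rate_hz;
}
```

With 16 bytes, 16-bit stereo and 48 kHz this gives 4 frames, roughly 83 us, so "about 100us" is the right order of magnitude but slightly optimistic. The same FIFO with 32-bit stereo frames holds only 2 frames, roughly 42 us at 48 kHz.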

The reason I'm posting is that I'm obviously just discovering this technique and I'm giving you guys the opportunity to say "Nope, don't go there, it won't work." or "Why not just do this instead?"

Why don't I like basic circular buffers?
1. They leave open the error condition of looping the same buffer over and over which creates dangerous audio output that can actually damage hearing.
2. Handling multiple streams requires memcpy'ing buffers all the time to "get out of the way" of the DMA pointers.
3. Managing buffer halves and flipping pointers is just so "Flintstones".
4. I NEED to change buffers. I need to adjust buffer lengths. I need fall back/fail safe "silent" error conditions.
5. I have plenty of memory (1MB) and plenty of power (450MHz), it's time to bring out some of the "big iron" toys to the party.
EDIT
6. Circular buffers on inputs, processing and output force my timing into being 1 buffer long at each buffer transfer. With the asynchronous "pooled" buffers there can be multiple buffers for the same stream in play in the system.

Here's my current approach: (coded, not tested yet)

DavidAlfa · « **Reply #1 on:** January 17, 2023, 02:06:37 pm »

Why don't you simply use the half-transfer isr?

paulca · « **Reply #2 on:** January 17, 2023, 02:15:47 pm »

Quote from: DavidAlfa on January 17, 2023, 02:06:37 pm

Why don't you simply use the half-transfer isr?

But do what in it?

If you mean the half-transfer on the circular buffers. I need more than 2 halves and I need to change the halves at will.

If you mean the half-transfer on the DMA to FIFO transfer. Hmm. It certainly gives me plenty of time, but maybe too much time. I mean what can I do at "half transfer", I can't change the buffer it's transferring at that point. I could delay for 1/2 a buffer so I'm hot looping on the DMA Busy flag with a buffer ready, but that won't work when there are 2, 3 or 4 receiving I2S streams. I can't block. EDIT: I "can" poll select loop though... hmmm....

Maybe you mean something else.

EDIT: If I use 192 byte buffers but only fill 96 bytes, leaving the rest zeros, that gives the failsafe of silent failure. However, it might also (with the FIFOs) give me enough time to stop and restart the DMA stream... but as I type that, I know it won't work: the DMA will already have started to transfer 0s, and if you interrupt the I2S it resets its FIFOs to 0.

DavidAlfa · « **Reply #3 on:** January 17, 2023, 05:38:21 pm »

It's a circular buffer of e.g. 1KB, so you get a DMA ISR every 512 bytes, and you have the other 512 bytes' worth of time to refill that half, giving plenty of time.

96+96bytes might be too small, it all depends on the I2S rate.

paulca · « **Reply #4 on:** January 20, 2023, 04:26:43 pm »

I am aware of how circular buffers work and how 1960s they are. Especially unbounded ones. My situation is just too complex for basic "ping pong", I need multi-buffer.

I got the code I wrote (to abstract the audio buffers into a buffer pool) running tonight. It got far, far, further than I expected (for about 300 lines of fresh, never run code).

After a few whoopsies with pointer dereferences and byteSize vs. word size, it failed exactly where it was expected to... the mix down, as I only have 1 I2S input running. The idle one never provides a buffer so it never mixes, perfect error case capture.

However, I identified where the design is lacking in that regard and I don't need anything as medieval as a timer. I just carry the cascade through from the output I2S. When it needs data, that is the call for the cascade to work backwards to the Mixer where the buffers decouple. In LL terms, when the output I2S TX Complete fires I give it the next waiting prepared DMA buffer and immediately flag the mixer to mix "right now" the next one with whatever it's got, no delay. Input streams will quickly align to the output cadence. I hope.

I am having to add latency by having "now" buffers and "next" buffers at the decoupling point, but that's fine. The input streams have as many buffers as there are in their pool to carry on out of sync and realign later.

The trickier part I expect is making 100% sure buffers are returned when used and not accidentally leaked. A fixed amount of buffers are allocated on start up. Admin will monitor the total free and also monitor that with several metrics and check for consistency. Gets vs. Frees vs. Total available etc. Thread-safing it and prioritising interrupts to make that thread-safing practical will be a challenge.
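
The kind of bookkeeping described here (a fixed set of buffers, with gets, frees and the free count cross-checked) could look roughly like the sketch below. It is an illustration under assumptions, not the code from this project: all names (`AudioBuf`, `poolGet`, `poolConsistent`, and so on) and sizes are invented, and it is not interrupt-safe as written.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE   8    /* fixed number of buffers, set up once at start-up */
#define BUF_SAMPLES 96   /* hypothetical buffer length */

typedef struct {
    uint8_t id;                     /* stable ID so each buffer can be tracked */
    bool    inUse;
    int16_t samples[BUF_SAMPLES];
} AudioBuf;

static AudioBuf pool[POOL_SIZE];
static uint32_t getCount, freeCount;   /* lifetime counters for the admin check */

void poolInit(void)
{
    getCount = freeCount = 0;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        pool[i].id = (uint8_t)i;
        pool[i].inUse = false;
    }
}

AudioBuf *poolGet(void)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].inUse) {
            pool[i].inUse = true;
            getCount++;
            return &pool[i];
        }
    }
    return NULL;   /* pool exhausted */
}

bool poolFree(AudioBuf *b)
{
    if (b == NULL || !b->inUse) return false;   /* NULL or double free */
    b->inUse = false;
    freeCount++;
    return true;
}

size_t poolAvailable(void)
{
    size_t n = 0;
    for (size_t i = 0; i < POOL_SIZE; i++)
        if (!pool[i].inUse) n++;
    return n;
}

/* Outstanding buffers plus free buffers must always equal the pool size. */
bool poolConsistent(void)
{
    return (getCount - freeCount) + poolAvailable() == POOL_SIZE;
}
```

A monitoring task could call `poolConsistent()` periodically; a slowly shrinking `poolAvailable()` while the check still passes would point at a leak in the caller rather than corruption in the pool.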

Note. The lower level coupling to I2S is isolated in one place. If it turns out I don't have enough time to restart the DMA transfer before the I2S FIFO expires, I have a plan B and a plan C.

I can make it work.

It also technically allows me to completely decouple these buffer streams from I2S and use SPI instead if I choose.

Neat trick. Instead of assigning buffers to "NULL" to denote they are unavailable, pointing them to a static ZERO array means any mistakes/crashes should play silence.
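
A minimal sketch of the "point at silence instead of NULL" idea, assuming 16-bit samples and a made-up buffer length; `zeroBuffer`, `txBufferOrSilence` and `isSilence` are illustrative names only.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_SAMPLES 96   /* hypothetical buffer length */

/* One shared, read-only block of silence. */
static const int16_t zeroBuffer[BUF_SAMPLES] = {0};

/* Return the buffer to transmit next: the ready one, or silence if none. */
const int16_t *txBufferOrSilence(const int16_t *ready)
{
    return ready != NULL ? ready : zeroBuffer;
}

/* Lets callers tell the fallback apart from real audio. */
int isSilence(const int16_t *buf)
{
    return buf == zeroBuffer;
}
```

Because the fallback is `const`, it only makes sense on the transmit side; anything that writes into buffers would still need a real one.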

Other bonuses. A single buffer is never shared between components concurrently. Buffer crashes and wrap-overs cannot occur. No DMA is reading from a bit of memory being written to. The only concurrency scope is the bufferpool pointers, and their pass chain from GetBuffer to FreeBuffer is kept "sane".

Siwastaja · « **Reply #5 on:** January 20, 2023, 07:24:00 pm »

100us is a MASSIVE amount of time to do something as simple as buffer swapping and DMA reconfiguration. 450MHz? That's 45000 clock cycles. You can probably create the whole buffer (e.g., a software synthesizer) during that time.

ISR entry, DMA reconfiguration where you just make a few checks and swap pointers, then exit the ISR, is maybe something like 50 clock cycles, i.e. 100ns or so.
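
The cycle budget quoted above is easy to reproduce. The helpers below are hypothetical names and simply convert between time and core clock cycles for a given clock in MHz.

```c
#include <stdint.h>

/* Core clock cycles available in a window of `us` microseconds. */
uint32_t cyclesInUs(uint32_t core_mhz, uint32_t us)
{
    return core_mhz * us;
}

/* Duration of `cycles` core clock cycles, in whole nanoseconds. */
uint32_t cyclesToNs(uint32_t core_mhz, uint32_t cycles)
{
    return cycles * 1000u / core_mhz;
}
```

At 450 MHz, 100 us is 45000 cycles, and a 50-cycle ISR is about 111 ns, which is where the "100ns or so" comes from.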

STM32 DMA supports automatic buffer swapping (aka double buffer mode), but with such easy timing constraints, it doesn't matter.

You can also just simply not use DMA at all and directly write into the I2S FIFO in an ISR handler every 1/fs or so. Heck, given Cortex-M7 at 400MHz, I have written a software DC/DC control loop which runs at 250kHz (including actual calculations, not just data copy), and an SDLC protocol input stage (with bit destuffer AND continuous CRC calculation) which runs at 500kHz. In interrupts.

All options are open with so much excess CPU processing power available.

The things you are working with are only problematic on desktop PC in userland applications. On MCU, you have interrupt latencies within tens of clock cycles, guaranteed, with nearly no jitter.

NorthGuy · « **Reply #6 on:** January 20, 2023, 09:57:51 pm »

Another word for "circular buffer" is "queue". You queue your items on one end. And you pick up items in the same order from the other end as you please. It's really hard to find a better implementation for a queue other than a circular buffer.

Your frustration with circular buffers is because you create a queue for bytes, but then you try to use it for blocks. This, of course, won't work well, but is easily fixed.

Why don't you create a circular buffer for blocks? For example, each element of the circular buffer will be a pointer to a data block.

When you want to put a block into the circular buffer, you allocate memory for the block, you fill your block in, then you put the pointer to the newly created block into your queue.

At the reading end, when DMA is done with a block, you de-allocate the block, then retrieve the pointer to the next block from the queue and give it to DMA to process.

To allocate blocks, if you don't have a heap, you can create your own allocator. The simplest is a set of N buffers (each capable of holding the longest block) accompanied with one word of information - 0 means the buffer is empty, (x > 0) means the buffer holds a block of length x.
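
The scheme described in this post can be sketched as follows: a small fixed set of blocks whose length word doubles as the "in use" flag, plus a circular queue that holds pointers to blocks rather than bytes. Names and sizes are assumptions for illustration, and the sketch is not interrupt-safe as written.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_BLOCKS 4
#define BLOCK_MAX  192   /* longest block, in bytes (hypothetical) */
#define QUEUE_LEN  8     /* must be a power of two */

typedef struct {
    size_t  len;               /* 0 = empty, x > 0 = holds a block of x bytes */
    uint8_t data[BLOCK_MAX];
} Block;

static Block blocks[NUM_BLOCKS];

/* Claim the first empty block for a payload of `len` bytes. */
Block *blockAlloc(size_t len)
{
    if (len == 0 || len > BLOCK_MAX) return NULL;
    for (size_t i = 0; i < NUM_BLOCKS; i++) {
        if (blocks[i].len == 0) {
            blocks[i].len = len;
            return &blocks[i];
        }
    }
    return NULL;   /* all blocks in use */
}

void blockFree(Block *b)
{
    if (b) b->len = 0;
}

/* Circular buffer of block pointers: the queue itself. */
static Block   *queue[QUEUE_LEN];
static unsigned head, tail;   /* head: next write slot, tail: next read slot */

int queuePut(Block *b)
{
    if (head - tail == QUEUE_LEN) return 0;   /* full */
    queue[head % QUEUE_LEN] = b;
    head++;
    return 1;
}

Block *queueGet(void)
{
    if (head == tail) return NULL;            /* empty */
    Block *b = queue[tail % QUEUE_LEN];
    tail++;
    return b;
}
```

The producer would call `blockAlloc`, fill `data`, then `queuePut`; the DMA-complete handler would `blockFree` the finished block and `queueGet` the next one to hand to the DMA.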

paulca · « **Reply #7 on:** January 21, 2023, 11:29:59 am »

Good to know on the power. I am just not familiar and comfortable with what performance I have yet, so I'm taking measures I might not need to take. Note, an F411 will mix 2 I2S channels, even without DMA; I know because I got that prototype working ages ago. What it won't do however is run 2 shelf and 3 or 4 peak filters without DSP BiQuad optimisation as well as down mix 2 or 3 I2S, including calculating the coefficients on the fly given user input.

And... my architecture requires 2 of these 'buses'. 2 distinct mix and EQ stages. If I can still get a single H7 to do all that without hassle, then I suppose I best try and figure out how to maximise the number of I2S IO I can get from the SAI blocks etc.

Circular DMA buffers with interrupts memcopying the buffer halves into the next stage would work, it would just be wasteful. It also has way more fiddly low level concurrency issues.

The main issue I still don't see anyone appreciating is how do you mix 2, 3, 4 buffers of I2S together? Ultimately all the buffers need to exist at that time to mix them. However there is no way to synchronise the streams, or at least no easy way. I can start them as two adjacent lines of code, but they are slaves; when they start and stop and how they align is up to the master. You should surely be able to see that if a buffer can slip by up to a buffer length, there is no way to synchronise within 1 buffer span. Something is going to trample over something.

With my approach that doesn't matter. The input will get itself a new buffer, the mixer will run without a buffer (when I fix it) and if a buffer is early or late the concurrency collision scope is that of the pointers, no memcpy from LIVE buffers to protect. The pointer either points to the new buffer or it doesn't at the exact moment in time the mixer runs. With a memcpy getting interrupted by an I2S stream writing into that buffer half, all hell breaks loose. Making a memcpy atomic sounds dangerous, a pretty long time to be blocked on a single core. In my design, if the software hangs up, the buffers run out and the stream stops. It doesn't blast me with a 1kHz looped sample or a wall of pink noise.

It remains to be seen how often this occurs in the wild. Those brute force buffer drops will only be tolerable if they happen very infrequently. As everything will be running off different clocks, although at the same rate, there will be slippage. I just hope it's not that often or I will need to start buffer padding and truncating at the frame level rather than the 1ms buffer level.

paulca · « **Reply #8 on:** January 21, 2023, 11:44:39 am »

The dragon I can hear breathing in its lair however is the I2S input/output formats.

It would seem, as I'm using pre-built PCB modules, I am not going to get the kind of choice and control over those as I would have liked. It looks as though at least two components will not play nice at 16bit. One of them puts it on a 32bit frame which might be ok, I can handle both 16bit/16bit and 16/32bit by discarding the other 16bits, assuming I can determine which. The other aligns its 16bit funny in the frame and the DAC won't accept it, need to look into that one.

I might have to just go to 24bit on a 32bit frame and 96K just to keep a flat format in, through and out. For the consumer analogue end I really don't need more than 16bit/48K but for the mixing and filtering it would be very advantageous to use bigger more precise floats/doubles and 2 to 4 times Nyquist and round it off at the end.
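
As a rough illustration of keeping one internal format, the sketch below converts 32-bit slots to float for mixing and rounds back to 16-bit at the end. It assumes the sample is MSB-aligned in the 32-bit slot, which may not match every module mentioned above; the function names are made up.

```c
#include <stdint.h>

/* 16- or 24-bit audio, MSB-aligned in a 32-bit slot, to float in [-1, 1). */
float slot32ToFloat(int32_t slot)
{
    return (float)slot / 2147483648.0f;   /* divide by 2^31 */
}

/* Keep only the top 16 bits of a 32-bit slot (truncates toward zero). */
int16_t slot32ToS16(int32_t slot)
{
    return (int16_t)(slot / 65536);
}

/* Float back to 16-bit with clamping and round-to-nearest. */
int16_t floatToS16(float x)
{
    float v = x * 32768.0f;
    if (v > 32767.0f)  v = 32767.0f;
    if (v < -32768.0f) v = -32768.0f;
    return (int16_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}
```

A single-precision float has a 24-bit significand, so a 24-bit sample survives the conversion to float without loss; the precision question only comes up after several stages of filtering.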

I mean the I2S abstraction I wrote could handle multiple stream formats and config, each buffer would be labelled with its format so something sensible happens in the right places to it. I can go there IF I have to, when I have to.

Siwastaja · « **Reply #9 on:** January 21, 2023, 12:33:06 pm »

With a microcontroller (i.e., not writing mixing software for general purpose PC), you have the option to filter / mix in periodic ISR, reading the single values from input buffers, mixing (calculate one output sample), then outputting the value to I2S FIFO directly. So basically no DMA at all. The advantage is simplicity of code: no management of blocks / buffers, and of course, minimized latency from input to output (if you do mixing of realtime signals).

In general purpose computing, even in kernel space but especially in userspace, you have the risk of mysterious delays of many microseconds, even milliseconds. That is why audio is processed in block buffers. With standalone DSPs or microcontrollers, you have the luxury of direct processing of data.

paulca · « **Reply #10 on:** January 21, 2023, 02:13:56 pm »

So this diagram is exaggerated, however I think it shows that no matter how long Time t is, asynchronous streams will not remain aligned such that you can mix them within Time t (red arrows are valid mix window beginnings). I don't think it matters if they are 8 bytes or 192 bytes or a megabyte.

Also while this prototype has 2, maybe 3 I2S inputs, ideally I would like more in the final project with more complex routing and multiple processing buses. I can certainly mix it all up in one big soupy code base or I can componentize it, async it, block buffer it and abstract the complex stuff out of the way of the important stuff and ideally create a modularised design which can be "plug and pray", update the config extensible.

I realise these concepts are somewhat alien in the MCU/Realtime world. As you said, excessive power overheads. I can waste lots of CPU and memory with layers of abstraction to make my life easier and my code stronger.

Anyway, the answer to my original question looks like a confident yes: as long as I don't dilly-dally in the ISRs I should easily be able to carefully, abstractly flip a pointer.

Siwastaja · « **Reply #11 on:** January 21, 2023, 02:29:10 pm »

Quote from: paulca on January 21, 2023, 02:13:56 pm

I can waste lots of CPU and memory with layers of abstraction to make my life easier and my code stronger.

This is what business management believes is true, but it really is not; this mindset has ruined computing. Instead, make your life easier and code stronger by doing things as simple as possible, so that you understand what is going on. This is definitely possible with audio DSP stuff, it's not rocket science.

Mixing asynchronous signals will require some kind of real-time resampling algorithm. Simplest, nearest neighbor, can be trivially implemented in said ISR based approach: keep copy of latest sample for each input stream. If new sample is not available, then use the previous one.
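
A minimal sketch of that nearest-neighbour idea, assuming 16-bit mono samples and three inputs; `inputPush` would be called from each input's receive handler and `mixTick` once per output sample period. The names and the saturating sum are assumptions, not code from this thread.

```c
#include <stdint.h>

#define NUM_INPUTS 3   /* hypothetical number of input streams */

/* Latest sample seen on each input; reused if nothing newer arrives. */
static int16_t lastSample[NUM_INPUTS];

/* Called whenever input `ch` delivers a fresh sample. */
void inputPush(unsigned ch, int16_t s)
{
    if (ch < NUM_INPUTS) lastSample[ch] = s;
}

/* Called once per output sample: sum the held values and saturate. */
int16_t mixTick(void)
{
    int32_t acc = 0;
    for (unsigned i = 0; i < NUM_INPUTS; i++)
        acc += lastSample[i];
    if (acc > INT16_MAX) acc = INT16_MAX;
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

A late input simply repeats its previous value for one output sample, and an early input overwrites a value that was never used, so each stream slips by at most one sample at a time.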

NorthGuy · « **Reply #12 on:** January 21, 2023, 03:29:41 pm »

Quote from: paulca on January 21, 2023, 02:13:56 pm

So this diagram is exaggerated, however I think it shows that no matter how long Time t is, asynchronous streams will not remain aligned such that you can mix them within Time t (red arrows are valid mix window beginnings). I don't think it matters if they are 8 bytes or 192 bytes or a megabyte.

Obviously, if your receiver is a slave and the streams have different clocks, they will get out of sync - a number of samples transmitted per minute will be different. But they have good crystals, so the difference will not be dramatic. You can determine how big the difference is. Say, if the max difference is 20 samples per minute and your mixer wants to work for an hour, the difference will be 1200 samples at most. You would need a buffer longer than this. And you need to fill in the buffers, so that each contains 1200 samples before you begin producing the output.

By the end of the hour, the slowest transmitter will be 1200 samples behind the fastest one. This is just because their clocks were not ideal. You shouldn't follow their clock errors. You just mix them sample by sample, and output the result with your own "ideal" clock.
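
The buffer sizing above is simple arithmetic, sketched here with hypothetical helper names. The ppm variant is an added assumption for when the clock tolerance is known instead of a measured drift rate; 7 ppm at 48 kHz happens to land close to the 20 samples per minute used in the example.

```c
#include <stdint.h>

/* Worst-case drift in samples, from a measured drift rate. */
uint32_t driftFromRate(uint32_t samples_per_minute, uint32_t minutes)
{
    return samples_per_minute * minutes;
}

/* Worst-case drift in samples, from a clock mismatch in parts per million. */
uint32_t driftFromPpm(uint32_t fs_hz, uint32_t ppm, uint32_t seconds)
{
    return (uint32_t)(((uint64_t)fs_hz * seconds * ppm) / 1000000u);
}
```

Each input buffer would need to hold at least this many samples before output starts, on top of whatever block size the mixer works in.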

paulca · « **Reply #13 on:** January 21, 2023, 03:52:44 pm »

We can debate that all day, but mark my words, those days are numbered. The problem is the markets are changing. It used to be the case that, given hardware is pretty "fixed" in nature, so could the code that runs upon it be: written once, bespoke, and never changed for the life of the product! I mean that is one of the main advantages, stability and robustness through having read-only code and no general purpose read/write OS to mess things up or maintain. It used to be that the MCUs and embedded code were doing relatively simple things down in the bit bashing, with rather simplistic <1000 line projects with a single author.

However those days are gone. The market now demands not only that your device work and play nicely with others... which are likely to diversify and be subject to change, but also the market now demands things like CICD and weekly production releases straight out to the 10s of thousands of end users instantly as a firmware update with a simultaneous roll out on the clouds/server side. The market is demanding such rapid pace of development that the only way to achieve it is larger and larger teams, more and more code reuse and frameworking (aka here's one I prepared earlier). When no one person has a total line for line understanding of the software the 1980s approach fails. Not instantly, but in a slow painful death as team members leave the company to avoid working on it anymore.

Anyway. Here, by example is just how complex and wasteful the buffer pool is. It's 40 lines of code! Now I no longer need to worry about where buffers come from. If I have a problem with the buffers I know where to look/fix. I don't need to go fixing the buffer handling code across the whole set of ISRs at once.

EDIT I'm not sold on the zeroBuffer thing yet. I think it creates as many corner cases as it addresses when compared with using NULL pointers for unavailable buffers.

DavidAlfa · « **Reply #14 on:** January 21, 2023, 03:57:14 pm »

To me there isn't any difference between circular double-buffering and circular half-buffering, only "Two buffers that can be in different RAM areas" and "two buffers placed consecutively".

The DMA ISR will generate an interrupt when it finishes the current buffer and starts the second one, it's the same ping-pong game!
If the ISRs are too fast, the fix is obvious: increase your buffer size at the expense of a slightly higher output delay.

paulca · « **Reply #15 on:** January 21, 2023, 04:26:02 pm »

I realise that MCU people conflate FIFOs, queues, circular buffer and double buffering to be the same thing. They aren't. Related, often implemented with, using or as one-another but not the same.

Circular buffer != double buffering.
A FIFO ~!= a circular buffer. A FIFO may be implemented using a circular buffer, but it can be implemented other ways. The same for a LIFO or a stack.
A Queue != FIFO and so on....

It matters not where the memory is. They are implemented differently, have different pros, cons, synchronising and concurrency considerations. Most are in some ways defined as software engineering design patterns with distinct properties.

No matter how many times an MCU datasheet says they are the same thing, it doesn't make it so.

paulca · « **Reply #16 on:** January 21, 2023, 04:40:27 pm »

Quote from: DavidAlfa on January 21, 2023, 03:57:14 pm

The DMA ISR will generate an interrupt when it finishes the current buffer and starts the second one, it's the same ping-pong game!

I'm not ping-ponging. I have more than 2 time slots in play. It's not just IN buffer OUT buffer. There is the alignment/mix/eq buffer.

The only two solutions are:
* memcopy
* multi-buffering

The mechanics are so close I realise it's confusing. In the case of a DMA circular array the array is fixed in memory. Anything which is to go into it or out of it must be moved to that fixed place in memory. The entire copy process has to proceed uninterrupted or the dragons will be loose. Any mistakes, hang ups, missed interrupts and I get blasted with noise.

In the case of multibuffering, there can be as many buffers as I want, they can exist in many different arrays of memory, they can concurrently be in use across many different "threads" without any of them ever being in the same bit of memory or in any contention. The DMA stream is not "fed" data into its plate while it's eating from it, it's pointed to a whole new plate while the last one can sit on the counter until the dishwasher is ready (tangent analogy!)

It also provides aggregation decoupling if I so needed. If I had more than one core that could be fun. I could collect a bunch of buffer pointers into a batch and gain processing efficiencies and the beauty is, the client code doesn't need to know a single thing about it.

NorthGuy · « **Reply #17 on:** January 21, 2023, 05:02:18 pm »

Quote from: paulca on January 21, 2023, 03:52:44 pm

Anyway. Here, by example is just how complex and wasteful the buffer pool is. It's 40 lines of code! Now I no longer need to worry about where buffers come from. If I have a problem with the buffers I know where to look/fix. I don't need to go fixing the buffer handling code across the whole set of ISRs at once.

Buffer pool doesn't require random access and often is served better with a linked list, for example:

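```c
#include <stddef.h>

/* Declarations assumed for completeness; they were not part of the post. */
typedef struct Buffer {
  struct Buffer* Next;   /* link to the next free buffer */
} Buffer_t;

static Buffer_t* freePool;   /* head of the free list, NULL when empty */

/* Pop a buffer off the free list; returns NULL if none are left. */
Buffer_t* getBuffer(void) {
  Buffer_t* temp = freePool;
  if (temp)  freePool = freePool->Next;
  return temp;
}

/* Push a buffer back onto the head of the free list. */
void freeBuffer(Buffer_t* buf) {
   buf->Next = freePool;
   freePool = buf;
}
```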

This is less code, easier to write, works faster.

Now tell me. Why the market demands bloat, such as zeroPool, instead?

paulca · « **Reply #18 on:** January 21, 2023, 05:12:24 pm »

I'm not throwing away your idea Siwastaja.

Thinking through the implementation it's effectively a single ISR. I2STxFifoEmpty or whatever.

A for loop round the input I2S to lift their FIFOs, mixing as I go with +=, and then the actual EQ computation is a few dozen FPU instructions, a few hundred clock cycles, and we write that to the output I2S FIFO and go back to sleep... or to computing the coefficients and outputting metrics, VU meter and stats to the screen.

The point on the synchronisation being inherent in it is a valid point. It automatically introduces padding or truncating. Late data, the previous sample gets reused (fine at the sample level, not at the block level). Early data overwrites the FIFO which is exactly the same as shortening the buffer by 1 frame to resync.

The concerns I have are around the margins and scalability.

Those I2S FIFOs are very short. With 32bit stereo frames consuming the entire FIFO for a single stereo sample works out about 10us per FIFO.

So I have no doubt it's going to work for 2 streams, maybe 3, but when it gets to 4 input streams and 2 output streams with 2 different EQs and some routing. It could get very tight.

So it's not that it's a bad approach, it's just that it's very brittle, and when its capacity is exceeded there is little to no way to extend it other than a faster micro or slower I2S.

paulca · « **Reply #19 on:** January 21, 2023, 05:25:36 pm »

Quote from: NorthGuy on January 21, 2023, 05:02:18 pm

Quote from: paulca on January 21, 2023, 03:52:44 pm
Anyway. Here, by example is just how complex and wasteful the buffer pool is. It's 40 lines of code! Now I no longer need to worry about where buffers come from. If I have a problem with the buffers I know where to look/fix. I don't need to go fixing the buffer handling code across the whole set of ISRs at once.

Buffer pool doesn't require random access and often is served better with a linked list, for example:

```c
Buffer_t* getBuffer() {
  Buffer_t* temp = freePool;
  if (temp) freePool = freePool->Next;
  return temp;
}

void freeBuffer(Buffer_t* buf) {
  buf->Next = freePool;
  freePool = buf;
}
```
This is less code, easier to write, works faster.

Now tell me. Why the market demands bloat, such as zeroPool, instead?

Because your version is unbounded, unsafe, leaking and a NPE exception waiting to happen. Nor is it sequentially thread safe. Your buffers will get allocated and freed out of order and your linked list will corrupt. When you wrap it up with error handlers and corner case conditions it will be a lot longer than the fixed array version.

Mine uses a finite set of buffers which are deliberately IDd such that they can be individually accounted for. It also makes several of the buffer operations atomic. Out-of-ordering does not matter. Buffers are not allocated and discarded but reused. There is no dynamic allocation (except in init) required.

The zeroBuffer isn't bloat, it's a fail safe. The idea is that the pointers are NEVER NULL. It removes the entire ability for a NULL pointer issue by removing NULL and replacing it with (I would hope) a constant buffer of zeros. Putting that into the bufferpool however introduces other issues like handing the zeroBuffer to an Rx peripheral for example and other issues, so I'm dropping it.

Seriously. How many threads in this forum are a wall of tears from MCU people writing 1980s code and then getting all f'ked up by concurrency issues? Hmm? All of these things are simples until suddenly they aren't. Then tears. Why are you attacking me for doing it properly in the first place and engineering it such that I do not encounter the same tears. Designing it in from the start. As my day job is on big multicore distributed iron it's my trade to work these things out in ways that don't result in tears later. You would be surprised how rare that is and how often it's done piecemeal and a liability.

NorthGuy · « **Reply #20 on:** January 21, 2023, 06:11:52 pm »

Quote from: paulca on January 21, 2023, 05:25:36 pm

Because your version is unbounded, unsafe ...

It's safe unless you deliberately try to break it.

Quote from: paulca on January 21, 2023, 05:25:36 pm

leaking

not leaking. Show me a code where it would leak.

Quote from: paulca on January 21, 2023, 05:25:36 pm

and a NPE exception waiting to happen.

But that's what you want. If you try to allocate a buffer, get NULL, but try to use anyway, the program crashes making you aware of the problem so that you can fix it before it causes problems in production.

Quote from: paulca on January 21, 2023, 05:25:36 pm

Nor is it sequentially thread safe. Your buffers will get allocated and freed out of order and your linked list will corrupt.

Not really. show me a code which would do this.

Quote from: paulca on January 21, 2023, 05:25:36 pm

When you wrap it up with error handlers

I won't.

Quote from: paulca on January 21, 2023, 05:25:36 pm

and corner case conditions

There's no corner conditions.

Quote from: paulca on January 21, 2023, 05:25:36 pm

Mine uses a finite set of buffers

mine too

Quote from: paulca on January 21, 2023, 05:25:36 pm

which are deliberately IDd such that they can be individually accounted for.

Albeit useless, you can add Id to my buffers too.

Quote from: paulca on January 21, 2023, 05:25:36 pm

It also makes several of the buffer operations atomic. Out-of-ordering does not matter.

zeroBuffer is not thread safe. Two concurrent threads may get the same buffer. Need to disable interrupts to use in different threads. Same as with mine.
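
For illustration, a free-list pool guarded by a short critical section might look like the sketch below. `irqSave` and `irqRestore` are hypothetical stand-ins so the example builds on a host; on a Cortex-M part one would typically save PRIMASK and disable interrupts there (for example with the CMSIS `__get_PRIMASK`, `__disable_irq` and `__set_PRIMASK` intrinsics). The pool size and payload are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Host stand-ins for "disable interrupts" and "restore previous state". */
static int irqDepth;
static uint32_t irqSave(void)           { return (uint32_t)irqDepth++; }
static void     irqRestore(uint32_t prev) { irqDepth = (int)prev; }

#define POOL_SIZE 4

typedef struct Buffer {
    struct Buffer *Next;
    int16_t samples[96];   /* hypothetical payload */
} Buffer_t;

static Buffer_t  storage[POOL_SIZE];
static Buffer_t *freePool;

/* Thread every buffer onto the free list once at start-up. */
void listPoolInit(void)
{
    freePool = NULL;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        storage[i].Next = freePool;
        freePool = &storage[i];
    }
}

/* Pop a buffer; the list head is only touched inside the critical section. */
Buffer_t *getBufferSafe(void)
{
    uint32_t key = irqSave();
    Buffer_t *b = freePool;
    if (b) freePool = b->Next;
    irqRestore(key);
    return b;
}

/* Push a buffer back; same critical section around the head update. */
void freeBufferSafe(Buffer_t *buf)
{
    uint32_t key = irqSave();
    buf->Next = freePool;
    freePool = buf;
    irqRestore(key);
}
```

The critical section covers only two or three pointer operations, so the added interrupt latency is a handful of cycles.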

Quote from: paulca on January 21, 2023, 05:25:36 pm

Buffers are not allocated and discarded but reused. There is no dynamic allocation (except in init) required.

Same with mine. The pool is created once and buffers are reused. That's what the buffer pool does.

Quote from: paulca on January 21, 2023, 05:25:36 pm

The zeroBuffer isn't bloat, it's a fail safe. The idea is that the pointers are NEVER NULL. It removes the entire ability for a NULL pointer issue by removing NULL and replacing it with (I would hope) a constant buffer of zeros.

Very bad idea. Accessing non-existing buffer is a bug. You don't want to mask bugs. Otherwise they may go unnoticed and will cause problems later.

Quote from: paulca on January 21, 2023, 05:25:36 pm

Why are you attacking me for doing it properly in the first place and engineering it such that I do not encounter the same tears.

I am not attacking you. Somebody taught you very bad programming principles which are dominant today and produce bloaty and buggy software, but do indeed save money by letting companies employ incompetent coders. I try to demonstrate that other ways exist. I don't expect to succeed really. Just hope that maybe not all is lost and you can embrace sound engineering principles some day in the distant future.

paulca · « **Reply #21 on:** January 21, 2023, 07:09:12 pm »

You know what, put in practical terms, it doesn't actually bother me.

I mean we are getting close to cleaning up the last mess your style of programming left behind in the enterprises of the 1980s. The embedded and cloud spaces are converging and I don't think your coding style will win. I don't see enterprises pushing functionality down into the MCUs as they currently are. They simply won't accept that pioneering freelander style of coding. It does not scale. It doesn't scale in code and it doesn't scale in teams. We who have ventured forth into the world of millions and millions of lines of code and teams of hundreds or thousands have found... it gets exponentially harder as it scales. We have learnt the hard way that "hacked out code" does not scale. When requirements change, you need to rewrite it again and again and again. At some point someone is going to say, "STOP!", just take the god damn pre-canned, reusable, slightly slower dependency and stop reinventing the wheel! That leads you to actual software engineering. A discipline which is far younger than the MCU is and exists exactly because there are very real and very hard limitations to hacking out low level code.

I mean, let's not argue whether a linked list or array is appropriate. Technically, collections theory would suggest a set is the ideal container. However the easiest way to implement a set in C is via an array with integers as keys pointing to a fixed array. Which leads to the second qualifier for a linked list: that it allows items to be added, removed and reordered, none of which are required. It doesn't really matter that the array is ordered, it's really being used as a set.

I can sort of see a differential in approach forming here. I design, I use patterns, I reuse patterns, I aim to create reusable patterns and then I consider how I implement that.

I can see how watching that (this) process can be very frustrating for a monoglot embedded C engineer.

Technically I am a guest in your domain in these forums, so I shouldn't be too critical.

I am honestly in part just bantering with you. Nor am I saying my approach is the right one, just that I also don't think yours is either. I've got a few takeaways from the thread. I've learnt a few things, all is good.

Siwastaja · « **Reply #22 on:** January 21, 2023, 07:32:13 pm »

Quote from: paulca on January 21, 2023, 04:40:27 pm

Quote from: DavidAlfa on January 21, 2023, 03:57:14 pm
The DMA ISR will generate an interrupt when it finishes the current buffer and starts the second one, it's the same ping-pong game!
The mechanics are so close I realise it's confusing. In the case of a DMA circular array the array is fixed in memory. Anything which is to go into it or out of it must be moved to that fixed place in memory.

That's not the idea of DMA! The whole point of DMA is you can change the pointer and write/read to/from anywhere. (This seems to be something DavidAlfa didn't get, either; if you pre-configure the DMA memory addresses then indeed, there is not much difference between one buffer or double buffering mode. But double buffer mode allows you to change the other pointer while one part is accessed.)
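A rough sketch of what "change the other pointer while one part is accessed" means, written as a host-side simulation rather than real register code. The `sim_dma_t` type and both functions are made up for illustration; on the stream-based STM32 DMA (F4/F7, as far as I know) the two pointers correspond to the M0AR and M1AR registers and the CT bit tells which one is currently in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a double-buffer DMA stream. */
typedef struct {
    const int16_t *mar[2]; /* models the two memory address registers */
    unsigned       ct;     /* current target: index the DMA is reading now */
} sim_dma_t;

/* "Hardware" side: the current buffer is finished, so the stream flips to
 * the other address register. Returns the buffer now being read. */
const int16_t *sim_dma_transfer_complete(sim_dma_t *dma)
{
    assert(dma != NULL);
    dma->ct ^= 1u;
    return dma->mar[dma->ct];
}

/* "Software" side, as it would run in the transfer-complete ISR: repoint the
 * idle register at the next buffer, wherever in memory that buffer lives.
 * The register in use is left alone. */
void sim_dma_queue_next(sim_dma_t *dma, const int16_t *next)
{
    assert(dma != NULL && next != NULL);
    unsigned idle = dma->ct ^ 1u;
    dma->mar[idle] = next;
}
```

With three unrelated buffers `a`, `b`, `c`: start on `a` with `b` queued, and when `a` completes the stream moves to `b` while the ISR queues `c` into the slot `a` just vacated. No sample data is copied at any point, only pointers change.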

paulca · « **Reply #23 on:** January 21, 2023, 07:33:59 pm »

I mean, it all sounds familiar to me, the way these discussions always seem to go.

Peter-H mentioned it also, if you give people too much information, enough to confuse and distract them, then the thread derails. If you give them too little information they ask stupid questions and propose stupid solutions.

I recall having a horrible time explaining my heating system to a bunch of HomeAssistant fan boys. They repeatedly underestimated the overall picture, features, functionality and protections.

I mean take any one tiny code device, someone picks up on it and says it's not right or isn't needed. You start to explain that exactly how, why and what that component is or does involved a dozen hours of design, dry running, redesigning, diagramming, documenting and doodling to finally choose that approach as either the best or in many cases the least sucky. However without knowing ANY of those particulars or even understanding what the thing truly does end to end, the average forum guy can tell you the choice was wrong.

It's like telling someone you took the car to the shops and, as their only experience in life is living in a city centre, they might balk at you and say, "You don't need a car to go to the shops! LOL Garf, garf". However the fact you live in rural Utah and the nearest shop is 50 miles away doesn't occur to them at all.

Thankfully, in the day job, surrounded by mostly other peer engineers, I don't have as much difficulty. On my engineering approach and it being "one big conspiracy of management", what bollox are you talking? Why would management want bloated software? Management want working software, on time, on budget, with value add to boot. If you keep giving them estimates to rewrite everything because you refuse to generalise and componentise then... you will be got rid of fairly quickly.

The embedded / hardware / firmware space is a bit more screwed though, because your dependencies cost money per item and you sell products. So it matters a lot to your bean counters if you use a $1 MCU or an $8 MCU. Those are very, very hard limits if the product has a target retail price of only £19.99. For 100,000 units, the difference between $800,000 and $100,000 of MCUs will actually pay a lot of salaries. In my domain that is flipped entirely. The cost of engineers far, far exceeds the cost of the hardware. So engineers who can develop fast, reliable, adaptable code win, regardless of performance or install size. As long as it's good enough and meets the requirements.

paulca · « **Reply #24 on:** January 21, 2023, 07:37:28 pm »

Quote from: Siwastaja on January 21, 2023, 07:32:13 pm

That's not the idea of DMA! The whole point of DMA is you can change the pointer and write/read to/from anywhere. (This seems to be something DavidAlfa didn't get, either; if you pre-configure the DMA memory addresses then indeed, there is not much difference between one buffer or double buffering mode. But double buffer mode allows you to change the other pointer while one part is accessed.)

Agreed. What I'm trying to get at is circular mode is literally just a feature that STM32 added, it has no relation to DMA at all. The fact that STM32 chose to offer a circular mode using an array and 2 halves is an implementation detail decoupled from the concept of DMA. It would be equally valid to not provide a circular array "mode".

Functionally speaking:
HAL_DMAEx_StartMultibuffer( buffer[0], buffer[1] )
and
HAL_DMA_Start( buffer );

are different.
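The two calls above are shorthand with most arguments left out (the F4 HAL spells the multi-buffer one `HAL_DMAEx_MultiBufferStart`, as far as I recall). To show the circular side of the difference, here is a host-side sketch, not HAL code: `dma_ring`, `HALF_LEN` and `ring_refill` are made-up names. The DMA loops over one array at a fixed address, so new audio has to be copied into whichever half has just been played.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HALF_LEN 4u /* hypothetical: samples per half of the circular buffer */

/* Circular mode: one array at a fixed address that the DMA reads forever. */
int16_t dma_ring[2u * HALF_LEN];

/* Would be called from the half-complete callback (finished_half = 0) or the
 * complete callback (finished_half = 1). The half that was just played is
 * free again, so the next block of audio is copied into it.
 * Returns the number of bytes that had to be moved. */
size_t ring_refill(unsigned finished_half, const int16_t *src)
{
    assert(finished_half < 2u && src != NULL);
    int16_t *dst = &dma_ring[finished_half * HALF_LEN];
    memcpy(dst, src, HALF_LEN * sizeof dma_ring[0]);
    return HALF_LEN * sizeof dma_ring[0];
}
```

Every refill costs a copy of half a buffer because the data has to come to the DMA's fixed address. With two independent buffers the same hand-over is a pointer write, and the data stays wherever the producer left it.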

