One example from my recent experience that might be useful for somebody.
I've been working on a "music synthesizer" device, which generates music by writing to a DAC register at some sample rate, say 48 kHz.
Before DMA:
Set a timer to fire at the sample frequency. In the timer interrupt handler, do a simple calculation of the sample value and write it to the DAC output register.
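A minimal sketch of that per-sample interrupt, simulated on the desktop. All names here are hypothetical (on real hardware `DAC_DHR` would be the memory-mapped DAC data holding register, and `timer_isr` would be installed as the timer's interrupt handler):

```c
/* Sketch of the "before DMA" approach: one timer interrupt per sample.
   Simulated on a PC; register and handler names are made up. */
#include <stdint.h>

#define SAMPLE_RATE 48000u

volatile uint16_t DAC_DHR;  /* stand-in for the DAC data register */
uint32_t phase;             /* 32-bit phase accumulator for a test tone */

/* On real hardware this runs 48000 times per second. */
void timer_isr(void)
{
    phase += 39370533u;                 /* ~440 Hz: 440/48000 * 2^32 */
    DAC_DHR = (uint16_t)(phase >> 20);  /* keep top 12 bits for a 12-bit DAC */
}
```

Even though the per-sample math is trivial, at 48 kHz this is 48000 interrupt entries and exits per second, and that fixed overhead is exactly what the DMA version removes.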
After DMA:
Precalculate a buffer of samples, say 1024 values. Set up DMA with this buffer as the source (with address increment) and the DAC register as the destination (without address increment, so every DMA write goes to the DAC output register), and enable the "half transfer complete" DMA interrupt. Set the timer to the sample frequency, but instead of configuring a timer interrupt, configure it to trigger a DMA transfer of one sample from the buffer to the DAC register. In the "half transfer complete" interrupt, calculate new data and refill the half of the buffer that has just been transferred.
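The refill side of that scheme can be sketched as follows, again as a desktop simulation. This assumes the DMA is in circular mode (so it wraps to the start of the buffer by itself) and fires both a half-transfer and a transfer-complete interrupt; the function names are hypothetical, and the per-sample copy to the DAC register is done by the DMA hardware, so it does not appear in the code at all:

```c
/* Sketch of the double-buffered ("ping-pong") DMA refill logic.
   On real hardware the DMA controller moves dma_buf[i] to the DAC data
   register on each timer trigger; the CPU only refills the half that
   has already been sent. */
#include <stdint.h>

#define BUF_LEN 1024u  /* total samples in the DMA buffer */

uint16_t dma_buf[BUF_LEN];
uint32_t phase;        /* 32-bit phase accumulator for a test tone */

/* Refill n samples starting at dst with new data (sawtooth test tone). */
void fill_half(uint16_t *dst, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++) {
        phase += 39370533u;                /* ~440 Hz at 48 kHz */
        dst[i] = (uint16_t)(phase >> 20);  /* 12-bit DAC value */
    }
}

/* Half-transfer interrupt: the first half has been sent, refill it. */
void dma_half_transfer_isr(void)     { fill_half(&dma_buf[0], BUF_LEN / 2); }

/* Transfer-complete interrupt: the second half has been sent, refill it;
   circular mode makes the DMA wrap back to dma_buf[0] on its own. */
void dma_transfer_complete_isr(void) { fill_half(&dma_buf[BUF_LEN / 2], BUF_LEN / 2); }
```

With a 1024-sample buffer at 48 kHz, the CPU is interrupted roughly every 10.7 ms instead of every ~21 µs, and each interrupt has about half a buffer's worth of time to finish its refill.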
Result: no more timer interrupt storm. All writes from the buffer to the DAC happen via DMA, triggered by the timer, entirely in hardware, without the CPU. The CPU just refills the buffer of samples occasionally, without hard real-time requirements. Effectively this removes the per-sample timer interrupt overhead.
I haven't measured the improvement quantitatively yet. I could probably measure the maximum achievable sample rate in both cases, or learn how to gather some statistics from FreeRTOS, which I'm using; suggestions about that are welcome.
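One FreeRTOS option for this is its built-in run-time statistics, which report per-task CPU usage. A configuration sketch, assuming a spare fast timer is available as the stats clock (`my_stats_timer_value()` is a hypothetical helper you would write to read it):

```c
/* FreeRTOSConfig.h fragment: enable run-time statistics.
   The stats clock should tick 10-100x faster than the RTOS tick. */
#define configGENERATE_RUN_TIME_STATS            1
#define configUSE_STATS_FORMATTING_FUNCTIONS     1
#define portCONFIGURE_TIMER_FOR_RUN_TIME_STATS() /* start a fast free-running timer here */
#define portGET_RUN_TIME_COUNTER_VALUE()         my_stats_timer_value() /* hypothetical */

/* Then, somewhere in a task, dump a per-task CPU usage table as text: */
char stats[512];
vTaskGetRunTimeStats(stats);  /* task name, absolute time, percentage */
```

Comparing the table between the interrupt-driven and DMA-driven versions should show the CPU time freed up, without needing to push either version to its maximum sample rate.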