Yep.. but in defense of the OP, that only holds if you know what you're doing and are fully familiar with the theory of operation. I've worked with DMA controllers on various ARM processors and the PIC32, and once you get the hang of each processor family (e.g. its documentation style), they are all about equally simple to set up.
One key thing to note is to always check which DMA IRQ sources are supported on which channels. I know that on the PIC32 this could be any source, so it was never an issue, but on the STM32F4 I think there were 2x8 DMA streams, with each stream having 8 IRQ source channels that were fixed. Of course you can configure any stream you like at any time, but some peripherals may have their IRQ mapped to the same DMA stream, which is a pain if you want to use both at once. Look up the mapping table in the reference manual to see what's up.
Nevertheless, a brief explanation:
DMA can be set up to perform memory transfers in blocks. For this you need a source address, a destination address, and of course a size (just like memcpy).
What is needed in addition is a "trigger" for each transfer event. Usually these are mapped onto IRQs of the processor, and are often implemented as a "shadow" of the IRQ controller. I.e. the peripheral must generate an IRQ for the DMA to transfer, but that IRQ doesn't have to be enabled in the CPU/NVIC, where it would be handled in software. This works as long as the DMA transfer takes care of clearing the IRQ status.
From there it goes like this: you enable the peripheral and enable its IRQ (sometimes specifically for DMA mode, e.g. on STM32), you set up a DMA channel with source, destination and size, and press start. For the peripheral side you usually give it a pointer to the data register. For the memory side you give it a pointer to (e.g. the start of) an array you want to transmit from or receive into. You can then wait for the DMA to finish, or receive an interrupt from the DMA controller when the block transfer is done.
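To make the steps above a bit more concrete, here is a minimal host-side sketch in plain C. It is only a software model of one channel, not real register code: the struct, its fields and the function names (dma_channel_model, dma_model_setup, dma_model_on_request) are all made up for illustration, and it moves one byte per trigger. A real controller does this in hardware and the register names differ per vendor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of one DMA channel. The field names do not
   correspond to any vendor's register layout. */
typedef struct {
    volatile uint8_t *src;  /* current source address */
    volatile uint8_t *dst;  /* current destination address */
    size_t remaining;       /* bytes left in the block */
    int src_inc;            /* 1: advance after each byte (memory), 0: fixed (data register) */
    int dst_inc;            /* same, for the destination side */
    int done;               /* models the "transfer complete" flag */
} dma_channel_model;

/* Program source, destination and size, like the arguments of memcpy. */
void dma_model_setup(dma_channel_model *ch,
                     volatile uint8_t *src, volatile uint8_t *dst,
                     size_t count, int src_inc, int dst_inc)
{
    assert(ch != NULL && src != NULL && dst != NULL && count > 0);
    ch->src = src;
    ch->dst = dst;
    ch->remaining = count;
    ch->src_inc = src_inc;
    ch->dst_inc = dst_inc;
    ch->done = 0;
}

/* Call once per trigger event (the peripheral's request).
   Moves one byte and returns 1 once the whole block has been moved. */
int dma_model_on_request(dma_channel_model *ch)
{
    if (ch->remaining == 0) {
        return ch->done;            /* nothing left to move */
    }
    *ch->dst = *ch->src;
    if (ch->src_inc) ch->src++;
    if (ch->dst_inc) ch->dst++;
    if (--ch->remaining == 0) {
        ch->done = 1;               /* real hardware would raise its own IRQ here */
    }
    return ch->done;
}
```

For a transmit you would call dma_model_setup() with the array as source (src_inc = 1) and the peripheral data register as destination (dst_inc = 0), then let every "TX ready" event call dma_model_on_request() until it returns 1.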
Some further nuances:
- Most DMA controllers have 3 or 4 operation modes: memory to memory (like a memcpy), memory to peripheral (transmitting data), peripheral to memory (receiving data), and peripheral to peripheral (perhaps an odd one).
Some modes, like memory to memory, can also be run self-timed, i.e. with no IRQ event; the transfer will just complete as quickly as it can.
- Source/destination word width: you need to select how many bytes of data are transferred per event.
- A burst size allows you to perform multiple word reads/writes per event.
- Auto increment addressing: this is often what actually decides whether a source or destination is "peripheral" or "memory". In memory we have arrays, so the address pointer needs to auto increment to proceed through them. For peripherals the data register always stays at the same address.
- Take note of the place of DMA in a processor system. DMA is often hooked up somewhere near the CPU and memory, an extremely busy place. If the data your CPU is working on and your DMA buffers are located in the same block of SRAM, you can get contention and wait states in execution (which is why a priority can be set for the transfer). This is a good reason why you see such a complex bus switching matrix on the ARM/STM32 CPUs, and also why it's not a bad idea to keep as much of the CPU-bound data as possible (like heap and stacks in an RTOS) in the TCM SRAM on the STM32 CPUs. But do note: on some parts (e.g. the CCM RAM on the STM32F4) this memory is not accessible by the DMA controller, so any transfer to or from it will fail.
It can also mean that you want to tell the linker explicitly (using a C attribute) where your DMA buffers should reside, so that you have full control over this.
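As a rough sketch of the buffer placement idea, assuming a GCC or Clang toolchain: the section attribute puts a variable in a named section. The section name ".dma_buffer" and the buffer name uart_rx_buf are made up here; on a real target your linker script would need a matching output section that lands in a DMA-accessible SRAM block.

```c
#include <stdint.h>

/* Hypothetical section name. On a real target the linker script must map
   ".dma_buffer" to a RAM region the DMA controller can reach. */
#define DMA_BUFFER_ATTR __attribute__((section(".dma_buffer"), aligned(4)))

/* Receive buffer that will be handed to the DMA controller. */
uint8_t uart_rx_buf[256] DMA_BUFFER_ATTR;
```

The aligned(4) part is there because many controllers want buffers aligned to the transfer word width; check the reference manual of your part for the actual requirement.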
Some DMA controllers have even fancier features, like address strides (auto-incrementing the address pointer by an arbitrary number of words), automatic CRC calculation with a result that can be read after the transfer has finished, etc.
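To illustrate what a stride buys you, here is a small software model in C (again a sketch, not real register code; the function name strided_copy_model and its parameters are made up). After each word the source and destination pointers advance by a configurable number of words instead of by one.

```c
#include <stddef.h>
#include <stdint.h>

/* Software model of a strided transfer of 16-bit words: word i is read from
   src[i * src_stride] and written to dst[i * dst_stride]. */
void strided_copy_model(const uint16_t *src, size_t src_stride,
                        uint16_t *dst, size_t dst_stride,
                        size_t count)
{
    for (size_t i = 0; i < count; i++) {
        dst[i * dst_stride] = src[i * src_stride];
    }
}
```

For example, with two ADC channels interleaved in one buffer, a source stride of 2 and a destination stride of 1 pulls the samples of a single channel into a contiguous array without the CPU touching each sample.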