Author Topic: Cortex M7 and DMA (Read 8556 times)

luiHS · « **on:** May 23, 2018, 07:57:44 pm »

Hi.

I currently work with Cortex M4, STM32 and Kinetis, but I need a more powerful microcontroller, a Cortex M7.

A while ago I tried the Atmel SAM S70, but I did not like it because I could not find good information on how to use and configure the DMA. That really is my main problem with most microcontrollers: poor or no information on how to use and configure the DMA. For me, DMA is essential for my applications, and I do not understand why it is so difficult to find information and example sources for it, mainly for SPI and direct GPIO.

About Cortex M7, I have thought about testing the STM32H7 and the new NXP i.MX RT1020 that will be available in June. As soon as it is available I want to buy an evaluation board or develop my own board. I already have a board with the RT1050, but to make and assemble my own board with the Neoden4 that I bought, I need a TQFP package (with BGA the boards come out too expensive).

Can anyone suggest where to find good information or a tutorial on DMA for STM32 or Kinetis? I suppose that the RT1020 will be similar to Kinetis, and I hope that this time NXP will provide better technical documents for using the DMA.

In your experience, which Cortex M7 would be a good choice in terms of features, available documentation, tutorials, example sources, etc.?

These are the Cortex M7 parts that I have thought of trying. All are from the ARM world, so that I have a free C/C++ compiler, which leaves out the Microchip products. And all are available in TQFP, so that I can make cheap PCBs.

1.- STM32H7
2.- Atmel SAM S7x, E7x, V7x
3.- NXP i.MX RT1020 (my preferred option)

ataradov · « **Reply #1 on:** May 23, 2018, 08:11:18 pm »

You can use (and I would recommend doing so) the Launchpad build of GCC; it is totally free. Also, SAM parts are Microchip products now.

The rest is really a matter of preference. Configuring DMA is such a simple thing, that I don't understand how it is a stopping point for selecting a micro.

JPortici · « **Reply #2 on:** May 23, 2018, 08:33:30 pm »

Why not consider PIC32MZ? They are equally powerful and the DMA engine is very flexible.
Plus, you can use -O1 in free mode and bypassing the license is very easy anyway.
They are also available in TQFP, with a ton of peripherals and memory, too.

Otherwise, I would probably go with NXP. In my experience ST and Atmel had strange gotchas and/or crap/wrong/hard-to-understand documentation.

luiHS · « **Reply #3 on:** May 23, 2018, 08:41:51 pm »

Quote from: JPortici on May 23, 2018, 08:33:30 pm

Why not consider PIC32MZ? They are equally powerful and the DMA engine is very flexible.
Plus, you can use -O1 in free mode and bypassing the license is very easy anyway.
They are also available in TQFP, with a ton of peripherals and memory, too.

Otherwise, I would probably go with NXP. In my experience ST and Atmel had strange gotchas and/or crap/wrong/hard-to-understand documentation.

Quite some time ago I decided not to use Microchip PIC32 microcontrollers anymore: the compiler is not free and it is very expensive, and I cannot work in demo mode without optimization available. In addition, their evaluation boards are extremely expensive compared to the Discovery boards of ST or the Freedom boards of NXP / Kinetis.

Currently I work only with ARM and I am very happy, mainly with the Kinetis MK66, and also some developments with STM32. Both have very good, free development tools in CubeMX and MCUXpresso, and their evaluation boards are very cheap.

I am waiting for the NXP RT1020 in June. It looks like a very powerful microcontroller: Cortex M7 at 500 MHz, with TQFP available.

luiHS · « **Reply #4 on:** May 23, 2018, 08:44:31 pm »

Quote from: ataradov on May 23, 2018, 08:11:18 pm

You can use (and I would recommend doing so) the Launchpad build of GCC; it is totally free. Also, SAM parts are Microchip products now.

The rest is really a matter of preference. Configuring DMA is such a simple thing, that I don't understand how it is a stopping point for selecting a micro.

Can you share where to find good information, tutorials and example sources on how to configure and use DMA for STM32, NXP Kinetis or Atmel SAM S70?

I searched a lot and found hardly anything useful that explains it clearly and in detail.

ataradov · « **Reply #5 on:** May 23, 2018, 10:30:04 pm »

Quote from: luiHS on May 23, 2018, 08:44:31 pm

Can you share where to find good information, tutorials and example sources on how to configure and use DMA for STM32, NXP Kinetis or Atmel SAM S70?

Here is an example of memory-to-memory and memory-to-peripheral transfers for V7x/S7x/E7x.

This is taken from some test code, and I don't remember exactly what I was testing, but this illustrates the idea nicely. And the details can be worked out as needed.

luiHS · « **Reply #6 on:** May 23, 2018, 10:39:00 pm »

Quote from: ataradov on May 23, 2018, 10:30:04 pm

Quote from: luiHS on May 23, 2018, 08:44:31 pm

Can you share where to find good information, tutorials and example sources on how to configure and use DMA for STM32, NXP Kinetis or Atmel SAM S70?

Here is an example of memory-to-memory and memory-to-peripheral transfers for V7x/S7x/E7x.

Thanks for the source.

Do you know of any tutorial that explains in detail how the DMA is configured and used? I have been looking for such information for STM32 and Kinetis for a long time and, if possible, for Atmel SAM too.

The only good information I have about DMA on STM32 is in the book Mastering STM32 by Carmine Noviello. I also bought several books about Kinetis and Atmel SAM by Mazidi, but none of them dealt with DMA. I contacted the author and he acknowledged that it was a pending topic and that they would include it in future editions.

I have not found any detailed tutorial on how to configure and use the DMA on the microcontrollers I use regularly: STM32, Kinetis and Atmel SAM S70. This is the reason why I gave up on the Atmel SAM S70, which I initially liked a lot: the parts are powerful and cheap, but there is no tutorial on the DMA.

For Kinetis, the little that I know comes from Teensy's sources and libraries. I did not find anything from NXP itself for using it in MCUXpresso.

ataradov · « **Reply #7 on:** May 23, 2018, 10:43:56 pm »

The datasheet and some experimentation were sufficient for writing this code. I don't know any other sources, I have not needed any additional help for this.

Geoff_S · « **Reply #8 on:** May 24, 2018, 03:08:22 am »

Quote from: luiHS on May 23, 2018, 10:39:00 pm

For Kinetis, the little that I know comes from Teensy's sources and libraries. I did not find anything from NXP itself for using it in MCUXpresso.

I would recommend building and installing the MCUXpresso SDK for your board/processor, and then importing some of the SDK examples. They have many examples for DMA including use of the DMA muxer, for memory-peripheral and memory-memory transfers.

While the SDK is an abstraction away from the bare metal, it is easy to examine the source code to see what it is doing with registers etc.

JPortici · « **Reply #9 on:** May 24, 2018, 05:44:53 am »

Quote from: luiHS on May 23, 2018, 08:41:51 pm

the compiler is not free

True

Quote

and it is very expensive

I recently had a look at the subscription license and it is not that costly. It takes about 5 years of subscription for it to cost as much as the permanent license.

Quote

I can not work in demo mode without optimization available.

Except that there is no demo mode. There is Free mode, and -O1 is available, so there are optimizations. The compiler manual has a list of what gets enabled at each optimization level.
Then again, you can bypass the license without much effort with the "specs.txt" method, but if you don't want to, I understand completely, as I don't do it myself.

Quote

In addition their evaluation boards are extremely expensive compared to the Discovery of ST or the Freedom of NXP / Kinetis.

For which part? I don't remember the H7 Discovery board being that cheap.
The STM32F7 Discovery is 76€, for example. Did you mean Nucleo? They are cheap, and so are Olimex's boards, such as https://www.olimex.com/Products/PIC/Development/PIC32-HMZ144/open-source-hardware (22 euro).
Same for NXP.

Just wanted to set the record straight. I am myself waiting for the new i.MX RT to play with.

ataradov · « **Reply #10 on:** May 24, 2018, 05:47:41 am »

I don't understand why anyone would pay a cent for an ARM compiler. Here https://launchpad.net/gcc-arm-embedded , there is your totally free compiler for all platforms built by ARM. What else do you need?

julianhigginson · « **Reply #11 on:** May 24, 2018, 06:21:41 am »

Are you using HAL with the STM32? I find the DMA stuff I have set up in HAL for the STM32F767 is very simple and seems to work well (unlike SPI...).

JPortici · « **Reply #12 on:** May 24, 2018, 12:02:29 pm »

Another M7 contender I didn't know about:
Kinetis KV5x - https://www.nxp.com/products/processors-and-microcontrollers/arm-based-processors-and-mcus/kinetis-cortex-m-mcus/v-seriesreal-time-ctlm0-plus-m4-m7/kinetis-kv5x-240-mhz-motor-control-and-power-conversion-ethernet-mcus-based-on-arm-cortex-m7:KV5x

luiHS · « **Reply #13 on:** May 24, 2018, 04:45:12 pm »

Quote from: JPortici on May 24, 2018, 12:02:29 pm

Another M7 contender I didn't know about:
Kinetis KV5x - https://www.nxp.com/products/processors-and-microcontrollers/arm-based-processors-and-mcus/kinetis-cortex-m-mcus/v-seriesreal-time-ctlm0-plus-m4-m7/kinetis-kv5x-240-mhz-motor-control-and-power-conversion-ethernet-mcus-based-on-arm-cortex-m7:KV5x

The KV5x is for motor control. I need mine for multimedia applications, so I need SDIO for a micro SD card, and I will probably need other features that are not available in the KV5x.

luiHS · « **Reply #14 on:** May 24, 2018, 04:46:55 pm »

Quote from: julianhigginson on May 24, 2018, 06:21:41 am

Are you using HAL with the STM32? I find the DMA stuff I have set up in HAL for the STM32F767 is very simple and seems to work well (unlike SPI...).

For STM32, I use Workbench (Eclipse) + CubeMX; for Kinetis I work with Teensy and I can also work with MCUXpresso.
The question is where to find good information with details on how to configure everything to work with DMA: a tutorial with examples.

For STM32, I do have Carmine Noviello's book. It is very good and very complete, and it covers practically everything in a practical and very detailed way. I cannot find any tutorials for Kinetis, although I have the examples in the Teensy libraries. But for Atmel SAM I have practically nothing, just a few examples from Atmel Studio and no tutorial. I would have liked to work with these micros, but the lack of documentation on DMA made me discard them.

Some example sources can help a little, but what is really useful is a tutorial that explains each parameter in detail, with an example. Sometimes I read sources, but for DMA there are many parameters that are not documented in the sources.

ehughes · « **Reply #15 on:** May 24, 2018, 05:31:54 pm »

I am doing quite a bit with the i.MX RT.

A couple of things:

1.) It is the fastest M7 available today and is the future of the M7 implementations at NXP. It is also built on a much smaller process node (as compared to just about every other MCU on the market), which means that it is faster and lower power, but also that it will always be a flashless part.

2.) Just as a warning: It is the most difficult M7 to get started with. It sounds from your questions that you are not used to configuring hardware on your own from the manual.

This device was built by the i.MX applications processor team. There is quite a bit of IP from the i.MX6 and i.MX7 families. The manual is 3600 pages and there are a ton of details. It is a very powerful and flexible part, so it also takes some time to get up to speed. There are many boot options, execution options, etc.

3.) You mentioned QFP for low cost board. Depending on your boot options, you will need QSPI or Hyper Flash (which is an 8-bit DDR interface). Both require controlled impedance and matched trace lengths for high speed operation. Using the QFP part still means a minimum 4-layer board with proper reference planes and skinny traces to get the required transmission line impedance. You may not be saving as much money as you think going with the QFP.

4.) If you are executing out of the external flash, there is a cache and it will need to be tuned for your use case. Fastest execution is from I-TCM only. The RAM controller allows different mixes of I-TCM, D-TCM and OC-RAM.
It has a much faster clock, but there are many other aspects to execution speed.

luiHS · « **Reply #16 on:** May 24, 2018, 06:05:27 pm »

Quote from: ehughes on May 24, 2018, 05:31:54 pm

2.) Just as a warning: It is the most difficult M7 to get started with. It sounds from your questions that you are not used to configuring hardware on your own from the manual.

To configure the hardware, I work with software assistants: CubeMX for STM32, MCUXpresso Config Tools for Kinetis. I think that Atmel Studio also provides some wizard for Atmel SAM to configure all the hardware without needing to write the source code for it.

Quote

This device was built by the i.MX applications processor team. There is quite a bit of IP from the i.MX6 and i.MX7 families. The manual is 3600 pages and there are a ton of details. It is a very powerful and flexible part, so it also takes some time to get up to speed. There are many boot options, execution options, etc.

I do not much like data sheets. Generally they are very hard documents and not practical at all, without a single example, which is what helps the most to understand something. I prefer tutorials. I only check the data sheet for details about some parameter, never to understand how something works or how to configure it.

Quote

3.) You mentioned QFP for low cost board. Depending on your boot options, you will need QSPI or Hyper Flash (which is an 8-bit DDR interface). Both require controlled impedance and matched trace lengths for high speed operation. Using the QFP part still means a minimum 4-layer board with proper reference planes and skinny traces to get the required transmission line impedance. You may not be saving as much money as you think going with the QFP.

Most of my current PCBs are quite large 2-layer boards. Converting them to 4 layers would greatly increase the cost, so it is not something I plan to do.

I understand that when working with high-speed signals to an external flash or RAM, the tracks of a parallel bus must all have the same length. I do not know whether the same consideration applies to a QSPI bus (I have not used it yet). In any case I have no problem designing the circuitry with those constraints in mind. I have enough experience with Eagle and Altium, and both provide tools for it (differential pairs and meanders).

luiHS · « **Reply #17 on:** May 24, 2018, 06:46:08 pm »

I just checked QSPI, and I see that there are 4 data lines instead of the 2 for SPI, so: clock, chip select and four data lines. I understand that for high-speed operation, the 6 tracks must have the same length from the i.MX microcontroller to the external QSPI flash memory.

The data sheet of the RT1020 is not yet available, but I can check the data sheet and schematic of the RT1050. That board works with an IS25WP064AJBLE QSPI flash memory, and I will use the same on my custom board with the RT1020 when it is available in June.

I am very interested in the RT1020 to replace all my current Cortex M4 parts. This microcontroller seems very powerful and cheap, and it is my preferred choice of Cortex M7. I hope that NXP does not have stock problems like those that have affected the Kinetis for some time, which are increasingly difficult to obtain. The price of the RT1020 at Avnet seems really cheap, although I have to add the cost of the external QSPI or HyperFlash memory.

ataradov · « **Reply #18 on:** May 24, 2018, 06:48:53 pm »

QSPI is not really "fast" by modern standards. If your traces are under 2" long, I would not worry about trace length matching.

luiHS · « **Reply #19 on:** May 24, 2018, 08:26:59 pm »

Quote from: ataradov on May 24, 2018, 06:48:53 pm

QSPI is not really "fast" by modern standards. If your traces are under 2" long, I would not worry about trace length matching.

Well, thanks. In any case, I will design with matched track lengths for the QSPI flash. It is not difficult to do, and it lets me try out the meander function of Eagle. I have never used it before because my designs are all with Cortex M4, with a maximum clock speed of 180 MHz and internal flash.

I will probably also design a second board with HyperFlash memory, to test both. I understand that track length matching on the data bus will be more critical in this design. I do not know whether that also applies to the control lines.

I think that the RT1020/RT1050 can also boot and run software from an SD card (SDIO, 4 parallel lines). I will test that too.

Siwastaja · « **Reply #20 on:** May 25, 2018, 10:56:17 am »

Quote from: luiHS on May 23, 2018, 07:57:44 pm

Can anyone suggest where to find good information or tutorial on DMA for STM32

The reference manual. In STM32, the DMA is fairly straightforward and just works; quite sure it's very similar on most ARM MCUs. Not too many registers to configure; set the source and destination addresses, data sizes, num of transfers, and the DMA request channel numbers, and off you go. There are some small catches like the need to clear interrupt flags before enabling the DMA on STM32, but it's still about 5-10 lines of code total to make it work.
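As a rough sketch of that sequence, assuming a register layout loosely modeled on an STM32F2/F4 DMA stream: the `dma_stream_t` struct, the bit macros and `dma_start_mem_to_periph` are illustrative names, not ST's headers or HAL, and the bit positions and the flag-clear register must be checked against the reference manual for the actual part.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register block for one DMA stream. On real hardware this
   struct would be overlaid on the controller's address; here it is plain
   memory so the sequence can be exercised on a host. */
typedef struct {
    volatile uint32_t CR;    /* configuration */
    volatile uint32_t NDTR;  /* number of data items to transfer */
    volatile uint32_t PAR;   /* peripheral (data register) address */
    volatile uint32_t M0AR;  /* memory buffer address */
} dma_stream_t;

/* Assumed bit positions; verify against the reference manual. */
#define DMA_CR_EN        (1u << 0)            /* stream enable */
#define DMA_CR_TCIE      (1u << 4)            /* transfer-complete interrupt */
#define DMA_CR_DIR_M2P   (1u << 6)            /* memory-to-peripheral */
#define DMA_CR_MINC      (1u << 10)           /* increment memory address */
#define DMA_CR_CHSEL(n)  ((uint32_t)(n) << 25) /* request channel select */

void dma_start_mem_to_periph(dma_stream_t *s,
                             volatile uint32_t *flag_clear_reg,
                             uint32_t flag_clear_mask,
                             unsigned channel,
                             uint32_t periph_addr,
                             uint32_t mem_addr,
                             uint16_t count)
{
    assert(s != 0 && flag_clear_reg != 0);
    assert(channel < 8u);

    /* Disable the stream before touching its configuration. Real hardware
       may need a wait here until the enable bit reads back as 0. */
    s->CR &= ~DMA_CR_EN;

    /* Clear stale interrupt flags first, otherwise the stream may refuse
       to start or report completion immediately. */
    *flag_clear_reg = flag_clear_mask;

    s->PAR  = periph_addr;   /* destination: peripheral data register */
    s->M0AR = mem_addr;      /* source: start of the buffer */
    s->NDTR = count;         /* number of items */
    s->CR   = DMA_CR_CHSEL(channel) | DMA_CR_DIR_M2P |
              DMA_CR_MINC | DMA_CR_TCIE;
    s->CR  |= DMA_CR_EN;     /* go */
}
```

The peripheral side still has to be told to issue DMA requests (for example a "TX DMA enable" bit in the SPI or USART), which is outside this sketch.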

If it takes more than 30 minutes, or more than 10 lines of code, with any library/framework/whatever, the library fails its only job (making things easier to use than without it), and can be simply ditched.

For me, it's hard to believe the lack of information you describe. DMAs should be documented in the reference manuals for any ARM MCU vendor; maybe some Chinese ultra cheap or otherwise "weird" processors might be an exception.

The most difficult aspect is working around DMA request matrix connectivity limitations (depending on MCU, of course; I had quite a hard time on STM32F205 recently). If you need DMA for only one peripheral, everything's simple, but if you need multiple (simultaneously), you need to extend your PCB design time considerations so that you not only look at which peripheral is available at which pins, but also whether these peripherals can all work on DMA simultaneously, to avoid ugly surprises when your design is already built and you are developing the software. For example, USART1 and SPI1 might not be available through DMA simultaneously, but USART2 and SPI1 would work; this needs to be thought over during schematic and layout capture.
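One way to run that check before committing to a pinout is to write the request mapping down as a table and let a few lines of code search for a conflict-free assignment. This is only a sketch: `dma_request_t`, `dma_requests_fit` and any masks you feed it are hypothetical, and the real request-to-stream table has to be copied from the reference manual of the chosen part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One DMA request and the streams able to serve it (bit n = stream n). */
typedef struct {
    const char *name;         /* for readability only */
    uint8_t     stream_mask;  /* streams this request can be routed to */
} dma_request_t;

/* Backtracking search: give req[0] a free stream, then recurse on the rest.
   'used' is the set of streams already taken by earlier requests. */
static bool assign_streams(const dma_request_t *req, size_t n, uint8_t used)
{
    if (n == 0) {
        return true;  /* every request got its own stream */
    }
    uint8_t candidates = (uint8_t)(req[0].stream_mask & ~used);
    for (unsigned s = 0; s < 8u; s++) {
        uint8_t bit = (uint8_t)(1u << s);
        if ((candidates & bit) &&
            assign_streams(req + 1, n - 1, (uint8_t)(used | bit))) {
            return true;
        }
    }
    return false;  /* no free stream left for req[0] */
}

/* Returns true if all n requests can use DMA at the same time. */
bool dma_requests_fit(const dma_request_t *req, size_t n)
{
    assert(req != NULL || n == 0);
    return assign_streams(req, n, 0);
}
```

With two requests that can each only be routed to the same single stream, `dma_requests_fit` returns false; give one of them an alternative stream and it returns true. Parts with more than one DMA controller would need one such check per controller, or a wider mask.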

ataradov · « **Reply #21 on:** May 25, 2018, 04:23:21 pm »

I fully agree with the previous poster. All DMAs are pretty much the same and have been since the beginning of time. All you need to do is go over all registers one by one and set appropriate values. There will be only one or two registers that need any amount of thinking (configuration registers); the rest are just pointers to various memory locations.

MT · « **Reply #22 on:** May 26, 2018, 11:46:05 pm »

Quote from: julianhigginson on May 24, 2018, 06:21:41 am

Are you using HAL with the STM32? I find the DMA stuff I have set up in HAL for the STM32F767 is very simple and seems to work well (unlike SPI...).

It's the opposite. Besides, the F4 DMA is seriously inflexible when you want to do some more advanced configs with autonomous data movements using other peripherals.

hans · « **Reply #23 on:** May 27, 2018, 07:56:51 am »

Yep.. but in defense of the OP, that is if you know what you're doing and are fully familiar with the theory of operation. I've worked with DMAs on various ARM processors and PIC32, and once you get the hang of each processor family (e.g. the documentation writing style) they are all equally simple to set up.

One key thing to note is to always check which DMA IRQs are supported on which channels. I know that for PIC32 this could be any source, so it was never an issue, but on STM32F4 I think there were 2x8 DMA streams, with each stream having 8 IRQ source channels that were fixed. Of course you could configure any stream you desire at any time, but some peripherals may have their IRQ mapped to the same DMA stream, which is a pain if you wanted to use both. Look up the table to see what's up.

Nevertheless, a brief explanation:

DMA can be set up to perform memory transfers in blocks. For this you need a source and destination address, and of course a size (just like memcpy).
What is then needed in addition is a "trigger" for a transfer event. Usually these are mapped onto IRQs of the processor, and are often implemented as a "shadow" to the IRQ controller. I.e. the peripheral must generate an IRQ for DMA to transfer, but the IRQ doesn't have to be enabled on the CPU/NVIC, so no software handler is needed. This works as long as the DMA transfer takes care of clearing the IRQ status.
Now it's just like that: you enable a peripheral and enable the IRQ (sometimes specifically for DMA mode, e.g. on STM32), you set up a DMA channel with source/destination and size, and press start. For a peripheral you often create a pointer to the data register. For memory you give it a pointer to (e.g. the start of) an array you want to transmit from or receive into. You can then wait for DMA to finish or receive an interrupt from the DMA controller when the block transfer is done.
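To make the "memcpy with a trigger" picture concrete, here is a small software model of one channel. It is not a driver and does not correspond to any vendor's register map; `dma_channel_t` and `dma_on_request` are made-up names, and a real controller does this in hardware on each peripheral request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of one DMA channel (hypothetical). */
typedef struct {
    const uint8_t *src;        /* where the next item is read from */
    uint8_t       *dst;        /* where the next item is written to */
    size_t         remaining;  /* items still to transfer */
    bool           src_inc;    /* true for a memory buffer */
    bool           dst_inc;    /* false for a fixed data register */
} dma_channel_t;

/* Models one trigger event from the peripheral: move a single byte,
   advance the pointers that are set to auto-increment, and return true
   once the block is complete (the point where a real controller would
   raise its transfer-complete interrupt). */
bool dma_on_request(dma_channel_t *ch)
{
    assert(ch != NULL);
    if (ch->remaining == 0) {
        return true;  /* nothing left to do */
    }
    *ch->dst = *ch->src;
    if (ch->src_inc) {
        ch->src++;
    }
    if (ch->dst_inc) {
        ch->dst++;
    }
    ch->remaining--;
    return ch->remaining == 0;
}
```

For a transmit, `src` points at the start of the array with `src_inc` set, and `dst` points at the peripheral data register with `dst_inc` clear; a receive swaps the two roles.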

Some further nuances:

- Most DMA peripherals have 3 or 4 operation modes: memory to memory (like a memcpy), memory to peripheral (transmitting data), peripheral to memory (receiving data), peripheral to peripheral (perhaps an odd one).
Some modes like memory to memory can also be run self-timed, i.e. with no IRQ event; the transfer will just complete as quickly as it can.

- Source/destination word width: you need to select how many bytes of data need to be transferred per event.

- A burst size allows you to perform multiple word reads/writes per event.

- Auto increment addressing: actually this is often what decides if a source or destination is "peripheral" or "memory". In memory we have arrays, so we need to auto increment the address pointer to proceed. For peripherals the data register will always stay on the same place.

- Take note of the place of DMA in a processor system. DMA is often hooked up somewhere near the CPU and memory, an extremely busy place. If your CPU data and DMA buffer are located in the same block of SRAM, you could get contention and waits in execution (hence a priority for the transfer can be set). This is a good reason why you see such a complex bus switching matrix on the ARM/STM32 CPUs, and also why it's not a bad idea to keep as much of the CPU-bound data (like heap and stacks in an RTOS) in the TCM SRAM on the STM32 CPUs. But do note: on some parts this tightly coupled SRAM (for example the CCM RAM on the STM32F4) is not accessible by the DMA controller, so any transfer to or from it will fail.
What it could also mean is that you want to manually instruct the C linker (using a C attribute) where some of your DMA buffers should reside, so that you have full control over this.

Some DMA controllers have even fancier features, like addressing strides (auto-incrementing the address pointer by an arbitrary number of words), automatic CRC calculation that can be read after the transfer has finished, etc.
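On the buffer placement point, a hedged sketch of what "instruct the linker using an attribute" can look like with GCC. The section name `.dma_buffer`, the 32-byte alignment, the `DMA_BUFFER` macro and the `dma_can_reach` helper are all assumptions for illustration; on a real target the linker script must map that section into RAM the DMA controller can reach, and the reachable address ranges come from the bus matrix description in the reference manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang on ELF targets: place the object in a named section so the
   linker script can pin it to a chosen RAM block. Elsewhere the macro
   expands to nothing and the buffer lands in ordinary data. */
#if defined(__GNUC__) && defined(__ELF__)
#define DMA_BUFFER __attribute__((section(".dma_buffer"), aligned(32)))
#else
#define DMA_BUFFER
#endif

uint8_t dma_rx_buf[64] DMA_BUFFER = {0};

/* Address range [start, end) that the DMA controller can access. */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} mem_region_t;

/* Returns true if the whole buffer [addr, addr + len) lies inside one
   DMA-reachable region. Useful as a debug-build check before starting
   a transfer, since a transfer into unreachable RAM tends to fail
   without an obvious error. */
bool dma_can_reach(const mem_region_t *regions, size_t n,
                   uintptr_t addr, size_t len)
{
    assert(regions != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (addr >= regions[i].start && addr <= regions[i].end &&
            len <= regions[i].end - addr) {
            return true;
        }
    }
    return false;
}
```

A typical use would be an assertion such as `dma_can_reach(regions, count, (uintptr_t)dma_rx_buf, sizeof dma_rx_buf)` in the driver's start function, with `regions` filled in from the part's memory map.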

luiHS · « **Reply #24 on:** May 28, 2018, 07:03:23 am »

Hello hans.

What documentation did you use to learn how to configure the DMA in each microcontroller, the Reference Manual, Source Code examples ...?

My only good documentation at the moment is the book Mastering STM32 for the STM32. For the Kinetis I only have some Teensy sources, and for the Atmel SAM S70 I cannot find anything.

I hope to have more luck with the NXP RT1020, which is about to go on sale, using MCUXpresso. The data sheet should be available by June at least, and the chip in June or July.

Best Regards

