Author Topic: Looking for Cortex-M3 or M4 with DMA to GPIO capability (Read 28362 times)

tymm · « **Reply #25 on:** February 05, 2015, 03:58:12 am »

Quote from: andersm on February 04, 2015, 08:48:45 am

Quote from: tymm on February 04, 2015, 04:16:42 am
Simplest is if you're DMAing to a whole GPIO port at once; if you're picking and choosing the pins it can get more complicated - though the "bit banding" on many M3's is one way to simplify that.
Most, if not all, manufacturers have ways to set individual GPIO pins. Eg. ST have bit set/reset registers, and NXP have a very unusual method where the LSBs of the address used to access the port become a mask.

sure. but like i said, it gets more complicated... not insanely so, but you need to dig through and match the DMA approach with the register(s) available (and for example can require 2 DMA channels & kill atomicity if you need to access separate set/reset registers).

The low-end ST parts for example have a register that will let you do bit sets & resets as an atomic operation across arbitrary pins on a port, but to use it properly with DMA you'd need to precompute 32 bits per transfer rather than the 16 that you would otherwise write directly to the ODR. Not a hard thing to do at all - just takes an extra XOR per value & a larger data type - but it does make for a more complicated setup - and in some cases even small amounts of data munging can break throughput requirements.

That NXP approach IIRC is basically bit banding, except it's in the address space of the peripheral. I seem to recall NXP also doing some things in that GPIO implementation - at least if it's the same one they used on the LPC11U24 - that made writing decent, low-latency GPIO abstractions a real pain... (ugliness from throwing together assorted registers from multiple GPIO ports into a single block)
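For reference, the Cortex-M3 bit-band mapping mentioned above can be sketched roughly as follows. This is only an illustration of the address arithmetic: the helper name `bitband_alias` is made up for this example, and the base addresses are the standard Cortex-M3 peripheral bit-band region and its alias. Whether a DMA master can actually reach the alias region depends on the part (on many devices only the CPU can), so check the reference manual before relying on it.

```c
#include <stdint.h>

/* Standard Cortex-M3 peripheral bit-band region and its alias region. */
#define PERIPH_BASE_ADDR  0x40000000u
#define PERIPH_BB_ALIAS   0x42000000u

/*
 * Hypothetical helper: return the alias-region word address that maps to
 * a single bit of a peripheral register. Each bit of the bit-band region
 * gets its own 32-bit word in the alias region, hence *32 per byte and
 * *4 per bit.
 */
static uint32_t bitband_alias(uint32_t reg_addr, unsigned bit)
{
    uint32_t byte_offset = reg_addr - PERIPH_BASE_ADDR;
    return PERIPH_BB_ALIAS + byte_offset * 32u + (uint32_t)bit * 4u;
}
```

Writing 1 or 0 to the returned address sets or clears just that bit. As an example, assuming a port output register at 0x4001080C (GPIOA ODR on an STM32F1; verify against your part), bit 5 maps to alias address 0x42210194.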

andersm · « **Reply #26 on:** February 05, 2015, 07:21:13 am »

Quote from: tymm on February 05, 2015, 03:58:12 am

The low-end ST parts for example have a register that will let you do bit sets & resets as an atomic operation across arbitrary pins on a port, but to use it properly with DMA you'd need to precompute 32 bits per transfer rather than the 16 that you would otherwise write directly to the ODR. Not a hard thing to do at all - just takes an extra XOR per value & a larger data type - but it does make for a more complicated setup - and in some cases even small amounts of data munging can break throughput requirements.

You don't even need the XOR. If the same bit is set in both the "set" and "reset" halves, the pin is set, so you can treat the "reset" as a constant bitmask which can be prefilled in your transfer array.
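A minimal sketch of the two ways of precomputing the 32-bit words for an STM32-style set/reset register (set bits in the low half, reset bits in the high half, set winning when both are 1). The function names and the `apply_bsrr` host-side model are made up for this illustration; they are not part of any ST library.

```c
#include <stdint.h>

/* Exact variant: reset only the masked pins that should go low. */
static uint32_t bsrr_exact(uint16_t value, uint16_t mask)
{
    uint32_t set   = (uint32_t)(value & mask);
    uint32_t reset = (uint32_t)(~(uint32_t)value & mask);
    return (reset << 16) | set;
}

/* Constant-reset variant: the reset half is always the full pin mask,
 * relying on "set" taking priority over "reset" for the same pin. */
static uint32_t bsrr_const_reset(uint16_t value, uint16_t mask)
{
    return ((uint32_t)mask << 16) | (uint32_t)(value & mask);
}

/* Host-side model of what the register write does to the output latch
 * (an assumption used only to compare the two variants). */
static uint16_t apply_bsrr(uint16_t odr, uint32_t bsrr)
{
    uint16_t set   = (uint16_t)(bsrr & 0xFFFFu);
    uint16_t reset = (uint16_t)(bsrr >> 16);
    return (uint16_t)((odr & (uint16_t)~reset) | set);
}
```

Both variants leave pins outside the mask untouched and produce the same pin state. With the constant-reset form, the high halfword of every entry in the transfer array can be filled once up front, and only the low halfword changes per sample.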

knik · « **Reply #27 on:** February 05, 2015, 08:59:46 am »

Quote from: Yansi on February 04, 2015, 09:04:31 am

Regarding that ARM is Von Neumann architecture, so it has only one memory space and memory mapped peripherals

No, that's not true.
C-M3 is Harvard architecture, single address space, separate buses.
Some time ago I did some tests on an STM32F1 and, to my surprise, code run from SRAM was slower than the same code run from flash, which suggests Harvard.

donotdespisethesnake · « **Reply #28 on:** February 05, 2015, 10:39:38 am »

Quote from: knik on February 05, 2015, 08:59:46 am

Quote from: Yansi on February 04, 2015, 09:04:31 am
Regarding that ARM is Von Neumann architecture, so it has only one memory space and memory mapped peripherals

No, that's not true.
C-M3 is Harvard architecture, single address space, separate buses.
Some time ago I did some tests on an STM32F1 and, to my surprise, code run from SRAM was slower than the same code run from flash, which suggests Harvard.

No, on Cortex the memory architecture is definitely von Neumann. The fact you can execute code in "data" RAM demonstrates that (since that is the definition of von Neumann!)

The fact there are multiple buses to improve performance does not mean it is Harvard, because it would work exactly the same with one bus. Only if multiple buses are *required* does it mean the architecture is Harvard, IOW if you can't execute code in Data RAM.

knik · « **Reply #29 on:** February 05, 2015, 06:11:47 pm »

Can you please give more details on your definition of von Neumann architecture.
What I could find clearly indicates C-M3 is not a von Neumann.

ARM infocenter states:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka11516.html

Quote

QUESTION
What is the difference between Harvard Architecture and von Neumann Architecture?
ANSWER
The name Harvard Architecture comes from the Harvard Mark I relay-based computer. The most obvious characteristic of the Harvard Architecture is that it has physically separate signals and storage for code and data memory. It is possible to access program memory and data memory simultaneously. Typically, code (or program) memory is read-only and data memory is read-write. Therefore, it is impossible for program contents to be modified by the program itself.
The von Neumann Architecture is named after the mathematician and early computer scientist John von Neumann. von Neumann machines have shared signals and memory for code and data. Thus, the program can be easily modified by itself since it is stored in read-write memory.

Edit:
Just found another quote from ARM:

Quote

http://www.arm.com/files/pdf/IntroToCortex-M3.pdf:
At the heart of the Cortex-M3 processor is an advanced 3-stage pipeline core, based on the Harvard architecture,

dannyf · « **Reply #30 on:** February 05, 2015, 06:14:25 pm »

Quote

on Cortex the memory architecture is definitely von Neumann.

Maybe it is both?

rsjsouza · « **Reply #31 on:** February 05, 2015, 06:42:49 pm »

In some DSPs I used to call it "modified harvard", as the core itself has multiple buses (to prevent bus contention) and the memory used to have dual port access (to allow code and data to be fetched/stored simultaneously). However, the memory space is unified (von Neumann) as you don't have code/data pages.

Hypernova · « **Reply #32 on:** February 06, 2015, 08:51:58 am »

Quote from: rsjsouza on February 05, 2015, 06:42:49 pm

In some DSPs I used to call it "modified harvard", as the core itself has multiple buses (to prevent bus contention) and the memory used to have dual port access (to allow code and data to be fetched/stored simultaneously). However, the memory space is unified (von Neumann) as you don't have code/data pages.

But sometimes the buses can still be set up to encourage you to treat it as pure Harvard. On the F28335 DSP (C2000 core) I use at work, instruction fetches from the 2nd half of RAM incur a 1-wait-state penalty, and vice versa for data fetches from the 1st half.

To stay on the subject of DMAs: annoyingly, the C2000 DMA bus has no access to the GPIO DAT/SET/CLR registers. Last month when I had to do some pattern generation on GPIO pins I ended up commandeering the Ext. Memory interface output register as my data destination and using the EMIF's dedicated data pins instead.

knik · « **Reply #33 on:** February 06, 2015, 09:42:27 am »

Quote from: Hypernova

But sometimes the buses can still be set up to encourage you to treat it as pure Harvard

I suppose that was it in my example.
I ran the same code from RAM and from flash (2 wait states), and the RAM version was slower (single bus).
When run from the slower flash memory it was faster (two buses).

splin · « **Reply #34 on:** February 06, 2015, 04:55:20 pm »

Table 2 from the ST application note AN4566, 'Extending the DAC performance of STM32 microcontrollers' is interesting. It summarises the DMA rates writing from memory to the DAC which is on the APB1 bus. I don't know about DMA to the GPIOs specifically but this should be a good start:

| Product | APB max speed | DAC max sampling rate |
|---|---|---|
| STM32F0 series | 48 MHz | 4.8 Msps |
| STM32F100xx | 24 MHz | 2.4 Msps |
| STM32F101xx, STM32F103xx, STM32F105xx, STM32F107xx | 36 MHz | 4.5 Msps |
| STM32F2 series | 30 MHz | 7.5 Msps |
| STM32F3 series | 36 MHz | 4.5 Msps |
| STM32F40x, STM32F41x | 42 MHz | 10.5 Msps |
| STM32F42x | 45 MHz | 11.25 Msps |
| STM32L0 series | 32 MHz | 4.0 Msps |
| STM32L1 series | 32 MHz | 3.2 Msps |

Rasz · « **Reply #35 on:** February 06, 2015, 08:07:29 pm »

Quote from: splin on February 06, 2015, 04:55:20 pm

Table 2 from the ST application note AN4566, 'Extending the DAC performance of STM32 microcontrollers' is interesting. It summarises the DMA rates writing from memory to the DAC which is on the APB1 bus. I don't know about DMA to the GPIOs specifically

APB2 is 2x the speed
I found this
http://forums.leaflabs.com/topic.php?id=774#post-4683

" about 6.7MHz"

dannyf · « **Reply #36 on:** February 06, 2015, 10:17:13 pm »

Quote

Now the problem is, that GPIO<->DMA transfers seem to be largely omitted by chip manufacturers.

It is generally available on chips equipped with DMA.

rsjsouza · « **Reply #37 on:** February 08, 2015, 07:20:52 pm »

Quote from: Hypernova on February 06, 2015, 08:51:58 am

But sometimes the buses can still be set up to encourage you to treat it as pure Harvard. On the F28335 DSP (C2000 core) I use at work, instruction fetches from the 2nd half of RAM incur a 1-wait-state penalty, and vice versa for data fetches from the 1st half.

Yes, you are correct for the C28x family. I should have been clearer when I replied; I had in mind other DSP families such as C54x and C55x.

Quote from: Hypernova on February 06, 2015, 08:51:58 am

To stay on the subject of DMAs: annoyingly, the C2000 DMA bus has no access to the GPIO DAT/SET/CLR registers.

Yes, I have heard that is quite an annoying limitation of the C28x DMA. It was also a limitation on C54x and C55x.

