Learning the STM32H745ZI dual core microcontroller

PDP-1:
Intro
I'm working on bringing up a dual core STM32H745 chip on my own custom PCB, starting with developing code on a Nucleo-H745ZI-Q dev board where I hope I can trust that the hardware works and then porting over to my board once the code is in a workable state. I'm not really a software guy by training and have only muddled around with the F429 series of STM before so I thought it might be useful to post the Nucleo code on GitHub and talk about it here in case anyone wants to follow along and give advice/feedback. I've also never used GitHub before so that will be a learning project too!

My toolchain consists of Visual Studio with the VisualGDB plug-in to run the programmer/debugger. I'm working mostly at a bare metal register level because I find that I often spend as much time trying to figure out what the auto-generated HAL code is doing as I would spend by just RTFM and working it out on my own. I do sometimes have STM32Cube spit out some code and then comb through it and reduce it down from pages of text to the few lines that actually do something.

Anyway, I got the Nucleo board to Blinky state on both cores today and made my first ever two GitHub repos, one for the M4 core and one for the M7 core. So far so good!


Processor Overview
The STM32H745ZI has three main parts:

* A Cortex M7 core that can run up to 480MHz with double-precision floating point math
* A Cortex M4 core that can run up to 240MHz with single-precision floating point math
* A region of shared memory that both cores can access to talk to each other

My reason for choosing this processor is that I'd like to have the M7 core run in almost interrupt-less mode running a hardware control algorithm in a very deterministic way while the M4 core takes care of all of the messy and unpredictable stuff like talking to the outside world and monitoring the data coming out of the M7 to make sure the system is running as intended. I was a bit worried about whether I could pull that off with a single core F429 processor, plus the double-precision floating point unit on the M7 really eases some concerns about loss of precision when the control loop is running fast.


Getting to Blinky x2
Sysprogs, the developers of VisualGDB, provide this guide on how to work with dual core processors. Basically it involves running two instances of Visual Studio, each containing a project for one of the cores. This works but doesn't feel like the best solution since (a) it's a bit cumbersome swapping between the instances all the time and (b) when we get to the point where the two processors start talking to each other they'll need to agree on what the messages being passed back and forth mean, and that will require somehow coordinating a shared set of code files between the projects. I tried to make one Visual Studio solution containing three projects for the M4/M7/common stuff and it kind of worked, but the debugger often got confused between the two cores and crashed. So two instances of Visual Studio it is for now.
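
For the "agree on what the messages mean" part, one common approach is a single header that both projects include, so the message layout is defined in exactly one place. Below is a rough sketch of that idea, not code from the repos: the file name, message IDs, field names and the one-slot mailbox are all made up for illustration. On the real chip the mailbox would have to live in a RAM region both cores can reach, and it would need memory barriers plus either a non-cacheable region or cache maintenance on the M7 side, all of which is left out here.

```c
/* shared_msg.h - hypothetical header included by BOTH the M4 and M7 projects.
 * Every name below is an illustrative assumption, not taken from the thread. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHARED_MSG_VERSION 1u

/* Message IDs both cores agree on */
typedef enum {
    MSG_NONE     = 0,
    MSG_SETPOINT = 1,
    MSG_STATUS   = 2
} shared_msg_id_t;

/* Fixed-width fields only, so both compilers lay the struct out identically */
typedef struct {
    uint32_t version;   /* SHARED_MSG_VERSION of the sender */
    uint32_t id;        /* one of shared_msg_id_t */
    uint32_t sequence;  /* incremented by the sender for each message */
    float    value;     /* payload */
} shared_msg_t;

/* Both builds fail at compile time if the layout drifts */
_Static_assert(sizeof(shared_msg_t) == 16, "shared_msg_t layout changed");

/* One-slot mailbox: the writer fills it, the reader empties it.
 * NOTE: barriers and cache handling are deliberately omitted in this sketch. */
typedef struct {
    volatile uint32_t full;  /* 0 = empty, 1 = message waiting */
    shared_msg_t      msg;
} shared_mailbox_t;

/* Returns 1 if the message was stored, 0 if the slot was still occupied */
static int mailbox_post(shared_mailbox_t *mb, const shared_msg_t *m)
{
    assert(mb != NULL && m != NULL);
    if (mb->full) {
        return 0;
    }
    mb->msg  = *m;
    mb->full = 1u;   /* publish only after the payload is written */
    return 1;
}

/* Returns 1 and copies the message out if one was waiting, else 0 */
static int mailbox_take(shared_mailbox_t *mb, shared_msg_t *out)
{
    assert(mb != NULL && out != NULL);
    if (!mb->full) {
        return 0;
    }
    *out     = mb->msg;
    mb->full = 0u;   /* release the slot after the payload is copied */
    return 1;
}
```

With something like this, the "common stuff" is one file that can sit in its own repo or folder and be referenced by both Visual Studio instances, and the static assert catches the case where one project was built against an older layout.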

I made two instances of Visual Studio, made two empty code projects, and pulled in all of the startup files/linker scripts/etc. into each for the relevant core. (I really hate relying on outside libraries or referenced files to maintain that stuff, I like to have it all under my control.) I pulled in my own GPIO drivers from another project and had each core drive a different LED on the Nucleo on and off in a very simple way. It worked!
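
For anyone following along, register-level LED toggling boils down to something like the sketch below. This is not the driver from the repos, just a minimal illustration: the type and function names are invented, the struct follows the usual STM32 GPIO register order, and the port pointer is passed in so the same code can be tried on a PC against a fake register block. On the board the pointer would be the GPIO port's base address from the reference manual, and the port clock has to be enabled in the RCC first (not shown).

```c
#include <assert.h>
#include <stdint.h>

/* STM32 GPIO port register block, in register-map order.
 * gpio_regs_t and the helpers below are hypothetical names. */
typedef struct {
    volatile uint32_t MODER;    /* 2 bits per pin: 00 in, 01 out, 10 AF, 11 analog */
    volatile uint32_t OTYPER;
    volatile uint32_t OSPEEDR;
    volatile uint32_t PUPDR;
    volatile uint32_t IDR;
    volatile uint32_t ODR;      /* output data, 1 bit per pin */
    volatile uint32_t BSRR;
    volatile uint32_t LCKR;
    volatile uint32_t AFR[2];
} gpio_regs_t;

/* Configure one pin as a general-purpose output */
static void gpio_make_output(gpio_regs_t *port, unsigned pin)
{
    assert(pin < 16u);
    uint32_t moder = port->MODER;
    moder &= ~(3u << (pin * 2u));   /* clear the pin's 2-bit mode field */
    moder |=  (1u << (pin * 2u));   /* 01 = general-purpose output */
    port->MODER = moder;
}

/* Flip one output pin via a read-modify-write of ODR */
static void gpio_toggle(gpio_regs_t *port, unsigned pin)
{
    assert(pin < 16u);
    port->ODR ^= (1u << pin);
}
```

One thing to keep in mind with two cores: the ODR toggle is a read-modify-write, so if both cores ever drive pins on the same port it is probably safer to use the write-only BSRR set/reset register instead, since that avoids one core overwriting the other's bit.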

Next up will be starting the chip up for real, ramping up the clock tree, etc.

jnk0le:

--- Quote from: PDP-1 on June 22, 2024, 02:46:51 am ---My reason for choosing this processor is that I'd like to have the M7 core run in almost interrupt-less mode running a hardware control algorithm in a very deterministic way while the M4 core takes care of all of the messy and unpredictable stuff like talking to the outside world and monitoring the data coming out of the M7 to make sure the system is running as intended. I was a bit worried about if I could pull that off with a single core F429 processor,

--- End quote ---

Jitter wise cortex-m7 is less predictable/deterministic than cortex-m4.
Also, interrupts on cortexm are quite deterministic with a bit of jitter from tail chaining, late arrival, etc. and of course, more from memory subsystem. (caches, waitstates etc.)


--- Quote from: PDP-1 on June 22, 2024, 02:46:51 am ---plus the dual precision floating point unit on the M7 really eases some concerns about loss of precision when the control loop is running fast.
--- End quote ---

https://eprint.iacr.org/2022/405.pdf


--- Quote ---The non-constant timeness was clearly observed when generating two random
double-precision values for addition, with an average runtime of 16 clock cycles
and standard deviation of 4.1. However, when we generated random values in
the same range such they had the same exponents, the runtimes were constant
and consistant at 10 clock cycles. Moreover, when we mixed randomness from
two fixed exponent ranges we observed constant and consistant runtimes of 19
clock cycles.
--- End quote ---

Now your determinism goes out of the window.

PDP-1:

--- Quote from: jnk0le on June 22, 2024, 01:12:07 pm ---Jitter wise cortex-m7 is less predictable/deterministic than cortex-m4.

--- End quote ---
Interesting, is there any known reason why that is? Maybe just the extra complexity of the M7 requiring more clock cycles to coordinate across all the different clock domains inside the chip?


--- Quote from: jnk0le on June 22, 2024, 01:12:07 pm ---Also, interrupts on cortexm are quite deterministic with a bit of jitter from tail chaining, late arrival, etc. and of course, more from memory subsystem. (caches, waitstates etc.)

--- End quote ---
I had concerns about the memory latency, this chip has a semi complex memory layout with a bunch of potential bus masters on each one. If you got a cache miss and had to go get some data and someone else owned the bus you'd get stuck waiting for a while.

The M7 core does have a reasonably large DTCM and ITCM area at 64k each. My guess at this early stage in development is that they will be more than large enough to hold all of the time critical runtime code which should help a lot. There will always be times when you need to go access the other shared memory areas though.


--- Quote from: jnk0le on June 22, 2024, 01:12:07 pm ---[floating point timing] Now your determinism goes out of the window.

--- End quote ---
This is good info, thanks! I did a quick estimate assuming the M7 is running at 480MHz, the loop is iterating at 100kHz, and we do 100 floating point operations per iteration. With those numbers if you got all 'good' calculations at 10 cycles, vs getting all 'bad' calculations at 19 cycles you have a range of spending 20-40% of your loop iteration time in the FPU. In the realistic case you'd get some good and some bad exponents so you'd be jittering around within that range, but since the calculations you'd be doing every cycle would be the same the real jitter would likely be over a narrower band. Still, that isn't insignificant and is definitely a thing to watch.
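
The estimate above as a few lines of C, in case anyone wants to plug in their own numbers. The function name is made up, and the 10 and 19 cycle figures are simply the two values from the quoted paper, taken at face value as a per-operation cost.

```c
#include <assert.h>
#include <stdint.h>

/* Share of one loop period spent in the FPU, in tenths of a percent.
 * Integer math only, so the result is truncated rather than rounded. */
static uint32_t fpu_load_permille(uint32_t core_hz, uint32_t loop_hz,
                                  uint32_t ops_per_loop, uint32_t cycles_per_op)
{
    assert(loop_hz != 0u);
    uint32_t cycles_per_loop = core_hz / loop_hz;   /* 480 MHz / 100 kHz = 4800 */
    assert(cycles_per_loop != 0u);
    return (ops_per_loop * cycles_per_op * 1000u) / cycles_per_loop;
}
```

Calling `fpu_load_permille(480000000u, 100000u, 100u, 10u)` gives 208 and the same call with 19 cycles per operation gives 395, so roughly 21% to 40% of the 4800 cycles available per iteration.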

jnk0le:

--- Quote from: PDP-1 on June 22, 2024, 04:02:54 pm ---
--- Quote from: jnk0le on June 22, 2024, 01:12:07 pm ---Jitter wise cortex-m7 is less predictable/deterministic than cortex-m4.

--- End quote ---
Interesting, is there any known reason why that is? Maybe just the extra complexity of the M7 requiring more clock cycles to coordinate across all the different clock domains inside the chip?

--- End quote ---
Mostly branch predictor.

Accessing uncached peripherals in lower clocked domains increases the (clock relative) latency.


--- Quote from: PDP-1 on June 22, 2024, 04:02:54 pm ---
--- Quote from: jnk0le on June 22, 2024, 01:12:07 pm ---Also, interrupts on cortexm are quite deterministic with a bit of jitter from tail chaining, late arrival, etc. and of course, more from memory subsystem. (caches, waitstates etc.)

--- End quote ---
I had concerns about the memory latency, this chip has a semi complex memory layout with a bunch of potential bus masters on each one. If you got a cache miss and had to go get some data and someone else owned the bus you'd get stuck waiting for a while.

The M7 core does have a reasonably large DTCM and ITCM area at 64k each. My guess at this early stage in development is that they will be more than large enough to hold all of the time critical runtime code which should help a lot. There will always be times when you need to go access the other shared memory areas though.

--- End quote ---

M4 can use D2 memories as its own ITCM/DTCM


--- Quote from: PDP-1 on June 22, 2024, 04:02:54 pm ---
--- Quote from: jnk0le on June 22, 2024, 01:12:07 pm ---[floating point timing] Now your determinism goes out of the window.

--- End quote ---
This is good info, thanks! I did a quick estimate assuming the M7 is running at 480MHz, the loop is iterating at 100kHz, and we do 100 floating point operations per iteration. With those numbers if you got all 'good' calculations at 10 cycles, vs getting all 'bad' calculations at 19 cycles you have a range of spending 20-40% of your loop iteration time in the FPU. In the realistic case you'd get some good and some bad exponents so you'd be jittering around within that range, but since the calculations you'd be doing every cycle would be the same the real jitter would likely be over a narrower band. Still, that isn't insignificant and is definitely a thing to watch.

--- End quote ---

Note that this paper didn't explicitly state anything about denormals, which are usually expected to be even worse than "different exponents".
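
If denormals are a concern, a quick way to see whether the control loop ever produces them is to classify the values while testing. A minimal sketch using the standard `fpclassify` macro from `<math.h>`; the helper names are invented, and flushing to zero in software like this is only one option (the FPU also has a flush-to-zero bit in FPSCR, whose effect on timing would need measuring on the actual part).

```c
#include <assert.h>   /* only needed for sanity checks when trying this on a PC */
#include <float.h>
#include <math.h>

/* Returns nonzero if x is a denormal (subnormal) double */
static int is_subnormal(double x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}

/* Replace a denormal with zero, leave everything else untouched */
static double flush_subnormal(double x)
{
    return is_subnormal(x) ? 0.0 : x;
}
```

`DBL_MIN` is the smallest normal double, so `is_subnormal(DBL_MIN / 2.0)` is true while `is_subnormal(DBL_MIN)` and `is_subnormal(0.0)` are false.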

Sal Ammoniac:
Have you looked into ST's own development tool, STM32CubeIDE? I'd suspect they have a better dual core debugging solution than VisualGDB.

I haven't looked into this option yet myself, but I do have a Nucleo-H745ZI-Q sitting around waiting for me to get to it and I'll be following this thread closely.
