Author Topic: 32F4 CCM memory - gotchas?  (Read 814 times)

Offline peter-h

  • Frequent Contributor
  • **
  • Posts: 993
  • Country: gb
  • Doing electronics since the 1960s...
32F4 CCM memory - gotchas?
« on: May 11, 2021, 05:09:49 pm »
My current memory map is

FLASH (rx)      : ORIGIN = 0x08000000, LENGTH = 1024K
RAM (xrw)       : ORIGIN = 0x20000000, LENGTH = 128K
CCM MEMORY (xrw): ORIGIN = 0x10000000, LENGTH = 64K (not used)

The CCM cannot be accessed by DMA, so DMA transfers to main RAM can run at full speed while the CPU keeps working on data in CCM. So it would seem to be a good place to put the stack (which currently is at the top of RAM). That is, until someone does a DMA transfer to a buffer allocated inside a function (i.e. on the stack), etc.
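
One way to catch the stack-buffer case early is to check a buffer address before handing it to a DMA driver. A minimal sketch, assuming the memory map quoted above (64K of CCM at 0x10000000); `buffer_outside_ccm` is a made-up helper name, not part of any ST library:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed from the memory map in this post: 64K of CCM at 0x10000000. */
#define CCM_BASE 0x10000000u
#define CCM_SIZE (64u * 1024u)

/* Hypothetical helper: true if no byte of [addr, addr + len) lies in CCM.
 * It does not prove the buffer is valid for DMA, only that it is not in
 * the one region the DMA controllers cannot reach. Address wrap-around
 * (addr + len overflowing) is not handled in this sketch. */
static bool buffer_outside_ccm(uintptr_t addr, size_t len)
{
    uintptr_t ccm_end = (uintptr_t)CCM_BASE + CCM_SIZE; /* one past last CCM byte */
    uintptr_t buf_end = addr + len;                      /* one past last buffer byte */

    if (len == 0u) {
        return true; /* nothing to transfer */
    }
    /* Two ranges overlap when each one starts before the other ends. */
    return !(addr < ccm_end && buf_end > (uintptr_t)CCM_BASE);
}
```

A driver wrapper could assert on this in debug builds, e.g. `assert(buffer_outside_ccm((uintptr_t)buf, len));` before starting a transfer.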

What are the other gotchas with using the CCM block?
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 90S1200 32F417
 

Offline technix

  • Super Contributor
  • ***
  • Posts: 3503
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
Re: 32F4 CCM memory - gotchas?
« Reply #1 on: May 11, 2021, 05:16:58 pm »
CCM on STM32F4 is not executable, so the best use for it would be the stack. This would let different memory uses go to different places: stack and non-DMA heap in CCM, DMA heaps in the SRAM blocks.
 

Offline peter-h

  • Frequent Contributor
  • **
  • Posts: 993
  • Country: gb
  • Doing electronics since the 1960s...
Re: 32F4 CCM memory - gotchas?
« Reply #2 on: May 11, 2021, 05:35:14 pm »
Interesting. So...

CCM MEMORY (rw)

One could also put things like serial port comms buffers in there.

It is probably most handy for stuff which doesn't need initialising, because to do that you have to add some asm code to the startup .s file. I found this which describes it
https://www.openstm32.org/Using%2BCCM%2BMemory
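
The startup addition amounts to a word copy from the flash load address into CCM, the same thing the stock startup code does for .data. A sketch in C rather than asm; the linker symbol names in the comment (`_siccmram`, `_sccmram`, `_eccmram`) are assumptions about the linker script, and `copy_section_init` is a made-up name:

```c
#include <stdint.h>

/* Copy the initial values of a RAM section from their load address in
 * flash. On target the arguments would be linker symbols, for example
 * (assumed names):
 *     copy_section_init(&_siccmram, &_sccmram, &_eccmram);
 * called before main(), next to the existing .data copy. */
static void copy_section_init(const uint32_t *load, uint32_t *start,
                              const uint32_t *end)
{
    while (start < end) {
        *start++ = *load++; /* one 32-bit word at a time; section is 4-byte aligned */
    }
}
```

This only matters for CCM variables with initialisers. It also needs the section to have a load address in flash, which the linker script has to provide.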

Are there any other gotchas with CCM? Is there anything else that needs RAM access, other than the CPU and DMA?

One concern is that the STM ethernet code uses DMA (apparently uses its own dedicated DMA controller) and if it does DMA to a buffer which is on the stack, that will break. And that library is vast.
« Last Edit: May 11, 2021, 05:37:10 pm by peter-h »
 

Offline harerod

  • Regular Contributor
  • *
  • Posts: 223
  • Country: de
  • ee - digital & analog stuff
    • My services:
Re: 32F4 CCM memory - gotchas?
« Reply #3 on: May 11, 2021, 09:32:40 pm »
The most annoying feature is that the CCM and SRAM are not contiguous. The other drawbacks are listed in the manual. What to do with the CCM? Obvious choice is stack and certain data that is handled by the CPU.
In a current project I run FreeRTOS with static RAM memory allocation. This goes into the 128kB SRAM segment. The CCM is used for our dynamic file system. Thinking about it, that 407 firmware has one of the weirdest memory layouts:
128kB FLASH protected memory "Bootloader"
256kB FLASH Application
640kB FLASH Filesystem / low write/erase rate
128kB SRAM HEAP/Task memory
 64kB CCM dynamic Filesystem
Before you speak, let your words pass through three gates: At the first gate, ask yourself “Is it true?” At the second gate ask, “Is it necessary?” At the third gate ask, “Is it kind?” – Rumi
 

Offline thm_w

  • Super Contributor
  • ***
  • Posts: 2854
  • Country: ca
 

Offline peter-h

  • Frequent Contributor
  • **
  • Posts: 993
  • Country: gb
  • Doing electronics since the 1960s...
Re: 32F4 CCM memory - gotchas?
« Reply #5 on: May 12, 2021, 08:24:52 am »
Yes; funny isn't it how the same thing gets rediscovered by different people :)

Also this site https://www.openstm32.org/Using%2BCCM%2BMemory
is wrong. The statement

char in_ccram_buffer[1024] __attribute__((section("ccmram")));

doesn't work. It has to be the other way round

__attribute__((section("ccmram"))) char in_ccram_buffer[1024] ;

or more cleanly

#define CCMRAM __attribute__((section(".ccmram")))
.
.
.
CCMRAM char in_ccram_buffer[1024];

This appnote also talks about executing code (except ISRs, curiously) from CCM
https://www.st.com/resource/en/application_note/dm00083249-use-stm32f3-stm32g4-ccm-sram-with-iar-ewarm-keil-mdk-arm-and-gnu-based-toolchains-stmicroelectronics.pdf

The 192k bug seems to have been fixed in current ST Cube IDE



And using CCM for the stack crashes. I spent some time on this before realising that _eccmram is not the end of the CCM; it is just the end of the data which has been declared as going into the CCM :)

But actually it crashes anyway, even with
   ldr sp, =0x10010000

which is a mystery. It doesn't crash right away; it crashes in the RTOS, so this is something more complicated.
« Last Edit: May 12, 2021, 11:37:41 am by peter-h »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 7219
  • Country: fr
Re: 32F4 CCM memory - gotchas?
« Reply #6 on: May 12, 2021, 05:51:58 pm »
Quote from: peter-h on May 12, 2021, 08:24:52 am
> Also this site https://www.openstm32.org/Using%2BCCM%2BMemory
> is wrong. The statement
>
> char in_ccram_buffer[1024] __attribute__((section("ccmram")));
>
> doesn't work. It has to be the other way round
>
> __attribute__((section("ccmram"))) char in_ccram_buffer[1024] ;
>
> or more cleanly
>
> #define CCMRAM __attribute__((section(".ccmram")))

I really don't think the order makes any difference, at least with GCC. I think I've tried various orders for attributes and never saw a difference.

Now just a note (may be a typo): you're using "ccmram" as the section name, and in the last #define, you're using ".ccmram" (with a dot). I don't remember what the 'section' attribute expects exactly. Should the prefix dot be there or not? Does it make a difference? Anyway, just noted there was a difference in your examples.

When using specific sections with attributes in your code, it's always a good idea to check the .map file (generate it!) and see if everything really went into the right section. Did you?

Quote from: peter-h on May 12, 2021, 08:24:52 am
> And using CCM for the stack crashes. I spent some time on this before realising that _eccmram is not the end of the CCM; it is just the end of the data which has been declared as going into the CCM :)
>
> But actually it crashes anyway, even with
>    ldr sp, =0x10010000
>
> which is a mystery. It doesn't crash right away; it crashes in the RTOS, so this is something more complicated.

To get out of the unknown, you may want to set up a hard fault handler, and see from there what was the cause of the exception.
 

Offline peter-h

  • Frequent Contributor
  • **
  • Posts: 993
  • Country: gb
  • Doing electronics since the 1960s...
Re: 32F4 CCM memory - gotchas?
« Reply #7 on: May 12, 2021, 07:44:01 pm »
In the linkfile I have

MEMORY
{
  FLASH (rx)      : ORIGIN = 0x08000000, LENGTH = 1024K
  RAM (xrw)       : ORIGIN = 0x20000000, LENGTH = 128K
  MEMORY_B1 (rx)  : ORIGIN = 0x60000000, LENGTH = 0K
  CCMRAM (rw)     : ORIGIN = 0x10000000, LENGTH = 64K
}

and

.ccmram :
  {
    . = ALIGN(4);
    _sccmram = .;       /* create a global symbol at ccmram start */
    *(.ccmram)
    *(.ccmram*)
   
    . = ALIGN(4);
    _eccmram = .;       /* create a global symbol at ccmram end */
  } >CCMRAM

and in the C source I have

#define CCMRAM __attribute__((section(".ccmram")))

CCMRAM uint8_t inbuf[PKT_BUF_LEN];

and yes this definitely works; checked in the symbol list and in the memory usage.

Re the crash, it crashes in osKernelStart(); and it ends up here



and the stack backtrace shows



and this is where it does a stack test which fails



But I understand this is not something which can be debugged on a forum :) I am posting it in case somebody has come across this before.

The FreeRTOS memory management is complicated and I don't understand it (this project was started by someone else). AIUI the RTOS has various memory management options to suit different usage scenarios and AIUI in ours we give it a 48k heap (actually 32k now because the 48k overflowed the 128k RAM once that was brought down from the 192k bug ;) ) and it then allocates the various thread stack requirements out of that. So you could have say 10 threads each with 3.2k of stack. Well, not quite; for some reason allocating more than ~2k to any thread bombs the system.

Also the RTOS code probably assumes the initial stack (which continues to be used in main.c until the RTOS starts, and continues to be used by ISRs) sits in main RAM, so changing _estack = 0x20020000 (top of main 128k RAM) to _estack = 0x10010000 (top of CCM) falls over.

Obviously using the heap in an embedded system is generally dumb (unless one always matches the malloc and free, but then why not just do it in a function and have the data on the stack, to be safely dumped when it returns) but in this case the malloc is done only once as each thread is initialised.

And definitely the linkfile calculation to warn if the heap reaches the bottom of the stack



fails. Once I get the SP in CCM running then I will try to fix that.

This is where I am now



I am generally familiar with text data and bss etc from countless previous projects (I go back to before macro-80) but the syntax here is pretty impenetrable at times.

Grateful as always to anyone with the patience to read this stuff :)
« Last Edit: May 12, 2021, 07:51:08 pm by peter-h »
 

Offline technix

  • Super Contributor
  • ***
  • Posts: 3503
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
Re: 32F4 CCM memory - gotchas?
« Reply #8 on: May 13, 2021, 02:10:08 am »
1. Have a look at the SCB fault status registers to find out why it crashed. You should also enable the separate fault exceptions (MemManage, BusFault, UsageFault) so things can be better distinguished.
2. What is the address of taskSELECT_HIGHEST_PRIORITY_TASK?
 

Offline peter-h

  • Frequent Contributor
  • **
  • Posts: 993
  • Country: gb
  • Doing electronics since the 1960s...
Re: 32F4 CCM memory - gotchas?
« Reply #9 on: May 13, 2021, 05:17:42 am »
 800d032:   4b23         ldr   r3, [pc, #140]   ; (800d0c0 <vTaskSwitchContext+0xa0>)
 800d034:   2200         movs   r2, #0
 800d036:   601a         str   r2, [r3, #0]
      taskSELECT_HIGHEST_PRIORITY_TASK();
 800d038:   4b22         ldr   r3, [pc, #136]   ; (800d0c4 <vTaskSwitchContext+0xa4>)
 800d03a:   681b         ldr   r3, [r3, #0]
 800d03c:   9303         str   r3, [sp, #12]

It bombs in prvPortStartFirstTask.
« Last Edit: May 13, 2021, 05:32:06 am by peter-h »
 

Offline technix

  • Super Contributor
  • ***
  • Posts: 3503
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
Re: 32F4 CCM memory - gotchas?
« Reply #10 on: May 13, 2021, 09:05:01 am »
Quote from: peter-h on May 13, 2021, 05:17:42 am
>  800d032:   4b23         ldr   r3, [pc, #140]   ; (800d0c0 <vTaskSwitchContext+0xa0>)
>  800d034:   2200         movs   r2, #0
>  800d036:   601a         str   r2, [r3, #0]
>       taskSELECT_HIGHEST_PRIORITY_TASK();
>  800d038:   4b22         ldr   r3, [pc, #136]   ; (800d0c4 <vTaskSwitchContext+0xa4>)
>  800d03a:   681b         ldr   r3, [r3, #0]
>  800d03c:   9303         str   r3, [sp, #12]
>
> It bombs in prvPortStartFirstTask.
How do the registers look?
 

Offline peter-h

  • Frequent Contributor
  • **
  • Posts: 993
  • Country: gb
  • Doing electronics since the 1960s...
Re: 32F4 CCM memory - gotchas?
« Reply #11 on: May 13, 2021, 12:30:57 pm »
Normal, PC and SP etc.



The stack has been decremented only very slightly.
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 4217
  • Country: fi
Re: 32F4 CCM memory - gotchas?
« Reply #12 on: May 13, 2021, 12:59:51 pm »
Look at the registers in the SCB block. Specifically, if you are hardfaulting, you want to look at the HFSR, which will tell you why it hardfaulted. Usually the reason is something else that escalated into hardfault though so you need to go further.

This is all described in the ARM core manuals; go step by step to actually find the reason. It's a bit daunting at first; it may take hours to days to learn how to find the reason (and finding the cause usually takes a few steps), but once you get to the actual reason, the fix is usually a 10 second job.

This seems a quite usable tutorial on fault finding: https://interrupt.memfault.com/blog/cortex-m-fault-debug
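
To make the register reading less daunting, here is a small sketch that turns raw HFSR and CFSR values into flag names. The bit positions are the ARMv7-M ones (HFSR at 0xE000ED2C, CFSR at 0xE000ED28) and only a subset of the flags is listed; the function names are made up, and it runs on a PC so you can paste in values read with the debugger:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct fault_bit { uint32_t mask; const char *name; };

static const struct fault_bit hfsr_bits[] = {
    { 1u << 1,  "VECTTBL (bus fault on vector table read)" },
    { 1u << 30, "FORCED (escalated from another fault, check CFSR)" },
};

static const struct fault_bit cfsr_bits[] = {
    /* MemManage (bits 7:0) */
    { 1u << 0,  "IACCVIOL" },  { 1u << 1,  "DACCVIOL" },
    { 1u << 3,  "MUNSTKERR" }, { 1u << 4,  "MSTKERR" },
    { 1u << 7,  "MMARVALID (MMFAR holds the faulting address)" },
    /* BusFault (bits 15:8) */
    { 1u << 8,  "IBUSERR" },   { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" }, { 1u << 11, "UNSTKERR" },
    { 1u << 12, "STKERR" },
    { 1u << 15, "BFARVALID (BFAR holds the faulting address)" },
    /* UsageFault (bits 31:16) */
    { 1u << 16, "UNDEFINSTR" }, { 1u << 17, "INVSTATE" },
    { 1u << 18, "INVPC" },      { 1u << 19, "NOCP" },
    { 1u << 24, "UNALIGNED" },  { 1u << 25, "DIVBYZERO" },
};

/* Print every listed flag that is set in 'value'; returns how many were set. */
static int print_set_bits(const char *reg, uint32_t value,
                          const struct fault_bit *table, size_t count)
{
    int found = 0;
    printf("%s = 0x%08lx\n", reg, (unsigned long)value);
    for (size_t i = 0; i < count; i++) {
        if (value & table[i].mask) {
            printf("  %s\n", table[i].name);
            found++;
        }
    }
    return found;
}

static int decode_hfsr(uint32_t hfsr)
{
    return print_set_bits("HFSR", hfsr, hfsr_bits,
                          sizeof hfsr_bits / sizeof hfsr_bits[0]);
}

static int decode_cfsr(uint32_t cfsr)
{
    return print_set_bits("CFSR", cfsr, cfsr_bits,
                          sizeof cfsr_bits / sizeof cfsr_bits[0]);
}
```

If HFSR shows FORCED, the real cause is in CFSR. A stacking error there (MSTKERR or STKERR) would point at a bad stack pointer.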
 

Offline peter-h

  • Frequent Contributor
  • **
  • Posts: 993
  • Country: gb
  • Doing electronics since the 1960s...
Re: 32F4 CCM memory - gotchas?
« Reply #13 on: May 13, 2021, 03:09:00 pm »
I suspect there is something falling over in FreeRTOS, around here



The MSP and PSP are pointing at the CCM and the main RAM respectively, which is what I would perhaps expect, because the RTOS gives each thread a stack allocated in the heap which is in main RAM.

taskCHECK_FOR_STACK_OVERFLOW is not defined (the stuff is in stack_macros.h)

xpsr
   Hex:0x81000003
   Decimal:-2130706429
   Octal:020100000003
   Binary:10000001000000000000000000000011
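
As an aside, that xPSR value can be picked apart by hand. A small sketch using the ARMv7-M field layout (exception number in bits 8:0, Thumb state bit at bit 24); the function names are made up for illustration:

```c
#include <stdint.h>

/* The low 9 bits of xPSR (the IPSR field) hold the active exception number. */
static unsigned xpsr_exception_number(uint32_t xpsr)
{
    return (unsigned)(xpsr & 0x1FFu);
}

/* Bit 24 is the Thumb state bit; it must be 1 on a Cortex-M. */
static int xpsr_thumb_bit(uint32_t xpsr)
{
    return (int)((xpsr >> 24) & 1u);
}

/* Names for the ARMv7-M system exceptions. */
static const char *exception_name(unsigned n)
{
    switch (n) {
    case 0:  return "Thread mode (no exception active)";
    case 1:  return "Reset";
    case 2:  return "NMI";
    case 3:  return "HardFault";
    case 4:  return "MemManage";
    case 5:  return "BusFault";
    case 6:  return "UsageFault";
    case 11: return "SVCall";
    case 12: return "DebugMonitor";
    case 14: return "PendSV";
    case 15: return "SysTick";
    default: return (n >= 16u) ? "external interrupt (IRQ n-16)" : "reserved";
    }
}
```

For 0x81000003 this gives exception number 3, so the core is sitting in the HardFault handler, with the Thumb bit set as it should be.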


I wonder if the RTOS is expecting the entry stack to be above the heap? The heap could go into the CCM, perhaps.

In the linker script there is this test which will obviously not work for the stack being in CCM




but I don't know how to fix this, or how to move the heap (32k currently) into CCM. Well, perhaps the order in the linkfile is what determines that (heap going after bss, which would be normal) so I did this



and hey... it is running! 10k left in CCM.

The gotcha is that DMA cannot be used for anything on the main stack, the RTOS process stacks (which come out of the heap), or the heap!

There is a remaining issue that when I look at some RTOS process, I see the process stack is still in main RAM. No idea how it is getting there; the RTOS is supposed to be allocating process stacks out of the heap.



That has been solved. I discovered the original programmer left a second 48k heap inside the RTOS as a static, IOW allocated out of bss:



which also explains the huge BSS usage... That RTOS stuff was supposed to be malloced out of the main heap.
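
For anyone wanting to keep that static RTOS heap but choose where it lives: FreeRTOS has a `configAPPLICATION_ALLOCATED_HEAP` option for this. A sketch, assuming a FreeRTOS version and heap scheme that support the option (heap_4.c does in the versions I know; check yours), the 48k size mentioned above, and the `.ccmram` section from the linker script earlier in the thread:

```c
#include <stddef.h>
#include <stdint.h>

/* Normally these two live in FreeRTOSConfig.h; repeated here so the
 * fragment stands alone. 48K is the heap size mentioned in this thread. */
#define configAPPLICATION_ALLOCATED_HEAP 1
#define configTOTAL_HEAP_SIZE ((size_t)(48 * 1024))

/* Same macro as earlier in the thread; ".ccmram" must exist in the
 * linker script. */
#define CCMRAM __attribute__((section(".ccmram")))

/* With configAPPLICATION_ALLOCATED_HEAP set to 1 the heap implementation
 * declares ucHeap as extern instead of defining its own static array, so
 * the application provides it and decides where it goes. */
CCMRAM uint8_t ucHeap[configTOTAL_HEAP_SIZE];
```

Every task stack allocated from this heap then ends up in CCM, so none of them can hold a DMA buffer.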
« Last Edit: May 13, 2021, 06:16:39 pm by peter-h »
 

Offline peter-h

  • Frequent Contributor
  • **
  • Posts: 993
  • Country: gb
  • Doing electronics since the 1960s...
Re: 32F4 CCM memory - gotchas?
« Reply #14 on: May 14, 2021, 07:54:18 am »
Well, I got it all going now. Thank you all.

There is a dramatic performance improvement too. Since I am not currently using ethernet (which has its own DMA controller and uses static buffers in BSS) this is probably due to the CPU doing stuff like cache filling without affecting data operations. On one metric there is a 5x speedup.

It does look like FreeRTOS doesn't like its own stack memory (which is now allocated out of a 48k static buffer sitting in CCM) to be in a different section (lower address) than the general stack (which is also now in CCM). I was seeing the MSP at 0x1... and the PSP at 0x2... and it didn't like that.
 

Offline technix

  • Super Contributor
  • ***
  • Posts: 3503
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
Re: 32F4 CCM memory - gotchas?
« Reply #15 on: May 14, 2021, 10:00:50 am »
Quote from: peter-h on May 14, 2021, 07:54:18 am
> There is a dramatic performance improvement too. Since I am not currently using ethernet (which has its own DMA controller and uses static buffers in BSS) this is probably due to the CPU doing stuff like cache filling without affecting data operations. On one metric there is a 5x speedup.
This is entirely expected: by moving stacks and some of the heap into CCM, you killed a lot of CPU-DMA bus contention.

Quote from: peter-h on May 14, 2021, 07:54:18 am
> It does look like FreeRTOS doesn't like its own stack memory (which is now allocated out of a 48k static buffer sitting in CCM) to be in a different section (lower address) than the general stack (which is also now in CCM). I was seeing the MSP at 0x1... and the PSP at 0x2... and it didn't like that.
You may want to file this as a bug.
 

Offline peter-h

  • Frequent Contributor
  • **
  • Posts: 993
  • Country: gb
  • Doing electronics since the 1960s...
Re: 32F4 CCM memory - gotchas?
« Reply #16 on: May 14, 2021, 01:36:18 pm »
Even with no DMA, the speedup could presumably be due to cache refilling not blocking access to the CCM for stack accesses.

The price paid for this is that the CCM is not executable :)

Re my above comment about the RTOS, it actually seems to work now. The issue was with _sbrk which assumed heap and stack are adjacent, and did a trap if the two met.

There are so many pitfalls with using CCM because all the tools make the above assumption. The linker script likewise.
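
For reference, the _sbrk issue goes away if the heap limit is the end of the heap region rather than the stack. A sketch of that idea; on target the bounds would come from linker symbols (hypothetical names `_heap_start` and `_heap_end`) and the function would be called `_sbrk`, but here a static array stands in so the logic can be run on a PC:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the heap region. On target these bounds would be linker
 * symbols (hypothetical names: _heap_start / _heap_end), wherever the
 * heap has been placed. */
static uint8_t heap_area[1024];
static uint8_t *heap_brk = heap_area; /* current program break */

/* Same contract as newlib's _sbrk(): returns the old break, or (void *)-1
 * with errno set to ENOMEM when the request does not fit. The limit is
 * the end of the heap region, not the current stack pointer. */
static void *sbrk_bounded(ptrdiff_t incr)
{
    uint8_t *const heap_end = heap_area + sizeof heap_area;
    uint8_t *old_brk = heap_brk;

    /* Reject growth past the end and shrinking below the start. */
    if (incr > heap_end - heap_brk || incr < heap_area - heap_brk) {
        errno = ENOMEM;
        return (void *)-1;
    }
    heap_brk += incr;
    return old_brk;
}
```

With this shape it no longer matters whether the stack sits above the heap, below it, or in a different memory altogether.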
« Last Edit: May 14, 2021, 07:42:13 pm by peter-h »
 

Offline technix

  • Super Contributor
  • ***
  • Posts: 3503
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
Re: 32F4 CCM memory - gotchas?
« Reply #17 on: May 15, 2021, 01:00:06 pm »
Quote from: peter-h on May 14, 2021, 01:36:18 pm
> Even with no DMA, the speedup could presumably be due to cache refilling not blocking access to the CCM for stack accesses.
I don't think so, as the cache-equipped Flash interface and the SRAM blocks are different downstream ports on the bus matrix.

Quote from: peter-h on May 14, 2021, 01:36:18 pm
> The price paid for this is that the CCM is not executable :)
Not on those older STM32F4 parts, but on parts like STM32F3 and G4 the CCM is executable, and the F7 has an executable ITCM RAM instead.

Quote from: peter-h on May 14, 2021, 01:36:18 pm
> There are so many pitfalls with using CCM because all the tools make the above assumption. The linker script likewise.
This is why I keep multiple implementations of _sbrk for my projects, to fit different memory layouts.
 