Topic: 32f4 arm32 huge stack frame 0x250+ bytes - why?


peter-h (Topic starter)

  • Super Contributor
  • Posts: 3698
  • Country: gb
  • Doing electronics since the 1960s...
32f4 arm32 huge stack frame 0x250+ bytes - why?
« on: June 29, 2023, 12:28:41 pm »
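I have a call to b_main()

[screenshot: disassembly of the call, a bl to b_main]

and then this

[screenshot: the b_main prologue, a stmdb saving r4-r11 and lr, then sp lowered by 0x250 bytes]
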
I found this when I suddenly lost a chunk from the stack (sp gets loaded before the call at the top).

This is some kind of a stack frame, but why so huge?

Funnily enough, a call is not even needed: this is a one-way road to starting the product and b_main() never exits, so it could be a jump, presumably a long jump. But a Google search for arm32 jumps doesn't turn up anything that makes sense. I think there is some obscure way to load the PC.




Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

hans

  • Super Contributor
  • Posts: 1641
  • Country: nl
Re: 32f4 arm32 huge stack frame 0x250+ bytes - why?
« Reply #1 on: June 29, 2023, 01:00:38 pm »
"bl" is a branch with link, meaning that the return address is saved in lr. If that is handwritten assembler, then I agree, you could optimize it to a plain (non-linked) branch. But in C, it is up to the compiler to figure out whether a function ever returns.

But it won't change the sp reservation. I presume your main() has locals which require space on the stack. Typically the function prologue lowers sp to reserve that space (the stack grows downwards), and the epilogue restores it. I would expect most locals in your main function to be stored/retrieved with [sp + 0xXX], where "XX" is between 0 and 591.
Any nested function call can also have an sp prologue/epilogue, so the total stack usage of your main call tree could be even bigger. There are ways of measuring that: either fill the stack with a 'waterlevel' pattern such as 0xCC, run the program long enough, and then check how high the 'waterlevel' got, or use static analysis.
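A minimal host-side sketch of the 'waterlevel' idea, assuming a full-descending stack (as on Cortex-M) and using a plain array in place of the real stack region. The fill byte 0xCC and the helper names are illustrative, not from any library:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative fill pattern (hans suggests 0xCC; peter-h's startup uses 'S'). */
#define STACK_FILL 0xCCu

/* Paint the whole stack area, lowest address first, with the pattern. */
static void stack_paint(uint8_t *base, size_t size)
{
    memset(base, STACK_FILL, size);
}

/* The stack grows down from base + size towards base, so the never-used
 * part is at the low end. Count the bytes from base that still hold the
 * pattern; everything above them has been written at some point. */
static size_t stack_high_water(const uint8_t *base, size_t size)
{
    size_t untouched = 0;
    while (untouched < size && base[untouched] == STACK_FILL)
        untouched++;
    return size - untouched; /* peak number of bytes used */
}
```

If live data happens to contain the fill byte right at the boundary, the estimate comes out slightly low, so it is worth keeping some headroom on top of the measured figure.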

I think the only optimization here is in the stmdb instruction. If main() never exits, there is no point in saving the caller's r4-r11 and lr registers. But I'm not sure whether the calling convention allows one to skip that. Maybe by defining main as a naked function? The GCC documentation says a naked function gets no compiler-generated prologue/epilogue, so it should not use locals (only basic asm is reliably supported in one). And apparently you need locals, as your main requires 592 bytes of stack space.
« Last Edit: June 29, 2023, 01:02:57 pm by hans »
 
The following users thanked this post: peter-h

Siwastaja

  • Super Contributor
  • Posts: 8173
  • Country: fi
Re: 32f4 arm32 huge stack frame 0x250+ bytes - why?
« Reply #2 on: June 29, 2023, 01:19:38 pm »
Quote
I think the only optimization here is in the stmdb instruction. If main() never exits, there is no point in storing the r4-r11 and lr registers from the caller. But I'm not sure if the C convention allows one to change that. Maybe by defining main as a naked function?

GCC has attribute noreturn, which you should apply to main() and all other non-returning functions, e.g. error handlers that end up in a software reset or an infinite loop. I first encountered the need for this attribute on an ATtiny25 with only 2K of flash. Stacking most of the 32 registers one by one was a significant waste of program memory in such a case! STM instructions of course are not that bad, so the savings are small, and the runtime cost of a few dozen CPU clock cycles at the beginning of main is usually irrelevant.
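A host-side sketch of the attribute on a hypothetical error handler. On the target the body would typically spin in for(;;) or request a reset; here longjmp (which never returns either) stands in so the behaviour can be exercised on a PC. Whether GCC also drops the register saves may depend on compiler version and target, so it is worth checking the disassembly:

```c
#include <setjmp.h>

/* Hypothetical recovery point, host demo only. */
static jmp_buf recovery_point;
static volatile int last_error;

/* GCC attribute: control never comes back from this function, so the
 * compiler need not keep a return path (or warn about a missing return). */
static void fatal_error(int code) __attribute__((noreturn));

static void fatal_error(int code)
{
    last_error = code;
    longjmp(recovery_point, 1); /* never returns, so noreturn is honoured */
}

/* Calls fatal_error() and reports the code it was given. */
static int run_and_catch(int code)
{
    if (setjmp(recovery_point) == 0)
        fatal_error(code); /* does not return */
    return last_error;     /* reached only via longjmp */
}
```

For example, run_and_catch(7) returns 7, and no "control reaches end of non-void function" warning is needed after the call because the compiler knows fatal_error() cannot fall through.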
« Last Edit: June 29, 2023, 01:22:27 pm by Siwastaja »
 

peter-h (Topic starter)

  • Super Contributor
  • Posts: 3698
  • Country: gb
  • Doing electronics since the 1960s...
Re: 32f4 arm32 huge stack frame 0x250+ bytes - why?
« Reply #3 on: June 29, 2023, 01:29:41 pm »
Yes; I sussed out the huge stack frame bit. b_main has a 512-byte local buffer (on the stack), hence that.

I am sure that in the asm code which calls b_main I could do whatever I like - a call or a jump. But the stack frame will still get allocated as soon as b_main() is reached. The way in which the function is reached is not likely to change that.

What bit me in this case was that the asm startup was filling the stack with 'S' (for later examination of stack usage, etc.). In the asm code (which used no stack) one could fill the whole stack, but obviously in C, calling memset() etc., you can't do that, because you are trashing the memset() return address :) So I reduced the fill length by 64 bytes, then 128, then more... then I went to look at what the hell is really on the stack.
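One way to avoid guessing the reduction is to paint only up to a fixed margin below the current stack pointer. A hypothetical sketch, again on a plain array so it runs on a PC; on the target the caller might pass __builtin_frame_address(0) (a GCC builtin) or a value read from sp with inline asm:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill from the low end of the stack area (base) up to
 * 'margin' bytes below 'sp', leaving the live frames above untouched.
 * Assumes a full-descending stack and that sp lies inside the area.
 * Returns the number of bytes painted. */
static size_t paint_below_sp(uint8_t *base, const uint8_t *sp,
                             size_t margin, uint8_t fill)
{
    if (sp < base + margin)
        return 0; /* nothing can be painted safely */
    size_t len = (size_t)(sp - base) - margin;
    memset(base, fill, len);
    return len;
}
```

With a 512-byte area, sp at offset 400 and a 128-byte margin, bytes 0 to 271 get the pattern and everything from offset 272 upwards is left alone.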

I solved it by making b_main just a little function which sets the SP according to the amount of RAM, then calls memset, and then calls b_main_real, which is the original big b_main. Context for the RAM detection is here:
https://www.eevblog.com/forum/microcontrollers/32f417-32f437-auto-detect-of-extra-64k-ram/

Code:

// This is the original main() called from the startup_stm32f407xx.s code.
// We don't do much here, so that B_memset can fill the stack area. The real
// b_main sets up a big stack frame because it has 512-byte buffer(s) on the stack.

void B_main(void)
{
    // Get CPU type and store it

    if ( B_HAL_GetDEVID() == 0x413 )
        B_g_dev_id = 417;
    if ( B_HAL_GetDEVID() == 0x419 )
        B_g_dev_id = 437;

    // Load SP per CPU type.
    // NB: moving sp inside a C function is only safe while this function keeps
    // nothing it still needs on the stack; check the disassembly.

    asm volatile ("ldr sp, = _estack \n");
    if (B_g_dev_id == 437)
    {
        asm volatile ("ldr sp, = _estack+65536 \n");
    }

    // Fill stack with "S". This used to be in the startup .s code.
    // We do it here because it is easier to grab the CPU ID in C code (above).

    // ldr r2, = 0x2001e000  /*   = _estack - _Stack_Size  */
    // b LoopFillStack
    //FillStack:
    // movs r3, 0x53535353   /* fill with 'S' */
    // str  r3, [r2]
    // adds r2, r2, #4
    //LoopFillStack:
    // ldr r3, = _estack     /* = 0x20020000 */
    // cmp r2, r3
    // bcc FillStack

    // We are filling the stack area, but the stack is also used to call B_memset :)
    // So we just reduce the filled length a bit. Only a few bytes of margin are needed.
    // The syntax needed to pick up the linker symbols is weird (the symbol's
    // address *is* the value); the same trick is used in _sbrk().

    extern char _top;
    char* stack_base;            // base of stack (from linkfile)
    stack_base = &_top;
    extern char _Stack_Size;     // size of stack (from linkfile)
    char* stack_size;
    stack_size = &_Stack_Size;

    if (B_g_dev_id == 437)
    {
        stack_base += (64*1024); // 32F437 has stack 64k higher up
    }
    B_memset((char*)stack_base, 'S', (int)stack_size - 128);

    void B_main_real(void);
    B_main_real();

    // We should never get here

    for (;;);
}

Obvious, really :)

I still don't know how to jump, but I did something similar a while ago with this:

Code:

// Enter the normal main() (via main_stub) in the application program.

extern char _code_base;                     // This is base+32k (linker symbol)
uint32_t jmpaddr = (uint32_t)&_code_base;   // The symbol's address is the value we want
jmpaddr |= 1;                               // Bit 0 = 1 for Thumb code
asm volatile ("bx %0" :: "r" (jmpaddr));    // Jump via bx: no return address is saved in lr
__builtin_unreachable();                    // Tell GCC control never comes back here
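For comparison, a sketch of the plain-C route: cast the address to a function pointer and call it. Strictly this is a call (blx on Cortex-M), not a jump, although because it is the last statement GCC may compile it as a tail jump when optimising. The helper and the demo target are hypothetical; the Thumb bit is only forced when GCC's __thumb__ macro is defined, so the same code also runs on a PC:

```c
#include <stdint.h>

typedef void (*entry_fn)(void);

/* Hypothetical helper: enter code at 'addr' through a function pointer. */
static void enter_at(uintptr_t addr)
{
#if defined(__thumb__)
    addr |= 1u; /* bit 0 = 1 keeps the core in Thumb state on bx/blx */
#endif
    entry_fn fn = (entry_fn)addr;
    fn(); /* last statement, so a tail jump is possible at -O2 */
}

/* Host-side demo target only. */
static volatile int demo_entered;

static void demo_target(void)
{
    demo_entered = 1;
}
```

On the target the argument would be something like (uintptr_t)&_code_base; on a PC, enter_at((uintptr_t)demo_target) simply runs the demo function.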

Quote
GCC has attribute noreturn, which you should apply to main() and all other non-returning functions, e.g. error handlers that end up in a software reset or an infinite loop. I first encountered the need for this attribute on an ATtiny25 with only 2K of flash. Stacking most of the 32 registers one by one was a significant waste of program memory in such a case! STM instructions of course are not that bad, so the savings are small, and the runtime cost of a few dozen CPU clock cycles at the beginning of main is usually irrelevant.

Would that not break Cube's debugging stack trace?
« Last Edit: June 29, 2023, 01:32:19 pm by peter-h »
 

eutectique

  • Frequent Contributor
  • Posts: 392
  • Country: be
Re: 32f4 arm32 huge stack frame 0x250+ bytes - why?
« Reply #4 on: June 29, 2023, 08:42:37 pm »
Add -fstack-usage to your compiler options, and inspect the corresponding *.su files:
https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html#index-fstack-usage
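With many .su files, a small filter helps to pick out the big frames. A sketch of parsing one line, assuming the usual GCC layout of file:line:column:function, a tab, the frame size in bytes, a tab, and a qualifier (static, dynamic or dynamic,bounded). The sample line in the comment is made up, and C++ names containing '::' are not handled:

```c
#include <stdio.h>
#include <string.h>

/* One record from a -fstack-usage (.su) file, e.g. (made-up sample)
 * "main.c:91:6:B_main<TAB>24<TAB>static". */
struct su_record {
    char function[64];
    unsigned long bytes;
    char qualifier[16]; /* "static", "dynamic" or "dynamic,bounded" */
};

/* Returns 1 on success, 0 if the line does not match the assumed layout. */
static int parse_su_line(const char *line, struct su_record *out)
{
    const char *tab = strchr(line, '\t');
    if (tab == NULL)
        return 0;

    /* The function name runs from just after the last ':' up to the tab. */
    const char *name = tab;
    while (name > line && name[-1] != ':')
        name--;

    size_t len = (size_t)(tab - name);
    if (len == 0 || len >= sizeof out->function)
        return 0;
    memcpy(out->function, name, len);
    out->function[len] = '\0';

    /* Frame size in bytes, then the qualifier word. */
    return sscanf(tab + 1, "%lu\t%15s", &out->bytes, out->qualifier) == 2;
}
```

Anything flagged dynamic deserves a closer look, since its reported size is not the whole story.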

There are other useful options regarding memory size:
https://embeddedartistry.com/blog/2020/08/17/three-gcc-flags-for-analyzing-memory-usage/
 
The following users thanked this post: paf, peter-h, wek
