Author Topic: Understanding ARM startup code (__scatterload stuff)  (Read 1053 times)

Offline simonlasnier

  • Frequent Contributor
  • **
  • Posts: 360
  • Country: dk
  • www.midronome.com
Understanding ARM startup code (__scatterload stuff)
« on: October 20, 2021, 11:13:15 am »
Hi :)

I recently moved from an STM8 to an ARM Cortex-M23 (GD32E230).

The nice thing about the STM8 is the simplicity, in particular the startup code, which only consists of one assembly file that I could easily read and change.
So to save space on my 32kB ROM I ended up removing pretty much all the startup code apart from setting up the stack pointer. For the variable initialization I just initialized the whole RAM to zero, then initialized in my main the (very few) variables that needed a different value. It worked very well, and I saved loads of ROM space.

I am trying to do the same for the ARM Cortex-M, but I just want to make sure I do not miss anything "important". Like I did for the STM8, I am checking the final ASM code in the project AXF file.

Reading on the net, it seems the __scatterloadXXX ASM functions do the following:
* initialize to zero all global variables which are uninitialized in C code (I guess as a safety?)
* initialize the other global variables (this part just gets removed if I have no initialized variables in my C code)

So far so good. But then comes the stuff I do not get:
* __user_setup_stackheap sets up the stack and the heap... The stack is already set up by the MSP register... What is there left to do? :o
And for the heap I guess it is only a "software" thing since it is not mentioned anywhere in the ARM Cortex Device Generic User Guide?
* After the call to my main() there is the whole exit and __rt_exit, which should just be a dumb forever loop? (or not needed at all since the main() does not ever end)

Unless any of these is used for debugging?

Many questions I know - sorry about that  ;D

Thank you!
Simon

PS: not 100% related, but the STM8 compiler used to create very readable Assembly listings with the associated C code in comments above each small group of Assembly commands.
When it comes to ARM, the only ASM listing I can get from the compiled code is the output of "fromelf.exe", which shows a lot but not the original C code. Is it possible to get a listing with the C source interleaved?
Simon,
Creator of the Midronome - a simple and versatile MIDI Master Clock
Check it out on www.midronome.com
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 4802
  • Country: fi
Re: Understanding ARM startup code (__scatterload stuff)
« Reply #1 on: October 20, 2021, 01:30:55 pm »
ARM startup is really simple because as you say, SP is set by hardware from the vector table.

Initializing globals to zero is not for safety and has nothing to do with ARM startup; it's a C runtime feature, and every C program on every architecture has to do it. The standard guarantees that globals (and statics within functions) without explicit initializers are initialized to zero. Programmers expect this guarantee to hold, but real RAM contains arbitrary values after power-up, so the zeroing has to actually be done somewhere, and the startup code is the place to do it.

Nothing else is really needed. You can even skip the global initialization, but then it ain't standard-compliant C.

Usually you do want initialized globals because they make your life easier, so then you need that, plus the call to main().

It's a matter of taste whether you add something extra to the startup code or do it at the beginning of main().
 
The following users thanked this post: simonlasnier

Offline simonlasnier

  • Frequent Contributor
  • **
  • Posts: 360
  • Country: dk
  • www.midronome.com
Re: Understanding ARM startup code (__scatterload stuff)
« Reply #2 on: October 20, 2021, 01:36:54 pm »
Brilliant  ;D ;D ;D

That is exactly what I had hoped   :phew:

Also I had no idea it was a C standard that non-initialized variables should be set to zero... Gosh I could have saved so many hours writing these =0; everywhere for the past couple of years :palm:
 

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1265
  • Country: se
Re: Understanding ARM startup code (__scatterload stuff)
« Reply #3 on: October 20, 2021, 01:46:33 pm »
Quote
* initialize to zero all global variables which are uninitialized in C code (I guess as a safety?)
* initialize the other global variables (this part just gets removed if I have no initialized variables in my C code)
[--8<--]
PS: not 100% related, but the STM8 compiler used to create very readable Assembly listings with the associated C code in comments above each small group of Assembly commands.
When it comes to ARM, the only ASM listing I can get from the compiled code is the output of the "fromelf.exe", which shows a lot but not the original C code - is that possible?

Both initialization to zero and initialization to an assigned value, for variables defined at file scope (with external or internal linkage) or statically defined in a block scope, are standard C semantics; C programmers rely on that behaviour.
All ARM startup code that I have encountered - NXP, ST, TI, Cypress etc. - takes care of this by default.

I'm not familiar with the GigaDevice toolchain and startup code, but often a number of basic HW initialisations are also performed (e.g. clocks, watchdog, memory protection unit etc.).
The GD32E230 is a Cortex-M23 device, which also has some security features, so some of the initialization you see might be related to that.

If you want to generate the assembler listing and you are using gcc, check the -S option (generates the assembler output and stops) or the --save-temps option, which keeps the .i (preprocessed source) and .s (assembler) intermediate files and is more easily integrated into an existing makefile or other compilation script.
Nandemo wa shiranai wa yo, shitteru koto dake.
 
The following users thanked this post: simonlasnier

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 4802
  • Country: fi
Re: Understanding ARM startup code (__scatterload stuff)
« Reply #4 on: October 20, 2021, 01:50:20 pm »
The whole idea of "implicit" zero-initialization is that zero is such a commonly used initializer value that you can save program memory: all those zero-initialized variables can be clumped into their own group and cleared with a simple loop. These zeroes do not need to be stored anywhere.

But all the other values need to be stored; the initialization loop then reads those stored values from ROM and writes them to RAM.
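
As a rough illustration of the two loops described above, here is a minimal host-side sketch in C. The array names (rom_data_image, ram_data, ram_bss) and the function names are made up for this example; real startup code gets the region addresses and sizes from linker-provided symbols, and the details differ between toolchains.

```c
#include <stdint.h>
#include <stddef.h>

/* Copy the initial values of .data from their ROM image into RAM. */
void copy_data(uint32_t *dst, const uint32_t *src, size_t words)
{
    while (words--) {
        *dst++ = *src++;
    }
}

/* Clear .bss: no ROM image is needed, only a start address and a length. */
void zero_bss(uint32_t *dst, size_t words)
{
    while (words--) {
        *dst++ = 0u;
    }
}

/* Hypothetical stand-ins for the regions a linker would lay out. */
static const uint32_t rom_data_image[3] = { 1u, 42u, 0xCAFEu };
static uint32_t ram_data[3];
static uint32_t ram_bss[4];

/* Roughly what a reset handler does before it calls main(). */
void init_c_runtime(void)
{
    copy_data(ram_data, rom_data_image, 3);
    zero_bss(ram_bss, 4);
}
```

Note that only the three .data words cost ROM space here; the four .bss words cost none, which is the saving the post is talking about.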

At least with some C compilers, explicitly writing "= 0" causes those variables to end up in .data instead of .bss, so that they take up space in ROM to store those zeroes. This is why I prefer not to explicitly initialize to zero. It also avoids excess typing.

You could make the counterargument that being explicit is better and "more readable" than being implicit, but to this I respond that this is such a basic and well-known feature of C that every competent C programmer should know about it  :).
 

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1265
  • Country: se
Re: Understanding ARM startup code (__scatterload stuff)
« Reply #5 on: October 20, 2021, 01:50:55 pm »
Quote
I could have saved so many hours writing these =0; everywhere for the past couple of years :palm:
:-DD
OTOH, I still write =0 when this is a value I expect to rely upon, and leave it uninitialized when I don't care.
I know it's redundant (though not wrong) and semantically exactly the same thing, it's just for my bad memory: when I go back to a piece of code and see the =0 (or equivalent) I know the code is using the initial value somewhere.

EtA: Standard reference, C11, 6.7.9  Initialization, §10:
Quote
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static or thread storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
— if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
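
A small sketch of what those rules mean in practice. All the names below are invented for the example; the point is only that none of these objects has an initializer, yet each one starts out as zero or null because it has static storage duration.

```c
#include <stddef.h>

struct point { int x; int y; };
union reading { int raw; float scaled; };

/* File scope, no initializers: static storage duration. */
int tick_count;                 /* arithmetic type: starts at 0          */
int *rx_buffer_ptr;             /* pointer type: starts as a null pointer */
struct point origin;            /* aggregate: every member starts at 0   */
union reading last_reading;     /* union: first named member starts at 0 */
double gain;                    /* arithmetic type: starts at 0.0        */

/* A block-scope static follows the same rule and keeps its value
   between calls. */
int next_id(void)
{
    static int id;              /* starts at 0, not indeterminate */
    return ++id;
}
```

A plain (automatic) local variable declared without an initializer gets no such guarantee, which is the first sentence of the quoted paragraph.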
« Last Edit: October 20, 2021, 01:58:23 pm by newbrain »
 
The following users thanked this post: simonlasnier

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1265
  • Country: se
Re: Understanding ARM startup code (__scatterload stuff)
« Reply #6 on: October 20, 2021, 01:53:09 pm »
Quote
At least with some C compilers, explicitly writing "= 0" causes those variables to end up in .data instead of .bss, so that they reserve space in ROM to store those zeroes.
Yes that's a worry - does not happen with gcc or clang, AFAICR.
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 4802
  • Country: fi
Re: Understanding ARM startup code (__scatterload stuff)
« Reply #7 on: October 20, 2021, 01:58:58 pm »
It was definitely on gcc, just don't remember which version or even architecture (must have been ARM, AVR or x86) but it was almost a decade ago when I noticed this behavior.
 

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1265
  • Country: se
Re: Understanding ARM startup code (__scatterload stuff)
« Reply #8 on: October 20, 2021, 02:49:49 pm »
Quote
It was definitely on gcc, just don't remember which version or even architecture (must have been ARM, AVR or x86) but it was almost a decade ago when I noticed this behavior.
Just checked with arm-none-eabi-gcc 6.3.1:
Code:
0000000000201010 g     O .data  0000000000000004              a
0000000000201018 g     O .bss   0000000000000004              i
000000000020101c g     O .bss   0000000000000004              u
Where a, i, and u are:
Code:
int i = 0;
int a = 1;
int u;

int main(void)
{
  return 0;
}
 
The following users thanked this post: Siwastaja, simonlasnier

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 625
  • Country: ru
Re: Understanding ARM startup code (__scatterload stuff)
« Reply #9 on: October 20, 2021, 03:34:56 pm »
The startup function names point to Keil (ARM Compiler-based), correct? Unlike GCC, it can compress the .data section and call an appropriate decompressor from __scatterload (it chooses the most efficient approach automatically).
Heap setup is for malloc & friends (so you can drop it if not using dynamic allocation).
__rt_exit is for people who return from main.
« Last Edit: October 20, 2021, 03:37:14 pm by abyrvalg »
 
The following users thanked this post: simonlasnier

Offline emece67

  • Frequent Contributor
  • **
  • Posts: 469
  • Country: es
Re: Understanding ARM startup code (__scatterload stuff)
« Reply #10 on: October 20, 2021, 05:05:35 pm »
Quote from: simonlasnier on October 20, 2021, 11:13:15 am
Hi :)

* __user_setup_stackheap sets up the stack and the heap... The stack is already set up by the MSP register... What is there left to do? :o
And for the heap I guess it is only a "software" thing since it is not mentioned anywhere in the ARM Cortex Device Generic User Guide?
* After the call to my main() there is the whole exit and __rt_exit, which should just be a dumb forever loop? (or not needed at all since the main() does not ever end)



Such stuff is described in the compiler docs. See https://developer.arm.com/documentation/dui0475/m/the-arm-c-and-c---libraries/stack-and-heap-memory-allocation-and-the-arm-c-and-c---libraries/stack-pointer-initialization-and-heap-bounds

If you do not use new/malloc() you can forget about it, but if you use dynamic memory, you can use __user_setup_stackheap to set the stack and heap limits and, perhaps, to fill those areas with 0xDEADBEEF or any other value that is later useful for detecting stack-heap collisions or measuring stack/heap usage.
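
The fill-pattern idea can be sketched like this. It is a host-side illustration only: the function names are made up, and the region is an ordinary array here. On a real target you would paint only the part of the stack below the current SP, otherwise you overwrite the frame you are running in.

```c
#include <stdint.h>
#include <stddef.h>

#define FILL_PATTERN 0xDEADBEEFu

/* Fill a memory region with the marker pattern. */
void paint_region(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        base[i] = FILL_PATTERN;
    }
}

/* Count the words that still hold the pattern, starting at the low end.
   A descending stack grows down toward base, so these words were never
   reached; the rest is the high-water mark of stack usage. */
size_t untouched_words(const uint32_t *base, size_t words)
{
    size_t n = 0;
    while (n < words && base[n] == FILL_PATTERN) {
        n++;
    }
    return n;
}
```

Calling untouched_words() from a debug command or just inspecting the region in the debugger's memory view gives a quick estimate of how close the stack has come to the heap. It is only an estimate: a stack frame that happens to leave a word unwritten, or to write the pattern value itself, will fool it.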

If your programs do not return, you can also forget about the __rt_exit()/_sys_exit() stuff. I use _sys_exit() to signal (with a blinking LED) that someone forgot that you must not return from main().

Regards.

Information must flow.
 
The following users thanked this post: simonlasnier

Offline simonlasnier

  • Frequent Contributor
  • **
  • Posts: 360
  • Country: dk
  • www.midronome.com
Re: Understanding ARM startup code (__scatterload stuff)
« Reply #11 on: October 20, 2021, 06:11:40 pm »
Thank you all for the replies!!  8)

Quote
The startup function names point to Keil (ARM Compiler-based), correct? Unlike GCC, it can compress the .data section and call an appropriate decompressor from __scatterload (it chooses the most efficient approach automatically).
Heap setup is for malloc & friends (so you can drop it if not using dynamic allocation).
__rt_exit is for people who return from main.
You are just SPOT on  ;D
And yes, I should have mentioned I am using ARM Keil - I did not think it would be different from one compiler to another, I just thought it was standard ARM.

And so now I can see why this __scatterload code looks a lot more complicated than I expected (I just imagined a loop initializing the variables): it's because it is decompressing!
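
To make the "why is it more than one loop" point concrete, here is a rough host-side model of a table-driven loader with a decompressing handler. This is not ARM's actual code: the struct layout, the function names and the toy run-length format are all assumptions made for illustration, and the real linker picks among its own compression schemes.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef void (*region_handler)(const uint8_t *src, uint8_t *dst, size_t len);

/* One entry per region: where it is stored, where it must live,
   its size in RAM, and the routine that gets it there. */
struct region {
    const uint8_t *load;
    uint8_t *exec;
    size_t len;
    region_handler handler;
};

/* Plain initialized data: straight copy from ROM to RAM. */
void handler_copy(const uint8_t *src, uint8_t *dst, size_t len)
{
    memcpy(dst, src, len);
}

/* Zero-initialized data: nothing stored in ROM at all. */
void handler_zero(const uint8_t *src, uint8_t *dst, size_t len)
{
    (void)src;
    memset(dst, 0, len);
}

/* Toy run-length decoder standing in for a real decompressor.
   The stored image is a sequence of (count, value) byte pairs;
   counts are assumed to be nonzero. */
void handler_rle(const uint8_t *src, uint8_t *dst, size_t len)
{
    size_t out = 0;
    while (out < len) {
        uint8_t count = *src++;
        uint8_t value = *src++;
        while (count-- > 0 && out < len) {
            dst[out++] = value;
        }
    }
}

/* The loader itself: walk the table and call each region's handler. */
void run_region_table(const struct region *table, size_t entries)
{
    for (size_t i = 0; i < entries; i++) {
        table[i].handler(table[i].load, table[i].exec, table[i].len);
    }
}
```

With a scheme like this, a handler that is not referenced by any table entry does not need to be linked in, which would fit the observation in the first post that the initialized-data part disappears when there are no initialized variables.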

@emece67: thanks for the link :)

@newbrain: as I said, I am using Keil and I think they use their own compiler?
Interesting, using gcc & make for this. How do you debug though, gdb?
I just thought having Keil would get me up and running faster, and having a graphical view for the debugger really really helps IMO.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 3601
  • Country: us
Re: Understanding ARM startup code (__scatterload stuff)
« Reply #12 on: October 20, 2021, 08:12:05 pm »
Quote
And for the heap I guess it is only a "software" thing ...
I did not think it will be different from one compiler to another, I just thought it was standard ARM.
Once you get into "only a software thing", initialization can vary a great deal from one compiler to another, depending on what libraries they use, etc.  In particular WRT "heap", different compilers/libraries can have different implementations of malloc()/etc.

For CM23, I'd start to expect that some build environments would have some support for the "trustzone" features that ARM is pushing.  Maybe adding overrun protection to the stack in addition to "just set SP to some place in memory", and/or "protecting" un-allocated memory.
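
On the heap being "only a software thing": a minimal sketch of the simplest possible allocator shows that nothing in the core is involved, just a block of RAM and a bit of bookkeeping. The names (heap_area, bump_alloc) and the 64-byte size are invented for the example; a real malloc() is far more involved and also supports free().

```c
#include <stdint.h>
#include <stddef.h>

#define HEAP_BYTES 64u

/* The "heap" is just a static array here; on a target it would be a
   RAM range set aside by the linker script or the startup code. */
static _Alignas(8) uint8_t heap_area[HEAP_BYTES];
static size_t heap_used;

/* Hand out the next chunk, rounded up to a multiple of 8 bytes.
   Returns NULL when the request does not fit. Nothing is ever freed. */
void *bump_alloc(size_t size)
{
    size_t rounded = (size + 7u) & ~(size_t)7u;
    if (rounded < size || rounded > HEAP_BYTES - heap_used) {
        return NULL;
    }
    void *p = &heap_area[heap_used];
    heap_used += rounded;
    return p;
}
```

What the startup code has to provide for a library heap is essentially the equivalent of heap_area: a base and a limit. If nothing in the program ever allocates, that setup has nothing to do.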
 

Offline simonlasnier

  • Frequent Contributor
  • **
  • Posts: 360
  • Country: dk
  • www.midronome.com
Re: Understanding ARM startup code (__scatterload stuff)
« Reply #13 on: October 20, 2021, 08:45:54 pm »
Quote
If you want to generate the assembler listing and you are using gcc, check the -S option (will generate the assembler and stop) or the --save-temps option, which will keep the .i (preprocessed source) and the .s (assembler) intermediate products, more easily integrated in an existing makefile or other compilation script.
Thanks for that :) Unfortunately, when I add -S or --save-temps I get the same error either way: "error: use of xxxx is disallowed in this variant of ARM Compiler".
It seems the compiler Keil uses is called armclang and is kinda their own modified version of gcc.
 

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1265
  • Country: se
Re: Understanding ARM startup code (__scatterload stuff)
« Reply #14 on: October 20, 2021, 09:07:51 pm »
Quote
It seems the compiler Keil uses is called armclang and is kinda their own modified version of gcc.
Rather, it's ARM's own version of LLVM/clang.
And, according to the docs, armclang should accept -S and -save-temps as does clang (which is mostly gcc compatible), but I don't know whether Keil uses some special/limited version of it.

Note that I had an extra '-' in my original post, the option is actually -save-temps, not --save-temps.
 

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 625
  • Country: ru
Re: Understanding ARM startup code (__scatterload stuff)
« Reply #15 on: October 20, 2021, 09:45:56 pm »
Looks like a licensing problem. Armclang has tons of optional features controlled by a huge license config file - code size restrictions, enabling of various optimization options, output format restrictions etc.
« Last Edit: October 20, 2021, 09:49:14 pm by abyrvalg »
 

Offline simonlasnier

  • Frequent Contributor
  • **
  • Posts: 360
  • Country: dk
  • www.midronome.com
Re: Understanding ARM startup code (__scatterload stuff)
« Reply #16 on: October 21, 2021, 07:42:20 am »
Quote
For CM23, I'd start to expect that some build environments would have some support for the "trustzone" features that ARM is pushing.  Maybe adding overrun protection to the stack in addition to "just set SP to some place in memory", and/or "protecting" un-allocated memory.
Funny you mention this - it seems TrustZone is optional for the Cortex-M23 - https://developer.arm.com/ip-products/processors/cortex-m/cortex-m23
And I cannot figure out whether it is implemented in my GD32E230 or not  :-//

Looking at the datasheet - https://gd32mcu.21ic.com/data/documents/shujushouce/GD32E230xx_Datasheet_Rev1.3.pdf - it only mentions this:

Quote
* Internal Bus Matrix connected with AHB master, Serial Wire Debug Port and Single-cycle
IO port
* Nested Vectored Interrupt Controller (NVIC)
* Breakpoint Unit(BPU)
* Data Watchpoint and Trace (DWT)
* Serial Wire Debug Port

If I compare with the datasheet of the GD32E505, for example, which has a Cortex-M33 (also with optional TrustZone) - https://www.gigadevice.com/datasheet/gd32e505xxxx-datasheet/ - that one mentions a lot more, like the DSP, FPU, MPU and ITM, which are all optional for the Cortex-M33.

So my guess is that since TrustZone is not mentioned anywhere, it is not implemented in either of these MCUs?
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 3601
  • Country: us
Re: Understanding ARM startup code (__scatterload stuff)
« Reply #17 on: October 21, 2021, 08:42:10 am »
It looks like even "baseline" ARMv8-M has some non-optional stack limit checking:

Quote
The simplest ARMv8-M implementation, without any of the optional extensions, is a Baseline implementation, see ARMv8-M variants on page A1-27. The ARMv8-M Baseline offers improvements over previous M-profile architectures in the following areas:

  • The optional Security Extension.
  • An improved, optional, Memory Protection Unit (MPU) model.
  • Alignment with ARMv8-A and ARMv8-R memory types.
  • Stack pointer limit checking.
  • Improved support for multi-processing.
  • Better alignment with C11 and C++11 standards.
  • Enhanced debug capabilities.

I am, alas, not surprised that the 76-page datasheet you have is not very complete WRT the information it provides :-( (although I've really come to like the section I've seen in SOME datasheets where they go "here are the Cortex-M0+ configuration options, and how they've been set in this particular chip".  It doesn't seem very common, though.  :-( )
   
 

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 625
  • Country: ru
Re: Understanding ARM startup code (__scatterload stuff)
« Reply #18 on: October 21, 2021, 08:47:43 am »
This https://www.gigadevice.com/press-release/gigadevice-launches-low-cost-gd32e230-mcu-series-featuring-the-cortex-m23-generation-core/ GD32E230 launch note says “Subsequent products can also benefit from TrustZone® technology”, so it looks like “not this time”.
 
The following users thanked this post: simonlasnier

Offline simonlasnier

  • Frequent Contributor
  • **
  • Posts: 360
  • Country: dk
  • www.midronome.com
Re: Understanding ARM startup code (__scatterload stuff)
« Reply #19 on: October 21, 2021, 11:06:27 am »
@westfw: Ha ha yes that would be perfect :)
And just FYI, stack limit checking on the Cortex-M23 is mandatory in Secure mode (if TrustZone is implemented), and not available at all in Non-secure mode (or without TrustZone). On the Cortex-M33 you do have MSPLIM and PSPLIM for Non-secure mode.

@abyrvalg: well spotted - that confirms it then  ;)
 

