Author Topic: How to create ELF file which contains the normal prog, plus a relocatable block?  (Read 12559 times)

Offline peter-h (Topic starter)

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
I am using ST Cube IDE (32F4) and trying to generate a binary image which comprises my program, plus a 16k block which is linked to run from the start of main RAM.

It would be copied to RAM by a copy loop in the startup .s file - in the same way as data is initialised, bss is zeroed, etc.

A lot of people have been up this path but I wonder what is the best way to achieve this.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
I don't know your linker script.
Assuming that .data is already the first section in RAM, I'd include the .o file containing the 16k block at the beginning of the .data section (in the linker script).
As the .data section is placed in ROM and copied to RAM anyway, there should be no need for an extra copy then.

If the 16k block is not a *.o file, but a raw binary file, then it needs to be converted to a .o file in the first place, e.g. with objcopy.
[ In this case it also won't contain exported symbols which can be directly referenced from your main program. So it is then up to you how to find the entry points (functions) in the code blob. ]

See also
https://stackoverflow.com/questions/17265950/linking-arbitrary-data-using-gcc-arm-toolchain
https://stackoverflow.com/questions/327609/include-binary-file-with-gnu-ld-linker-script/328137
 

Offline peter-h (Topic starter)

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
"As the .data section is placed in ROM and copied to RAM anyway, there should be no need for an extra copy then."

That's a cunning solution :)

However, I don't want to waste 16k of RAM. There is just 128k, plus 64k CCM RAM which is already reserved for stuff. The plan is to use the 16k RAM code for code which can't execute from CPU FLASH: Conditionally (having checked some CRC etc), copy a 1MB portion of an SPI FLASH chip to the CPU FLASH (IOW, program the entire CPU from the SPI FLASH).

And that 16k of RAM would then be "dumped", next time the thing boots up.
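
The "checked some CRC" step could look roughly like the sketch below. It assumes the common reflected CRC-32 (polynomial 0xEDB88320); the thread does not say which CRC is used or where the expected value is stored, and the names crc32_update and image_crc_ok are made up for illustration. It is bitwise rather than table-driven, so it needs no lookup table in flash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320). Start with crc = 0
 * and feed the image in chunks as they are read from the SPI FLASH.
 * The assert is for PC-side testing; build with NDEBUG on the target. */
uint32_t crc32_update(uint32_t crc, const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial if the bit shifted out was 1 */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Hypothetical helper: returns 1 if the image matches the expected CRC. */
int image_crc_ok(const uint8_t *image, size_t len, uint32_t expected)
{
    return crc32_update(0u, image, len) == expected;
}
```

Because the running value is passed in and out, the 1MB image can be checked in small chunks without buffering it all in RAM.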

Linker script:

Code: [Select]
/* Entry Point */
ENTRY(Reset_Handler)

/* Highest address of the main stack */
 /*  _estack = 0x20020000;  */    /* stack in 128K RAM */
 _estack = 0x10010000;    /* stack in  64k CCM - note: configTOTAL_HEAP_SIZE + min_stack_size must not exceed 64k) */
 
/* top of RAM for _sbrk - top of heap check */
 _top = 0x20020000;

/* Heap and stack sizes */
_Min_Heap_Size  = 0xa000;     /* 40k heap - min size; it can grow to end of main RAM  */
_Min_Stack_Size = 0x4000;     /* 16k stack - in CCM */

/* Specify the memory areas */
/* CCMRAM added PH 12/5/2021 - cannot use with DMA */
MEMORY
{
  FLASH (rx)      : ORIGIN = 0x08000000, LENGTH = 1024K
  RAM (xrw)       : ORIGIN = 0x20000000, LENGTH = 128K
  MEMORY_B1 (rx)  : ORIGIN = 0x60000000, LENGTH = 0K
  CCMRAM (rw)     : ORIGIN = 0x10000000, LENGTH = 64K
}

/* Define output sections */
SECTIONS
{
  /* The startup code goes first into FLASH */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(4);
  } >FLASH

  /* The program code and other data goes into FLASH */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
*(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbols at end of code */
  } >FLASH


   .ARM.extab   : { *(.ARM.extab* .gnu.linkonce.armextab.*) } >FLASH
    .ARM : {
    __exidx_start = .;
      *(.ARM.exidx*)
      __exidx_end = .;
    } >FLASH

  .preinit_array     :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array*))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  } >FLASH
  .init_array :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT(.init_array.*)))
    KEEP (*(.init_array*))
    PROVIDE_HIDDEN (__init_array_end = .);
  } >FLASH
  .fini_array :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(.fini_array*))
    KEEP (*(SORT(.fini_array.*)))
    PROVIDE_HIDDEN (__fini_array_end = .);
  } >FLASH

  /* used by the startup to initialize data */
  _sidata = .;

  /* Initialized data sections goes into RAM, load LMA copy after code */
  .data : AT ( _sidata )
  {
    . = ALIGN(4);
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */

    . = ALIGN(4);
    _edata = .;        /* define a global symbol at data end */
  } >RAM

  /* Uninitialized data section */
  . = ALIGN(4);
  .bss :
  {
    /* This is used by the startup in order to initialize the .bss section */
    _sbss = .;         /* define a global symbol at bss start */
    __bss_start__ = _sbss;
    *(.bss)
    *(.bss*)
    *(COMMON)

    . = ALIGN(4);
    _ebss = .;         /* define a global symbol at bss end */
    __bss_end__ = _ebss;
  } >RAM

  /* The heap ends up after BSS in main RAM */
  /* This also checks that the top of the heap doesn't hit the bottom of the stack i.e. how much RAM left */
  /* User_heap_stack section, used to check that there is enough RAM left */
 
  /* ._user_heap_stack : */
  .main_heap :
  {
    . = ALIGN(8);
    PROVIDE ( end = . );
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
 /*   . = . + _Min_Stack_Size; */  /* PH 14/5/2021 stack is in CCM, not here */
    . = ALIGN(8);
   } >RAM
  /*   } >CCMRAM */


  /* MEMORY_bank1 section, code must be located here explicitly            */
  /* Example: extern int foo(void) __attribute__ ((section (".mb1text"))); */
  /* Not used 14/5/2021 - was apparently used for LCD display on ST dev kit */
  .memory_b1_text :
  {
    *(.mb1text)        /* .mb1text sections (code) */
    *(.mb1text*)       /* .mb1text* sections (code)  */
    *(.mb1rodata)      /* read-only data (constants) */
    *(.mb1rodata*)
  } >MEMORY_B1
 
  /* CCM-RAM section
  *
  * IMPORTANT NOTE!
  * If variables placed in this section must be zero initialized,
  * the startup code needs to be modified to initialize this section. 
  * Done PH 12/5/2021
  */
  .ccmram :
  {
    . = ALIGN(4);
    _sccmram = .;       /* create a global symbol at ccmram start */
    *(.ccmram)
    *(.ccmram*)
   
    . = ALIGN(4);
    _eccmram = .;       /* create a global symbol at ccmram end */
  } >CCMRAM

  /* Remove information from the standard libraries */
  /DISCARD/ :
  {
    libc.a ( * )
    libm.a ( * )
    libgcc.a ( * )
  }

  .ARM.attributes 0 : { *(.ARM.attributes) }
}


However, I don't just want to load a chunk of binary data into RAM. That is easy. One could even have a .c file with an initialised uchar array in it; that will automatically end up in initialised data. The challenge is how to implement in effect

ORG 0x20000000
C code goes here

and have this within the same project.

It is obviously possible to generate code located to run from address 0x20000000 if one sets up a separate Cube IDE project and uses that just to generate this boot loader. But then one ends up with two projects to maintain, and switch between them.
« Last Edit: July 12, 2021, 09:19:45 pm by peter-h »
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
Quote
ORG 0x20000000
C code goes here

I think something like

Code: [Select]
SECTIONS {
  ...
  myccode 0x20000000 :
    {
      . = ALIGN(4);
      _smyccode = .;   
      myccode.o
      . = ALIGN(4);
      _emyccode = .;   
    } AT>FLASH
  ...
}

Edit: But keep in mind that your code and .data cannot reside simultaneously at 0x20000000...
Once you copy your code to 0x20000000, .data gets overwritten, so you must ensure that anything from .data is not accessed any more.
« Last Edit: July 12, 2021, 10:36:12 pm by gf »
 

Offline bugnate

  • Regular Contributor
  • *
  • Posts: 58
  • Country: us
Quote from: peter-h
It is obviously possible to generate code located to run from address 0x20000000 if one sets up a separate Cube IDE project and uses that just to generate this boot loader. But then one ends up with two projects to maintain, and switch between them.

I had a similar conundrum some years back for a F103 bootloader (copied itself to SRAM and executed from there so that it could overwrite its own flash space, for reasons). In the end I was not able to get a truly satisfying solution but I was able to keep it to a single project by doing the gymnastics in the Makefile (and conditionals in the Linker file) instead of a totally separate project.
« Last Edit: July 13, 2021, 02:00:32 am by bugnate »
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6242
  • Country: fi
    • My home page and email address
I personally would treat the bootloader (or whatever that is that runs from RAM) as a separate "process": set up a stack, copy it to RAM, and call it.  If it ever returns, assume hardware state is still the same, and continue by resetting the stack, copy initialized RAM from Flash, clear the rest of RAM, call constructors, and finally call main(), as normal.

In other words, implement it as a duplicate environment run prior to the proper one.  No shared resources at all.  (Passing data is difficult, but possible: keep it at the middle of the stack, don't let it be cleared, and use a constructor function that calls the library malloc() function, and copies the passed data to that –– assuming you don't want to keep the RAM reserved for that data until shutdown.)

I don't use ST Cube IDE, but it should not need much meddling in the CRT (assembly run prior to main()) and linker files.  I can probably point out the changes if you provide the linker file and the .s source, though.
 

Offline peter-h (Topic starter)

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
At one level I think the answer must be simple.

For example, to locate an array in the CCM memory, you use

Code: [Select]
#define CCMRAM __attribute__((section(".ccmram")))

// RTOS buffer, used for process stacks etc
CCMRAM uint8_t ucHeap[configTOTAL_HEAP_SIZE];

and then in the linker script (posted above) you have corresponding commands to make it work.

The same method should work for code ("text").

What you won't get, unless you do additional aerobatics, is "data" i.e. initialised data, so e.g.

int x=0;

will not work. You have to use

int x;
x=0;

This is easy to remember. The challenge will be if using libraries written by others, because loads of people do stuff like
for (int x=0; x<10; x++)

It may not work at all if a library is provided as binary. And even if provided as source (e.g. the ST libs) you have to go through it line by line. For example here you see a line right away, and there are plenty more in that HAL function (which is full of crap anyway like all the ST libs)

Code: [Select]
// Send and receive 1 byte
// Returns received byte or zero on failure
static uint8_t at45dbxx_tx_rx_byte(uint8_t data)
{
uint8_t ret = 0;
HAL_SPI_TransmitReceive(&_45DBXX_SPI, &data, &ret, 1, AT45DB_SPI_TIMEOUT_MS);
return ret;
}

If you want to avoid having to go through the code, you have to figure out how to set up a "data" section as well.

In reality this piece of loader code would be used for a specific simple task only: to write the CPU FLASH. In this case the data is read from a serial (SPI) FLASH chip. The code for that is all done and would need to be stripped down / checked / edited to make sure no initialised data section is needed. Then I need to find code to write the CPU FLASH and I am sure many have been up that path.

The two stackoverflow links above are fine if you generate the code using a separate project, but as I say that is basically simple because you could achieve it with a little program which takes in a binary block, or even an .elf file, and generates C source of the form

uint8_t buf[]  = { 0xb5, 0x62, 0x01, 0x35, 0xb5, 0x62, 0x01, 0x35, 0xb5, 0x62, 0x01, 0x35, 0xb5, 0x62, 0x01, 0x35 };

and you just drop that into your C source.

Is there a directive one can use in a C sourcefile to direct initialised data into a section of another name, e.g. "data2"? If that is possible, then it is just another simple bit of asm code in the startup:

Code: [Select]
.syntax unified
  .cpu cortex-m4
  .fpu softvfp
  .thumb

.global  g_pfnVectors
.global  Default_Handler

/* start address for the initialization values of the .data section.
defined in linker script */
.word  _sidata
/* start address for the .data section. defined in linker script */ 
.word  _sdata
/* end address for the .data section. defined in linker script */
.word  _edata
/* start address for the .bss section. defined in linker script */
.word  _sbss
/* end address for the .bss section. defined in linker script */
.word  _ebss
/* stack used for SystemInit_ExtMemCtl; always internal RAM used */

/**
 * @brief  This is the code that gets called when the processor first
 *          starts execution following a reset event. Only the absolutely
 *          necessary set is performed, after which the application
 *          supplied main() routine is called.
 * @param  None
 * @retval : None
*/

    .section  .text.Reset_Handler
  .weak  Reset_Handler
  .type  Reset_Handler, %function
Reset_Handler: 

  ldr   sp, =_estack

/* Copy the data segment initializers from flash to SRAM */ 
  movs  r1, #0
  b  LoopCopyDataInit

CopyDataInit:
  ldr  r3, =_sidata
  ldr  r3, [r3, r1]
  str  r3, [r0, r1]
  adds  r1, r1, #4
   
LoopCopyDataInit:
  ldr  r0, =_sdata
  ldr  r3, =_edata
  adds  r2, r0, r1
  cmp  r2, r3
  bcc  CopyDataInit
  ldr  r2, =_sbss
  b  LoopFillZerobss
/* Zero fill the bss segment. */ 
FillZerobss:
  movs  r3, #0
  str  r3, [r2], #4
   
LoopFillZerobss:
  ldr  r3, = _ebss
  cmp  r2, r3
  bcc  FillZerobss

/* Zero CCM RAM - fills the whole CCM so don't use the stack until afterwards :) */
/* PH 15/5/2021 */
ldr r2, = 0x10000000  /* was _sccmram */
b LoopFillZeroCcm

FillZeroCcm:
movs r3, 0xaaaaaaaa /* this fill intentionally differs from the a5a5a5a5 fill used by FreeRTOS for its stacks */
  str  r3, [r2]
adds r2, r2, #4

LoopFillZeroCcm:
ldr r3, = 0x10010000  /* was _eccmram */
cmp r2, r3
bcc FillZeroCcm

/* Call the clock system initialization function.*/
  bl  SystemInit   
/* Call static constructors */
    bl __libc_init_array
/* Call the application's entry point.*/
  bl  main
  bx  lr   
.size  Reset_Handler, .-Reset_Handler
« Last Edit: July 13, 2021, 08:10:52 am by peter-h »
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14447
  • Country: fr
I was going to suggest looking at the '-fpic' GCC option, which is supposed to generate relocatable code. I think there are ARM-specific options complementing -fpic.
It can apparently make the code significantly larger than without, though.

If you don't have the compiler specifically generate relocatable code, it may not work if you load it at a location different from where it was defined to be at link-time. Function calls within the module are a particular risk: unless a function is inlined, or happens to be called with a relative jump, the call will go to the link-time address, and relative jumps are not guaranteed if you don't use '-fpic'.

A simpler approach is of course to define a dedicated section in the linker script (in the RAM area you want to run your code from), and use attributes in your code to put it in the right section.
If I got it right, you were concerned that this approach would require reserving a RAM block large enough for this, that would be otherwise not accessible for other needs if said code is not loaded or not used? Well, the way I see it, the only way you could reuse this RAM block would be through dynamic allocation anyway. So you can arrange your linker script to make this area available for heap allocation as well. That may additionally require to write your own _sbrk() function, though.
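
A grow-only break allocator of the kind a custom _sbrk() would implement can be sketched as below. This is only an illustration: heap_t, heap_init and heap_sbrk are invented names, and the bounds are passed in explicitly so the logic can be tried on a PC. On the target the bounds would presumably come from linker symbols such as `end` and `_top` in the script posted above, and the function would have to be named _sbrk for newlib to pick it up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *brk;    /* current break: next free byte */
    uint8_t *limit;  /* one past the last usable byte */
} heap_t;

void heap_init(heap_t *h, void *start, size_t size)
{
    assert(h != NULL && start != NULL);
    h->brk = (uint8_t *)start;
    h->limit = (uint8_t *)start + size;
}

/* Return the old break and advance it by incr bytes, or return
 * (void *)-1 if the request does not fit (the failure value newlib's
 * _sbrk uses). Shrinking the heap is not handled in this sketch. */
void *heap_sbrk(heap_t *h, size_t incr)
{
    if (incr > (size_t)(h->limit - h->brk)) {
        return (void *)-1;
    }
    uint8_t *old = h->brk;
    h->brk += incr;
    return old;
}
```

Pointing heap_init at the 16k block once the RAM code is no longer needed would hand that memory to the heap.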
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
Quote from: peter-h
What you won't get, unless you do additional aerobatics, is "data" i.e. initialised data, so e.g.
int x=0;
will not work

Why do you think that you need to handle the (initialized) .data section of your RAM program different than .text? I don't see the difference. When you activate the RAM program, you need to copy the contents of both, .text and .data, from flash to RAM. You can even combine both input sections, .text and .data (and not to forget several others), into a single output section in the linker script, and copy them as a single chunk from flash to RAM upon activation. Different is rather .bss, because you likely don't want to allocate (waste) zero-filled space for it in flash, but rather zero it at runtime.

Quote from: SiliconWizard
If you don't have the compiler specifically generate relocatable code, it may not work if you load it at a location different from where it was defined to be at link-time. Function calls within the module are a particular risk: unless a function is inlined, or happens to be called with a relative jump, the call will go to the link-time address, and relative jumps are not guaranteed if you don't use '-fpic'.

As far as I understand, it is supposed to run at a fixed address anyway. In the linker script, you can specify both, the VMA (virtual memory address) and the LMA (load address) for an output section. VMA is the address where the section is supposed to reside at runtime, and LMA is the address, where the section contents are placed by linker (e.g. in flash, before they are copied to the VMA at runtime). Symbols are relocated by the linker according to the VMA.
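
In C, the copy from LMA to VMA is the same loop the startup file already runs for .data. A minimal sketch, with invented function names and written so that it can be tried on a PC with ordinary arrays:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy a section image word by word from its load address (LMA, in
 * flash) to its run address (VMA, in RAM). vma_end points one word past
 * the end of the section. */
void section_copy(uint32_t *vma_start, const uint32_t *vma_end, const uint32_t *lma)
{
    assert(vma_start <= vma_end);
    for (uint32_t *dst = vma_start; dst < vma_end; ) {
        *dst++ = *lma++;
    }
}

/* Zero a .bss-style range, which has no image in flash. */
void section_zero(uint32_t *start, const uint32_t *end)
{
    assert(start <= end);
    for (uint32_t *dst = start; dst < end; ) {
        *dst++ = 0u;
    }
}
```

On the target the arguments would be the addresses of linker symbols, e.g. section_copy(&_sdata, &_edata, &_sidata) for the standard .data section, with the symbols declared as extern uint32_t.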
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14447
  • Country: fr
Quote from: gf
In the linker script, you can specify both, the VMA (virtual memory address) and the LMA (load address) for an output section. VMA is the address where the section is supposed to reside at runtime, and LMA is the address, where the section contents are placed by linker (e.g. in flash, before they are copied to the VMA at runtime). Symbols are relocated by the linker according to the VMA.

I'll have to look into this. But as far as I understand, that will work only if the locations are fixed (which is probably good enough here for the OP.)

For fully relocatable code that you want to be able to run from anywhere in memory, you'll still need the '-fpic' option. That's probably not useful on a small embedded target though. More so if you're writing an 'OS' that can load code anywhere in RAM.
 

Offline peter-h (Topic starter)

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Relocatable code would be super neat.

"A simpler approach is of course to define a dedicated section in the linker script (in the RAM area you want to run your code from), and use attributes in your code to put it in the right section."

That is how CCM RAM is implemented - see my linker script above. It is easy.

For code, it may be something like

section(".text")

where "text" is the standard section name for executable code. But then one also needs a compiler/linker directive to compile the code to execute from 0x20000000.

This is used in the startup code; a slightly different format since it is assembler



"If I got it right, you were concerned that this approach would require reserving a RAM block large enough for this"

I don't think RAM usage matters, because this code will get copied (by startup) to the base of RAM, 0x20000000, it will do its job, and then will jump to the original startup code, probably Reset_Handler: above. So it doesn't matter if it uses up all 128k of RAM... in fact it may as well use as much as it can because that will speed up FLASH programming.

But the code won't be that big, because it just needs to read the serial FLASH and, subject to CRC check etc, write the CPU FLASH with it. Well, all except a block of CPU FLASH containing the code in question (to prevent bricking the unit if e.g. power was lost during programming). The code for the serial FLASH is all running (in the main project) and even being ex ST HAL bloatware it isn't more than a few k.

One gotcha, apparently, is that the uppermost CPU FLASH is a 128k block, which needs to be erased all at once, so the loader cannot go in there (if you want a totally brick-proof box) :) Well, it can.. this is a secondary issue.
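
To see which erase sector a given address falls in, something like the following could be used. It assumes the usual 1MB STM32F40x/41x layout (four 16k sectors, one 64k sector, then seven 128k sectors); that layout should be checked against the reference manual for the exact part, and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_BASE_ADDR 0x08000000u
#define FLASH_SIZE      0x00100000u   /* 1024K, as in the MEMORY block */

/* Return the erase sector (0..11) containing addr, or -1 if addr is
 * outside the CPU FLASH. */
int flash_sector_of(uint32_t addr)
{
    if (addr < FLASH_BASE_ADDR || addr - FLASH_BASE_ADDR >= FLASH_SIZE) {
        return -1;
    }
    uint32_t off = addr - FLASH_BASE_ADDR;
    if (off < 0x10000u) {
        return (int)(off / 0x4000u);      /* sectors 0-3: 16k each */
    }
    if (off < 0x20000u) {
        return 4;                         /* sector 4: 64k */
    }
    return (int)(4u + off / 0x20000u);    /* sectors 5-11: 128k each */
}

/* Size in bytes of the given sector. */
uint32_t flash_sector_size(int sector)
{
    assert(sector >= 0 && sector <= 11);
    if (sector < 4) {
        return 0x4000u;
    }
    return (sector == 4) ? 0x10000u : 0x20000u;
}
```

With this layout, a loader kept in one of the small 16k sectors at the bottom could be left alone while everything above it is erased and rewritten.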

"Why do you think that you need to handle the (initialized) .data section of your RAM program different than .text? I don't see the difference. When you activate the RAM program, you need to copy the contents of both, .text and .data, from flash to RAM. You can even combine both input sections, .text and .data (and not to forget several others), into a single output section in the linker script, and copy them as a single chunk from flash to RAM upon activation."

I probably need an actual example... From above, I reckon

section(".text2")

will place all subsequent C code into "text2" and then the linker can process that, and the startup code can copy a 16k block from that address to 0x20000000.

But how does one do that with initialised data? Normally, if you write
uint32_t i=5;
then the compiler automatically puts a 32 bit value with 0x00000005 in it into a section called "data". How does one override that in GCC, just for the current source file?
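
For what it is worth, GCC's section attribute can be applied to variables as well as functions, so one possible way is sketched below. The section names .loader_text and .loader_data are invented, and the linker script would need matching input-section patterns to place them; code and data go into differently named sections here to keep their section flags apart. The macros expand to nothing on a non-ARM build so the logic can be tried on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Apply the attributes only when building for the ARM target. */
#if defined(__arm__)
#define LOADER_TEXT __attribute__((section(".loader_text"), noinline))
#define LOADER_DATA __attribute__((section(".loader_data")))
#else
#define LOADER_TEXT
#define LOADER_DATA
#endif

/* Initialised data redirected out of the default .data section. */
LOADER_DATA uint32_t page_size = 256u;

/* Code redirected out of the default .text section.
 * The assert is for PC-side testing; build with NDEBUG on the target. */
LOADER_TEXT uint32_t pages_needed(uint32_t bytes)
{
    assert(page_size != 0u);
    return (bytes + page_size - 1u) / page_size;
}
```

This has to be repeated on every function and variable, which is why selecting the whole object file in the linker script may end up being less work.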

I've done a lot of googling and can't find much. There is a lot of little bits around e.g. execution from RAM prevents caching, so the code runs a bit slower according to some reports, but that's irrelevant.


« Last Edit: July 13, 2021, 07:34:52 pm by peter-h »
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14447
  • Country: fr
Quote from: peter-h
But how does one do that with initialised data? Normally, if you write
uint32_t i=5;
then the compiler automatically puts a 32 bit value with 0x00000005 in it into a section called "data". How does one override that in GCC, just for the current source file?

What exactly would you like to do?
Put that in another section?
 

Offline peter-h (Topic starter)

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
The standard startup code contains this



and that needs to be implemented for the block of code which is getting copied into RAM.

Otherwise, that stuff will end up in the FLASH and remain there, but the FLASH is going to be gradually overwritten by this loader. And in any case the FLASH is not accessible during the programming cycle (the CPU gets a few million wait states).

Maybe there is a way to combine initialised data with the code. That should work, given that the whole lot is going to be copied into RAM, and it won't need the above copy loop either. I have seen references to "RAM" compiler directives; maybe that puts initialised data into the code segment (normally "text" but here we will use a different name).
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
Quote from: peter-h
Maybe there is a way to combine initialised data with the code. That should work, given that the whole lot is going to be copied into RAM,

That's what I already suggested and where I said that I don't see a difference between the handling of .text and .data.

Quote
I have seen references to "RAM" compiler directives; maybe that puts initialised data into the code segment (normally "text" but here we will use a different name).

Input sections can be flexibly re-arranged, combined and placed at the desired VMA and LMA in the linker script, so I don't think that many custom annotations in the C file are necessary (if at all).
I suggest studying the ld documentation in detail, in order to understand linker scripts and the linker's capabilities.

I rather wonder: Is your "RAM program" self-contained (in the sense of not requiring linking with any external libraries)?
Since you want to overwrite flash, your program must not call any library functions which are placed by the linker somewhere in flash.
And vice versa, your program must not define and export any (library) symbols which happen to be referenced from any code residing in flash (except for the entry point).
If this is not granted, a crash is inevitable.
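
The map file is the obvious place to check this. As an extra guard, the addresses of everything the RAM program calls could be checked against the memory regions, roughly as sketched below. The region values are copied from the MEMORY block of the linker script posted above; the type and function names are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t origin;
    uint32_t length;
} mem_region_t;

/* Values copied from the MEMORY block of the linker script. */
const mem_region_t REGION_FLASH = { 0x08000000u, 1024u * 1024u };
const mem_region_t REGION_RAM   = { 0x20000000u, 128u * 1024u };

/* Returns 1 if addr lies inside the region, 0 otherwise. */
int addr_in_region(uint32_t addr, const mem_region_t *r)
{
    assert(r != NULL);
    return addr >= r->origin && (addr - r->origin) < r->length;
}

/* Returns 1 if every address in the table lies in main RAM, i.e. none
 * of the listed entry points would vanish when flash is erased. */
int all_in_ram(const uint32_t *addrs, unsigned count)
{
    for (unsigned i = 0; i < count; i++) {
        if (!addr_in_region(addrs[i], &REGION_RAM)) {
            return 0;
        }
    }
    return 1;
}
```

A table of function addresses cast to uint32_t could be passed to all_in_ram() before the first erase, refusing to continue if it returns 0.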

If you cannot renounce the use of external libraries, then I guess that you may need to pre-link your "RAM program" with the required libs, and strip it, in order to obtain a self-contained .o file w/o external symbol references, before linking the latter into the image.
 

Offline peter-h (Topic starter)

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Yes it will be totally self contained.

I am working on it now.

Turns out that within the C code you can put a function into a particular section (e.g. "code_in_ram") by declaring it with:

__attribute__ ((long_call, section (".code_in_ram"))) void myfunction(void)
{
}

this is the same feature as I posted above, to define variables in the CCM RAM. For functions you also need the long_call attribute so that it uses a long address to call code in another section.

Still not sure what happens with initialised data. However, it's just been pointed out to me that any instance within a function is fine because the compiler inserts code to do that. It is only initialised data outside any function which needs this, and I can avoid that. I knew that but forgot :)
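
To illustrate the distinction, here is a small sketch with the usual section for each object noted in the comments. The names are made up, and the placement is what GCC normally does rather than anything guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

uint32_t retry_limit = 3u;   /* file scope, initialised: .data, needs the flash to RAM copy */
uint32_t error_count;        /* file scope, no initialiser: .bss, zeroed by startup */
static const uint8_t sync[2] = {0xb5, 0x62};   /* const: normally .rodata, which the script above keeps in FLASH */

/* Count occurrences of the two sync bytes in buf. 'hits' and 'i' are
 * automatic variables: the "= 0" is done by instructions inside the
 * function, not by the startup copy loop.
 * The assert is for PC-side testing; build with NDEBUG on the target. */
size_t count_sync(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t hits = 0;
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == sync[0] && buf[i + 1] == sync[1]) {
            hits++;
        }
    }
    return hits;
}
```

Note that const tables and string literals are the ones to watch in a RAM loader: they count as read-only data, so unless the linker script moves them they stay in the FLASH that is about to be erased.
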
« Last Edit: July 14, 2021, 07:40:58 am by peter-h »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4028
  • Country: nz
Putting long_call there seems archaic. Why not let the compiler/linker figure it out?  You can use -mlong-calls in the compiler switches to make the long/short call decision automatic. If the flash and RAM addresses are less than 64 MB apart then it's unnecessary anyway.
 

Offline peter-h (Topic starter)

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Let's say I have a file, say loader.c, and in it is a collection of functions, one of which is loader().

Actually loader() is at the very end of the file (easier, since all supporting functions in it are static and w/o prototypes).

What is the simplest way to copy that code, say 16k regardless of how much smaller than 16k it actually is, to some address in RAM, and then jump to it?

I want to do it from a statement at the start of loader(). IOW, from the C source file.

Presumably it would use memcpy.

IOW, not have the flash-> RAM copying code in startupxxx.s, which is what everybody else is doing. The reason for this is that I have a load of code in main() which sets up I/O, peripherals, etc, and loader() will get called from main() before interrupts are enabled. If the loader is activated from startup, it needs to contain duplicates of all the I/O etc config, which is daft and looking for trouble later.

I can think of truly horrible ways to find the start address of the block of code, e.g. by declaring a unique string inside a function at the start of it, and searching the flash for it ;)

When loader() is finished it will never return to main(); it will reboot the CPU, and (assuming the flashing operation completed ok) it won't get run again, so it doesn't matter how much mess it makes in the RAM. The SP is set to the top of CCM in startup, which is miles away.
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
Below is how I guess that it may work.


Here is a dummy loader.c, containing code, initialized and uninitialized data. It does not do anything useful; it is just a test of section placement.
Compile with: arm-none-eabi-gcc -c loader.c (don't optimize this dummy loader, otherwise the optimizer eliminates the .data and .bss segments)
Eventually replace this with the actual loader.

Code: [Select]
static const char* pp = "abcdef";
static char s[] = "sadfasfasdf";
static char buffer[1024];

void loader(void)
{
    const char *p1 = pp;
    const char *p2 = s;
    const char *p3 = buffer;
}
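
The copy-and-jump step asked about (memcpy from main(), then jump) might be sketched like this. The function names are invented, and the code is written so that it can be tried on a PC; on the target the arguments would presumably be the addresses of symbols like _loader_start and _loader_loadaddr plus the value of _loader_size from the linker script further down in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Cortex-M executes Thumb code only, so an address used as a branch
 * target through a function pointer must have bit 0 set. */
uintptr_t thumb_entry(uintptr_t code_addr)
{
    assert((code_addr & 1u) == 0u);   /* expects the plain, even code address */
    return code_addr | 1u;
}

/* Copy 'size' bytes of loader image from its load address to its run
 * address and return the address to branch to for the code located at
 * 'entry_offset' within the image. */
uintptr_t stage_loader(void *run_addr, const void *load_addr,
                       size_t size, size_t entry_offset)
{
    assert(entry_offset < size);
    memcpy(run_addr, load_addr, size);
    return thumb_entry((uintptr_t)run_addr + entry_offset);
}
```

On the target the result would be cast to a function pointer and called, e.g. ((void (*)(void))entry)(). If the loader is linked at its RAM address, as in the script below, a plain call to loader() after the memcpy should do the same job, because the linker has already resolved that symbol to the run address.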


Suggested modification for your linker script, for placing loader.o at proper VMA and LMA.

Code: [Select]
/* Entry Point */
ENTRY(Reset_Handler)

/* Reference loader to ensure that it gets linked in */
EXTERN(loader)

/* Highest address of the main stack */
 /*  _estack = 0x20020000;  */    /* stack in 128K RAM */
 _estack = 0x10010000;    /* stack in  64k CCM - note: configTOTAL_HEAP_SIZE + min_stack_size must not exceed 64k) */
 
/* top of RAM for _sbrk - top of heap check */
 _top = 0x20020000;

/* Heap and stack sizes */
_Min_Heap_Size  = 0xa000;     /* 40k heap - min size; it can grow to end of main RAM  */
_Min_Stack_Size = 0x4000;     /* 16k stack - in CCM */

/* Specify the memory areas */
/* CCMRAM added PH 12/5/2021 - cannot use with DMA */
MEMORY
{
  FLASH (rx)      : ORIGIN = 0x08000000, LENGTH = 1024K
  RAM (xrw)       : ORIGIN = 0x20000000, LENGTH = 128K
  MEMORY_B1 (rx)  : ORIGIN = 0x60000000, LENGTH = 0K
  CCMRAM (rw)     : ORIGIN = 0x10000000, LENGTH = 64K
}

/* Define output sections */
SECTIONS
{
 /* loader_bss and loader sections must come first, in order to override the wildcards in subsequent sections */
 loader_bss _loader_end : {
  _loader_bss_start = .;
  loader.o(.bss*)
  . = ALIGN(4);
  _loader_bss_end = .;
 }
 loader 0x20000000 : AT(_loader_loadaddr) {
  _loader_start = .;
  loader.o
  . = ALIGN(4);
  _loader_end = .;
 }
 _loader_size = SIZEOF(loader);

  /* The startup code goes first into FLASH */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(4);
  } >FLASH

  /* The program code and other data goes into FLASH */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
*(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbols at end of code */
  } >FLASH


   .ARM.extab   : { *(.ARM.extab* .gnu.linkonce.armextab.*) } >FLASH
    .ARM : {
    __exidx_start = .;
      *(.ARM.exidx*)
      __exidx_end = .;
    } >FLASH

  .preinit_array     :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array*))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  } >FLASH
  .init_array :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT(.init_array.*)))
    KEEP (*(.init_array*))
    PROVIDE_HIDDEN (__init_array_end = .);
  } >FLASH
  .fini_array :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(.fini_array*))
    KEEP (*(SORT(.fini_array.*)))
    PROVIDE_HIDDEN (__fini_array_end = .);
  } >FLASH

  /* Initialized data sections goes into RAM, load LMA copy after code */
  .data :
  {
    /* . = ALIGN(4); */
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */
    . = ALIGN(4);
    _edata = .;        /* define a global symbol at data end */
  } >RAM AT >FLASH

  /* used by the startup to initialize data */
  _sidata = LOADADDR(.data);

 /* dummy placeholder in flash for loader section, to count flash usage */
 .loader : {
  . = . + SIZEOF(loader);
 } AT >FLASH
 _loader_loadaddr = LOADADDR(.loader);

  /* Uninitialized data section */
  . = ALIGN(4);
  .bss :
  {
    /* This is used by the startup in order to initialize the .bss section */
    _sbss = .;         /* define a global symbol at bss start */
    __bss_start__ = _sbss;
    *(.bss)
    *(.bss*)
    *(COMMON)
    . = ALIGN(4);
    _ebss = .;         /* define a global symbol at bss end */
    __bss_end__ = _ebss;
  } >RAM

  /* The heap ends up after BSS in main RAM */
  /* This also checks that the top of the heap doesn't hit the bottom of the stack i.e. how much RAM left */
  /* User_heap_stack section, used to check that there is enough RAM left */
 
  /* ._user_heap_stack : */
  .main_heap :
  {
    . = ALIGN(8);
    PROVIDE ( end = . );
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
 /*   . = . + _Min_Stack_Size; */  /* PH 14/5/2021 stack is in CCM, not here */
    . = ALIGN(8);
   } >RAM
  /*   } >CCMRAM */

  /* MEMORY_bank1 section, code must be located here explicitly            */
  /* Example: extern int foo(void) __attribute__ ((section (".mb1text"))); */
  /* Not used 14/5/2021 - was apparently used for LCD display on ST dev kit */
  .memory_b1_text :
  {
    *(.mb1text)        /* .mb1text sections (code) */
    *(.mb1text*)       /* .mb1text* sections (code)  */
    *(.mb1rodata)      /* read-only data (constants) */
    *(.mb1rodata*)
  } >MEMORY_B1
 
  /* CCM-RAM section
  *
  * IMPORTANT NOTE!
  * If variables placed in this section must be zero initialized,
  * the startup code needs to be modified to initialize this section.
  * Done PH 12/5/2021
  */
  .ccmram :
  {
    . = ALIGN(4);
    _sccmram = .;       /* create a global symbol at ccmram start */
    *(.ccmram)
    *(.ccmram*)
   
    . = ALIGN(4);
    _eccmram = .;       /* create a global symbol at ccmram end */
  } >CCMRAM

  /* Remove information from the standard libraries */
  /DISCARD/ :
  {
    libc.a ( * )
    libm.a ( * )
    libgcc.a ( * )
  }

  .ARM.attributes 0 : { *(.ARM.attributes) }
}


And that's how I think it could be invoked from your main():

Code: [Select]
#include <string.h>   /* memcpy(), memset() */

/* Linker-script symbols: only their addresses are meaningful */
extern char _loader_start;
extern char _loader_end;
extern char _loader_loadaddr;
extern char _loader_bss_start;
extern char _loader_bss_end;
extern void loader();

void main()
{
    /* ensure that interrupts are disabled here */

    /* copy loader to ram */
    memcpy(&_loader_start, &_loader_loadaddr, &_loader_end - &_loader_start);

    /* clear loader's bss */
    memset(&_loader_bss_start, 0, &_loader_bss_end - &_loader_bss_start);

    /* keep interrupts disabled in loader(), since RAM is already overwritten now, and
       since you intend to overwrite the vectors and ISRs in flash from loader(). */
    loader();
}

 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14447
  • Country: fr
Still not sure what happens with initialised data. However, it's just been pointed out to me that any instance within a function is fine because the compiler inserts code to do that. It is only initialised data outside any function which needs this, and I can avoid that. I knew that but forgot :)

For initialized data, the thing to consider is this: in the linker script, the (usually) '.data' section is a particular one, as it requires two separate locations. One is the memory area where the variables themselves will be stored (usually somewhere in RAM), and the other for the initializers, which will end up somewhere else (usually in Flash memory for typical microcontrollers.)

This is defined this way:
Code: [Select]
.data :
    {
    } >RAM AT>FLASH

(assuming RAM is the memory area defined for RAM, and FLASH the one defined for Flash memory.)
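
As a rough sketch of what the startup code then does with those two locations: it copies the initializers from their load address in Flash to the run address in RAM. The function name copy_section is made up for illustration; on the real target the three pointers would come from linker symbols (_sdata, _edata and _sidata in the script posted in this thread), and the stock ST startup file does this in assembler rather than C.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy a section's initializers from its load address (LMA, in Flash)
   to its run address (VMA, in RAM). Hypothetical helper, word-wise like
   the usual startup loop; assumes both ends are 4-byte aligned. */
static void copy_section(uint32_t *vma_start, const uint32_t *vma_end,
                         const uint32_t *lma)
{
    uint32_t *dst = vma_start;
    while (dst < vma_end) {
        *dst++ = *lma++;
    }
}
```

On the target the call would look something like copy_section((uint32_t *)&_sdata, (uint32_t *)&_edata, (uint32_t *)&_sidata), assuming the symbols are declared as extern.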

If you don't mind all your variables ending up in the same section, then you have nothing special to do.
You'll put, as you mentioned, the code in its own section (separate from '.text'), and you can leave the variable declarations as they are without doing anything. The only problem with this approach is that the variables for your alternate piece of code will be allocated with all the other variables, and thus will take up static space at all times. If the variables for this alternate code do not take up much space, I would just leave it like this and not bother. But if you don't want to waste RAM on those variables once the alternate code is no longer used, then you can define yet another section for them. That would be, for instance, a '.data2' section that you could arrange to start at the same location as the '.data' one, so they would overlay. I think this is possible in linker scripts, but I'm not too sure the linker would let you create overlaid sections like this.

A cleaner approach, IMO, would be to link this alternate code separately, and then program both object codes on the target specifying the right start address for each to the programmer.

 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
I am having a bit of trouble, with the linker saying it can't find loader.o.

I wonder if there is a name conflict. The actual file is called KDE_loader.c (and it is KDE_loader.o it says it can't find) and the function within it is loader(). I am not sure what happens if you have e.g. loader.c containing loader() but I am avoiding that.

This is my linker script now

Code: [Select]

/* Entry Point */
ENTRY(Reset_Handler)

/* Reference loader to ensure that it gets linked in */
EXTERN(KDE_loader)    

/* Highest address of the main stack */
 /*  _estack = 0x20020000;  */    /* stack in 128K RAM */
 _estack = 0x10010000;    /* stack in  64k CCM - note: configTOTAL_HEAP_SIZE + min_stack_size must not exceed 64k) */
 
/* top of RAM for _sbrk - top of heap check */
 _top = 0x20020000;

/* Heap and stack sizes */
_Min_Heap_Size  = 0xa000;     /* 40k heap - min size; it can grow to end of main RAM  */
_Min_Stack_Size = 0x4000;     /* 16k stack - in CCM */

/* Specify the memory areas */
/* CCMRAM added PH 12/5/2021 - cannot use with DMA */
MEMORY
{
  FLASH (rx)      : ORIGIN = 0x08000000, LENGTH = 1024K
  RAM (xrw)       : ORIGIN = 0x20000000, LENGTH = 128K
  MEMORY_B1 (rx)  : ORIGIN = 0x60000000, LENGTH = 0K
  CCMRAM (rw)     : ORIGIN = 0x10000000, LENGTH = 64K
}

/* Define output sections */
SECTIONS
{
/* loader_bss and loader sections must come first, in order to override the wildcards in subsequent sections */
 loader_bss _loader_end : {
  _loader_bss_start = .;
  KDE_loader.o(.bss*)
  . = ALIGN(4);
  _loader_bss_end = .;
 }
 KDE_loader 0x20000000 : AT(_loader_loadaddr) {
  _loader_start = .;
  KDE_loader.o
  . = ALIGN(4);
  _loader_end = .;
 }
 _loader_size = SIZEOF(KDE_loader);

  /* The startup code goes first into FLASH */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(4);
  } >FLASH

  /* The program code and other data goes into FLASH */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
*(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbols at end of code */
  } >FLASH


   .ARM.extab   : { *(.ARM.extab* .gnu.linkonce.armextab.*) } >FLASH
    .ARM : {
    __exidx_start = .;
      *(.ARM.exidx*)
      __exidx_end = .;
    } >FLASH

  .preinit_array     :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array*))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  } >FLASH
  .init_array :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT(.init_array.*)))
    KEEP (*(.init_array*))
    PROVIDE_HIDDEN (__init_array_end = .);
  } >FLASH
  .fini_array :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(.fini_array*))
    KEEP (*(SORT(.fini_array.*)))
    PROVIDE_HIDDEN (__fini_array_end = .);
  } >FLASH

 
  /* Initialized data sections goes into RAM, load LMA copy after code */
  .data :
  {
 /*   . = ALIGN(4); */
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */

    . = ALIGN(4);
    _edata = .;        /* define a global symbol at data end */
  } >RAM AT >FLASH

 /* used by the startup to initialize data */
  _sidata = LOADADDR(.data);

 /* dummy placeholder in flash for loader section, to count flash usage */
 .KDE_loader : {
  . = . + SIZEOF(KDE_loader);
 } AT >FLASH
 _loader_loadaddr = LOADADDR(.KDE_loader);

  /* Uninitialized data section */
  . = ALIGN(4);
  .bss :
  {
    /* This is used by the startup in order to initialize the .bss section */
    _sbss = .;         /* define a global symbol at bss start */
    __bss_start__ = _sbss;
    *(.bss)
    *(.bss*)
    *(COMMON)

    . = ALIGN(4);
    _ebss = .;         /* define a global symbol at bss end */
    __bss_end__ = _ebss;
  } >RAM

  /* The heap ends up after BSS in main RAM */
  /* This also checks that the top of the heap doesn't hit the bottom of the stack i.e. how much RAM left */
  /* User_heap_stack section, used to check that there is enough RAM left */
 
  /* ._user_heap_stack : */
  .main_heap :
  {
    . = ALIGN(8);
    PROVIDE ( end = . );
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
 /*   . = . + _Min_Stack_Size; */  /* PH 14/5/2021 stack is in CCM, not here */
    . = ALIGN(8);
   } >RAM
  /*   } >CCMRAM */


  /* MEMORY_bank1 section, code must be located here explicitly            */
  /* Example: extern int foo(void) __attribute__ ((section (".mb1text"))); */
  /* Not used 14/5/2021 - was apparently used for LCD display on ST dev kit */
  .memory_b1_text :
  {
    *(.mb1text)        /* .mb1text sections (code) */
    *(.mb1text*)       /* .mb1text* sections (code)  */
    *(.mb1rodata)      /* read-only data (constants) */
    *(.mb1rodata*)
  } >MEMORY_B1
 
  /* CCM-RAM section
  *
  * IMPORTANT NOTE!
  * If variables placed in this section must be zero initialized,
  * the startup code needs to be modified to initialize this section. 
  * Done PH 12/5/2021
  */
  .ccmram :
  {
    . = ALIGN(4);
    _sccmram = .;       /* create a global symbol at ccmram start */
    *(.ccmram)
    *(.ccmram*)
   
    . = ALIGN(4);
    _eccmram = .;       /* create a global symbol at ccmram end */
  } >CCMRAM

  /* Remove information from the standard libraries */
  /DISCARD/ :
  {
    libc.a ( * )
    libm.a ( * )
    libgcc.a ( * )
  }

  .ARM.attributes 0 : { *(.ARM.attributes) }
}


I did check it contains the suggested mods, using file compare in notepad++.

One regular issue with Cube IDE is that it doesn't pick up some edits so one has to do a Clean Project / Build project etc.
« Last Edit: July 14, 2021, 09:12:15 pm by peter-h »
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de

I wonder if there is a name conflict. The actual file is called KDE_loader.c (and it is KDE_loader.o it says it can't find) and the function within it is loader(). I am not sure what happens if you have e.g. loader.c containing loader() but I am avoiding that.

The function name does not need to be the same as the module name.

This statement references the function name:
Code: [Select]
EXTERN(KDE_loader)

If the function name is loader() then it should be:
Code: [Select]
EXTERN(loader)

I've just added the EXTERN to be on the safe side, in order that the section does not get garbage collected when it is not referenced otherwise (depending on whether you invoke the linker with the option --gc-sections).
Eventually it is supposed to be referenced from main() anyway, when you call loader() from main().

Edit:
I suggest invoking the linker with the option -Map=outputfilename.map and checking the map file for correct assignment of the sections. Some fine tuning may be necessary, depending on the sections actually present in KDE_loader.o.

The section assignment in the linker script was a bit tricky. Obviously the order of the output sections matters for resolving the wildcards. So I had to put the loader_bss and loader sections at the beginning, although their load address is not known before .data has been placed. As a workaround I introduced the dummy placeholder section. The aim was that loader.c can still use the regular section names (.text, .data, .bss, ...), so that not every function and global variable needs to be annotated with __attribute__().
« Last Edit: July 14, 2021, 09:43:10 pm by gf »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
That's done but still same error.

Sounds like KDE_loader.c is not getting compiled, which is bizarre.

It is referenced from main()

   // At this point, interrupts should still be disabled
   // Execute loader

   extern void loader(void);
   loader();

Just seen your edit. Will need to dig around to find where that is.
« Last Edit: July 14, 2021, 09:47:29 pm by peter-h »
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
Sorry, I don't know your IDE and don't know what you need to do so that it gets compiled. Or does the .o file possibly exist, but in the wrong directory?
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
The .o file exists, in the same dir as all the others, so I guess it is not getting supplied to the linker. Yeah, I have no idea how this IDE works at that level, either, and google turns up loads of hits on others having the same problem. I do know the IDE is buggy - like everything from ST - in that one sometimes needs to do a Clean project / Build project / Reindex project for no obvious reason. I will dig around... In the distant past, all command line, makefiles, etc, a linker would be supplied, on its command line, with a file containing the files to link, and I guess this one is not getting included in that file. And I have added lots of new files to this project, so the basic method must be working.

arm-none-eabi-g++ -o "KDE.elf" @"objects.list"   -mcpu=cortex-m4 -T"C:\KDE\Project1\LinkerScript.ld" --specs=nosys.specs -Wl,-Map="KDE.map" -Wl,--gc-sections -static  -mfpu=fpv4-sp-d16 -mfloat-abi=hard -mthumb -u _printf_float -u _scanf_float -Wl,--start-group -lc -lm -lstdc++ -lsupc++ -Wl,--end-group

and objects.list contains the .o file. I wonder if there is something in the linker .ld file which is causing it to be excluded? It is the only source file which is explicitly named in the linkfile script, because the code is being linked to run at a different address.

Edit: someone suggested putting a * before the filename in the linker .ld file, and it links! No idea why.



Apparently, the reference for the above is here: https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_chapter/ld_3.html

The code unfortunately bombs. It does a hard fault on the memcpy. OK, maybe it is not a great idea to load the code at 0x20000000 because that is the very base of RAM and even though I can't think of what might be there, at this early point in main(), I tried different addresses in this



but all that I have tried produce section overlap errors e.g.



The loader cannot be that big... from 0x2001000 to 0x20098f2b.

Practically any value other than 0x20000000 produces some overlap error e.g. with 0x20000100:



Interestingly looking in the .map file, it really is huuuge
                0x0000000000088f2c                _loader_size = SIZEOF (KDE_loader)

Code: [Select]
loader_bss      0x000000002008902c        0x0
                0x000000002008902c                _loader_bss_start = .
 *KDE_loader.o(.bss*)
                0x000000002008902c                . = ALIGN (0x4)
                0x000000002008902c                _loader_bss_end = .

KDE_loader      0x0000000020000100    0x88f2c load address 0x000000000803b6a0
                0x0000000020000100                _loader_start = .
 *KDE_loader.o()
 .data          0x0000000020000100       0x18 src/KDE_loader.o
                0x0000000020000100                GPIO_PORT_X
 .rodata        0x0000000020000118        0xc src/KDE_loader.o
                0x0000000020000118                GPIO_PIN_X
 .text.HAL_GPIO_WritePin
                0x0000000020000124       0x32 src/KDE_loader.o
 *fill*         0x0000000020000156        0x2
 .text.KDE_LED_On
                0x0000000020000158       0x38 src/KDE_loader.o
 .text.KDE_LED_Off
                0x0000000020000190       0x38 src/KDE_loader.o
 .text.hang_around
                0x00000000200001c8       0x26 src/KDE_loader.o
 .text.loader   0x00000000200001ee       0x1c src/KDE_loader.o
                0x00000000200001ee                loader
 .debug_info    0x000000002000020a      0x372 src/KDE_loader.o
 .debug_abbrev  0x000000002000057c      0x193 src/KDE_loader.o
 .debug_aranges
                0x000000002000070f       0x40 src/KDE_loader.o
 .debug_ranges  0x000000002000074f       0x30 src/KDE_loader.o
 .debug_macro   0x000000002000077f       0xc8 src/KDE_loader.o
 .debug_macro   0x0000000020000847      0x112 src/KDE_loader.o
 .debug_line    0x0000000020000959      0x50e src/KDE_loader.o
 .debug_str     0x0000000020000e67    0x880ae src/KDE_loader.o
                                      0x88234 (size before relaxing)
 .comment       0x0000000020088f15       0x53 src/KDE_loader.o
                                         0x54 (size before relaxing)

and these are the values when loading at 0x20000000. No errors this time, but the code crashes in memcpy() - unsurprisingly given the debug_str is 0x880ae in size. No idea where these huge items are coming from

Code: [Select]
KDE_loader      0x0000000020000000    0x88f2c load address 0x000000000803b6a0
                0x0000000020000000                _loader_start = .
 *KDE_loader.o()
 .data          0x0000000020000000       0x18 src/KDE_loader.o
                0x0000000020000000                GPIO_PORT_X
 .rodata        0x0000000020000018        0xc src/KDE_loader.o
                0x0000000020000018                GPIO_PIN_X
 .text.HAL_GPIO_WritePin
                0x0000000020000024       0x32 src/KDE_loader.o
 *fill*         0x0000000020000056        0x2
 .text.KDE_LED_On
                0x0000000020000058       0x38 src/KDE_loader.o
 .text.KDE_LED_Off
                0x0000000020000090       0x38 src/KDE_loader.o
 .text.hang_around
                0x00000000200000c8       0x26 src/KDE_loader.o
 .text.loader   0x00000000200000ee       0x1c src/KDE_loader.o
                0x00000000200000ee                loader
 .debug_info    0x000000002000010a      0x372 src/KDE_loader.o
 .debug_abbrev  0x000000002000047c      0x193 src/KDE_loader.o
 .debug_aranges
                0x000000002000060f       0x40 src/KDE_loader.o
 .debug_ranges  0x000000002000064f       0x30 src/KDE_loader.o
 .debug_macro   0x000000002000067f       0xc8 src/KDE_loader.o
 .debug_macro   0x0000000020000747      0x112 src/KDE_loader.o
 .debug_line    0x0000000020000859      0x50e src/KDE_loader.o
 .debug_str     0x0000000020000d67    0x880ae src/KDE_loader.o
                                      0x88234 (size before relaxing)
 .comment       0x0000000020088e15       0x53 src/KDE_loader.o
                                         0x54 (size before relaxing)
 .debug_frame   0x0000000020088e68       0x90 src/KDE_loader.o
 .ARM.attributes
                0x0000000020088ef8       0x34 src/KDE_loader.o
                0x0000000020088f2c                . = ALIGN (0x4)
                0x0000000020088f2c                _loader_end = .
                0x0000000000088f2c                _loader_size = SIZEOF (KDE_loader)

Of course, lots of people have been here before e.g. https://stackoverflow.com/questions/52439702/why-object-files-debug-str-so-big

but I have no idea how to sort this. I know debug data bloats the code a bit but this is huge.

Further digging finds that debug_str is everywhere in the map file, in massive sizes, but clearly is just stored locally, for debugging purposes in the IDE. In this case, kde_loader.c, it is getting included in my binary file. Or more likely just somehow its size is incorrectly being added into the loader size.

Anyway, changing the memcpy size to 16k stops it crashing



but it hangs when it calls loader(). The assembler shows that the address of loader() is clearly wrong - it is still in flash (0x803a730)

« Last Edit: July 15, 2021, 05:10:35 pm by peter-h »
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
When I tested the linker script entries, all .o files were in the current directory. I think that's the reason it worked without the * wildcard in front of the filename.

Since the debug sections are not required for running the code, they don't need to be included in the KDE_loader segment.
Instead of including all input sections of KDE_loader.o, you can be more selective:

Code: [Select]
KDE_loader 0x20000000 : AT(_loader_loadaddr) {
  _loader_start = .;
  *KDE_loader.o(.text*)
  *KDE_loader.o(.rodata*)
  *KDE_loader.o(.data*)
  *KDE_loader.o(.ARM.attributes)
  . = ALIGN(4);
  _loader_end = .;
 }

That's what I meant with "fine tuning" the linker script after inspecting the map.
Not sure if .ARM.attributes is needed, but it is not so big.
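
Independently of which sections end up in the segment, it may be worth guarding the copy so that an unexpectedly large segment (like the 0x88f2c one in the map above) cannot run over the RAM set aside for the loader. A minimal sketch: LOADER_MAX_SIZE and copy_loader_checked are names made up for illustration, and the 16k limit is an assumption taken from the block size mentioned at the start of the thread.

```c
#include <stddef.h>
#include <string.h>

#define LOADER_MAX_SIZE (16u * 1024u)  /* assumed RAM budget for the loader */

/* Copy the loader image to its run address only if it fits the budget.
   Returns 0 on success, -1 if the image is empty or too large. */
static int copy_loader_checked(void *run_addr, const void *load_addr,
                               size_t size)
{
    if (size == 0 || size > LOADER_MAX_SIZE) {
        return -1;
    }
    memcpy(run_addr, load_addr, size);
    return 0;
}
```

The same limit could probably also be enforced at link time with ld's ASSERT(expression, message) command on SIZEOF(KDE_loader), so that an oversized loader fails the build instead of crashing at run time.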

Quote
The assembler shows that the address of loader() is clearly wrong - it is still in flash (0x803a730)

The called address here is not loader, but __loader_veneer. This is obviously a long jump trampoline generated by the linker, which still resides close to the caller (i.e. flash is correct), and the code at the address __loader_veneer then does the actual jump to loader().

Can you single-step the next few instructions?

Edit:

In order to avoid the linker-generated veneer, you could try to change the extern declaration of loader() in main.c from
Code: [Select]
extern void loader();
to
Code: [Select]
extern void loader() __attribute__((long_call));

My understanding is that gcc then generates an indirect call to loader(), avoiding the need for the veneer. In the end it still should not make much difference.
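
One way to picture an indirect call, as a sketch rather than what gcc literally emits: the target address is held in a pointer and the call goes through it. On a Cortex-M the pointer value must have bit 0 set to select Thumb state. The toolchain does this automatically for function symbols, so it only matters if the address is built by hand. thumb_entry and entry_fn are hypothetical names, and the address in the usage note is just the one the map file above shows for loader.

```c
#include <stdint.h>

/* Pointer type for an entry point taking no arguments
   (used in the usage note below). */
typedef void (*entry_fn)(void);

/* Turn a raw code address into a value suitable for a Thumb-mode call:
   bit 0 set means "Thumb" to BX/BLX. Hypothetical helper. */
static uintptr_t thumb_entry(uintptr_t code_addr)
{
    return code_addr | 1u;
}
```

A hand-built call would then be written as ((entry_fn)thumb_entry(0x200000ee))(); whereas calling loader() by name needs none of this.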
« Last Edit: July 15, 2021, 08:35:19 pm by gf »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
OK thanks. Yes that worked. Loader is 0x140 in size:

Code: [Select]
KDE_loader      0x0000000020000000      0x140 load address 0x000000000803b698
                0x0000000020000000                _loader_start = .
 *KDE_loader.o(.text*)
 .text.HAL_GPIO_WritePin
                0x0000000020000000       0x32 src/KDE_loader.o
 *fill*         0x0000000020000032        0x2
 .text.KDE_LED_On
                0x0000000020000034       0x38 src/KDE_loader.o
 .text.KDE_LED_Off
                0x000000002000006c       0x38 src/KDE_loader.o
 .text.hang_around
                0x00000000200000a4       0x26 src/KDE_loader.o
 .text.loader   0x00000000200000ca       0x1c src/KDE_loader.o
                0x00000000200000ca                loader
 *KDE_loader.o(.rodata*)
 *fill*         0x00000000200000e6        0x2
 .rodata        0x00000000200000e8        0xc src/KDE_loader.o
                0x00000000200000e8                GPIO_PIN_X
 *KDE_loader.o(.data*)
 .data          0x00000000200000f4       0x18 src/KDE_loader.o
                0x00000000200000f4                GPIO_PORT_X
 *KDE_loader.o(.ARM.attributes)
 .ARM.attributes
                0x000000002000010c       0x34 src/KDE_loader.o
                0x0000000020000140                . = ALIGN (0x4)
                0x0000000020000140                _loader_end = .
                0x0000000000000140                _loader_size = SIZEOF (KDE_loader)

Is "extern char _loader_start;" right? I know this is a dumb Q but char can hold only a byte, not a 32 bit address. When I step through it, hovering on that value shows decimal 16, which is obviously garbage. Trying "extern _loader_start;" doesn't help.

BUT the transfer to loader() does work! I am stepping through the code, and the PC shows 0x2... so it is running in RAM!

and THANK YOU this runs:

Code: [Select]
void loader(void)
{
    for (;;)
    {
        KDE_LED_On(KDE_LED2);
        hang_around(200);
        KDE_LED_Off(KDE_LED2);
        hang_around(200);
    }
}

The values of these, when hovered over in single stepping

Code: [Select]
extern char _loader_start;
extern char _loader_end;
extern char _loader_loadaddr;
extern char _loader_bss_start;
extern char _loader_bss_end;

are

0x20000000
0x20000140
empty
empty
empty

However, when I single step through the asm code I see the correct values in all the registers. BSS size=0, FWIW, even if I add
uint8_t fred[256];
or
volatile uint8_t fred[256];
outside any loader function, which should generate a bss of 256 bytes. The zero value is confirmed by

loader_bss      0x0000000020000140        0x0
                0x0000000020000140                _loader_bss_start = .
 *KDE_loader.o(.bss*)
                0x0000000020000140                . = ALIGN (0x4)
                0x0000000020000140                _loader_bss_end = .

Anyway, it is running :)

I don't need the bss, because I can keep all arrays etc within functions, but it would be nice to fix it. And the non-functioning could be hiding another issue.

The veneer removal syntax worked, too.

This is brilliant, not least because it avoids maintaining a separate project just to generate the loader.

When I am done, I will post the details, because it is bound to help others. The net is full of people who struggled with RAM code, usually for the purpose of rewriting the CPU FLASH.

As an aside, I timed that delay function, and RAM code runs 12.5% SLOWER than FLASH code, so the "cache accelerator" works perfectly well for code which fits into the cache. This is contrary to what has been posted elsewhere - at least for fairly compact code. I did not expect it to run slower though; it is supposed to run with zero wait states!

Code: [Select]
// Hang around for delay in ms. Approximate but doesn't need interrupts etc working.
// Tweaked for RAM resident code which runs a little slower (!) than FLASH resident code.

static void hang_around(uint32_t delay)
{
    uint32_t fred = 15100*delay;

    while (fred>0)
    {
        fred--;
    }
}
« Last Edit: July 15, 2021, 08:56:52 pm by peter-h »
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
Is "extern char _loader_start;" right?

It is just a dummy variable, so that the linker-defined symbol _loader_start (which is an address) can be referenced from C as if it were the address of an extern variable.
The type matters only for the size calculation &_loader_end - &_loader_start. If the pointer difference should be in bytes, then type char is a good idea.
You could also declare it as extern char _loader_start[]; then you need to omit the & in the memcpy() and memset() arguments.
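
To illustrate why the declared type matters for the size calculation, here is a small sketch that uses an ordinary buffer in place of the linker symbols (the buffer and the function names are stand-ins, not part of the project). Pointer subtraction counts elements of the pointed-to type, so the same two addresses give a byte count with char and a word count with uint32_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the region between _loader_start and _loader_end
   (0x140 bytes, as reported in the map file). */
static uint32_t region[0x140 / 4];

/* Difference measured in bytes: what memcpy() and memset() expect. */
static ptrdiff_t size_in_bytes(void)
{
    const char *start = (const char *)region;
    const char *end   = (const char *)(region + 0x140 / 4);
    return end - start;
}

/* Same two addresses, measured in 32-bit words. */
static ptrdiff_t size_in_words(void)
{
    const uint32_t *start = region;
    const uint32_t *end   = region + 0x140 / 4;
    return end - start;
}
```

With the array form, the copy would read memcpy(_loader_start, _loader_loadaddr, _loader_end - _loader_start), assuming all three symbols are declared as extern char name[].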

Edit:

Which processor model is it?
I'm also wondering whether an Instruction Synchronization Barrier (ISB) is actually required before calling loader().

Edit:

Quote
However, when I single step through the asm code I see the correct values in all the registers. BSS size=0, FWIW, even if I add
uint8_t fred[256];
or
volatile uint8_t fred[256];
outside any loader function, which should generate a bss of 256 bytes. The zero value is confirmed by

Can you do objdump -h KDE_loader.o?
Is a .bss with 256 bytes present in the .o file?
Or did the compiler possibly generate a common block instead?
« Last Edit: July 15, 2021, 09:45:35 pm by gf »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Thank you.

It's a 32F417.

The bss stuff I will try tomorrow or the day after - running around a bit :)

What an excellent forum this is.
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
It is also possible that fred[] gets garbage-collected if it is not used (i.e. not referenced from anywhere).
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
That's why I tried "volatile".

However, other unused things don't get removed e.g. I declare a 48k uint8_t array in main.c which goes in CCM and is used for the FreeRTOS workspace, and another 16k dummy one, not referenced, to fill that out to 64k, so that CCM usage shows as "64k" in the CubeIDE project usage numbers, to show that CCM is all full :) The last 16k is used for the general stack for ISRs etc (SP set to top of CCM) but the IDE has no way of knowing about that, and would show the last 16k as available, which might confuse the hell out of me if I have to revisit it in 5 years' time :)

I will do more tests tomorrow.
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
That's why I tried "volatile".

However, other unused things don't get removed e.g. I declare a 48k uint8_t array in main.c which goes in CCM and is used for the FreeRTOS workspace, and another 16k dummy one, not referenced, to fill that out to 64k, so that CCM usage shows as "64k" in the CubeIDE project usage numbers, to show that CCM is all full :) The last 16k is used for the general stack for ISRs etc (SP set to top of CCM) but the IDE has no way of knowing about that, and would show the last 16k as available, which might confuse the hell out of me if I have to revisit it in 5 years' time :)

I will do more tests tomorrow.

As said, do an objdump -x on the .o file to see what's actually present in the object file.
If the compiler generates a common block for the global data you need to add

loader_bss _loader_end : {
  _loader_bss_start = .;
  *KDE_loader.o(.bss*)
  *KDE_loader.o(COMMON)
  . = ALIGN(4);
  _loader_bss_end = .;
 }


[ Actually, you can add this in general. ]
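
For background, a sketch of three ways such a global can be written and where gcc typically places each one. This relies on gcc's documented behaviour: with -fcommon (the default before GCC 10) an uninitialized global without an explicit section becomes a common symbol, while -fno-common or the nocommon attribute puts it in .bss. The variable names are examples only.

```c
#include <stdint.h>

/* Tentative definition: COMMON with -fcommon, .bss with -fno-common. */
uint8_t fred[256];

/* Kept out of COMMON regardless of the -fcommon setting (GCC attribute). */
uint8_t fred_bss[256] __attribute__((nocommon));

/* Non-zero initializer: goes to .data, so it also takes space in flash. */
uint8_t fred_data[256] = { 1 };
```

Whichever form is used, the objdump output on the .o file should show which section (or common symbol) the array actually ended up in.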
 

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 824
  • Country: es
Why are you doing this? Your RAM code has nothing in common with the main code - separate memory map, init, library functions set, lifecycle. It can’t use code/data of the main part and vice versa. What you are trying to do now is implement two isolated projects built as one. What’s the point? Move the RAM code to a separate project, configure it to output a .bin, then use the INCBIN directive in the main part (so there will be no custom build steps: just build the RAM code, then build the main, and the RAM code will be included automatically), memcpy() it to a fixed address, call it and hang.
If you need to pass any data between the parts declare a struct placed in a separate noinit section at the same fixed address in both projects (or use some regs like BKP->DRx if the data is small).
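
A minimal sketch of such a shared struct, assuming a section name .noinit and field names that are purely illustrative (nothing in this thread defines them). Both projects would include the same definition and place the struct at the same fixed address through their linker scripts; a magic word lets the receiving side tell valid data from random RAM contents.

```c
#include <stdint.h>

#define HANDOFF_MAGIC 0x4B444531u  /* arbitrary marker value */

/* Data passed from the main program to the RAM-resident code. */
struct handoff {
    uint32_t magic;       /* HANDOFF_MAGIC when the rest is valid */
    uint32_t image_size;  /* number of bytes to copy */
    uint32_t image_crc;   /* expected CRC of the image */
};

/* On the target this would carry __attribute__((section(".noinit")))
   so the startup code leaves it alone; here it is a plain static. */
static struct handoff shared;

/* Fill in the block; the magic word is written last. */
static void handoff_write(uint32_t size, uint32_t crc)
{
    shared.image_size = size;
    shared.image_crc  = crc;
    shared.magic      = HANDOFF_MAGIC;
}

/* Returns 1 and fills *out if the block looks valid, else 0. */
static int handoff_read(struct handoff *out)
{
    if (shared.magic != HANDOFF_MAGIC) {
        return 0;
    }
    *out = shared;
    return 1;
}
```

The main part would call handoff_write() just before jumping to the RAM code, and the RAM code would refuse to do anything if handoff_read() returns 0.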
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
It was the OP's requirement:

However, I don't just want to load a chunk of binary data into RAM. That is easy. One could even have a .c file with an initialised uchar array in it; that will automatically end up in initialised data. The challenge is how to implement in effect

ORG 0x20000000
C code goes here

and have this within the same project.
 

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 824
  • Country: es
Sure, I’ve read the "ends up with two projects to maintain, and switch between them" remark. But this alternative two-in-one solution would be error-prone and even harder to maintain in the end, IMO. And there are "bigger picture" questions, e.g. what happens if the power is cut in the middle of the SFlash->Flash copy? Answering that could move even further away from the two-in-one solution, like moving the flash copy code to some kind of bootloader that survives update failures and repeats the copy process on the next power-on.
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
Sure, the usual solution is rather to have a separate bootloader in a dedicated region of flash, which is not overwritten when a new image is flashed, so that it keeps functional when the flash update is aborted somewhere in the middle.
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
"Why are you doing this? "

Firstly, in my general business I often have to revisit old projects. Right now I am doing an update of a job last done c. 1997 (analog only, but quite tricky). This is why I am using Protel PCB 2.8 (1995) :) So I am very careful to do stuff in a way which makes it as easy as possible to do this. This is something 99% of designers don't need to worry about because they will move on every few years, or more often. But this is my business and I have to look after it, because in the end there is only me. And you know how hard it is to get into old software. Most people who are paid to do that really hate it. And there are multiple reasons for doing just one project e.g. archiving, documentation, etc. Project archiving is a particular problem which I have struggled with many times...

"the usual solution is rather to have a separate bootloader in a dedicated region of flash, which is not overwritten when a new image is flashed, so that it keeps functional when the flash update is aborted somewhere in the middle."

This is how it will be done. The loader will be written into the top 16k of the 1MB FLASH (it will be originally written there during factory programming, when the whole 1MB will be written, using SWD) and it (well, the RAM resident copy of it) will never write into this top 16k. But there is a little problem: on the 32F417 the top 128k has to be erased as a single block, so there will be a window of opportunity for bricking the product. Very unlikely, because the flashing will start at the bottom and by the time you get to the top 128k you have had error-free writes to all the lower blocks. Also the erasure of the 128k block will necessitate the top 16k of it to be temporarily saved in RAM and then immediately written back. The way to avoid this "brick window" is to have the loader somewhere other than the top 128k, but that causes other issues.
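
The save / erase / write-back sequence for the top 128k sector can be tried out on a PC before touching real flash. This is only a simulation sketch: `sim_erase` and `sim_program` are hypothetical stand-ins for a flash driver, and the sector is an ordinary byte array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE  (128u * 1024u)   /* last erase sector on the 1 MB part */
#define LOADER_SIZE  (16u * 1024u)    /* top 16k holds the loader */
#define LOADER_OFF   (SECTOR_SIZE - LOADER_SIZE)

/* Stand-ins for the real flash driver, so the sketch runs on a PC. */
static void sim_erase(uint8_t *sector)
{
    memset(sector, 0xFF, SECTOR_SIZE);              /* erased flash reads 0xFF */
}

static void sim_program(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) dst[i] &= src[i];  /* programming only clears bits */
}

/* Replace the lower 112k of the sector while keeping the top 16k. */
void update_top_sector(uint8_t *sector, const uint8_t *new_app, size_t app_len)
{
    static uint8_t saved[LOADER_SIZE];              /* RAM copy of the loader */
    assert(app_len <= LOADER_OFF);

    memcpy(saved, sector + LOADER_OFF, LOADER_SIZE);
    sim_erase(sector);                              /* brick window opens here */
    sim_program(sector + LOADER_OFF, saved, LOADER_SIZE);  /* and closes here */
    sim_program(sector, new_app, app_len);
}
```

Writing the saved 16k back before the new application data keeps the window as short as the order of operations allows; it cannot remove it.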

"Your RAM code has nothing common with the main code"

That isn't actually the case. There is a huge amount of common data e.g. the huge .h files full of port addresses, pin names, etc. These come from the ST libraries. This stuff can be #included in both the main code (many .c files) and in the loader.c file.

"If you need to pass any data between the parts declare a struct placed in a separate noinit section "

There will be some "data passing" involved because e.g. the loader will be executed at every power-up but it may need to perform different actions. The plan is to get main() to shove some data into a serial (SPI) flash which this product also has, and the loader can pick it up. Or one could store data in the 32F4's RTC data storage area; that is less good because it will be lost if the RTC backup battery (a supercap) is not charged, or not even fitted, and there is a power-down involved. The data passing has to survive a power-down, for reasons not easy to explain, but the amount of data passed to the loader can be done in a single byte.

I actually have to do something else. I need a copy of that loader to be compiled to execute at the top 16k of the FLASH, and arranged so that the SWD writes it there. I realise, from reading around, that this may generate a huge .elf where most of the 1MB is 0x00 or some such, but that's ok because it still takes only seconds to write it. And that loader will be what gets copied to RAM. Alternatively I could have the loader anywhere in FLASH and a bit of code which writes it to FLASH if not already there, but writing it there using SWD is the cleanest way. The entry point of the FLASH-based loader will obviously need to be at the start of the 16k block, so the loader.c file will need to start with a function which just contains a jump/call to the real loader which does the work.

So I need two copies of the loader, one compiled/linked to execute at "1MB minus 16k" and the other at 0x20000000. Both will be actually run from those addresses. Relocatable code would be a neat solution...
« Last Edit: July 17, 2021, 07:22:36 am by peter-h »
 

Offline harerod

  • Frequent Contributor
  • **
  • Posts: 449
  • Country: de
  • ee - digital & analog
"Why are you doing this? "
...
"the usual solution is rather to have a separate bootloader in a dedicated region of flash, which is not overwritten when a new image is flashed, so that it keeps functional when the flash update is aborted somewhere in the middle."

This is how it will be done. The loader will be written into the top 16k of the 1MB FLASH (it will be originally written there during factory programming, when the whole 1MB will be written, using SWD) and it (well, the RAM resident copy of it) will never write into this top 16k. But there is a little problem: on the 32F417 the top 128k has to be erased as a single block, so there will be a window of opportunity for bricking the product. Very unlikely, because the flashing will start at the bottom and by the time you get to the top 128k you have had error-free writes to all the lower blocks. Also the erasure of the 128k block will necessitate the top 16k of it to be temporarily saved in RAM and then immediately written back. The way to avoid this "brick window" is to have the loader somewhere other than the top 128k, but that causes other issues.
...

I love this thread, because it shows many interesting facets of STM32 operation. Running dynamically relocatable code on an MCU optimized for flash operation - I love that.
What I don't understand is why you need to put the loader into the top sector. The boring and time proven concept is using some of the small sectors at the beginning of the flash memory to either trampoline to the app or execute loader functionality. Again, I am certain you have your reason for doing this and I enjoy reading the information users have put into this thread.
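
For reference, the erase layout on the 1 MB STM32F40x/41x parts is four 16k sectors, then one 64k sector, then seven 128k sectors, which is why the small sectors sit at the bottom and the top 16k shares an erase unit with 112k of other code. A minimal sketch of the address-to-sector mapping (the function and macro names are hypothetical, not taken from the ST libraries):

```c
#include <stdint.h>

#define FLASH_BASE_ADDR 0x08000000u
#define FLASH_END_ADDR  0x08100000u   /* one past the last byte of 1 MB */

/* Returns the erase sector (0..11) containing addr, or -1 if outside flash. */
int flash_sector_for_addr(uint32_t addr)
{
    if (addr < FLASH_BASE_ADDR || addr >= FLASH_END_ADDR) return -1;

    uint32_t off = addr - FLASH_BASE_ADDR;
    if (off < 0x10000u) return (int)(off / 0x4000u);   /* sectors 0-3: 16k each */
    if (off < 0x20000u) return 4;                      /* sector 4: 64k */
    return 5 + (int)((off - 0x20000u) / 0x20000u);     /* sectors 5-11: 128k each */
}
```

With this mapping, 0x080FC000 and 0x080E0000 both land in sector 11, so erasing anything in 0x080E0000-0x080FFFFF also erases the loader block.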
« Last Edit: July 17, 2021, 12:32:25 pm by harerod »
 

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 824
  • Country: es
"Your RAM code has nothing common with the main code"

That isn't actually the case. There is a huge amount of common data e.g. the huge .h files full of port addresses, pin names, etc. These come from the ST libraries. This stuff can be #included in both the main code (many .c files) and in the loader.c file.
I mean the resulting binaries. There is nothing wrong with including the same headers/linking the same libraries while building a separate binary. And you don't need it to be in some unrelated place: put it into a subdir of the main project, archive everything together, include/link files from the main project freely, but output a separate .elf/.bin - this will ensure that there is no code/data shared between the two parts, no matter what you change in the sources. You are ok with .ld script tweaking, why not do the same with the Makefile? Link each of the two bins separately (from its own file set and with its own .ld). Of course you can do it your way, but you are working against the nature of the linker now.
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
I don't appear to have objdump.exe (as a part of ST Cube IDE, or anywhere else). Anyway, the .map file shows fred123[256] in the Common section



Regarding doing two copies of the loader, compiled/loaded for two addresses

Top 16k block of CPU FLASH: 0x08100000 minus 16k = 0x080FC000
Base of RAM:         0x20000000

it seems to me that I can have two .c files, called say loader_flash.c and loader_ram.c, and #include the same loader file in both, say loader_common.c. Then have two sections in the link script, loading loader_flash.o at 0x080FC000 and loader_ram.o at 0x20000000. Can anyone think of a problem with this? I think it should be loader_common.txt otherwise the makefile creation script will try to compile it, but I just want to #include it as a block of text.




« Last Edit: July 17, 2021, 06:07:24 pm by peter-h »
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14447
  • Country: fr
I don't appear to have objdump.exe (as a part of ST Cube IDE, or anywhere else). Anyway, the .map file shows fred123[256] in the Common section

I'm pretty sure it uses GCC as a compiler, so I'd be pretty surprised if it didn't come with objdump.
Look for 'arm-none-eabi-objdump.exe', as it's normally the exact file name for this utility for ARM Cortex-M targets.
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
I also think so. Like the names of the other cross tools it is likely prefixed with arm-none-eabi-, i.e. arm-none-eabi-objdump, arm-none-eabi-gcc, etc.

A potential issue with multiple inclusion is duplicate symbols (at least global ones; local symbols are hopefully resolved by the linker inside the same .o file only).
The two modules should neither reference extern symbols (in order to be self-contained), nor export global symbols. I.e. declare all functions and all variables outside functions as static.
Hopefully the included HAL, etc. header files don't define/reference any non-static global/extern data and functions either -- do you know and/or can you ensure this?
If one symbol or another must still be exported -- like loader() -- then these symbols must get different names in the two modules, e.g. loader1() and loader2().

Is the 2nd loader instance also copied to 0x20000000 prior to invocation?
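
The "everything static, one differently named entry point" idea could look roughly like this. It is a single-file sketch of what one of the two including files would see after preprocessing; the macro name `LOADER_ENTRY` and the helper `checksum` are hypothetical.

```c
#include <stdint.h>

/* Set by the including file: loader_flash.c would use loader_flash,
   loader_ram.c would use loader_ram (LOADER_ENTRY is a hypothetical name). */
#define LOADER_ENTRY loader_flash

/* ---- what the shared, #included text might contain ---- */

/* static: private to this object file, so the second copy cannot clash */
static uint32_t pass_count;

static uint32_t checksum(const uint8_t *p, uint32_t n)
{
    uint32_t sum = 0;
    while (n--) sum += *p++;
    return sum;
}

/* The only exported symbol; its name differs per including file. */
uint32_t LOADER_ENTRY(const uint8_t *image, uint32_t len)
{
    pass_count++;
    return checksum(image, len);
}
```

Each object file then exports exactly one global symbol, and the linker script can select the two copies purely by object file name.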
« Last Edit: July 17, 2021, 06:30:44 pm by gf »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Hmmm very good points.

Both loaders (unless possible to do relocatable, which nobody has yet reported as possible) will have to start life in the CPU FLASH (obviously). And since neither may be overwritten by any code intentionally flashing the CPU, both will have to live in that 16k block. So the budget is 8k each :)

The one loaded to run at 0x080FC000 (loader_flash.c) will only ever live at 0x080FC000 (the base of the uppermost 16k block). Its entry point will be 0x080FC000.

The one loaded to run at 0x20000000 (loader_ram.c) will live in the top half of the 16k block i.e. at 0x080FE000. Its entry point will be 0x080FE000. It will be copied to RAM by loader_flash.c.

The two loaders won't be exactly the same because only loader_flash.c will do the copying to RAM.
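
A rough sketch of the copy step that loader_flash would perform, assuming the addresses given above and that the RAM loader's entry point is its first byte. The macro and function names are hypothetical; pointers are passed in so the routine can be exercised on a PC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LOADER_RAM_LMA   0x080FE000u   /* where the image is stored in flash */
#define LOADER_RAM_VMA   0x20000000u   /* where it is linked to run */
#define LOADER_RAM_MAX   (8u * 1024u)  /* budget from the post: 8k each */

/* Copy the stored image to its run address. On the target, src and dst would
   be (const void *)LOADER_RAM_LMA and (void *)LOADER_RAM_VMA. */
void loader_ram_copy(void *dst, const void *src, uint32_t len)
{
    assert(len <= LOADER_RAM_MAX);
    memcpy(dst, src, len);
}

/* Cortex-M executes Thumb code only, so an address used for an indirect
   call must have bit 0 set. */
uint32_t thumb_entry(uint32_t addr)
{
    return addr | 1u;
}
```

On the target the jump would then be something like `((void (*)(void))thumb_entry(LOADER_RAM_VMA))();`. Since loader_flash has to be self-contained, a hand-written copy loop inside KDE_loader_FLASH.c is probably safer than `memcpy`, which would pull in library code linked elsewhere in FLASH.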

The whole 16k block will be written during factory config, using SWD.

I will try this next and see whether I get a huge .elf file, or some other problem.

This scheme should address the requirements:

- have a non-brickable* product, which can always restore working CPU software from one stored in serial SPI FLASH
- be able to re-flash the CPU using RAM-executed code
- generate both versions of the loader within the same one project
« Last Edit: July 17, 2021, 07:11:28 pm by peter-h »
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14447
  • Country: fr
Whatever you do, having a fixed, non-erasable minimal loader somewhere sounds like a good idea. Now a problem with this is always: what do you do if some bug is found in this loader once in the field? (You better test it thoroughly before releasing your product ;D )
« Last Edit: July 17, 2021, 07:19:15 pm by SiliconWizard »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
I don't think there is any way to fix such a loader (in the field) other than by having a second CPU.

I need a bit more help with the linker script... I am testing just loader_flash (currently called KDE_loader.c) which is compiled for 0x080FC000 and it is supposed to end up in the FLASH at 0x080FC000. But this (the bold bit) is not happening.

My linker script

Code: [Select]
/* Entry Point */
ENTRY(Reset_Handler)

/* Reference loader to ensure that it gets linked in */
EXTERN(loader)    

/* Highest address of the main stack */
 /*  _estack = 0x20020000;  */    /* stack in 128K RAM */
 _estack = 0x10010000;    /* stack in  64k CCM - note: configTOTAL_HEAP_SIZE + min_stack_size must not exceed 64k) */
 
/* top of RAM for _sbrk - top of heap check */
 _top = 0x20020000;

/* Heap and stack sizes */
_Min_Heap_Size  = 0xa000;     /* 40k heap - min size; it can grow to end of main RAM  */
_Min_Stack_Size = 0x4000;     /* 16k stack - in CCM */

/* Specify the memory areas */
/* CCMRAM added PH 12/5/2021 - cannot use with DMA */
MEMORY
{
  FLASH (rx)      : ORIGIN = 0x08000000, LENGTH = 1024K
  RAM (xrw)       : ORIGIN = 0x20000000, LENGTH = 128K
  MEMORY_B1 (rx)  : ORIGIN = 0x60000000, LENGTH = 0K
  CCMRAM (rw)     : ORIGIN = 0x10000000, LENGTH = 64K
}

/* Define output sections */
SECTIONS
{
/* loader_bss and loader sections must come first, in order to override the wildcards in subsequent sections */
 loader_bss _loader_end : {
  _loader_bss_start = .;
  *KDE_loader.o(.bss*)
  *KDE_loader.o(COMMON)
  . = ALIGN(4);
  _loader_bss_end = .;
 }
 
 KDE_loader 0x080FC000  : AT(_loader_loadaddr) {
  _loader_start = .;
  *KDE_loader.o(.text*)
  *KDE_loader.o(.rodata*)
  *KDE_loader.o(.data*)
  *KDE_loader.o(.ARM.attributes)
  . = ALIGN(4);
  _loader_end = .;
 }
 
  _loader_size = SIZEOF(KDE_loader);

  /* The startup code goes first into FLASH */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(4);
  } >FLASH

  /* The program code and other data goes into FLASH */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
    *(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbols at end of code */
  } >FLASH


   .ARM.extab   : { *(.ARM.extab* .gnu.linkonce.armextab.*) } >FLASH
    .ARM : {
    __exidx_start = .;
      *(.ARM.exidx*)
      __exidx_end = .;
    } >FLASH

  .preinit_array     :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array*))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  } >FLASH
  .init_array :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT(.init_array.*)))
    KEEP (*(.init_array*))
    PROVIDE_HIDDEN (__init_array_end = .);
  } >FLASH
  .fini_array :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(.fini_array*))
    KEEP (*(SORT(.fini_array.*)))
    PROVIDE_HIDDEN (__fini_array_end = .);
  } >FLASH

 
  /* Initialized data sections goes into RAM, load LMA copy after code */
  .data :
  {
 /*   . = ALIGN(4); */
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */

    . = ALIGN(4);
    _edata = .;        /* define a global symbol at data end */
  } >RAM AT >FLASH

 /* used by the startup to initialize data */
  _sidata = LOADADDR(.data);

 /* dummy placeholder in flash for loader section, to count flash usage */
 .KDE_loader : {
  . = . + SIZEOF(KDE_loader);
 } AT >FLASH
 _loader_loadaddr = LOADADDR(.KDE_loader);

  /* Uninitialized data section */
  . = ALIGN(4);
  .bss :
  {
    /* This is used by the startup in order to initialize the .bss section */
    _sbss = .;         /* define a global symbol at bss start */
    __bss_start__ = _sbss;
    *(.bss)
    *(.bss*)
    *(COMMON)

    . = ALIGN(4);
    _ebss = .;         /* define a global symbol at bss end */
    __bss_end__ = _ebss;
  } >RAM

  /* The heap ends up after BSS in main RAM */
  /* This also checks that the top of the heap doesn't hit the bottom of the stack i.e. how much RAM left */
  /* User_heap_stack section, used to check that there is enough RAM left */
 
  /* ._user_heap_stack : */
  .main_heap :
  {
    . = ALIGN(8);
    PROVIDE ( end = . );
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
 /*   . = . + _Min_Stack_Size; */  /* PH 14/5/2021 stack is in CCM, not here */
    . = ALIGN(8);
   } >RAM
  /*   } >CCMRAM */


  /* MEMORY_bank1 section, code must be located here explicitly            */
  /* Example: extern int foo(void) __attribute__ ((section (".mb1text"))); */
  /* Not used 14/5/2021 - was apparently used for LCD display on ST dev kit */
  .memory_b1_text :
  {
    *(.mb1text)        /* .mb1text sections (code) */
    *(.mb1text*)       /* .mb1text* sections (code)  */
    *(.mb1rodata)      /* read-only data (constants) */
    *(.mb1rodata*)
  } >MEMORY_B1
 
  /* CCM-RAM section
  *
  * IMPORTANT NOTE!
  * If variables placed in this section must be zero initialized,
  * the startup code needs to be modified to initialize this section. 
  * Done PH 12/5/2021
  */
  .ccmram :
  {
    . = ALIGN(4);
    _sccmram = .;       /* create a global symbol at ccmram start */
    *(.ccmram)
    *(.ccmram*)
   
    . = ALIGN(4);
    _eccmram = .;       /* create a global symbol at ccmram end */
  } >CCMRAM

  /* Remove information from the standard libraries */
  /DISCARD/ :
  {
    libc.a ( * )
    libm.a ( * )
    libgcc.a ( * )
  }

  .ARM.attributes 0 : { *(.ARM.attributes) }
}


I reckon I need to somehow add its sections into this



Note that I am not writing code which will program the loader into the FLASH at 0x080FC000. I could do that, but I want this to be done during factory config, using SWD, when the whole 1MB gets written.

I've tried this

Code: [Select]
  /* The program code and other data goes into FLASH */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
   
    *KDE_loader.o(.text*)
    *KDE_loader.o(.rodata*)
    *KDE_loader.o(.data*)
    *KDE_loader.o(.ARM.attributes)
 
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
    *(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbols at end of code */
  } >FLASH



and sure enough the call to loader() in main() does go to the right address but there is nothing there so the debugger didn't program the FLASH:

« Last Edit: July 17, 2021, 08:35:50 pm by peter-h »
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
My understanding is that you can't include the sections of KDE_loader.o twice.
Once an input section, say KDE_loader.o(.text), has been included in one output section (say KDE_loader), it is considered "consumed" and won't be included in a different output section (say .text) any more.
So I think you need two object files, e.g. KDE_loader_flash.o and KDE_loader_ram.o.

Btw, at which address are data and bss of loader_flash supposed to reside? Also at 0x20000000? Or do they need to coexist simultaneously with the data of the main program (without overlap)?
« Last Edit: July 17, 2021, 09:55:52 pm by gf »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
"So I think you need two object files, e.g. KDE_loader_flash.o and KDE_loader_ram.o."

Yes.

"at which address are data and bss of loader_flash supposed to reside? Also at 0x20000000? "

loader_flash needs to be self contained at 0x080FC000-0x080FDFFF (lower 8k of uppermost 16k of CPU FLASH). It executes there.

loader_ram needs to be self contained at 0x080FE000-0x080FFFFF (upper 8k of uppermost 16k of CPU FLASH). It executes at 0x20000000.

"Or do they need to coexist simultaneously with the data of the main program (without overlap)?"

The main program is not needed for this scheme to work. If necessary I can avoid any need for initialised data or bss, because at this stage I have loads of stack space.

Well... if you completely trash the bottom of the CPU FLASH, where the vector table is, then loader_flash will never get run, but I can't see any way around that, with just a single CPU. If you had a second CPU then you could boot the main CPU using one of the non-writable loaders (BOOT0=1, I think) and use the 2nd CPU to feed the 1st one with bytes of code via a serial port, SPI, CAN, etc. One can minimise the risk of this situation by never writing (in the field) the 1st 4k block. And of course never writing (in the field) the topmost 16k block.
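
The "never write these blocks in the field" rule could be enforced with a small guard in front of the flash write routine. A minimal sketch using the two ranges named above; the type and function names are hypothetical, and it checks byte ranges only, so erase granularity has to be handled separately.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t start, end; } range_t;   /* end is exclusive */

static const range_t protected_ranges[] = {
    { 0x08000000u, 0x08001000u },   /* first 4k: vector table and startup */
    { 0x080FC000u, 0x08100000u },   /* top 16k: the loader block */
};

/* True if [addr, addr+len) touches none of the protected ranges. */
bool field_write_allowed(uint32_t addr, uint32_t len)
{
    uint32_t end = addr + len;
    if (end < addr) return false;                   /* reject wrap-around */

    for (unsigned i = 0; i < sizeof protected_ranges / sizeof protected_ranges[0]; i++) {
        if (addr < protected_ranges[i].end && end > protected_ranges[i].start)
            return false;
    }
    return true;
}
```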

Currently what I appear to be missing is getting the SWD debugger to write the code into 0x080FC000+

Now that the principle of running code in RAM has been proven, I am going back a step to getting loader_flash (in file KDE_loader_FLASH.c) to be SWD-programmed to 0x080FC000 and do something (flash some LED) there. When that works I will move to the RAM copy of it. My current linker script is this:

Code: [Select]
/* Entry Point */
ENTRY(Reset_Handler)

/* Reference loader to ensure that it gets linked in */
EXTERN(loader_flash)    

/* Highest address of the main stack */
 /*  _estack = 0x20020000;  */    /* stack in 128K RAM */
 _estack = 0x10010000;    /* stack in  64k CCM - note: configTOTAL_HEAP_SIZE + min_stack_size must not exceed 64k) */
 
/* top of RAM for _sbrk - top of heap check */
 _top = 0x20020000;

/* Heap and stack sizes */
_Min_Heap_Size  = 0xa000;     /* 40k heap - min size; it can grow to end of main RAM  */
_Min_Stack_Size = 0x4000;     /* 16k stack - in CCM */

/* Specify the memory areas */
/* CCMRAM added PH 12/5/2021 - cannot use with DMA */
MEMORY
{
  FLASH (rx)      : ORIGIN = 0x08000000, LENGTH = 1024K
  RAM (xrw)       : ORIGIN = 0x20000000, LENGTH = 128K
  MEMORY_B1 (rx)  : ORIGIN = 0x60000000, LENGTH = 0K
  CCMRAM (rw)     : ORIGIN = 0x10000000, LENGTH = 64K
}

/* Define output sections */
SECTIONS
{
/* loader_flash_bss and loader_flash sections must come first, in order to override the wildcards in subsequent sections */
 loader_flash_bss _loader_flash_end : {
  _loader_flash_bss_start = .;
  *KDE_loader_FLASH.o(.bss*)
  *KDE_loader_FLASH.o(COMMON)
  . = ALIGN(4);
  _loader_flash_bss_end = .;
 }
 
 KDE_loader_FLASH 0x080FC000  : AT(_loader_flash_loadaddr) {
  _loader_flash_start = .;
  *KDE_loader_FLASH.o(.text*)
  *KDE_loader_FLASH.o(.rodata*)
  *KDE_loader_FLASH.o(.data*)
  *KDE_loader_FLASH.o(.ARM.attributes)
  . = ALIGN(4);
  _loader_flash_end = .;
 }
 
  _loader_flash_size = SIZEOF(KDE_loader_FLASH);

  /* The startup code goes first into FLASH */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(4);
  } >FLASH

  /* The program code and other data goes into FLASH */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
    *(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbols at end of code */
  } >FLASH

   .ARM.extab   : { *(.ARM.extab* .gnu.linkonce.armextab.*) } >FLASH
    .ARM : {
    __exidx_start = .;
      *(.ARM.exidx*)
      __exidx_end = .;
    } >FLASH

  .preinit_array     :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array*))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  } >FLASH
  .init_array :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT(.init_array.*)))
    KEEP (*(.init_array*))
    PROVIDE_HIDDEN (__init_array_end = .);
  } >FLASH
  .fini_array :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(.fini_array*))
    KEEP (*(SORT(.fini_array.*)))
    PROVIDE_HIDDEN (__fini_array_end = .);
  } >FLASH

 
  /* Initialized data sections goes into RAM, load LMA copy after code */
  .data :
  {
 /*   . = ALIGN(4); */
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */

    . = ALIGN(4);
    _edata = .;        /* define a global symbol at data end */
  } >RAM AT >FLASH

 /* used by the startup to initialize data */
  _sidata = LOADADDR(.data);

 /* dummy placeholder in flash for loader section, to count flash usage */
 .KDE_loader_FLASH : {
  . = . + SIZEOF(KDE_loader_FLASH);
 } AT >FLASH
 _loader_flash_loadaddr = LOADADDR(.KDE_loader_FLASH);

  /* Uninitialized data section */
  . = ALIGN(4);
  .bss :
  {
    /* This is used by the startup in order to initialize the .bss section */
    _sbss = .;         /* define a global symbol at bss start */
    __bss_start__ = _sbss;
    *(.bss)
    *(.bss*)
    *(COMMON)

    . = ALIGN(4);
    _ebss = .;         /* define a global symbol at bss end */
    __bss_end__ = _ebss;
  } >RAM

  /* The heap ends up after BSS in main RAM */
  /* This also checks that the top of the heap doesn't hit the bottom of the stack i.e. how much RAM left */
  /* User_heap_stack section, used to check that there is enough RAM left */
 
  /* ._user_heap_stack : */
  .main_heap :
  {
    . = ALIGN(8);
    PROVIDE ( end = . );
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
 /*   . = . + _Min_Stack_Size; */  /* PH 14/5/2021 stack is in CCM, not here */
    . = ALIGN(8);
   } >RAM
  /*   } >CCMRAM */


  /* MEMORY_bank1 section, code must be located here explicitly            */
  /* Example: extern int foo(void) __attribute__ ((section (".mb1text"))); */
  /* Not used 14/5/2021 - was apparently used for LCD display on ST dev kit */
  .memory_b1_text :
  {
    *(.mb1text)        /* .mb1text sections (code) */
    *(.mb1text*)       /* .mb1text* sections (code)  */
    *(.mb1rodata)      /* read-only data (constants) */
    *(.mb1rodata*)
  } >MEMORY_B1
 
  /* CCM-RAM section
  *
  * IMPORTANT NOTE!
  * If variables placed in this section must be zero initialized,
  * the startup code needs to be modified to initialize this section. 
  * Done PH 12/5/2021
  */
  .ccmram :
  {
    . = ALIGN(4);
    _sccmram = .;       /* create a global symbol at ccmram start */
    *(.ccmram)
    *(.ccmram*)
   
    . = ALIGN(4);
    _eccmram = .;       /* create a global symbol at ccmram end */
  } >CCMRAM

  /* Remove information from the standard libraries */
  /DISCARD/ :
  {
    libc.a ( * )
    libm.a ( * )
    libgcc.a ( * )
  }

  .ARM.attributes 0 : { *(.ARM.attributes) }
}


You can see R3 is the right value but those addresses are all 0xFF



It must be something simple. I thought that if you create an "output section" then the debugger will simply pick that up and program the CPU FLASH with it. The .map file suggests the addresses are correct:

Code: [Select]
loader_flash_bss
                0x00000000080fc140        0x0
                0x00000000080fc140                _loader_flash_bss_start = .
 *KDE_loader_FLASH.o(.bss*)
 *KDE_loader_FLASH.o(COMMON)
                0x00000000080fc140                . = ALIGN (0x4)
                0x00000000080fc140                _loader_flash_bss_end = .

KDE_loader_FLASH
                0x00000000080fc000      0x140 load address 0x000000000803b668
                0x00000000080fc000                _loader_flash_start = .
 *KDE_loader_FLASH.o(.text*)
 .text.HAL_GPIO_WritePin
                0x00000000080fc000       0x32 src/KDE_loader_FLASH.o
 *fill*         0x00000000080fc032        0x2
 .text.KDE_LED_On
                0x00000000080fc034       0x38 src/KDE_loader_FLASH.o
 .text.KDE_LED_Off
                0x00000000080fc06c       0x38 src/KDE_loader_FLASH.o
 .text.hang_around
                0x00000000080fc0a4       0x26 src/KDE_loader_FLASH.o
 .text.loader_flash
                0x00000000080fc0ca       0x1c src/KDE_loader_FLASH.o
                0x00000000080fc0ca                loader_flash
 *KDE_loader_FLASH.o(.rodata*)
 *fill*         0x00000000080fc0e6        0x2
 .rodata        0x00000000080fc0e8        0xc src/KDE_loader_FLASH.o
                0x00000000080fc0e8                GPIO_PIN_X
 *KDE_loader_FLASH.o(.data*)
 .data          0x00000000080fc0f4       0x18 src/KDE_loader_FLASH.o
                0x00000000080fc0f4                GPIO_PORT_X
 *KDE_loader_FLASH.o(.ARM.attributes)
 .ARM.attributes
                0x00000000080fc10c       0x34 src/KDE_loader_FLASH.o
                0x00000000080fc140                . = ALIGN (0x4)
                0x00000000080fc140                _loader_flash_end = .
                0x0000000000000140                _loader_flash_size = SIZEOF (KDE_loader_FLASH)
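
One way to read this map entry, offered as an interpretation rather than something confirmed in the thread: the KDE_loader_FLASH section has a run address of 0x080fc000 but a load address of 0x0803b668, and programming tools that work from the ELF generally place a section's bytes at its load address. Under that reading the code would be stored around 0x0803b668, and 0x080fc000 would stay erased (0xFF) unless something copies it there. The numbers can be cross-checked with a few lines of C; the names below are hypothetical and the values are copied from the map.

```c
#include <stdint.h>

/* Values copied from the KDE_loader_FLASH entry in the map output. */
#define SECTION_VMA   0x080FC000u   /* address the code is linked to run at */
#define SECTION_LMA   0x0803B668u   /* "load address" reported by the linker */
#define SECTION_SIZE  0x140u

/* Sizes of the input sections and fill, in map order. */
static const uint32_t part_sizes[] = {
    0x32, 0x02, 0x38, 0x38, 0x26, 0x1c,   /* .text.* pieces plus one fill */
    0x02, 0x0c,                           /* fill, .rodata */
    0x18,                                 /* .data */
    0x34                                  /* .ARM.attributes */
};

/* Sum of all pieces; should equal SECTION_SIZE if the map was read correctly. */
uint32_t parts_total(void)
{
    uint32_t total = 0;
    for (unsigned i = 0; i < sizeof part_sizes / sizeof part_sizes[0]; i++)
        total += part_sizes[i];
    return total;
}

/* Nonzero when the image is stored somewhere other than where it runs. */
int needs_copy_before_run(void)
{
    return SECTION_LMA != SECTION_VMA;
}
```

The map also shows .ARM.attributes (0x34 bytes of build metadata) being placed inside the loader's code section, which is probably not intended.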
« Last Edit: July 18, 2021, 06:53:43 am by peter-h »
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
Try this. Given that the loaders have fixed addresses now, I tried to organize it a bit more cleanly.

Code: [Select]
/* Entry Point */
ENTRY(Reset_Handler)

/* Reference loaders to ensure that they get linked-in */
EXTERN(loader_flash)
EXTERN(loader_ram)

/* Highest address of the main stack */
 /*  _estack = 0x20020000;  */    /* stack in 128K RAM */
 _estack = 0x10010000;    /* stack in  64k CCM - note: configTOTAL_HEAP_SIZE + min_stack_size must not exceed 64k) */
 
/* top of RAM for _sbrk - top of heap check */
 _top = 0x20020000;

/* Heap and stack sizes */
_Min_Heap_Size  = 0xa000;     /* 40k heap - min size; it can grow to end of main RAM  */
_Min_Stack_Size = 0x4000;     /* 16k stack - in CCM */

/* Specify the memory areas */
/* CCMRAM added PH 12/5/2021 - cannot use with DMA */
MEMORY
{
  FLASH (rx)              : ORIGIN = 0x08000000, LENGTH = 1024K - 16K
  RAM (xrw)               : ORIGIN = 0x20000000, LENGTH = 128K
  MEMORY_B1 (rx)          : ORIGIN = 0x60000000, LENGTH = 0K
  CCMRAM (rw)             : ORIGIN = 0x10000000, LENGTH = 64K

  /* flash regions for loaders */
  FLASH_LOADER_FLASH (rx) : ORIGIN = 0x080FC000, LENGTH = 8K
  FLASH_LOADER_RAM (rx)   : ORIGIN = 0x080FE000, LENGTH = 8K

  /* ram regions for loaders, overlap with RAM */
  RAM_LOADER_FLASH (rw)   : ORIGIN = 0x20000000, LENGTH = 128K
  RAM_LOADER_RAM (rw)     : ORIGIN = 0x20000000, LENGTH = 128K
}

/* Define output sections */
SECTIONS
{
  /* loader_flash_bss and loader_flash sections must come first,
     in order to override the wildcards in subsequent sections */

  KDE_loader_FLASH : {
     *KDE_loader_FLASH.o(.text*)
     *KDE_loader_FLASH.o(.rodata*)
     . = ALIGN(4);
  } >FLASH_LOADER_FLASH

  KDE_loader_FLASH_data : {
     _loader_flash_data_start = .;
     *KDE_loader_FLASH.o(.data*)
     . = ALIGN(4);
     _loader_flash_data_end = .;
  } >RAM_LOADER_FLASH AT >FLASH_LOADER_FLASH

  _loader_flash_data_loadaddr = LOADADDR(KDE_loader_FLASH_data);

  KDE_loader_FLASH_bss : {
     _loader_flash_bss_start = .;
     *KDE_loader_FLASH.o(.bss*)
     *KDE_loader_FLASH.o(COMMON)
     . = ALIGN(4);
     _loader_flash_bss_end = .;
  } >RAM_LOADER_FLASH

  /* ============================================================*/

  KDE_loader_RAM : {
     _loader_ram_start = .;
     *KDE_loader_RAM.o(.text*)
     *KDE_loader_RAM.o(.rodata*)
     *KDE_loader_RAM.o(.data*)
     . = ALIGN(4);
     _loader_ram_end = .;
  } >RAM_LOADER_RAM AT >FLASH_LOADER_RAM

  _loader_ram_loadaddr = LOADADDR(KDE_loader_RAM);

  KDE_loader_RAM_bss : {
     _loader_ram_bss_start = .;
     *KDE_loader_RAM.o(.bss*)
     *KDE_loader_RAM.o(COMMON)
     . = ALIGN(4);
     _loader_ram_bss_end = .;
  } >RAM_LOADER_RAM

  /* ============================================================*/

  /* The startup code goes first into FLASH */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(4);
  } >FLASH

  /* The program code and other data goes into FLASH */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
    *(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbol at end of code */
  } >FLASH

  .ARM.extab : { *(.ARM.extab* .gnu.linkonce.armextab.*) } >FLASH
  .ARM : {
    __exidx_start = .;
    *(.ARM.exidx*)
    __exidx_end = .;
  } >FLASH

  .preinit_array     :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array*))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  } >FLASH
  .init_array :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT(.init_array.*)))
    KEEP (*(.init_array*))
    PROVIDE_HIDDEN (__init_array_end = .);
  } >FLASH
  .fini_array :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(.fini_array*))
    KEEP (*(SORT(.fini_array.*)))
    PROVIDE_HIDDEN (__fini_array_end = .);
  } >FLASH

 
  /* Initialized data sections go into RAM, load LMA copy after code */
  .data :
  {
 /*   . = ALIGN(4); */
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */

    . = ALIGN(4);
    _edata = .;        /* define a global symbol at data end */
  } >RAM AT >FLASH

 /* used by the startup to initialize data */
  _sidata = LOADADDR(.data);

  /* Uninitialized data section */
  . = ALIGN(4);
  .bss :
  {
    /* This is used by the startup in order to initialize the .bss section */
    _sbss = .;         /* define a global symbol at bss start */
    __bss_start__ = _sbss;
    *(.bss)
    *(.bss*)
    *(COMMON)

    . = ALIGN(4);
    _ebss = .;         /* define a global symbol at bss end */
    __bss_end__ = _ebss;
  } >RAM

  /* The heap ends up after BSS in main RAM */
  /* This also checks that the top of the heap doesn't hit the bottom of the stack i.e. how much RAM left */
  /* User_heap_stack section, used to check that there is enough RAM left */
 
  /* ._user_heap_stack : */
  .main_heap :
  {
    . = ALIGN(8);
    PROVIDE ( end = . );
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
 /*   . = . + _Min_Stack_Size; */  /* PH 14/5/2021 stack is in CCM, not here */
    . = ALIGN(8);
   } >RAM
  /*   } >CCMRAM */


  /* MEMORY_bank1 section, code must be located here explicitly            */
  /* Example: extern int foo(void) __attribute__ ((section (".mb1text"))); */
  /* Not used 14/5/2021 - was apparently used for LCD display on ST dev kit */
  .memory_b1_text :
  {
    *(.mb1text)        /* .mb1text sections (code) */
    *(.mb1text*)       /* .mb1text* sections (code)  */
    *(.mb1rodata)      /* read-only data (constants) */
    *(.mb1rodata*)
  } >MEMORY_B1
 
  /* CCM-RAM section
  *
  * IMPORTANT NOTE!
  * If variables placed in this section must be zero initialized,
  * the startup code needs to be modified to initialize this section.
  * Done PH 12/5/2021
  */
  .ccmram :
  {
    . = ALIGN(4);
    _sccmram = .;       /* create a global symbol at ccmram start */
    *(.ccmram)
    *(.ccmram*)
   
    . = ALIGN(4);
    _eccmram = .;       /* create a global symbol at ccmram end */
  } >CCMRAM

  /* Remove information from the standard libraries */
  /DISCARD/ :
  {
    libc.a ( * )
    libm.a ( * )
    libgcc.a ( * )
  }

  .ARM.attributes 0 : { *(.ARM.attributes) }
}

 

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 824
  • Country: es
Well... if you completely trash the bottom of the CPU FLASH, where the vector table is, then loader_flash will never get run
You can create a separate vector table (it is just a const void *[] array) for loader and place loader+VT in the bottom sector to run before the main part (starting at the next free sector) and be independent.

BTW, looks like Cube has a "subproject" concept for building multiple separate bins from a combination of private/shared sources within a single project (used for multicore CPUs normally): https://community.st.com/s/question/0D53W000003xwtZ/is-it-possible-to-have-cubeide-project-with-multiple-main-functions.
 

Offline harerod

  • Frequent Contributor
  • **
  • Posts: 449
  • Country: de
  • ee - digital & analog
    • My services:
Quote from: peter-h on Today at 06:08:57
...
BTW, looks like Cube has a "subproject" concept for building multiple separate bins from a combination of private/shared sources within a single project (used for multicore CPUs normally): https://community.st.com/s/question/0D53W000003xwtZ/is-it-possible-to-have-cubeide-project-with-multiple-main-functions.

I prefer to maintain the bootloader and the application within the same project. A complex bootloader may even use some of the same library source (e.g. wear levelling file system, network stack) as the main application.
One could use CubeIDE build options to switch between different builds, for instance: app standalone, bootloader, app with bootloader. Build options only vary in some #defines, linker scripts and source files ("exclude from build").
One can start from one of the standard Release/Debug builds and modify settings as needed.
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8168
  • Country: fi
It's very usual that the bootloader or flasher shares the requirements with the app. Of course, if it works through UART it's no big deal to replicate the UART "driver" which is 10 lines of code. But the whole point of custom developed bootloaders/flashers is they use whatever interfaces the application is using, allowing more convenient FW update compared to working with a dedicated flashing cable (JTAG, SWD or UART). If both the application and the bootloader work through CAN, or even more relevantly, TCP/IP over Ethernet or wireless, then sharing the communication code is obvious, and maintaining the whole shebang in one "project" is likely easier.
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
Quote
Well... if you completely trash the bottom of the CPU FLASH, where the vector table is, then loader_flash will never get run
You can create a separate vector table (it is just a const void *[] array) for loader and place loader+VT in the bottom sector to run before the main part (starting at the next free sector) and be independent.

Sure, the boot loader and the main program can have their own vector tables, but how does that help at power-on, once the flash at 0x08000000 is bricked?
I guess the only way to recover is either via the on-chip boot loader in system memory (after asserting the BOOTx pins), or programming via SWD.
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Now that this point has been raised about trashing the bottom 4k, I will make sure the bottom 4k is constant, has nothing in it of value, the vectors all point to addresses 4k higher up, and the next 4k has at its base a vector "reflection" table whose vectors point where the bottom ones did.

And now that the bottom 4k is constant (4k of mostly empty space), my loader can always rewrite it (if say its CRC has changed). Especially if an "unbrick me" button is held down at power-up, etc. This will ensure that loader_flash can always be executed.

But I can arrange my loader to never write the bottom 4k or the top 16k, so the only ways to brick this box will be

- somebody writes some code which deliberately writes these areas, and then they will learn a lesson ;)
- the power fails during the programming of the bottom 4k block (risk can be much reduced by programming it last, etc)

There are CPUs which can self-load from an SPI FLASH chip. No idea why ST didn't do that. Xilinx FPGAs had self-loading in the 1990s, although of course they had to: they were just SRAM :)

All the other loading methods (serial, etc) need another CPU, so you are just replacing one QA problem with another. Nokia used a "carefully validated coprocessor" for the GSM stack in their phones, until they went to Symbian, one of whose objectives was to deliver a reliable RTOS which could also run the GSM stack. It worked, except that some 3G/HSPA packets crashed the phone :)

Thank you again gf I will give that a go.
« Last Edit: July 18, 2021, 01:12:57 pm by peter-h »
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
The constant (or almost constant) part at 0x08000000 is typically the custom bootloader, which is updated very rarely, if ever.
The size must be 16k (not 4k), though, since the flash is divided (unevenly) into sectors of 4 x 16k + 1 x 64k + 7 x 128k, according to the reference manual.
I.e. you can't erase starting at offset 4k, but you can erase starting at offset 16k (-> 2nd 16k sector).
After reset the bootloader is entered in the first place. If no update is pending, then it immediately invokes the main program.
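
To make the sector arithmetic concrete, here is a minimal sketch that maps an address to its sector for the 4 x 16k + 1 x 64k + 7 x 128k layout described above. It assumes a 1 MB part with flash starting at 0x08000000; the names FLASH_START, FLASH_BYTES, flash_sector_of() and flash_sector_base() are made up for illustration and are not part of any ST library.

```c
#include <stdint.h>

#define FLASH_START 0x08000000u   /* base of on-chip flash (assumed) */
#define FLASH_BYTES 0x00100000u   /* 1 MB part (assumed) */

/* Return the sector number (0..11) containing addr, or -1 if addr is outside flash. */
int flash_sector_of(uint32_t addr)
{
    if (addr < FLASH_START || addr - FLASH_START >= FLASH_BYTES) {
        return -1;
    }
    uint32_t off = addr - FLASH_START;
    if (off < 0x10000u) {
        return (int)(off / 0x4000u);               /* sectors 0..3: 16k each */
    }
    if (off < 0x20000u) {
        return 4;                                  /* sector 4: 64k */
    }
    return 5 + (int)((off - 0x20000u) / 0x20000u); /* sectors 5..11: 128k each */
}

/* Return the first address of a sector, or 0 for an invalid sector number. */
uint32_t flash_sector_base(int sector)
{
    if (sector < 0 || sector > 11) {
        return 0;
    }
    if (sector < 4) {
        return FLASH_START + (uint32_t)sector * 0x4000u;
    }
    if (sector == 4) {
        return FLASH_START + 0x10000u;
    }
    return FLASH_START + 0x20000u + (uint32_t)(sector - 5) * 0x20000u;
}
```

With this layout, 0x08001000 (offset 4k) still falls in sector 0, and 0x080FC000 falls in sector 11, which starts at 0x080E0000. So, if the layout is as assumed, anything placed in the top 16k shares an erase unit with the 112k below it.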

Edit:

AFAIK, there also exist STM32 models with dual-bank flash. My understanding is, while one bank is active, the inactive bank can be upgraded. Once the upgrade finished successfully, the banks are swapped. Don't know any details, though.

Edit:

Quote
And now that the bottom 4k is constant (4k of mostly empty space), my loader can always rewrite it

There will always be a time gap between erasing and re-programming. So if you re-write it, there is always a residual risk that it could crash in this time gap.
Also, it does not suffice that the vectors per se are intact: the vectors point to functions, and the code of these functions needs to be intact as well. At least for the reset vector this must be guaranteed.
« Last Edit: July 18, 2021, 01:58:58 pm by gf »
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8168
  • Country: fi
You don't need dual banking to be able to upgrade one version while retaining another and then do the swap, although the banking makes the swap operation easier since you can change where 0x0800 0000 actually maps.

But really, it's enough to have at least 3 sectors: bootloader, app1, app2.

Really, what dual banking offers is parallelism; you can erase and write one bank while running your app off the other bank. Not that useful IMHO, you can usually accept FW updates to be blocking operations.

There are devices with one sector only, like the STM32H750. It's cheaper than STM32H743 and obviously designed to hold the bootloader in the sole sector then run the app off external flash, but if you can accept the small risk of bricking if power loss happens during FW or configuration update, these parts are great bang-for-buck working from the single sector alone.

Small amounts of data storage is possible without erase operations by advancing the storage pointer every time so you write to fresh, erased bits every time.
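
A rough sketch of that idea, simulated on a plain array so it can be tried on a PC. It assumes erased flash reads back as 0xFFFFFFFF and that values are stored as 32-bit words; the function names are invented for illustration, and on real hardware the plain store in log_append() would be replaced by the flash word-programming sequence.

```c
#include <stddef.h>
#include <stdint.h>

#define SLOT_ERASED 0xFFFFFFFFu   /* erased flash reads as all ones */

/* Index of the first still-erased slot, or n if the area is full. */
size_t log_first_free(const uint32_t *slots, size_t n)
{
    size_t i = 0;
    while (i < n && slots[i] != SLOT_ERASED) {
        i++;
    }
    return i;
}

/* Store value in the next erased slot. Returns 0 on success, -1 if the area
   is full or the value cannot be told apart from an erased slot. */
int log_append(uint32_t *slots, size_t n, uint32_t value)
{
    size_t i = log_first_free(slots, n);
    if (i == n || value == SLOT_ERASED) {
        return -1;
    }
    slots[i] = value;   /* on hardware: flash word-programming sequence */
    return 0;
}

/* Fetch the most recently stored value. Returns 0 on success, -1 if empty. */
int log_latest(const uint32_t *slots, size_t n, uint32_t *out)
{
    size_t i = log_first_free(slots, n);
    if (i == 0) {
        return -1;
    }
    *out = slots[i - 1];
    return 0;
}
```

An erase is then only needed once log_append() reports that the area is full.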
« Last Edit: July 18, 2021, 01:51:30 pm by Siwastaja »
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
Quote
But really, it's enough to have at least 3 sectors: bootloader, app1, app2.

App1 and app2 can alternate, but how do you re-write bootloader in a fail-safe way, then?
(Just in the rare case that it needs to be updated, too)

Quote
Small amounts of data storage is possible without erase operations by advancing the storage pointer every time so you write to fresh, erased bits every time.

Nice trick, indeed ;)
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8168
  • Country: fi
Quote
App1 and app2 can alternate, but how do you re-write bootloader in a fail-safe way, then?
(Just in the rare case that it needs to be updated, too)

Obviously, you don't; you play with the probabilities. The bootloader can be made a Finished Product with a manageable amount of internal testing, so having to update it is much less likely; and even if it needs to be done once, it's lower risk than updating the actual app a hundred times. Besides, bootloaders are small; it's a small sector, and the erase operation might take just a few hundred milliseconds. Also, the bootloader code fits in RAM so you can receive and verify all of it before starting erase-write. It's colossally bad luck if power loss happens during the last 100ms of this multi-second process.

Swappable banks do not help here. You can always end up making the mistake of configuring the wrong bank to boot so that it can't recover. Something can always go wrong; it's all about minimizing this probability.
« Last Edit: July 18, 2021, 02:11:59 pm by Siwastaja »
 
The following users thanked this post: harerod

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
"AFAIK, there also exist STM32 models with dual-bank flash. My understanding is, while one bank is active, the inactive bank can be upgraded. Once the upgrade finished successfully, the banks are swapped. Don't know any details, though."

Yes; the 2MB version of the 32F417 can run code from one 1MB while programming the other 1MB.

I looked at this chip but every "nice thing" is another couple of quid and soon you are looking at real money :)

Also 2MB of real code is a huge amount of code.

Thanks for pointing out the bottom block is 16k. I think I will get the loader to write from top down (skipping the uppermost 128k) so the bottom block is written last, and then the top block except its top 16k. Very bad luck then if that fails.
« Last Edit: July 18, 2021, 03:14:11 pm by peter-h »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Some progress.

The KDE_loader_FLASH code is written (SWD) to 0x080FC000. This is excellent.
The KDE_loader_RAM code is not written (SWD) to 0x080FE000. This must be a simple thing in the linker script.

The KDE_loader_FLASH code does run in that I can step through it, and it behaves normally, but the LED doesn't flash. Looking at the values being passed, I have

static void KDE_LED_On(KDE_LED_NUM led)
{
  HAL_GPIO_WritePin(GPIO_PORT_X[led], GPIO_PIN_X[led], GPIO_PIN_RESET);      // set to 0 to turn on
}

and looking at GPIO_PORT_X I see a basically meaningless value for the port, of 0x20000000 when it should be 0x40020c00. I can't get my head around why this is wrong, because this comes from the #included .h file.

However I did have weird problems which may explain it.

Each of the two loaders has a #include .h file of the same name as the .c file. Both .h files contain this

Code: [Select]
GPIO_TypeDef* GPIO_PORT_X[LEDn] = {LED2_GPIO_PORT,
                                   LED3_GPIO_PORT,
                                   LED4_GPIO_PORT,
                                   LED5_GPIO_PORT,
                                   LED6_GPIO_PORT,
                                   LED7_GPIO_PORT};


const uint16_t GPIO_PIN_X[LEDn] = {LED2_PIN,
                                   LED3_PIN,
                                   LED4_PIN,
                                   LED5_PIN,
                                   LED6_PIN,
                                   LED7_PIN};


but I was getting linker errors on duplicated definitions of GPIO_PORT_X and GPIO_PIN_X, which doesn't make sense. The errors went away when I used _Y instead of _X for the RAM loader.

I cannot see why you cannot have a definition for the same thing in multiple .h files, using the same name. The cause must be that symbols like GPIO_PORT_X are being exported to the linker (via the .o files, I think that means) instead of remaining private to the .c file. So what is different about these two .c files? Most of my .c files have the same #includes e.g. stdio.h etc etc. It's not a problem but I can't get my head around this.

The FLASH version of the loader is:

Code: [Select]
/*
 * KDE_loader_FLASH.c
 *
 *  Created on: 13 Jul 2021
 *      Author: peter
 *
 * This module is self-contained at 0x080FC000-0x080FDFFF (lower 8k of uppermost 16k of CPU FLASH).
 * It is linked to execute there also.
 * Its job is to load the RAM version of itself into RAM. That RAM version lives in the upper 8k
 * i.e. 0x080FE000-0x080FFFFF and is linked to run at base of RAM 0x20000000,
 *
 * All functions here are static, and no functions outside this file can be called by the code here.
 *
 * The 45DB code here is a copy of (most of) 45dbxx.c, with every function having "static" in front of it. Same for LEDs, etc.
 *
 *
 */


#include "stm32f417xx.h"
#include "KDE_loader_FLASH.h"



extern void loader_FLASH_entry();

void loader_flash(void);

// This function must be at the start of this module - this is the entry point.
void loader_FLASH_entry(void)
{
    loader_flash();
}


/**
  * @brief  Sets or clears the selected data port bit.
  *
  * @note   This function uses GPIOx_BSRR register to allow atomic read/modify
  *         accesses. In this way, there is no risk of an IRQ occurring between
  *         the read and the modify access.
  *
  * @param  GPIOx where x can be (A..K) to select the GPIO peripheral for STM32F429X device or
  *                      x can be (A..I) to select the GPIO peripheral for STM32F40XX and STM32F427X devices.
  * @param  GPIO_Pin specifies the port bit to be written.
  *          This parameter can be one of GPIO_PIN_x where x can be (0..15).
  * @param  PinState specifies the value to be written to the selected bit.
  *          This parameter can be one of the GPIO_PinState enum values:
  *            @arg GPIO_PIN_RESET: to clear the port pin
  *            @arg GPIO_PIN_SET: to set the port pin
  * @retval None
  */
static void HAL_GPIO_WritePin(GPIO_TypeDef* GPIOx, uint16_t GPIO_Pin, GPIO_PinState PinState)
{

  if(PinState != GPIO_PIN_RESET)
  {
    GPIOx->BSRR = GPIO_Pin;
  }
  else
  {
    GPIOx->BSRR = (uint32_t)GPIO_Pin << 16U;
  }
}


/**
  * @brief  Turns selected LED On.
  * @param  Led: Specifies the Led to be set on.
  *   This parameter can be one of following parameters:
  *     @arg KDE_LED2
  *     @arg KDE_LED3
  *     @arg KDE_LED4
  *     @arg KDE_LED5
  *     @arg KDE_LED6
  *     @arg KDE_LED7
  */
static void KDE_LED_On(KDE_LED_NUM led)
{
  HAL_GPIO_WritePin(GPIO_PORT_X[led], GPIO_PIN_X[led], GPIO_PIN_RESET); // set to 0 to turn on
}

/**
  * @brief  Turns selected LED Off.
  * @param  Led: Specifies the Led to be set off.
  *   This parameter can be one of following parameters:
  *     @arg KDE_LED2
  *     @arg KDE_LED3
  *     @arg KDE_LED4
  *     @arg KDE_LED5
  *     @arg KDE_LED6
  *     @arg KDE_LED7
  */
static void KDE_LED_Off(KDE_LED_NUM led)
{
  HAL_GPIO_WritePin(GPIO_PORT_X[led], GPIO_PIN_X[led], GPIO_PIN_SET); // set to 1 to turn off
}

/**
  * @brief  Toggles the selected LED.
  * @param  Led: Specifies the Led to be toggled.
  *   This parameter can be one of following parameters:
  *     @arg KDE_LED2
  *     @arg KDE_LED3
  *     @arg KDE_LED4
  *     @arg KDE_LED5
  *     @arg KDE_LED6
  *     @arg KDE_LED7
  */



// Hang around for delay in ms. Approximate but doesn't need interrupts etc working.
// Tweaked for FLASH resident code which runs a little faster (!) than RAM resident code.

static void hang_around(uint32_t delay)
{
    uint32_t fred = 17000*delay;

    while (fred>0)
    {
        fred--;
    }
}

volatile uint8_t fred123[256];

// This is where it all happens...

void loader_flash(void)
{
    for (;;)
    {
        KDE_LED_On(KDE_LED2);
        hang_around(200);
        KDE_LED_Off(KDE_LED2);
        hang_around(200);
    }
}


The RAM version of the loader is:

Code: [Select]
/*
 * KDE_loader_RAM.c
 *
 *  Created on: 13 Jul 2021
 *      Author: peter
 *
 * This module is self-contained at 0x080FE000-0x080FFFFF (upper 8k of uppermost 16k of CPU FLASH).
 * It is linked to execute at 0x20000000 (base of RAM).
 * Its job is to load a copy of itself into RAM.
 *
 * All functions here are static, and no functions outside this file can be called by the code here.
 *
 * The 45DB code here is a copy of (most of) 45dbxx.c, with every function having "static" in front of it. Same for LEDs, etc.
 *
 *
 */


#include <string.h>   /* memcpy(), memset() */

#include "stm32f417xx.h"
#include "KDE_loader_RAM.h"


static void loader_ram(void);

extern char _loader_ram_start;
extern char _loader_ram_end;
extern char _loader_ram_loadaddr;
extern char _loader_ram_bss_start;
extern char _loader_ram_bss_end;


// This function must be at the start of this module - this is the entry point.
void loader_RAM_entry()
{
    // copy loader_ram to ram
    memcpy(&_loader_ram_start, &_loader_ram_loadaddr, &_loader_ram_end - &_loader_ram_start);

    // clear loader's bss
    memset(&_loader_ram_bss_start, 0, &_loader_ram_bss_end - &_loader_ram_bss_start);

    loader_ram();
}


/**
  * @brief  Sets or clears the selected data port bit.
  *
  * @note   This function uses GPIOx_BSRR register to allow atomic read/modify
  *         accesses. In this way, there is no risk of an IRQ occurring between
  *         the read and the modify access.
  *
  * @param  GPIOx where x can be (A..K) to select the GPIO peripheral for STM32F429X device or
  *                      x can be (A..I) to select the GPIO peripheral for STM32F40XX and STM32F427X devices.
  * @param  GPIO_Pin specifies the port bit to be written.
  *          This parameter can be one of GPIO_PIN_x where x can be (0..15).
  * @param  PinState specifies the value to be written to the selected bit.
  *          This parameter can be one of the GPIO_PinState enum values:
  *            @arg GPIO_PIN_RESET: to clear the port pin
  *            @arg GPIO_PIN_SET: to set the port pin
  * @retval None
  */
static void HAL_GPIO_WritePin(GPIO_TypeDef* GPIOx, uint16_t GPIO_Pin, GPIO_PinState PinState)
{

  if(PinState != GPIO_PIN_RESET)
  {
    GPIOx->BSRR = GPIO_Pin;
  }
  else
  {
    GPIOx->BSRR = (uint32_t)GPIO_Pin << 16U;
  }
}


/**
  * @brief  Turns selected LED On.
  * @param  Led: Specifies the Led to be set on.
  *   This parameter can be one of following parameters:
  *     @arg KDE_LED2
  *     @arg KDE_LED3
  *     @arg KDE_LED4
  *     @arg KDE_LED5
  *     @arg KDE_LED6
  *     @arg KDE_LED7
  */
static void KDE_LED_On(KDE_LED_NUM led)
{
  HAL_GPIO_WritePin(GPIO_PORT_Y[led], GPIO_PIN_Y[led], GPIO_PIN_RESET); // set to 0 to turn on
}

/**
  * @brief  Turns selected LED Off.
  * @param  Led: Specifies the Led to be set off.
  *   This parameter can be one of following parameters:
  *     @arg KDE_LED2
  *     @arg KDE_LED3
  *     @arg KDE_LED4
  *     @arg KDE_LED5
  *     @arg KDE_LED6
  *     @arg KDE_LED7
  */
static void KDE_LED_Off(KDE_LED_NUM led)
{
  HAL_GPIO_WritePin(GPIO_PORT_Y[led], GPIO_PIN_Y[led], GPIO_PIN_SET); // set to 1 to turn off
}

/**
  * @brief  Toggles the selected LED.
  * @param  Led: Specifies the Led to be toggled.
  *   This parameter can be one of following parameters:
  *     @arg KDE_LED2
  *     @arg KDE_LED3
  *     @arg KDE_LED4
  *     @arg KDE_LED5
  *     @arg KDE_LED6
  *     @arg KDE_LED7
  */



// Hang around for delay in ms. Approximate but doesn't need interrupts etc working.
// Tweaked for RAM resident code which runs a little slower (!) than FLASH resident code.

static void hang_around(uint32_t delay)
{
    uint32_t fred = 15100*delay;

    while (fred>0)
    {
        fred--;
    }
}

volatile uint8_t fred123[256];

// This is where it all happens...

void loader_ram(void)
{
    for (;;)
    {
        KDE_LED_On(KDE_LED2);
        hang_around(200);
        KDE_LED_Off(KDE_LED2);
        hang_around(200);
    }
}


Trying to work out the linkfile, I wonder if this is initialised data being placed into COMMON instead of DATA.
« Last Edit: July 18, 2021, 05:54:57 pm by peter-h »
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
Quote
and looking at GPIO_PORT_X I see a basically meaningless value for the port

The KDE_loader_FLASH's code runs from flash, but its (initialized) data segment still needs to be copied to RAM, and its bss segment needs to be zeroed.
The data and bss segments cannot reside in FLASH at runtime, because they would not be writable there.

The memcpy() and the memset() are actually supposed to be done in the main program, outside the loaders, before calling the entry points (this applies to both KDE_loader_FLASH and KDE_loader_RAM).

Btw, I rather assumed that loader_flash() and loader_ram() are the entry points.

Quote
I cannot see why you cannot have a definition for the same thing in multiple .h files, using the same name.

You can't if it is a symbol with global/extern linkage. If it were static, there should not be a conflict.
Didn't I already mention that multiply defined global symbols can become a potential problem?
Try to declare all functions, and all data outside functions static, except for the entry point.
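
A small sketch of what such header contents could look like, written so it compiles on a PC. The table names, LED_COUNT and the pin masks are placeholders made up for illustration; only the port address 0x40020C00 is taken from the post above, and integer addresses stand in for the GPIO_TypeDef pointers. With static, each .c file that includes the header gets its own private copy, so nothing clashes at link time. With const, typical GCC setups place the tables in .rodata, which the loader's part of the linker script maps to flash, so they would not depend on a startup copy to RAM (an assumption worth checking in the .map file).

```c
#include <stdint.h>

#define LED_COUNT 2u   /* placeholder count for this sketch */

/* static: private to every .c file that includes this header.
   const:  read-only, so no RAM initialisation is needed before use. */
static const uintptr_t led_port_addr[LED_COUNT] = { 0x40020C00u, 0x40020C00u };
static const uint16_t  led_pin_mask[LED_COUNT]  = { 0x0001u, 0x0002u };

/* Accessors; on the target these would return GPIO_TypeDef* and a pin mask. */
static inline uintptr_t led_port(unsigned led) { return led_port_addr[led]; }
static inline uint16_t  led_pin(unsigned led)  { return led_pin_mask[led]; }
```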

Quote
The KDE_loader_RAM code is not written (SWD) to 0x080FE000.

What does the .map file say?
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
"What does the .map file say?"

.text.loader_RAM_entry
                0x0000000000000000       0x40 src/KDE_loader_RAM.o

"The memcpy() and the memset() are actually supposed to be done in the main program, outside the loaders, before calling the entry points (this apllies to both, KDE_loader_FLASH and KDE_loader_RAM)."

I now have this in main()

Code: [Select]
// === At this point, interrupts should still be disabled ====
// Execute loader

extern char _loader_flash_start;
extern char _loader_flash_end;
extern char _loader_flash_loadaddr;
extern char _loader_flash_bss_start;
extern char _loader_flash_bss_end;

extern char _loader_ram_start;
extern char _loader_ram_end;
extern char _loader_ram_loadaddr;
extern char _loader_ram_bss_start;
extern char _loader_ram_bss_end;

   // copy loader_ram to ram
memcpy(&_loader_ram_start, &_loader_ram_loadaddr, &_loader_ram_end - &_loader_ram_start);

// clear loader_flash's bss
memset(&_loader_flash_bss_start, 0, &_loader_flash_bss_end - &_loader_flash_bss_start);

// clear loader_ram's bss
memset(&_loader_ram_bss_start, 0, &_loader_ram_bss_end - &_loader_ram_bss_start);

extern void loader_FLASH_entry() __attribute__((long_call));
loader_FLASH_entry();

but I don't think the Data (or Common?) sections are getting set up. It does step through the little LED toggling loop ok but with the wrong I/O port address of 0x20000000.

"Try to declare all functions, and all data outside functions static, except for the entry point."

Yes; I failed to spot that these two symbols are initialised data outside of any function :)
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
It is either

Code: [Select]
memcpy(&_loader_ram_start, &_loader_ram_loadaddr, &_loader_ram_end - &_loader_ram_start);
memset(&_loader_ram_bss_start, 0, &_loader_ram_bss_end - &_loader_ram_bss_start);
__ISB();
loader_ram();

or

Code: [Select]
memcpy(&_loader_flash_data_start, &_loader_flash_data_loadaddr, &_loader_flash_data_end - &_loader_flash_data_start);
memset(&_loader_flash_bss_start, 0, &_loader_flash_bss_end - &_loader_flash_bss_start);
loader_flash();

But not both. The RAM regions of both loaders overlap and are not disjoint, so you need to decide which of them to copy and execute. If you copy/clear both, then the last memcpy/memset wins.
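
To illustrate the "last one wins" point, here is a sketch that runs on a PC, with a byte array standing in for the shared RAM region. The struct loader_image descriptor and stage_loader() are hypothetical; on the target the fields would be filled from the linker symbols (_loader_ram_start, _loader_ram_loadaddr and so on).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for one loader; stands in for the linker symbols. */
struct loader_image {
    const uint8_t *load_addr;  /* where the initialised bytes are stored (flash) */
    uint8_t       *run_addr;   /* where they must be at run time (RAM) */
    size_t         size;       /* number of initialised bytes */
    uint8_t       *bss_start;  /* start of the zero-initialised area (RAM) */
    size_t         bss_size;   /* size of the zero-initialised area */
};

/* Copy the initialised part to its run address and zero the bss. */
void stage_loader(const struct loader_image *img)
{
    memcpy(img->run_addr, img->load_addr, img->size);
    memset(img->bss_start, 0, img->bss_size);
}
```

If two descriptors share the same run_addr, staging the second overwrites whatever the first left there, so only the loader staged last is safe to call.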

Quote
"What does the .map file say?"

.text.loader_RAM_entry
                0x0000000000000000       0x40 src/KDE_loader_RAM.o

I rather mean the whole map file, to see what was included (and at which addresses), and what was skipped.

Edit:
And don't forget to disable interrupts and to abort any DMA operations in progress before invoking memcpy(), since data accessed by interrupt handlers in the main program or DMA transfers may clash with the loader's RAM region, too.
« Last Edit: July 18, 2021, 08:41:35 pm by gf »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
I am finding it hard to follow that.

I have gone back to review why exactly we have two copies of the loader:

I need a copy of that loader to be compiled to execute at the top 16k of the FLASH, and arranged so that the SWD writes it there. I realise, from reading various things around that this may generate a huge .elf where most of the 1MB is 0x00 or some such, but that's ok because it still takes only seconds to write it. And that loader will be what gets copied to RAM. Alternatively I could have the loader anywhere in FLASH and a bit of code which writes it to FLASH if not already there, but writing it there using SWD is the cleanest way. The entry point of the FLASH based loader will obviously need to be at the start of the 16k block, so the loader.c file will need to start with a function which just contains a jump/call to the real loader which does the work.
So I need two copies of the loader, one compiled/linked to execute at "1MB minus 16k" and the other at 0x20000000. Both will be actually run from those addresses. Relocatable code would be a neat solution...


The two copies of the loader are called loader_flash and loader_ram.

And the flash based one is needed to enable recovery should all the lower FLASH (well except the bottom block) be corrupted. To enable recovery, all that is needed is the initial vector which runs the startupxxx.s code and then runs main(), and the first bit of main() which sets up the basic hardware. I suspect most people doing loaders don't bother with this precaution!

But, the bottom 16k has to be erased as one block anyway, so why not just forget the two loaders, and have the RAM-based loader in that 16k block? Which is what I had previously, and had it running. The code, located to run at 0x20000000, can sit anywhere in the flash, and the bottom 16k is just fine because I can plan to keep that 16k constant.



The "system memory" is what gets run if you set BOOT0/BOOT1 to 1 at startup. That is the other way to do a boot loader, but you need a user-accessible jumper, which is not easy on a boxed product. But then you still need a 2nd CPU if you want to cover all possible scenarios.

Can anyone think of any real problem with having just one loader? I may have forgotten something. It seems safe enough provided the bottom block is written only if previous blocks succeeded; the chance of power loss then is tiny.

How would one arrange the order in which object files are linked? The start vector goes to the assembler startup, which calls main(), but main() could be anywhere within the 1MB. I really want main() to be at the bottom, and the loader right after it. In the old days one arranged the order by editing the linkfile but in this case the IDE auto generates all this stuff.

I think the best way to do this is to get startup.s to call main(), which sets up the I/O, SPI, etc, does not enable interrupts, and does all the dirty work to determine if loader recovery is needed (a CRC over the CPU FLASH, a CRC over the SPI FLASH to see if a valid program is in there, etc). If so, it copies the loader to RAM at 0x20000000 and runs it. The RAM code will then restore the CPU FLASH from the SPI FLASH. And one can even pass simple parameters to the RAM loader in the normal way of calling a function AFAICS - the first few arguments go in registers (r0-r3 on ARM) and anything beyond that gets pushed onto the stack.
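For anyone following along, here is a minimal sketch of the kind of CRC check described above. It assumes a plain bitwise CRC-32 (the reflected 0xEDB88320 polynomial, as used by zlib and Ethernet) and an image which stores its CRC in the last four bytes, little-endian. The layout and the names crc32_calc and image_is_valid are illustrative assumptions, not taken from the actual project. As far as I know the STM32F4 hardware CRC unit works on 32-bit words without bit reflection, so its result would not match this routine directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32: reflected polynomial 0xEDB88320, initial value and
   final XOR both 0xFFFFFFFF. Slow, but tiny and table-free, so it is
   cheap to keep inside a small boot block. */
static uint32_t crc32_calc(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Assumed (hypothetical) image layout: payload followed by its CRC-32
   in the last 4 bytes, least significant byte first.
   Returns 1 if the stored CRC matches the payload, 0 otherwise. */
static int image_is_valid(const uint8_t *image, size_t total_len)
{
    if (total_len < 4) {
        return 0;   /* too short to even hold a CRC */
    }
    size_t payload_len = total_len - 4;
    uint32_t stored = (uint32_t)image[payload_len]
                    | ((uint32_t)image[payload_len + 1] << 8)
                    | ((uint32_t)image[payload_len + 2] << 16)
                    | ((uint32_t)image[payload_len + 3] << 24);
    return crc32_calc(image, payload_len) == stored;
}
```

The same routine could be run over the CPU FLASH image and, block by block, over the copy in the SPI FLASH. An erased (all 0xFF) image will fail the check unless the stored CRC happens to match, which is the behaviour you want for recovery.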

We need to place main.c into the bottom block. Currently the startup code goes in the bottom because it shares the .s file with the reset vectors, but there doesn't seem to be anything ensuring that main.c goes right after it, and the loader.c file will have to go right after that.

Code: [Select]
/* The startup code goes first into FLASH */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(4);
  } >FLASH

  /* The program code and other data goes into FLASH */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
    *(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbols at end of code */
  } >FLASH

p.s. if you want to have a laugh, you can read this 417-page ST document on their boot loader operation :)
https://www.st.com/resource/en/application_note/cd00167594-stm32-microcontroller-system-memory-boot-mode-stmicroelectronics.pdf
« Last Edit: July 19, 2021, 11:02:03 am by peter-h »

Offline peter-hTopic starter

OK I am back to a single loader :)

Linker script:

Code: [Select]
/* Entry Point */
ENTRY(Reset_Handler)

/* Reference loader to ensure that it gets linked-in */
EXTERN(KDE_loader)


/* Highest address of the main stack */
 /*  _estack = 0x20020000;  */    /* stack in 128K RAM */
 _estack = 0x10010000;    /* stack in  64k CCM - note: configTOTAL_HEAP_SIZE + min_stack_size must not exceed 64k) */
 
/* top of RAM for _sbrk - top of heap check */
 _top = 0x20020000;

/* Heap and stack sizes */
_Min_Heap_Size  = 0xa000;     /* 40k heap - min size; it can grow to end of main RAM  */
_Min_Stack_Size = 0x4000;     /* 16k stack - in CCM */

/* Specify the memory areas */
/* CCMRAM added PH 12/5/2021 - cannot use with DMA */
MEMORY
{
  FLASH (rx)      : ORIGIN = 0x08000000, LENGTH = 1024K
  RAM (xrw)       : ORIGIN = 0x20000000, LENGTH = 128K
  MEMORY_B1 (rx)  : ORIGIN = 0x60000000, LENGTH = 0K
  CCMRAM (rw)     : ORIGIN = 0x10000000, LENGTH = 64K
}

/* Define output sections */
SECTIONS
{
 /* loader_bss and loader sections must come first, in order to override the wildcards in subsequent sections */
 loader_bss _loader_end : {
  _loader_bss_start = .;
  *KDE_loader.o(.bss*)
  . = ALIGN(4);
  _loader_bss_end = .;
 }
 
 KDE_loader 0x20000000 : AT(_loader_loadaddr) {
  _loader_start = .;
  *KDE_loader.o(.text*)
  *KDE_loader.o(.rodata*)
  *KDE_loader.o(.data*)
  *KDE_loader.o(.common*)
  *KDE_loader.o(.ARM.attributes)
  . = ALIGN(4);
  _loader_end = .;
 }
 
  _loader_size = SIZEOF(KDE_loader);


    /* The startup code goes first into FLASH */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(4);
  } >FLASH

  /* The program code and other data goes into FLASH */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
    *(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbols at end of code */
  } >FLASH

   .ARM.extab   : { *(.ARM.extab* .gnu.linkonce.armextab.*) } >FLASH
    .ARM : {
    __exidx_start = .;
      *(.ARM.exidx*)
      __exidx_end = .;
    } >FLASH

  .preinit_array     :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array*))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  } >FLASH
  .init_array :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT(.init_array.*)))
    KEEP (*(.init_array*))
    PROVIDE_HIDDEN (__init_array_end = .);
  } >FLASH
  .fini_array :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(.fini_array*))
    KEEP (*(SORT(.fini_array.*)))
    PROVIDE_HIDDEN (__fini_array_end = .);
  } >FLASH

 
  /* Initialized data sections goes into RAM, load LMA copy after code */
  .data :
  {
 /*   . = ALIGN(4); */
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */

    . = ALIGN(4);
    _edata = .;        /* define a global symbol at data end */
  } >RAM AT >FLASH

 /* used by the startup to initialize data */
  _sidata = LOADADDR(.data);

/* dummy placeholder in flash for loader section, to count flash usage */
 .KDE_loader : {
  . = . + SIZEOF(KDE_loader);
 } AT >FLASH
 _loader_loadaddr = LOADADDR(.KDE_loader);
 
 
  /* Uninitialized data section */
  . = ALIGN(4);
  .bss :
  {
    /* This is used by the startup in order to initialize the .bss section */
    _sbss = .;         /* define a global symbol at bss start */
    __bss_start__ = _sbss;
    *(.bss)
    *(.bss*)
    *(COMMON)

    . = ALIGN(4);
    _ebss = .;         /* define a global symbol at bss end */
    __bss_end__ = _ebss;
  } >RAM

  /* The heap ends up after BSS in main RAM */
  /* This also checks that the top of the heap doesn't hit the bottom of the stack i.e. how much RAM left */
  /* User_heap_stack section, used to check that there is enough RAM left */
 
  /* ._user_heap_stack : */
  .main_heap :
  {
    . = ALIGN(8);
    PROVIDE ( end = . );
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
 /*   . = . + _Min_Stack_Size; */  /* PH 14/5/2021 stack is in CCM, not here */
    . = ALIGN(8);
   } >RAM
  /*   } >CCMRAM */


  /* MEMORY_bank1 section, code must be located here explicitly            */
  /* Example: extern int foo(void) __attribute__ ((section (".mb1text"))); */
  /* Not used 14/5/2021 - was apparently used for LCD display on ST dev kit */
  .memory_b1_text :
  {
    *(.mb1text)        /* .mb1text sections (code) */
    *(.mb1text*)       /* .mb1text* sections (code)  */
    *(.mb1rodata)      /* read-only data (constants) */
    *(.mb1rodata*)
  } >MEMORY_B1
 
  /* CCM-RAM section
  *
  * IMPORTANT NOTE!
  * If variables placed in this section must be zero initialized,
  * the startup code needs to be modified to initialize this section.
  * Done PH 12/5/2021
  */
  .ccmram :
  {
    . = ALIGN(4);
    _sccmram = .;       /* create a global symbol at ccmram start */
    *(.ccmram)
    *(.ccmram*)
   
    . = ALIGN(4);
    _eccmram = .;       /* create a global symbol at ccmram end */
  } >CCMRAM

  /* Remove information from the standard libraries */
  /DISCARD/ :
  {
    libc.a ( * )
    libm.a ( * )
    libgcc.a ( * )
  }

  .ARM.attributes 0 : { *(.ARM.attributes) }
}


Main.c relevant bit:

Code: [Select]
int main(void)
{
HAL_Init();
SystemClock_Config();
HAL_NVIC_SetPriorityGrouping(NVIC_PRIORITYGROUP_4);

// Stop IWDG if single stepping
__HAL_DBGMCU_FREEZE_IWDG();

// TaskHandle_t xLED_Tasks[4];

// Initialise clocks. Without this one cannot set up any I/O pins etc.
HAL_RCC_MCOConfig(RCC_MCO2, RCC_MCO2SOURCE_HSE, RCC_MCODIV_1);

// Allocate and immediately free the entire heap memory.
// This is a workaround for an malloc bug which became apparent in polarssl.
// BUT TODO if this is true then we need to malloc the real max heap size, not just 48k etc.
// void *dummy = malloc(48*1024);
// free(dummy);

/* GPIO Ports Clock Enable */
__HAL_RCC_GPIOH_CLK_ENABLE();
__HAL_RCC_GPIOC_CLK_ENABLE();
__HAL_RCC_GPIOB_CLK_ENABLE();
__HAL_RCC_GPIOA_CLK_ENABLE();

GPIO_InitTypeDef  GPIO_InitStruct;

// Configure GPIO PIN for dataflash Chip select
GPIO_InitStruct.Pin  = _45DBXX_CS_PIN;
GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP;
GPIO_InitStruct.Pull = GPIO_NOPULL;
HAL_GPIO_Init(_45DBXX_CS_GPIO, &GPIO_InitStruct);
HAL_GPIO_WritePin(_45DBXX_CS_GPIO, _45DBXX_CS_PIN, GPIO_PIN_SET);

// Config SPI2 fully for serial flash
KDE_Init_SPI2();

// Config SPI3 basic function (pins etc) - there is a sep. SPI controller init for each device on SPI3
KDE_Init_SPI3();

// For 655 board: initialise STLED316 chip, and set up SPI3 for STLED316 mode
KDE_display_init();

// Configure GPIO PIN for relay output
KDE_DAC_init_relay();

// Initialise both internal ADCs
// Values allowed for the interval are 3 15 28 56 84 112 144 480
// ADC1, if unconnected, measures 1/11 of the 5V rail
// ADC2, if unconnected, measures 1/2 of the 3.3V rail

MX_ADC1_Init(ADC_SAMPLETIME_56CYCLES);
MX_ADC2_Init(ADC_SAMPLETIME_56CYCLES);

// Initialise both internal DACs
MX_DAC_Init();

// set both DAC outputs to zero
KDE_DAC_set_value(1, 0);
KDE_DAC_set_value(2, 0);

//  Set relay output mode to false
KDE_DAC_set_relay(false);

// Initialise analog control signals (internal and "655" board)
// Note the onboard and external subsystems have these signals in opposite polarity

ADS1118_an_2p5v_onboard(1);  // turn off TC mode (would lift AIN3 by 20mV)
ADS1118_an_2p5v_external(0);  // turn on 2.5V for RTD
ADS1118_an_pullup_onboard(0);  // turn off TC mode (would lift AIN3 by 20mV)
ADS1118_an_pullup_external(1); // turn on 2.5V for RTD

// Initialise hex switch and pushbutton switch
KDE_init_switches();

// Initialise RTC, and reset date/time if corrupted (uses a magic value stored in backup RAM)
KDE_rtc_init();

// Initialise LEDs - also turns them all off
KDE_LED_Init(KDE_LED2);
KDE_LED_Init(KDE_LED3);
KDE_LED_Init(KDE_LED4);
KDE_LED_Init(KDE_LED5);
KDE_LED_Init(KDE_LED6);
KDE_LED_Init(KDE_LED7);

// Do a pattern on five LEDs and then turn them all off

KDE_LED_On(KDE_LED2);
hang_around(200);
KDE_LED_On(KDE_LED3);
hang_around(200);
KDE_LED_On(KDE_LED4);
hang_around(200);
KDE_LED_On(KDE_LED5);
hang_around(200);
KDE_LED_On(KDE_LED6);
hang_around(200);
KDE_LED_On(KDE_LED7);
hang_around(500);
KDE_LED_Off(KDE_LED7);
hang_around(200);
KDE_LED_Off(KDE_LED6);
hang_around(200);
KDE_LED_Off(KDE_LED5);
hang_around(200);
KDE_LED_Off(KDE_LED4);
hang_around(200);
KDE_LED_Off(KDE_LED3);
hang_around(200);
KDE_LED_Off(KDE_LED2);

// === At this point, interrupts should still be disabled ===
// Execute loader

extern char _loader_start;
extern char _loader_end;
extern char _loader_loadaddr;
extern char _loader_bss_start;
extern char _loader_bss_end;

    // copy loader to ram
memcpy(&_loader_start, &_loader_loadaddr, &_loader_end - &_loader_start);

// clear loader's bss
memset(&_loader_bss_start, 0, &_loader_bss_end - &_loader_bss_start);

extern void loader_entry() __attribute__((long_call));
loader_entry();


KDE_loader.c:

Code: [Select]
/*
 * KDE_loader.c
 *
 *  Created on: 13 Jul 2021
 *      Author: peter
 *
 * This module is self-contained and lives in the bottom 16k of CPU FLASH.
 * It is linked to execute at base of RAM: 0x20000000.
 * Its job is to load itself into RAM.
 *
 * All functions and initialised data (data outside of functions) here are static, and no functions outside this file can be called by the code here.
 *
 * The 45DB code here is a copy of (most of) 45dbxx.c, with every function having "static" in front of it. Same for LEDs, etc.
 *
 *
 */



#include "stm32f417xx.h"
#include <KDE_loader.h>


// This is exported
extern void loader_entry();


// Prototype
void real_loader(void);

// This is the entry point for when this module ends up in RAM at 0x20000000. Must be the first thing here.
void loader_entry(void)
{
real_loader();
}


/**
  * @brief  Sets or clears the selected data port bit.
  *
  * @note   This function uses GPIOx_BSRR register to allow atomic read/modify
  *         accesses. In this way, there is no risk of an IRQ occurring between
  *         the read and the modify access.
  *
  * @param  GPIOx where x can be (A..K) to select the GPIO peripheral for STM32F429X device or
  *                      x can be (A..I) to select the GPIO peripheral for STM32F40XX and STM32F427X devices.
  * @param  GPIO_Pin specifies the port bit to be written.
  *          This parameter can be one of GPIO_PIN_x where x can be (0..15).
  * @param  PinState specifies the value to be written to the selected bit.
  *          This parameter can be one of the GPIO_PinState enum values:
  *            @arg GPIO_PIN_RESET: to clear the port pin
  *            @arg GPIO_PIN_SET: to set the port pin
  * @retval None
  */
static void HAL_GPIO_WritePin(GPIO_TypeDef* GPIOx, uint16_t GPIO_Pin, GPIO_PinState PinState)
{

  if(PinState != GPIO_PIN_RESET)
  {
    GPIOx->BSRR = GPIO_Pin;
  }
  else
  {
    GPIOx->BSRR = (uint32_t)GPIO_Pin << 16U;
  }
}


/**
  * @brief  Turns selected LED On.
  * @param  Led: Specifies the Led to be set on.
  *   This parameter can be one of following parameters:
  *     @arg KDE_LED2
  *     @arg KDE_LED3
  *     @arg KDE_LED4
  *     @arg KDE_LED5
  *     @arg KDE_LED6
  *     @arg KDE_LED7
  */
static void KDE_LED_On(KDE_LED_NUM led)
{
  HAL_GPIO_WritePin(GPIO_PORT[led], GPIO_PIN[led], GPIO_PIN_RESET); // set to 0 to turn on
}

/**
  * @brief  Turns selected LED Off.
  * @param  Led: Specifies the Led to be set off.
  *   This parameter can be one of following parameters:
  *     @arg KDE_LED2
  *     @arg KDE_LED3
  *     @arg KDE_LED4
  *     @arg KDE_LED5
  *     @arg KDE_LED6
  *     @arg KDE_LED7
  */
static void KDE_LED_Off(KDE_LED_NUM led)
{
  HAL_GPIO_WritePin(GPIO_PORT[led], GPIO_PIN[led], GPIO_PIN_SET); // set to 1 to turn off
}

/**
  * @brief  Toggles the selected LED.
  * @param  Led: Specifies the Led to be toggled.
  *   This parameter can be one of following parameters:
  *     @arg KDE_LED2
  *     @arg KDE_LED3
  *     @arg KDE_LED4
  *     @arg KDE_LED5
  *     @arg KDE_LED6
  *     @arg KDE_LED7
  */



// Hang around for delay in ms. Approximate but doesn't need interrupts etc working.
// Tweaked for RAM resident code which runs a little faster (!) than FLASH resident code.

static void hang_around(uint32_t delay)
{

// volatile stops the optimiser from deleting the empty countdown loop
volatile uint32_t fred = 15100*delay;

while (fred>0)
{
fred--;
}

}

volatile uint8_t fred123[256]; // test for init data



// This is where it all happens...

void real_loader(void)
{

for (;;)
{
KDE_LED_On(KDE_LED2);
hang_around(200);
KDE_LED_Off(KDE_LED2);
hang_around(200);
}

}


What I need is a mod to the linker script (or other cleanups - all ideas appreciated :) ) which puts main.o right after the startup, and the kde_loader.o right after that, so hopefully everything ends up in the bottom 16k.

There is another problem: everything has to be in that bottom 16k, i.e. you can't use memcpy, memset, etc. Even stuff like HAL_Init() cannot be called. These functions

Code: [Select]
HAL_Init();
SystemClock_Config();
HAL_NVIC_SetPriorityGrouping(NVIC_PRIORITYGROUP_4);

// Stop IWDG if single stepping
__HAL_DBGMCU_FREEZE_IWDG();

// TaskHandle_t xLED_Tasks[4];

// Initialise clocks. Without this one cannot set up any I/O pins etc.
HAL_RCC_MCOConfig(RCC_MCO2, RCC_MCO2SOURCE_HSE, RCC_MCODIV_1);


need to also go in the bottom 16k, as does everything which they call. Is there some way to do a verification to make sure something has not been missed?

Having to place so much stuff in the bottom 16k, or even the bottom 32/64k if needed, is not wasting space because this ST HAL junk has to go "somewhere".

I seem to have found how to place modules in order into FLASH:

Code: [Select]
/* Place modules into FLASH, starting at the bottom */
 
  /* Startup code */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector))
    . = ALIGN(4);
  } >FLASH
 
  /* main.o */
  main.o :
  {
    . = ALIGN(4);
    KEEP(*(.main.o))
    . = ALIGN(4);
  } >FLASH
 
   /* KDE_SPI.o */
  KDE_SPI.o :
  {
    . = ALIGN(4);
    KEEP(*(.KDE_SPI.o))
    . = ALIGN(4);
  } >FLASH
 
  /* stm32f4xx_hal_cortex.o */
  stm32f4xx_hal_cortex.o :
  {
    . = ALIGN(4);
    KEEP(*(.stm32f4xx_hal_cortex.o))
    . = ALIGN(4);
  } >FLASH


but surely there must be a way with fewer lines? The linker manual https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_chapter/ld_3.html is not easy to work out.

Digging around more, it looks like one can define a custom memory block e.g.

MEMORY
{
  FLASH (rx)      : ORIGIN = 0x08000000, LENGTH = 1024K
  RAM (xrw)       : ORIGIN = 0x20000000, LENGTH = 128K
  MEMORY_B1 (rx)  : ORIGIN = 0x60000000, LENGTH = 0K
  CCMRAM (rw)     : ORIGIN = 0x10000000, LENGTH = 64K
  BOTTOM_BLOCK : ORIGIN = 0x00000000, LENGTH = 16K
}

and then one could load the modules into that e.g.

 /* stm32f4xx_hal_cortex.o */
  stm32f4xx_hal_cortex.o :
  {
    . = ALIGN(4);
    KEEP(*(.stm32f4xx_hal_cortex.o))
    . = ALIGN(4);
  } >BOTTOM_BLOCK

but that doesn't really help. Is there some way to do something like

.isr_vector
main.o
KDE_SPI.o
stm32f4xx_hal_cortex.o
> FLASH



Thank you all!
« Last Edit: July 19, 2021, 03:58:43 pm by peter-h »

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 824
  • Country: es
Define two regions instead of one FLASH:
Code: [Select]
BOOT (rx)      : ORIGIN = 0x08000000, LENGTH = 16K
FLASH (rx)      : ORIGIN = 0x08004000, LENGTH = 1024K-16K
and place all boot-related files >BOOT. But there is a big "but": if your main part (placed to FLASH) uses some of the functions placed to BOOT and you update the FLASH region while keeping the first 16K - the main part will be broken (because after rebuild function addresses in newly generated BOOT can creep, but you are keeping the old one). That's (also) why I'm talking about separate linking.
If separate projects or subprojects are absolutely not an option - use the approach suggested by harerod on page 2 (single project with two build configs conditionally excluding some files from build and selecting different .ld files). The boot part could be built straight for RAM address then, with a simple copy from PC-relative "where we are now" to RAM done in startup.s. All problems arising from duplicate symbols would be gone too.
« Last Edit: July 19, 2021, 05:06:31 pm by abyrvalg »
 

Offline peter-hTopic starter

I played around with this

Code: [Select]
MEMORY
{
  FLASH (rx)          : ORIGIN = 0x08000000, LENGTH = 1024K
  RAM (xrw)           : ORIGIN = 0x20000000, LENGTH = 128K
  MEMORY_B1 (rx)      : ORIGIN = 0x60000000, LENGTH = 0K
  CCMRAM (rw)         : ORIGIN = 0x10000000, LENGTH = 64K
  LOADER_BLOCK (rx)   : ORIGIN = 0x08000000, LENGTH = 16K
}

/* Define output sections */
SECTIONS
{
 /* loader_bss and loader sections must come first, in order to override the wildcards in subsequent sections */
 loader_bss _loader_end : {
  _loader_bss_start = .;
  *KDE_loader.o(.bss*)
  . = ALIGN(4);
  _loader_bss_end = .;
 }
 
 KDE_loader 0x20000000 : AT(_loader_loadaddr) {
  _loader_start = .;
  *KDE_loader.o(.text*)
  *KDE_loader.o(.rodata*)
  *KDE_loader.o(.data*)
  *KDE_loader.o(.common*)
  *KDE_loader.o(.ARM.attributes)
  . = ALIGN(4);
  _loader_end = .;
 }
 
  _loader_size = SIZEOF(KDE_loader);


  /* Place modules into FLASH, starting at the bottom */
 
    /* Startup code */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector))
    . = ALIGN(4);
  } >LOADER_BLOCK
 
  /* main.o */
  main.o :
  {
    . = ALIGN(4);
    KEEP(*(.main.o))
    . = ALIGN(4);
  } >LOADER_BLOCK
 
   /* KDE_SPI.o */
  KDE_SPI.o :
  {
    . = ALIGN(4);
    KEEP(*(.KDE_SPI.o))
    . = ALIGN(4);
  } >LOADER_BLOCK
 
  /* stm32f4xx_hal_cortex.o */
  stm32f4xx_hal_cortex.o :
  {
    . = ALIGN(4);
    KEEP(*(.stm32f4xx_hal_cortex.o))
    . = ALIGN(4);
  } >LOADER_BLOCK
 
  LOADER_BLOCK :
  {
  } > FLASH
 

which ought to generate a warning if LOADER_BLOCK is bigger than 16k. Then this

Code: [Select]
  /* The program code and other data goes into FLASH */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
    *(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbols at end of code */
  } >FLASH

   .ARM.extab   : { *(.ARM.extab* .gnu.linkonce.armextab.*) } >FLASH
    .ARM : {
    __exidx_start = .;
      *(.ARM.exidx*)
      __exidx_end = .;
    } >FLASH

collects all other code, into the remaining FLASH. Doesn't link though - section overlap error.
« Last Edit: July 19, 2021, 05:22:15 pm by peter-h »

Offline abyrvalg

Sure, you need to move FLASH start up behind the LOADER_BLOCK end and remove the LOADER_BLOCK : {} > FLASH.
Are you planning to update the LOADER_BLOCK each time the main part is updated? 
 

Offline peter-hTopic starter

In the factory init, yes.

In the field, new code will skip the bottom 16k.

Edit: I think I get your drift. The addresses of the functions located in the boot block will move around, so they can't be called by other code unless done via a table of jumps, which is a big hassle (I've done that with two user-programmable products I designed years ago). I think what I will do is make that loader block totally standalone, with its own copies of everything. So e.g.

HAL_Init();
SystemClock_Config();

will be local copies, different names, but they can #include the same (huge) .h files as the existing funcs. This hacking around will take me some time; I don't suppose there is any automated way of checking that something isn't calling a function outside the module. I guess one could comment out the #includes and if you get no compile errors then everything is local. I will also need local versions of memcpy and memset but that's easy.
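A minimal sketch of what such local memcpy/memset replacements might look like. The names boot_memcpy and boot_memset are made up for illustration. One thing to watch: at higher optimisation levels GCC can recognise a plain copy or fill loop and replace it with a call to the library memcpy/memset, which would defeat the purpose. The sketch uses volatile pointers to avoid that; building the module with -fno-tree-loop-distribute-patterns is another option.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise copy for use inside the self-contained boot block.
   The volatile accesses stop the compiler from turning the loop
   back into a call to the library memcpy(). Regions must not overlap. */
static void boot_memcpy(void *dst, const void *src, size_t n)
{
    volatile uint8_t *d = (volatile uint8_t *)dst;
    const volatile uint8_t *s = (const volatile uint8_t *)src;
    while (n--) {
        *d++ = *s++;
    }
}

/* Byte-wise fill, same reasoning as above. */
static void boot_memset(void *dst, uint8_t value, size_t n)
{
    volatile uint8_t *d = (volatile uint8_t *)dst;
    while (n--) {
        *d++ = value;
    }
}
```

These are slower than the library versions, but for a one-off 16k copy at startup that should not matter. Checking the map file or the disassembly for stray references to memcpy/memset is still worthwhile.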

I am having some trouble seeing how much space modules are taking up. Some big ones don't seem to use up any "text".
« Last Edit: July 19, 2021, 07:23:50 pm by peter-h »

Offline peter-hTopic starter

I need a bit of help with the link script.

I have the vectors definitely at 0x08000000.

Then the bottom 16k block contains

b_main - control transfers to this from the .s startup; it sets up some I/O, SPI2 for the serial flash, and copies the loader to 0x20000000
loader   - the bit which runs from RAM
and some other stuff later

and then I want to waste the rest of the bottom 16k, and have the real main() to be 16k up i.e. 0x08004000.

Current script:

Code: [Select]
/*
*****************************************************************************
**
**  File        : LinkerScript.ld
**
**  Abstract    : Linker script for STM32F417VG Device with
**                1024KByte FLASH, 128KByte RAM + 64kbyte in CCM
**
**                Set heap size, stack size and stack location according
**                to application requirements.
**
**  Target      : STMicroelectronics STM32F417
**
* 18/7/2021 PH mods to support a loader which executes at base of RAM i.e. 0x20000000.
*

*/

/* Entry Point */
ENTRY(Reset_Handler)

/* Reference loader to ensure that it gets linked-in */
EXTERN(KDE_loader)


/* Highest address of the main stack */
 /*  _estack = 0x20020000;  */    /* stack in 128K RAM */
 _estack = 0x10010000;    /* stack in  64k CCM - note: configTOTAL_HEAP_SIZE + min_stack_size must not exceed 64k) */
 
/* top of RAM for _sbrk - top of heap check */
 _top = 0x20020000;

/* Heap and stack sizes */
_Min_Heap_Size  = 0xa000;     /* 40k heap - min size; it can grow to end of main RAM  */
_Min_Stack_Size = 0x4000;     /* 16k stack - in CCM */

/* Specify the memory areas */
/* CCMRAM added PH 12/5/2021 - cannot use with DMA */
MEMORY
{
  FLASH (rx)          : ORIGIN = 0x08000000, LENGTH = 1024K
  RAM (xrw)           : ORIGIN = 0x20000000, LENGTH = 128K
  MEMORY_B1 (rx)      : ORIGIN = 0x60000000, LENGTH = 0K
  CCMRAM (rw)         : ORIGIN = 0x10000000, LENGTH = 64K
}

/* Define output sections */
SECTIONS
{
 /* loader_bss and loader sections must come first, in order to override the wildcards in subsequent sections */
 loader_bss _loader_end : {
  _loader_bss_start = .;
  *KDE_loader.o(.bss*)
  . = ALIGN(4);
  _loader_bss_end = .;
 }
 
 KDE_loader 0x20000000 : AT(_loader_loadaddr) {
  _loader_start = .;
  *KDE_loader.o(.text*)
  *KDE_loader.o(.rodata*)
  *KDE_loader.o(.data*)
  *KDE_loader.o(.common*)
  *KDE_loader.o(.ARM.attributes)
  . = ALIGN(4);
  _loader_end = .;
 }
 
  _loader_size = SIZEOF(KDE_loader);


  /* Place modules into FLASH, starting at the bottom */
 
    /* Startup code */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector))
    . = ALIGN(4);
  } >FLASH
 
  /* b_main.o */
  b_main.o :
  {
    . = ALIGN(4);
    KEEP(*(.main.o))
    . = ALIGN(4);
  } >FLASH

/* other bottom-16k stuff will go here */
 
  /* KDE_loader.o - goes last in the bottom block which makes it easier to check the block usage */
  KDE_loader.o :
  {
    . = ALIGN(4);
    KEEP(*(.KDE_loader.o))
    . = ALIGN(4);
  } >FLASH
 
 
  /* The real main() is at base of FLASH plus 16k */
  . = 0x08004000;
  main.o :
  {
    . = ALIGN(4);
    KEEP(*(.main.o))
    . = ALIGN(4);
  } >FLASH
 
 
 
  /*LOADER_BLOCK :
  {
  } > FLASH */
 
 
  /* The program code and other data goes into FLASH */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
    *(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbols at end of code */
  } >FLASH

   .ARM.extab   : { *(.ARM.extab* .gnu.linkonce.armextab.*) } >FLASH
    .ARM : {
    __exidx_start = .;
      *(.ARM.exidx*)
      __exidx_end = .;
    } >FLASH

/*
  Stuff for __libc_init_array (C++)
  .preinit_array     :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array*))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  } >FLASH
  .init_array :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT(.init_array.*)))
    KEEP (*(.init_array*))
    PROVIDE_HIDDEN (__init_array_end = .);
  } >FLASH
  .fini_array :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(.fini_array*))
    KEEP (*(SORT(.fini_array.*)))
    PROVIDE_HIDDEN (__fini_array_end = .);
  } >FLASH
*/

 
  /* Initialized data sections goes into RAM, load LMA copy after code */
  .data :
  {
 /*   . = ALIGN(4); */
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */

    . = ALIGN(4);
    _edata = .;        /* define a global symbol at data end */
  } >RAM AT >FLASH

 /* used by the startup to initialize data */
  _sidata = LOADADDR(.data);

/* dummy placeholder in flash for loader section, to count flash usage */
 .KDE_loader : {
  . = . + SIZEOF(KDE_loader);
 } AT >FLASH
 _loader_loadaddr = LOADADDR(.KDE_loader);
 
 
  /* Uninitialized data section */
  . = ALIGN(4);
  .bss :
  {
    /* This is used by the startup in order to initialize the .bss section */
    _sbss = .;         /* define a global symbol at bss start */
    __bss_start__ = _sbss;
    *(.bss)
    *(.bss*)
    *(COMMON)

    . = ALIGN(4);
    _ebss = .;         /* define a global symbol at bss end */
    __bss_end__ = _ebss;
  } >RAM

  /* The heap ends up after BSS in main RAM */
  /* This also checks that the top of the heap doesn't hit the bottom of the stack i.e. how much RAM left */
  /* User_heap_stack section, used to check that there is enough RAM left */
 
  /* ._user_heap_stack : */
  .main_heap :
  {
    . = ALIGN(8);
    PROVIDE ( end = . );
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
 /*   . = . + _Min_Stack_Size; */  /* PH 14/5/2021 stack is in CCM, not here */
    . = ALIGN(8);
   } >RAM
  /*   } >CCMRAM */


  /* MEMORY_bank1 section, code must be located here explicitly            */
  /* Example: extern int foo(void) __attribute__ ((section (".mb1text"))); */
  /* Not used 14/5/2021 - was apparently used for LCD display on ST dev kit */
  .memory_b1_text :
  {
    *(.mb1text)        /* .mb1text sections (code) */
    *(.mb1text*)       /* .mb1text* sections (code)  */
    *(.mb1rodata)      /* read-only data (constants) */
    *(.mb1rodata*)
  } >MEMORY_B1
 
  /* CCM-RAM section
  *
  * IMPORTANT NOTE!
  * If variables placed in this section must be zero initialized,
  * the startup code needs to be modified to initialize this section.
  * Done PH 12/5/2021
  */
  .ccmram :
  {
    . = ALIGN(4);
    _sccmram = .;       /* create a global symbol at ccmram start */
    *(.ccmram)
    *(.ccmram*)
   
    . = ALIGN(4);
    _eccmram = .;       /* create a global symbol at ccmram end */
  } >CCMRAM

  /* Remove information from the standard libraries */
  /DISCARD/ :
  {
    libc.a ( * )
    libm.a ( * )
    libgcc.a ( * )
  }

  .ARM.attributes 0 : { *(.ARM.attributes) }
}


So main.o should start at 0x08004000, but it starts much lower. I wonder if the linker is treating the name main.c specially?

Reading the GCC linker doc, this should do it, by setting the current load address to the desired value, but it doesn't seem to

« Last Edit: July 20, 2021, 07:43:33 am by peter-h »

Offline abyrvalg

If you go the separate bins way you shouldn't care about the app part layout from the bootloader at all (those addresses will creep away in the next build and the boot vector table will point to nonsense). Let each part have its own vector table instead and set SCB->VTOR to point to the app VT in the app part.
Everything becomes quite simple:
boot part at 08000000-08003FFF, set FLASH region in boot.ld to 08000000, 16K
app part at 08004000+, set FLASH region in app.ld to 08004000, 1M-16K
In the boot part when you want to start the app do something like this:
Code: [Select]
#define FLASH_APP_BASE 0x08004000 //or define it in .ld and import

typedef struct //to cast first two VT entries to correct types for convenience
{
    uint32_t sp;
    void (*pc)(void);
} VT_t;

void StartApp(void)
{
    VT_t *app_vt = (VT_t *)FLASH_APP_BASE;
    __set_MSP(app_vt->sp); //CMSIS intrinsic that sets the main stack pointer (MSP)
    app_vt->pc(); //you can even pass some params here and collect them at app's startup.s
}

and in the app do SCB->VTOR = (uint32_t)&g_pfnVectors (the label at the start of the VT in startup.s). This way the two parts will be decoupled from each other.
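Since the whole point here is recovery, it may be worth sanity-checking the first two vector table entries before jumping, so that an erased or half-written app region is not started. A minimal sketch, assuming the memory map discussed in this thread (app from 0x08004000, 128K RAM, 64K CCM). The function name and the range macros are hypothetical, and this is a plausibility check only, not a substitute for a CRC over the image.

```c
#include <stdint.h>

/* Address ranges assumed from the memory map in this thread; adjust to suit. */
#define APP_FLASH_START 0x08004000u
#define APP_FLASH_END   0x08100000u   /* one past the end of the 1MB FLASH */
#define RAM_START       0x20000000u
#define RAM_END         0x20020000u   /* one past the end of 128K main RAM */
#define CCM_START       0x10000000u
#define CCM_END         0x10010000u   /* one past the end of 64K CCM */

/* Returns 1 if the initial SP and reset vector look plausible, 0 otherwise.
   Erased FLASH reads as 0xFFFFFFFF, which fails the stack pointer test. */
static int app_vectors_look_valid(uint32_t initial_sp, uint32_t reset_vector)
{
    /* The stack grows down, so the initial SP may equal the end address
       of a RAM region but must lie above its start. */
    int sp_in_ram = (initial_sp > RAM_START && initial_sp <= RAM_END);
    int sp_in_ccm = (initial_sp > CCM_START && initial_sp <= CCM_END);
    if (!sp_in_ram && !sp_in_ccm) {
        return 0;
    }
    if (initial_sp & 0x3u) {
        return 0;   /* stack pointer must be at least word aligned */
    }
    if ((reset_vector & 1u) == 0) {
        return 0;   /* Cortex-M vectors must have the Thumb bit set */
    }
    uint32_t entry = reset_vector & ~1u;
    if (entry < APP_FLASH_START || entry >= APP_FLASH_END) {
        return 0;   /* entry point lies outside the app region */
    }
    return 1;
}
```

In StartApp() this would be called with app_vt->sp and (uint32_t)app_vt->pc, falling back to the loader if it returns 0.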
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Thank you.

I solved the immediate problem with the crude hack of align(0x4000)



which obviously does place main.o at 0x08004000, but will equally happily place it at 0x08008000 (etc) without generating a warning :)
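The silent jump to the next 16k boundary follows from how ALIGN rounds the location counter up to a multiple of its argument. A minimal C sketch of that arithmetic (align_up is a hypothetical helper for illustration, not anything from the toolchain):

```c
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two).
   This mirrors what ALIGN(align) does to the linker's location counter. */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1u) & ~(align - 1u);
}
```

With the boot code ending at, say, 0x080007A4, align_up(0x080007A4, 0x4000) gives 0x08004000. If the boot code ever grew to end at 0x08004004, the same call would give 0x08008000 with no complaint. Putting the boot code in its own MEMORY region with LENGTH = 16K instead makes the linker report a region overflow.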

If I use the commented-out version above, this is the error message which to me is meaningless



The RAM based loader will not touch the bottom 16k; the "application" being loaded will be linked to run from 0x08004000 and the entry point will be that. This avoids constructing jump tables.

The downside is that the bottom 16k must be totally self contained so I am slowly extracting the required code snippets from the ST HAL bloatware :)

I am now trying this:



with

Code: [Select]

    /* Startup code */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector))
    . = ALIGN(4);
  } >FLASH_BOOT
 
  /* b_main.o */
  b_main.o :
  {
    . = ALIGN(4);
    KEEP(*(.b_main.o))
    . = ALIGN(4);
  } >FLASH_BOOT
 
  /* other bottom-16k stuff will go here */
 
  /* KDE_loader.o - goes last in the bottom block which makes it easier to check the block usage */
  KDE_loader.o :
  {
    . = ALIGN(4);
    KEEP(*(.KDE_loader.o))
    . = ALIGN(4);
  } >FLASH_BOOT
 
 
  /* The real main() is at base of FLASH plus 16k */
   main.o :
  {
    . = ALIGN(4);
    KEEP(*(.main.o))
    . = ALIGN(4);
  } >FLASH_APP
 
   
  /* This collects all other stuff, which gets loaded into FLASH after main.o above */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
    *(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbol at end of code */
  } >FLASH_APP

 
  /* Initialized data sections go into RAM, load LMA copy after code */
  .data :
  {
 /*   . = ALIGN(4); */
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */

    . = ALIGN(4);
    _edata = .;        /* define a global symbol at data end */
  } >RAM AT >FLASH_BOOT

 /* used by the startup to initialize data */
  _sidata = LOADADDR(.data);

/* dummy placeholder in flash for loader section, to count flash usage */
 .KDE_loader : {
  . = . + SIZEOF(KDE_loader);
 } AT >FLASH_BOOT
 _loader_loadaddr = LOADADDR(.KDE_loader);
 


It looks like you end up with two lots of initialised data. One for the code in the bottom 16k and one for the remaining code. I can however probably avoid initialised data (outside of functions) in the bottom 16k.

Anyway, after tons of googling all over the place, and seeing loads of people trying this and failing, I now think it is impossible to group modules in, say, the bottom 16k unless you also use the __attribute__ directive in the source files. OTOH this https://www.silabs.com/community/mcu/32-bit/knowledge-base.entry.html/2018/06/27/how_to_locate_anent-Uyrh suggests that you can.
« Last Edit: July 20, 2021, 11:11:37 am by peter-h »
 

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 824
  • Country: es
In general, placing main.o at fixed address doesn't guarantee that the main() function will be right at that address (i.e. if there are more functions in main.c, not just main() they can be placed first). And you need fixed addresses not just for main(), but for all ISRs too (in your current setup they all are pointed to by boot VT entries, so you can't update the app part without updating the boot sector).
Placing all .data to >RAM AT >FLASH_BOOT hits the same problem - you have a part of app (that can change in the next version) in the boot sector.
All these points can be worked around of course. I.e. you can create two startups defining two VTs and two .data init sections with different names, specify alternative section attribute for each code/data item of boot and place everything correctly in .ld, but all this turns into a "workaround of a workaround". This could be ok for a "build once and forget" project, but just the necessity of manual checking boot/app reference graphs for intersections is a big "no go" for anything maintained IMO.
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
" placing main.o at fixed address doesn't guarantee that the main() function will be right at that address (i.e. if there are more functions in main.c, not just main() they can be placed first). And you need fixed addresses not just for main(), but for all ISRs too (in your current setup they all are pointed to by boot VT entries, so you can't update the app part without updating the boot sector)."

Sure; what I do is this, in say main.c:

static void real_main(void);   /* forward declaration; static to match the definition below */

void main(void)
{
  real_main();
}

func1()
func2()
etc

static void real_main(void)
{
  the real stuff
}

and then main() will definitely be at whatever address the .o file got loaded. I am checking this and so far it has always worked.

As you can probably tell I have no idea what this bit is doing:




I found that this hack, specifically the  *KDE_loader.o (.text .text*) bits, which again I don't really understand, works:

Code: [Select]
 
    /* Startup code */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector))
    . = ALIGN(4);
  } >FLASH_BOOT
 
  /* b_main.o */
  .b_main.o :
  {
    . = ALIGN(4);
    KEEP(*(.b_main.o))
    *b_main.o (.text .text*)
    . = ALIGN(4);
  } >FLASH_BOOT
 
  /* other bottom-16k stuff will go here */
 
  /* KDE_loader.o - goes last in the bottom block which makes it easier to check the block usage */
  .KDE_loader.o :
  {
    . = ALIGN(4);
    KEEP(*(.KDE_loader.o))
    *KDE_loader.o (.text .text*)
    . = ALIGN(4);
     _loader_end_in_flash = .;
  } >FLASH_BOOT
 
 
  /* The real main() is at base of FLASH plus 16k */
  .main.o :
  {
    . = ALIGN(4);
    KEEP(*(.main.o))
    *main.o (.text .text*)
    . = ALIGN(4);
  } >FLASH_APP
 
   
  /* This collects all other stuff, which gets loaded into FLASH after main.o above */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
    *(.eh_frame)


The .isr_vector : portion doesn't have that line, and I think the reason that stuff does get placed at 0x08000000 is because the link script starts with ENTRY(Reset_Handler) and that is referenced in the .s startup.

You can see it here that startup, b_main, loader get stacked up from 0x08000000, and main.o gets placed at 0x08004000 correctly, and everything else follows after that AFAICT.

Code: [Select]
.isr_vector     0x0000000008000000      0x188
                0x0000000008000000                . = ALIGN (0x4)
 *(.isr_vector)
 .isr_vector    0x0000000008000000      0x188 startup/startup_stm32f407xx.o
                0x0000000008000000                g_pfnVectors
                0x0000000008000188                . = ALIGN (0x4)

.b_main.o       0x0000000008000188      0x61c
                0x0000000008000188                . = ALIGN (0x4)
 *(.b_main.o)
 *b_main.o(.text .text*)
 .text.SystemClock_Config
                0x0000000008000188       0xf4 src/b_main.o
 .text.B_SetSysClock
                0x000000000800027c       0xec src/b_main.o
 .text.B_SystemInit
                0x0000000008000368       0x68 src/b_main.o
                0x0000000008000368                B_SystemInit
 .text.B_HAL_Init
                0x00000000080003d0       0x20 src/b_main.o
 .text.B_HAL_SPI_Init
                0x00000000080003f0      0x148 src/b_main.o
 .text.KDE_Init_SPI2
                0x0000000008000538       0x60 src/b_main.o
 .text.hang_around
                0x0000000008000598       0x26 src/b_main.o
 *fill*         0x00000000080005be        0x2
 .text.B_main   0x00000000080005c0      0x1e4 src/b_main.o
                0x00000000080005c0                B_main
                0x00000000080007a4                . = ALIGN (0x4)

.KDE_loader.o   0x00000000080007a4        0x0
                0x00000000080007a4                . = ALIGN (0x4)
 *(.KDE_loader.o)
 *KDE_loader.o(.text .text*)
                0x00000000080007a4                . = ALIGN (0x4)
                0x00000000080007a4                _loader_end_in_flash = .

.main.o         0x0000000008004000      0x234
                0x0000000008004000                . = ALIGN (0x4)
 *(.main.o)
 *main.o(.text .text*)
 .text.ITM_SendChar
                0x0000000008004000       0x44 src/main.o
 .text.main     0x0000000008004044        0xa src/main.o
                0x0000000008004044                main
 *fill*         0x000000000800404e        0x2
 .text.USBThread
                0x0000000008004050       0x48 src/main.o


The whole linker script, which is probably a mess, is:

Code: [Select]
/* Entry Point */
ENTRY(Reset_Handler)

/* Reference boot block stuff to ensure that it gets linked-in */
EXTERN(b_boot)
EXTERN(KDE_loader)


/* Highest address of the main stack */
 /*  _estack = 0x20020000;  */    /* stack in 128K RAM */
 _estack = 0x10010000;    /* stack in  64k CCM - note: configTOTAL_HEAP_SIZE + min_stack_size must not exceed 64k) */
 
/* top of RAM for _sbrk - top of heap check */
 _top = 0x20020000;

/* Heap and stack sizes */
_Min_Heap_Size  = 0xa000;     /* 40k heap - min size; it can grow to end of main RAM  */
_Min_Stack_Size = 0x4000;     /* 16k stack - in CCM */

/* Specify the memory areas */
/* CCMRAM added PH 12/5/2021 - cannot use with DMA */
MEMORY
{
  FLASH_BOOT (rx)     : ORIGIN = 0x08000000, LENGTH = 16K
  FLASH_APP (rx)      : ORIGIN = 0X08004000, LENGTH = 0x07FFC000
  RAM (xrw)           : ORIGIN = 0x20000000, LENGTH = 128K
  MEMORY_B1 (rx)      : ORIGIN = 0x60000000, LENGTH = 0K
  CCMRAM (rw)         : ORIGIN = 0x10000000, LENGTH = 64K
}

/* Define output sections */
SECTIONS
{
 /* loader_bss and loader sections must come first, in order to override the wildcards in subsequent sections */
 loader_bss _loader_end : {
  _loader_bss_start = .;
  *KDE_loader.o(.bss*)
  . = ALIGN(4);
  _loader_bss_end = .;
 }
 
 KDE_loader 0x20000000 : AT(_loader_loadaddr) {
  _loader_start = .;
  *KDE_loader.o(.text*)
  *KDE_loader.o(.rodata*)
  *KDE_loader.o(.data*)
  *KDE_loader.o(.common*)
  *KDE_loader.o(.ARM.attributes)
  . = ALIGN(4);
  _loader_end = .;
 }
 
  _loader_size = SIZEOF(KDE_loader);


  /* Place modules into FLASH, starting at the bottom */
 
   
    /* Startup code */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector))
    . = ALIGN(4);
  } >FLASH_BOOT
 
  /* b_main.o */
  .b_main.o :
  {
    . = ALIGN(4);
    KEEP(*(.b_main.o))
    *b_main.o (.text .text*)
    . = ALIGN(4);
  } >FLASH_BOOT
 
  /* other bottom-16k stuff will go here */
 
  /* KDE_loader.o - goes last in the bottom block which makes it easier to check the block usage */
  .KDE_loader.o :
  {
    . = ALIGN(4);
    KEEP(*(.KDE_loader.o))
    *KDE_loader.o (.text .text*)
    . = ALIGN(4);
     _loader_end_in_flash = .;
  } >FLASH_BOOT
 
 
  /* The real main() is at base of FLASH plus 16k */
  .main.o :
  {
    . = ALIGN(4);
    KEEP(*(.main.o))
    *main.o (.text .text*)
    . = ALIGN(4);
  } >FLASH_APP
 
   
  /* This collects all other stuff, which gets loaded into FLASH after main.o above */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
    *(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbol at end of code */
  } >FLASH_APP

/*   .ARM.extab   : { *(.ARM.extab* .gnu.linkonce.armextab.*) } >FLASH_APP
    .ARM : {
    __exidx_start = .;
      *(.ARM.exidx*)
      __exidx_end = .;
    } >FLASH_APP  */


  /* Initialized data sections go into RAM, load LMA copy after code */
  .data :
  {
 /*   . = ALIGN(4); */
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */

    . = ALIGN(4);
    _edata = .;        /* define a global symbol at data end */
  } >RAM AT >FLASH_APP

 /* used by the startup to initialize data */
  _sidata = LOADADDR(.data);

/* dummy placeholder in flash for loader section, to count flash usage */
 .KDE_loader : {
  . = . + SIZEOF(KDE_loader);
 } AT >FLASH_BOOT
 _loader_loadaddr = LOADADDR(.KDE_loader);
 
 
  /* Uninitialized data section */
  . = ALIGN(4);
  .bss :
  {
    /* This is used by the startup in order to initialize the .bss section */
    _sbss = .;         /* define a global symbol at bss start */
    __bss_start__ = _sbss;
    *(.bss)
    *(.bss*)
    *(COMMON)

    . = ALIGN(4);
    _ebss = .;         /* define a global symbol at bss end */
    __bss_end__ = _ebss;
  } >RAM

  /* The heap ends up after BSS in main RAM */
  /* This also checks that the top of the heap doesn't hit the bottom of the stack i.e. how much RAM left */
  /* User_heap_stack section, used to check that there is enough RAM left */
 
  /* ._user_heap_stack : */
  .main_heap :
  {
    . = ALIGN(8);
    PROVIDE ( end = . );
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
 /*   . = . + _Min_Stack_Size; */  /* PH 14/5/2021 stack is in CCM, not here */
    . = ALIGN(8);
   } >RAM
  /*   } >CCMRAM */


  /* MEMORY_bank1 section, code must be located here explicitly            */
  /* Example: extern int foo(void) __attribute__ ((section (".mb1text"))); */
  /* Not used 14/5/2021 - was apparently used for LCD display on ST dev kit */
  .memory_b1_text :
  {
    *(.mb1text)        /* .mb1text sections (code) */
    *(.mb1text*)       /* .mb1text* sections (code)  */
    *(.mb1rodata)      /* read-only data (constants) */
    *(.mb1rodata*)
  } >MEMORY_B1
 
  /* CCM-RAM section
  *
  * IMPORTANT NOTE!
  * If variables placed in this section must be zero initialized,
  * the startup code needs to be modified to initialize this section.
  * Done PH 12/5/2021
  */
  .ccmram :
  {
    . = ALIGN(4);
    _sccmram = .;       /* create a global symbol at ccmram start */
    *(.ccmram)
    *(.ccmram*)
   
    . = ALIGN(4);
    _eccmram = .;       /* create a global symbol at ccmram end */
  } >CCMRAM

  /* Remove information from the standard libraries */
  /DISCARD/ :
  {
    libc.a ( * )
    libm.a ( * )
    libgcc.a ( * )
  }

  .ARM.attributes 0 : { *(.ARM.attributes) }
}


I have played with linkfiles intermittently since the 1980s and once you get them working you don't touch them. The syntax is absolutely horrible, with everybody I know googling all over the place for examples, but most people don't post the eventual solution which worked for them.

"just the necessity of manual checking boot/app reference graphs for intersections is a big "no go" for anything maintained IMO."

Of course; the intention here is to just have the basics in the bottom 16k (even if most of the block is wasted) and leave it alone. All I need in there is

1) - code to set up basic hardware (no interrupts)

2) - code to flash LEDs, read switches, etc, and drive SPI2 and the SPI FLASH. Unfortunately the LED code needs init. data to work, due to some typedefs defining RAM constants (ST coders just love millions of typedefs all over the place, even for absolute trivia)

3) - the loader, copied to 0x20000000 (which needs the LED code, plus SPI2, plus the SPI FLASH code)

I reckon the loader 3) can use the funcs in the other modules, because that whole 16k block will always be written together.

How to set up data and bss for this lot I don't have a clue! I would happily give somebody 100 quid for going over this stuff.
« Last Edit: July 20, 2021, 01:16:15 pm by peter-h »
 

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 824
  • Country: es
OK, here it is: a minimalist PoC project with two "worlds" placed separately. Any .c/.s/.sx file with a name starting with boot_ will be placed into the BOOT area. Any other file will be placed into the APP area. Also, the entire BOOT area gets relocated to RAM before execution (no need to worry about loader_xx placement/copy/bss init). It is Makefile-based (I don't use Cube), but you should get the idea; the startups are "stock" ST ones with minor tweaks.

Edit: added some comments in attached sources.

Edit: if your "stock" startup.s and .ld differ too much from mine - post your files and I'll make the same thing based on them.
« Last Edit: July 20, 2021, 02:54:55 pm by abyrvalg »
 
The following users thanked this post: peter-h

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
I've PMd you with them. I think most here are bored with me posting files :)

I will post the finished and working versions.

Currently the loader works but the main() (which runs if the loader doesn't) no longer runs properly and I am suspecting init. data is not getting initialised. I think the linkfile got broken...
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26896
  • Country: nl
    • NCT Developments
Likely you'll need to call the init from the loader. Another option is to do the initialisation from main itself and not depend on assembly. It only takes a memset and memcpy from main. That is how my software for Cortex-Mx controllers works. There is not a single line of assembly involved.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
Quote
I think most here are bored with me posting files :)
Not bored, but sorry, had to hold back the last while due to lack of spare time - a couple of other things are waiting to be done, too.
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
Quote
Likely you'll need to call the init from the loader. Another option is to do the initialisation from main itself and not depend on assembly. It only takes a memset and memcpy from main.

Indeed. Since the loader is not expected to know the size of the main program's .data and .bss, or the load address of main's .data in flash, I'd rather do it from main itself.
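The memset/memcpy approach described above might look roughly like this. It is a sketch only: init_ram_sections is a hypothetical name, and the symbol names in the comment are the ones from the linker script posted in this thread.

```c
#include <stdint.h>
#include <string.h>

/* Copy initialised data from its load address (flash) to its run address (RAM)
   and zero the bss. Lengths are in bytes. */
static void init_ram_sections(void *data_dst, const void *data_src, size_t data_len,
                              void *bss_dst, size_t bss_len)
{
    memcpy(data_dst, data_src, data_len);
    memset(bss_dst, 0, bss_len);
}

/* On the target this would be the very first thing main() does, e.g.:

   extern uint8_t _sidata[], _sdata[], _edata[], _sbss[], _ebss[];
   init_ram_sections(_sdata, _sidata, (size_t)(_edata - _sdata),
                     _sbss, (size_t)(_ebss - _sbss));
*/
```

Two caveats apply under these assumptions: nothing that runs before this call may rely on initialised or zeroed globals, and the memcpy/memset being called must themselves be part of the image being started.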
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
I have both the loader mode, and the 'loader not needed' mode, running.

One thing I found is that the 'user application' (the bit which the loader will load from the SPI FLASH) will have to set up its RAM stuff i.e. zero bss and copy across 'data'.
Currently, the boot loader is doing that for the entire program loaded via SWD, which is fine, with the funny observation that loading the loader into RAM overwrites all that stuff, which doesn't matter because the loader does its own thing and then will always reboot the unit.

To my surprise, initialised data is being set up apparently correctly for both the FLASH code (loaded via SWD) and for the RAM based loader. The latter is done with memcpy but I had to drop in a local memcpy (and memset for the bss), from some sources I found, because the 16k bottom block has to be totally self sufficient. I've spent much of today stepping through the boot code and making sure the PC value never goes outside the 16k block :) So I've put in local copies of a lot of the HAL bloatware, de-bloated somewhat.
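For anyone wanting the same thing, local copies of this kind can be as simple as the sketch below. The names b_memcpy and b_memset are hypothetical (following the B_ prefix used for the boot block functions), and these are byte-wise versions with no attempt at speed.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise copy with no dependency on libc, so it can live in the boot block. */
static void *b_memcpy(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    while (n--) {
        *d++ = *s++;
    }
    return dst;
}

/* Byte-wise fill, same idea. */
static void *b_memset(void *dst, int value, size_t n)
{
    uint8_t *d = dst;
    while (n--) {
        *d++ = (uint8_t)value;
    }
    return dst;
}
```

One thing worth checking in the .map file or disassembly: at higher optimisation levels GCC can recognise loops like these and turn them back into calls to the library memcpy/memset, which would land outside the boot block. The option -fno-tree-loop-distribute-patterns is intended to suppress that transformation.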

Currently my real main (which has an entry point at 16k i.e. 0x08004000) doesn't run once one gets to more complex functions like osDelay; not sure why. Will have to poke about. The #1 candidate is 'data' but it seems to be there (I tested it with stuff like uint8_t fred[]={"abcd"} and checking it is in the right place and is actually in RAM). OTOH I am seeing initialised data from some modules appearing under 'common' and my linker script was loading these with bss, so I moved common to data, but it has not helped. Probably interrupts are buggered.



Anyway, amusingly, I am now back to what others suggested early on: putting the boot loader etc in the bottom 16k block rather than the uppermost block :)

Edit: interrupts were indeed messed up. It almost runs now, but RTOS doesn't start the processes.


« Last Edit: July 20, 2021, 10:08:57 pm by peter-h »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Well, I got some ideas from others and started digging around a particular area and found it!



I had FLASH_APP there before.



I don't know what was actually happening but _sidata (the location of initialised data in flash from where the startup code copies it to RAM) was in RAM (0x20000000) which was obviously nonsense. It was not getting set up for main.c but was getting set up for subsequently loaded modules. No idea how...



Does anybody know what the RAM AT > FLASH_BOOT means? I can see AT in the manual but how could changing from FLASH_APP to FLASH_BOOT change _sidata?

The fundamental learning point here - obvious in retrospect - is that any program loaded by the boot loader (into cpu flash, into ram, anywhere really) needs to set up its own initialised data and clear its own bss. I thought I was doing that, but the linker script is impenetrable in these subtleties no matter how much I read the GCC manual.

The other thing was that a test statement like

uint8_t fred[]={"abcd"};

got optimised away and did not generate any 'data' section, unless one also puts in some crap like

uint32_t addr=(uint32_t)&fred[0];

It still says it is not used but it doesn't remove it.
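A variant of the same test that avoids storing the address in an integer is to read the array through a volatile pointer. This is a sketch under the assumption that the array was being dropped simply because nothing referenced it; touch_fred is a hypothetical name.

```c
#include <stdint.h>

uint8_t fred[] = "abcd";   /* initialised, so it belongs in .data */

/* A read through a volatile pointer is an access the compiler must keep,
   and the reference to fred keeps the array alive for as long as this
   function is itself called from somewhere reachable. */
uint8_t touch_fred(void)
{
    volatile uint8_t *p = fred;
    return p[0];
}
```

Calling touch_fred() once from main() and checking in the debugger that it returns 'a' then tests both that the array exists and that it was initialised.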

Now, according to a switch setting, I can run the loader (from RAM) or the existing FLASH based program which was loaded by SWD.

A cunning approach is to copy the entire loader into RAM (with a small assembler stub) and then it is all a lot simpler. Thank you abyrvalg for that!
« Last Edit: July 21, 2021, 08:41:45 am by peter-h »
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1166
  • Country: de
Quote
Does anybody know what the "RAM AT > FLASH_BOOT" means?

outsection : { .... } >RAM AT > FLASH_BOOT

means that outsection's VMA is set to the next free address in the RAM region, and outsection's LMA is set to the next free address in the FLASH_BOOT region.
Then the current fill level of both regions, RAM and FLASH_BOOT is advanced by the SIZEOF(outsection).
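The bookkeeping can be pictured with a toy model in C. This is only an illustration of the rule above, ignoring alignment; struct region, struct placement and place_section are hypothetical names and have nothing to do with the linker's real internals.

```c
#include <stdint.h>

/* Toy model of a linker MEMORY region: origin plus a running fill level. */
struct region {
    uint32_t origin;
    uint32_t used;
};

struct placement {
    uint32_t vma;   /* address the section runs at */
    uint32_t lma;   /* address the section is stored (loaded) at */
};

/* Models "outsection : { ... } >run AT >load" for a section of 'size' bytes:
   both addresses come from the regions' current fill levels, then both
   fill levels advance by the section size. */
static struct placement place_section(struct region *run, struct region *load,
                                      uint32_t size)
{
    struct placement p = { run->origin + run->used, load->origin + load->used };
    run->used += size;
    load->used += size;
    return p;
}
```

With RAM empty and FLASH_BOOT filled up to 0x080007A4, placing a 0x100-byte .data this way gives VMA 0x20000000 and LMA 0x080007A4. LOADADDR(.data), and therefore _sidata, corresponds to the LMA, which is why it depends on the region named after AT>.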

 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
OK thanks.

Could someone please take a look at my linkfile and tell me whether it makes sense, or is working by luck?

Code: [Select]
/*
*****************************************************************************
**
**  File        : LinkerScript.ld
**
**  Author      : Peter
**
**  Abstract    : Linker script for STM32F417VG Device with
**                1024KByte FLASH, 128KByte RAM + 64kbyte in CCM
**
**
**  Target      : 32F407/417
**
* 18/7/2021   PH mods to support a loader which executes at base of RAM i.e. 0x20000000.
*   21/7/21 PH loader RAM address changed to base+64k i.e. 0x20010000, and heap check removed (reason unclear but didn't
*                   do anything useful because the heap grows downwards at runtime anyway and the linker can't see that)
*

*/

/* Entry Point */
ENTRY(Reset_Handler)

/* Reference boot block stuff to ensure that it gets linked-in - not sure if these lines do anything */
EXTERN(b_main)
EXTERN(KDE_loader)


/* Highest address of the main stack. This stack is only for startup code, boot loader, and ISRs. The RTOS has its own. */
 /*  _estack = 0x20020000;  */    /* stack in 128K RAM */
 _estack = 0x10010000;    /* stack in  64k CCM - note: configTOTAL_HEAP_SIZE + min_stack_size must not exceed 64k) */
 
/* top of RAM for _sbrk - top of heap check */
 _top = 0x20020000;

/* Heap and stack sizes */
 
_Min_Heap_Size  = 0x0000;     /* 21/7/21 set to zero to prevent heap/stack conflict check from blowing up due to boot loader area  */
_Min_Stack_Size = 0x4000;     /* 16k stack - in CCM */

/* Specify the memory areas */
/* CCMRAM added PH 12/5/2021 - cannot use with DMA */
MEMORY
{
  FLASH_BOOT (rx)     : ORIGIN = 0x08000000, LENGTH = 16K
  FLASH_APP (rx)      : ORIGIN = 0x08004000, LENGTH = 1024K-16K
  RAM (xrw)           : ORIGIN = 0x20000000, LENGTH = 128K
  MEMORY_B1 (rx)      : ORIGIN = 0x60000000, LENGTH = 0K
  CCMRAM (rw)         : ORIGIN = 0x10000000, LENGTH = 64K
}

/* Define output sections */
SECTIONS
{
 /* loader_bss and loader sections must come first, in order to override the wildcards in subsequent sections */
 loader_bss _loader_end : {
  _loader_bss_start = .;
  *KDE_loader.o(.bss*)
  . = ALIGN(4);
  _loader_bss_end = .;
 }
 
 /* RAM based loader is linked to execute here */
 KDE_loader 0x20010000 : AT(_loader_loadaddr)
 {
  _loader_start = .;
  *KDE_loader.o(.text*)
  *KDE_loader.o(.rodata*)
  *KDE_loader.o(.data*)
  *KDE_loader.o(.common*)
  *KDE_loader.o(.ARM.attributes)
  . = ALIGN(4);
  _loader_end = .;
 }
 
  _loader_size = SIZEOF(KDE_loader);


  /* Place modules into FLASH, starting at the bottom (0x08000000)
     The line "*b_main.o (.text .text* .rodata .rodata*)" is important, but for some reason
     (probably the reset_handler reference) is not needed for isr_vector */
 
   
    /* Startup code */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector))
    . = ALIGN(4);
  } >FLASH_BOOT
 
  /* b_main.o */
  .b_main.o :
  {
    . = ALIGN(4);
    KEEP(*(.b_main.o))
    *b_main.o (.text .text* .rodata .rodata*)
    . = ALIGN(4);
  } >FLASH_BOOT
 
  /* === other bottom-16k stuff will go here === */
 
 
  /* KDE_loader.o - goes last in the bottom block which makes it easier to check the block usage
     by checking _loader_end_in_flash in the .map file */
 
  .KDE_loader.o :
  {
    . = ALIGN(4);
    KEEP(*(.KDE_loader.o))
    *KDE_loader.o (.text .text* .rodata .rodata*)
    . = ALIGN(4);
     _loader_end_in_flash = .;
  } >FLASH_BOOT
 
 
  /* The rest of the KDE code goes here, at base+16k, starting with the real main() */
 
  .main.o :
  {
    . = ALIGN(4);
    KEEP(*(.main.o))
    *main.o (.text .text* .rodata .rodata*)
    . = ALIGN(4);
  } >FLASH_APP
 
   
  /* This collects all other stuff, which gets loaded into FLASH after main.o above */
 
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
    *(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbol at end of code */
  } >FLASH_APP

/*   .ARM.extab   : { *(.ARM.extab* .gnu.linkonce.armextab.*) } >FLASH_APP
    .ARM : {
    __exidx_start = .;
      *(.ARM.exidx*)
      __exidx_end = .;
    } >FLASH_APP  */


  /* Initialized data sections go into RAM, load LMA copy after code */
  .data :
  {
    . = ALIGN(4);
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */
 
    . = ALIGN(4);
    _edata = .;        /* define a global symbol at data end */
  } >RAM  AT >FLASH_BOOT

 /* used by the startup to initialize data */
  _sidata = LOADADDR(.data);

/* dummy placeholder in flash for loader section, to count flash usage */
 .KDE_loader : {
  . = . + SIZEOF(KDE_loader);
 } AT >FLASH_BOOT
 _loader_loadaddr = LOADADDR(.KDE_loader);
 
 
  /* Uninitialized data section */
  . = ALIGN(4);
  .bss :
  {
    /* This is used by the startup in order to initialize the .bss section */
    _sbss = .;         /* define a global symbol at bss start */
    __bss_start__ = _sbss;
    *(.bss)
    *(.bss*)
    *(COMMON)

    . = ALIGN(4);
    _ebss = .;         /* define a global symbol at bss end */
    __bss_end__ = _ebss;
  } >RAM

  /* The heap ends up after BSS in main RAM */
  /* This also checks that the top of the heap doesn't hit the bottom of the stack i.e. how much RAM left */
  /* User_heap_stack section, used to check that there is enough RAM left */
 
  /* ._user_heap_stack : */
  /* This check is basically disabled because _Min_Heap_Size=0 above */
  .main_heap :
  {
    . = ALIGN(8);
    PROVIDE ( end = . );
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
 /*   . = . + _Min_Stack_Size; */  /* PH 14/5/2021 stack is in CCM, not here */
    . = ALIGN(8);
   } >RAM
  /*   } >CCMRAM */


  /* MEMORY_bank1 section, code must be located here explicitly            */
  /* Example: extern int foo(void) __attribute__ ((section (".mb1text"))); */
  /* Not used 14/5/2021 - was apparently used for LCD display on ST dev kit */
  .memory_b1_text :
  {
    *(.mb1text)        /* .mb1text sections (code) */
    *(.mb1text*)       /* .mb1text* sections (code)  */
    *(.mb1rodata)      /* read-only data (constants) */
    *(.mb1rodata*)
  } >MEMORY_B1
 
  /* CCM-RAM section
  *
  * IMPORTANT NOTE!
  * If variables placed in this section must be zero initialized,
  * the startup code needs to be modified to initialize this section.
  * Done PH 12/5/2021
  */
  .ccmram :
  {
    . = ALIGN(4);
    _sccmram = .;       /* create a global symbol at ccmram start */
    *(.ccmram)
    *(.ccmram*)
   
    . = ALIGN(4);
    _eccmram = .;       /* create a global symbol at ccmram end */
  } >CCMRAM

  /* Remove information from the standard libraries */
  /DISCARD/ :
  {
    libc.a ( * )
    libm.a ( * )
    libgcc.a ( * )
  }

  .ARM.attributes 0 : { *(.ARM.attributes) }
}


To recap, the bottom 16k is dedicated to the loader, and the RAM based portion of it (currently about 8k) gets copied to RAM at 0x20010000. The stack is at the top of CCM. The real main.c starts 16k up i.e. 0x08004000.

The data and bss stuff ends up at base of RAM i.e. 0x20000000, which is why the loader goes higher up. It is calling some functions in the bottom 16k so I don't want to screw up their data areas.

I had some problem with overlaps between the loader and the heap, which makes no practical sense since the two will never co-exist. I don't think there is any point in checking that in the linkfile, provided it is properly documented.

Some stuff in the original linkfile was related only to C++ so I removed it.
« Last Edit: July 22, 2021, 09:59:44 am by peter-h »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Just an update for anyone trying to implement this.

I have been doing lots of testing on the boot loader, including corrupting the entire CPU FLASH (by writing a 1MB-32k jpeg into it :) ) to make sure the boot block is able to recover the working image from the serial FLASH.

Well it took a while to squash all the bugs, because the ST libs call all kinds of sh*t all over the place, and anything outside the boot block will obviously crash it. Some of the stuff was hidden in macros which actually generated code.

The funny one was in the startup... .s file, which contains the standard asm init code:

Code: [Select]
    .section  .text.Reset_Handler
  .weak  Reset_Handler
  .type  Reset_Handler, %function
Reset_Handler: 

  ldr   sp, =_estack

/* Copy the data segment initializers from flash to SRAM */ 
  movs  r1, #0
  b  LoopCopyDataInit

CopyDataInit:
  ldr  r3, =_sidata
  ldr  r3, [r3, r1]
  str  r3, [r0, r1]
  adds  r1, r1, #4
   
LoopCopyDataInit:
  ldr  r0, =_sdata
  ldr  r3, =_edata
  adds  r2, r0, r1
  cmp  r2, r3
  bcc  CopyDataInit
  ldr  r2, =_sbss
  b  LoopFillZerobss
/* Zero fill the bss segment. */ 
FillZerobss:
  movs  r3, #0
  str  r3, [r2], #4
   
LoopFillZerobss:
  ldr  r3, = _ebss
  cmp  r2, r3
  bcc  FillZerobss

/* Initialise CCM RAM - fills the whole CCM so don't use the stack until afterwards :) */
/* PH 15/5/2021 */
  ldr  r2, =0x10000000  /* was _sccmram */
  b  LoopFillZeroCcm

FillZeroCcm:
  movs  r3, #0xaaaaaaaa  /* this fill intentionally differs from the a5a5a5a5 fill used by FreeRTOS for its stacks */
  str  r3, [r2]
  adds  r2, r2, #4

LoopFillZeroCcm:
  ldr  r3, =0x10010000  /* was _eccmram */
  cmp  r2, r3
  bcc  FillZeroCcm

/* Call the clock system initialization function. Moved to b_main.c */
//  bl  B_SystemInit
/* Call static constructors */
/*  bl B_libc_init_array */

/* Call the application's entry point - in this case the main() in the boot loader */
  bl  B_main
  bx  lr   
.size  Reset_Handler, .-Reset_Handler

Notice this code has its very own section, .text.Reset_Handler. So with the standard linkfile it can end up absolutely anywhere in the FLASH. I had to make sure it got loaded into the boot block, with this in the linkfile:

Code: [Select]

  /* Place modules into FLASH, starting at the bottom (0x08000000)
     The line "*b_main.o (.text .text* .rodata .rodata*)" is important, but for some reason
     is not needed for isr_vector */
 
   
    /* Startup code */
   
 
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector))
    . = ALIGN(4);
  } >FLASH_BOOT
 
  .text.Reset_Handler :
  {
    . = ALIGN(4);
    KEEP(*(.text.Reset_Handler))
    *text.Reset_Handler (.text .text* .rodata .rodata*)
    . = ALIGN(4);
  } >FLASH_BOOT
 
  .main.o :
  {
    . = ALIGN(4);
    KEEP(*(.main.o))
    *main.o (.text .text* .rodata .rodata*)
    . = ALIGN(4);
  } >FLASH_BOOT
 
  .45dbxx.o :
  {
    . = ALIGN(4);
    KEEP(*(.45dbxx.o))
    *45dbxx.o (.text .text* .rodata .rodata*)
    . = ALIGN(4);
  } >FLASH_BOOT
 
  /* === other boot block stuff will go here === */
 
 
  /* loader.o - goes last in the bottom block which makes it easier to check the block usage
     by checking _loader_end_in_flash in the .map file */
 
  .loader.o :
  {
    . = ALIGN(4);
    KEEP(*(.loader.o))
    *loader.o (.text .text* .rodata .rodata*)
    . = ALIGN(4);
      _loader_end_in_flash = .;
  } >FLASH_BOOT
 
  /* The rest of the code goes here, loaded at base+32k */
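
Instead of reading the .map file by eye, the same check could be done in code. This is a rough sketch only: `boot_block_bytes_free()` is a made-up name, and the 32k boot block size is taken from the "base+32k" comment above. On the target the last argument would come from the linker symbol, e.g. `extern char _loader_end_in_flash;` and `(uintptr_t)&_loader_end_in_flash`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns how many bytes remain in the boot block after the last byte the
 * linker placed there, or 0 if the contents have spilled past the end.
 * All three values are parameters so the function can be tried on a PC. */
static size_t boot_block_bytes_free(uintptr_t boot_base, size_t boot_size,
                                    uintptr_t loader_end)
{
    assert(loader_end >= boot_base);
    size_t used = (size_t)(loader_end - boot_base);
    return (used > boot_size) ? 0u : boot_size - used;
}
```

For example, with the block at 0x08000000 and `_loader_end_in_flash` at 0x08006000, 0x2000 bytes would be left.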
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Just been watching this video


Around 16:30 he shows a curious way to implement a RAM resident boot loader and some inline assembler to jump to it. I can't say I understand it; perhaps he is using a compiler option to generate position-independent (relocatable) code.

 

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 824
  • Country: es
That thing at 16:30 is a jump from boot to the main part built as a separate binary (with base address 08005000) with its own vector table (extracting the SP and PC values from it), which is not your case. You can find a month-old example of the same thing on page 3 of this thread.

Edit: interesting, this construct could fail (especially when compiled at lower optimization levels) if the compiler decides to place the app entry point variable on the stack. __set_MSP() drops the current stack frame completely; the only variables that survive are those already in registers.
« Last Edit: August 23, 2021, 12:30:24 am by abyrvalg »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Yes; I have no idea why there is a need to change the stack pointer, just because you are jumping to another part of the same memory map.

Then, by all means, set SP to some new value later.

But there is in any case no need to change the stack at all, from power-up. For example I set the SP to the top of the CCM at startup and just leave it there. The whole product has the stack (16k) at the top of CCM (which is 64k). Later the RTOS starts up and uses the lower 48k of CCM for its stuff. The original stack then is used only by main() and by ISRs. I fail to see any need to complicate matters.

The way I did it, I have a call (which never returns) to main(), and main() is linked to be at 0x08008000 (base+32k), so it would be perfectly possible to pass parameters to main(). In fact I do pass some parameters to main(), but by writing them into a serial (SPI) FLASH, because they need to be nonvolatile as well.
« Last Edit: August 23, 2021, 06:40:54 am by peter-h »
 

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 824
  • Country: es
"why there is a need to change the stack pointer"
To have fewer dependencies between the bootloader and the app. The app can have a completely different RAM map in his case, reusing areas previously occupied by boot. That construct essentially simulates a cold start with the vector table moved to 08005000 (impossible with a true hw reset since VTOR gets reset); that's what the core does after reset: fetch SP and PC from the +0 and +4 locations.
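
A rough sketch of that "simulated cold start" in C, not the code from the video. The struct and function names are invented here, and the range check is an extra precaution that the original construct does not have. The part that actually jumps is guarded so it only builds for an ARM Thumb target; it loads both values into registers before touching MSP, which avoids the stack-variable problem mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The two words the core fetches after reset. */
typedef struct {
    uint32_t initial_sp; /* word at offset +0 of the vector table */
    uint32_t reset_pc;   /* word at offset +4 of the vector table */
} boot_vectors_t;

static boot_vectors_t read_boot_vectors(const uint32_t *vector_table)
{
    assert(vector_table != NULL);
    boot_vectors_t v = { vector_table[0], vector_table[1] };
    return v;
}

/* Optional sanity check before jumping: SP word-aligned and inside RAM,
 * PC inside flash with the Thumb bit (bit 0) set. Ranges are parameters. */
static bool boot_vectors_plausible(boot_vectors_t v,
                                   uint32_t ram_start, uint32_t ram_end,
                                   uint32_t flash_start, uint32_t flash_end)
{
    bool sp_ok = v.initial_sp > ram_start && v.initial_sp <= ram_end &&
                 (v.initial_sp & 3u) == 0;
    bool pc_ok = (v.reset_pc & 1u) != 0 &&
                 v.reset_pc >= flash_start && v.reset_pc < flash_end;
    return sp_ok && pc_ok;
}

#if defined(__arm__) && defined(__thumb__)
/* Target-only part: point VTOR at the new table, then load SP and jump. */
__attribute__((noreturn)) static void start_image(uint32_t vt_addr)
{
    boot_vectors_t v = read_boot_vectors((const uint32_t *)vt_addr);
    *(volatile uint32_t *)0xE000ED08u = vt_addr; /* SCB->VTOR */
    __asm volatile("dsb\n\t"
                   "isb\n\t"
                   "msr msp, %0\n\t"
                   "bx  %1"
                   : : "r"(v.initial_sp), "r"(v.reset_pc) : "memory");
    __builtin_unreachable();
}
#endif
```

On the PC only the first two functions compile, which is enough to check the parsing logic against a dump of the image's first eight bytes.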
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Yes; true, although if the boot loader stack usage is insignificant, it doesn't really matter, especially if the boot loader doesn't use interrupts.

Actually I wonder how people solve the interrupt issue. The vectors reside in the bottom of the CPU FLASH:

Code: [Select]
/******************************************************************************
*
* The minimal vector table for a Cortex M3. Note that the proper constructs
* must be placed on this to ensure that it ends up at physical address
* 0x0000.0000.
*
*******************************************************************************/
   .section  .isr_vector,"a",%progbits
  .type  g_pfnVectors, %object
  .size  g_pfnVectors, .-g_pfnVectors
   
   
g_pfnVectors:
  .word  _estack
  .word  Reset_Handler
  .word  NMI_Handler
  .word  HardFault_Handler
  .word  MemManage_Handler
  .word  BusFault_Handler
  .word  UsageFault_Handler
  .word  0
  .word  0
  .word  0
  .word  0
  .word  SVC_Handler
  .word  DebugMon_Handler
  .word  0
  .word  PendSV_Handler
  .word  SysTick_Handler
 
  /* External Interrupts */
  .word     WWDG_IRQHandler                   /* Window WatchDog              */                                       
  .word     PVD_IRQHandler                    /* PVD through EXTI Line detection */                       
  .word     TAMP_STAMP_IRQHandler             /* Tamper and TimeStamps through the EXTI line */           
  .word     RTC_WKUP_IRQHandler               /* RTC Wakeup through the EXTI line */                     
  .word     FLASH_IRQHandler                  /* FLASH                        */                                         
  .word     RCC_IRQHandler                    /* RCC                          */                                           
  .word     EXTI0_IRQHandler                  /* EXTI Line0                   */                       
  .word     EXTI1_IRQHandler                  /* EXTI Line1                   */                         
  .word     EXTI2_IRQHandler                  /* EXTI Line2                   */                         
  .word     EXTI3_IRQHandler                  /* EXTI Line3                   */                         
  .word     EXTI4_IRQHandler                  /* EXTI Line4                   */                         
  .word     DMA1_Stream0_IRQHandler           /* DMA1 Stream 0                */                 
  .word     DMA1_Stream1_IRQHandler           /* DMA1 Stream 1                */                   
  .word     DMA1_Stream2_IRQHandler           /* DMA1 Stream 2                */                   
  .word     DMA1_Stream3_IRQHandler           /* DMA1 Stream 3                */                   
  .word     DMA1_Stream4_IRQHandler           /* DMA1 Stream 4                */                   
  .word     DMA1_Stream5_IRQHandler           /* DMA1 Stream 5                */                   
  .word     DMA1_Stream6_IRQHandler           /* DMA1 Stream 6                */                   
  .word     ADC_IRQHandler                    /* ADC1, ADC2 and ADC3s         */                   
  .word     CAN1_TX_IRQHandler                /* CAN1 TX                      */                         
  .word     CAN1_RX0_IRQHandler               /* CAN1 RX0                     */                         
  .word     CAN1_RX1_IRQHandler               /* CAN1 RX1                     */                         
  .word     CAN1_SCE_IRQHandler               /* CAN1 SCE                     */                         
  .word     EXTI9_5_IRQHandler                /* External Line[9:5]s          */                         
  .word     TIM1_BRK_TIM9_IRQHandler          /* TIM1 Break and TIM9          */         
  .word     TIM1_UP_TIM10_IRQHandler          /* TIM1 Update and TIM10        */         
  .word     TIM1_TRG_COM_TIM11_IRQHandler     /* TIM1 Trigger and Commutation and TIM11 */
  .word     TIM1_CC_IRQHandler                /* TIM1 Capture Compare         */                         
  .word     TIM2_IRQHandler                   /* TIM2                         */                   
  .word     TIM3_IRQHandler                   /* TIM3                         */                   
  .word     TIM4_IRQHandler                   /* TIM4                         */                   
  .word     I2C1_EV_IRQHandler                /* I2C1 Event                   */                         
  .word     I2C1_ER_IRQHandler                /* I2C1 Error                   */                         
  .word     I2C2_EV_IRQHandler                /* I2C2 Event                   */                         
  .word     I2C2_ER_IRQHandler                /* I2C2 Error                   */                           
  .word     SPI1_IRQHandler                   /* SPI1                         */                   
  .word     SPI2_IRQHandler                   /* SPI2                         */                   
  .word     USART1_IRQHandler                 /* USART1                       */                   
  .word     USART2_IRQHandler                 /* USART2                       */                   
  .word     USART3_IRQHandler                 /* USART3                       */                   
  .word     EXTI15_10_IRQHandler              /* External Line[15:10]s        */                         
  .word     RTC_Alarm_IRQHandler              /* RTC Alarm (A and B) through EXTI Line */                 
  .word     OTG_FS_WKUP_IRQHandler            /* USB OTG FS Wakeup through EXTI line */                       
  .word     TIM8_BRK_TIM12_IRQHandler         /* TIM8 Break and TIM12         */         
  .word     TIM8_UP_TIM13_IRQHandler          /* TIM8 Update and TIM13        */         
  .word     TIM8_TRG_COM_TIM14_IRQHandler     /* TIM8 Trigger and Commutation and TIM14 */
  .word     TIM8_CC_IRQHandler                /* TIM8 Capture Compare         */                         
  .word     DMA1_Stream7_IRQHandler           /* DMA1 Stream7                 */                         
  .word     FSMC_IRQHandler                   /* FSMC                         */                   
  .word     SDIO_IRQHandler                   /* SDIO                         */                   
  .word     TIM5_IRQHandler                   /* TIM5                         */                   
  .word     SPI3_IRQHandler                   /* SPI3                         */                   
  .word     UART4_IRQHandler                  /* UART4                        */                   
  .word     UART5_IRQHandler                  /* UART5                        */                   
  .word     TIM6_DAC_IRQHandler               /* TIM6 and DAC1&2 underrun errors */                   
  .word     TIM7_IRQHandler                   /* TIM7                         */
  .word     DMA2_Stream0_IRQHandler           /* DMA2 Stream 0                */                   
  .word     DMA2_Stream1_IRQHandler           /* DMA2 Stream 1                */                   
  .word     DMA2_Stream2_IRQHandler           /* DMA2 Stream 2                */                   
  .word     DMA2_Stream3_IRQHandler           /* DMA2 Stream 3                */                   
  .word     DMA2_Stream4_IRQHandler           /* DMA2 Stream 4                */                   
  .word     ETH_IRQHandler                    /* Ethernet                     */                   
  .word     ETH_WKUP_IRQHandler               /* Ethernet Wakeup through EXTI line */                     
  .word     CAN2_TX_IRQHandler                /* CAN2 TX                      */                         
  .word     CAN2_RX0_IRQHandler               /* CAN2 RX0                     */                         
  .word     CAN2_RX1_IRQHandler               /* CAN2 RX1                     */                         
  .word     CAN2_SCE_IRQHandler               /* CAN2 SCE                     */                         
  .word     OTG_FS_IRQHandler                 /* USB OTG FS                   */                   
  .word     DMA2_Stream5_IRQHandler           /* DMA2 Stream 5                */                   
  .word     DMA2_Stream6_IRQHandler           /* DMA2 Stream 6                */                   
  .word     DMA2_Stream7_IRQHandler           /* DMA2 Stream 7                */                   
  .word     USART6_IRQHandler                 /* USART6                       */                   
  .word     I2C3_EV_IRQHandler                /* I2C3 event                   */                         
  .word     I2C3_ER_IRQHandler                /* I2C3 error                   */                         
  .word     OTG_HS_EP1_OUT_IRQHandler         /* USB OTG HS End Point 1 Out   */                   
  .word     OTG_HS_EP1_IN_IRQHandler          /* USB OTG HS End Point 1 In    */                   
  .word     OTG_HS_WKUP_IRQHandler            /* USB OTG HS Wakeup through EXTI */                         
  .word     OTG_HS_IRQHandler                 /* USB OTG HS                   */                   
  .word     DCMI_IRQHandler                   /* DCMI                         */                   
  .word     0                                 /* CRYP crypto                  */                   
  .word     HASH_RNG_IRQHandler               /* Hash and Rng                 */
  .word     FPU_IRQHandler                    /* FPU                          */
                         
                         
/*******************************************************************************
*
* Provide weak aliases for each Exception handler to the Default_Handler.
* As they are weak aliases, any function with the same name will override
* this definition.
*
*******************************************************************************/
   .weak      NMI_Handler
   .thumb_set NMI_Handler,Default_Handler
 
   .weak      HardFault_Handler
   .thumb_set HardFault_Handler,Default_Handler
 
   .weak      MemManage_Handler
   .thumb_set MemManage_Handler,Default_Handler
 
   .weak      BusFault_Handler
   .thumb_set BusFault_Handler,Default_Handler

   .weak      UsageFault_Handler
   .thumb_set UsageFault_Handler,Default_Handler

   .weak      SVC_Handler
   .thumb_set SVC_Handler,Default_Handler

   .weak      DebugMon_Handler
   .thumb_set DebugMon_Handler,Default_Handler

   .weak      PendSV_Handler
   .thumb_set PendSV_Handler,Default_Handler

   .weak      SysTick_Handler
   .thumb_set SysTick_Handler,Default_Handler             
 
   .weak      WWDG_IRQHandler                   
   .thumb_set WWDG_IRQHandler,Default_Handler     
                 
   .weak      PVD_IRQHandler     
   .thumb_set PVD_IRQHandler,Default_Handler
               
   .weak      TAMP_STAMP_IRQHandler           
   .thumb_set TAMP_STAMP_IRQHandler,Default_Handler
           
   .weak      RTC_WKUP_IRQHandler                 
   .thumb_set RTC_WKUP_IRQHandler,Default_Handler
           
   .weak      FLASH_IRQHandler         
   .thumb_set FLASH_IRQHandler,Default_Handler
                 
   .weak      RCC_IRQHandler     
   .thumb_set RCC_IRQHandler,Default_Handler
                 
   .weak      EXTI0_IRQHandler         
   .thumb_set EXTI0_IRQHandler,Default_Handler
                 
   .weak      EXTI1_IRQHandler         
   .thumb_set EXTI1_IRQHandler,Default_Handler
                     
   .weak      EXTI2_IRQHandler         
   .thumb_set EXTI2_IRQHandler,Default_Handler
                 
   .weak      EXTI3_IRQHandler         
   .thumb_set EXTI3_IRQHandler,Default_Handler
                       
   .weak      EXTI4_IRQHandler         
   .thumb_set EXTI4_IRQHandler,Default_Handler
                 
   .weak      DMA1_Stream0_IRQHandler               
   .thumb_set DMA1_Stream0_IRQHandler,Default_Handler
         
   .weak      DMA1_Stream1_IRQHandler               
   .thumb_set DMA1_Stream1_IRQHandler,Default_Handler
                 
   .weak      DMA1_Stream2_IRQHandler               
   .thumb_set DMA1_Stream2_IRQHandler,Default_Handler
                 
   .weak      DMA1_Stream3_IRQHandler               
   .thumb_set DMA1_Stream3_IRQHandler,Default_Handler
                 
   .weak      DMA1_Stream4_IRQHandler             
   .thumb_set DMA1_Stream4_IRQHandler,Default_Handler
                 
   .weak      DMA1_Stream5_IRQHandler               
   .thumb_set DMA1_Stream5_IRQHandler,Default_Handler
                 
   .weak      DMA1_Stream6_IRQHandler               
   .thumb_set DMA1_Stream6_IRQHandler,Default_Handler
                 
   .weak      ADC_IRQHandler     
   .thumb_set ADC_IRQHandler,Default_Handler
               
   .weak      CAN1_TX_IRQHandler   
   .thumb_set CAN1_TX_IRQHandler,Default_Handler
           
   .weak      CAN1_RX0_IRQHandler                 
   .thumb_set CAN1_RX0_IRQHandler,Default_Handler
                           
   .weak      CAN1_RX1_IRQHandler                 
   .thumb_set CAN1_RX1_IRQHandler,Default_Handler
           
   .weak      CAN1_SCE_IRQHandler                 
   .thumb_set CAN1_SCE_IRQHandler,Default_Handler
           
   .weak      EXTI9_5_IRQHandler   
   .thumb_set EXTI9_5_IRQHandler,Default_Handler
           
   .weak      TIM1_BRK_TIM9_IRQHandler           
   .thumb_set TIM1_BRK_TIM9_IRQHandler,Default_Handler
           
   .weak      TIM1_UP_TIM10_IRQHandler           
   .thumb_set TIM1_UP_TIM10_IRQHandler,Default_Handler
     
   .weak      TIM1_TRG_COM_TIM11_IRQHandler     
   .thumb_set TIM1_TRG_COM_TIM11_IRQHandler,Default_Handler
     
   .weak      TIM1_CC_IRQHandler   
   .thumb_set TIM1_CC_IRQHandler,Default_Handler
                 
   .weak      TIM2_IRQHandler           
   .thumb_set TIM2_IRQHandler,Default_Handler
                 
   .weak      TIM3_IRQHandler           
   .thumb_set TIM3_IRQHandler,Default_Handler
                 
   .weak      TIM4_IRQHandler           
   .thumb_set TIM4_IRQHandler,Default_Handler
                 
   .weak      I2C1_EV_IRQHandler   
   .thumb_set I2C1_EV_IRQHandler,Default_Handler
                     
   .weak      I2C1_ER_IRQHandler   
   .thumb_set I2C1_ER_IRQHandler,Default_Handler
                     
   .weak      I2C2_EV_IRQHandler   
   .thumb_set I2C2_EV_IRQHandler,Default_Handler
                 
   .weak      I2C2_ER_IRQHandler   
   .thumb_set I2C2_ER_IRQHandler,Default_Handler
                           
   .weak      SPI1_IRQHandler           
   .thumb_set SPI1_IRQHandler,Default_Handler
                       
   .weak      SPI2_IRQHandler           
   .thumb_set SPI2_IRQHandler,Default_Handler
                 
   .weak      USART1_IRQHandler     
   .thumb_set USART1_IRQHandler,Default_Handler
                     
   .weak      USART2_IRQHandler     
   .thumb_set USART2_IRQHandler,Default_Handler
                     
   .weak      USART3_IRQHandler     
   .thumb_set USART3_IRQHandler,Default_Handler
                 
   .weak      EXTI15_10_IRQHandler               
   .thumb_set EXTI15_10_IRQHandler,Default_Handler
               
   .weak      RTC_Alarm_IRQHandler               
   .thumb_set RTC_Alarm_IRQHandler,Default_Handler
           
   .weak      OTG_FS_WKUP_IRQHandler         
   .thumb_set OTG_FS_WKUP_IRQHandler,Default_Handler
           
   .weak      TIM8_BRK_TIM12_IRQHandler         
   .thumb_set TIM8_BRK_TIM12_IRQHandler,Default_Handler
         
   .weak      TIM8_UP_TIM13_IRQHandler           
   .thumb_set TIM8_UP_TIM13_IRQHandler,Default_Handler
         
   .weak      TIM8_TRG_COM_TIM14_IRQHandler     
   .thumb_set TIM8_TRG_COM_TIM14_IRQHandler,Default_Handler
     
   .weak      TIM8_CC_IRQHandler   
   .thumb_set TIM8_CC_IRQHandler,Default_Handler
                 
   .weak      DMA1_Stream7_IRQHandler               
   .thumb_set DMA1_Stream7_IRQHandler,Default_Handler
                     
   .weak      FSMC_IRQHandler           
   .thumb_set FSMC_IRQHandler,Default_Handler
                     
   .weak      SDIO_IRQHandler           
   .thumb_set SDIO_IRQHandler,Default_Handler
                     
   .weak      TIM5_IRQHandler           
   .thumb_set TIM5_IRQHandler,Default_Handler
                     
   .weak      SPI3_IRQHandler           
   .thumb_set SPI3_IRQHandler,Default_Handler
                     
   .weak      UART4_IRQHandler         
   .thumb_set UART4_IRQHandler,Default_Handler
                 
   .weak      UART5_IRQHandler         
   .thumb_set UART5_IRQHandler,Default_Handler
                 
   .weak      TIM6_DAC_IRQHandler                 
   .thumb_set TIM6_DAC_IRQHandler,Default_Handler
               
   .weak      TIM7_IRQHandler           
   .thumb_set TIM7_IRQHandler,Default_Handler
         
   .weak      DMA2_Stream0_IRQHandler               
   .thumb_set DMA2_Stream0_IRQHandler,Default_Handler
               
   .weak      DMA2_Stream1_IRQHandler               
   .thumb_set DMA2_Stream1_IRQHandler,Default_Handler
                 
   .weak      DMA2_Stream2_IRQHandler               
   .thumb_set DMA2_Stream2_IRQHandler,Default_Handler
           
   .weak      DMA2_Stream3_IRQHandler               
   .thumb_set DMA2_Stream3_IRQHandler,Default_Handler
           
   .weak      DMA2_Stream4_IRQHandler               
   .thumb_set DMA2_Stream4_IRQHandler,Default_Handler
           
   .weak      ETH_IRQHandler     
   .thumb_set ETH_IRQHandler,Default_Handler
                 
   .weak      ETH_WKUP_IRQHandler                 
   .thumb_set ETH_WKUP_IRQHandler,Default_Handler
           
   .weak      CAN2_TX_IRQHandler   
   .thumb_set CAN2_TX_IRQHandler,Default_Handler
                           
   .weak      CAN2_RX0_IRQHandler                 
   .thumb_set CAN2_RX0_IRQHandler,Default_Handler
                           
   .weak      CAN2_RX1_IRQHandler                 
   .thumb_set CAN2_RX1_IRQHandler,Default_Handler
                           
   .weak      CAN2_SCE_IRQHandler                 
   .thumb_set CAN2_SCE_IRQHandler,Default_Handler
                           
   .weak      OTG_FS_IRQHandler     
   .thumb_set OTG_FS_IRQHandler,Default_Handler
                     
   .weak      DMA2_Stream5_IRQHandler               
   .thumb_set DMA2_Stream5_IRQHandler,Default_Handler
                 
   .weak      DMA2_Stream6_IRQHandler               
   .thumb_set DMA2_Stream6_IRQHandler,Default_Handler
                 
   .weak      DMA2_Stream7_IRQHandler               
   .thumb_set DMA2_Stream7_IRQHandler,Default_Handler
                 
   .weak      USART6_IRQHandler     
   .thumb_set USART6_IRQHandler,Default_Handler
                       
   .weak      I2C3_EV_IRQHandler   
   .thumb_set I2C3_EV_IRQHandler,Default_Handler
                       
   .weak      I2C3_ER_IRQHandler   
   .thumb_set I2C3_ER_IRQHandler,Default_Handler
                       
   .weak      OTG_HS_EP1_OUT_IRQHandler         
   .thumb_set OTG_HS_EP1_OUT_IRQHandler,Default_Handler
               
   .weak      OTG_HS_EP1_IN_IRQHandler           
   .thumb_set OTG_HS_EP1_IN_IRQHandler,Default_Handler
               
   .weak      OTG_HS_WKUP_IRQHandler         
   .thumb_set OTG_HS_WKUP_IRQHandler,Default_Handler
           
   .weak      OTG_HS_IRQHandler     
   .thumb_set OTG_HS_IRQHandler,Default_Handler
                 
   .weak      DCMI_IRQHandler           
   .thumb_set DCMI_IRQHandler,Default_Handler
                                   
   .weak      HASH_RNG_IRQHandler                 
   .thumb_set HASH_RNG_IRQHandler,Default_Handler   

   .weak      FPU_IRQHandler                 
   .thumb_set FPU_IRQHandler,Default_Handler 


I have "inherited" this project so haven't been digging into this part yet. AFAICT the ARM peripherals don't contain an ISR address register; they always go to these vectors, but since these vectors are in FLASH, the only ways to install your own handler are

- make most of the vector table point to one in RAM, which then points to where the FLASH one pointed to
- provide a means of rewriting the vector table - the whole bottom 16k block of necessity, but (I measured this) that takes only 300ms to erase and program.

But maybe the "weak" construct helps; I am not sure if it is useful when the original function is in FLASH.
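
For what it is worth, the "weak" aliases are resolved by the linker, so they only let a function of the same name replace Default_Handler at build time; they do nothing at run time. A third option besides the two listed is to keep a copy of the whole table in RAM and patch entries there. The sketch below shows only the table handling, using names invented for illustration; the entry count of 16 + 82 is taken from the table quoted above, and on the target the last step would be writing the RAM table's address to SCB->VTOR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VT_CORE_ENTRIES  16u  /* initial SP, Reset ... SysTick */
#define VT_IRQ_ENTRIES   82u  /* WWDG (0) ... FPU (81) in the table above */
#define VT_TOTAL_ENTRIES (VT_CORE_ENTRIES + VT_IRQ_ENTRIES)

typedef void (*isr_t)(void);

/* RAM copy of the table, aligned to 0x200 as VTOR requires. */
static _Alignas(512) isr_t ram_vectors[VT_TOTAL_ENTRIES];

/* Copy the table that lives in FLASH into the RAM copy. */
static void vt_copy_to_ram(const isr_t *flash_table)
{
    assert(flash_table != NULL);
    memcpy(ram_vectors, flash_table, sizeof ram_vectors);
}

/* Replace the handler for external interrupt 'irqn'; returns the old one. */
static isr_t vt_install(unsigned irqn, isr_t handler)
{
    assert(irqn < VT_IRQ_ENTRIES && handler != NULL);
    isr_t old = ram_vectors[VT_CORE_ENTRIES + irqn];
    ram_vectors[VT_CORE_ENTRIES + irqn] = handler;
    return old;
}

/* Placeholder handlers, present only so the sketch can be exercised. */
static void demo_default_handler(void) { }
static void demo_usart1_handler(void) { }
```

USART1 is external interrupt 37 in the table above, so `vt_install(37, demo_usart1_handler)` would hook it once the RAM table is live.
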
« Last Edit: August 23, 2021, 09:16:21 am by peter-h »
 

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 824
  • Country: es
"although if the boot loader stack usage is insignificant, it doesn't really matter"
- you may want to change app's RAM layout radically, like placing a heap above the stack or whatever, that's about freedom to redefine those things later
- your current setup also makes all RAM regions allocated to boot vars and that part running from RAM unavailable to the main part (because everything is linked in one step and linker sees that area as occupied)

"Actually I wonder how people solve the interrupt issue"
They just don't create it. With the app part being a separate binary (like in that guy's project) it has its own vector table (and many other things like its own startup code initializing the app's .data/.bss correctly) at a different flash address (08005000 in his case). All he needs to do is to change the SCB->VTOR register to point to that second VT and all interrupts will land there.
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
This is what I currently have

Code: [Select]
/* Configure the Vector Table location add offset address ------------------*/
#define VECT_TAB_OFFSET  0x00 /*!< Vector Table base offset field. This value must be a multiple of 0x200. */
SCB->VTOR = FLASH_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal FLASH */

and that seems an easy way to do it: put a copy of the existing table at the start of where the boot loader will jump to. Currently this is base+32k i.e. 0x08008000, and there is a tiny module called main_stub.c which calls (never to return) main.c, which contains no #includes etc (to make sure no code can sneak in before the entry point), and which is linked to be at 0x08008000. The VT needs to be on a 0x200 boundary so I will probably put it after that call to main.c, with some suitable align directive.

Is the above SCB->VTOR assignment atomic and if not, would one do a "DI/EI" around it? Actually interrupts are not enabled at that stage anyway.
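
As a quick illustration of the 0x200 boundary requirement, here is a small sketch that rounds an address up to the next boundary and checks a candidate table address. Both function names are made up, and the 0x200 figure is simply the one from the comment in the code above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round 'addr' up to the next multiple of 'alignment' (a power of two). */
static uint32_t align_up(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1u)) == 0);
    return (addr + alignment - 1u) & ~(alignment - 1u);
}

/* True when a vector table at 'table_addr' satisfies the 0x200 rule. */
static bool vtor_address_ok(uint32_t table_addr)
{
    return (table_addr & 0x1FFu) == 0;
}
```

So if the stub at 0x08008000 occupies a few bytes, the first usable table address after it would be `align_up(0x08008004, 0x200)`, i.e. 0x08008200.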
 

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 824
  • Country: es
"I will probably put it after that call to main.c, with some suitable align directive"
What have you decided in the end? Are you updating the main part w/o replacing the bootloader, or reflashing everything?
In the 1st case you'll need to be careful with the main VT placement (to have it in the updateable part).
In the 2nd case you don't need a separate VT at all, just put all handlers into the default (boot) VT and enable interrupts when you reach main().

VTOR assignment is atomic, it is a single uint32 mem write.
BTW, even with interrupts disabled the VT could still be used by the CPU to invoke exception handlers (if something goes wrong). In some applications requiring "graceful" fault handling (e.g. motor control) it is important to think about the VT life cycle (e.g. don’t leave VTOR pointing at a flash being rewritten, don’t reassign VTOR from boot to app before the app’s .data/.bss init etc).
« Last Edit: August 24, 2021, 11:14:29 pm by abyrvalg »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Yes - very interesting things to consider.

I was going to have just one VT and have all the interrupt handlers in the boot block, but it may be necessary for the "user code" to hook up some interrupt and this isn't possible if the entire interrupt handling is in FLASH.

So the plan now is to create a copy of the VT at the base of the "user code" and set VTOR to point to that. And have interrupt handlers in the "user code".

I've been looking at some of the code around the VT, which ST provide, and I am not sure all of it makes sense. This is the relevant part:

Code: [Select]
   
  .syntax unified
  .cpu cortex-m4
  .fpu softvfp
  .thumb

.global  g_pfnVectors
.global  Default_Handler

/* start address for the initialization values of the .data section.
defined in linker script */
.word  _sidata
/* start address for the .data section. defined in linker script */ 
.word  _sdata
/* end address for the .data section. defined in linker script */
.word  _edata
/* start address for the .bss section. defined in linker script */
.word  _sbss
/* end address for the .bss section. defined in linker script */
.word  _ebss
/* stack used for SystemInit_ExtMemCtl; always internal RAM used */

/**
 * @brief  This is the code that gets called when the processor first
 *          starts execution following a reset event. Only the absolutely
 *          necessary set is performed, after which the application
 *          supplied main() routine is called.
 * @param  None
 * @retval : None
*/

    .section  .text.Reset_Handler
  .weak  Reset_Handler
  .type  Reset_Handler, %function
Reset_Handler: 

  ldr   sp, =_estack

/* Copy the data segment initializers from flash to SRAM */ 
  movs  r1, #0
  b  LoopCopyDataInit

CopyDataInit:
  ldr  r3, =_sidata
  ldr  r3, [r3, r1]
  str  r3, [r0, r1]
  adds  r1, r1, #4
   
LoopCopyDataInit:
.
.
.
.


   .section  .isr_vector,"a",%progbits
  .type  g_pfnVectors, %object
  .size  g_pfnVectors, .-g_pfnVectors
   
   
g_pfnVectors:
  .word  _estack
  .word  Reset_Handler
  .word  NMI_Handler
  .word  HardFault_Handler
  .word  MemManage_Handler
  .word  BusFault_Handler
  .word  UsageFault_Handler
  .word  0
  .word  0
  .word  0
  .word  0
  .word  SVC_Handler
  .word  DebugMon_Handler
  .word  0
  .word  PendSV_Handler
  .word  SysTick_Handler
 
  /* External Interrupts */
  .word     WWDG_IRQHandler                   /* Window WatchDog              */                                       
  .word     PVD_IRQHandler                    /* PVD through EXTI Line detection */                       
  .word     TAMP_STAMP_IRQHandler             /* Tamper and TimeStamps through the EXTI line */           
  .word     RTC_WKUP_IRQHandler               /* RTC Wakeup through the EXTI line */                     
  .word     FLASH_IRQHandler                  /* FLASH                        */                                         
  .word     RCC_IRQHandler                    /* RCC                          */                                           
  .word     EXTI0_IRQHandler                  /* EXTI Line0                   */                       
  .word     EXTI1_IRQHandler                  /* EXTI Line1                   */                         
  .word     EXTI2_IRQHandler                  /* EXTI Line2                   */                         
  .word     EXTI3_IRQHandler                  /* EXTI Line3                   */                         
  .word     EXTI4_IRQHandler                  /* EXTI Line4                   */                         
  .word     DMA1_Stream0_IRQHandler           /* DMA1 Stream 0                */                 
  .word     DMA1_Stream1_IRQHandler           /* DMA1 Stream 1                */                   
  .word     DMA1_Stream2_IRQHandler           /* DMA1 Stream 2                */                   
  .word     DMA1_Stream3_IRQHandler           /* DMA1 Stream 3                */                   
  .word     DMA1_Stream4_IRQHandler           /* DMA1 Stream 4                */                   
  .word     DMA1_Stream5_IRQHandler           /* DMA1 Stream 5                */                   
  .word     DMA1_Stream6_IRQHandler           /* DMA1 Stream 6                */                   
  .word     ADC_IRQHandler                    /* ADC1, ADC2 and ADC3s         */                   
  .word     CAN1_TX_IRQHandler                /* CAN1 TX                      */                         
  .word     CAN1_RX0_IRQHandler               /* CAN1 RX0                     */                         
  .word     CAN1_RX1_IRQHandler               /* CAN1 RX1                     */                         
  .word     CAN1_SCE_IRQHandler               /* CAN1 SCE                     */                         
  .word     EXTI9_5_IRQHandler                /* External Line[9:5]s          */                         
  .word     TIM1_BRK_TIM9_IRQHandler          /* TIM1 Break and TIM9          */         
  .word     TIM1_UP_TIM10_IRQHandler          /* TIM1 Update and TIM10        */         
  .word     TIM1_TRG_COM_TIM11_IRQHandler     /* TIM1 Trigger and Commutation and TIM11 */
  .word     TIM1_CC_IRQHandler                /* TIM1 Capture Compare         */                         
  .word     TIM2_IRQHandler                   /* TIM2                         */                   
  .word     TIM3_IRQHandler                   /* TIM3                         */                   
  .word     TIM4_IRQHandler                   /* TIM4                         */                   
  .word     I2C1_EV_IRQHandler                /* I2C1 Event                   */                         
  .word     I2C1_ER_IRQHandler                /* I2C1 Error                   */                         
  .word     I2C2_EV_IRQHandler                /* I2C2 Event                   */                         
  .word     I2C2_ER_IRQHandler                /* I2C2 Error                   */                           
  .word     SPI1_IRQHandler                   /* SPI1                         */                   
  .word     SPI2_IRQHandler                   /* SPI2                         */                   
  .word     USART1_IRQHandler                 /* USART1                       */                   
  .word     USART2_IRQHandler                 /* USART2                       */                   
  .word     USART3_IRQHandler                 /* USART3                       */                   
  .word     EXTI15_10_IRQHandler              /* External Line[15:10]s        */                         
  .word     RTC_Alarm_IRQHandler              /* RTC Alarm (A and B) through EXTI Line */                 
  .word     OTG_FS_WKUP_IRQHandler            /* USB OTG FS Wakeup through EXTI line */                       
  .word     TIM8_BRK_TIM12_IRQHandler         /* TIM8 Break and TIM12         */         
  .word     TIM8_UP_TIM13_IRQHandler          /* TIM8 Update and TIM13        */         
  .word     TIM8_TRG_COM_TIM14_IRQHandler     /* TIM8 Trigger and Commutation and TIM14 */
  .word     TIM8_CC_IRQHandler                /* TIM8 Capture Compare         */                         
  .word     DMA1_Stream7_IRQHandler           /* DMA1 Stream7                 */                         
  .word     FSMC_IRQHandler                   /* FSMC                         */                   
  .word     SDIO_IRQHandler                   /* SDIO                         */                   
  .word     TIM5_IRQHandler                   /* TIM5                         */                   
  .word     SPI3_IRQHandler                   /* SPI3                         */                   
  .word     UART4_IRQHandler                  /* UART4                        */                   
  .word     UART5_IRQHandler                  /* UART5                        */                   
  .word     TIM6_DAC_IRQHandler               /* TIM6 and DAC1&2 underrun errors */                   
  .word     TIM7_IRQHandler                   /* TIM7                         */
  .word     DMA2_Stream0_IRQHandler           /* DMA2 Stream 0                */                   
  .word     DMA2_Stream1_IRQHandler           /* DMA2 Stream 1                */                   
  .word     DMA2_Stream2_IRQHandler           /* DMA2 Stream 2                */                   
  .word     DMA2_Stream3_IRQHandler           /* DMA2 Stream 3                */                   
  .word     DMA2_Stream4_IRQHandler           /* DMA2 Stream 4                */                   
  .word     ETH_IRQHandler                    /* Ethernet                     */                   
  .word     ETH_WKUP_IRQHandler               /* Ethernet Wakeup through EXTI line */                     
  .word     CAN2_TX_IRQHandler                /* CAN2 TX                      */                         
  .word     CAN2_RX0_IRQHandler               /* CAN2 RX0                     */                         
  .word     CAN2_RX1_IRQHandler               /* CAN2 RX1                     */                         
  .word     CAN2_SCE_IRQHandler               /* CAN2 SCE                     */                         
  .word     OTG_FS_IRQHandler                 /* USB OTG FS                   */                   
  .word     DMA2_Stream5_IRQHandler           /* DMA2 Stream 5                */                   
  .word     DMA2_Stream6_IRQHandler           /* DMA2 Stream 6                */                   
  .word     DMA2_Stream7_IRQHandler           /* DMA2 Stream 7                */                   
  .word     USART6_IRQHandler                 /* USART6                       */                   
  .word     I2C3_EV_IRQHandler                /* I2C3 event                   */                         
  .word     I2C3_ER_IRQHandler                /* I2C3 error                   */                         
  .word     OTG_HS_EP1_OUT_IRQHandler         /* USB OTG HS End Point 1 Out   */                   
  .word     OTG_HS_EP1_IN_IRQHandler          /* USB OTG HS End Point 1 In    */                   
  .word     OTG_HS_WKUP_IRQHandler            /* USB OTG HS Wakeup through EXTI */                         
  .word     OTG_HS_IRQHandler                 /* USB OTG HS                   */                   
  .word     DCMI_IRQHandler                   /* DCMI                         */                   
  .word     0                                 /* CRYP crypto                  */                   
  .word     HASH_RNG_IRQHandler               /* Hash and Rng                 */
  .word     FPU_IRQHandler                    /* FPU                          */
                         
                         
/*******************************************************************************
*
* Provide weak aliases for each Exception handler to the Default_Handler.
* As they are weak aliases, any function with the same name will override
* this definition.
*
*******************************************************************************/
   .weak      NMI_Handler
   .thumb_set NMI_Handler,Default_Handler
 
   .weak      HardFault_Handler
   .thumb_set HardFault_Handler,Default_Handler
 
   .weak      MemManage_Handler
   .thumb_set MemManage_Handler,Default_Handler
 
   .weak      BusFault_Handler
   .thumb_set BusFault_Handler,Default_Handler

   .weak      UsageFault_Handler
   .thumb_set UsageFault_Handler,Default_Handler

   .weak      SVC_Handler
   .thumb_set SVC_Handler,Default_Handler

   .weak      DebugMon_Handler
   .thumb_set DebugMon_Handler,Default_Handler

   .weak      PendSV_Handler
   .thumb_set PendSV_Handler,Default_Handler

   .weak      SysTick_Handler
   .thumb_set SysTick_Handler,Default_Handler             
 
   .weak      WWDG_IRQHandler                   
   .thumb_set WWDG_IRQHandler,Default_Handler     
                 
   .weak      PVD_IRQHandler     
   .thumb_set PVD_IRQHandler,Default_Handler
               
   .weak      TAMP_STAMP_IRQHandler           
   .thumb_set TAMP_STAMP_IRQHandler,Default_Handler
           
   .weak      RTC_WKUP_IRQHandler                 
   .thumb_set RTC_WKUP_IRQHandler,Default_Handler
           
   .weak      FLASH_IRQHandler         
   .thumb_set FLASH_IRQHandler,Default_Handler
                 
   .weak      RCC_IRQHandler     
   .thumb_set RCC_IRQHandler,Default_Handler
                 
   .weak      EXTI0_IRQHandler         
   .thumb_set EXTI0_IRQHandler,Default_Handler
                 
   .weak      EXTI1_IRQHandler         
   .thumb_set EXTI1_IRQHandler,Default_Handler
                     
   .weak      EXTI2_IRQHandler         
   .thumb_set EXTI2_IRQHandler,Default_Handler
                 
   .weak      EXTI3_IRQHandler         
   .thumb_set EXTI3_IRQHandler,Default_Handler
                       
   .weak      EXTI4_IRQHandler         
   .thumb_set EXTI4_IRQHandler,Default_Handler
                 
   .weak      DMA1_Stream0_IRQHandler               
   .thumb_set DMA1_Stream0_IRQHandler,Default_Handler
         
   .weak      DMA1_Stream1_IRQHandler               
   .thumb_set DMA1_Stream1_IRQHandler,Default_Handler
                 
   .weak      DMA1_Stream2_IRQHandler               
   .thumb_set DMA1_Stream2_IRQHandler,Default_Handler
                 
   .weak      DMA1_Stream3_IRQHandler               
   .thumb_set DMA1_Stream3_IRQHandler,Default_Handler
                 
   .weak      DMA1_Stream4_IRQHandler             
   .thumb_set DMA1_Stream4_IRQHandler,Default_Handler
                 
   .weak      DMA1_Stream5_IRQHandler               
   .thumb_set DMA1_Stream5_IRQHandler,Default_Handler
                 
   .weak      DMA1_Stream6_IRQHandler               
   .thumb_set DMA1_Stream6_IRQHandler,Default_Handler
                 
   .weak      ADC_IRQHandler     
   .thumb_set ADC_IRQHandler,Default_Handler
               
   .weak      CAN1_TX_IRQHandler   
   .thumb_set CAN1_TX_IRQHandler,Default_Handler
           
   .weak      CAN1_RX0_IRQHandler                 
   .thumb_set CAN1_RX0_IRQHandler,Default_Handler
                           
   .weak      CAN1_RX1_IRQHandler                 
   .thumb_set CAN1_RX1_IRQHandler,Default_Handler
           
   .weak      CAN1_SCE_IRQHandler                 
   .thumb_set CAN1_SCE_IRQHandler,Default_Handler
           
   .weak      EXTI9_5_IRQHandler   
   .thumb_set EXTI9_5_IRQHandler,Default_Handler
           
   .weak      TIM1_BRK_TIM9_IRQHandler           
   .thumb_set TIM1_BRK_TIM9_IRQHandler,Default_Handler
           
   .weak      TIM1_UP_TIM10_IRQHandler           
   .thumb_set TIM1_UP_TIM10_IRQHandler,Default_Handler
     
   .weak      TIM1_TRG_COM_TIM11_IRQHandler     
   .thumb_set TIM1_TRG_COM_TIM11_IRQHandler,Default_Handler
     
   .weak      TIM1_CC_IRQHandler   
   .thumb_set TIM1_CC_IRQHandler,Default_Handler
                 
   .weak      TIM2_IRQHandler           
   .thumb_set TIM2_IRQHandler,Default_Handler
                 
   .weak      TIM3_IRQHandler           
   .thumb_set TIM3_IRQHandler,Default_Handler
                 
   .weak      TIM4_IRQHandler           
   .thumb_set TIM4_IRQHandler,Default_Handler
                 
   .weak      I2C1_EV_IRQHandler   
   .thumb_set I2C1_EV_IRQHandler,Default_Handler
                     
   .weak      I2C1_ER_IRQHandler   
   .thumb_set I2C1_ER_IRQHandler,Default_Handler
                     
   .weak      I2C2_EV_IRQHandler   
   .thumb_set I2C2_EV_IRQHandler,Default_Handler
                 
   .weak      I2C2_ER_IRQHandler   
   .thumb_set I2C2_ER_IRQHandler,Default_Handler
                           
   .weak      SPI1_IRQHandler           
   .thumb_set SPI1_IRQHandler,Default_Handler
                       
   .weak      SPI2_IRQHandler           
   .thumb_set SPI2_IRQHandler,Default_Handler
                 
   .weak      USART1_IRQHandler     
   .thumb_set USART1_IRQHandler,Default_Handler
                     
   .weak      USART2_IRQHandler     
   .thumb_set USART2_IRQHandler,Default_Handler
                     
   .weak      USART3_IRQHandler     
   .thumb_set USART3_IRQHandler,Default_Handler
                 
   .weak      EXTI15_10_IRQHandler               
   .thumb_set EXTI15_10_IRQHandler,Default_Handler
               
   .weak      RTC_Alarm_IRQHandler               
   .thumb_set RTC_Alarm_IRQHandler,Default_Handler
           
   .weak      OTG_FS_WKUP_IRQHandler         
   .thumb_set OTG_FS_WKUP_IRQHandler,Default_Handler
           
   .weak      TIM8_BRK_TIM12_IRQHandler         
   .thumb_set TIM8_BRK_TIM12_IRQHandler,Default_Handler
         
   .weak      TIM8_UP_TIM13_IRQHandler           
   .thumb_set TIM8_UP_TIM13_IRQHandler,Default_Handler
         
   .weak      TIM8_TRG_COM_TIM14_IRQHandler     
   .thumb_set TIM8_TRG_COM_TIM14_IRQHandler,Default_Handler
     
   .weak      TIM8_CC_IRQHandler   
   .thumb_set TIM8_CC_IRQHandler,Default_Handler
                 
   .weak      DMA1_Stream7_IRQHandler               
   .thumb_set DMA1_Stream7_IRQHandler,Default_Handler
                     
   .weak      FSMC_IRQHandler           
   .thumb_set FSMC_IRQHandler,Default_Handler
                     
   .weak      SDIO_IRQHandler           
   .thumb_set SDIO_IRQHandler,Default_Handler
                     
   .weak      TIM5_IRQHandler           
   .thumb_set TIM5_IRQHandler,Default_Handler
                     
   .weak      SPI3_IRQHandler           
   .thumb_set SPI3_IRQHandler,Default_Handler
                     
   .weak      UART4_IRQHandler         
   .thumb_set UART4_IRQHandler,Default_Handler
                 
   .weak      UART5_IRQHandler         
   .thumb_set UART5_IRQHandler,Default_Handler
                 
   .weak      TIM6_DAC_IRQHandler                 
   .thumb_set TIM6_DAC_IRQHandler,Default_Handler
               
   .weak      TIM7_IRQHandler           
   .thumb_set TIM7_IRQHandler,Default_Handler
         
   .weak      DMA2_Stream0_IRQHandler               
   .thumb_set DMA2_Stream0_IRQHandler,Default_Handler
               
   .weak      DMA2_Stream1_IRQHandler               
   .thumb_set DMA2_Stream1_IRQHandler,Default_Handler
                 
   .weak      DMA2_Stream2_IRQHandler               
   .thumb_set DMA2_Stream2_IRQHandler,Default_Handler
           
   .weak      DMA2_Stream3_IRQHandler               
   .thumb_set DMA2_Stream3_IRQHandler,Default_Handler
           
   .weak      DMA2_Stream4_IRQHandler               
   .thumb_set DMA2_Stream4_IRQHandler,Default_Handler
           
   .weak      ETH_IRQHandler     
   .thumb_set ETH_IRQHandler,Default_Handler
                 
   .weak      ETH_WKUP_IRQHandler                 
   .thumb_set ETH_WKUP_IRQHandler,Default_Handler
           
   .weak      CAN2_TX_IRQHandler   
   .thumb_set CAN2_TX_IRQHandler,Default_Handler
                           
   .weak      CAN2_RX0_IRQHandler                 
   .thumb_set CAN2_RX0_IRQHandler,Default_Handler
                           
   .weak      CAN2_RX1_IRQHandler                 
   .thumb_set CAN2_RX1_IRQHandler,Default_Handler
                           
   .weak      CAN2_SCE_IRQHandler                 
   .thumb_set CAN2_SCE_IRQHandler,Default_Handler
                           
   .weak      OTG_FS_IRQHandler     
   .thumb_set OTG_FS_IRQHandler,Default_Handler
                     
   .weak      DMA2_Stream5_IRQHandler               
   .thumb_set DMA2_Stream5_IRQHandler,Default_Handler
                 
   .weak      DMA2_Stream6_IRQHandler               
   .thumb_set DMA2_Stream6_IRQHandler,Default_Handler
                 
   .weak      DMA2_Stream7_IRQHandler               
   .thumb_set DMA2_Stream7_IRQHandler,Default_Handler
                 
   .weak      USART6_IRQHandler     
   .thumb_set USART6_IRQHandler,Default_Handler
                       
   .weak      I2C3_EV_IRQHandler   
   .thumb_set I2C3_EV_IRQHandler,Default_Handler
                       
   .weak      I2C3_ER_IRQHandler   
   .thumb_set I2C3_ER_IRQHandler,Default_Handler
                       
   .weak      OTG_HS_EP1_OUT_IRQHandler         
   .thumb_set OTG_HS_EP1_OUT_IRQHandler,Default_Handler
               
   .weak      OTG_HS_EP1_IN_IRQHandler           
   .thumb_set OTG_HS_EP1_IN_IRQHandler,Default_Handler
               
   .weak      OTG_HS_WKUP_IRQHandler         
   .thumb_set OTG_HS_WKUP_IRQHandler,Default_Handler
           
   .weak      OTG_HS_IRQHandler     
   .thumb_set OTG_HS_IRQHandler,Default_Handler
                 
   .weak      DCMI_IRQHandler           
   .thumb_set DCMI_IRQHandler,Default_Handler
                                   
   .weak      HASH_RNG_IRQHandler                 
   .thumb_set HASH_RNG_IRQHandler,Default_Handler   

   .weak      FPU_IRQHandler                 
   .thumb_set FPU_IRQHandler,Default_Handler 




Now, AIUI, on reset, the CPU loads SP from the base word in the VT and jumps to the address in base+4 of the VT (the second word). So why do they have

  ldr   sp, =_estack

at the start of the reset handler? Surely it doesn't do anything useful.
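
For what it's worth, here is a minimal host-side sketch of the two ways Reset_Handler can be reached. It is only a model, not ST's code: the struct and both function names are made up for illustration, and the addresses in the usage note are example values. One commonly given reason for the explicit SP load is that a boot loader or debugger may branch straight to Reset_Handler, in which case nothing has fetched word 0 of the VT.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two core registers of interest. */
typedef struct {
    uint32_t sp;
    uint32_t pc;
} core_regs_t;

/* Hardware reset: word 0 of the vector table goes to SP, word 1 to PC. */
core_regs_t hardware_reset(const uint32_t *vector_table)
{
    assert(vector_table != 0);
    core_regs_t r;
    r.sp = vector_table[0];
    r.pc = vector_table[1] & ~1u;   /* bit 0 is the Thumb flag, not an address bit */
    return r;
}

/* Plain branch to Reset_Handler from other code: PC changes, SP is untouched. */
core_regs_t branch_to_reset_handler(core_regs_t before, const uint32_t *vector_table)
{
    assert(vector_table != 0);
    before.pc = vector_table[1] & ~1u;
    return before;
}
```

With a table of { 0x10010000, 0x08000401 }, hardware_reset() gives SP = 0x10010000 and PC = 0x08000400, so after a real reset the ldr sp line is redundant. branch_to_reset_handler() leaves SP at whatever the caller had, which is the case where the explicit load matters.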

Also it seems to me that while interrupts are disabled (in the boot block) the only portion of the VT which is needed is the first part i.e.

Code: [Select]
   
g_pfnVectors:
  .word  _estack
  .word  Reset_Handler
  .word  NMI_Handler
  .word  HardFault_Handler
  .word  MemManage_Handler
  .word  BusFault_Handler
  .word  UsageFault_Handler


because the other events cannot possibly occur. Is this correct?

And similarly, the VT in the "user code" doesn't really need the first two entries

Code: [Select]
 
   .word  _estack
   .word  Reset_Handler


because a) the SP is already set up and b) if the CPU is reset then VTOR is reset to 0 (which aliases the FLASH at 0x08000000 when booting from FLASH), so neither of these (in fact nothing in the 2nd VT) will get referenced. OTOH one has to have "something" in these locations because (after VTOR is set up to point at this VT) the CPU expects all the subsequent vectors to be at their usual offsets, but they could just be two words of zero. IOW the first two VT entries are never referenced in any VT that has been relocated - AIUI.

Does this make sense?
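
To make the offset argument concrete, a small host-side sketch of how the core finds a vector: the entry address is simply VTOR + 4 * exception number, so the first two slots have to occupy space even if nothing ever reads them. The function names and the 98-entry limit (16 system exceptions plus the 82 IRQ entries in the table above) are my own assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SYSTEM_EXCEPTIONS 16u   /* entries 0..15: SP, Reset, NMI, ... SysTick */
#define NUM_IRQS              82u   /* WWDG (IRQ 0) ... FPU (IRQ 81) in the table above */

/* Address of the table entry the core reads for a given exception number. */
uint32_t vector_address_for_exception(uint32_t vtor, uint32_t exception_number)
{
    assert(exception_number < NUM_SYSTEM_EXCEPTIONS + NUM_IRQS);
    return vtor + 4u * exception_number;
}

/* External interrupts start right after the 16 system entries. */
uint32_t vector_address_for_irq(uint32_t vtor, uint32_t irq_number)
{
    assert(irq_number < NUM_IRQS);
    return vector_address_for_exception(vtor, NUM_SYSTEM_EXCEPTIONS + irq_number);
}
```

With VTOR = 0x08008200, SysTick (exception 15) is read from 0x0800823C and WWDG (IRQ 0) from 0x08008240. Dropping the two leading words would shift every entry by 8 bytes and each exception would pick up the wrong handler.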

« Last Edit: August 25, 2021, 09:51:28 am by peter-h »
 

Offline peter-hTopic starter

A related question. No amount of googling on boot loaders has stopped my interrupts from crashing as soon as they get enabled:

base of flash (boot loader): 0x08000000
base of user app: 0x08008000 (base+32k)

At base+32k I have the entry point, which is just a stub that jumps to main(). This is definitely right because the code after it runs and has done for ages.

Then at 0x08008000 + 0x200 I have the user app vector table (which has to be 0x200-aligned) i.e.

SCB->VTOR = 0x08008200;

and the table is in the .map file in the right place too.

Is this correct?

Is there anything else which needs setting up to get the CPU to vector via the relocated table?
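
As a sanity check on the 0x200 figure, here is a rough sketch of the alignment rule as I understand it for the Cortex-M3/M4: round the number of table entries up to a power of two, multiply by 4 bytes, with a minimum of 128 bytes. The function names are hypothetical, and 98 is the entry count of the table above (16 system entries plus 82 IRQs).

```c
#include <assert.h>
#include <stdint.h>

/* Required alignment in bytes for a vector table with num_vectors entries. */
uint32_t vtor_alignment_bytes(uint32_t num_vectors)
{
    assert(num_vectors > 0);
    uint32_t entries = 1;
    while (entries < num_vectors) {
        entries <<= 1;              /* round up to the next power of two */
    }
    uint32_t bytes = entries * 4u;  /* each entry is one 32-bit word */
    return bytes < 128u ? 128u : bytes;   /* assumed architectural minimum */
}

/* Non-zero if table_address is a legal VTOR value for that table size. */
int vtor_is_aligned(uint32_t table_address, uint32_t num_vectors)
{
    return (table_address % vtor_alignment_bytes(num_vectors)) == 0;
}
```

For 98 entries this gives 128 words, i.e. 512 bytes = 0x200, so 0x08008200 passes the check and something like 0x08008100 would not.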

EDIT: SOLVED. But no idea why. What was needed was for Default_Handler to be at the end of the relocated table. That table sits in a separate asm file.

Code: [Select]
/*
 * vectab2.s
 *
 *  Created on: 25 Aug 2021
 *      Author: peter
 *
 * This table is at the start of customer code. Goes right after main_stub entry point.
 * Must be 0x200 aligned.
 *
 */



   .section  .isr_vector2,"a",%progbits
  .type  g_Vectors2, %object
  .size  g_Vectors2, .-g_Vectors2

/* 25/8/21 PH copied from startup_stm32f407.s */

/**
 * @brief  This is the code that gets called when the processor receives an
 *         unexpected interrupt.  This simply enters an infinite loop, preserving
 *         the system state for examination by a debugger.
 * @param  None
 * @retval None
*/

/* The first two will never get used. This vector table is activated by setting VTOR to point to it
   and a reset will reset VTOR to 0. Also the _estack entry is redundant since SP is already set to
   top of CCM.

*/

g_Vectors2:
  .word  _estack
  .word  Reset_Handler
  .word  NMI_Handler
  .word  HardFault_Handler
  .word  MemManage_Handler
  .word  BusFault_Handler
  .word  UsageFault_Handler
  .word  0
  .word  0
  .word  0
  .word  0
  .word  SVC_Handler
  .word  DebugMon_Handler
  .word  0
  .word  PendSV_Handler
  .word  SysTick_Handler

  /* External Interrupts */
  .word     WWDG_IRQHandler                   /* Window WatchDog              */
  .word     PVD_IRQHandler                    /* PVD through EXTI Line detection */
  .word     TAMP_STAMP_IRQHandler             /* Tamper and TimeStamps through the EXTI line */
  .word     RTC_WKUP_IRQHandler               /* RTC Wakeup through the EXTI line */
  .word     FLASH_IRQHandler                  /* FLASH                        */
  .word     RCC_IRQHandler                    /* RCC                          */
  .word     EXTI0_IRQHandler                  /* EXTI Line0                   */
  .word     EXTI1_IRQHandler                  /* EXTI Line1                   */
  .word     EXTI2_IRQHandler                  /* EXTI Line2                   */
  .word     EXTI3_IRQHandler                  /* EXTI Line3                   */
  .word     EXTI4_IRQHandler                  /* EXTI Line4                   */
  .word     DMA1_Stream0_IRQHandler           /* DMA1 Stream 0                */
  .word     DMA1_Stream1_IRQHandler           /* DMA1 Stream 1                */
  .word     DMA1_Stream2_IRQHandler           /* DMA1 Stream 2                */
  .word     DMA1_Stream3_IRQHandler           /* DMA1 Stream 3                */
  .word     DMA1_Stream4_IRQHandler           /* DMA1 Stream 4                */
  .word     DMA1_Stream5_IRQHandler           /* DMA1 Stream 5                */
  .word     DMA1_Stream6_IRQHandler           /* DMA1 Stream 6                */
  .word     ADC_IRQHandler                    /* ADC1, ADC2 and ADC3s         */
  .word     CAN1_TX_IRQHandler                /* CAN1 TX                      */
  .word     CAN1_RX0_IRQHandler               /* CAN1 RX0                     */
  .word     CAN1_RX1_IRQHandler               /* CAN1 RX1                     */
  .word     CAN1_SCE_IRQHandler               /* CAN1 SCE                     */
  .word     EXTI9_5_IRQHandler                /* External Line[9:5]s          */
  .word     TIM1_BRK_TIM9_IRQHandler          /* TIM1 Break and TIM9          */
  .word     TIM1_UP_TIM10_IRQHandler          /* TIM1 Update and TIM10        */
  .word     TIM1_TRG_COM_TIM11_IRQHandler     /* TIM1 Trigger and Commutation and TIM11 */
  .word     TIM1_CC_IRQHandler                /* TIM1 Capture Compare         */
  .word     TIM2_IRQHandler                   /* TIM2                         */
  .word     TIM3_IRQHandler                   /* TIM3                         */
  .word     TIM4_IRQHandler                   /* TIM4                         */
  .word     I2C1_EV_IRQHandler                /* I2C1 Event                   */
  .word     I2C1_ER_IRQHandler                /* I2C1 Error                   */
  .word     I2C2_EV_IRQHandler                /* I2C2 Event                   */
  .word     I2C2_ER_IRQHandler                /* I2C2 Error                   */
  .word     SPI1_IRQHandler                   /* SPI1                         */
  .word     SPI2_IRQHandler                   /* SPI2                         */
  .word     USART1_IRQHandler                 /* USART1                       */
  .word     USART2_IRQHandler                 /* USART2                       */
  .word     USART3_IRQHandler                 /* USART3                       */
  .word     EXTI15_10_IRQHandler              /* External Line[15:10]s        */
  .word     RTC_Alarm_IRQHandler              /* RTC Alarm (A and B) through EXTI Line */
  .word     OTG_FS_WKUP_IRQHandler            /* USB OTG FS Wakeup through EXTI line */
  .word     TIM8_BRK_TIM12_IRQHandler         /* TIM8 Break and TIM12         */
  .word     TIM8_UP_TIM13_IRQHandler          /* TIM8 Update and TIM13        */
  .word     TIM8_TRG_COM_TIM14_IRQHandler     /* TIM8 Trigger and Commutation and TIM14 */
  .word     TIM8_CC_IRQHandler                /* TIM8 Capture Compare         */
  .word     DMA1_Stream7_IRQHandler           /* DMA1 Stream7                 */
  .word     FSMC_IRQHandler                   /* FSMC                         */
  .word     SDIO_IRQHandler                   /* SDIO                         */
  .word     TIM5_IRQHandler                   /* TIM5                         */
  .word     SPI3_IRQHandler                   /* SPI3                         */
  .word     UART4_IRQHandler                  /* UART4                        */
  .word     UART5_IRQHandler                  /* UART5                        */
  .word     TIM6_DAC_IRQHandler               /* TIM6 and DAC1&2 underrun errors */
  .word     TIM7_IRQHandler                   /* TIM7                         */
  .word     DMA2_Stream0_IRQHandler           /* DMA2 Stream 0                */
  .word     DMA2_Stream1_IRQHandler           /* DMA2 Stream 1                */
  .word     DMA2_Stream2_IRQHandler           /* DMA2 Stream 2                */
  .word     DMA2_Stream3_IRQHandler           /* DMA2 Stream 3                */
  .word     DMA2_Stream4_IRQHandler           /* DMA2 Stream 4                */
  .word     ETH_IRQHandler                    /* Ethernet                     */
  .word     ETH_WKUP_IRQHandler               /* Ethernet Wakeup through EXTI line */
  .word     CAN2_TX_IRQHandler                /* CAN2 TX                      */
  .word     CAN2_RX0_IRQHandler               /* CAN2 RX0                     */
  .word     CAN2_RX1_IRQHandler               /* CAN2 RX1                     */
  .word     CAN2_SCE_IRQHandler               /* CAN2 SCE                     */
  .word     OTG_FS_IRQHandler                 /* USB OTG FS                   */
  .word     DMA2_Stream5_IRQHandler           /* DMA2 Stream 5                */
  .word     DMA2_Stream6_IRQHandler           /* DMA2 Stream 6                */
  .word     DMA2_Stream7_IRQHandler           /* DMA2 Stream 7                */
  .word     USART6_IRQHandler                 /* USART6                       */
  .word     I2C3_EV_IRQHandler                /* I2C3 event                   */
  .word     I2C3_ER_IRQHandler                /* I2C3 error                   */
  .word     OTG_HS_EP1_OUT_IRQHandler         /* USB OTG HS End Point 1 Out   */
  .word     OTG_HS_EP1_IN_IRQHandler          /* USB OTG HS End Point 1 In    */
  .word     OTG_HS_WKUP_IRQHandler            /* USB OTG HS Wakeup through EXTI */
  .word     OTG_HS_IRQHandler                 /* USB OTG HS                   */
  .word     DCMI_IRQHandler                   /* DCMI                         */
  .word     0                                 /* CRYP crypto                  */
  .word     HASH_RNG_IRQHandler               /* Hash and Rng                 */
  .word     FPU_IRQHandler                    /* FPU                          */


/*******************************************************************************
*
* Provide weak aliases for each Exception handler to the Default_Handler.
* As they are weak aliases, any function with the same name will override
* this definition.
*
*******************************************************************************/
   .weak      NMI_Handler
   .thumb_set NMI_Handler,Default_Handler

   .weak      HardFault_Handler
   .thumb_set HardFault_Handler,Default_Handler

   .weak      MemManage_Handler
   .thumb_set MemManage_Handler,Default_Handler

   .weak      BusFault_Handler
   .thumb_set BusFault_Handler,Default_Handler

   .weak      UsageFault_Handler
   .thumb_set UsageFault_Handler,Default_Handler

   .weak      SVC_Handler
   .thumb_set SVC_Handler,Default_Handler

   .weak      DebugMon_Handler
   .thumb_set DebugMon_Handler,Default_Handler

   .weak      PendSV_Handler
   .thumb_set PendSV_Handler,Default_Handler

   .weak      SysTick_Handler
   .thumb_set SysTick_Handler,Default_Handler

   .weak      WWDG_IRQHandler
   .thumb_set WWDG_IRQHandler,Default_Handler

   .weak      PVD_IRQHandler
   .thumb_set PVD_IRQHandler,Default_Handler

   .weak      TAMP_STAMP_IRQHandler
   .thumb_set TAMP_STAMP_IRQHandler,Default_Handler

   .weak      RTC_WKUP_IRQHandler
   .thumb_set RTC_WKUP_IRQHandler,Default_Handler

   .weak      FLASH_IRQHandler
   .thumb_set FLASH_IRQHandler,Default_Handler

   .weak      RCC_IRQHandler
   .thumb_set RCC_IRQHandler,Default_Handler

   .weak      EXTI0_IRQHandler
   .thumb_set EXTI0_IRQHandler,Default_Handler

   .weak      EXTI1_IRQHandler
   .thumb_set EXTI1_IRQHandler,Default_Handler

   .weak      EXTI2_IRQHandler
   .thumb_set EXTI2_IRQHandler,Default_Handler

   .weak      EXTI3_IRQHandler
   .thumb_set EXTI3_IRQHandler,Default_Handler

   .weak      EXTI4_IRQHandler
   .thumb_set EXTI4_IRQHandler,Default_Handler

   .weak      DMA1_Stream0_IRQHandler
   .thumb_set DMA1_Stream0_IRQHandler,Default_Handler

   .weak      DMA1_Stream1_IRQHandler
   .thumb_set DMA1_Stream1_IRQHandler,Default_Handler

   .weak      DMA1_Stream2_IRQHandler
   .thumb_set DMA1_Stream2_IRQHandler,Default_Handler

   .weak      DMA1_Stream3_IRQHandler
   .thumb_set DMA1_Stream3_IRQHandler,Default_Handler

   .weak      DMA1_Stream4_IRQHandler
   .thumb_set DMA1_Stream4_IRQHandler,Default_Handler

   .weak      DMA1_Stream5_IRQHandler
   .thumb_set DMA1_Stream5_IRQHandler,Default_Handler

   .weak      DMA1_Stream6_IRQHandler
   .thumb_set DMA1_Stream6_IRQHandler,Default_Handler

   .weak      ADC_IRQHandler
   .thumb_set ADC_IRQHandler,Default_Handler

   .weak      CAN1_TX_IRQHandler
   .thumb_set CAN1_TX_IRQHandler,Default_Handler

   .weak      CAN1_RX0_IRQHandler
   .thumb_set CAN1_RX0_IRQHandler,Default_Handler

   .weak      CAN1_RX1_IRQHandler
   .thumb_set CAN1_RX1_IRQHandler,Default_Handler

   .weak      CAN1_SCE_IRQHandler
   .thumb_set CAN1_SCE_IRQHandler,Default_Handler

   .weak      EXTI9_5_IRQHandler
   .thumb_set EXTI9_5_IRQHandler,Default_Handler

   .weak      TIM1_BRK_TIM9_IRQHandler
   .thumb_set TIM1_BRK_TIM9_IRQHandler,Default_Handler

   .weak      TIM1_UP_TIM10_IRQHandler
   .thumb_set TIM1_UP_TIM10_IRQHandler,Default_Handler

   .weak      TIM1_TRG_COM_TIM11_IRQHandler
   .thumb_set TIM1_TRG_COM_TIM11_IRQHandler,Default_Handler

   .weak      TIM1_CC_IRQHandler
   .thumb_set TIM1_CC_IRQHandler,Default_Handler

   .weak      TIM2_IRQHandler
   .thumb_set TIM2_IRQHandler,Default_Handler

   .weak      TIM3_IRQHandler
   .thumb_set TIM3_IRQHandler,Default_Handler

   .weak      TIM4_IRQHandler
   .thumb_set TIM4_IRQHandler,Default_Handler

   .weak      I2C1_EV_IRQHandler
   .thumb_set I2C1_EV_IRQHandler,Default_Handler

   .weak      I2C1_ER_IRQHandler
   .thumb_set I2C1_ER_IRQHandler,Default_Handler

   .weak      I2C2_EV_IRQHandler
   .thumb_set I2C2_EV_IRQHandler,Default_Handler

   .weak      I2C2_ER_IRQHandler
   .thumb_set I2C2_ER_IRQHandler,Default_Handler

   .weak      SPI1_IRQHandler
   .thumb_set SPI1_IRQHandler,Default_Handler

   .weak      SPI2_IRQHandler
   .thumb_set SPI2_IRQHandler,Default_Handler

   .weak      USART1_IRQHandler
   .thumb_set USART1_IRQHandler,Default_Handler

   .weak      USART2_IRQHandler
   .thumb_set USART2_IRQHandler,Default_Handler

   .weak      USART3_IRQHandler
   .thumb_set USART3_IRQHandler,Default_Handler

   .weak      EXTI15_10_IRQHandler
   .thumb_set EXTI15_10_IRQHandler,Default_Handler

   .weak      RTC_Alarm_IRQHandler
   .thumb_set RTC_Alarm_IRQHandler,Default_Handler

   .weak      OTG_FS_WKUP_IRQHandler
   .thumb_set OTG_FS_WKUP_IRQHandler,Default_Handler

   .weak      TIM8_BRK_TIM12_IRQHandler
   .thumb_set TIM8_BRK_TIM12_IRQHandler,Default_Handler

   .weak      TIM8_UP_TIM13_IRQHandler
   .thumb_set TIM8_UP_TIM13_IRQHandler,Default_Handler

   .weak      TIM8_TRG_COM_TIM14_IRQHandler
   .thumb_set TIM8_TRG_COM_TIM14_IRQHandler,Default_Handler

   .weak      TIM8_CC_IRQHandler
   .thumb_set TIM8_CC_IRQHandler,Default_Handler

   .weak      DMA1_Stream7_IRQHandler
   .thumb_set DMA1_Stream7_IRQHandler,Default_Handler

   .weak      FSMC_IRQHandler
   .thumb_set FSMC_IRQHandler,Default_Handler

   .weak      SDIO_IRQHandler
   .thumb_set SDIO_IRQHandler,Default_Handler

   .weak      TIM5_IRQHandler
   .thumb_set TIM5_IRQHandler,Default_Handler

   .weak      SPI3_IRQHandler
   .thumb_set SPI3_IRQHandler,Default_Handler

   .weak      UART4_IRQHandler
   .thumb_set UART4_IRQHandler,Default_Handler

   .weak      UART5_IRQHandler
   .thumb_set UART5_IRQHandler,Default_Handler

   .weak      TIM6_DAC_IRQHandler
   .thumb_set TIM6_DAC_IRQHandler,Default_Handler

   .weak      TIM7_IRQHandler
   .thumb_set TIM7_IRQHandler,Default_Handler

   .weak      DMA2_Stream0_IRQHandler
   .thumb_set DMA2_Stream0_IRQHandler,Default_Handler

   .weak      DMA2_Stream1_IRQHandler
   .thumb_set DMA2_Stream1_IRQHandler,Default_Handler

   .weak      DMA2_Stream2_IRQHandler
   .thumb_set DMA2_Stream2_IRQHandler,Default_Handler

   .weak      DMA2_Stream3_IRQHandler
   .thumb_set DMA2_Stream3_IRQHandler,Default_Handler

   .weak      DMA2_Stream4_IRQHandler
   .thumb_set DMA2_Stream4_IRQHandler,Default_Handler

   .weak      ETH_IRQHandler
   .thumb_set ETH_IRQHandler,Default_Handler

   .weak      ETH_WKUP_IRQHandler
   .thumb_set ETH_WKUP_IRQHandler,Default_Handler

   .weak      CAN2_TX_IRQHandler
   .thumb_set CAN2_TX_IRQHandler,Default_Handler

   .weak      CAN2_RX0_IRQHandler
   .thumb_set CAN2_RX0_IRQHandler,Default_Handler

   .weak      CAN2_RX1_IRQHandler
   .thumb_set CAN2_RX1_IRQHandler,Default_Handler

   .weak      CAN2_SCE_IRQHandler
   .thumb_set CAN2_SCE_IRQHandler,Default_Handler

   .weak      OTG_FS_IRQHandler
   .thumb_set OTG_FS_IRQHandler,Default_Handler

   .weak      DMA2_Stream5_IRQHandler
   .thumb_set DMA2_Stream5_IRQHandler,Default_Handler

   .weak      DMA2_Stream6_IRQHandler
   .thumb_set DMA2_Stream6_IRQHandler,Default_Handler

   .weak      DMA2_Stream7_IRQHandler
   .thumb_set DMA2_Stream7_IRQHandler,Default_Handler

   .weak      USART6_IRQHandler
   .thumb_set USART6_IRQHandler,Default_Handler

   .weak      I2C3_EV_IRQHandler
   .thumb_set I2C3_EV_IRQHandler,Default_Handler

   .weak      I2C3_ER_IRQHandler
   .thumb_set I2C3_ER_IRQHandler,Default_Handler

   .weak      OTG_HS_EP1_OUT_IRQHandler
   .thumb_set OTG_HS_EP1_OUT_IRQHandler,Default_Handler

   .weak      OTG_HS_EP1_IN_IRQHandler
   .thumb_set OTG_HS_EP1_IN_IRQHandler,Default_Handler

   .weak      OTG_HS_WKUP_IRQHandler
   .thumb_set OTG_HS_WKUP_IRQHandler,Default_Handler

   .weak      OTG_HS_IRQHandler
   .thumb_set OTG_HS_IRQHandler,Default_Handler

   .weak      DCMI_IRQHandler
   .thumb_set DCMI_IRQHandler,Default_Handler

   .weak      HASH_RNG_IRQHandler
   .thumb_set HASH_RNG_IRQHandler,Default_Handler

   .weak      FPU_IRQHandler
   .thumb_set FPU_IRQHandler,Default_Handler


       .section  .text.Default_Handler,"ax",%progbits
Default_Handler:
Infinite_Loop:
  b  Infinite_Loop
  .size  Default_Handler, .-Default_Handler


Previously, default_handler was in the boot block, which now contains a much abbreviated vector table since all interrupts are disabled in the boot block:

Code: [Select]
/**
  ******************************************************************************
  * @file      startup_stm32f407xx.s
  * @author    MCD Application Team
  * @brief     STM32F407xx Devices vector table for GCC based toolchains.
  *            This module performs:
  *                - Set the initial SP
  *                - Set the initial PC == Reset_Handler,
  *                - Set the vector table entries with the exceptions ISR address
  *                - Branches to main in the C library (which eventually
  *                  calls main()).
  *            After Reset the Cortex-M4 processor is in Thread mode,
  *            priority is Privileged, and the Stack is set to Main.
  *
  * MODDED 17/5/2021 PH for CCM zeroing.
  * 19/7/21 PH main() changed to B_main()
  * 5/8/21 PH SystemInit() call moved to b_main.c
  * 25/8/21 PH Vector table truncated and the whole one copied to vectab2.s
  *
  *
  */
   
  .syntax unified
  .cpu cortex-m4
  .fpu softvfp
  .thumb

.global  g_pfnVectors
.global  Default_Handler

/* start address for the initialization values of the .data section.
defined in linker script */
.word  _sidata
/* start address for the .data section. defined in linker script */ 
.word  _sdata
/* end address for the .data section. defined in linker script */
.word  _edata
/* start address for the .bss section. defined in linker script */
.word  _sbss
/* end address for the .bss section. defined in linker script */
.word  _ebss
/* stack used for SystemInit_ExtMemCtl; always internal RAM used */

/**
 * @brief  This is the code that gets called when the processor first
 *          starts execution following a reset event. Only the absolutely
 *          necessary set is performed, after which the application
 *          supplied main() routine is called.
 * @param  None
 * @retval : None
*/

    .section  .text.Reset_Handler
  .weak  Reset_Handler
  .type  Reset_Handler, %function
Reset_Handler: 

  ldr   sp, =_estack

/* Copy the data segment initializers from flash to SRAM */ 
  movs  r1, #0
  b  LoopCopyDataInit

CopyDataInit:
  ldr  r3, =_sidata
  ldr  r3, [r3, r1]
  str  r3, [r0, r1]
  adds  r1, r1, #4
   
LoopCopyDataInit:
  ldr  r0, =_sdata
  ldr  r3, =_edata
  adds  r2, r0, r1
  cmp  r2, r3
  bcc  CopyDataInit
  ldr  r2, =_sbss
  b  LoopFillZerobss
/* Zero fill the bss segment. */ 
FillZerobss:
  movs  r3, #0
  str  r3, [r2], #4
   
LoopFillZerobss:
  ldr  r3, = _ebss
  cmp  r2, r3
  bcc  FillZerobss

/* Initialise CCM RAM - fills the whole CCM so don't use the stack until afterwards :) */
/* PH 15/5/2021 */
ldr r2, = 0x10000000  /* was _sccmram */
b LoopFillZeroCcm

FillZeroCcm:
movs r3, 0xaaaaaaaa /* this fill intentionally differs from the a5a5a5a5 fill used by FreeRTOS for its stacks */
  str  r3, [r2]
adds r2, r2, #4

LoopFillZeroCcm:
ldr r3, = 0x10010000  /* was _eccmram */
cmp r2, r3
bcc FillZeroCcm

/* Call the clock system initialization function. Moved to b_main.c */
//  bl  B_SystemInit
/* Call static constructors */
/*  bl B_libc_init_array */

/* Call the application's entry point - in this case the main() in the boot loader */
  bl  B_main
  bx  lr   
.size  Reset_Handler, .-Reset_Handler


/******************************************************************************
*
* The minimal vector table for a Cortex M3. Note that the proper constructs
* must be placed on this to ensure that it ends up at physical address
* 0x0000.0000.
*
*******************************************************************************/
   .section  .isr_vector,"a",%progbits
  .type  g_pfnVectors, %object
  .size  g_pfnVectors, .-g_pfnVectors

/* 25/8/21 PH truncated to just those which could be activated from the boot block */
   
g_pfnVectors:
  .word  _estack
  .word  Reset_Handler
  .word  B_NMI_Handler
  .word  B_HardFault_Handler
  .word  B_MemManage_Handler
  .word  B_BusFault_Handler
  .word  B_UsageFault_Handler
 
                         
/*******************************************************************************
*
* Provide weak aliases for each Exception handler to the Default_Handler.
* As they are weak aliases, any function with the same name will override
* this definition.
*
*******************************************************************************/
/*
   .weak      NMI_Handler
   .thumb_set NMI_Handler,Default_Handler
 
   .weak      HardFault_Handler
   .thumb_set HardFault_Handler,Default_Handler
 
   .weak      MemManage_Handler
   .thumb_set MemManage_Handler,Default_Handler
 
   .weak      BusFault_Handler
   .thumb_set BusFault_Handler,Default_Handler

   .weak      UsageFault_Handler
   .thumb_set UsageFault_Handler,Default_Handler
*/


TBH I don't understand the above default_handler stuff, especially why having that endless loop there makes it work. If that trap were ever entered, the thing would obviously stay there, yet I have the whole thing running perfectly, RTOS and all.

Maybe the boot block code needs its own default_handler also? I've spent hours reading about this stuff and this handler is supposed to trap any interrupts which are not defined in the table, or perhaps invalid opcodes so basically any attempt to execute a nonexistent ISR.

Also there isn't, and never was, a section in the linkfile for default_handler, so that loop isn't getting located anywhere. Could it be that without default_handler being there, the compiler was optimising out much of the vector table, which would obviously screw up int servicing? :) I thought asm (in a .s file especially) would never be optimised out, although I have read stuff to the contrary.

« Last Edit: August 26, 2021, 02:47:33 am by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 824
  • Country: es
“What was needed was default_handler to be at the end of the relocated table”
And where was it before (when the code wasn’t working)? If you had placed it, e.g., before the VT declaration in the same section, it would shift the entire VT placement.

BTW, all those WEAKs in startup hinder some types of optimizations (-flto).
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
The default_handler was in the boot block VT.

Moving it to the "user code" VT fixed the issue. I still don't know how but probably the VT weaks had to point somewhere.

It is in its own named section, which the linker would not have processed, so would not have affected the VT location - either VT - and I checked that the VT location was correct by looking at the .map file and by doing memory examination.

Those "weak" thingies are from STM. I can sort of see how that system works but got rid of it in the boot block VT and made the handlers explicit functions (containing just while(1){} ) in the boot block, so exceptions enable the debugger to end up somewhere obvious. In the "user block", exceptions are handled by the same empty functions but residing in the ST HAL code.

So the only point of the "weaks" is a lazy way of setting up new ISRs without having to edit that vector table file, which is valid for user code but not for the boot block which has ints disabled. TBH I have not looked at int handling in the ST HAL yet; it looks like a lot of it goes to some huge function which picks its way through the int sources to find out who did it :)
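
For anyone else puzzling over those .weak / .thumb_set pairs, here is a rough C sketch of the same mechanism as I understand it. The counters and the demo_vectors array are made up so it can be run on a PC with GCC or Clang (ELF targets); ST's real Default_Handler is just the endless loop shown above, and a real override normally lives in a different source file.

```c
/* Hypothetical counters, only here so the sketch can be exercised on a PC. */
static volatile unsigned default_hits;
static volatile unsigned tim3_hits;

/* Catch-all handler. ST's version is an endless loop; this one counts
   instead so that calling it returns. */
void Default_Handler(void)
{
    default_hits++;
}

/* C equivalent of:
       .weak      TIM2_IRQHandler
       .thumb_set TIM2_IRQHandler,Default_Handler
   Nothing else defines TIM2_IRQHandler, so it resolves to Default_Handler. */
void TIM2_IRQHandler(void) __attribute__((weak, alias("Default_Handler")));

/* A normal (strong) definition. In a real project the weak alias for this
   name sits in the startup file and the linker picks this one instead. */
void TIM3_IRQHandler(void)
{
    tim3_hits++;
}

/* Stand-in for two slots of the vector table. */
typedef void (*isr_t)(void);
static const isr_t demo_vectors[] = { TIM2_IRQHandler, TIM3_IRQHandler };
```

So every table slot always has something to point at, and an unclaimed interrupt ends up in the catch-all rather than at a random address.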

I think 93.4% of stuff posted all over the internet, by desperate people trying to get STM stuff to work, is the result of shitty convoluted STM code which most "normal" embedded people don't understand. And there is no documentation! It was created to make life easy for those very people but it made it very hard to customise the code. For example google on STM 32F4 VTOR and see how many people struggle.
« Last Edit: August 26, 2021, 07:32:27 am by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 824
  • Country: es
Future technological singularity painted by SciFi: AIs inventing things beyond human understanding.
Singularity we are approaching in reality: AIs bloating code beyond human understanding :D
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Is there anything special that needs doing around the VTOR value loading?

The RM says nothing, but I've seen some stuff online where people used inline asm to flush the CPU cache, and such like.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 824
  • Country: es
Nothing special. Set VTOR, enable interrupts, harvest. Check your vector table content (and actual placement/alignment) if it doesn't work.
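
A small sketch of the placement/alignment check, in case it helps. It assumes the generic ARMv7-M rule (table size in bytes rounded up to a power of two, with a 128-byte floor), so confirm against the ST programming manual for your part. The function names and user_vector_table are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Required alignment in bytes for a vector table of n_entries words:
   n_entries * 4 rounded up to a power of two, never less than 128.
   Assumption: generic ARMv7-M rule, not taken from this thread. */
static uint32_t vtor_required_alignment(uint32_t n_entries)
{
    uint32_t align = 128u;
    while (align < n_entries * 4u) {
        align <<= 1;
    }
    return align;
}

/* True if table_addr is a legal VTOR value for a table of that size. */
static bool vtor_address_ok(uint32_t table_addr, uint32_t n_entries)
{
    return (table_addr & (vtor_required_alignment(n_entries) - 1u)) == 0u;
}

/* On the target, with the CMSIS headers, the load itself would be roughly:
       SCB->VTOR = (uint32_t)user_vector_table;   (hypothetical symbol)
       __DSB();
       __ISB();
   The barriers are a belt-and-braces habit, not something shown to be
   required here. */
```

For a full table of 98 entries (16 system exceptions plus 82 IRQs, which I believe matches the table above) this gives 512 bytes, so an address such as 0x08008000 passes and 0x08008100 does not.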
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
It does all work fine. I just had something "funny" around that VTOR loading line. It didn't come back and I suspect it was something funny with Cube, which sometimes leaves breakpoints around, active, without displaying them. The only way to find them is Window -> Show View -> Breakpoints. Didn't happen again.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline eutectique

  • Frequent Contributor
  • **
  • Posts: 390
  • Country: be
You can place the functions into RAM with the following line added to your .ld file:
Code: [Select]
  .data : AT ( _sidata )
  {
    . = ALIGN(4);
    _sdata = .;
    *(.ramfunc .ramfunc.*)  /* <----- ADD THIS */
    *(.data)
    _edata = .;
  } > RAM
  ...
and the following line in your code (or something similar to that effect depending on your compiler):
Code: [Select]
#define RAMFUNC __attribute__ ((section(".ramfunc")))

Then decorate whichever functions and data you need with RAMFUNC, and you are there. As far as I can see, this has already been suggested.
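
A minimal sketch of what a decorated function could look like. The checksum routine and its name are made up for illustration, and noinline is my addition so the compiler cannot fold the body back into a caller that lives in flash. On a non-ARM host the macro expands to nothing, so the same file still compiles and can be tested on a PC.

```c
#include <stddef.h>
#include <stdint.h>

/* On the ARM target the function is emitted into .ramfunc, which the
   linker script above places inside .data (copied to RAM at startup).
   Elsewhere the macro is empty. */
#if defined(__GNUC__) && defined(__arm__)
#define RAMFUNC __attribute__((section(".ramfunc"), noinline))
#else
#define RAMFUNC
#endif

/* Hypothetical example: XOR checksum over a block of 32-bit words.
   A routine that must keep running while the CPU flash is being erased
   should not call anything that still resides in flash (library
   functions, HAL), which is why this is a plain loop. */
RAMFUNC uint32_t ram_checksum32(const uint32_t *src, size_t n_words)
{
    uint32_t sum = 0u;
    for (size_t i = 0; i < n_words; i++) {
        sum ^= src[i];
    }
    return sum;
}
```

It is worth checking the .map file afterwards to confirm the function really did land at a RAM address.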

You would probably want to reclaim the RAM while the RAM functions are not in use. For that there is the OVERLAY command in the ld linker script. I've never needed it myself, but Google brings up quite a few examples.

Or you might want the piece of code and data in RAM to be completely independent of your main app, and yet linked into the app. Then you can add it as a subdirectory (subproject, whatever), add rules to your overall Makefile so that the app depends on this sub-thing, and let the toolchain build both and combine them in one hex or srec file.
 

