Topic: How to create an ELF file which contains the normal program, plus a relocatable block?

peter-h · « **Reply #25 on:** July 15, 2021, 08:38:17 pm »

OK thanks. Yes that worked. Loader is 0x140 in size:

KDE_loader      0x0000000020000000      0x140 load address 0x000000000803b698
                0x0000000020000000                _loader_start = .
 *KDE_loader.o(.text*)
 .text.HAL_GPIO_WritePin
                0x0000000020000000       0x32 src/KDE_loader.o
 *fill*         0x0000000020000032        0x2 
 .text.KDE_LED_On
                0x0000000020000034       0x38 src/KDE_loader.o
 .text.KDE_LED_Off
                0x000000002000006c       0x38 src/KDE_loader.o
 .text.hang_around
                0x00000000200000a4       0x26 src/KDE_loader.o
 .text.loader   0x00000000200000ca       0x1c src/KDE_loader.o
                0x00000000200000ca                loader
 *KDE_loader.o(.rodata*)
 *fill*         0x00000000200000e6        0x2 
 .rodata        0x00000000200000e8        0xc src/KDE_loader.o
                0x00000000200000e8                GPIO_PIN_X
 *KDE_loader.o(.data*)
 .data          0x00000000200000f4       0x18 src/KDE_loader.o
                0x00000000200000f4                GPIO_PORT_X
 *KDE_loader.o(.ARM.attributes)
 .ARM.attributes
                0x000000002000010c       0x34 src/KDE_loader.o
                0x0000000020000140                . = ALIGN (0x4)
                0x0000000020000140                _loader_end = .
                0x0000000000000140                _loader_size = SIZEOF (KDE_loader)

Is "extern char _loader_start;" right? I know this is a dumb question, but a char can hold only a byte, not a 32-bit address. When I step through it, hovering on that value shows decimal 16, which is obviously garbage. Trying "extern _loader_start;" doesn't help.

BUT the transfer to loader() does work! I am stepping through the code, and the PC shows 0x2... so it is running in RAM!

And THANK YOU, this runs:

Code:

void loader(void)
{


	for (;;)
	{
		KDE_LED_On(KDE_LED2);
		hang_around(200);
		KDE_LED_Off(KDE_LED2);
		hang_around(200);
	}

}

The values of these, when hovered over while single-stepping,

Code:

	extern char _loader_start;
	extern char _loader_end;
	extern char _loader_loadaddr;
	extern char _loader_bss_start;
	extern char _loader_bss_end;

are

0x20000000
0x20000140
empty
empty
empty

However, when I single-step through the asm code I see the correct values in all the registers. BSS size = 0, FWIW, even if I add
uint8_t fred[256];
or
volatile uint8_t fred[256];
outside any loader function, which should generate a bss of 256 bytes. The zero value is confirmed by

loader_bss 0x0000000020000140 0x0
0x0000000020000140 _loader_bss_start = .
*KDE_loader.o(.bss*)
0x0000000020000140 . = ALIGN (0x4)
0x0000000020000140 _loader_bss_end = .

Anyway, it is running.

I don't need the bss, because I can keep all arrays etc within functions, but it would be nice to fix it. And the non-functioning could be hiding another issue.

The veneer removal syntax worked, too.

This is brilliant, not least because it avoids maintaining a separate project just to generate the loader.

When I am done, I will post the details, because it is bound to help others. The net is full of people who struggled with RAM code, usually for the purpose of rewriting the CPU FLASH.

As an aside, I timed that delay function, and RAM code runs 12.5% SLOWER than FLASH code, so the "cache accelerator" works perfectly well for code which fits into the cache. This is contrary to what has been posted elsewhere - at least for fairly compact code. I did not expect it to run slower though; it is supposed to run with zero wait states!

Code:

// Hang around for delay in ms. Approximate but doesn't need interrupts etc working.
// Tweaked for RAM resident code which runs a little slower (!) than FLASH resident code.

static void hang_around(uint32_t delay)
{

	uint32_t fred = 15100*delay;

	while (fred>0)
	{
		fred--;
	}

}
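A caveat on hang_around(), hedged because it depends on the compiler flags in use: fred is not volatile and the loop has no side effects, so GCC is allowed to delete the loop entirely at -O1 and above, and the delay would silently vanish in an optimised build. A sketch of the same busy-wait with a volatile counter; the extra load and store per iteration changes the timing, so the 15100 loops-per-ms figure measured above would need re-calibrating. DELAY_LOOPS_PER_MS and delay_loops() are illustrative names, not from the original code.

```c
#include <stdint.h>

/* Loops per millisecond: the value tuned above for the RAM-resident build.
 * It depends on clock speed, optimisation level and where the code runs. */
#define DELAY_LOOPS_PER_MS 15100u

/* Number of loop iterations for a delay of delay_ms milliseconds. */
uint32_t delay_loops(uint32_t delay_ms)
{
    return DELAY_LOOPS_PER_MS * delay_ms;
}

/* Busy-wait. The volatile counter forces a real load and store on every
 * iteration, so the optimiser cannot remove the loop. */
void hang_around(uint32_t delay_ms)
{
    volatile uint32_t count = delay_loops(delay_ms);

    while (count > 0)
    {
        count--;
    }
}
```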

gf · « **Reply #26 on:** July 15, 2021, 09:12:37 pm »

Quote from: peter-h on July 15, 2021, 08:38:17 pm

Is "extern char _loader_start;" right?

It is just a dummy variable, so that the linker-defined symbol _loader_start (which is an address) can be referenced from C as if it were the address of an extern variable.
The type matters only for the size calculation &_loader_end - &_loader_start. If the pointer difference should be in bytes, then type char is a good idea.
You could also declare it as extern char _loader_start[]; then you need to omit the & in the memcpy() and memset() arguments.
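To make gf's point concrete: a linker symbol has no storage of its own, only an address, so hovering over _loader_start in the debugger most likely shows the byte stored at that address, not the address itself. A minimal sketch of the copy-and-clear step, assuming the symbol names from the linker script in this thread; loader_relocate() and run_loader_from_ram() are hypothetical names, and the __arm__ guard is only there so the generic part can also be built and checked on a PC. The DSB/ISB pair before the call is cheap insurance rather than something the thread has established as necessary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the initialised image (code, rodata, data) from its load address
 * (LMA, in FLASH) to its run address (VMA, in RAM), then zero the loader's
 * .bss. With char pointers, the differences are byte counts. */
void loader_relocate(char *vma_start, char *vma_end, const char *lma,
                     char *bss_start, char *bss_end)
{
    memcpy(vma_start, lma, (size_t)(vma_end - vma_start));
    memset(bss_start, 0, (size_t)(bss_end - bss_start));
}

#if defined(__arm__)
/* Linker-script symbols, declared as arrays: the name is the address,
 * so no & is needed and nothing tries to read them as variables. */
extern char _loader_start[], _loader_end[], _loader_loadaddr[];
extern char _loader_bss_start[], _loader_bss_end[];
extern void loader(void);

void run_loader_from_ram(void)
{
    loader_relocate(_loader_start, _loader_end, _loader_loadaddr,
                    _loader_bss_start, _loader_bss_end);
    /* Complete the writes, then flush the pipeline before executing
     * the freshly copied instructions. */
    __asm volatile ("dsb 0xF" ::: "memory");
    __asm volatile ("isb 0xF" ::: "memory");
    loader();
}
#endif
```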

Edit:

Which processor model is it?
I'm also wondering whether an Instruction Synchronization Barrier (ISB) is actually required before calling loader().

Edit:

Quote

However, when I single step through the asm code I see the correct values in all the registers. BSS size=0, FWIW, even if I add
uint8_t fred[256];
or
volatile uint8_t fred[256];
outside any loader function, which should generate a bss of 256 bytes. The zero value is confirmed by

Can you do objdump -h KDE_loader.o?
Is a .bss with 256 bytes present in the .o file?
Or did the compiler possibly generate a common block instead?

peter-h · « **Reply #27 on:** July 15, 2021, 10:01:11 pm »

Thank you.

It's a 32F417.

The bss stuff I will try tomorrow or the day after - running around a bit

What an excellent forum this is.

gf · « **Reply #28 on:** July 16, 2021, 06:06:22 am »

It is also possible that fred[] gets garbage-collected if it is not used (i.e. not referenced from anywhere).

peter-h · « **Reply #29 on:** July 16, 2021, 06:19:18 am »

That's why I tried "volatile".

However, other unused things don't get removed, e.g. I declare a 48k uint8_t array in main.c which goes in CCM and is used for the FreeRTOS workspace, and another 16k dummy one, not referenced, to fill that out to 64k, so that CCM usage shows as "64k" in the CubeIDE project usage numbers, to show that CCM is all full.

The last 16k is used for the general stack for ISRs etc. (SP set to top of CCM), but the IDE has no way of knowing about that, and would show the last 16k as available, which might confuse the hell out of me if I have to revisit it in 5 years' time.

I will do more tests tomorrow.

gf · « **Reply #30 on:** July 16, 2021, 07:28:32 am »

Quote from: peter-h on July 16, 2021, 06:19:18 am

That's why I tried "volatile".

However, other unused things don't get removed e.g. I declare a 48k uint8_t array in main.c which goes in CCM and is used for the FreeRTOS workspace, and another 16k dummy one, not referenced, to fill that out to 64k, so that CCM usage shows as "64k" in the CubeIDE project usage numbers, to show that CCM is all full.

The last 16k is used for the general stack for ISRs etc (SP set to top of CCM) but the IDE has no way of knowing about that, and would show the last 16k as available, which might confuse the hell out of me if I have to revisit it in 5 years' time

I will do more tests tomorrow.

As said, do an objdump -x on the .o file to see what's actually present in the object file.
If the compiler generates a common block for the global data, you need to add:

loader_bss _loader_end : { _loader_bss_start = .; *KDE_loader.o(.bss*) *KDE_loader.o(COMMON) . = ALIGN(4); _loader_bss_end = .; }

[ Actually, you can add this in general. ]
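On the C side, a hedged alternative to adding COMMON: a file-scope uint8_t fred[256]; with no initialiser is a tentative definition, which GCC before version 10 emits as a COMMON symbol by default (GCC 10 changed the default to -fno-common). Building with -fno-common, or giving the array internal linkage with static, should put it in the object's .bss (or .bss.fred with -fdata-sections), which the existing *KDE_loader.o(.bss*) pattern already catches. In this sketch fred_fill() is a hypothetical user of the array; something has to reference it, or --gc-sections may drop it as unused.

```c
#include <stddef.h>
#include <stdint.h>

/* Internal linkage: not emitted as a COMMON symbol, so it should land in
 * this object's .bss (or .bss.fred with -fdata-sections). */
static uint8_t fred[256];

/* Hypothetical user of fred: referencing it keeps the linker's garbage
 * collection from discarding the array. Returns the bytes written. */
size_t fred_fill(uint8_t value)
{
    for (size_t i = 0; i < sizeof fred; i++)
    {
        fred[i] = value;
    }
    return sizeof fred;
}
```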

abyrvalg · « **Reply #31 on:** July 16, 2021, 08:18:35 am »

Why are you doing this? Your RAM code has nothing in common with the main code - separate memory map, init, library function set, lifecycle. It can’t use code/data of the main part and vice versa. What you are trying to do now is implement two isolated projects built as one. What’s the point? Move the RAM code to a separate project, configure it to output a .bin, then use an INCBIN directive in the main part (so there will be no custom build steps: just build the RAM code, then build the main part, and the RAM code will be included automatically), memcpy() it to a fixed address, call it and hang.
If you need to pass any data between the parts, declare a struct placed in a separate noinit section at the same fixed address in both projects (or use some backup registers, like BKP->DRx, if the data is small; on the F4 these are RTC->BKPxR).
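A minimal sketch of the shared-struct idea, assuming both images map an output section called .noinit to the same fixed RAM address and mark it NOLOAD so that neither startup code copies or zeroes it (that linker-script side is not shown). The section name, LOADER_HANDOFF_MAGIC, the field layout and the handoff_post()/handoff_take() helpers are all hypothetical. The magic word plus its complement is there because such an area holds random contents after a cold power-up, and the receiver needs a way to tell a deliberate message from leftover noise.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOADER_HANDOFF_MAGIC 0x4C44524Bu /* arbitrary marker value */

typedef struct
{
    uint32_t magic;     /* LOADER_HANDOFF_MAGIC when valid */
    uint32_t command;   /* what the loader should do */
    uint32_t magic_inv; /* ~magic, guards against random RAM contents */
} loader_handoff_t;

#if defined(__arm__)
/* Same address in both images; startup code must not zero or copy it. */
__attribute__((section(".noinit")))
#endif
loader_handoff_t loader_handoff;

void handoff_post(loader_handoff_t *h, uint32_t command)
{
    h->magic = LOADER_HANDOFF_MAGIC;
    h->command = command;
    h->magic_inv = (uint32_t)~LOADER_HANDOFF_MAGIC;
}

/* Returns true and stores the command if the block is valid, then
 * invalidates it so the same request is not acted on twice. */
bool handoff_take(loader_handoff_t *h, uint32_t *command)
{
    bool valid = (h->magic == LOADER_HANDOFF_MAGIC) &&
                 (h->magic_inv == (uint32_t)~LOADER_HANDOFF_MAGIC);
    if (valid)
    {
        *command = h->command;
    }
    h->magic = 0;
    return valid;
}
```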

gf · « **Reply #32 on:** July 16, 2021, 09:16:24 am »

It was the OP's requirement:

Quote from: peter-h on July 12, 2021, 08:32:29 pm

However, I don't just want to load a chunk of binary data into RAM. That is easy. One could even have a .c file with an initialised uchar array in it; that will automatically end up in initialised data. The challenge is how to implement in effect

ORG 0x20000000
C code goes here

and have this within the same project.

abyrvalg · « **Reply #33 on:** July 16, 2021, 10:11:50 am »

Sure, I’ve read the "ends up with two projects to maintain, and switch between them" remark. But this alternative two-in-one solution would be error-prone and even harder to maintain in the end, IMO. And there are "bigger picture" questions, e.g. what happens if the power is cut in the middle of the SFlash->Flash copy? Answering that could move even further away from a two-in-one solution, like moving the flash copy code to some kind of bootloader that survives update failures and repeats the copy process on the next power-on.

gf · « **Reply #34 on:** July 16, 2021, 10:55:41 am »

Sure, the usual solution is rather to have a separate bootloader in a dedicated region of flash, which is not overwritten when a new image is flashed, so that it keeps functional when the flash update is aborted somewhere in the middle.

peter-h · « **Reply #35 on:** July 16, 2021, 07:56:31 pm »

"Why are you doing this? "

Firstly, in my general business I often have to revisit old projects. Right now I am doing an update of a job last done c. 1997 (analog only, but quite tricky). This is why I am using Protel PCB 2.8 (1995).

So I am very careful to do stuff in a way which makes it as easy as possible to do this. This is something 99% of designers don't need to worry about because they will move on every few years, or more often. But this is my business and I have to look after it, because in the end there is only me. And you know how hard it is to get into old software. Most people who are paid to do that really hate it. And there are multiple reasons for doing just one project e.g. archiving, documentation, etc. Project archiving is a particular problem which I have struggled with many times...

"the usual solution is rather to have a separate bootloader in a dedicated region of flash, which is not overwritten when a new image is flashed, so that it keeps functional when the flash update is aborted somewhere in the middle."

This is how it will be done. The loader will be written into the top 16k of the 1MB FLASH (it will be originally written there during factory programming, when the whole 1MB will be written, using SWD) and it (well, the RAM resident copy of it) will never write into this top 16k. But there is a little problem: on the 32F417 the top 128k has to be erased as a single block, so there will be a window of opportunity for bricking the product. Very unlikely, because the flashing will start at the bottom and by the time you get to the top 128k you have had error-free writes to all the lower blocks. Also the erasure of the 128k block will necessitate the top 16k of it to be temporarily saved in RAM and then immediately written back. The way to avoid this "brick window" is to have the loader somewhere other than the top 128k, but that causes other issues.

"Your RAM code has nothing common with the main code"

That isn't actually the case. There is a huge amount of common data e.g. the huge .h files full of port addresses, pin names, etc. These come from the ST libraries. This stuff can be #included in both the main code (many .c files) and in the loader.c file.

"If you need to pass any data between the parts declare a struct placed in a separate noinit section "

There will be some "data passing" involved, because e.g. the loader will be executed at every power-up, but it may need to perform different actions. The plan is to get main() to shove some data into a serial (SPI) flash which this product also has, and the loader can pick it up. Or one could store data in the 32F4's RTC data storage area; that is less good because it will be lost if the RTC backup battery (a supercap) is not charged, or not even fitted, and there is a power-down involved. The data passing has to survive a power-down, for reasons not easy to explain, but the data passed to the loader fits in a single byte.
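Since erased NOR flash reads back as 0xFF, a lone command byte in the SPI flash cannot distinguish "no command" from a real 0xFF. A hedged sketch of one cheap fix: store the byte together with its bitwise complement, so an erased location fails the check. The SPI flash read/write routines are not shown, and cmd_encode()/cmd_decode() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack a command into two bytes: the value and its complement. */
void cmd_encode(uint8_t cmd, uint8_t out[2])
{
    out[0] = cmd;
    out[1] = (uint8_t)~cmd;
}

/* Decode two bytes read back from the SPI flash. An erased pair
 * (0xFF, 0xFF) or an all-zero pair fails the complement check. */
bool cmd_decode(const uint8_t in[2], uint8_t *cmd)
{
    if ((uint8_t)(in[0] ^ in[1]) != 0xFFu)
    {
        return false;
    }
    *cmd = in[0];
    return true;
}
```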

I actually have to do something else. I need a copy of that loader to be compiled to execute at the top 16k of the FLASH, and arranged so that the SWD writes it there. I realise, from reading various things around, that this may generate a huge .elf where most of the 1MB is 0x00 or some such, but that's OK because it still takes only seconds to write it. And that loader will be what gets copied to RAM. Alternatively I could have the loader anywhere in FLASH and a bit of code which writes it to FLASH if it is not already there, but writing it there using SWD is the cleanest way. The entry point of the FLASH-based loader will obviously need to be at the start of the 16k block, so the loader.c file will need to start with a function which just contains a jump/call to the real loader which does the work.
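On the jump/call point: Cortex-M cores only execute Thumb code, so an indirect call must go to an address with bit 0 set; calling a bare 0x080FC000 through a function pointer would fault (an INVSTATE UsageFault, normally escalated to HardFault). When the address comes from a C function symbol the toolchain sets that bit itself, but a hard-coded address needs it added by hand. A sketch, assuming the entry function is a plain void f(void) placed at the very start of the block (getting it there would be the linker script's job); LOADER_FLASH_ENTRY, thumb_entry() and call_loader_flash() are hypothetical names.

```c
#include <stdint.h>

#define LOADER_FLASH_ENTRY 0x080FC000u /* start of the top 16k of FLASH */

typedef void (*loader_fn)(void);

/* Cortex-M only executes Thumb code: an indirect branch to an address
 * with bit 0 clear faults, so set it explicitly. */
uint32_t thumb_entry(uint32_t addr)
{
    return addr | 1u;
}

#if defined(__arm__)
void call_loader_flash(void)
{
    loader_fn entry = (loader_fn)(uintptr_t)thumb_entry(LOADER_FLASH_ENTRY);
    entry();
}
#endif
```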

So I need two copies of the loader, one compiled/linked to execute at "1MB minus 16k" and the other at 0x20000000. Both will be actually run from those addresses. Relocatable code would be a neat solution...

harerod · « **Reply #36 on:** July 17, 2021, 12:29:29 pm »

Quote from: peter-h on July 16, 2021, 07:56:31 pm

"Why are you doing this? "
...
"the usual solution is rather to have a separate bootloader in a dedicated region of flash, which is not overwritten when a new image is flashed, so that it keeps functional when the flash update is aborted somewhere in the middle."

This is how it will be done. The loader will be written into the top 16k of the 1MB FLASH (it will be originally written there during factory programming, when the whole 1MB will be written, using SWD) and it (well, the RAM resident copy of it) will never write into this top 16k. But there is a little problem: on the 32F417 the top 128k has to be erased as a single block, so there will be a window of opportunity for bricking the product. Very unlikely, because the flashing will start at the bottom and by the time you get to the top 128k you have had error-free writes to all the lower blocks. Also the erasure of the 128k block will necessitate the top 16k of it to be temporarily saved in RAM and then immediately written back. The way to avoid this "brick window" is to have the loader somewhere other than the top 128k, but that causes other issues.
...

I love this thread, because it shows many interesting facets of STM32 operation. Running dynamically relocatable code on an MCU optimized for flash operation - I love that.
What I don't understand is why you need to put the loader into the top sector. The boring and time-proven concept is using some of the small sectors at the beginning of the flash memory to either trampoline to the app or execute loader functionality. Again, I am certain you have your reasons for doing this, and I enjoy reading the information users have put into this thread.

abyrvalg · « **Reply #37 on:** July 17, 2021, 12:42:33 pm »

Quote from: peter-h on July 16, 2021, 07:56:31 pm

"Your RAM code has nothing common with the main code"

That isn't actually the case. There is a huge amount of common data e.g. the huge .h files full of port addresses, pin names, etc. These come from the ST libraries. This stuff can be #included in both the main code (many .c files) and in the loader.c file.

I mean the resulting binaries. There is nothing wrong with including the same headers/linking the same libraries while building a separate binary. And you don't need to put it in some unrelated place: put it into a subdir of the main project, archive everything together, include/link files from the main project freely, but output a separate .elf/.bin - this will ensure that there is no code/data shared between the two parts, no matter what you change in the sources. You are OK with .ld script tweaking, so why not do the same with the Makefile? Link each of the two bins separately (from its own file set and with its own .ld). Of course you can do it your way, but you are working against the nature of the linker now.

peter-h · « **Reply #38 on:** July 17, 2021, 05:38:36 pm »

I don't appear to have objdump.exe (as a part of ST CubeIDE, or anywhere else). Anyway, the .map file shows fred123[256] in the Common section.

Regarding doing two copies of the loader, compiled/loaded for two addresses:

Top 16k block of CPU FLASH: 0x08100000 minus 16k = 0x080FC000
Base of RAM: 0x20000000

It seems to me that I can have two .c files, called say loader_flash.c and loader_ram.c, and #include the same loader file in both, say loader_common.c. Then have two sections in the link script, loading loader_flash.o at 0x080FC000 and loader_ram.o at 0x20000000. Can anyone think of a problem with this? I think it should be loader_common.txt, otherwise the makefile creation script will try to compile it, but I just want to #include it as a block of text.

SiliconWizard · « **Reply #39 on:** July 17, 2021, 06:00:43 pm »

Quote from: peter-h on July 17, 2021, 05:38:36 pm

I don't appear to have objdump.exe (as a part of ST Cub IDE, or anywhere else). Anyway, the .map file shows fred123[256] in the Common section

I'm pretty sure it uses GCC as a compiler, so I'd be pretty surprised if it didn't come with objdump.
Look for 'arm-none-eabi-objdump.exe', as it's normally the exact file name for this utility for ARM Cortex-M targets.

gf · « **Reply #40 on:** July 17, 2021, 06:27:27 pm »

I also think so. Like the names of the other cross tools it is likely prefixed with arm-none-eabi-, i.e. arm-none-eabi-objdump, arm-none-eabi-gcc, etc.

A potential issue with multiple inclusion is duplicate symbols (at least global ones; local symbols are hopefully resolved by the linker inside the same .o file only).
The two modules should neither reference extern symbols (in order to be self-contained), nor export global symbols, i.e. declare all functions and all variables outside functions as static.
Hopefully the included HAL etc. header files don't define/reference any non-static global/extern data and functions either -- do you know and/or can you ensure this?
If the one or other symbol must still be exported -- like loader() -- then these symbols must get different names in the two modules, e.g. loader1() and loader2().
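One way to get gf's "different names" without keeping two copies of the source is to make the shared body a macro and expand it once per copy with a different prefix. The #include approach peter-h describes would work the same way with a #define'd entry name; the macro form is just easier to show in one self-contained piece. In this sketch DEFINE_LOADER, the _step helper and the uint32_t parameter (there only so the result can be checked) are hypothetical; in the real build each expansion would sit in its own .c file, so that each gets its own .o for the linker script to place, and the static helpers would then not need prefixes at all.

```c
#include <stdint.h>

/* Shared loader body, expanded once per copy. Every symbol gets the
 * prefix, so the two expansions never produce clashing names. */
#define DEFINE_LOADER(prefix)                                  \
    static uint32_t prefix##_step(uint32_t state)              \
    {                                                          \
        return state + 1u; /* stand-in for the real work */    \
    }                                                          \
                                                               \
    uint32_t prefix##_entry(uint32_t state)                    \
    {                                                          \
        return prefix##_step(state);                           \
    }

/* In the real build these two lines would live in loader_flash.c and
 * loader_ram.c respectively. */
DEFINE_LOADER(loader_flash)
DEFINE_LOADER(loader_ram)
```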

Is the 2nd loader instance also copied to 0x20000000 prior to invocation?

peter-h · « **Reply #41 on:** July 17, 2021, 07:07:29 pm »

Hmmm very good points.

Both loaders (unless relocatable code turns out to be possible, which nobody has yet reported) will have to start life in the CPU FLASH (obviously). And since neither may be overwritten by any code intentionally flashing the CPU, both will have to live in that 16k block. So the budget is 8k each.

The one loaded to run at 0x080FC000 (loader_flash.c) will only ever live at 0x080FC000 (the base of the uppermost 16k block). Its entry point will be 0x080FC000.

The one loaded to run at 0x20000000 (loader_ram.c) will live in the top half of the 16k block i.e. at 0x080FE000. Its entry point will be 0x080FE000. It will be copied to RAM by loader_flash.c.

The two loaders won't be exactly the same because only loader_flash.c will do the copying to RAM.
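The address arithmetic for this layout, written as C11 compile-time checks so that a typo in one constant breaks the build rather than the product. The macro names are hypothetical; the values are the ones given above, plus the fact (mentioned earlier in the thread) that the top 16k sits inside the 128k sector that must be erased as a unit. If the main image must never spill into the loader area, its FLASH region in the linker script would presumably also need to shrink to 1008K.

```c
#include <stdint.h>

/* STM32F417, 1 MB FLASH. The loaders occupy the top 16k, split in two. */
#define FLASH_BASE_ADDR    0x08000000u
#define FLASH_SIZE_BYTES   (1024u * 1024u)
#define FLASH_END_ADDR     (FLASH_BASE_ADDR + FLASH_SIZE_BYTES) /* 0x08100000 */
#define TOP_SECTOR_BASE    0x080E0000u /* last 128k sector, erased as a unit */
#define LOADER_BLOCK_SIZE  (16u * 1024u)
#define LOADER_HALF_SIZE   (LOADER_BLOCK_SIZE / 2u)

#define LOADER_FLASH_BASE  (FLASH_END_ADDR - LOADER_BLOCK_SIZE)   /* runs here */
#define LOADER_RAM_IMAGE   (LOADER_FLASH_BASE + LOADER_HALF_SIZE) /* stored here */
#define LOADER_RAM_VMA     0x20000000u                            /* runs here */
#define MAIN_FLASH_LENGTH  (LOADER_FLASH_BASE - FLASH_BASE_ADDR)

_Static_assert(LOADER_FLASH_BASE == 0x080FC000u, "loader_flash address");
_Static_assert(LOADER_RAM_IMAGE == 0x080FE000u, "loader_ram image address");
_Static_assert(LOADER_RAM_IMAGE + LOADER_HALF_SIZE == FLASH_END_ADDR,
               "both halves fit exactly in the top 16k");
_Static_assert(LOADER_FLASH_BASE >= TOP_SECTOR_BASE,
               "loaders share the top 128k sector");
_Static_assert(MAIN_FLASH_LENGTH == 1008u * 1024u,
               "main image limited to 1008K");
```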

The whole 16k block will be written during factory config, using SWD.

I will try this next and see whether I get a huge .elf file, or some other problem.

This scheme should address the requirements:

- have a non-brickable* product, which can always restore working CPU software from one stored in serial SPI FLASH
- be able to re-flash the CPU using RAM-executed code
- generate both versions of the loader within the same one project

SiliconWizard · « **Reply #42 on:** July 17, 2021, 07:17:43 pm »

Whatever you do, having a fixed, non-erasable minimal loader somewhere sounds like a good idea. Now a problem with this is always: what do you do if some bug is found in this loader once in the field? (You'd better test it thoroughly before releasing your product.)

peter-h · « **Reply #43 on:** July 17, 2021, 07:23:13 pm »

I don't think there is any way to fix such a loader (in the field) other than by having a second CPU.

I need a bit more help with the linker script... I am testing just loader_flash (currently called KDE_loader.c), which is compiled for 0x080FC000 and is supposed to end up in the FLASH at 0x080FC000. But that last part (ending up in the FLASH at 0x080FC000) is not happening.

My linker script:

Code:

/*


/* Entry Point */
ENTRY(Reset_Handler)

/* Reference loader to ensure that it gets linked-in */
EXTERN(loader)													   

/* Highest address of the main stack */
 /*  _estack = 0x20020000;  */    /* stack in 128K RAM */
 _estack = 0x10010000;    	/* stack in  64k CCM - note: configTOTAL_HEAP_SIZE + min_stack_size must not exceed 64k) */
 
/* top of RAM for _sbrk - top of heap check */
 _top = 0x20020000;

/* Heap and stack sizes */
_Min_Heap_Size  = 0xa000;     /* 40k heap - min size; it can grow to end of main RAM  */
_Min_Stack_Size = 0x4000;     /* 16k stack - in CCM */

/* Specify the memory areas */
/* CCMRAM added PH 12/5/2021 - cannot use with DMA */
MEMORY
{
  FLASH (rx)      : ORIGIN = 0x08000000, LENGTH = 1024K
  RAM (xrw)       : ORIGIN = 0x20000000, LENGTH = 128K
  MEMORY_B1 (rx)  : ORIGIN = 0x60000000, LENGTH = 0K
  CCMRAM (rw)     : ORIGIN = 0x10000000, LENGTH = 64K
}

/* Define output sections */
SECTIONS
{
/* loader_bss and loader sections must come first, in order to override the wildcards in subsequent sections */
 loader_bss _loader_end : {
  _loader_bss_start = .;
  *KDE_loader.o(.bss*)
  *KDE_loader.o(COMMON)
  . = ALIGN(4);
  _loader_bss_end = .;
 }
 
 KDE_loader 0x080FC000  : AT(_loader_loadaddr) {
  _loader_start = .;
  *KDE_loader.o(.text*)
  *KDE_loader.o(.rodata*)
  *KDE_loader.o(.data*)
  *KDE_loader.o(.ARM.attributes)
  . = ALIGN(4);
  _loader_end = .;
 }
 
  _loader_size = SIZEOF(KDE_loader);

  /* The startup code goes first into FLASH */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(4);
  } >FLASH

  /* The program code and other data goes into FLASH */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
	*(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbol at end of code */
  } >FLASH


   .ARM.extab   : { *(.ARM.extab* .gnu.linkonce.armextab.*) } >FLASH
    .ARM : {
    __exidx_start = .;
      *(.ARM.exidx*)
      __exidx_end = .;
    } >FLASH

  .preinit_array     :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array*))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  } >FLASH
  .init_array :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT(.init_array.*)))
    KEEP (*(.init_array*))
    PROVIDE_HIDDEN (__init_array_end = .);
  } >FLASH
  .fini_array :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(.fini_array*))
    KEEP (*(SORT(.fini_array.*)))
    PROVIDE_HIDDEN (__fini_array_end = .);
  } >FLASH

 
  /* Initialized data sections goes into RAM, load LMA copy after code */
  .data :
  {
 /*   . = ALIGN(4); */
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */

    . = ALIGN(4);
    _edata = .;        /* define a global symbol at data end */
  } >RAM AT >FLASH

 /* used by the startup to initialize data */
  _sidata = LOADADDR(.data);

 /* dummy placeholder in flash for loader section, to count flash usage */
 .KDE_loader : {
  . = . + SIZEOF(KDE_loader);
 } AT >FLASH
 _loader_loadaddr = LOADADDR(.KDE_loader);

  /* Uninitialized data section */
  . = ALIGN(4);
  .bss :
  {
    /* This is used by the startup in order to initialize the .bss secion */
    _sbss = .;         /* define a global symbol at bss start */
    __bss_start__ = _sbss;
    *(.bss)
    *(.bss*)
    *(COMMON)

    . = ALIGN(4);
    _ebss = .;         /* define a global symbol at bss end */
    __bss_end__ = _ebss;
  } >RAM

  /* The heap ends up after BSS in main RAM */
  /* This also checks that the top of the heap doesn't hit the bottom of the stack i.e. how much RAM left */
  /* User_heap_stack section, used to check that there is enough RAM left */
  
  /* ._user_heap_stack : */
  .main_heap :
  {
    . = ALIGN(8);
    PROVIDE ( end = . ); 
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
 /*   . = . + _Min_Stack_Size; */  /* PH 14/5/2021 stack is in CCM, not here */
    . = ALIGN(8);
   } >RAM 
  /*   } >CCMRAM */


  /* MEMORY_bank1 section, code must be located here explicitly            */
  /* Example: extern int foo(void) __attribute__ ((section (".mb1text"))); */
  /* Not used 14/5/2021 - was apparently used for LCD display on ST dev kit */
  .memory_b1_text :
  {
    *(.mb1text)        /* .mb1text sections (code) */
    *(.mb1text*)       /* .mb1text* sections (code)  */
    *(.mb1rodata)      /* read-only data (constants) */
    *(.mb1rodata*)
  } >MEMORY_B1
  
  /* CCM-RAM section 
  * 
  * IMPORTANT NOTE! 
  * If variables placed in this section must be zero initialized,
  * the startup code needs to be modified to initialize this section.  
  * Done PH 12/5/2021
  */
  .ccmram :
  {
    . = ALIGN(4);
    _sccmram = .;       /* create a global symbol at ccmram start */
    *(.ccmram)
    *(.ccmram*)
    
    . = ALIGN(4);
    _eccmram = .;       /* create a global symbol at ccmram end */
  } >CCMRAM

  /* Remove information from the standard libraries */
  /DISCARD/ :
  {
    libc.a ( * )
    libm.a ( * )
    libgcc.a ( * )
  }

  .ARM.attributes 0 : { *(.ARM.attributes) }
}

I reckon I need to somehow add its sections into this.

Note that I am not writing code which will program the loader into the FLASH at 0x080FC000. I could do that, but I want this to be done during factory config, using SWD, when the whole 1MB gets written.

I've tried this:

Code:

  /* The program code and other data goes into FLASH */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    
    *KDE_loader.o(.text*)
    *KDE_loader.o(.rodata*)
    *KDE_loader.o(.data*)
    *KDE_loader.o(.ARM.attributes)
  
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
	*(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbol at end of code */
  } >FLASH

And sure enough, the call to loader() in main() does go to the right address, but there is nothing there, so the debugger didn't program the FLASH.
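While the placement is being sorted out, a cheap guard can turn "jumped into nothing" into a controlled failure: erased STM32F4 FLASH reads as 0xFF, so if the first words at the loader address are all 0xFFFFFFFF, nothing was programmed there. A hedged sketch; flash_region_blank() and start_loader_if_present() are hypothetical, and checking a few words is only a heuristic (a CRC over the whole block would be a stronger check).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if all n_words at p read as erased FLASH (0xFFFFFFFF). */
bool flash_region_blank(const volatile uint32_t *p, size_t n_words)
{
    for (size_t i = 0; i < n_words; i++)
    {
        if (p[i] != 0xFFFFFFFFu)
        {
            return false;
        }
    }
    return true;
}

#if defined(__arm__)
extern void loader(void); /* the loader entry referenced from main() */

void start_loader_if_present(void)
{
    if (!flash_region_blank((const volatile uint32_t *)0x080FC000u, 4))
    {
        loader();
    }
    /* else: loader area is blank; report it instead of jumping there */
}
#endif
```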

gf · « **Reply #44 on:** July 17, 2021, 09:45:40 pm »

My understanding is that you can't include the sections of KDE_loader.o twice.
Once an input section, say KDE_loader.o(.text), has been included in one output section (say KDE_loader), it is considered "consumed" and won't be included in a different output section (say .text) any more.
So I think you need two object files, e.g. KDE_loader_flash.o and KDE_loader_ram.o.

Btw, at which address are data and bss of loader_flash supposed to reside? Also at 0x20000000? Or do they need to coexist simultaneously with the data of the main program (without overlap)?

peter-h · « **Reply #45 on:** July 18, 2021, 05:08:57 am »

"So I think you need two object files, e.g. KDE_loader_flash.o and KDE_loader_ram.o."

Yes.

"at which address are data and bss of loader_flash supposed to reside? Also at 0x20000000? "

loader_flash needs to be self contained at 0x080FC000-0x080FDFFF (lower 8k of uppermost 16k of CPU FLASH). It executes there.

loader_ram needs to be self contained at 0x080FE000-0x080FFFFF (upper 8k of uppermost 16k of CPU FLASH). It executes at 0x20000000.

"Or do they need to coexist simultaneously with the data of the main program (without overlap)?"

The main program is not needed for this scheme to work. If necessary I can avoid any need for initialised data or bss, because at this stage I have loads of stack space.

Well... if you completely trash the bottom of the CPU FLASH, where the vector table is, then loader_flash will never get run, but I can't see any way around that, with just a single CPU. If you had a second CPU then you could boot the main CPU using one of the non-writable loaders (BOOT0=1, I think) and use the 2nd CPU to feed the 1st one with bytes of code via a serial port, SPI, CAN, etc. One can minimise the risk of this situation by never writing (in the field) the 1st 4k block. And of course never writing (in the field) the topmost 16k block.

Currently what I appear to be missing is getting the SWD debugger to write the code into 0x080FC000 onwards.

Now that the principle of running code in RAM has been proven, I am going back a step to getting loader_flash (in file KDE_loader_FLASH.c) to be SWD-programmed to 0x080FC000 and do something (flash some LED) there. When that works I will move to the RAM copy of it. My current linker script is this:

Code:

/* Entry Point */
ENTRY(Reset_Handler)

/* Reference loader to ensure that it gets linked-in */
EXTERN(loader_flash)													   

/* Highest address of the main stack */
 /*  _estack = 0x20020000;  */    /* stack in 128K RAM */
 _estack = 0x10010000;    	/* stack in  64k CCM - note: configTOTAL_HEAP_SIZE + min_stack_size must not exceed 64k) */
 
/* top of RAM for _sbrk - top of heap check */
 _top = 0x20020000;

/* Heap and stack sizes */
_Min_Heap_Size  = 0xa000;     /* 40k heap - min size; it can grow to end of main RAM  */
_Min_Stack_Size = 0x4000;     /* 16k stack - in CCM */

/* Specify the memory areas */
/* CCMRAM added PH 12/5/2021 - cannot use with DMA */
MEMORY
{
  FLASH (rx)      : ORIGIN = 0x08000000, LENGTH = 1024K
  RAM (xrw)       : ORIGIN = 0x20000000, LENGTH = 128K
  MEMORY_B1 (rx)  : ORIGIN = 0x60000000, LENGTH = 0K
  CCMRAM (rw)     : ORIGIN = 0x10000000, LENGTH = 64K
}

/* Define output sections */
SECTIONS
{
/* loader_flash_bss and loader_flash sections must come first, in order to override the wildcards in subsequent sections */
 loader_flash_bss _loader_flash_end : {
  _loader_flash_bss_start = .;
  *KDE_loader_FLASH.o(.bss*)
  *KDE_loader_FLASH.o(COMMON)
  . = ALIGN(4);
  _loader_flash_bss_end = .;
 }
 
 KDE_loader_FLASH 0x080FC000  : AT(_loader_flash_loadaddr) {
  _loader_flash_start = .;
  *KDE_loader_FLASH.o(.text*)
  *KDE_loader_FLASH.o(.rodata*)
  *KDE_loader_FLASH.o(.data*)
  *KDE_loader_FLASH.o(.ARM.attributes)
  . = ALIGN(4);
  _loader_flash_end = .;
 }
 
  _loader_flash_size = SIZEOF(KDE_loader_FLASH);

  /* The startup code goes first into FLASH */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(4);
  } >FLASH

  /* The program code and other data goes into FLASH */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
	*(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbol at end of code */
  } >FLASH

  .ARM.extab   : { *(.ARM.extab* .gnu.linkonce.armextab.*) } >FLASH
  .ARM : {
    __exidx_start = .;
    *(.ARM.exidx*)
    __exidx_end = .;
  } >FLASH

  .preinit_array     :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array*))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  } >FLASH
  .init_array :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT(.init_array.*)))
    KEEP (*(.init_array*))
    PROVIDE_HIDDEN (__init_array_end = .);
  } >FLASH
  .fini_array :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(.fini_array*))
    KEEP (*(SORT(.fini_array.*)))
    PROVIDE_HIDDEN (__fini_array_end = .);
  } >FLASH

 
  /* Initialized data sections go into RAM, load LMA copy after code */
  .data :
  {
 /*   . = ALIGN(4); */
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */

    . = ALIGN(4);
    _edata = .;        /* define a global symbol at data end */
  } >RAM AT >FLASH

 /* used by the startup to initialize data */
  _sidata = LOADADDR(.data);

 /* dummy placeholder in flash for loader section, to count flash usage */
 .KDE_loader_FLASH : {
  . = . + SIZEOF(KDE_loader_FLASH);
 } AT >FLASH
 _loader_flash_loadaddr = LOADADDR(.KDE_loader_FLASH);

  /* Uninitialized data section */
  . = ALIGN(4);
  .bss :
  {
    /* This is used by the startup in order to initialize the .bss section */
    _sbss = .;         /* define a global symbol at bss start */
    __bss_start__ = _sbss;
    *(.bss)
    *(.bss*)
    *(COMMON)

    . = ALIGN(4);
    _ebss = .;         /* define a global symbol at bss end */
    __bss_end__ = _ebss;
  } >RAM

  /* The heap ends up after BSS in main RAM */
  /* This also checks that the top of the heap doesn't hit the bottom of the stack i.e. how much RAM left */
  /* User_heap_stack section, used to check that there is enough RAM left */
  
  /* ._user_heap_stack : */
  .main_heap :
  {
    . = ALIGN(8);
    PROVIDE ( end = . ); 
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
 /*   . = . + _Min_Stack_Size; */  /* PH 14/5/2021 stack is in CCM, not here */
    . = ALIGN(8);
   } >RAM 
  /*   } >CCMRAM */


  /* MEMORY_bank1 section, code must be located here explicitly            */
  /* Example: extern int foo(void) __attribute__ ((section (".mb1text"))); */
  /* Not used 14/5/2021 - was apparently used for LCD display on ST dev kit */
  .memory_b1_text :
  {
    *(.mb1text)        /* .mb1text sections (code) */
    *(.mb1text*)       /* .mb1text* sections (code)  */
    *(.mb1rodata)      /* read-only data (constants) */
    *(.mb1rodata*)
  } >MEMORY_B1
  
  /* CCM-RAM section 
  * 
  * IMPORTANT NOTE! 
  * If variables placed in this section must be zero initialized,
  * the startup code needs to be modified to initialize this section.  
  * Done PH 12/5/2021
  */
  .ccmram :
  {
    . = ALIGN(4);
    _sccmram = .;       /* create a global symbol at ccmram start */
    *(.ccmram)
    *(.ccmram*)
    
    . = ALIGN(4);
    _eccmram = .;       /* create a global symbol at ccmram end */
  } >CCMRAM

  /* Remove information from the standard libraries */
  /DISCARD/ :
  {
    libc.a ( * )
    libm.a ( * )
    libgcc.a ( * )
  }

  .ARM.attributes 0 : { *(.ARM.attributes) }
}

You can see R3 holds the right value, but the memory at those addresses is all 0xFF.

It must be something simple. I thought that if you create an "output section", the debugger would simply pick it up and program it into the CPU FLASH. The .map file suggests the addresses are correct:

Code: [Select]

loader_flash_bss
                0x00000000080fc140        0x0
                0x00000000080fc140                _loader_flash_bss_start = .
 *KDE_loader_FLASH.o(.bss*)
 *KDE_loader_FLASH.o(COMMON)
                0x00000000080fc140                . = ALIGN (0x4)
                0x00000000080fc140                _loader_flash_bss_end = .

KDE_loader_FLASH
                0x00000000080fc000      0x140 load address 0x000000000803b668
                0x00000000080fc000                _loader_flash_start = .
 *KDE_loader_FLASH.o(.text*)
 .text.HAL_GPIO_WritePin
                0x00000000080fc000       0x32 src/KDE_loader_FLASH.o
 *fill*         0x00000000080fc032        0x2 
 .text.KDE_LED_On
                0x00000000080fc034       0x38 src/KDE_loader_FLASH.o
 .text.KDE_LED_Off
                0x00000000080fc06c       0x38 src/KDE_loader_FLASH.o
 .text.hang_around
                0x00000000080fc0a4       0x26 src/KDE_loader_FLASH.o
 .text.loader_flash
                0x00000000080fc0ca       0x1c src/KDE_loader_FLASH.o
                0x00000000080fc0ca                loader_flash
 *KDE_loader_FLASH.o(.rodata*)
 *fill*         0x00000000080fc0e6        0x2 
 .rodata        0x00000000080fc0e8        0xc src/KDE_loader_FLASH.o
                0x00000000080fc0e8                GPIO_PIN_X
 *KDE_loader_FLASH.o(.data*)
 .data          0x00000000080fc0f4       0x18 src/KDE_loader_FLASH.o
                0x00000000080fc0f4                GPIO_PORT_X
 *KDE_loader_FLASH.o(.ARM.attributes)
 .ARM.attributes
                0x00000000080fc10c       0x34 src/KDE_loader_FLASH.o
                0x00000000080fc140                . = ALIGN (0x4)
                0x00000000080fc140                _loader_flash_end = .
                0x0000000000000140                _loader_flash_size = SIZEOF (KDE_loader_FLASH)
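The map output points at a likely cause: `KDE_loader_FLASH` runs at 0x080fc000 but has `load address 0x000000000803b668`. GDB's `load` command, and flash programmers that work from the ELF program headers, write each section at its load address (LMA), not its run address (VMA). So the loader bytes would land at 0x0803b668, and 0x080FC000 would stay erased (reading back as 0xFF) unless the firmware programs it there itself. The following is a minimal C sketch of a run-time check for this. It assumes the symbol names from the script above and a build for an `__arm__` target. `region_is_erased` and `loader_flash_placement_check` are hypothetical helpers, not part of any library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if every byte in [p, p + len) reads back as erased flash (0xFF). */
bool region_is_erased(const uint8_t *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0xFFu)
            return false;
    }
    return true;
}

#if defined(__arm__)
/* Symbols from the linker script above. A linker symbol is only an address,
   so declare them as arrays and never read their "value". */
extern const uint8_t _loader_flash_start[];      /* VMA of KDE_loader_FLASH */
extern const uint8_t _loader_flash_loadaddr[];   /* LMA given to AT(...) */
extern const uint8_t _loader_flash_size[];       /* SIZEOF(...), as an address */

/* Hypothetical diagnostic:
   0 = loader is linked to load where it runs (VMA == LMA),
   1 = VMA != LMA and the run address is still erased (the 0xFF symptom),
   2 = VMA != LMA but something has since been programmed at the run address. */
int loader_flash_placement_check(void)
{
    size_t size = (size_t)(uintptr_t)_loader_flash_size;

    if (_loader_flash_start == _loader_flash_loadaddr)
        return 0;
    if (region_is_erased(_loader_flash_start, size))
        return 1;
    return 2;
}
#endif
```

Placing the section with VMA equal to LMA, or programming the run address from the firmware, would make the check return 0 or 2 respectively.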

gf · « **Reply #46 on:** July 18, 2021, 10:21:55 am »

Try this. Given that the loaders now have fixed addresses, I tried to organize it a bit more cleanly.

Code: [Select]

/* Entry Point */
ENTRY(Reset_Handler)

/* Reference loaders to ensure that they get linked-in */
EXTERN(loader_flash)
EXTERN(loader_ram)

/* Highest address of the main stack */
 /*  _estack = 0x20020000;  */    /* stack in 128K RAM */
 _estack = 0x10010000;        /* stack in 64k CCM - note: configTOTAL_HEAP_SIZE + min_stack_size must not exceed 64k */
 
/* top of RAM for _sbrk - top of heap check */
 _top = 0x20020000;

/* Heap and stack sizes */
_Min_Heap_Size  = 0xa000;     /* 40k heap - min size; it can grow to end of main RAM  */
_Min_Stack_Size = 0x4000;     /* 16k stack - in CCM */

/* Specify the memory areas */
/* CCMRAM added PH 12/5/2021 - cannot use with DMA */
MEMORY
{
  FLASH (rx)              : ORIGIN = 0x08000000, LENGTH = 1024K - 16K
  RAM (xrw)               : ORIGIN = 0x20000000, LENGTH = 128K
  MEMORY_B1 (rx)          : ORIGIN = 0x60000000, LENGTH = 0K
  CCMRAM (rw)             : ORIGIN = 0x10000000, LENGTH = 64K

  /* flash regions for loaders */
  FLASH_LOADER_FLASH (rx) : ORIGIN = 0x080FC000, LENGTH = 8K
  FLASH_LOADER_RAM (rx)   : ORIGIN = 0x080FE000, LENGTH = 8K

  /* ram regions for loaders, overlap with RAM */
  RAM_LOADER_FLASH (rw)   : ORIGIN = 0x20000000, LENGTH = 128K
  RAM_LOADER_RAM (rw)     : ORIGIN = 0x20000000, LENGTH = 128K
}

/* Define output sections */
SECTIONS
{
  /* The loader sections must come first,
     in order to override the wildcards in subsequent sections */

  KDE_loader_FLASH : {
     *KDE_loader_FLASH.o(.text*)
     *KDE_loader_FLASH.o(.rodata*)
     . = ALIGN(4);
  } >FLASH_LOADER_FLASH

  KDE_loader_FLASH_data : {
     _loader_flash_data_start = .;
     *KDE_loader_FLASH.o(.data*)
     . = ALIGN(4);
     _loader_flash_data_end = .;
  } >RAM_LOADER_FLASH AT >FLASH_LOADER_FLASH

  _loader_flash_data_loadaddr = LOADADDR(KDE_loader_FLASH_data);

  KDE_loader_FLASH_bss : {
     _loader_flash_bss_start = .;
     *KDE_loader_FLASH.o(.bss*)
     *KDE_loader_FLASH.o(COMMON)
     . = ALIGN(4);
     _loader_flash_bss_end = .;
  } >RAM_LOADER_FLASH

  /* ============================================================*/

  KDE_loader_RAM : {
     _loader_ram_start = .;
     *KDE_loader_RAM.o(.text*)
     *KDE_loader_RAM.o(.rodata*)
     *KDE_loader_RAM.o(.data*)
     . = ALIGN(4);
     _loader_ram_end = .;
  } >RAM_LOADER_RAM AT >FLASH_LOADER_RAM

  _loader_ram_loadaddr = LOADADDR(KDE_loader_RAM);

  KDE_loader_RAM_bss : {
     _loader_ram_bss_start = .;
     *KDE_loader_RAM.o(.bss*)
     *KDE_loader_RAM.o(COMMON)
     . = ALIGN(4);
     _loader_ram_bss_end = .;
  } >RAM_LOADER_RAM

  /* ============================================================*/

  /* The startup code goes first into FLASH */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(4);
  } >FLASH

  /* The program code and other data go into FLASH */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
	*(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbol at end of code */
  } >FLASH

  .ARM.extab   : { *(.ARM.extab* .gnu.linkonce.armextab.*) } >FLASH
  .ARM : {
    __exidx_start = .;
    *(.ARM.exidx*)
    __exidx_end = .;
  } >FLASH

  .preinit_array     :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array*))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  } >FLASH
  .init_array :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT(.init_array.*)))
    KEEP (*(.init_array*))
    PROVIDE_HIDDEN (__init_array_end = .);
  } >FLASH
  .fini_array :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(.fini_array*))
    KEEP (*(SORT(.fini_array.*)))
    PROVIDE_HIDDEN (__fini_array_end = .);
  } >FLASH

 
  /* Initialized data sections go into RAM, load LMA copy after code */
  .data :
  {
 /*   . = ALIGN(4); */
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */

    . = ALIGN(4);
    _edata = .;        /* define a global symbol at data end */
  } >RAM AT >FLASH

 /* used by the startup to initialize data */
  _sidata = LOADADDR(.data);

  /* Uninitialized data section */
  . = ALIGN(4);
  .bss :
  {
    /* This is used by the startup in order to initialize the .bss section */
    _sbss = .;         /* define a global symbol at bss start */
    __bss_start__ = _sbss;
    *(.bss)
    *(.bss*)
    *(COMMON)

    . = ALIGN(4);
    _ebss = .;         /* define a global symbol at bss end */
    __bss_end__ = _ebss;
  } >RAM

  /* The heap ends up after BSS in main RAM */
  /* This also checks that the top of the heap doesn't hit the bottom of the stack i.e. how much RAM left */
  /* User_heap_stack section, used to check that there is enough RAM left */
 
  /* ._user_heap_stack : */
  .main_heap :
  {
    . = ALIGN(8);
    PROVIDE ( end = . );
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
 /*   . = . + _Min_Stack_Size; */  /* PH 14/5/2021 stack is in CCM, not here */
    . = ALIGN(8);
   } >RAM
  /*   } >CCMRAM */


  /* MEMORY_bank1 section, code must be located here explicitly            */
  /* Example: extern int foo(void) __attribute__ ((section (".mb1text"))); */
  /* Not used 14/5/2021 - was apparently used for LCD display on ST dev kit */
  .memory_b1_text :
  {
    *(.mb1text)        /* .mb1text sections (code) */
    *(.mb1text*)       /* .mb1text* sections (code)  */
    *(.mb1rodata)      /* read-only data (constants) */
    *(.mb1rodata*)
  } >MEMORY_B1
 
  /* CCM-RAM section
  *
  * IMPORTANT NOTE!
  * If variables placed in this section must be zero initialized,
  * the startup code needs to be modified to initialize this section. 
  * Done PH 12/5/2021
  */
  .ccmram :
  {
    . = ALIGN(4);
    _sccmram = .;       /* create a global symbol at ccmram start */
    *(.ccmram)
    *(.ccmram*)
   
    . = ALIGN(4);
    _eccmram = .;       /* create a global symbol at ccmram end */
  } >CCMRAM

  /* Remove information from the standard libraries */
  /DISCARD/ :
  {
    libc.a ( * )
    libm.a ( * )
    libgcc.a ( * )
  }

  .ARM.attributes 0 : { *(.ARM.attributes) }
}
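With this layout, `KDE_loader_FLASH` code runs in place at 0x080FC000. However, its `.data` and the whole `KDE_loader_RAM` image still have to be copied from flash to 0x20000000 before use. The following is a minimal sketch of that copy step. It assumes the symbol names defined in the script above, and that `loader_flash` and `loader_ram` are `void f(void)`. `copy_words`, `zero_words`, `start_loader_flash` and `start_loader_ram` are hypothetical names. The symbols are declared as arrays so that each name yields the symbol's address, which is all the linker defines. This is also why reading `extern char _loader_start;` as a variable gives a meaningless byte, while `&_loader_start` is the address you want.

```c
#include <stdint.h>

/* Copy word-aligned initialised data from its load address (flash) to its
   run address (RAM). Every loader section in the script ends with ALIGN(4). */
void copy_words(uint32_t *dst, uint32_t *dst_end, const uint32_t *src)
{
    while (dst < dst_end)
        *dst++ = *src++;
}

/* Zero a word-aligned .bss-style region. */
void zero_words(uint32_t *dst, uint32_t *dst_end)
{
    while (dst < dst_end)
        *dst++ = 0u;
}

#if defined(__arm__)
/* Linker-script symbols: only their addresses mean anything. */
extern uint32_t _loader_flash_data_start[], _loader_flash_data_end[];
extern uint32_t _loader_flash_data_loadaddr[];
extern uint32_t _loader_flash_bss_start[], _loader_flash_bss_end[];
extern uint32_t _loader_ram_start[], _loader_ram_end[], _loader_ram_loadaddr[];
extern uint32_t _loader_ram_bss_start[], _loader_ram_bss_end[];

void loader_flash(void);   /* assumed signature */
void loader_ram(void);     /* assumed signature */

/* Both loaders reuse 0x20000000, which overlaps the main firmware's RAM, so
   interrupts go off first (assumed: the loaders do not need them) and nothing
   of the main program is used afterwards. The current stack is in CCM
   (_estack = 0x10010000), so it survives the copy. */
void start_loader_flash(void)
{
    __asm volatile ("cpsid i" ::: "memory");
    copy_words(_loader_flash_data_start, _loader_flash_data_end,
               _loader_flash_data_loadaddr);
    zero_words(_loader_flash_bss_start, _loader_flash_bss_end);
    loader_flash();        /* code runs in place at 0x080FC000 */
}

void start_loader_ram(void)
{
    __asm volatile ("cpsid i" ::: "memory");
    copy_words(_loader_ram_start, _loader_ram_end, _loader_ram_loadaddr);
    zero_words(_loader_ram_bss_start, _loader_ram_bss_end);
    __asm volatile ("dsb\n\tisb" ::: "memory");  /* code was just written */
    loader_ram();          /* PC is now in 0x2000xxxx */
}
#endif
```

The word-wise loops avoid depending on library `memcpy`/`memset` from inside a routine that is about to overwrite main RAM.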

abyrvalg · « **Reply #47 on:** July 18, 2021, 10:38:57 am »

Quote from: peter-h on July 18, 2021, 05:08:57 am

Well... if you completely trash the bottom of the CPU FLASH, where the vector table is, then loader_flash will never get run

You can create a separate vector table for the loader (it is just a const void *[] array) and place loader+VT in the bottom sector, so that it runs before the main part (which starts at the next free sector) and stays independent of it.
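A Cortex-M vector table really is just an array of words: the initial stack pointer followed by handler addresses. The following is a minimal sketch of that idea, assuming the memory map from the scripts above. The loader gets its own small table in a hypothetical `.loader_vectors` section, which the linker script would have to place at 0x08000000 with `KEEP`. It hands over to the main application only if that application's table looks sane. `loader_reset`, `loader_fault`, `jump_to_app` and `vector_table_looks_valid` are hypothetical names. The jump assumes the loader has not enabled any interrupts or SysTick.

```c
#include <stdbool.h>
#include <stdint.h>

/* Memory map assumed from the MEMORY block of the linker scripts above. */
#define FLASH_START 0x08000000u
#define FLASH_END   0x08100000u   /* 1024K */
#define RAM_START   0x20000000u
#define RAM_END     0x20020000u   /* 128K main RAM */
#define CCM_START   0x10000000u
#define CCM_END     0x10010000u   /* 64K CCM */

/* Word 0 of a Cortex-M vector table is the initial MSP, word 1 the reset
   handler (Thumb bit set). Erased flash reads 0xFFFFFFFF and fails both tests. */
bool vector_table_looks_valid(const uint32_t *vt)
{
    uint32_t sp = vt[0];
    uint32_t pc = vt[1];
    bool sp_ok = (sp > RAM_START && sp <= RAM_END) ||
                 (sp > CCM_START && sp <= CCM_END);
    bool pc_ok = (pc & 1u) != 0u &&
                 (pc & ~1u) >= FLASH_START && (pc & ~1u) < FLASH_END;
    return sp_ok && pc_ok;
}

#if defined(__arm__)
extern uint32_t _estack;          /* from the linker script */
void loader_reset(void);          /* hypothetical loader entry point */

static void loader_fault(void)    /* park here on NMI/HardFault */
{
    for (;;) {
    }
}

/* The loader's own vector table: just a const array of addresses. */
__attribute__((section(".loader_vectors"), used))
const void *const loader_vectors[] = {
    &_estack,                       /* initial MSP */
    (const void *)loader_reset,     /* Reset */
    (const void *)loader_fault,     /* NMI */
    (const void *)loader_fault,     /* HardFault */
};

/* Hand over to an application whose vector table starts at app_base. */
void jump_to_app(uint32_t app_base)
{
    const uint32_t *vt = (const uint32_t *)app_base;
    if (!vector_table_looks_valid(vt))
        return;                                   /* stay in the loader */

    void (*app_reset)(void) = (void (*)(void))vt[1];
    *(volatile uint32_t *)0xE000ED08u = app_base; /* SCB->VTOR */
    __asm volatile ("msr msp, %0" : : "r" (vt[0]) : "memory");
    app_reset();
}
#endif
```

The validity check also lets the loader stay in control when the main part has been erased, which is exactly the situation where loader_flash has just trashed the application sectors.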

BTW, looks like Cube has a "subproject" concept for building multiple separate bins from a combination of private/shared sources within a single project (used for multicore CPUs normally): https://community.st.com/s/question/0D53W000003xwtZ/is-it-possible-to-have-cubeide-project-with-multiple-main-functions.

harerod · « **Reply #48 on:** July 18, 2021, 11:07:15 am »

Quote from: abyrvalg on July 18, 2021, 10:38:57 am

Quote from: peter-h on Today at 06:08:57
...
BTW, looks like Cube has a "subproject" concept for building multiple separate bins from a combination of private/shared sources within a single project (used for multicore CPUs normally): https://community.st.com/s/question/0D53W000003xwtZ/is-it-possible-to-have-cubeide-project-with-multiple-main-functions.

I prefer to maintain the bootloader and the application within the same project. A complex bootloader may even use some of the same library source (e.g. wear levelling file system, network stack) as the main application.
One could use CubeIDE build options to switch between different builds, for instance: app standalone, bootloader, app with bootloader. Build options only vary in some #defines, linker scripts and source files ("exclude from build").
One can start from one of the standard Release/Debug builds and modify settings as needed.
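The following is a minimal sketch of the #define side of that approach. It assumes hypothetical macros `BUILD_BOOTLOADER` and `BUILD_APP_WITH_BOOTLOADER`, with at most one of them set in each build configuration's preprocessor settings. It also assumes a 32K boot area in front of the app for the "app with bootloader" variant. Each configuration would still select its own linker script with a matching FLASH ORIGIN.

```c
#include <stdint.h>

#if defined(BUILD_BOOTLOADER)
#define IMAGE_BASE 0x08000000u          /* bootloader owns the reset vector */
#define IMAGE_KIND "bootloader"
#elif defined(BUILD_APP_WITH_BOOTLOADER)
#define IMAGE_BASE 0x08008000u          /* assumed: 32K boot area before the app */
#define IMAGE_KIND "app with bootloader"
#else
#define IMAGE_BASE 0x08000000u          /* standalone app, as in the scripts above */
#define IMAGE_KIND "app standalone"
#endif

/* VTOR requires the table to be aligned to its size rounded up to a power of
   two; 512 bytes is assumed to cover this part's vector count. */
_Static_assert((IMAGE_BASE & 0x1FFu) == 0u,
               "vector table base must be 512-byte aligned");

/* Shared sources (file system, network stack, ...) can query the variant
   instead of sprinkling #ifdefs everywhere. */
uint32_t image_base(void) { return IMAGE_BASE; }
const char *image_kind(void) { return IMAGE_KIND; }

#if defined(__arm__)
/* Early in startup, point the core at this image's vector table
   (0xE000ED08 is SCB->VTOR on Cortex-M3/M4). */
void image_set_vtor(void)
{
    *(volatile uint32_t *)0xE000ED08u = IMAGE_BASE;
}
#endif
```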

Siwastaja · « **Reply #49 on:** July 18, 2021, 11:33:37 am »

It's very common for the bootloader or flasher to share requirements with the app. Of course, if it works through UART it's no big deal to replicate the UART "driver", which is 10 lines of code. But the whole point of custom-developed bootloaders/flashers is that they use whatever interfaces the application is using, allowing more convenient FW updates than working with a dedicated flashing cable (JTAG, SWD or UART). If both the application and the bootloader work through CAN, or even more relevantly, TCP/IP over Ethernet or wireless, then sharing the communication code is obvious, and maintaining the whole shebang in one "project" is likely easier.

