What is the "boot loader" actually doing? Is it the complete product, i.e. is one trying to run everything from RAM (as one used to do in the Z80 days, for development), or is it just a bit of code which reads data (from a UART, SPI FLASH, etc.) and writes it to the CPU FLASH? And what does this loader do when it has finished?
If the latter, one is not looking at much code. This is how I do it:
#ifdef ITM_DEBUG
B_debug_puts("Entering Loader\n");
#endif
// This should not be needed
SCB->VTOR = FLASH_BASE;
// === At this point, interrupts and DMA must still be disabled ====
// Execute loader. Reboots afterwards.
// Parameters for loader are in the SSA.
extern char _loader_ram_start;
extern char _loader_ram_end;
extern char _loader_flash_start;
// Copy loader code and its init data to RAM. B_memcpy is a local memcpy() because we don't have access to stdlib at this point!!
B_memcpy(&_loader_ram_start, &_loader_flash_start, &_loader_ram_end - &_loader_ram_start);
// Set SP to top of CCM. This is not where the general stack is normally but it doesn't matter because
// the loader always reboots at the end. This assignment can't be done inside loader because it trashes
// the stack frame and any local variables which are allocated *at* the call.
// NOTE local variables cannot be used after the SP load.
asm volatile ("ldr sp, = 0x10010000 \n");
// See comments in loader.c for why the long call.
extern void loader_entry(void) __attribute__((long_call));
loader_entry();
// never get here (loader always reboots)
for (;;);
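For anyone who wants to try the copy step on a PC first, here is a minimal sketch. It assumes nothing about the real linker script: the two arrays merely stand in for the flash and RAM regions, and the names sim_flash, sim_ram, copy_region and region_matches are made up for illustration. It shows the length being derived from the start and end of the RAM region, as in the B_memcpy call above, plus an optional read-back check which the code above does not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two regions. On the target these would be
   the _loader_flash_start, _loader_ram_start and _loader_ram_end symbols
   from the linker script, not arrays. */
#define SIM_LOADER_SIZE 16u
const uint8_t sim_flash[SIM_LOADER_SIZE] = {
    0xDE, 0xAD, 0xBE, 0xEF, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
};
uint8_t sim_ram[SIM_LOADER_SIZE];

/* Copy (ram_end - ram_start) bytes from flash_start to ram_start and
   return the number of bytes copied. */
size_t copy_region(uint8_t *ram_start, const uint8_t *ram_end,
                   const uint8_t *flash_start)
{
    assert(ram_end >= ram_start);  /* host-side sanity check only */
    size_t len = (size_t)(ram_end - ram_start);
    for (size_t i = 0; i < len; i++)
        ram_start[i] = flash_start[i];
    return len;
}

/* Read back the RAM copy and compare it with the flash image.
   Returns 1 if every byte matches, 0 otherwise. */
int region_matches(const uint8_t *ram, const uint8_t *flash, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (ram[i] != flash[i])
            return 0;
    return 1;
}
```

Calling copy_region(sim_ram, sim_ram + SIM_LOADER_SIZE, sim_flash) followed by region_matches(sim_ram, sim_flash, SIM_LOADER_SIZE) is the PC equivalent of the copy above with a verify added before the jump.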
You have to be careful with things like memcpy, because the compiler will try to detect code which looks like memcpy and will replace it with a call to the stdlib memcpy, which may not work at that point. Whether this happens depends subtly on the optimisation level.
So...
// Local version
__attribute__((optimize("O0"))) // prevent replacement with memcpy()
static void B_memcpy(void *dest, const void *src, size_t len)
{
    char *d = dest;
    const char *s = src;
    while (len--)
        *d++ = *s++;
}
// Local version
__attribute__((optimize("O0"))) // prevent replacement with memset()
static void B_memset(void *s, uint8_t c, uint32_t len)
{
    uint8_t *p = s;
    while (len--)
    {
        *p++ = c;
    }
}
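An alternative to forcing O0, offered only as a sketch: make the pointers volatile, so that the compiler has to perform every byte access as written and cannot fold the loop into a library call. The names V_memcpy, V_memset and V_selfcheck are made up for this example. GCC also has -fno-tree-loop-distribute-patterns, which switches off the pass that does this replacement, but check that against your own compiler version before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte copy through volatile pointers: each load and store must be
   emitted individually, so the loop cannot be turned into memcpy(). */
void V_memcpy(void *dest, const void *src, size_t len)
{
    volatile uint8_t *d = dest;
    const volatile uint8_t *s = src;
    while (len--)
        *d++ = *s++;
}

/* Byte fill through a volatile pointer, for the same reason. */
void V_memset(void *dest, uint8_t c, size_t len)
{
    volatile uint8_t *p = dest;
    while (len--)
        *p++ = c;
}

/* Host-side check only; not meant to be built into the target. */
void V_selfcheck(void)
{
    uint8_t src[4] = {1, 2, 3, 4};
    uint8_t dst[4] = {0};
    V_memcpy(dst, src, sizeof dst);
    assert(dst[0] == 1 && dst[3] == 4);
    V_memset(dst, 0xFF, sizeof dst);
    assert(dst[0] == 0xFF && dst[3] == 0xFF);
}
```

The trade-off is that volatile byte accesses are slow, which hardly matters for a one-off copy of a small loader.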
And my loader ends with:
static inline void L_reboot(void)
{
    // Ensure all outstanding memory accesses including buffered write are completed before reset
    __ASM volatile ("dsb 0xF":::"memory");
    // Keep priority group unchanged
    SCB->AIRCR = (uint32_t)((0x5FAUL << SCB_AIRCR_VECTKEY_Pos) |
                            (SCB->AIRCR & SCB_AIRCR_PRIGROUP_Msk) |
                            SCB_AIRCR_SYSRESETREQ_Msk);
    __ASM volatile ("dsb 0xF":::"memory");
    // wait until reset
    for (;;)
    {
        __NOP();
    }
}
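To make the AIRCR write less of a magic number, here is a small PC-side sketch of how the value is built up. The bit positions are hard-coded on the assumption of the usual Cortex-M3/M4 layout (VECTKEY in bits 31:16, PRIGROUP in bits 10:8, SYSRESETREQ in bit 2); on the target you would use the CMSIS macros as above. The X_ macros, aircr_reset_value and aircr_selfcheck are made-up names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Cortex-M3/M4 AIRCR layout; stand-ins for the CMSIS macros. */
#define X_AIRCR_VECTKEY_POS     16u
#define X_AIRCR_PRIGROUP_MSK    (7u << 8)
#define X_AIRCR_SYSRESETREQ_MSK (1u << 2)

/* Build the word to write to AIRCR to request a system reset, keeping
   the PRIGROUP field from the current register value. The key field
   must be written as 0x5FA or the write is ignored. */
uint32_t aircr_reset_value(uint32_t current_aircr)
{
    return ((uint32_t)0x5FAu << X_AIRCR_VECTKEY_POS) |
           (current_aircr & X_AIRCR_PRIGROUP_MSK) |
           X_AIRCR_SYSRESETREQ_MSK;
}

/* Host-side check only. The key field reads back as 0xFA05, which is
   why the read value cannot simply be OR-ed with the reset bit. */
void aircr_selfcheck(void)
{
    assert(aircr_reset_value(0xFA050000u) == 0x05FA0004u);
    assert(aircr_reset_value(0xFA050300u) == 0x05FA0304u);
}
```

The second case in aircr_selfcheck shows a priority group of 3 passing through unchanged while everything else in the register is dropped.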
I don't do anything with MSP. The CPU starts from a reset (obviously), does some stuff, copies the loader from flash to RAM (above) and jumps to it; the loader does its stuff and finishes with a software reset.