"Why are you doing this? "
Firstly, in my general business I often have to revisit old projects. Right now I am updating a job last done c. 1997 (analog only, but quite tricky), which is why I am using Protel PCB 2.8 (1995).
So I am very careful to do things in a way which makes revisiting them as easy as possible. This is something 99% of designers don't need to worry about because they will move on every few years, or more often. But this is my business and I have to look after it, because in the end there is only me. And you know how hard it is to get into old software. Most people who are paid to do that really hate it. And there are multiple reasons for doing just one project, e.g. archiving, documentation, etc. Project archiving is a particular problem which I have struggled with many times...
"the usual solution is rather to have a separate bootloader in a dedicated region of flash, which is not overwritten when a new image is flashed, so that it keeps functional when the flash update is aborted somewhere in the middle."
This is how it will be done. The loader will be written into the top 16k of the 1MB FLASH (it will be originally written there during factory programming, when the whole 1MB is written using SWD) and it (well, the RAM-resident copy of it) will never write into this top 16k. But there is a little problem: on the 32F417 the top 128k has to be erased as a single block, so there will be a window of opportunity for bricking the product. It is very unlikely, because the flashing will start at the bottom, and by the time you get to the top 128k you will have had error-free writes to all the lower blocks. Also, erasing the 128k block means its top 16k has to be temporarily saved in RAM and then immediately written back. The way to avoid this "brick window" is to have the loader somewhere other than the top 128k, but that causes other issues.
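To make the layout concrete, here is a rough host-side sketch in C, not the real loader. It assumes the usual 1MB 32F4 sector map (4 x 16k, 1 x 64k, 7 x 128k, starting at 0x08000000) and simulates the top sector with a RAM array. All the names (`flash_sector_of`, `erase_top_sector_keep_loader`, `sim_top_sector`) are made up for illustration; on the target the `memset`/`memcpy` would be replaced by the actual FLASH erase and program calls.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FLASH_BASE_ADDR  0x08000000u
#define FLASH_SIZE       (1024u * 1024u)
#define LOADER_SIZE      (16u * 1024u)
#define LOADER_ADDR      (FLASH_BASE_ADDR + FLASH_SIZE - LOADER_SIZE)
#define TOP_SECTOR       11
#define TOP_SECTOR_SIZE  (128u * 1024u)

/* Sector number (0..11) for an address, or -1 if outside the 1MB FLASH.
   Assumed map: sectors 0-3 are 16k, sector 4 is 64k, sectors 5-11 are 128k. */
static int flash_sector_of(uint32_t addr)
{
    if (addr < FLASH_BASE_ADDR || addr - FLASH_BASE_ADDR >= FLASH_SIZE)
        return -1;
    uint32_t off = addr - FLASH_BASE_ADDR;
    if (off < 0x10000u)
        return (int)(off / 0x4000u);      /* four 16k sectors */
    if (off < 0x20000u)
        return 4;                         /* one 64k sector */
    return (int)(4u + off / 0x20000u);    /* seven 128k sectors */
}

/* Simulated top 128k sector and the RAM buffer holding the loader copy. */
static uint8_t sim_top_sector[TOP_SECTOR_SIZE];
static uint8_t loader_ram_copy[LOADER_SIZE];

/* Erase the top sector but keep the loader in its last 16k. */
static void erase_top_sector_keep_loader(void)
{
    uint8_t *loader = sim_top_sector + TOP_SECTOR_SIZE - LOADER_SIZE;

    assert(flash_sector_of(LOADER_ADDR) == TOP_SECTOR);

    memcpy(loader_ram_copy, loader, LOADER_SIZE);     /* 1. save to RAM     */
    memset(sim_top_sector, 0xFF, TOP_SECTOR_SIZE);    /* 2. erase (-> 0xFF) */
    /* The "brick window" is here: power loss now leaves no loader. */
    memcpy(loader, loader_ram_copy, LOADER_SIZE);     /* 3. write it back   */
}
```

The point of the sketch is only that the loader sits in the same erase unit as 112k of application code, so step 2 cannot be done without steps 1 and 3.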
"Your RAM code has nothing common with the main code"
That isn't actually the case. There is a huge amount of common data e.g. the huge .h files full of port addresses, pin names, etc. These come from the ST libraries. This stuff can be #included in both the main code (many .c files) and in the loader.c file.
"If you need to pass any data between the parts declare a struct placed in a separate noinit section "
There will be some "data passing" involved because e.g. the loader will be executed at every power-up but it may need to perform different actions. The plan is to get main() to shove some data into a serial (SPI) flash which this product also has, and the loader can pick it up. Or one could store data in the 32F4's RTC data storage area; that is less good because it will be lost if the RTC backup battery (a supercap) is not charged, or not even fitted, and there is a power-down involved. The data passing has to survive a power-down, for reasons not easy to explain, but the data passed to the loader fits in a single byte.
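As a sketch of what the single-byte hand-over could look like (this is an assumption, not a settled design): store the command byte together with its bitwise complement, so that an erased or never-written serial flash location (0xFF, 0xFF) is not mistaken for a valid command. The command values and the names `loader_mailbox_t`, `mailbox_encode` and `mailbox_decode` are hypothetical, and the actual SPI flash read/write is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical commands from main() to the loader. */
enum {
    LOADER_CMD_RUN_APP = 0x01,   /* default: just start the main code */
    LOADER_CMD_UPDATE  = 0x02    /* reflash before starting           */
};

/* What would be stored in the serial flash: the byte plus its complement. */
typedef struct {
    uint8_t cmd;
    uint8_t cmd_inv;
} loader_mailbox_t;

/* Build the two-byte record that main() would write out. */
static loader_mailbox_t mailbox_encode(uint8_t cmd)
{
    loader_mailbox_t m = { cmd, (uint8_t)~cmd };
    return m;
}

/* Loader side: return the command, or fall back to RUN_APP if the
   record is not self-consistent (erased, corrupt, or half-written). */
static uint8_t mailbox_decode(loader_mailbox_t m)
{
    if ((uint8_t)(m.cmd ^ m.cmd_inv) != 0xFFu)
        return LOADER_CMD_RUN_APP;
    return m.cmd;
}
```

Falling back to "run the app" on a bad record means a blank or damaged mailbox never triggers a reflash by accident.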
I actually have to do something else. I need a copy of that loader to be compiled to execute at the top 16k of the FLASH, and arranged so that the SWD writes it there. I realise, from reading around, that this may generate a huge .elf where most of the 1MB is 0x00 or some such, but that's ok because it still takes only seconds to write it. And that loader will be what gets copied to RAM. Alternatively I could have the loader anywhere in FLASH and a bit of code which writes it to the top 16k if not already there, but writing it there using SWD is the cleanest way. The entry point of the FLASH-based loader will obviously need to be at the start of the 16k block, so the loader.c file will need to start with a function which just contains a jump/call to the real loader which does the work.
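A minimal sketch of the entry stub idea, assuming a GCC toolchain: the stub goes into its own section, and the linker script (not shown, and an assumption here) has to place that section first in a 16k region starting at "1MB minus 16k". The section name `.loader_entry` and the names `loader_entry`, `loader_main` and `LOADER_CALL_ADDR` are made up. The attribute is only applied on ARM builds so the same file still compiles on a PC.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_END_ADDR    0x08100000u               /* 0x08000000 + 1MB     */
#define LOADER_SIZE       (16u * 1024u)
#define LOADER_ADDR       (FLASH_END_ADDR - LOADER_SIZE)
/* Cortex-M runs Thumb code only, so a call through a raw address
   needs bit 0 set. */
#define LOADER_CALL_ADDR  (LOADER_ADDR | 1u)

#if defined(__GNUC__) && defined(__arm__)
/* Own section, kept even if unreferenced, never inlined away. */
#define LOADER_ENTRY __attribute__((section(".loader_entry"), used, noinline))
#else
#define LOADER_ENTRY
#endif

typedef int (*loader_fn_t)(uint8_t cmd);

/* Placeholder for the real loader: returns 1 if it would reflash. */
static int loader_main(uint8_t cmd)
{
    return (cmd == 0x02) ? 1 : 0;
}

/* The stub at the start of the 16k block: nothing but a call
   to the real loader. */
LOADER_ENTRY int loader_entry(uint8_t cmd)
{
    return loader_main(cmd);
}

/* On the target, code elsewhere could reach the stub with:
       ((loader_fn_t)(uintptr_t)LOADER_CALL_ADDR)(cmd);            */
```

Keeping the stub trivial means the address other code depends on stays fixed, however much the real loader behind it changes.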
So I need two copies of the loader, one compiled/linked to execute at "1MB minus 16k" and the other at 0x20000000. Both will be actually run from those addresses. Relocatable code would be a neat solution...