I got a few RPi Pico boards when they were released. Well, now their time has come. I decided to take a closer look at them and made my usual bare-metal starter project, the kind I make for every device I work with. The code can be found here:
https://github.com/ataradov/mcu-starter-projects/tree/master/rp2040
I did not use their header files or SDK, since they are just horrible, IMO. I generated my own header file from the SVD file using the SVDConv utility from ARM. The result is not perfect by any means, but it is at least usable.
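If you have not used SVDConv output before, here is the general idea. It turns each peripheral in the SVD file into a C struct with one member per register, in address order, plus a macro that casts the peripheral's base address to a pointer to that struct. That is what makes expressions like XIP_SSI->BAUDR work. Below is a hand-written sketch of that shape for the first few XIP_SSI registers. It is not the project's generated header. The type name XIP_SSI_Type is an assumption. The offsets and the 0x18000000 base address follow the RP2040 datasheet.

```c
#include <stdint.h>
#include <stddef.h>

// First few DW_apb_ssi registers, laid out the way CMSIS/SVDConv-style
// headers do it: one struct per peripheral, one member per register.
// (Generated CMSIS headers use __IOM/__IM; plain volatile is used here.)
typedef struct
{
  volatile uint32_t CTRLR0;  // 0x00: control register 0
  volatile uint32_t CTRLR1;  // 0x04: control register 1
  volatile uint32_t SSIENR;  // 0x08: SSI enable
  volatile uint32_t MWCR;    // 0x0C: Microwire control
  volatile uint32_t SER;     // 0x10: slave enable
  volatile uint32_t BAUDR;   // 0x14: clock divider
  // ... remaining registers omitted in this sketch
} XIP_SSI_Type;

// Pointer to the XIP_SSI instance at its RP2040 base address.
#define XIP_SSI ((XIP_SSI_Type *)0x18000000u)

// On the target, a register write then reads naturally:
//   XIP_SSI->BAUDR = 4;
```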
The initial loader is built as part of the project itself. The project is set up so that the initial loader copies the application from flash into SRAM, and the entire application is linked to run from SRAM. Flash remains accessible in XIP mode. It is never used by compiler-generated code, but you can store any data there and read it manually.
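Conceptually, the loader's job has two steps. First it copies the image word by word from flash into SRAM; here the flash read is modelled as a plain pointer read, as if through the XIP window. Then it jumps into the copied image's vector table. The C sketch below shows both steps. It is not the loader from the repository. APP_SRC_ADDR, APP_DST_ADDR and all function names are hypothetical. The final jump needs a few instructions of assembly, so it is only described in a comment.

```c
#include <stdint.h>
#include <stddef.h>

// Illustrative layout only (assumptions, not the project's real linker setup).
#define APP_SRC_ADDR  0x10000100u  // image in the XIP window, after the 256-byte boot2 block
#define APP_DST_ADDR  0x20000000u  // start of SRAM, where the application is linked

// Copy the application image from flash to SRAM, one 32-bit word at a time.
static void loader_copy(volatile uint32_t *dst, const volatile uint32_t *src, size_t words)
{
  for (size_t i = 0; i < words; i++)
    dst[i] = src[i];
}

// A Cortex-M image starts with its vector table: word 0 is the initial
// stack pointer, word 1 is the reset handler address (Thumb bit set).
typedef struct
{
  uint32_t initial_sp;
  uint32_t reset_handler;
} app_entry_t;

static app_entry_t loader_entry(const volatile uint32_t *image)
{
  app_entry_t e = { image[0], image[1] };
  return e;
}

// On the target the loader would then (not shown, needs inline assembly):
//   1. write APP_DST_ADDR to SCB->VTOR (0xE000ED08),
//   2. load e.initial_sp into MSP,
//   3. branch to e.reset_handler.
```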
The loader uses plain SPI mode to avoid messing with QSPI settings, which differ between flash devices. I have a "clean" version of the QSPI initialization for the flash on the RPi Pico if anyone wants it. But I found that QSPI mode does not increase performance enough to be worth bothering with for generic code. I tested with the main code running from SRAM and calculating CRC32 over the entire 2 MB flash, so the access pattern is linear (in 32-bit words) with minor data processing in between. QSPI was 1.6 times faster than SPI in that test. The cache did not matter much here, as it is constantly thrashed by the long linear access. The cache does matter A LOT if you also execute code from that flash; the test was ~30 times slower in that case. But this project runs entirely from SRAM, so this is not an issue.
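The benchmark described above boils down to a loop like the one below. This is a sketch, not the code that produced those numbers. It uses the common reflected CRC-32 (polynomial 0xEDB88320) computed bit by bit, and the function names are hypothetical. What matters is the access pattern: one 32-bit read from flash per iteration, followed by a small amount of computation.

```c
#include <stdint.h>
#include <stddef.h>

// Reflected CRC-32 (polynomial 0xEDB88320), one byte at a time, bitwise.
static uint32_t crc32_update_byte(uint32_t crc, uint8_t byte)
{
  crc ^= byte;
  for (int bit = 0; bit < 8; bit++)
    crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
  return crc;
}

// CRC-32 over a byte buffer (init 0xFFFFFFFF, final inversion).
static uint32_t crc32_bytes(const uint8_t *data, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
    crc = crc32_update_byte(crc, data[i]);
  return ~crc;
}

// Same CRC, but walking memory as 32-bit words: one word read per step,
// bytes fed least significant first (RP2040 memory is little-endian).
static uint32_t crc32_words(const volatile uint32_t *words, size_t count)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < count; i++)
  {
    uint32_t w = words[i];
    for (int b = 0; b < 4; b++)
      crc = crc32_update_byte(crc, (uint8_t)(w >> (8 * b)));
  }
  return ~crc;
}
```

On the target, covering the whole 2 MB flash would look something like crc32_words((const volatile uint32_t *)0x10000000, (2 * 1024 * 1024) / 4), where 0x10000000 is the start of the RP2040 XIP address window.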
Also note that if you are using the flash from the main application, you probably need to adjust XIP_SSI->BAUDR: with a 120 MHz core clock the SPI clock would be 60 MHz, and the 0x03 read command only goes up to 50 MHz. If you are using the flash a lot, you may get better performance by dropping the CPU clock to 100 MHz and leaving the divider as is, which gives exactly 50 MHz.
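The divider choice is simple arithmetic. The SPI clock is the core clock divided by BAUDR. This sketch assumes the DW_apb_ssi rule that the divider must be an even value of at least 2. The helper picks the smallest valid divider for a given flash limit, which makes the trade-off explicit. The function name is hypothetical.

```c
#include <stdint.h>

// Smallest even SSI clock divider (>= 2) that keeps the SPI clock at or
// below max_spi_hz for the given core clock sys_hz.
static uint32_t ssi_baudr_for(uint32_t sys_hz, uint32_t max_spi_hz)
{
  uint32_t div = (sys_hz + max_spi_hz - 1) / max_spi_hz;  // round up
  if (div < 2)
    div = 2;
  if (div & 1u)
    div++;  // divider must be even
  return div;
}
```

With a 120 MHz core clock and a 50 MHz limit this returns 4, which gives only 30 MHz on the SPI side. At 100 MHz the divider stays at 2, which gives exactly 50 MHz. That is why lowering the CPU clock can win when flash access dominates. As far as I know, the SSI also has to be disabled (SSIENR = 0) while BAUDR is written.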
The project also includes a tool that generates a UF2 file with the correct CRC32 for the bootloader section. It can also just update the binary file itself if you have some other way to get it into the flash. There are probably other tools like this out there; I have not really looked.
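The RP2040 datasheet describes the checksum the bootrom expects. It is a CRC32 over the first 252 bytes of the 256-byte boot2 block, stored little-endian in the last four bytes. The parameters are polynomial 0x04C11DB7, no input or output reflection, initial value 0xFFFFFFFF and no final XOR. Below is a sketch of the patching step using those parameters. It is not the tool from the repository, and the function and macro names are made up for illustration.

```c
#include <stdint.h>
#include <stddef.h>

#define BOOT2_SIZE     256u
#define BOOT2_CRC_OFS  252u  // CRC covers bytes 0..251, stored in bytes 252..255

// CRC-32 with polynomial 0x04C11DB7, MSB first (no reflection),
// initial value 0xFFFFFFFF, no final XOR ("CRC-32/MPEG-2").
static uint32_t crc32_mpeg2(const uint8_t *data, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
  {
    crc ^= (uint32_t)data[i] << 24;
    for (int bit = 0; bit < 8; bit++)
      crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : (crc << 1);
  }
  return crc;
}

// Compute the checksum over the first 252 bytes of a boot2 block and
// store it little-endian in the last four bytes.
static void boot2_patch_crc(uint8_t block[BOOT2_SIZE])
{
  uint32_t crc = crc32_mpeg2(block, BOOT2_CRC_OFS);
  for (int i = 0; i < 4; i++)
    block[BOOT2_CRC_OFS + i] = (uint8_t)(crc >> (8 * i));
}
```

A UF2 tool would run this on the first 256 bytes of the image before splitting it into UF2 blocks. Updating a raw binary only needs this step.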
When I have more time, I'll look at using both cores and USB, as USB is the only non-trivial peripheral there.