Hi,
I would very much appreciate some advice on building position-independent applications with gcc.
Background
A device has a bootloader and two slots for applications, say A and B. The bootloader starts, chooses which app to run, and jumps into it.
The application code must be position-independent, because it is not known in advance which slot it will occupy: the firmware update process can place it in either A or B. Data should have absolute addresses. So, the whole thing should be read-only position-independent (ROPI).
Slot size is 400 kB. (For the record, current app size is about 230 kB)
Compiler is arm-none-eabi-gcc v9.3.1; the app runs under FreeRTOS. The project contains about 100 source files. Some of them are the vendor's HAL and BSP, but the majority are ours.
The build
- CFLAGS += -fPIC -mno-pic-data-is-text-relative -msingle-pic-base -mpic-register=r9
- LFLAGS += -fPIC
- rebuild Newlib from source with CFLAGS
- fix FreeRTOS to not trash R9 register
- build the app for slot A
- in start-up code:
  - copy GOT from flash to RAM and relocate code addresses (don't touch data addresses)
  - set up R9 to point to GOT
  - copy vector table from flash to RAM, relocate addresses, set up VTOR
  - copy .data section to RAM, zero .bss section
  - call main()
Place the app into slot A and enjoy.
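For reference, the GOT step of the start-up code boils down to something like the sketch below. It is a simplified illustration rather than the actual project code: the name relocate_got and the parameters link_base, slot_size and load_base are hypothetical, and it assumes every GOT entry that falls inside the link-time slot range is a code address, while everything else (RAM data) is left alone.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the real start-up code.
 * Copies `count` GOT entries from flash (src) to RAM (dst) and shifts
 * every entry that points into the link-time flash slot by the distance
 * between the link address and the address the image actually runs from.
 * Entries outside the slot range (e.g. RAM data addresses) are copied as-is.
 */
static void relocate_got(uint32_t *dst, const uint32_t *src, size_t count,
                         uint32_t link_base, uint32_t slot_size,
                         uint32_t load_base)
{
    /* Unsigned wrap-around is intended: it also covers load_base < link_base. */
    uint32_t delta = load_base - link_base;

    for (size_t i = 0; i < count; i++) {
        uint32_t entry = src[i];

        /* One unsigned compare checks link_base <= entry < link_base + slot_size. */
        if ((uint32_t)(entry - link_base) < slot_size) {
            entry += delta;   /* code address: move it to the running slot */
        }
        dst[i] = entry;       /* data address: unchanged */
    }
}
```

With made-up slot bases of 0x00008000 (A) and 0x0006C000 (B) and a 400 kB slot, an entry of 0x000126B1 would become 0x000766B1 when running from B, while 0x10001E00 stays as it is. The Thumb bit survives because the shift is even.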
The problem
The app fails when placed into slot B. The source of the problem is data structures initialised with function addresses, like this:
extern int bspR1BoardGetMcuUid(void);
extern int bspR1BoardGetBoardType(void);
extern int bspR1BoardGetSerialNumber(void);
halBoard_t halBoard = {
.bspGetMcuUid = bspR1BoardGetMcuUid,
.bspGetBoardType = bspR1BoardGetBoardType,
.bspGetSerialNumber = bspR1BoardGetSerialNumber,
};
driverBoard_t driverBoard = {
.hal = &halBoard,
};
driverBoardSetup(&driverBoard);
halBoard is initialised as follows:
halBoard_t halBoard = {
0000C6A6 LDR.W R3, =0x000000D4
0000C6AA LDR.W R3, [R9, R3]
0000C6AE LDM.W R3, {R0-R2} ; <--- R3=10001E00, points to .data section
0000C6B2 ADD.W R3, SP, #0x1F20
0000C6B6 ADDS R3, #8
0000C6B8 STM.W R3, {R0-R2}
driverBoard_t driverBoard = {
........
driverBoardSetup(&driverBoard);
0000C6D4 MOV R0, R2
0000C6D6 BL driverBoardSetup
........
_sdata
10001E00 DC32 0x000126B1 ; bspR1BoardGetMcuUid
10001E04 DC32 0x0003AE99 ; bspR1BoardGetBoardType
10001E08 DC32 0x0003AEA5 ; bspR1BoardGetSerialNumber
10001E0C DC32 0x00000000
10001E10 DC32 0x10001E28 ; networkBands.15358
10001E14 DC32 0x00000003
........
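So, absolute addresses are placed into the .data section (the first three words) and then used to initialise the structure. They are the link-time addresses for slot A, and nothing in the start-up code relocates them, so they are wrong when the app runs from slot B.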
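For contrast, here is a sketch of the same table filled by run-time assignments instead of an aggregate initialiser. My assumption, not yet verified on the target, is that with -fPIC each function address is then loaded through the GOT (which the start-up code has already relocated) rather than copied from a template in .data; the disassembly would have to confirm it. The halBoard_t layout, the stub bodies and the name halBoardInit are placeholders so the snippet compiles on a host.

```c
/* Placeholder layout: the real halBoard_t comes from the project's HAL headers. */
typedef struct {
    int (*bspGetMcuUid)(void);
    int (*bspGetBoardType)(void);
    int (*bspGetSerialNumber)(void);
} halBoard_t;

/* Stub bodies so the sketch links on a host; the real ones live in the BSP. */
int bspR1BoardGetMcuUid(void)       { return 1; }
int bspR1BoardGetBoardType(void)    { return 2; }
int bspR1BoardGetSerialNumber(void) { return 3; }

/*
 * Hypothetical helper: stores each function address with an ordinary
 * assignment, so the address is computed by code at run time instead of
 * being read from a pre-built image of the structure.
 */
void halBoardInit(halBoard_t *board)
{
    board->bspGetMcuUid       = bspR1BoardGetMcuUid;
    board->bspGetBoardType    = bspR1BoardGetBoardType;
    board->bspGetSerialNumber = bspR1BoardGetSerialNumber;
}
```

This is exactly the kind of per-structure rewrite I would like to avoid across about 100 source files, so it only illustrates the mechanism; it is not the answer I am looking for.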
The pages suggested by Google include:
https://community.arm.com/support-forums/f/compilers-and-libraries-forum/44805/cannot-build-position-independent-code-that-works -- Yes, but the resulting shared object is bloated almost threefold, from the original 230 kB to 650 kB. It will not fit the slot.
https://mcuoneclipse.com/2021/06/05/position-independent-code-with-gcc-for-arm-cortex-m -- Yes, but manually creating a PLT or other Linuxy stuff is out of the question.
And lots of others, without a clear answer.
The question
Has anyone successfully built a ROPI application:
- with gcc and relevant command-line options (which?),
- without manual veneers or other crutches,
- with a reasonable size?
How?
gcc is not a strict requirement. If it is the wrong tool, would Clang do the job?
Thank you for hints, pointers, etc.