I have a HardFault wrapper (made for CM0) that copies the registers before jumping to the real HardFault function. I got it from
here:
LDR R1, =HardFault_Handler_cp
It always worked in all conditions until I enabled -flto optimization (it saved an additional 10KB of flash on a CM3). Then I got:
"Error: invalid offset, value too big (0x00003B4C)".
I understand the literal offset is too large (the largest LDR literal offset on CM0 is 1020). How could I split this instruction into several ones?
For example: load the first byte, then the second byte, or something like that? My knowledge of ARM assembly is close to zero.
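For reference, ARMv6-M has no MOVW/MOVT pair, but a 32-bit constant can in principle be built one byte at a time with MOVS and ADDS (both take 8-bit immediates) and LSLS by 8. The sketch below is only a host-side C model of that seven-instruction sequence (14 bytes of code versus 6 for an LDR plus its literal), written to show the arithmetic; the function name is hypothetical, and getting a linker-resolved symbol address into such a sequence would additionally need assembler/linker relocation support that is not shown here.

```c
#include <stdint.h>

/* Host-side model (hypothetical helper) of the ARMv6-M sequence
 *   MOVS r0, #b3
 *   LSLS r0, r0, #8  then  ADDS r0, #b2
 *   LSLS r0, r0, #8  then  ADDS r0, #b1
 *   LSLS r0, r0, #8  then  ADDS r0, #b0
 * where b3..b0 are the bytes of the constant, most significant first.
 * MOVS/ADDS immediates are 8 bits wide, so each step adds one byte. */
static uint32_t build_constant_bytewise(uint32_t value)
{
    uint32_t r0 = (value >> 24) & 0xFFu;      /* MOVS r0, #b3 */
    for (int shift = 16; shift >= 0; shift -= 8) {
        r0 <<= 8;                             /* LSLS r0, r0, #8 */
        r0 += (value >> shift) & 0xFFu;       /* ADDS r0, #(next byte) */
    }
    return r0;
}
```

Note that ADDS and LSLS update the condition flags, so such a sequence must not sit between a compare and the branch that depends on it.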
I checked the CM0 LDR instruction docs.
I tried searching for how to build a const-struc expression, hoping to find a way to load it in steps, but found nothing.
STR R2, [R0,#const-struc] ; const-struc is an expression evaluating to a constant in the range 0-1020.
It also seems strange to me that it works without -flto. It's loading the literal for the HardFault_Handler_cp address either way, isn't it?
Thanks. I understand it's due to the "literal pool" thing:
If the next literal pool is out of range, the assembler generates an error message.
In this case you must use the LTORG directive to place an additional literal pool in the code.
Place the LTORG directive after the failed LDR pseudo-instruction, and within ±4KB (ARM, 32-bit Thumb-2) or in the range 0 to +1KB (pre-Thumb-2 Thumb, 16-bit Thumb-2).
See LTORG for a detailed description.
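The range rule quoted above comes down to a little arithmetic. For the 16-bit Thumb LDR (literal) used on Cortex-M0, the base is the instruction address plus 4, rounded down to a multiple of 4, and the literal must be word-aligned and lie 0 to 1020 bytes after that base. The helper below (a hypothetical name, host-side C, not part of any toolchain) encodes that rule. The reported offset of 0x3B4C (15180 bytes) is far outside it, which presumably means the nearest literal pool the assembler could use ended up roughly 15 KB away once -flto rearranged the code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest offset encodable by the 16-bit Thumb LDR (literal): imm8 * 4. */
#define THUMB1_LDR_MAX_OFFSET 1020u

/* Hypothetical helper: true if a 16-bit "LDR Rt, [PC, #imm]" located at
 * insn_addr can reach a literal stored at literal_addr. */
static bool thumb1_ldr_literal_reachable(uint32_t insn_addr, uint32_t literal_addr)
{
    /* Base = instruction address + 4, rounded down to a word boundary. */
    uint32_t base = (insn_addr + 4u) & ~3u;

    /* The literal must be word-aligned and must not precede the base. */
    if ((literal_addr & 3u) != 0u || literal_addr < base)
        return false;

    /* Forward distance must fit in imm8 * 4. */
    return (literal_addr - base) <= THUMB1_LDR_MAX_OFFSET;
}
```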
Edit: It was ".LTORG", not "LTORG":
__attribute__((used)) __attribute__((naked)) void HardFault_Handler_(void){
asm(
"MOV R1, LR \n"
"LDR R0, =HardFault_Handler_cp \n"
"MOV LR, R0 \n"
"MOVS R0, #4 \n" // Determine correct stack
"TST R0, R1 \n"
"MRS R0, MSP \n" // Read MSP (Main)
"BEQ .+6 \n" // If Z set (MSP), skip BEQ (2 bytes) + MRS R0,PSP (4 bytes)
"MRS R0, PSP \n" // Read PSP (Process)
"MOV R1, R4 \n" // Registers R4-R6, as parameters 2-4 of the function called
"MOV R2, R5 \n"
"MOV R3, R6 \n" // sourcer32@gmail.com
"BX LR \n"
".LTORG \n" // Place literal pool outside of code execution
);
}
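The first instructions of the wrapper pick the stack that holds the exception frame. On exception entry LR holds an EXC_RETURN value, and bit 2 of it is set when the frame was pushed on the process stack (PSP). TST sets Z when that bit is clear, and MRS does not touch the flags, so BEQ can skip the 4-byte MRS R0, PSP. A host-side C model of that decision, with made-up function and macro names:

```c
#include <stdint.h>

/* Hypothetical name: bit 2 of EXC_RETURN selects the stack used for the frame. */
#define EXC_RETURN_PSP_BIT (1u << 2)

/* Models MOVS R0,#4 / TST R0,R1 / MRS R0,MSP / BEQ / MRS R0,PSP. */
static uint32_t fault_frame_stack(uint32_t exc_return, uint32_t msp, uint32_t psp)
{
    if ((exc_return & EXC_RETURN_PSP_BIT) == 0u)
        return msp;   /* Z set: BEQ taken, R0 keeps the MSP value */
    return psp;       /* Z clear: MRS R0, PSP overwrites R0 */
}
```

ARMv6-M defines three EXC_RETURN values: 0xFFFFFFF1 (return to Handler mode, MSP), 0xFFFFFFF9 (Thread mode, MSP) and 0xFFFFFFFD (Thread mode, PSP). Only the last one has bit 2 set.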
Your constant gets placed exactly where .ltorg is, and in its current position you are executing those 4 bytes as code (the CPU doesn’t treat them as part of the LDR instruction and doesn’t skip them). Move it out of the execution flow (i.e. after BX LR).
Thanks!
As I said, this is neither my job nor my field of study (at least for now), just a DIY hobby, so I have lots to learn yet.
After more RTFM I found exactly what you said:
You must place literal pools where the processor does not attempt to execute them as instructions.
Place them after unconditional branch instructions, or after the return instruction at the end of a subroutine.
It's "LTORG" in the armasm syntax, but ".ltorg" in the gas one. Obviously you are using the second one. In armasm, directives do not begin with a dot, whereas in gas they all do. Not all armasm directives are ported to gas by just prepending a dot, though.
I have no idea what gas is; I don't do ARM asm.
Your toolchain is GCC, so your asm is “gas”.
With Keil MDK you’d need the “armasm” syntax (ARMCC/ARMClang toolchains).
Looks like there is a typo in your code: the LDR in question has R1 destination, but the next instruction moves some uninitialized R0 to LR (where you jump finally).
Damn, yeah, it was R0. No idea how it became R1 in the process!
In Keil you can also select which asm syntax to use.
No need to use R0, you can load directly to LR: LDR LR, =label
LR is just R14.
Note that if label refers to a function, the linker will automatically set the low bit in the literal if you compile with -mthumb.
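That low bit matters because BX on Cortex-M uses bit 0 of the target address to select the instruction set: it must be 1 for Thumb, and since these cores have no ARM state, a 0 there causes a fault rather than a mode switch (HardFault on ARMv6-M, an INVSTATE UsageFault on ARMv7-M). The helpers below are hypothetical host-side C that only spell out the bit manipulation:

```c
#include <stdbool.h>
#include <stdint.h>

/* Value stored in the literal for a Thumb function symbol: address | 1. */
static uint32_t thumb_branch_target(uint32_t code_addr)
{
    return code_addr | 1u;
}

/* Bit 0 is not part of the address the core actually fetches from. */
static uint32_t thumb_fetch_addr(uint32_t branch_target)
{
    return branch_target & ~1u;
}

/* BX stays in Thumb state only if bit 0 of the target is set. */
static bool thumb_state_after_bx(uint32_t branch_target)
{
    return (branch_target & 1u) != 0u;
}
```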
Yeah, it normally worked without adding the literal pool, but failed when I enabled -flto optimization.
Of course, stm32 is compiled with -mthumb!
All Cortex-M support only Thumb mode.
You have to go back to ARM7TDMI (or ARM9, or maybe something newer from before the -M profile) or up to an A-series to get non-Thumb mode.
BTW your suggestion didn't work. Maybe it would for CM3, but it seems CM0 can only perform LDR operations on R0-R7 registers.
Sure. Thumb1 / ARMv6-M support using r8-r15 only with ADD, CMP, MOV and BX. Plus the special (implied) operations that use SP, LR, PC for their special purpose.
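The restriction comes straight from the instruction encoding. ARMv6-M has a single LDR (literal) form, the 16-bit T1 encoding 01001 Rt[2:0] imm8, so Rt has only 3 bits (R0-R7) and the offset is imm8 counted in words, 0 to 1020 bytes. A sketch of an encoder for it (hypothetical function name, host-side C) shows both limits: it rejects LR (R14) as the destination and rejects the 0x3B4C offset from the error message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoder for the 16-bit Thumb "LDR Rt, [PC, #offset]"
 * (encoding T1: 01001 Rt[2:0] imm8). Returns false if not encodable. */
static bool encode_thumb1_ldr_literal(unsigned rt, uint32_t offset, uint16_t *out)
{
    if (rt > 7u)                                 /* 3-bit Rt field: R0-R7 only */
        return false;
    if ((offset & 3u) != 0u || offset > 1020u)   /* imm8 counts words: 0..255 * 4 */
        return false;
    *out = (uint16_t)(0x4800u | (rt << 8) | (offset >> 2));
    return true;
}
```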
did normally work without adding the literal pool, but failed when enabling -flto optimization.
Don't know why
-flto essentially causes the compiler to inline a lot of code that wouldn't normally be inlined, resulting in (for example) a very large main() function with all your nice modular functions special-cased and inlined, with occasional calls to library functions that didn't come under the -flto umbrella.
So it's not surprising that a function would get too large to reach a literal pool in its normal position "after" the function body.
Still, why does it fail to place a literal pool? Shouldn't it handle this automatically?
As near as I can tell, and I would love to be corrected by someone who actually knows, the compiler makes sure the literal pools it needs are there, with or without LTO. If a function grows too large for a single literal pool, it will create more. But if you write an assembly function and don't provide a literal pool, it will happily use one of the pools provided by the compiler, assuming it's in range. If the pool you are squatting on moves, though, it won't try to fix that, because that was your responsibility.