Disassemble that crap (objdump -d xxx.o) and see what's generated. ARM assembly can't be rocket science, I suppose. There has to be some way of loading/storing 8 or 16 bits. OTOH, if you load/store 32 bits unaligned, some ARMs supposedly may end up barrel shifting the word. Maybe the wrong target core is specified?
Oh, I did that. I also generate assembly from GCC (with the -S option), as it produces more readable code with symbols. I use objdump on the final linked binary, as it lets me pinpoint which instruction is at which address. All I got from the exception handler was a few registers: FAR, which gives you the faulting address for a memory access, and ELR, the "link" address (which points to the instruction at which the exception occurred).
I finally figured out that the culprit was this instruction: "ldr d0, [sp, 52]". Guess what? It points to an address that is 4-byte aligned (which is fine, since I was accessing a 4-byte aligned, 32-bit field of a struct): sp (16-byte aligned) + 52 => a 4-byte aligned address, but not an 8-byte aligned one.
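To make the arithmetic concrete, here is a minimal sketch in C. The stack pointer value is made up for illustration (any multiple of 16 gives the same result), and is_aligned, example_sp and example_off are hypothetical names, not anything from the original code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of n; n must be a power of two. */
static bool is_aligned(uintptr_t addr, uintptr_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return (addr & (n - 1)) == 0;
}

/* Assumed value for illustration only: any 16-byte aligned sp behaves the same. */
static const uintptr_t example_sp = 0x80000;

/* Offset taken from the faulting instruction "ldr d0, [sp, 52]". */
static const uintptr_t example_off = 52;
```

With these values, is_aligned(example_sp + example_off, 4) holds but is_aligned(example_sp + example_off, 8) does not, because 52 = 6 * 8 + 4. So a 32-bit load from that address is aligned, while a 64-bit load is not.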
It was not a stack alignment issue per se, just happened to be on the stack (so at first that misled me, because a pure alignment issue from my code didn't seem to be possible otherwise.)
Problem is that GCC emits this ldr d0, ... instruction, which is a 64-bit transfer. Why so? Well, because it wants to optimize accesses, and I was actually accessing two successive 32-bit struct members, so it decided to access both in one shot. Problem is that this creates a non-aligned access, since the address is only 4-byte aligned. Is that great or what?
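For anyone who wants to picture the pattern, here is a rough sketch of the kind of code that can trigger it. The struct layout and the names (struct regs, copy_pair) are invented for illustration and are not the actual code; whether a given compiler really merges the two loads depends on version and options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a and b are adjacent 32-bit members, and the pair
   starts 4 bytes into the struct. If the struct itself sits on an 8-byte
   boundary, the pair (a, b) is 4-byte aligned but not 8-byte aligned. */
struct regs {
    uint32_t tag;
    uint32_t a;
    uint32_t b;
    uint32_t pad;
};

static_assert(offsetof(struct regs, a) == 4, "a starts at offset 4");
static_assert(offsetof(struct regs, b) == 8, "b directly follows a");

/* Reads two successive 32-bit members. An optimizer may legally fuse the
   two 32-bit loads into one 64-bit load from &src->a. */
static void copy_pair(const struct regs *src, uint32_t out[2])
{
    out[0] = src->a;
    out[1] = src->b;
}
```

Compiling something like this with -S at -O1 and at -O2 and comparing the loads is a quick way to see whether the merge happens with a given toolchain.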
If you dig a little deeper, you figure out that GCC allows itself to do this because AArch64 CPUs allow non-aligned accesses. So GCC optimizes away, and goes on. Now why am I getting an exception if this is allowed by the CPU? Because it is, but not always. You have to enable the MMU, from what I gathered, to allow unaligned accesses (with the MMU off, data accesses are treated as Device memory, which does not permit them). Otherwise, you get an exception. And at the stage my code runs, the MMU is not yet enabled...
But you know what's worse? After all, it's not trivial indeed, but you could still say it's just a case of RTFM. Which would be fair enough. Except that... I enabled the "-mstrict-align" option, which is supposed to PREVENT GCC from assuming it can generate unaligned accesses. But guess what, it doesn't do squat here, and it still generates this 64-bit access. I think this looks like a nice bug.
Hope this can be helpful for others, if you ever work on ARM CPUs in 64-bit mode at a very low level. This one took me a while, all the more so since I'm not ultra familiar with AArch64 assembly, so this 64-bit access was not immediately obvious to spot without reading the instruction set manual...
The only fix for now is to cut down the opt. level to -O1 max, and the code runs with no exception. No fucked-up unaligned access is generated.
The above instruction is duly replaced with "ldr w0, [sp, 52]" and later "ldr w0, [sp, 56]", which is what I expected.
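For the record, d0 is a 64-bit register (the low 64 bits of the FP/SIMD register v0), and w0 a 32-bit one (the low 32 bits of the general-purpose register x0).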
Starting with -O2, it's fucked up and the "-mstrict-align" option appears to be ignored.
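If dropping to -O1 for the whole build is too heavy-handed, one possible alternative is to read the two fields through volatile-qualified pointers, since compilers are not supposed to merge or widen volatile accesses. This is only a sketch under that assumption, not checked against the 8.3 toolchain in question; struct fields, read32 and copy_pair_split are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with two adjacent 32-bit members at offsets 4 and 8. */
struct fields {
    uint32_t tag;
    uint32_t a;
    uint32_t b;
    uint32_t pad;
};

/* One 32-bit volatile read; each call should stay a separate 32-bit load. */
static inline uint32_t read32(const volatile uint32_t *p)
{
    return *p;
}

/* Copies a and b with two distinct 32-bit accesses instead of letting the
   optimizer fuse them into one 64-bit access. */
static void copy_pair_split(const struct fields *src, uint32_t out[2])
{
    assert(src != NULL && out != NULL);
    out[0] = read32(&src->a);
    out[1] = read32(&src->b);
}
```

The cost is that the compiler can no longer reorder or cache those reads either, so this is best kept to the few accesses that run before the MMU is on. Checking the -S output is still the only way to be sure what actually gets emitted.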
First time I've run into a GCC bug in a long time here. It had served me very well until now. Maybe the official ARM version (AARCH64-ELF 8.3) is borked? I'll have to try with a custom-built one off a newer GCC version (9.1/9.2)...
Thanks for your attention.