Cortex-M3+ GCC unaligned access quirks

Perhaps this could save someone from the same pitfall:

As we know, Cortex-M3 and above can handle unaligned memory accesses. Let's try the following code:

--- Code: ---//compile with -mcpu=cortex-m3 -O2/-O3/-Os
extern void MyFn(int a, int b);

//Disable for now
//#define WITH_PACK

#ifdef WITH_PACK
    #define PACKED __attribute__((packed))
#else
    #define PACKED
#endif

typedef struct
{
    int a, b;
} PACKED MyStruct;

void DoTest(unsigned char *src)
{
    MyStruct *ptr = (MyStruct *)src;
    MyFn(ptr->a, 1); //OK
    MyFn(4, ptr->b); //OK
    MyFn(ptr->a, ptr->b); //Surprise!
}

--- End code ---
- everything looks fine, the core should handle int field accesses even if the src pointer is unaligned, but in reality it UsageFaults on the third MyFn() call :)
The reason is that GCC combines sequential memory accesses into a single LDRD or LDM instruction, and those, unlike a single LDR, cannot handle unaligned addresses.
The solution is to make MyStruct packed, even though it already looks perfectly packed. This doesn't split all accesses into individual bytes: GCC understands the M3's capabilities and simply issues two separate LDR instructions.


Upd: the -Wcast-align option detects such cases.
But the solution of packing the struct type feels wrong to me (1. it can cause unwanted packing if the field types are less convenient, 2. it breaks LDR->LDM optimizations everywhere, including places where the alignment is correct by design). The problem is not the struct type itself, but the more local bad pointer type. Does anyone know a better solution?

Here's the correct approved answer, in which a modern compiler will automatically use unaligned loads iff the CPU supports them, and otherwise (e.g. on a Cortex-M0) will actually call memcpy or use aligned loads and shifts and stuff.


--- Code: ---#include <string.h>

void DoTest(unsigned char *src)
{
    MyStruct s;
    memcpy(&s, src, sizeof(s));
    MyFn(s.a, s.b); // No surprise!
}

--- End code ---

--- Code: ---DoTest:
        mov     r2, r0
        sub     sp, sp, #8
        mov     r3, sp
        ldr     r0, [r0]  @ unaligned
        ldr     r1, [r2, #4]      @ unaligned
        stmia   r3!, {r0, r1}
        add     sp, sp, #8
        b       MyFn

--- End code ---

Note how the compiler explicitly marks that those ldr are unaligned.

I'm not sure why gcc is failing to optimise away the stack frame and the stmia. Clang does much better, though the stupid thing could just reorder the two loads and avoid the extra mov.


--- Code: ---DoTest:
        ldr     r2, [r0]
        ldr     r1, [r0, #4]
        mov     r0, r2
        b       MyFn

--- End code ---

Has godbolt.org been REALLY sluggish for anyone else recently?  Do we need to donate to Matt for more servers?


--- Quote from: abyrvalg on September 22, 2023, 11:24:40 am ---Does anyone know a better solution?

--- End quote ---
Yes. Never, ever cast structs onto pointers of a different type (especially void or byte-sized pointers). That is bad coding practice and begging for a world of pain on your bare knees. You have a compiler that knows how to align variable types while adhering to the memory access abilities of the platform. Don't throw that out of the window.


--- Quote from: nctnico on September 22, 2023, 07:57:45 pm ---Never, ever cast structs onto pointers of a different type (especially void or byte-sized pointers). That is bad coding practice and begging for a world of pain on your bare knees.

--- End quote ---

How else would you do "type erasure", for instance when you call qsort() in order to sort an array of structs?
Or how can you memcpy() a struct w/o casting the struct pointer to void* ?


--- Quote from: brucehoult on September 22, 2023, 07:41:51 pm ---Has godbolt.org been REALLY sluggish for anyone else recently?

--- End quote ---

Yeah. Some days it's very slow. Right now for me, it is relatively snappy.
As a quick note, I think the fact that it triggers a recompile at each text change in the editor is a very bad idea. There should be a "compile" button instead, to avoid unnecessarily using server resources.

