Author Topic: ARM Unaligned data... (Read 5646 times)

nctnico · « **Reply #25 on:** August 18, 2019, 05:00:40 pm »

Quote from: andersm on August 18, 2019, 04:28:20 pm

Quote from: mfro on August 18, 2019, 04:01:54 pm
Not only I am unable to find any mention of such usage in the gcc documentation [...] My understanding is, that this is still UB.
It is an extension to the standard. A compiler can extend the standard in any way it sees fit.

That is one of the reasons why it is bad. Also the behaviour isn't guaranteed by GCC so mfro is right; you shouldn't count on how the compiler deals with the situation.

In general casting one type into another is asking for trouble down the road and should be avoided in code used for production software.

mfro · « **Reply #26 on:** August 18, 2019, 05:25:30 pm »

Quote from: andersm on August 18, 2019, 04:28:20 pm

Quote from: mfro on August 18, 2019, 04:01:54 pm
Not only I am unable to find any mention of such usage in the gcc documentation (but will be happy if you could point me to such), but I also can't see that this usage would suddenly make upcasting a char pointer (that has no proper alignment for uint32_t) to an uint32_t pointer legal.
I am not accessing the array via an uint32_t pointer, I am accessing it through an unaligned_uint32_t pointer. The unaligned_uint32_t type has a minimum alignment requirement of 1 byte. The documentation can be found at https://gcc.gnu.org/onlinedocs/gcc-9.2.0/gcc/Common-Type-Attributes.html#index-aligned-type-attribute

Quote
My understanding is, that this is still UB.
It is an extension to the standard. A compiler can extend the standard in any way it sees fit.

That's the document I was reading.

To me, it just says the compiler would ensure that a variable has the specified minimum alignment (to, e.g. make sure a char is aligned to the requirements of an int), but it doesn't say it would allow lowering the alignment requirements of a type and generate (special) code to enable dereferencing such a variable through a pointer to that type.

Did some tests with an m68k toolchain and gcc always generated the exact same (working) code no matter if I accessed the variable through an uint32_t pointer or an unaligned_uint32_t pointer.

Nominal Animal · « **Reply #27 on:** August 18, 2019, 06:12:02 pm »

Quote from: westfw on August 17, 2019, 06:34:36 am

Is there a well-known trick for reading unaligned data on the ARM chips that don't support it in hardware?

No, there isn't. (If such a trick exists, it is as obscure as secret undocumented instructions, and probably involves those.)

The load-shift-add method takes 8 or 10 instructions to read a 32-bit value: four byte loads, three shifts, and three adds. Anything cleverer, like reading the two aligned 32-bit words covering the desired value, leads to longer machine code. The simple method is just too simple to beat with anything more complex.

Personally, I do prefer explicit casting and notation to ensure both us humans and future versions of GCC parse the intent correctly, i.e.

Code:

#include <stdint.h>

/* Read 32-bit unsigned integer from a possibly unaligned pointer.
    get_u32_native(ptr):  Native byte order.
    get_u32_swapped(ptr): Swapped byte order.
    get_u32_le(ptr):      Little-endian byte order (least significant byte first).
    get_u32_be(ptr):      Big-endian byte order (most significant byte first).
*/

static inline uint32_t get_u32_le(const void *const ptr)
{
    const uint8_t *const byte = ptr;
    return  (uint32_t)byte[0]
         + ((uint32_t)byte[1] << 8)
         + ((uint32_t)byte[2] << 16)
         + ((uint32_t)byte[3] << 24);
}

static inline uint32_t get_u32_be(const void *const ptr)
{
    const uint8_t *const byte = ptr;
    return  (uint32_t)byte[3]
         + ((uint32_t)byte[2] << 8)
         + ((uint32_t)byte[1] << 16)
         + ((uint32_t)byte[0] << 24);
}

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define  get_u32_native   get_u32_le
#define  get_u32_swapped  get_u32_be
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define  get_u32_native   get_u32_be
#define  get_u32_swapped  get_u32_le
#else
#error Unsupported byte order.
#endif

These compile to 20 bytes on Cortex-M0, M0+, and M4, when using -O2 or -Os (and I always do, and recommend you do too).

The other way, say

Code:

#include <stdint.h>
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__

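static inline uint32_t get_u32_le(const void *const ptr)
{
    const uintptr_t  address = (uintptr_t)ptr;
    /* Round down to the aligned 32-bit word containing the first byte. */
    const uint32_t  *base = (const uint32_t *)(address & (~(uintptr_t)3));
    /* Always reads two words, even when ptr is already aligned. */
    const uint32_t   word[2] = { base[0], base[1] };
    const unsigned char  shift = (address & 3) << 3;
    /* Caution: if ptr is aligned, shift is 0 and (word[1] << 32) is undefined in C. */
    return (word[0] >> shift) + (word[1] << (32 - shift));
}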

#else
#error Not implemented
#endif

compiles to nine instructions (26 bytes) on Cortex-M4, 13 instructions (also 26 bytes) on Cortex-M0 and M0+; compared to 20 bytes for each of the earlier two functions. Furthermore, it accesses the following 32-bit integer if ptr happens to be aligned, and that can be quite problematic in some cases.
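As a sketch only (not one of the versions measured above), the aligned case could be guarded so the following word is never touched and the shift count never reaches 32. The name get_u32_le_words is made up for this example, and a little-endian target is assumed. The extra test and branch can only add instructions, which underlines why the byte-wise version is hard to beat.

```c
#include <stdint.h>

#if __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error This sketch assumes a little-endian target.
#endif

/* Hypothetical guarded variant of the two-word method.
   Reads base[1] only when ptr is not 32-bit aligned. */
static inline uint32_t get_u32_le_words(const void *const ptr)
{
    const uintptr_t       address = (uintptr_t)ptr;
    const uint32_t *const base = (const uint32_t *)(address & ~(uintptr_t)3);
    const unsigned        shift = (unsigned)(address & 3) << 3;

    /* Aligned: one plain load, no access to the next word. */
    if (shift == 0)
        return base[0];

    /* Unaligned: shift is 8, 16, or 24, so both shift counts are below 32. */
    return (base[0] >> shift) | (base[1] << (32 - shift));
}
```

Note that the unaligned path still reads the whole aligned word on either side of the value, so the buffer must be backed by storage that makes those reads valid.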

If you look at the generated code, you'll see that there are just too many individual operations needed to be done to achieve this, to get under the 10 instructions and 20 bytes of code limit, even with trickery.

The only situation when I find myself needing to access possibly unaligned pointers, is when processing binary data (or data stream) with a specific protocol. In those cases, I've always found the overhead of using such accessors warranted.
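To illustrate what that looks like in practice, here is a minimal sketch of parsing a header out of a byte stream with such accessors. The frame layout, struct frame_header, FRAME_HEADER_SIZE, parse_frame_header() and get_u16_le() are all invented for this example; only get_u32_le() follows the listing earlier in this post.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise accessors: valid for any alignment and any host byte order. */
static inline uint16_t get_u16_le(const void *const ptr)
{
    const uint8_t *const byte = ptr;
    return (uint16_t)((uint16_t)byte[0] | ((uint16_t)byte[1] << 8));
}

static inline uint32_t get_u32_le(const void *const ptr)
{
    const uint8_t *const byte = ptr;
    return  (uint32_t)byte[0]
         + ((uint32_t)byte[1] << 8)
         + ((uint32_t)byte[2] << 16)
         + ((uint32_t)byte[3] << 24);
}

/* Hypothetical wire format, for illustration only:
     offset 0: type      (1 byte)
     offset 1: sequence  (4 bytes, little-endian, not aligned)
     offset 5: length    (2 bytes, little-endian, not aligned) */
#define FRAME_HEADER_SIZE 7

struct frame_header {
    uint8_t  type;
    uint32_t sequence;
    uint16_t length;
};

/* Returns 0 on success, -1 if the buffer is too short for a header. */
static int parse_frame_header(const uint8_t *data, size_t size,
                              struct frame_header *out)
{
    if (size < FRAME_HEADER_SIZE)
        return -1;

    out->type     = data[0];
    out->sequence = get_u32_le(data + 1);
    out->length   = get_u16_le(data + 5);
    return 0;
}
```

The in-memory struct keeps its natural alignment; only the reads from the wire buffer go through the accessors, so the cost is paid once per field at the protocol boundary.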

(When dealing with massive amounts of binary data, I've crafted my own protocol which ensures data alignment. I have done this, BTW, for reading and writing molecular dynamic data from both Fortran 95 and C code (although the Fortran code requires one nonstandard feature, reading raw binary data without record length words, but all Fortran 90/95 compilers I had access to, did support that). This was to allow a distributed simulation to save oodles of data to node-local storage without slowing down the simulation; data combining and slicing was done afterwards using a helper utility, when collecting the resulting data.)

andersm · « **Reply #28 on:** August 18, 2019, 06:39:06 pm »

Quote from: nctnico on August 18, 2019, 05:00:40 pm

Also the behaviour isn't guaranteed by GCC so mfro is right; you shouldn't count on how the compiler deals with the situation.

It's guaranteed as much as any of its other documented behavior.

Quote from: mfro on August 18, 2019, 05:25:30 pm

To me, it just says the compiler would ensure that a variable has the specified minimum alignment (to, e.g. make sure a char is aligned to the requirements of an int), but it doesn't say it would allow lowering the alignment requirements of a type and generate (special) code to enable dereferencing such a variable through a pointer to that type.

Don't be daft. There would be absolutely no point in having the compiler generate data layouts it can't access. It's no different than using the packed attribute on structs.
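For comparison, a minimal sketch of the packed case (GCC/Clang extension; struct record and record_value() are names made up for this example). The member sits at offset 1, and the compiler is expected to emit whatever access sequence the target needs to read it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: a tag byte followed directly by a 32-bit value.
   With the packed attribute there is no padding, so 'value' is at offset 1. */
struct __attribute__((packed)) record {
    uint8_t  tag;
    uint32_t value;
};

/* Reads the possibly unaligned member; the compiler picks the access code. */
static inline uint32_t record_value(const struct record *r)
{
    return r->value;
}
```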

Quote

Did some tests with an m68k toolchain and gcc always generated the exact same (working) code no matter if I accessed the variable through an uint32_t pointer or an unaligned_uint32_t pointer.

How did you test? This is with GCC 9.1:

Code:

#include <stdint.h>

typedef uint32_t uuint32_t __attribute__((aligned(1)));

extern char buffer[];

uint32_t getuuint32(uint8_t p[]) {
    return *(uuint32_t*)p;
}

uint32_t getuint32(uint8_t p[]) {
    return *(uint32_t*)p;
}

00000090 <getuuint32>:
  90:	206f 0004      	moveal %sp@(4),%a0
  94:	7200           	moveq #0,%d1
  96:	1210           	moveb %a0@,%d1
  98:	e149           	lslw #8,%d1
  9a:	4841           	swap %d1
  9c:	4241           	clrw %d1
  9e:	7000           	moveq #0,%d0
  a0:	1028 0001      	moveb %a0@(1),%d0
  a4:	4840           	swap %d0
  a6:	4240           	clrw %d0
  a8:	8280           	orl %d0,%d1
  aa:	7000           	moveq #0,%d0
  ac:	1028 0002      	moveb %a0@(2),%d0
  b0:	e188           	lsll #8,%d0
  b2:	8081           	orl %d1,%d0
  b4:	8028 0003      	orb %a0@(3),%d0
  b8:	4e75           	rts

000000ba <getuint32>:
  ba:	206f 0004      	moveal %sp@(4),%a0
  be:	2010           	movel %a0@,%d0
  c0:	4e75           	rts

nctnico · « **Reply #29 on:** August 18, 2019, 08:15:38 pm »

Quote from: andersm on August 18, 2019, 06:39:06 pm

Quote from: nctnico on August 18, 2019, 05:00:40 pm
Also the behaviour isn't guaranteed by GCC so mfro is right; you shouldn't count on how the compiler deals with the situation.
It's guaranteed as much as any of its other documented behavior.

Then explain why I have seen it go wrong using the code presented in this thread.

andersm · « **Reply #30 on:** August 18, 2019, 08:41:22 pm »

Quote from: nctnico on August 18, 2019, 08:15:38 pm

Then explain why I have seen it go wrong using the code presented in this thread.

Where?

nctnico · « **Reply #31 on:** August 18, 2019, 08:50:58 pm »

Quote from: andersm on August 18, 2019, 08:41:22 pm

Quote from: nctnico on August 18, 2019, 08:15:38 pm
Then explain why I have seen it go wrong using the code presented in this thread.
Where?

On an ARM cpu using GCC.

andersm · « **Reply #32 on:** August 18, 2019, 09:42:15 pm »

I was asking for an example.

mfro · « **Reply #33 on:** August 19, 2019, 02:27:26 am »

Quote from: andersm on August 18, 2019, 06:39:06 pm

Don't be daft. There would be absolutely no point in having the compiler generate data layouts it can't access. It's no different than using the packed attribute on structs.

The documentation - at least according to my interpretation - only talks about placement of variables (so my interpretation is that it was originally meant to increase alignment), not about generating specific access code when you lower alignment (moreover, you obviously can't change placement through a pointer) whereas it explicitly talks about special code for struct field access for the packed attribute.

Quote from: andersm on August 18, 2019, 06:39:06 pm

Quote
Did some tests with an m68k toolchain and gcc always generated the exact same (working) code no matter if I accessed the variable through an uint32_t pointer or an unaligned_uint32_t pointer.
How did you test? This is with GCC 9.1:
...

I basically did the same with GCC 4.6.4 (which is, admittedly, quite old, but the documentation for the aligned() attribute has not changed since, as far as I can tell). The generated code is exactly the same in both cases.

The fact that your (newer) compiler appears to generate different code depending on the (apparently reduced) alignment of the type indeed supports your claim, but frankly, I can't see this documented. As long as it isn't explicitly documented, your code still appears dubious to me as it is violating strict aliasing rules. It might be that what you see is intended behaviour, but then the docs should be more explicit. As long as they aren't (it's still possible that I've overlooked something, however, and I would be glad if you could prove me wrong, as I would agree this would be an elegant solution if it really was legal), it might as well be a side effect and pure coincidence that it works for you (potentially subject to change with the next release).

Does the code change in any way when you use -O3 as opposed to -O2?

[P.S.: nvm, just tried with gcc 6.2 and that one indeed behaves like yours (regardless of optimisation level). That does not invalidate anything I said above, however]

legacy · « **Reply #34 on:** August 19, 2019, 05:50:39 am »

GCC-addicted code is no good.

mfro · « **Reply #35 on:** August 19, 2019, 03:39:02 pm »

Today I asked on the gcc mailing list whether your approach is a valid use case for __attribute__((aligned())).

Guess what, the first few opinions that flew in are even farther apart than ours.

magic · « **Reply #36 on:** August 20, 2019, 06:48:07 am »

LOL. I guess it's some "users" list filled with armchair language lawyers who participated neither in writing the compiler nor its specification

Strictly speaking, casting from char* to __attribute__((aligned(1))) uint32_t* does not increase alignment requirements of the pointer, unlike casting to plain uint32_t*.

andersm · « **Reply #37 on:** August 20, 2019, 11:27:28 pm »

Quote from: mfro on August 19, 2019, 03:39:02 pm

Today I asked on the gcc mailing list whether your approach is a valid use case for __attribute__((aligned()))

I saw your post. The documentation was changed to match the behavior (i.e. allowing decreasing alignment via typedefs) in 2018, though the ticket was created in 2016 and mentions the behavior going back to at least GCC 4.5. GCC's testsuite has tested the functionality since at least 2012. Older versions of GCC did indeed explicitly state that the attribute could only be used to increase alignment, but that has not been true for some time.

In any case, GCC is smart enough that the code produced using the attribute is almost identical to the manual load-and-shift version, so it's really just syntactic sugar.
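A minimal side-by-side sketch of the two forms, for GCC or Clang. The function names are made up for this example, and the may_alias attribute is an addition of this sketch (it was not part of the typedef used earlier in the thread); it is there to take the strict aliasing question off the table when reading out of a byte buffer.

```c
#include <stdint.h>

/* uint32_t with its alignment requirement lowered to 1 byte.
   may_alias is an assumption added here, not taken from the thread. */
typedef uint32_t unaligned_uint32_t __attribute__((aligned(1), may_alias));

/* Attribute-based read in native byte order. */
static inline uint32_t get_u32_attr(const void *ptr)
{
    return *(const unaligned_uint32_t *)ptr;
}

/* Manual load-and-shift read in native byte order. */
static inline uint32_t get_u32_manual(const void *ptr)
{
    const uint8_t *b = ptr;
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return  (uint32_t)b[0]        | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return  (uint32_t)b[3]        | ((uint32_t)b[2] << 8)
         | ((uint32_t)b[1] << 16) | ((uint32_t)b[0] << 24);
#else
#error Unsupported byte order.
#endif
}
```

Both functions should return the same value for any address. Whether the generated code matches depends on the target and compiler version, so it is worth checking the disassembly for your own toolchain.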

