Author Topic: ARM Unaligned data...  (Read 1719 times)

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 19839
  • Country: nl
    • NCT Developments
Re: ARM Unaligned data...
« Reply #25 on: August 18, 2019, 05:00:40 pm »
My understanding is, that this is still UB.
It is an extension to the standard. A compiler can extend the standard in any way it sees fit.
That is one of the reasons why it is bad. Also the behaviour isn't guaranteed by GCC so mfro is right; you shouldn't count on how the compiler deals with the situation.

In general casting one type into another is asking for trouble down the road and should be avoided in code used for production software.
« Last Edit: August 18, 2019, 05:05:23 pm by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline mfro

  • Regular Contributor
  • *
  • Posts: 71
  • Country: de
Re: ARM Unaligned data...
« Reply #26 on: August 18, 2019, 05:25:30 pm »
Not only I am unable to find any mention of such usage in the gcc documentation (but will be happy if you could point me to such), but I also can't see that this usage would suddenly make upcasting a char pointer (that has no proper alignment for uint32_t) to an uint32_t pointer legal.
I am not accessing the array via an uint32_t pointer, I am accessing it through an unaligned_uint32_t pointer. The unaligned_uint32_t type has a minimum alignment requirement of 1 byte. The documentation can be found at https://gcc.gnu.org/onlinedocs/gcc-9.2.0/gcc/Common-Type-Attributes.html#index-aligned-type-attribute

Quote
My understanding is, that this is still UB.
It is an extension to the standard. A compiler can extend the standard in any way it sees fit.

That's the document I was reading.

To me, it just says the compiler would ensure that a variable has the specified minimum alignment (to, e.g. make sure a char is aligned to the requirements of an int), but it doesn't say it would allow lowering the alignment requirements of a type and generate (special) code to enable dereferencing such a variable through a pointer to that type.

Did some tests with an m68k toolchain and gcc always generated the exact same (working) code no matter if I accessed the variable through an uint32_t pointer or an unaligned_uint32_t pointer.

« Last Edit: August 18, 2019, 05:28:33 pm by mfro »
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 1775
  • Country: fi
    • My home page and email address
Re: ARM Unaligned data...
« Reply #27 on: August 18, 2019, 06:12:02 pm »
Is there a well-known trick for reading unaligned data on the ARM chips that don't support it in hardware?
No, there isn't.  (If such a trick exists, it is as obscure as secret undocumented instructions, and probably involves those.)

The load-shift-add method takes 8 or 10 instructions to read a 32-bit value: four byte loads, three shifts, and three adds.  Anything cleverer, like reading the two aligned 32-bit words covering the desired value, leads to longer machine code.  The simple method is just too simple to beat with anything more complex.

Personally, I do prefer explicit casting and notation to ensure both us humans and future versions of GCC parse the intent correctly, i.e.
Code: [Select]
#include <stdint.h>

/* Read 32-bit unsigned integer from a possibly unaligned pointer.
    get_u32_native(ptr):  Native byte order.
    get_u32_swapped(ptr): Swapped byte order.
    get_u32_le(ptr):      Little-endian byte order (least significant byte first).
    get_u32_be(ptr):      Big-endian byte order (most significant byte first).
*/

static inline uint32_t get_u32_le(const void *const ptr)
{
    const uint8_t *const byte = ptr;
    return  (uint32_t)byte[0]
         + ((uint32_t)byte[1] << 8)
         + ((uint32_t)byte[2] << 16)
         + ((uint32_t)byte[3] << 24);
}

static inline uint32_t get_u32_be(const void *const ptr)
{
    const uint8_t *const byte = ptr;
    return  (uint32_t)byte[3]
         + ((uint32_t)byte[2] << 8)
         + ((uint32_t)byte[1] << 16)
         + ((uint32_t)byte[0] << 24);
}

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define  get_u32_native   get_u32_le
#define  get_u32_swapped  get_u32_be
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define  get_u32_native   get_u32_be
#define  get_u32_swapped  get_u32_le
#else
#error Unsupported byte order.
#endif
These compile to 20 bytes on Cortex-M0, M0+, and M4, when using -O2 or -Os (and I always do, and recommend you do too).

The other way, say
Code: [Select]
#include <stdint.h>
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__

static inline uint32_t get_u32_le(const void *const ptr)
{
    const uintptr_t  address = (uintptr_t)ptr;
    const uint32_t  *base = (const uint32_t *)(address & (~(uintptr_t)3));
    const uint32_t   word[2] = { base[0], base[1] };
    const unsigned char  shift = (address & 3) << 3;
    return (word[0] >> shift) + (word[1] << (32 - shift));
}

#else
#error Not implemented
#endif
compiles to nine instructions (26 bytes) on Cortex-M4, 13 instructions (also 26 bytes) on Cortex-M0 and M0+; compared to 20 bytes for each of the earlier two functions.  Furthermore, it accesses the following 32-bit integer if ptr happens to be aligned, and that can be quite problematic in some cases.

If you look at the generated code, you'll see that this approach simply needs too many individual operations to get under the 10 instructions and 20 bytes of the byte-wise version, even with trickery.
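One more caveat with the two-word variant, offered as a sketch and not as something measured on hardware: when ptr is already aligned, shift is 0 and the expression shifts word[1] left by 32, which is undefined for a 32-bit type in C. A guarded version handles the aligned case separately, which also avoids touching the following word, at the cost of yet more instructions. The name get_u32_le_words is hypothetical, and the code assumes the data really lives in aligned 32-bit words.

```c
#include <stdint.h>

/* Hypothetical guarded variant of the two-word read.
   It always yields the little-endian interpretation of the word values,
   and assumes the backing storage consists of aligned 32-bit words. */
static inline uint32_t get_u32_le_words(const void *const ptr)
{
    const uintptr_t  address = (uintptr_t)ptr;
    const uint32_t  *base = (const uint32_t *)(address & ~(uintptr_t)3);
    const unsigned   shift = (unsigned)(address & 3) << 3;

    /* Aligned: a single load, no shift by 32, no access to the next word. */
    if (shift == 0)
        return base[0];

    /* Unaligned: upper part of the first word, lower part of the second. */
    return (base[0] >> shift) | (base[1] << (32 - shift));
}
```

The extra compare and branch only add to the code size, which is one more reason the byte-wise version is hard to beat.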

The only situation when I find myself needing to access possibly unaligned pointers, is when processing binary data (or data stream) with a specific protocol.  In those cases, I've always found the overhead of using such accessors warranted.
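As a rough illustration of that use case, here is a minimal sketch of pulling fields out of a byte stream with such accessors. The record layout (a 1-byte type, a little-endian 32-bit length, then a big-endian 32-bit checksum) and the names record_header, RECORD_HEADER_SIZE and parse_record_header are made up for the example; only the two accessors are taken from the code above.

```c
#include <stddef.h>
#include <stdint.h>

static inline uint32_t get_u32_le(const void *const ptr)
{
    const uint8_t *const byte = ptr;
    return  (uint32_t)byte[0]
         + ((uint32_t)byte[1] << 8)
         + ((uint32_t)byte[2] << 16)
         + ((uint32_t)byte[3] << 24);
}

static inline uint32_t get_u32_be(const void *const ptr)
{
    const uint8_t *const byte = ptr;
    return  (uint32_t)byte[3]
         + ((uint32_t)byte[2] << 8)
         + ((uint32_t)byte[1] << 16)
         + ((uint32_t)byte[0] << 24);
}

/* Hypothetical wire format: type (1 byte), length (u32 LE), checksum (u32 BE). */
struct record_header {
    uint8_t  type;
    uint32_t length;
    uint32_t checksum;
};

#define RECORD_HEADER_SIZE 9

/* Returns 0 on success, -1 if the buffer is too short for a header. */
static inline int parse_record_header(const uint8_t *data, size_t size,
                                      struct record_header *out)
{
    if (size < RECORD_HEADER_SIZE)
        return -1;

    out->type     = data[0];
    out->length   = get_u32_le(data + 1);  /* offset 1: not word-aligned within the record */
    out->checksum = get_u32_be(data + 5);  /* offset 5: not word-aligned within the record */
    return 0;
}
```

Because every multi-byte field goes through an accessor, the parser never depends on the alignment of data or on the byte order of the CPU.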

(When dealing with massive amounts of binary data, I've crafted my own protocol which ensures data alignment.  I have done this, BTW, for reading and writing molecular dynamic data from both Fortran 95 and C code (although the Fortran code requires one nonstandard feature, reading raw binary data without record length words, but all Fortran 90/95 compilers I had access to, did support that).  This was to allow a distributed simulation to save oodles of data to node-local storage without slowing down the simulation; data combining and slicing was done afterwards using a helper utility, when collecting the resulting data.)
 

Offline andersm

  • Super Contributor
  • ***
  • Posts: 1156
  • Country: fi
Re: ARM Unaligned data...
« Reply #28 on: August 18, 2019, 06:39:06 pm »
Also the behaviour isn't guaranteed by GCC so mfro is right; you shouldn't count on how the compiler deals with the situation.
It's guaranteed as much as any of its other documented behavior.

To me, it just says the compiler would ensure that a variable has the specified minimum alignment (to, e.g. make sure a char is aligned to the requirements of an int), but it doesn't say it would allow lowering the alignment requirements of a type and generate (special) code to enable dereferencing such a variable through a pointer to that type.
Don't be daft. There would be absolutely no point in having the compiler generate data layouts it can't access. It's no different than using the packed attribute on structs.

Quote
Did some tests with an m68k toolchain and gcc always generated the exact same (working) code no matter if I accessed the variable through an uint32_t pointer or an unaligned_uint32_t pointer.
How did you test? This is with GCC 9.1:

Code: [Select]
#include <stdint.h>

typedef uint32_t uuint32_t __attribute__((aligned(1)));

extern char buffer[];

uint32_t getuuint32(uint8_t p[]) {
    return *(uuint32_t*)p;
}

uint32_t getuint32(uint8_t p[]) {
    return *(uint32_t*)p;
}

00000090 <getuuint32>:
  90: 206f 0004      moveal %sp@(4),%a0
  94: 7200            moveq #0,%d1
  96: 1210            moveb %a0@,%d1
  98: e149            lslw #8,%d1
  9a: 4841            swap %d1
  9c: 4241            clrw %d1
  9e: 7000            moveq #0,%d0
  a0: 1028 0001      moveb %a0@(1),%d0
  a4: 4840            swap %d0
  a6: 4240            clrw %d0
  a8: 8280            orl %d0,%d1
  aa: 7000            moveq #0,%d0
  ac: 1028 0002      moveb %a0@(2),%d0
  b0: e188            lsll #8,%d0
  b2: 8081            orl %d1,%d0
  b4: 8028 0003      orb %a0@(3),%d0
  b8: 4e75            rts

000000ba <getuint32>:
  ba: 206f 0004      moveal %sp@(4),%a0
  be: 2010            movel %a0@,%d0
  c0: 4e75            rts

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 19839
  • Country: nl
    • NCT Developments
Re: ARM Unaligned data...
« Reply #29 on: August 18, 2019, 08:15:38 pm »
Also the behaviour isn't guaranteed by GCC so mfro is right; you shouldn't count on how the compiler deals with the situation.
It's guaranteed as much as any of its other documented behavior.
Then explain why I have seen it go wrong using the code presented in this thread.
 

Offline andersm

  • Super Contributor
  • ***
  • Posts: 1156
  • Country: fi
Re: ARM Unaligned data...
« Reply #30 on: August 18, 2019, 08:41:22 pm »
Then explain why I have seen it go wrong using the code presented in this thread.
Where?

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 19839
  • Country: nl
    • NCT Developments
Re: ARM Unaligned data...
« Reply #31 on: August 18, 2019, 08:50:58 pm »
Then explain why I have seen it go wrong using the code presented in this thread.
Where?
On an ARM cpu using GCC.
 

Offline andersm

  • Super Contributor
  • ***
  • Posts: 1156
  • Country: fi
Re: ARM Unaligned data...
« Reply #32 on: August 18, 2019, 09:42:15 pm »
I was asking for an example.

Offline mfro

  • Regular Contributor
  • *
  • Posts: 71
  • Country: de
Re: ARM Unaligned data...
« Reply #33 on: August 19, 2019, 02:27:26 am »

Don't be daft. There would be absolutely no point in having the compiler generate data layouts it can't access. It's no different than using the packed attribute on structs.
The documentation - at least according to my interpretation - only talks about placement of variables (so my interpretation is that it was originally meant to increase alignment), not about generating specific access code when you lower alignment (moreover, you obviously can't change placement through a pointer) whereas it explicitly talks about special code for struct field access for the packed attribute.

Quote
Did some tests with an m68k toolchain and gcc always generated the exact same (working) code no matter if I accessed the variable through an uint32_t pointer or an unaligned_uint32_t pointer.
How did you test? This is with GCC 9.1:
...

I basically did the same with GCC 4.6.4 (which is, admittedly, quite old, but the documentation for the aligned() attribute has not changed since then, as far as I can tell). The generated code is exactly the same in both cases.

The fact that your (newer) compiler appears to generate different code depending on the (apparently reduced) alignment of the type indeed supports your claim, but frankly, I can't see this documented. As long as it isn't explicitly documented, your code still appears dubious to me as it is violating strict aliasing rules. It might be that what you see is intended behaviour, but then the docs should be more explicit. As long as they aren't, it might as well be a side effect and pure coincidence that it works for you (potentially subject to change with the next release). It's still possible that I've overlooked something, however, and I would be glad if you could prove me wrong, as I agree this would be an elegant solution if it really was legal.

Does the code change in any way when you use -O3 as opposed to -O2?

[P.S.: nvm, just tried with gcc 6.2 and that one indeed behaves like yours (regardless of optimisation level). That does not invalidate anything I said above, however]
« Last Edit: August 19, 2019, 05:23:37 am by mfro »
 
The following users thanked this post: nctnico

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: ARM Unaligned data...
« Reply #34 on: August 19, 2019, 05:50:39 am »
GCC-addicted code is no good.
 

Offline mfro

  • Regular Contributor
  • *
  • Posts: 71
  • Country: de
Re: ARM Unaligned data...
« Reply #35 on: August 19, 2019, 03:39:02 pm »
Today I asked on the gcc mailing list whether your approach is a valid use case for __attribute__((aligned()))

Guess what, the first few opinions that flew in are even farther apart than ours  :palm:
 

Offline magic

  • Super Contributor
  • ***
  • Posts: 2334
  • Country: pl
Re: ARM Unaligned data...
« Reply #36 on: August 20, 2019, 06:48:07 am »
LOL. I guess it's some "users" list filled with armchair language lawyers who participated neither in writing the compiler nor its specification :)

Strictly speaking, casting from char* to __attribute__((aligned(1))) uint32_t* does not increase the alignment requirements of the pointer, unlike casting to plain uint32_t*.
 

Offline andersm

  • Super Contributor
  • ***
  • Posts: 1156
  • Country: fi
Re: ARM Unaligned data...
« Reply #37 on: August 20, 2019, 11:27:28 pm »
Today I asked on the gcc mailing list whether your approach is a valid use case for __attribute__((aligned()))
I saw your post. The documentation was changed to match the behavior (ie. allowing decreasing alignment via typedefs) in 2018, though the ticket was created in 2016 and mentions the behavior going back to at least GCC 4.5. GCC's testsuite has tested the functionality since at least 2012. Older versions of GCC did indeed explicitly state that the attribute could only be used to increase alignment, but that has not been true for some time.

In any case, GCC is smart enough that the code produced using the attribute is almost identical to the manual load-and-shift version, so it's really just syntactic sugar.
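For reference, a minimal sketch of the two spellings side by side, assuming GCC or a compatible compiler. The typedef name unaligned_uint32_t follows the earlier posts; the may_alias attribute is an addition that nobody in this thread used, included here only to sidestep the strict-aliasing objection, and the function names get_u32_attr and get_u32_shift are hypothetical.

```c
#include <stdint.h>

/* GCC extension: a 32-bit type with 1-byte alignment.
   may_alias is an assumption added for this sketch, not part of the thread's code. */
typedef uint32_t unaligned_uint32_t __attribute__((aligned(1), may_alias));

/* Native byte order read through the reduced-alignment type. */
static inline uint32_t get_u32_attr(const void *const ptr)
{
    return *(const unaligned_uint32_t *)ptr;
}

/* Native byte order read using explicit byte loads and shifts. */
static inline uint32_t get_u32_shift(const void *const ptr)
{
    const uint8_t *const byte = ptr;
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return  (uint32_t)byte[0]
         + ((uint32_t)byte[1] << 8)
         + ((uint32_t)byte[2] << 16)
         + ((uint32_t)byte[3] << 24);
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return  (uint32_t)byte[3]
         + ((uint32_t)byte[2] << 8)
         + ((uint32_t)byte[1] << 16)
         + ((uint32_t)byte[0] << 24);
#else
#error Unsupported byte order.
#endif
}
```

Both functions should return the same value for any pointer. The manual version has the advantage of compiling unchanged on compilers without the attribute.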

