Author Topic: The umaal instruction for __ARM_ARCH >= 6.  (Read 1365 times)


Offline AVI-crak (Topic starter)

  • Regular Contributor
  • *
  • Posts: 125
  • Country: ru
    • Rtos
The umaal instruction for __ARM_ARCH >= 6.
« on: August 21, 2023, 08:33:29 pm »
Good people, please help if you can. I'm tired of banging my head against the wall; nothing works.
GCC stubbornly refuses to use the umaal instruction (u64 = u32 + u32 + u32*u32).
Here is a simple piece of code:
Code: [Select]
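#include <stdint.h>

/* High 64 bits of val1 * val2, shifted left by one bit when the top bit is clear.
   *nc receives the top bit before that shift (1 = no shift was applied). */
uint64_t mult64toH128(uint64_t val1, uint64_t val2, uint32_t* nc){
    /* high word of lo1 * lo2 */
    uint32_t c1 = (uint64_t) (val1 & 0xffffffff) * (val2 & 0xffffffff) >> 32;
    uint64_t a2a1 = (val1 & 0xffffffff) * (val2 >> 32);  /* lo1 * hi2 */
    /* low(lo1 * hi2) + carry word + hi1 * lo2; this sum cannot overflow 64 bits */
    uint64_t b2b1 = (a2a1 & 0xffffffff) + c1 + (val1 >> 32) * (val2 & 0xffffffff);
    c1 = val1 >> 32;
    /* upper 64 bits of the 128-bit product */
    val1 = (a2a1 >> 32) + (b2b1 >> 32) + c1 * (val2 >> 32);
    *nc = val1 >> 63;
    if((val1 >> 63) == 0){
        val1 <<= 1;
        val1 |= (b2b1 >> 31) & 1;  /* bit 63 of the low 64 bits of the product */
    }
    return val1;
}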
This is how clang 16.0.0 compiles it: https://godbolt.org/z/TTYvEYdvh
and this is GCC 13.2.0: https://godbolt.org/z/EhYb8T1fv
The speed difference is very large. I'm only allowed to use GCC, with no inline assembly, and the result is slow. If I use the built-in functions for long double instead, it becomes dozens of times slower.
And I need it to be fast.
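For readers who haven't met the instruction: a minimal C model of what umaal computes, assuming the documented semantics RdHi:RdLo = Rn*Rm + RdLo + RdHi. The name umaal_model is made up for illustration; it is not a CMSIS or GCC function.

```c
#include <stdint.h>

/* Hypothetical reference model of UMAAL (illustration only):
   returns a * b + c + d as a full 64-bit value. */
static inline uint64_t umaal_model(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    /* Widen before multiplying so the product keeps all 64 bits. */
    return (uint64_t)a * b + c + d;
}
```

The reason the instruction is so handy for multi-word multiplication is that this sum can never overflow: (2^32 - 1)^2 + 2*(2^32 - 1) = 2^64 - 1, so both carry words always fit.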
 

Offline DavidAlfa

  • Super Contributor
  • ***
  • Posts: 5914
  • Country: es
Re: The umaal instruction for __ARM_ARCH >= 6.
« Reply #1 on: August 24, 2023, 07:59:47 pm »
Can't you use CMSIS __umaal?
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14490
  • Country: fr
Re: The umaal instruction for __ARM_ARCH >= 6.
« Reply #2 on: August 24, 2023, 10:18:39 pm »
The best way of using specific assembly instructions is just to write assembly.
Otherwise, check whether there is a builtin GCC function for it. Which would be circumventing the "no assembler inserts" somewhat.

If the rule is no assembly code within the C source files, you could code small functions in assembly entirely and put them in separate .s files. If the latter is also forbidden, I have an idea about what I would say to your boss.

But trying to twist C code to make a given compiler generate some specific assembly is about the worst option you could think of. Even if by any chance you managed to get what you wanted, it could change with any different version of the compiler, different compilation options or even just by inserting a new line of code.

If your boss/team rules make(s) this a better option than writing assembly directly,  :-// .

Now if no assembly at all is the rule - in itself, why not - then trying to get the compiler to use specific assembly instructions makes no sense in that setup IMHO. If it becomes needed for performance reasons, either have the rules changed or use a more powerful MCU. What can we say.
 
The following users thanked this post: janoc, newbrain

Online ejeffrey

  • Super Contributor
  • ***
  • Posts: 3727
  • Country: us
Re: The umaal instruction for __ARM_ARCH >= 6.
« Reply #3 on: August 25, 2023, 02:27:24 am »
Yeah, getting a compiler to generate specific instructions is worth a small amount of time if the difference is significant, mostly to figure out if there is an architecture or optimization option you are missing.  But once the easy avenues are exhausted, it's not worth it.  It's inherently brittle and likely to change based on totally unpredictable conditions.

You say you have measured it and the difference is large and presumably important for your application.  Show that to your boss, and then your options are clear: 1) live with the performance, declare it a nice-to-have but not critical, 2) relax the assembly rules or find a way around them (such as compiler intrinsics or library assembly macros), or 3) use a faster processor.

Honestly I've personally never heard of an environment that was both 1) (CPU) performance critical, and 2) completely disallowed assembly.  There might be rules on assembly use and how it gets used, documented, and approved, but not outright banned.  In fact, I would say that if you actually have a blanket rule like that, basically by definition you are not working on a performance critical project.
 
The following users thanked this post: harerod

Offline AVI-crak (Topic starter)

  • Regular Contributor
  • *
  • Posts: 125
  • Country: ru
    • Rtos
Re: The umaal instruction for __ARM_ARCH >= 6.
« Reply #4 on: August 25, 2023, 11:01:27 pm »
Quote from: DavidAlfa on August 24, 2023, 07:59:47 pm
Can't you use CMSIS __umaal?

The instruction exists, but CMSIS has no function for it. The assumption is that the compiler emits umaal automatically. However, the semiconductor industry has managed to produce non-conforming microcontrollers that clearly violate the standards, so conditional compilation doesn't work...
Code: [Select]
#if (defined __ARM_ARCH && __ARM_ARCH > 6)
https://godbolt.org/z/6dTzz7b3x
I dug through the GCC sources: the umaal instruction is also present on some exotic processors with __ARM_ARCH == 6. In my opinion, ARM is a real mess.

These are the problems I was warned about, and they are the reason inline assembly is prohibited across the whole project. If they can be handled properly, though, it would be allowed.

So I need a way to detect the umaal instruction. In the simplest version, an inline-assembly insert just breaks the build on targets that lack it, but that is not enough: instead of stopping with an error, the function has to be replaced with a software version.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14490
  • Country: fr
Re: The umaal instruction for __ARM_ARCH >= 6.
« Reply #5 on: August 26, 2023, 03:37:15 am »
Maybe your issue becomes a bit clearer.

The UMAAL instruction is a DSP instruction that is available only on Cortex-M4 and above as far as I've gathered, and for the M4, DSP instructions are possibly optional? Not sure.
The M4 and M7 are ARMv7 architectures.

Yes, the ARM family has become a bit complex, but if you stick to a limited number of supported variants, figuring out what kind of instructions they support shouldn't be too hard.

I'm not sure that the condition '__ARM_ARCH > 6' is enough to guarantee that UMAAL is available. Conversely, while I'm personally not aware of any ARMv6-based CPU that has this instruction, it may be a possibility.

Unless your goal is really to write completely reusable code for any ARM-based CPU, without anything to modify, though, I'm not sure that detecting the availability of the UMAAL instruction is the best approach, however nice it looks.
You could just define a macro that indicates it's supported based on the actual target that the code is being compiled for.

Alternatively, if you really want something more automated, you could do it at "build" time. A bit like autoconf tools do. So as a preliminary build step (that would be a dependency for all files that could use this assembly instruction), you could run the assembler on a very small assembly file containing this instruction, and if the assembler succeeds, define the macro that indicates the instruction is supported. Otherwise leave it undefined.
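A rough sketch of the first option (a macro chosen from the actual compile target) might look like the header fragment below. HAVE_UMAAL and have_umaal() are names invented for this example, and the list of targets is an assumption that would need to be checked against the parts actually supported; only the __ARM_ARCH_7EM__ and __ARM_ARCH_7A__ tests are compiler-provided macros.

```c
/* Hypothetical project header: HAVE_UMAAL is our own macro, not a compiler one.
   The allow-list is maintained by hand for the targets the project supports. */
#if defined(__ARM_ARCH_7EM__)      /* Armv7E-M, e.g. Cortex-M4 / Cortex-M7 */
#  define HAVE_UMAAL 1
#elif defined(__ARM_ARCH_7A__)     /* Armv7-A application cores */
#  define HAVE_UMAAL 1
#else                              /* anything else: use the plain C path */
#  define HAVE_UMAAL 0
#endif

/* Lets ordinary code (or a unit test) see which path was selected. */
int have_umaal(void)
{
    return HAVE_UMAAL;
}
```

The build-time probe variant would end up defining the same macro, only from the build script instead of from this list.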
 

Offline DavidAlfa

  • Super Contributor
  • ***
  • Posts: 5914
  • Country: es
Re: The umaal instruction for __ARM_ARCH >= 6.
« Reply #6 on: August 26, 2023, 05:14:46 am »
Have you tried adding this to the gcc cmd?
Code: [Select]
-march=armv6t2 -D__ARM_FEATURE_DSP
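One thing worth checking with flags like these is what the compiler itself predefines, since a -D on the command line only sets the macro and does not by itself change code generation. A small sketch for that, where describe_target is a made-up helper name:

```c
#include <stdio.h>

/* Hypothetical helper: formats the values of two ACLE macros into buf.
   A value of -1 means the macro is not defined for this compilation. */
int describe_target(char *buf, size_t len)
{
#ifdef __ARM_ARCH
    int arch = __ARM_ARCH;
#else
    int arch = -1;
#endif
#ifdef __ARM_FEATURE_DSP
    int dsp = __ARM_FEATURE_DSP;
#else
    int dsp = -1;
#endif
    return snprintf(buf, len, "__ARM_ARCH=%d __ARM_FEATURE_DSP=%d", arch, dsp);
}
```

Compiling this once with and once without the -D, for the same -march or -mcpu, shows whether the compiler would have defined the macro on its own. Dumping the predefined macros with gcc's -dM -E options gives the same information without any code.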
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: The umaal instruction for __ARM_ARCH >= 6.
« Reply #7 on: August 26, 2023, 07:00:12 am »
Hmm.  If you change the cpu to "-mcpu=cortex-m7+dsp" you'll get an error:
Code: [Select]
arm-unknown-linux-gnueabihf-gcc: error: 'cortex-m7' does not support feature 'dsp'
arm-unknown-linux-gnueabihf-gcc: note: valid feature names are: nofp.dp nofp

so it may be that gcc just doesn't know how to use the v7m dsp instructions.
(while manipulating code to get particular instructions is ... less than productive, I don't see any reason not to investigate whether you're missing some needed compile-time switch, or complain about gcc's code generator...  I still complain on a regular basis how cm0 doesn't have customized float functions.)
 

Offline AVI-crak (Topic starter)

  • Regular Contributor
  • *
  • Posts: 125
  • Country: ru
    • Rtos
Re: The umaal instruction for __ARM_ARCH >= 6.
« Reply #8 on: August 26, 2023, 09:45:46 am »
GCC doesn't know how to use the DSP instructions on its own for the Cortex-M line; that support was never added.
But it can where NEON is available. Automatic vectorization there is simply wonderful: no need to agonize over which built-in functions to pick, it is enough to declare the data format correctly and GCC does the rest. Where there is NEON, there are no problems with the umaal instruction.
« Last Edit: August 26, 2023, 09:49:25 am by AVI-crak »
 

Offline mikerj

  • Super Contributor
  • ***
  • Posts: 3240
  • Country: gb
Re: The umaal instruction for __ARM_ARCH >= 6.
« Reply #9 on: August 26, 2023, 12:42:25 pm »
Quote from: SiliconWizard on August 26, 2023, 03:37:15 am
I'm not sure that the condition '__ARM_ARCH > 6' is enough to guarantee that UMAAL is available. Conversely, while I'm personally not aware of any ARMv6-based CPU that has this instruction, it may be a possibility.

It does seem a bit messy:

https://developer.arm.com/documentation/dui0204/h/arm-and-thumb-instructions/multiply-instructions/umaal

"This ARM instruction is available in ARMv6 and above, and E variants of ARMv5.

These 32-bit Thumb instructions are available in ARMv6T2 and ARMv7, except the ARMv7-M profile.

There is no 16-bit Thumb version of this instruction."
 
The following users thanked this post: SiliconWizard

Offline hans

  • Super Contributor
  • ***
  • Posts: 1641
  • Country: nl
Re: The umaal instruction for __ARM_ARCH >= 6.
« Reply #10 on: August 26, 2023, 01:59:36 pm »
The Cortex-m4 and m7 have the  `Armv7E-M`  architecture. Cortex-m3 is Armv7-M.

So I guess "E" stands for extended, and so it is supported on m4/m7. The m3 also doesn't list having single-cycle MAC and SIMD extensions, so it seems like those DSP instructions are part of "E".
 
The following users thanked this post: AVI-crak

Offline AVI-crak (Topic starter)

  • Regular Contributor
  • *
  • Posts: 125
  • Country: ru
    • Rtos
Re: The umaal instruction for __ARM_ARCH >= 6.
« Reply #11 on: August 26, 2023, 02:25:58 pm »
Good idea, checking for DSP makes the problem go away.
Code: [Select]
#if (defined __ARM_ARCH && __ARM_FEATURE_DSP == 1)
It works.
https://godbolt.org/z/G3M64K7nz
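For anyone landing here later, a hedged sketch of how such a guard could wrap the instruction with a software fallback, so the build never stops on targets without umaal. The names umaal32 and mulhi64 are invented for this example, the extra __ARM_ARCH >= 6 and __aarch64__ checks are my own assumptions about where the DSP macro alone might not be enough, and the inline-assembly branch only applies if the project rules permit it. It computes only the plain high 64 bits of the product, without the normalizing shift from the first post.

```c
#include <stdint.h>

/* (hi:lo) = a * b + lo + hi, which is what UMAAL computes. */
static inline void umaal32(uint32_t *lo, uint32_t *hi, uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 6) && \
    !defined(__aarch64__) && defined(__ARM_FEATURE_DSP) && (__ARM_FEATURE_DSP == 1)
    /* Assumed-safe path: 32-bit Arm target that reports DSP support. */
    uint32_t l = *lo, h = *hi;
    __asm__("umaal %0, %1, %2, %3" : "+r"(l), "+r"(h) : "r"(a), "r"(b));
    *lo = l;
    *hi = h;
#else
    /* Portable path: the 64-bit sum cannot overflow, see the bound
       (2^32-1)^2 + 2*(2^32-1) = 2^64-1. */
    uint64_t t = (uint64_t)a * b + *lo + *hi;
    *lo = (uint32_t)t;
    *hi = (uint32_t)(t >> 32);
#endif
}

/* High 64 bits of the 128-bit product x * y, built from four UMAAL-shaped steps. */
uint64_t mulhi64(uint64_t x, uint64_t y)
{
    uint32_t a0 = (uint32_t)x, a1 = (uint32_t)(x >> 32);
    uint32_t b0 = (uint32_t)y, b1 = (uint32_t)(y >> 32);
    uint32_t r0 = 0, r1 = 0, r2 = 0, r3 = 0;

    umaal32(&r0, &r1, a0, b0);   /* r1:r0 = a0*b0                 */
    umaal32(&r1, &r2, a0, b1);   /* r2:r1 = a0*b1 + r1            */
    umaal32(&r1, &r3, a1, b0);   /* r3:r1 = a1*b0 + r1            */
    umaal32(&r2, &r3, a1, b1);   /* r3:r2 = a1*b1 + r2 + r3       */

    (void)r0;                    /* low words are not needed here */
    return ((uint64_t)r3 << 32) | r2;
}
```

On a host or target where the guard is false, the same source compiles to the plain C path, which is the "replace instead of fail" behaviour asked for earlier in the thread.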
« Last Edit: August 26, 2023, 02:36:02 pm by AVI-crak »
 
The following users thanked this post: hans

