Author Topic: Efficient C Code for ARM Devices  (Read 2141 times)


Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2743
  • Country: nz
Efficient C Code for ARM Devices
« on: October 02, 2019, 08:56:50 pm »
Dropping this here as it looks to be interesting to many:

https://m.eet.com/media/1157397/atc-152paper_shore_v4.pdf

Has anybody seen "__promise" before? I haven't! - gives compiler hints to allow better code generation.

Example from paper:

Code: [Select]
void f(int *x, int n)
{
    int i;
    __promise((n > 0) && ((n&7)==0));
    for (i = 0; i < n; i++)  {
         x[i]++;
    }
}
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 10171
  • Country: fr
Re: Efficient C Code for ARM Devices
« Reply #1 on: October 02, 2019, 10:26:58 pm »
Has anybody seen "__promise" before? I haven't! - gives compiler hints to allow better code generation.

Ahah, nope.
"restrict", yes. It's actually C99, and doesn't need any underscore for any C99-compliant compiler. "restrict" can yield some interesting optimizations when used properly.

But "__promise"? Ahem. It just looks like some kind of "contract", but used completely backwards. Instead of making sure some condition will always hold true, it makes the compiler assume it is, without any guarantee that it's ever going to actually hold true. Sure it can lead to interesting opportunities for optimizations, but it's rather totally atrocious from any other point of view... ::)

Just pass to "n" any value that breaks the "promise", and the generated code may not only not be optimized, but could as well be completely wrong for this value. EEK. Talk about a good idea...

And now if you actually actively CHECK for a condition, the compiler will not only generate code to check it's true, but will also have all hints it needs to optimize the code when it is. Much, much better (sure you get the check overhead, but you can't have it all. Oh, and there's now the interesting static_assert() in C11...)
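A minimal sketch of the "actively check" approach described above. The function name `increment_checked` and the `bool` return convention are assumptions made for illustration; they are not from the ARM paper or from this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical checked variant: returns false instead of touching x
 * when the precondition (n > 0 and n a multiple of 8) does not hold. */
bool increment_checked(int *x, int n)
{
    if (x == NULL || n <= 0 || (n & 7) != 0) {
        return false;   /* the caller decides how to report the error */
    }

    /* Past this point the compiler can see that n > 0 and (n & 7) == 0,
     * so it has the same information a __promise() would have given it. */
    for (int i = 0; i < n; i++) {
        x[i]++;
    }
    return true;
}
```

The cost is one compare-and-branch per call; whether the compiler actually exploits the known range depends on the compiler and optimization level.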

Just my opinion. Favoring optimization at the obvious (unless I missed something) expense of security is... not always judicious, to put it gently. ;D

Finally, I'm not quite sure which compiler supports this. Apparently armcc: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0491g/CJACHIDG.html
which I'll probably never get to use anyway. Oh well.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2743
  • Country: nz
Re: Efficient C Code for ARM Devices
« Reply #2 on: October 02, 2019, 11:10:06 pm »
Sure it can lead to interesting opportunities for optimizations, but it's rather totally atrocious from any other point of view... ::)

Exactly


  Promises are made
  Code known to be broken
  Programmers cry
 

Offline andersm

  • Super Contributor
  • ***
  • Posts: 1199
  • Country: fi
Re: Efficient C Code for ARM Devices
« Reply #3 on: October 02, 2019, 11:15:52 pm »
It looks like you can convince GCC to perform the same optimization like so:
Code: [Select]
void f(int *x, int n)
{
    int i;
    if ((n > 0) && ((n&7)==0)) {
        for (i = 0; i < n; i++)  {
            x[i]++;
        }
    }
    else {
        __builtin_unreachable();
    }
}

00000000 <f>:
   0: 3804      subs r0, #4
   2: eb00 0181 add.w r1, r0, r1, lsl #2
   6: f850 3f04 ldr.w r3, [r0, #4]!
   a: 3301      adds r3, #1
   c: 4281      cmp r1, r0
   e: 6003      str r3, [r0, #0]
  10: d1f9      bne.n 6 <f+0x6>
  12: 4770      bx lr
Compare that to the version without:
Code: [Select]
void g(int *x, int n)
{
    int i;
    for (i = 0; i < n; i++)  {
        x[i]++;
    }
}

00000014 <g>:
  14: 2900      cmp r1, #0
  16: dd08      ble.n 2a <g+0x16>
  18: 3804      subs r0, #4
  1a: eb00 0181 add.w r1, r0, r1, lsl #2
  1e: f850 3f04 ldr.w r3, [r0, #4]!
  22: 3301      adds r3, #1
  24: 4281      cmp r1, r0
  26: 6003      str r3, [r0, #0]
  28: d1f9      bne.n 1e <g+0xa>
  2a: 4770      bx lr
I don't see this as any worse than all the other situations where lying to the compiler invokes undefined behaviour, e.g. casting unaligned pointers and so on. Static code analyzers will use regular assert()s in their analysis, but I don't know offhand if any compilers do. Maybe in optimized debug builds?
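One way to combine the two ideas is a macro that checks the condition in debug builds and turns it into an optimizer hint in release builds. This is only a sketch: `ASSUME` and `f_assume` are hypothetical names, and the armcc branch assumes that `__CC_ARM` identifies armcc and that `__promise` is available there, as the linked ARM documentation suggests.

```c
#include <assert.h>

/* ASSUME is a hypothetical name, not a standard or vendor macro. */
#if !defined(NDEBUG)
#  define ASSUME(cond) assert(cond)        /* debug: verify at run time */
#elif defined(__CC_ARM)
#  define ASSUME(cond) __promise(cond)     /* armcc: hint only, no check */
#elif defined(__GNUC__)
#  define ASSUME(cond) \
       do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#  define ASSUME(cond) ((void)0)           /* unknown compiler: no hint */
#endif

void f_assume(int *x, int n)
{
    /* Checked with assert() in debug builds; in release builds a broken
     * assumption is undefined behaviour, exactly as with __promise. */
    ASSUME((n > 0) && ((n & 7) == 0));
    for (int i = 0; i < n; i++) {
        x[i]++;
    }
}
```

With this arrangement a violated assumption is at least caught in any test run that is built without `NDEBUG`.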
« Last Edit: October 02, 2019, 11:25:11 pm by andersm »
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2743
  • Country: nz
Re: Efficient C Code for ARM Devices
« Reply #4 on: October 02, 2019, 11:24:10 pm »
It looks like you can convince GCC to perform the same optimization like so:
The ARM document was hinting that it would help with unrolling the loop too, avoiding quite a bit of code that is otherwise needed.
 

Offline andersm

  • Super Contributor
  • ***
  • Posts: 1199
  • Country: fi
Re: Efficient C Code for ARM Devices
« Reply #5 on: October 02, 2019, 11:35:03 pm »
The ARM document was hinting that it would help with unrolling the loop too, avoiding quite a bit of code that is otherwise needed.
It seems GCC doesn't take advantage of that hint, but it's not very aggressive at unrolling anyway.
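For reference, this is roughly what the unrolling would buy when written by hand. It is a sketch, not the code any compiler is known to generate; `f_unrolled` is a hypothetical name, and the function is only correct if the caller guarantees that n is a positive multiple of 8.

```c
/* Hypothetical hand-unrolled variant. Because n is assumed to be a
 * multiple of 8, no remainder loop is needed after the main loop;
 * that remainder handling is the extra code the hint lets you drop. */
void f_unrolled(int *x, int n)
{
    for (int i = 0; i < n; i += 8) {
        x[i + 0]++;
        x[i + 1]++;
        x[i + 2]++;
        x[i + 3]++;
        x[i + 4]++;
        x[i + 5]++;
        x[i + 6]++;
        x[i + 7]++;
    }
}
```

If n is not a multiple of 8 this writes past the end of the array. Recent GCC versions also accept `#pragma GCC unroll 8` in front of a loop, which may be a less error-prone way to ask for the same thing.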

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 4124
  • Country: fi
    • My home page and email address
Re: Efficient C Code for ARM Devices
« Reply #6 on: October 03, 2019, 01:31:17 am »
I do use the GCC extensions known_aligned_ptr = __builtin_assume_aligned(ptr, alignment) (for SSE/AVX vector stuff) and value = __builtin_expect(expression, expected_value) for branch prediction (when there is a fast and a slow path depending on alignment or on a size being a multiple of some value), but I haven't seen __promise() before, probably because I mostly use GCC.

(Only for critical code, like large matrix or array operations, where this kind of stuff makes sense, though.)
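A small sketch of how those two builtins can be combined into a fast path and a slow path. The helper `scale`, the `LIKELY` macro, and the 16-byte / multiple-of-4 conditions are all assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__)
#  define LIKELY(e) __builtin_expect(!!(e), 1)   /* expect e to be true */
#else
#  define LIKELY(e) (e)
#endif

/* Hypothetical helper: multiplies n floats by k. The fast path is taken
 * when the buffer is 16-byte aligned and n is a multiple of 4. */
void scale(float *data, size_t n, float k)
{
    if (LIKELY(((uintptr_t)data % 16 == 0) && (n % 4 == 0))) {
#if defined(__GNUC__)
        /* Safe here: the alignment was verified just above. */
        float *p = __builtin_assume_aligned(data, 16);
#else
        float *p = data;
#endif
        for (size_t i = 0; i < n; i++) {
            p[i] *= k;
        }
    } else {
        /* Slow path: no alignment or length assumptions. */
        for (size_t i = 0; i < n; i++) {
            data[i] *= k;
        }
    }
}
```

Both paths compute the same result; the hints only give the vectorizer more freedom on the fast path.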
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9315
  • Country: us
Re: Efficient C Code for ARM Devices
« Reply #7 on: October 03, 2019, 03:51:32 pm »
I hope to never run across those optimization hints.  They look like a disaster in the making.

“Premature optimization is the root of all evil”
  -- Donald Knuth
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 5896
  • Country: fi
Re: Efficient C Code for ARM Devices
« Reply #8 on: October 03, 2019, 04:41:38 pm »
I hope to never run across those optimization hints.  They look like a disaster in the making.

“Premature optimization is the root of all evil”
  -- Donald Knuth

I don't think you have thought this over very well.

The implementation isn't perfect, but the direction is valid, and not for optimization.

The idea of constraining the ranges of your variables is valuable in static and runtime verification, error checking, and finally, optimization. It forces you to think, and to formally write it down, and it enables automatic checking. Safety-conscious languages like Ada use this principle.

Sure, this particular way of doing it may suck, and they should recommend a similar function but with error checking instead, and only in extremely speed-critical places, suggest one without error checking.

But in general, this is not a disaster in the making; it's the right way to go, for readability, understanding, and avoiding bugs. Optimization opportunities come as an additional bonus.

(andersm's version enables the logical place to insert your handle_invalid_value_error() call.)
 

Offline kamtar

  • Regular Contributor
  • *
  • Posts: 62
Re: Efficient C Code for ARM Devices
« Reply #9 on: October 09, 2019, 03:53:28 pm »
maybe a stupid question but wouldn't this allow the compiler to do the same optimization?  ???
Code: [Select]
void f(int *x, int n)
{
    if (!((n > 0) && ((n & 7) == 0)))   /* reject n <= 0 or n not a multiple of 8 */
        return;

    int i;
    for (i = 0; i < n; i++)  {
         x[i]++;
    }
}
 

Offline andersm

  • Super Contributor
  • ***
  • Posts: 1199
  • Country: fi
Re: Efficient C Code for ARM Devices
« Reply #10 on: October 09, 2019, 06:23:33 pm »
Yes, but if the compiler can't see the arguments it has to generate code for the if statement.
« Last Edit: October 09, 2019, 07:46:13 pm by andersm »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 10171
  • Country: fr
Re: Efficient C Code for ARM Devices
« Reply #11 on: October 09, 2019, 06:42:46 pm »
maybe a stupid question but wouldn't this allow the compiler to do the same optimization?  ???
Code: [Select]
void f(int *x, int n)
{
    if (!((n > 0) && ((n & 7) == 0)))   /* reject n <= 0 or n not a multiple of 8 */
        return;

    int i;
    for (i = 0; i < n; i++)  {
         x[i]++;
    }
}

Maybe you haven't read the whole thread, because this is exactly what some of us have said (I being the first, when saying "if you actually actively CHECK for a condition, the compiler will not only generate code to check it's true, but will also have all hints it needs to optimize the code when it is").

And most of us agree that the ARM optimization hint is a VERY slippery thing.

Now, to be fair, many of us have already encountered the case when you had to write a function that should be very efficient, and in which you didn't want to check for extra conditions each time it's executed, due to the extra overhead (in that case, you'd make sure the conditions always hold true when calling the function). In that case, if again you chose NOT TO actively check parameters, giving the compiler an opportunity  to better optimize the code can be interesting, which this "assume" directive is for. It's always risky, but I'd venture many, if not most of us, have done that at least once (silently assuming parameter values are in a given range).

If you do that correctly, the calling code should still include ways of checking the parameters from a higher level, or at least only generate parameters that meet the expected ranges. There again, the compiler WILL have the info if it's not too dumb. Of course, for that to work properly, you should usually put said function in the same source file as where it's called. If it can be called externally, then the compiler will usually not do any kind of cross-file static analysis AFAIK. So my rule and advice here is that if you ever need to write such a function that doesn't check parameters at run-time for optimization reasons, only do that with local functions (local to one source file, declared static).
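A sketch of that rule, under the assumption of a hypothetical row-major buffer: the unchecked worker is `static`, and the only public entry point validates the parameters once before calling it repeatedly. The names `increment_row` and `increment_rows` are made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Unchecked worker, local to this source file. Every caller is visible
 * to the compiler, and every caller is responsible for passing n > 0
 * with n a multiple of 8. */
static void increment_row(int *x, int n)
{
    for (int i = 0; i < n; i++) {
        x[i]++;
    }
}

/* Public entry point: checks the parameters once, then runs the
 * unchecked worker on each row. */
bool increment_rows(int *buf, int rows, int cols)
{
    if (buf == NULL || rows <= 0 || cols <= 0 || (cols & 7) != 0) {
        return false;
    }
    for (int r = 0; r < rows; r++) {
        increment_row(buf + (size_t)r * (size_t)cols, cols);
    }
    return true;
}
```

Since the worker is `static` and has a single call site, the compiler is free to inline it, at which point the range check in the caller is visible to the loop it optimizes.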


 

Offline Jeroen3

  • Super Contributor
  • ***
  • Posts: 3796
  • Country: nl
  • Embedded Engineer
    • jeroen3.nl
Re: Efficient C Code for ARM Devices
« Reply #12 on: October 09, 2019, 06:56:40 pm »
We're in a world where we try to keep as many little bug-inducing things as possible away from the programmer. And you come around with this  :P
Oh well, I'm sure someone has a need for this.
I always try to avoid tricks like this unless it's highly streamlined code where the trick actually gains you something and the code isn't portable anyway (e.g. kernels, drivers).
 

