Author Topic: Identify this GCC optimization  (Read 3441 times)

0 Members and 1 Guest are viewing this topic.

Online ataradovTopic starter

  • Super Contributor
  • ***
  • Posts: 11257
  • Country: us
    • Personal site
Identify this GCC optimization
« on: October 15, 2017, 09:59:37 pm »
So I'm running into an interesting optimization GCC does. I really don't understand it, and it would be nice to see an explanation. This has been first identified on MIPS, but here I'm showing examples on ARM for simplicity.

So here is the code:
Code: [Select]

int main(void)
{
  int cnt = 0;
  int cnt1 = 0;

  while (1)
  {
    if (cnt++ == 2000000)
    {
      *(volatile uint32_t *)0x00020000 = cnt1++;
      cnt = 0;
    }

    //asm("nop"); // <- This line is important
  }
}
The idea is to have some delay, and write a slowly incrementing  value into a memory mapped location.

When compiled with -O1, I get the following code:
Code: [Select]
00000008 <main>:
   8: 2200      movs r2, #0
   a: 4805      ldr r0, [pc, #20] ; (20 <main+0x18>)
   c: 2180      movs r1, #128 ; 0x80
   e: 0289      lsls r1, r1, #10
  10: 1c03      adds r3, r0, #0
  12: 3b01      subs r3, #1
  14: 2b00      cmp r3, #0
  16: d1fc      bne.n 12 <main+0xa>
  18: 600a      str r2, [r1, #0]
  1a: 3201      adds r2, #1
  1c: e7f8      b.n 10 <main+0x8>
  1e: 46c0      nop ; (mov r8, r8)
  20: 001e8481 .word 0x001e8481
Which is basically correct.

But if compiled with a higher level of optimization (-O2, -O3, -Os), I get this:
Code: [Select]
00000008 <main>:
   8: 2300      movs r3, #0
   a: 2280      movs r2, #128 ; 0x80
   c: 0292      lsls r2, r2, #10
   e: 6013      str r3, [r2, #0]
  10: 3301      adds r3, #1
  12: e7fa      b.n a <main+0x2>
It completely eliminated the delay loop.

But if you uncomment the "nop", it does produce full code.

The logic behind this optimization is clear, I just wan to identify if there is GCC specific type of optimization that causes this. Going through all flags is probably an option, but may be someone knows this already?
Alex
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Identify this GCC optimization
« Reply #1 on: October 15, 2017, 10:51:04 pm »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Online ataradovTopic starter

  • Super Contributor
  • ***
  • Posts: 11257
  • Country: us
    • Personal site
Re: Identify this GCC optimization
« Reply #2 on: October 15, 2017, 10:55:46 pm »
It is not really "dead code", since it just eliminates extra loops over the same live code. Anyway, none of the flags mentioned there did anything in this case.
Alex
 

Offline hans

  • Super Contributor
  • ***
  • Posts: 1638
  • Country: nl
Re: Identify this GCC optimization
« Reply #3 on: October 15, 2017, 10:59:17 pm »
Could be many mechanics at play here, but I think a core one is:

https://en.wikipedia.org/wiki/Loop-invariant_code_motion

With the Nop you're breaking hoare logic mechanisms, as it cannot establish weakest preconditions for that assembly statement.
 

Offline cv007

  • Frequent Contributor
  • **
  • Posts: 826
Re: Identify this GCC optimization
« Reply #4 on: October 15, 2017, 11:10:34 pm »
I'm sure you know this already, but
int cnt = 0; -> volatile int cnt = 0;
since this is what you don't want eliminated

I would also suspect an asm("") would do the same as your asm("nop")

I can't answer your question on what optimization this falls under, though
« Last Edit: October 17, 2017, 03:54:58 pm by cv007 »
 

Online ataradovTopic starter

  • Super Contributor
  • ***
  • Posts: 11257
  • Country: us
    • Personal site
Re: Identify this GCC optimization
« Reply #5 on: October 16, 2017, 12:35:05 am »
int cnt = 0; -> volatile int cnt = 0;
Well, yes, this is obvious once you see the output.

also, forgive me if I change your code (nothing wrong with yours, but it is a little bit of a mind bender at first glance :) ,I'm not sure why, though but it may be I'm not too bright )-
The whole issue resulted from removing a bunch of stuff from a super loop, so that nop represents a ton of code that should not be blocked, as it would be with your change.
Alex
 

Offline joshtyler

  • Contributor
  • Posts: 36
  • Country: gb
Re: Identify this GCC optimization
« Reply #6 on: October 16, 2017, 12:35:42 am »
As mentioned by cv007, the simple solution is to make cnt volatile, but it seems as though you are more interested as to why the compiler makes the optimisation.

When you write C code, the nominal goal of the compiler is to figure out what the output of your code will be and produce the fastest/smallest implementation that will achieve the intended result. The compiler can re-order your code, completely remove chunks, do whatever it wants so long as the output is unchanged.

The actual definition of the volatile keyword can be confusing, and actually slightly vague in one technicality. When it boils down the way I find helps me to picture it is to imagine that the volatile variable is being used in a printf() or scanf() call every time it is read/written to/from. The compiler must guarantee that someone looking at the output of that printf() would see an unchanged behaviour as a result of the compiler optimisations. In your case the optimised code still provides an increasing count on the volatile register, and so the compiler thinks that it has done an amazing job in allowing that to happen really quickly by removing your outer loop.

Adding the asm("nop"); is a somewhat "dangerous" way to fix the problem. GCC is obliged to put a nop at that point in the code, so it forces it to put the entire loop in there, but GCC is still perfectly entitled to move around that asm command around with other code statements in unexpected ways. It makes no difference in this case, but could trip up more complex code. The more elegant solution (only in my opinion) is to make cnt volatile, so that the language definition forces the compiler to create the delay loop you want.

There's a very good episode of Embedded with Matt Goldbolt, author of the excellent Compiler Explorer, where he discusses writing compilers, and how compilers are able to prove that they are allowed to optimise code http://embedded.fm/episodes/190.

As you suspected, if you are very keen to know exactly which GCC flag(s) enable this optimisation, then you'll probably have to go through them one at a time.
 
The following users thanked this post: thm_w

Online ataradovTopic starter

  • Super Contributor
  • ***
  • Posts: 11257
  • Country: us
    • Personal site
Re: Identify this GCC optimization
« Reply #7 on: October 16, 2017, 12:38:10 am »
I know why this happens and why fixes work. I'm just interested in a specific type of optimization that did this. I'll just go over all of them once I have more time.
Alex
 

Online ataradovTopic starter

  • Super Contributor
  • ***
  • Posts: 11257
  • Country: us
    • Personal site
Re: Identify this GCC optimization
« Reply #8 on: October 16, 2017, 01:35:36 am »
So I removed "-O"  flag and added all the optimization options as a huge list. And the code is bigger that even -O1.

In fact, the code is very different between -O1 and plain listing flags that are mentioned in the documentation for -O1.

Some more digging is required, I guess.
Alex
 

Online ataradovTopic starter

  • Super Contributor
  • ***
  • Posts: 11257
  • Country: us
    • Personal site
Re: Identify this GCC optimization
« Reply #9 on: October 16, 2017, 01:47:45 am »
Doing things in reverse, and enabling -O2, but then prefixing all options with -fno-xxx, I found that the option responsible for this is
Quote
-ftree-dce Perform dead code elimination (DCE) on trees. This flag is enabled by default at -O and higher.

So it looks like this falls under dead code for GCC, at least at some stage.
Alex
 

Offline andersm

  • Super Contributor
  • ***
  • Posts: 1198
  • Country: fi
Re: Identify this GCC optimization
« Reply #10 on: October 16, 2017, 10:00:05 am »
IIRC, GCC was changed to special-case inline assembly statements with no inputs, outputs, or side-effects precisely so it won't optimize away nops. Can't find a reference for this offhand, though.

Online ataradovTopic starter

  • Super Contributor
  • ***
  • Posts: 11257
  • Country: us
    • Personal site
Re: Identify this GCC optimization
« Reply #11 on: October 16, 2017, 02:53:19 pm »
IIRC, GCC was changed to special-case inline assembly statements with no inputs, outputs, or side-effects precisely so it won't optimize away nops. Can't find a reference for this offhand, though.
This is my experience, and I'm relying on it in day to day work.
Alex
 

Offline rsjsouza

  • Super Contributor
  • ***
  • Posts: 5986
  • Country: us
  • Eternally curious
    • Vbe - vídeo blog eletrônico
Re: Identify this GCC optimization
« Reply #12 on: October 16, 2017, 04:02:07 pm »
Doing things in reverse, and enabling -O2, but then prefixing all options with -fno-xxx, I found that the option responsible for this is
Quote
-ftree-dce Perform dead code elimination (DCE) on trees. This flag is enabled by default at -O and higher.

So it looks like this falls under dead code for GCC, at least at some stage.
In my experience, I always got these types of loops suppressed when using high optimization levels - although I don't use GCC extensively in higher optimization codes, other compilers indeed treated this as a dead code.

IIRC, GCC was changed to special-case inline assembly statements with no inputs, outputs, or side-effects precisely so it won't optimize away nops. Can't find a reference for this offhand, though.
This is my experience, and I'm relying on it in day to day work.
In my experience, "C" optimizers will leave the function code alone the moment they find an assembly inline instruction - they don't go to the extend of evaluating what is inside the ASM code.
Vbe - vídeo blog eletrônico http://videos.vbeletronico.com

Oh, the "whys" of the datasheets... The information is there not to be an axiomatic truth, but instead each speck of data must be slowly inhaled while carefully performing a deep search inside oneself to find the true metaphysical sense...
 

Offline andersm

  • Super Contributor
  • ***
  • Posts: 1198
  • Country: fi
Re: Identify this GCC optimization
« Reply #13 on: October 16, 2017, 04:37:49 pm »
In my experience, "C" optimizers will leave the function code alone the moment they find an assembly inline instruction - they don't go to the extend of evaluating what is inside the ASM code.
The point of GCC's extended inline assembly syntax is to provide the compiler with some insight into the assembly code (inputs, outputs, clobbers), and let it optimize around it.

Offline Jeroen3

  • Super Contributor
  • ***
  • Posts: 4078
  • Country: nl
  • Embedded Engineer
    • jeroen3.nl
Re: Identify this GCC optimization
« Reply #14 on: October 23, 2017, 06:18:15 pm »
It's funny. I saw a delay like this on /r/programminghorror today.
 

Online ataradovTopic starter

  • Super Contributor
  • ***
  • Posts: 11257
  • Country: us
    • Personal site
Re: Identify this GCC optimization
« Reply #15 on: October 23, 2017, 06:20:19 pm »
It's funny. I saw a delay like this on /r/programminghorror today.
There is absolutely noting wrong with delays like this on their own.  And in that case it is actually done right and ii is declared volatile.

It is very easy to over-complicate things because of some ideas that people consider "good", without ever questioning them.
« Last Edit: October 23, 2017, 06:22:02 pm by ataradov »
Alex
 

Offline rsjsouza

  • Super Contributor
  • ***
  • Posts: 5986
  • Country: us
  • Eternally curious
    • Vbe - vídeo blog eletrônico
Re: Identify this GCC optimization
« Reply #16 on: October 23, 2017, 08:36:52 pm »
In my experience, "C" optimizers will leave the function code alone the moment they find an assembly inline instruction - they don't go to the extend of evaluating what is inside the ASM code.
The point of GCC's extended inline assembly syntax is to provide the compiler with some insight into the assembly code (inputs, outputs, clobbers), and let it optimize around it.
Not at all, at least in the compilers and optimizers I work with. The "C" compiler and the "C" optimizer have zero visibility about any assembly inline and act conservatively when faced with one of these constructs.

You may be thinking about intrinsics, which are "C" constructs that have a 1:1 correlation with opcodes.

It's funny. I saw a delay like this on /r/programminghorror today.
There is absolutely noting wrong with delays like this on their own.  And in that case it is actually done right and ii is declared volatile.

It is very easy to over-complicate things because of some ideas that people consider "good", without ever questioning them.
+1. The code shown is perfectly fine.
Vbe - vídeo blog eletrônico http://videos.vbeletronico.com

Oh, the "whys" of the datasheets... The information is there not to be an axiomatic truth, but instead each speck of data must be slowly inhaled while carefully performing a deep search inside oneself to find the true metaphysical sense...
 

Online ataradovTopic starter

  • Super Contributor
  • ***
  • Posts: 11257
  • Country: us
    • Personal site
Re: Identify this GCC optimization
« Reply #17 on: October 23, 2017, 08:40:49 pm »
Not at all, at least in the compilers and optimizers I work with. The "C" compiler and the "C" optimizer have zero visibility about any assembly inline and act conservatively when faced with one of these constructs.
This.

GCC has the best inline assembly syntax out there, which makes integration of C and assembly easy and straightforward. But those inputs, outputs and clobbers lists are for C to adjust to the assembly, not the other way around.

The only optimization on assembly that I know of, is instruction move into the branch delay slot on MIPS. MIPS assembler from GCC can do this on its own.
Alex
 

Offline andersm

  • Super Contributor
  • ***
  • Posts: 1198
  • Country: fi
Re: Identify this GCC optimization
« Reply #18 on: October 23, 2017, 09:31:56 pm »
Not at all, at least in the compilers and optimizers I work with. The "C" compiler and the "C" optimizer have zero visibility about any assembly inline and act conservatively when faced with one of these constructs.

You may be thinking about intrinsics, which are "C" constructs that have a 1:1 correlation with opcodes.
No, other compilers at least used to take the conservative approach that inline assembly could have completely altered the program state and eg. reload all variables from memory. GCC instead makes you annotate what the you mody, allowing the compiler to be less pessimistic around the inline assembly. GCC can also allocate registers for the inline assembly, again allowing it to produce better code.


Share me

Digg  Facebook  SlashDot  Delicious  Technorati  Twitter  Google  Yahoo
Smf