Author Topic: No more code-size-limited version of IAR embedded workbench for ARM?  (Read 12887 times)


Online 5U4GB

  • Frequent Contributor
  • **
  • Posts: 639
  • Country: au
Re: No more code-size-limited version of IAR embedded workbench for ARM?
« Reply #75 on: November 25, 2024, 09:21:24 am »
I think a better question might be "how recent is that advice" or "what version of gcc does it apply to" since its validity can change over time.  For a definitive answer, could I suggest the incredibly useful Godbolt compiler explorer, where you can select something like a hundred different compilers and compiler versions to see what each one does.  For its default selection of gcc 14.2 for x86-64 and no optimisation it's telling me that there's no difference between 'asm volatile' and 'asm', the same asm is present (once you add the necessary include of stdint.h).  As soon as you get to -O1 though the whole function vanishes without the 'volatile' present. 
Code: [Select]
asm volatile ("nop");

leaves the nop in place which would otherwise be removed.

Other compilers handle it differently, e.g:

Code: [Select]
#if defined __SUNPRO_C
asm("");
#endif // Bypass Sun compiler bug

so in that case just the presence of the asm(), at any optimisation level, is sufficient.
« Last Edit: November 25, 2024, 09:35:34 am by 5U4GB »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15911
  • Country: fr
« Reply #76 on: November 25, 2024, 09:40:08 pm »
Quote
For inline assembly with "C" operands, it *can* be optimized out depending on how the operands are used in the rest of the code. In that case, to prevent optimization, you must add the "volatile" keyword to the "asm" one.

Does that mean that this might get optimised out?

Code: [Select]

// Hang around for delay in ms. Approximate but doesn't need interrupts etc working.

__attribute__((noinline))
static void hang_around(uint32_t delay)
{
    extern uint32_t SystemCoreClock;
    delay *= (SystemCoreClock/4100);

    asm volatile (
        "1: subs %[delay], %[delay], #1 \n"
        "   nop \n"
        "   bne 1b \n"
        : [delay] "+l"(delay)
    );
}

if the "volatile" was not there?

According to what the GCC manual page says and my experience, with that piece of code, I would say that GCC might indeed optimize this out if you omit the 'volatile' keyword, given that this is a finite loop that only modifies the delay local variable which is never used afterwards. As I said, obviously that may vary depending on how exactly the compiler handles optimizations in a given version, which is why I recommend always using the volatile qualifier when using inline assembly with GCC and Clang if said assembly must be inlined verbatim (and that should be accepted - if possibly ignored - by most other compilers too these days). At worst, it's not necessary, at best, it will do what you intended.

So that's again the reason why you're likely to almost always see "asm volatile" in vendor source code.
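As a portable illustration of the point above (a sketch, not the code from this thread), here is a busy-wait whose only "work" is an extended asm statement. The function name spin and the empty asm template are assumptions, chosen so that the snippet builds with GCC or Clang on any target rather than only on ARM:

```c
#include <stdint.h>

/* Hypothetical helper: spin for `count` iterations and report how many ran. */
static uint32_t spin(uint32_t count)
{
    uint32_t iterations = 0u;

    while (count != 0u) {
        /*
         * Extended asm with an empty template. `count` is a read-write
         * register operand, so the compiler must assume the asm may change it.
         * `volatile` marks the statement as having side effects, so GCC and
         * Clang are not allowed to delete it.
         */
        __asm__ volatile ("" : "+r"(count));
        count--;
        iterations++;
    }
    return iterations;
}
```

Without the volatile, the compiler is free to treat the asm as a pure function of its operands, which is exactly the situation described for hang_around().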

Note that for a delay like the one above (given the SystemCoreClock variable, I assume this is for STM32 with the HAL), I recommend this instead. It gives near-exact delays (to within a few cycles, assuming it is not interrupted) and uses the same SystemCoreClock global:

Code: [Select]
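static inline void delay_us(uint32_t nDelay_us)
{
    uint32_t nStart = DWT->CYCCNT;               // cycle counter at entry

    nDelay_us *= (SystemCoreClock / 1000000);    // microseconds -> CPU cycles

    // Unsigned subtraction stays correct when CYCCNT wraps around.
    while ((DWT->CYCCNT - nStart) < nDelay_us) {}
}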

No need for hard-coded tweaked constants and inline assembly.
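Two properties of the CYCCNT approach can be checked on a host without any hardware. The sketch below uses hypothetical helper names (cycles_elapsed and max_delay_us are not from the thread) and assumes a 32-bit free-running counter: the first function shows why the unsigned subtraction survives counter wraparound, the second estimates the largest microsecond value for which the multiplication by cycles-per-microsecond still fits in a uint32_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true once at least `ticks` cycles separate `start` and `now`. */
static bool cycles_elapsed(uint32_t now, uint32_t start, uint32_t ticks)
{
    /* Modulo-2^32 subtraction gives the true distance even if `now` wrapped past zero. */
    return (uint32_t)(now - start) >= ticks;
}

/* Hypothetical helper: largest delay in microseconds whose cycle count fits in uint32_t. */
static uint32_t max_delay_us(uint32_t core_hz)
{
    uint32_t cycles_per_us = core_hz / 1000000u;

    /* Below 1 MHz the integer division yields 0, so no delay can be represented. */
    return (cycles_per_us == 0u) ? 0u : (UINT32_MAX / cycles_per_us);
}
```

For a 168 MHz core clock, max_delay_us() works out to roughly 25.5 seconds, so longer waits would need an outer loop.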

If the DWT is not enabled in your code, you may first need to enable it (at initialization):

Code: [Select]
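void DWT_Init(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;  // enable the trace blocks, including the DWT
    DWT->CYCCNT = 0;                                 // reset the cycle counter
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;             // start counting CPU cycles
}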
« Last Edit: November 25, 2024, 09:46:57 pm by SiliconWizard »
 

Online coppice

  • Super Contributor
  • ***
  • Posts: 10143
  • Country: gb
« Reply #77 on: November 25, 2024, 10:06:34 pm »
Quote
For inline assembly with "C" operands, it *can* be optimized out depending on how the operands are used in the rest of the code. In that case, to prevent optimization, you must add the "volatile" keyword to the "asm" one.

Does that mean that this might get optimised out?

Code: [Select]

// Hang around for delay in ms. Approximate but doesn't need interrupts etc working.

__attribute__((noinline))
static void hang_around(uint32_t delay)
{
    extern uint32_t SystemCoreClock;
    delay *= (SystemCoreClock/4100);

    asm volatile (
        "1: subs %[delay], %[delay], #1 \n"
        "   nop \n"
        "   bne 1b \n"
        : [delay] "+l"(delay)
    );
}

if the "volatile" was not there?
Yes, and that has been true for years. I met that issue at least a decade ago. It's perfectly reasonable behaviour on the part of GCC, as the code does nothing functional. It's literally just a time waster, and GCC needs to be signalled not to eliminate time wasting. Volatile works for that, just as it works to stop interrupt routines being tinkered with, because they also don't do anything useful the compiler can detect.
 
The following users thanked this post: Siwastaja

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 9439
  • Country: fi
« Reply #78 on: November 26, 2024, 07:06:44 am »
Remember that busy loops or special hardware register access patterns are not the only uses for inline assembly. You could, for example, want to use a specific instruction the compiler does not know about as part of a calculation. Or you looked at the compiler output and concluded you can do better with some manual inline assembly tuning. In such cases you most definitely do want the asm to take part in the usual optimization. For example, you might change some constants, after which the operation becomes completely unnecessary. If the original reason for the asm was performance optimization, then it becomes a burden if it is not allowed to be optimized out.

Actually, hand-optimization for performance is a pretty classic reason to use inline asm, and this works well when the compiler knows the inputs and outputs of the asm and is allowed to optimize it out.

The semantics of the volatile qualifier follow the same logic as everywhere: you use it to force a variable access in memory, or in this case, to force the injection of asm instructions into program code, even when they have no effect on the C abstract machine.

If you have seen an asm busy loop implemented without the volatile qualifier, that is just poor programming, an error which needs to be fixed. Mistakes happen, learn and go on.
« Last Edit: November 26, 2024, 07:08:42 am by Siwastaja »
 

Online peter-h

  • Super Contributor
  • ***
  • Posts: 4414
  • Country: gb
  • Doing electronics since the 1960s...
« Reply #79 on: November 26, 2024, 08:54:38 am »
Indeed; I do use CYCCNT:

Code: [Select]

// Delay for ms. Uses CPU clock counter CYCCNT.
// Uses two loops to prevent delay*SystemCoreClock being too big for uint32_t.
// Max delay is uint32_t ms.
// This is a precise delay. It uses special code to deal with uint32_t overflow.
// DO NOT USE THIS before CYCCNT has been enabled!
// If this function is called before the PLL is set up to wind up the CPU to 168MHz, the
// delay will be 168/16 times longer than the ms value, because the CPU starts up at 16MHz.

void hang_around(uint32_t delay)
{
    volatile uint32_t max_count = SystemCoreClock/1000L;  // cycles per ms (168M = 1 sec)
    volatile uint32_t start_time;

    do
    {
        start_time = DWT->CYCCNT;
        while((DWT->CYCCNT-start_time) < max_count) ; // this counts milliseconds
        delay--;
    } while (delay>0);
}

As an interesting aside, this function appears to be re-entrant too. It only ever reads CYCCNT.
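One detail worth checking in a loop of this shape: with do ... while (delay>0), a call with delay == 0 still waits one millisecond, then decrements the unsigned counter past zero and keeps waiting for about 2^32 ms. The sketch below is one way to avoid that, under stated assumptions: the counter read is passed in as a function pointer (cycle_reader_fn, delay_ms_with and the fake counter are all hypothetical names, standing in for DWT->CYCCNT) so that the loop can be exercised on a host.

```c
#include <stdint.h>

/* Hypothetical stand-in for reading DWT->CYCCNT, so the loop can run on a host. */
typedef uint32_t (*cycle_reader_fn)(void);

/* Wait `delay_ms` milliseconds, given the number of CPU cycles per millisecond. */
static void delay_ms_with(cycle_reader_fn read_cycles,
                          uint32_t cycles_per_ms,
                          uint32_t delay_ms)
{
    /* Test before decrementing: delay_ms == 0 returns immediately. */
    while (delay_ms-- > 0u) {
        uint32_t start = read_cycles();

        /* Only reads the counter; unsigned subtraction survives wraparound. */
        while ((uint32_t)(read_cycles() - start) < cycles_per_ms) {
        }
    }
}

/* Fake counter for host-side testing: advances 100 cycles per read. */
static uint32_t fake_cycles = 0xFFFFFF00u;  /* starts close to wraparound */
static uint32_t fake_reads = 0u;

static uint32_t fake_read_cycles(void)
{
    fake_reads++;
    fake_cycles += 100u;
    return fake_cycles;
}
```

Because delay_ms_with() keeps all of its state in locals and only reads the counter, it has the same re-entrancy property noted above; the fake counter is test scaffolding only.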

And yes indeed, if I leave out the "volatile" on that asm version I get an empty function.

No warning, the code won't crash, but the wait will be close to zero, which is gonna surprise somebody ;)
« Last Edit: November 26, 2024, 09:30:33 am by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3302
  • Country: ca
« Reply #80 on: November 27, 2024, 02:46:24 am »
Quote
If you have seen asm busy loop implemented without volatile qualifier, that is just poor programming, an error which needs to be fixed. Mistakes happen, learn and go on.

The asm keyword is not part of the standard. It can be found in Annex J (C11; I don't know if that changed later), which is informative only (sic!). It says:

Quote
J.5.10 The asm keyword

1 The asm keyword may be used to insert assembly language directly into the translator
output. The most common implementation is via a statement of the form:

asm (character-string-literal );

Requiring a volatile qualifier for it is a very bad idea. Chances that someone writes assembler code with the intention of such code being deleted during the optimization process are slim to none. It would be much better to assume that any asm code must stay (unless the whole function is deleted of course).

Of course, you need to follow all the idiosyncrasies of the compiler you use.  This is just a grim reality which you can do nothing about.

 
The following users thanked this post: KE5FX

Offline mark03Topic starter

  • Frequent Contributor
  • **
  • Posts: 750
  • Country: us
« Reply #81 on: November 27, 2024, 03:33:23 am »
Quote
Indeed; I do use CYCCNT:

Code: [Select]
// Delay for ms. Uses CPU clock counter CYCCNT.
// Uses two loops to prevent delay*SystemCoreClock being too big for uint32_t.
// Max delay is uint32_t ms.
// This is a precise delay. It uses special code to deal with uint32_t overflow.
// DO NOT USE THIS before CYCCNT has been enabled!
// If this function is called before the PLL is set up to wind up the CPU to 168MHz, the
// delay will be 168/16 times longer than the ms value, because the CPU starts up at 16MHz.

void hang_around(uint32_t delay)
{
    volatile uint32_t max_count = SystemCoreClock/1000L;  // cycles per ms (168M = 1 sec)
    volatile uint32_t start_time;

    do
    {
        start_time = DWT->CYCCNT;
        while((DWT->CYCCNT-start_time) < max_count) ; // this counts milliseconds
        delay--;
    } while (delay>0);
}

Something doesn't add up here.  DWT->CYCCNT is declared volatile, so this shouldn't be optimized away.  Also, I see no reason why max_count and start_time would need to be declared volatile.  What's going on?
 
The following users thanked this post: newbrain

Online peter-h

« Reply #82 on: November 29, 2024, 02:26:52 pm »
Yes indeed the 2x volatile should not be needed because DWT->CYCCNT is volatile so loading stuff from it should work ok. No idea where that code came from.
 

Offline SiliconWizard

« Reply #83 on: November 29, 2024, 11:01:48 pm »
Quote
Yes indeed the 2x volatile should not be needed because DWT->CYCCNT is volatile so loading stuff from it should work ok. No idea where that code came from.

They aren't needed indeed.

For 'start_time', it won't make a difference, but it's not needed.
For 'max_count', which doesn't change within the loop, adding the 'volatile' qualifier actually makes it "semantically" odd, as this is a value that is precisely never supposed to change in the rest of the function.
 

Online peter-h

« Reply #84 on: November 30, 2024, 07:59:43 am »
This code removal business is confusing not just me but many others. In this area, C has been a moving target for all the years. For years, people have been working on the assumption that asm is never optimised but this is now clearly wrong. The result is that a whole load of stuff is vulnerable to a new compiler version etc.

I've been writing documentation on my project all along and now have hundreds of pages but I struggle to document this aspect. Fortunately the job is now done :)

It is not even clear whether there is a global "don't remove code" option. -Og certainly can remove code. Maybe -O0 (zero opt) but then you get some 30-50% more code.
 

Online Siwastaja

« Reply #85 on: November 30, 2024, 08:11:19 am »
Quote
This code removal business is confusing not just me but many others. In this area, C has been a moving target for all the years. For years, people have been working on the assumption that asm is never optimised but this is now clearly wrong.

It's an attitude problem, working with assumptions instead of facts. Really, just RTFM. asm is not standard C, it's a compiler extension. Read the compiler manual.

Really, 10 seconds in google: "gcc asm keyword", first result leads to https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html , where mention of volatile qualifier fits in the first screenful of information.

All this rationalization and explaining takes 100x the effort compared to just checking instead of assuming. Really, the same principle applies to every field of engineering. RTFM, check, double-check, never assume.

I have never seen the asm keyword used without volatile (except in the rare context where optimizing it out is allowable, i.e. as part of hand-optimizing a calculation). It never ever crossed my mind not to use the volatile qualifier. Maybe I have been lucky, or maybe I did read the manual already 20 years ago; I don't remember. And it's not a moving target; I remember this asm volatile from the 1990s.

Admitting mistakes and doing better next time is the fastest way forward.
« Last Edit: November 30, 2024, 08:16:01 am by Siwastaja »
 
The following users thanked this post: newbrain, JPortici

Offline cfbsoftware

  • Regular Contributor
  • *
  • Posts: 137
  • Country: au
    • Astrobe: Oberon IDE for Cortex-M and FPGA Development
« Reply #86 on: November 30, 2024, 08:46:33 am »
Quote
It's an attitude problem, working with assumptions instead of facts. Really, just RTFM. asm is not standard C, it's a compiler extension. Read the compiler manual.

Really, 10 seconds in google: "gcc asm keyword", first result leads to https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html , where mention of volatile qualifier fits in the first screenful of information.
Note that the behaviour of Extended Asm is inconsistent with Basic Asm:
Quote
The optional volatile qualifier has no effect. All basic asm blocks are implicitly volatile.
https://gcc.gnu.org/onlinedocs/gcc/Basic-Asm.html
Chris Burrows
CFB Software
https://www.astrobe.com
 
The following users thanked this post: Siwastaja

Online peter-h

« Reply #87 on: November 30, 2024, 08:55:48 am »
Yes; this was posted earlier. Asm without C operands is not removed.

The GCC reference posted above by Siwastaja is IMHO really complicated. I'd say in close to 100% of cases of somebody using asm, there is an expectation that it will never be removed. Is there an attribute which can be used on a function which prevents any optimisation? I am sure we have done this before.

I have been using __attribute__((optimize("O0"))) to prevent replacement of a loop structure with a call to memcpy etc. This should also stop optimisation of asm, surely? I will test it and report. EDIT: yes that does it perfectly.
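For reference, a minimal sketch of how such a per-function attribute can be applied. The macro name NO_OPT and the function copy_bytes are hypothetical; the only assumptions about the toolchain are that GCC accepts __attribute__((optimize("O0"))) and that Clang offers __attribute__((optnone)) as its closest equivalent. Note that the GCC manual describes the optimize attribute as intended for debugging rather than production code, so this is a workaround to be verified per compiler version, not a guarantee.

```c
#include <stdint.h>

/* Hypothetical macro: pick the "do not optimise this function" spelling per compiler. */
#if defined(__clang__)
#define NO_OPT __attribute__((optnone))
#elif defined(__GNUC__)
#define NO_OPT __attribute__((optimize("O0")))
#else
#define NO_OPT
#endif

/* Byte-by-byte copy that should stay a plain loop instead of becoming a memcpy call. */
NO_OPT
static uint32_t copy_bytes(uint8_t *dst, const uint8_t *src, uint32_t n)
{
    uint32_t i;

    for (i = 0u; i < n; i++) {
        dst[i] = src[i];
    }
    return i;  /* number of bytes copied */
}
```

The cost is the one reported later in the thread: every C statement in the same function is also compiled without optimisation.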

Quote
just as it works to stop interrupt routines being tinkered with, because they also don't do anything useful the compiler can detect.

I haven't used volatile on ISRs and it works presumably because there is a pointer to them in the vector table. The same method works to preserve main() in an "overlay" which you jump to.
« Last Edit: November 30, 2024, 12:44:04 pm by peter-h »
 

Online Siwastaja

« Reply #88 on: November 30, 2024, 02:32:43 pm »
Quote
Note that the behaviour of Extended Asm is inconsistent with Basic Asm:

If you think about it, it makes sense: the whole point of extended asm is that the compiler is told which operands are inputs and which are outputs. The obvious reason is optimization. Basic asm gives the compiler no such information, so it cannot be optimized away.
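The three cases discussed here can be put side by side in a small portable sketch. The function names are hypothetical and the asm templates are deliberately empty so that the snippet builds with GCC or Clang on any target; the comments summarise what the GCC manual pages linked above say about each form.

```c
#include <stdint.h>

/*
 * Extended asm without volatile: the compiler sees one in/out operand and may
 * drop the statement if `value` is never used afterwards. Here the value is
 * returned, so the statement has a visible use.
 */
static uint32_t pass_through(uint32_t value)
{
    __asm__ ("" : "+r"(value));
    return value;
}

/* Extended asm with volatile: not deleted even though `scratch` is dead afterwards. */
static void keep_me(uint32_t scratch)
{
    __asm__ volatile ("" : "+r"(scratch));
}

/* Basic asm (no operands): implicitly volatile according to the GCC manual. */
static void basic_marker(void)
{
    __asm__ ("");
}
```

Comparing the generated code for these three at -O0 and -O2 in a tool such as Compiler Explorer is a quick way to confirm the behaviour for a particular compiler version.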
 

Online Siwastaja

« Reply #89 on: November 30, 2024, 02:34:41 pm »
Quote
Is there an attribute which can be used on a function which prevents any optimisation?

This is fundamentally a wrong question; a loaded question. Because C is not a portable macro assembler but an abstract language, there is no direct mapping from input to output. Therefore, what even counts as optimization cannot be clearly defined. So what would you want to disable? Do you want to prevent literals from being pre-calculated (e.g. 1+1 replaced with 2)? Even assemblers do that much optimization.

If you need exactly certain machine code output, write it in asm instead, but as said, even assemblers do optimizations and abstract away stuff, so in extreme cases you may want to write binary 1s and 0s directly.
« Last Edit: November 30, 2024, 02:36:19 pm by Siwastaja »
 

Online JPortici

  • Super Contributor
  • ***
  • Posts: 3578
  • Country: it
« Reply #90 on: November 30, 2024, 03:16:56 pm »
Quote
It's an attitude problem, working with assumptions instead of facts. Really, just RTFM.
This.
Quote
Read the compiler manual.
This.

Quote
All this rationalization and explaining takes 100x the effort compared to just checking instead of assuming. Really, the same principle applies to every field of engineering.
This

Quote
RTFM, check, double-check, never assume.
and THIS.

To 90% or more of the questions about C's """quirks"""
 
The following users thanked this post: Siwastaja

Online Siwastaja

« Reply #91 on: November 30, 2024, 03:21:08 pm »
Quote
To 90% or more of the questions about C's """quirks"""

The stupidest thing here is that 99% of the time complaints about "C"'s "quirks" apply to every imaginable alternative and replacement as well.

For example, every "recommended" "new" "cool" replacement language, be it Rust or Elixir or whatever, is also defined through some sort of abstract machine, and will do optimization. "Portable macroassemblers" are nearly non-existent and trying to turn C into one is just stupid. Maybe there is a reason for why portable macro assemblers are nonexistent, maybe that's just a stupid way to develop software projects.
 

Online peter-h

« Reply #92 on: November 30, 2024, 05:13:08 pm »
Quote
This is fundamentally a wrong question; a loaded question

Don't be silly. It was in the asm context.

And the -O0 function attribute I posted above does work to preserve asm code.

It does have the predictable quirk: the C code in the same function is obviously also not optimised, and is a lot bigger. But here it is just one line
delay *= (SystemCoreClock/4100000L);
so it doesn't matter.

-Og:

Code: [Select]
  delay *= (B_SystemCoreClock/4100000L);
 80012ac: eb00 0080 add.w r0, r0, r0, lsl #2
 80012b0: 00c0      lsls r0, r0, #3

-O0:

Code: [Select]
delay *= (B_SystemCoreClock/4100000L);
 80000ac: 687a      ldr r2, [r7, #4]
 80000ae: 4613      mov r3, r2
 80000b0: 009b      lsls r3, r3, #2
 80000b2: 4413      add r3, r2
 80000b4: 00db      lsls r3, r3, #3
 80000b6: 607b      str r3, [r7, #4]
« Last Edit: November 30, 2024, 05:20:28 pm by peter-h »
 

Online Siwastaja

« Reply #93 on: November 30, 2024, 05:26:56 pm »
Quote
This is fundamentally a wrong question; a loaded question

Don't be silly. It was in the asm context.

And the -O0 function attribute I posted above does work to preserve asm code.

For god's sake, how about the freaking volatile qualifier which is documented to prevent it being optimized out? What's wrong with a simple solution to a simple problem?
 

Online peter-h

« Reply #94 on: November 30, 2024, 06:12:19 pm »
Curiosity :)
 

Offline KE5FX

  • Super Contributor
  • ***
  • Posts: 2113
  • Country: us
    • KE5FX.COM
« Reply #95 on: November 30, 2024, 06:24:23 pm »
Quote
For god's sake, how about the freaking volatile qualifier which is documented to prevent it being optimized out? What's wrong with a simple solution to a simple problem?

Abusing a language keyword to fix a problem they created themselves isn't the triumphal feat of software engineering that you (and the GCC authors) seem to think it is.

But ...  :-//  that's what we have to work with.
 
The following users thanked this post: peter-h, cfbsoftware, 5U4GB

Online Siwastaja

« Reply #96 on: November 30, 2024, 06:49:06 pm »
Quote
Abusing a language keyword to fix a problem they created themselves isn't the triumphal feat of software engineering that you (and the GCC authors) seem to think it is.

So using a keyword the purpose of which is to tell the compiler that code has side effects, to tell the compiler that the code has side effects, is abuse. OK.
 
The following users thanked this post: newbrain

Offline KE5FX

« Reply #97 on: November 30, 2024, 07:17:13 pm »
Quote
Abusing a language keyword to fix a problem they created themselves isn't the triumphal feat of software engineering that you (and the GCC authors) seem to think it is.

So using a keyword the purpose of which is to tell the compiler that code has side effects, to tell the compiler that the code has side effects, is abuse. OK.

The purpose of the volatile keyword is to tell the compiler that the value of a variable may be modified at any time. 

That's it.  Anything beyond that is something nonstandard that somebody made up.
 
The following users thanked this post: cfbsoftware

Online Siwastaja

« Reply #98 on: November 30, 2024, 07:25:31 pm »
Quote
That's it.  Anything beyond that is something nonstandard that somebody made up.

Of course. Every usable real-world C compiler is full of non-standard extensions. The standard allows extensions, and the standard does not forbid using standard keywords within extensions; why would it? This might be news to you, but everything around you in this world is made up by somebody.

Luckily, you are free not to use these extensions.
« Last Edit: November 30, 2024, 07:27:41 pm by Siwastaja »
 

Online peter-h

« Reply #99 on: November 30, 2024, 07:41:31 pm »
The context here is asm code, and it is hard to think of any case where the coder wants asm code modified in any way.

Perhaps this (asm preservation) is hard for the compiler writer to do because optimisation is done on the (intermediate) asm output. It could be done by marking asm code in the source, obviously.

I would also argue that having to declare variables (particularly static ones) as volatile is daft since the coder clearly intended these to be maintained. The C compilers I recall using in the 1980s (IAR) did work like that. The volatile keyword was not necessary. I was working on projects where someone else was doing C and I was doing asm, and the hardware.
 

