GCC compiler and optimization.

#25 Reply
Posted by DiTBho on 10 Dec, 2020 08:15
I have recently worked with the control boards of a couple of industrial embroidery machines. There are hardware mailboxes and more than one CPU per board, and even the peripherals make heavy use of DMA.

In their projects, I see that the Japanese engineers isolated the "critical code" into blocks which begin with #pragma-like directives (unfortunately specific to the proprietary toolchain they use) to locally disable all optimizations inside a block of statements.

#disable optimization
----- critical code
#enable optimization

With GCC, I have recently seen a workaround for a quad-copter drone between the mission-board (Linux-based) and the fly-board (RT-OS driven): the two boards talk via DMA on a shared RAM with hardware semaphores, and someone isolated the critical code into modules and forced each module in the Makefile to be compiled without any optimization.

So, the project has global CFLAGS that are applied to all the C files, but the critical modules override them locally with an explicit "no optimization here, please".

Sugar solution.
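
For comparison, GCC has a rough counterpart to the proprietary pragmas above, although it works per function rather than per block of statements: `#pragma GCC push_options`, `#pragma GCC optimize`, and `#pragma GCC pop_options`. The sketch below is only an illustration; `mailbox_word`, `mailbox_exchange` and `checksum` are hypothetical names, not taken from either project. Note that the GCC manual describes the related `optimize` attribute as a debugging aid rather than something for production code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a shared-RAM mailbox word. On real hardware
   this would live at a fixed address taken from the memory map. */
static volatile uint32_t mailbox_word;

#pragma GCC push_options
#pragma GCC optimize ("O0")
/* Functions defined between push_options and pop_options are compiled
   at -O0, whatever optimization level the rest of the file uses. */
uint32_t mailbox_exchange(uint32_t value)
{
    mailbox_word = value;   /* write the request */
    return mailbox_word;    /* read back the mailbox contents */
}
#pragma GCC pop_options

/* Back to the optimization level given on the command line. */
uint32_t checksum(const uint8_t *data, uint32_t len)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}
```

The pragmas must appear at file scope, so the "critical" statements have to be moved into their own function first.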

#26 Reply
Posted by DiTBho on 10 Dec, 2020 08:28
Someone also described all the pointers to the shared RAM as "opaque" (I mean they are only declared as "extern" in C, but not defined), and moved the definition into the linker script.

This "seems to work", probably because this way the C compiler cannot assume anything: it only knows that there is *somewhere* a pointer for a certain data type with a certain data size, so it cannot optimize anything, while the linker script describes the object purely as starting address, length, section kind and alignment, which keeps the memory layout consistent.

Interesting

#27 Reply
Posted by TomS_ on 10 Dec, 2020 10:18
Quote from: brucehoult on 10 Dec, 2020 00:03
omg. Why?
I single step my code quite often, maybe the first couple of times after initially writing it while I verify that it does actually do what I told it to do, and iron out any issues that I might have introduced or logic I mixed up.

Quote
But it's honestly not something I've done or wanted to do in the last 35 years probably.
That's all well and good for you, but some people find single stepping to be a very useful tool.

Logging is a lot easier to implement on a PC where you can simply open a file and dump stuff into it without potentially taking up real percentages of flash/RAM.

#28 Reply
Posted by brucehoult on 10 Dec, 2020 12:22
Quote from: TomS_ on 10 Dec, 2020 10:18
Logging is a lot easier to implement on a PC where you can simply open a file and dump stuff into it without potentially taking up real percentages of flash/RAM.

If there's a communications channel for a debugger then there's a communications channel for logging.

#29 Reply
Posted by Siwastaja on 10 Dec, 2020 15:39
Getting to single-step is so easy and quick, that's the whole point. You press some F key in an IDE, your program runs, you hit the single-step key and see how your program flows. Great for beginners; later, it boils down to laziness or lack of skill, and finally, there's absolutely nothing wrong with that. Sometimes having just the hammer suffices. You can do a surprising amount with just a hammer.

You just hit the limits of what you can do with it, but often it's enough. The trap is, you think you have the best thing since sliced bread, but at some point, you are wasting more time using substandard tools because they are easier to run initially.

It's all fine until these single-steppers start claiming their tools are superior to anything else, and using a screwdriver to drive a screw is "unprofessional" and "stupid" because they are so brilliant they can use a professional, nicely wrapped hammer. At that point, I just chuckle to myself and ignore them.

#30 Reply
Posted by nfmax on 10 Dec, 2020 16:33
It can be quite useful to single-step through a convoluted piece of if-encrusted logic with a pencil and paper to hand - but this does not have to be done as part of the target build. Code like that I will sometimes build into a simple test harness I can run under a full-featured IDE on the development computer. Single-stepping on the target, when the hardware is making things happen under you at an uncontrollable speed and timing, is much more problematic.

#31 Reply
Posted by voltsandjolts on 10 Dec, 2020 16:40
Quote from: brucehoult on 10 Dec, 2020 00:03
Quote from: GromBeestje on 09 Dec, 2020 09:17
The thing with that is, while developing I would like to be able to step through the code line-by-line. That won't work when you optimise.
omg. Why?
<snip>....honestly not something I've done or wanted to do in the last 35 years probably.

Never single stepped out of a trap to see where it occurred?

Edit: fixed quote.

#32 Reply
Posted by SiliconWizard on 10 Dec, 2020 16:43
Quote from: Nominal Animal on 10 Dec, 2020 02:22
Quote from: SiliconWizard on 10 Dec, 2020 00:57
Code: [Select]
volatile Basetype *pointer;
Yes, but in this particular case, the problem is more like
Code: [Select]
{
    char buffer[2] = { 0, 0 };
    {
        volatile char *const ptr = buffer;
        // Use ptr to set up machine registers for DMA, but do not explicitly dereference it.
        // Wait for DMA to complete
    }
    // Is the compiler allowed to assume buffer[0] == 0 and buffer[1] == 0 ?
}
The compiler sees a pointer to volatile contents of the buffer, but not it being dereferenced; does that mean that in the outer block, it has to assume the buffer may have changed?

Your example indeed won't "work". Declaring the pointer to volatile at the level you did here is useless, as you understood.
If you're using the buffer outside of this block and want the compiler to assume it may have changed, you need to use a pointer to volatile at the level of the outer block to dereference it.
Based on your example, the correct way of doing it would be:
Code: [Select]
{
    volatile char buffer[2] = { 0, 0 };
    {
        volatile char *const ptr = buffer;
        // Use ptr to set up machine registers for DMA, but do not explicitly dereference it.
        // Wait for DMA to complete
    }
    // Is the compiler allowed to assume buffer[0] == 0 and buffer[1] == 0 ?
}
Alternatively, you can declare the array non-volatile and access the content via a pointer to volatile (but of course at the outer level), such as this:
Code: [Select]
{
    char buffer[2] = { 0, 0 };
    {
        char *const ptr = buffer;
        // Use ptr to set up machine registers for DMA, but do not explicitly dereference it.
        // Wait for DMA to complete
    }
    // Is the compiler allowed to assume buffer[0] == 0 and buffer[1] == 0 ?
    volatile char *ptr = buffer;
    ... ptr[0] == 0 ...
}

#33 Reply
Posted by Nominal Animal on 10 Dec, 2020 18:06
Quote from: SiliconWizard on 10 Dec, 2020 16:43
If you're using the buffer outside of this block and want the compiler to assume it may have changed, you need to use a pointer to volatile at the level of the outer block to dereference it.
Yes, exactly. (I only quoted the above small part, but I do believe we are in full agreement.)

GCC already has, for example, a built-in to describe known pointer alignment: __builtin_assume_aligned(pointer, alignment [, offset ] ). This generates no code; it just changes the compiler's assumptions about how the pointer (target/value) is aligned.
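
As a minimal sketch of how that existing built-in is used (the function name `sum_aligned16` and the 16-byte figure are illustrative assumptions, not something from this thread):

```c
#include <stddef.h>

/* Sums `count` floats. The caller promises that `data` is 16-byte aligned;
   __builtin_assume_aligned returns the same pointer, but lets the optimizer
   rely on that alignment, for instance when vectorizing the loop. */
float sum_aligned16(const float *data, size_t count)
{
    const float *p = __builtin_assume_aligned(data, 16);
    float total = 0.0f;

    for (size_t i = 0; i < count; i++)
        total += p[i];

    return total;
}
```

The promise is the caller's responsibility: passing a pointer that is not actually 16-byte aligned makes the behavior undefined, so the buffer would typically be declared with something like `__attribute__((aligned(16)))`.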

I expect a similar built-in to be provided, say __builtin_assume_modified(pointer, size), through ARM GCC efforts initially, because this is becoming more and more of a problem in embedded targets. It simply tells the compiler to invalidate all its existing assumptions about the contents of the referenced memory region, and does not generate any code (like hardware read or write barriers or such). It fixes the issue without extra side effects.

An "implementation" of such a built-in is trivial, because we can do it with GCC already (since version 3, probably earlier), via
Code: [Select]
#define __builtin_assume_modified(ptr, len) __asm__ __volatile__ ("": "+m" (*(char (*)[len])(ptr)))
and the syntax itself is shown explicitly in the GCC Extended Asm documentation. (The reason for making it a built-in is twofold: being a built-in encourages compatibility across compilers; and being documented at the source would make it easier to point it out and get embedded libraries and frameworks to use it where needed. As it is, the macro itself is a side effect of extended inline assembly, and happens to only rely on the "m" memory constraint, which is common to all architectures. Having something more explicit for the task would be clearer for all.)

This is explicitly useful for DMA, and for any other mechanism where the storage representation really isn't volatile in any sense, only modified once by a mechanism invisible to the compiler. In typical hosted C and C++ environments this is usually not an issue, because being compiled in a separate unit provides a similar barrier for the compiler assumptions of the contents; but in embedded and microcontroller environments, where everything is often compiled in the same unit, we do actually need something like this.
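
A minimal host-side sketch of how the macro might be used follows. It is defined here under the non-reserved name `ASSUME_MODIFIED`, and the "DMA engine" (`fake_dma_dest`, `fake_dma_start`, `fake_dma_complete`) is a hypothetical stand-in written in plain C, since there is no real DMA on a PC. On such a stand-in the compiler can see the writes anyway, so the code only shows where the marker goes, not the failure it prevents on real hardware.

```c
#include <stdint.h>

/* The macro from the post above, under a non-reserved name. It emits no
   instructions; it only tells the compiler that the `len` bytes at `ptr`
   may have been both read and written. */
#define ASSUME_MODIFIED(ptr, len) \
    __asm__ __volatile__ ("" : "+m" (*(char (*)[(len)])(ptr)))

/* Hypothetical DMA destination register (an assumption for this sketch). */
static volatile uintptr_t fake_dma_dest;

/* Programs the "controller" with the destination address. */
static void fake_dma_start(void *dest)
{
    fake_dma_dest = (uintptr_t)dest;
}

/* Stands in for the hardware performing the transfer. */
static void fake_dma_complete(void)
{
    unsigned char *dest = (unsigned char *)fake_dma_dest;
    dest[0] = 0xAB;
    dest[1] = 0xCD;
}

int receive_two_bytes(void)
{
    char buffer[2] = { 0, 0 };

    fake_dma_start(buffer);                  /* set up the transfer     */
    fake_dma_complete();                     /* "wait for DMA"          */
    ASSUME_MODIFIED(buffer, sizeof buffer);  /* drop cached assumptions */

    return ((unsigned char)buffer[0] << 8) | (unsigned char)buffer[1];
}
```

The buffer itself stays a plain non-volatile array, so ordinary accesses before the transfer and after the marker can still be optimized normally.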

It might be worth it to talk to the arm gcc folks about this, actually. I'm just a nobody, and don't exactly relish telling others what they should do to support their users better, but if somebody already has contacts with them, consider pushing this upstream a bit.

#34 Reply
Posted by DiTBho on 10 Dec, 2020 18:28
What about the above two points I posted?
Too bad? Or maybe interesting alternatives?

#35 Reply
Posted by Nominal Animal on 10 Dec, 2020 18:49
My opinion only:

Using the linker symbol is roughly equivalent to compiling in a separate unit. It works, but it is complicated, and hides the actual issue from human programmers. Future programmers will have to re-discover the problem and the solution for themselves.

Using pragmas to modify optimizations to get code to work like you want is a hack. It hides the problem by making the compiler stupid.

I like the assume-modified marking approach, because it fits the C and C++ standard models, is easy to understand ("okay compiler, this region of memory may have changed, so don't make any assumptions about its contents, okay?"), and is simple to implement.

Others may disagree.

#36 Reply
Posted by brucehoult on 10 Dec, 2020 21:40
Quote from: voltsandjolts on 10 Dec, 2020 16:40
Quote from: brucehoult on 10 Dec, 2020 00:03
Quote from: GromBeestje on 09 Dec, 2020 09:17
The thing with that is, while developing I would like to be able to step through the code line-by-line. That won't work when you optimise.
omg. Why?
<snip>....honestly not something I've done or wanted to do in the last 35 years probably.

Never single stepped out of a trap to see where it occurred?

It's pretty rare that I use an interactive debugger. On embedded systems there's often a lot of background processing, DMA, network things to time out, which means it's simply not practical to sit in a debugger staring at the screen because everything else will fall to pieces while you do that.

If I *was* in that situation I might set a breakpoint on the RFI instruction and then single-step *once*, if that was supported in that environment.

But more likely I'd look at what the interrupt stored on the stack or in the ExceptionPC CSR or wherever it is stored on that architecture and just read the exception PC from there.

Or log it at the start of the interrupt handler for later analysis.
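
One way such logging could look, as a sketch only: a small RAM ring buffer that keeps the most recent exception PCs, to be read out later over whatever channel is available. All names (`trap_log_record`, `trap_log_recent`, `TRAP_LOG_SLOTS`) are hypothetical, and how the exception PC is obtained is architecture-specific and left to the caller.

```c
#include <stdint.h>

#define TRAP_LOG_SLOTS 8u   /* power of two, so the index wraps with a mask */

static volatile uint32_t trap_log[TRAP_LOG_SLOTS];
static volatile uint32_t trap_log_count;   /* total entries ever recorded */

/* Call at the start of the handler with the saved exception PC. */
void trap_log_record(uint32_t exception_pc)
{
    trap_log[trap_log_count & (TRAP_LOG_SLOTS - 1u)] = exception_pc;
    trap_log_count = trap_log_count + 1u;
}

/* Returns the n-th most recent entry (0 = newest), or 0 if that entry
   was never recorded or has already been overwritten. */
uint32_t trap_log_recent(uint32_t n)
{
    uint32_t count = trap_log_count;

    if (n >= count || n >= TRAP_LOG_SLOTS)
        return 0u;

    return trap_log[(count - 1u - n) & (TRAP_LOG_SLOTS - 1u)];
}
```

With eight 32-bit slots this costs 36 bytes of RAM, and the record path is a handful of instructions, so it stays cheap inside an interrupt handler.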