Author Topic: GCC ARM32 compiler too clever, or not clever enough? (Read 13808 times)

peter-h · « **Reply #75 on:** May 02, 2022, 08:30:09 am »

That explains it then... v10 did this. I see no COMMON generated now.

gf · « **Reply #76 on:** May 02, 2022, 12:53:54 pm »

If the compiler generates common blocks, you can define the same global variable, e.g. int a[100], in multiple C files; the linker then merges those definitions into one variable shared between the corresponding object files and does not complain about multiple definitions of the symbol.
The portable way is to define int a[100] in only one source file, while all other source files just declare extern int a[100].
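A minimal sketch of that portable layout, squeezed into one listing for brevity. The file names in the comments and the helper sum_a() are made up for illustration; only int a[100] comes from the post above.

```c
/* shared.h (hypothetical header): declaration only, allocates no storage.
 * Every source file that needs the array includes this. */
extern int a[100];

/* shared.c (hypothetical): the one and only definition.
 * With -fno-common (the GCC default since v10) a second definition in
 * another file would be reported as a multiple-definition link error. */
int a[100];

/* user.c (hypothetical): sees only the extern declaration via shared.h. */
int sum_a(void)
{
    int total = 0;
    for (int i = 0; i < 100; i++)
        total += a[i];
    return total;
}
```

If the old merging behaviour is really wanted, GCC still accepts -fcommon, but the extern approach works with any conforming toolchain.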

brucehoult · « **Reply #77 on:** May 02, 2022, 01:29:31 pm »

Quote from: peter-h on May 02, 2022, 07:51:42 am

In the old Z80 etc days, statics produced much faster code than stack based variables but I don't think that's true anymore.

z80 is just SO ANNOYING to program. It shouldn't be worse than 6502, but it pretty much is, because it's just so inconsistent.

On z80 some things you can only do with 8 bit registers. Some things you can only do with register pairs. It's super fast to push any register pair onto the stack (11 cycles) or pop it (10 cycles) (+4 for IX/IY in both cases, as usual, because of the extra byte of opcode). Spilling a register pair to static memory takes significantly longer, at 16 cycles for each of load&store for HL and 20 cycles each for other pairs. Of the 8 bit registers, only A can be loaded/stored to a static location, and that takes 13 cycles -- plus 4 more to get it to/from where you really want it. For sequential accesses you can load/store A using (HL), (BC), or (DE) in 7 cycles, then increment/decrement the pointer in 6 cycles -- so no advantage over a static location. The same with load/store of B,C,D,E,H,L using (HL) only. The indirect load/store and inc/dec is only 2 bytes of code vs 3 for load/store to a static location (A only, remember), so there is that. But in general you should push/pop pairs whenever possible, and load/store pairs to static locations when not.

Access to something in the middle of the stack is just awful!! First of all, it's definitely 1 byte at a time. But there is no (SP+offset) addressing. There is (IX+$nn) and (IY+$nn) load/store addressing for all of A,B,C,D,E,H,L, but they're 19 cycles per byte! And you need to somehow get SP into IX or IY first. You can move HL,IX,IY *into* SP, but not the reverse. You can add SP to IX or IY in 15 cycles (11 for HL) but that means you need to zero them or get some other constant offset into them first. You can do LD IX,$nnnn in 14 cycles (or IY, or 10 for HL,BC,DE). You can do "XOR A;LD IXL,A;LD IXH,A" in 4+8+8=20, so that's a non-starter.

So to load BC with bytes from offsets 10 and 11 from SP you have a choice of "LD HL,$000A;ADD HL,SP;LD B,(HL);INC HL;LD C,(HL)" for 7 bytes and 41 cycles or "LD IX,$0000;ADD IX,SP;LD B,(IX+$0A);LD C,(IX+$0B)" for 12 bytes and 67 cycles.

On 6502 you can do the equivalent thing, transferring two bytes from offsets 10 and 11 in the (256 byte) hardware stack to two Zero Page locations (let's say 6&7) using "TSX;LDA $010A,X;STA $06;LDA $010B,X;STA $07" which is 11 bytes and 16 clock cycles -- and I didn't have to think at all about what is the best way to do it ... it's basically the obvious, only way.

If you're not using the very limited hardware stack, but making your own using a pair of Zero Page locations (let's say 8&9) then you'd have "LDY #$0A;LDA ($08),Y;STA $06;INY;LDA ($08),Y;STA $07" for (again) 11 bytes but this time 20 clock cycles (21 or 22 in the somewhat unlucky event one or both LDAs cross a page boundary with the indexing)

The z80 code has the advantage it can do up to 64k offset into the stack while the 6502 code only does up to a 255 byte offset. That would seldom be a factor.

The 6502 code has the advantage that you effectively have 256 8-bit registers, or 128 16-bit/pointer registers vs the z80's 11 8-bit registers or 5 16-bit/pointer registers.

Another example: add two 8 bit quantities and put the result in a 3rd:

6502: "CLC;LDA $05;ADC $06;STA $07" 7 bytes and 11 cycles

z80 #1: "LD A,B;ADD A,C;LD D,A" 3 bytes and 12 cycles. Very good!

z80 #2: "LD A,($0005);LD B,A;LD A,($0006);ADD A,B;LD ($0007),A" 11 bytes and 47 clock cycles. Ugh!

The z80 can have really fast and compact code if you manage to keep everything in its very limited register set. But if you run out and start having to load and store things to RAM then it gets pretty awful pretty quickly.

Siwastaja · « **Reply #78 on:** May 02, 2022, 01:47:15 pm »

Quote from: peter-h on May 02, 2022, 06:42:37 am

I don't get offended if someone just tells me I am wrong. But I am not clever enough, and don't have enough time, to read 1000 page compiler standards. Especially when the behaviour is so system dependent.

What the heck is "1000 page compiler standard"? And no, the behavior is not system dependent.

See, it's not that difficult:
https://www.google.com/search?q=wut+hapens+in+C+array+initialize+first+thing+only

First result:
https://stackoverflow.com/questions/42218928/why-does-this-initialize-the-first-element-only

The top accepted answer:

Quote

rest of array is initialized by default value, and for number types this default value is 0.

Did this "waste" a lot of your precious time?

peter-h · « **Reply #79 on:** May 02, 2022, 04:10:24 pm »

Half the stuff on stack overflow is dead ends, or code which doesn't actually work

Quote

z80 is just SO ANNOYING to program

Sure, but it was out in 1976. That's almost "half a century" ago

Lots of people did amazing things with it. Even I did some good stuff with it

I wrote literally megabytes (of binary) in Z80 asm.

The Z280 mostly solved the issues you mention by extending the register set and having a cache so most instructions took just 3 clocks. Zilog told me I was the first significant Z280 design-in in Europe. The stack relative addressing was done with

Code: [Select]

ld hl, stack_offset
add hl, sp
ld a, (hl)      ; 8-bit load; or, for 16 bits:
ld e, (hl)
inc hl
ld d, (hl)
; etc

I find the 32F417 runs probably 100x faster than a Z80, but probably 99% of the time it isn't needed. The thing which created really big problems with the Z80 and others wasn't raw speed; it was the 64k addressing limit. The Z180/64180 largely solved that for code but not for data. Same with the Z280. That 64k limit crippled many products because Ethernet was basically impossible; the current 32F4 project is my first ever with ETH, and even that was possible only because it uses the ST libs (which somebody else implemented); it is too complex for me to understand in enough detail. Life has just become so much more complex

SiliconWizard · « **Reply #80 on:** May 02, 2022, 06:32:53 pm »

Oh well. The Z80 was not that bad. Many people - in particular those closer to compiler design, which I think Bruce is - much prefer "orthogonal" instruction sets, because that's fewer exceptions (so easier to learn and remember) and they are easier to deal with when writing compilers. But yeah, it was still usable. Oh and it was just basically an 8080 with extensions and some improvements. Blame Intel for the instruction set.

SiliconWizard · « **Reply #81 on:** May 02, 2022, 06:45:38 pm »

Quote from: peter-h on May 02, 2022, 08:30:09 am

That explains it then... v10 did this. I see no COMMON generated now.

I think GCC now defaults to no-common, which would explain it. That was already pointed out in other threads (but not about data sections specifically.)

Note that if you actually want some global variables never initialized or zeroed upon startup, you can always put them explicitly in a dedicated, custom section when you declare them. That would be the safest way of doing it, instead of twisting linker scripts to make behavior not compliant with the standard.
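A rough sketch of what that can look like with GCC's section attribute. The section name ".noinit", the variable boot_count and the helper are assumptions for illustration; the linker script has to place that section somewhere the startup code neither zeroes nor copies.

```c
#include <stdint.h>

/* ".noinit" is an assumed section name. The linker script needs a matching
 * output section outside the .data/.bss ranges that startup code touches,
 * for example (memory region name RAM is also an assumption):
 *     .noinit (NOLOAD) : { *(.noinit*) } > RAM
 */
__attribute__((section(".noinit"))) uint32_t boot_count;

/* Hypothetical helper: the value survives a warm reset because nothing
 * clears it. After power-on it holds garbage, so real code would pair it
 * with a magic number or checksum before trusting it. */
uint32_t bump_boot_count(void)
{
    return ++boot_count;
}
```

Everything else stays in the normal .data/.bss sections, so the rest of the program still gets the initialization the C standard promises.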

brucehoult · « **Reply #82 on:** May 02, 2022, 10:05:08 pm »

Quote from: peter-h on May 02, 2022, 04:10:24 pm

Quote
z80 is just SO ANNOYING to program

Sure, but it was out in 1976. That's almost "half a century" ago Lots of people did amazing things with it.

Yup and 6502 was out in 1975, at 1/8th the price :-) ($25 vs $200) Zilog did drop the price over time and by 1980 when the ZX80 came out there wasn't much difference.

But by 1980 there was also the 8086, 68000, and 6809 to deal with... (at high prices at that point)

Quote

Code: [Select]
ld hl, stack_offset
add hl, sp
ld a, (hl) or for 16 bits
ld e, (hl)
inc hl
ld d, (hl)
etc

Yup, this sequence was established as the best one somewhere in the middle of my 2 AM post :-)

brucehoult · « **Reply #83 on:** May 02, 2022, 10:23:57 pm »

Quote from: SiliconWizard on May 02, 2022, 06:32:53 pm

Oh well. The Z80 was not that bad. Many people - in particular those closer to compiler design, which I think Bruce is - much prefer "orthogonal" instruction sets, because that's fewer exceptions (so easier to learn and remember) and they are easier to deal with when writing compilers.

Compilers can be taught the quirks, and modern computers are fast enough that a compiler can afford to generate a few variations on what variable goes in what register/memory and what instructions are selected and evaluate the size/speed of each. And then no one even has to think about it again. Which is basically what happened with the -- equally annoying, especially in early versions -- x86 family.

Assembly language programmers have to consider the quirks every second of the day. Either that or (more realistically) adopt fixed idioms for common things, even if in any given situation they are probably leaving a lot of size/performance on the table compared to the best compiler generated code.

And compilers now are of course far better than they were in the 70s and early 80s, if you even had access to one then. The code produced by, say, Turbo Pascal 1.0 is pretty awful.

peter-h · « **Reply #84 on:** May 03, 2022, 07:01:23 am »

The old compilers were not very clever. We used IAR; cost over GBP 1000 in 1985. Some of the code was appalling, especially runtimes which were written in C (most compilers were written in C) and with no attempt to optimise. For example sscanf can be dramatically optimised, but IAR didn't bother so it might take 100ms to read in a single precision float.

But, to be fair, most software in a given product does not have to run fast. But it all takes roughly the same time to write. So writing in say C saves a great deal of time.

ISRs and such were always coded in asm, when I was doing that stuff.

I still sell a Z180 based box and on that we used a Hitech C compiler; the famous Clyde Smith-Stubbs in Australia. It was pretty good and did various optimisations. That company sold out to Microchip many years ago and I believe the product line has now been killed off by Microchip.

I had Z80 compilers in 1980 for Fortran, Ada, Cobol, Pascal, Coral etc. The Ada or Coral ones were reportedly used on military contracts.

brucehoult · « **Reply #85 on:** May 03, 2022, 08:52:17 am »

Quote from: peter-h on May 03, 2022, 07:01:23 am

The old compilers were not very clever. We used IAR; cost over GBP 1000 in 1985. Some of the code was appalling, especially runtimes which were written in C (most compilers were written in C) and with no attempt to optimise. For example sscanf can be dramatically optimised, but IAR didn't bother so it might take 100ms to read in a single precision float.

But, to be fair, most software in a given product does not have to run fast. But it all takes roughly the same time to write. So writing in say C saves a great deal of time.

Most of the code doesn't have to run fast, but on a machine limited to 64 KB all of the code should be as compact as possible -- which is the reason it was popular at the time to write most of a program in some kind of interpreted byte-code (whether UCSD Pascal or something like Woz's "Sweet 16") or threaded code. Then only the speed-critical code needed to be written in assembly language. The size of the interpreter needed to be carefully considered if overall gains were to be made. That's where Sweet 16 was pretty good, with the interpreter taking only about 300 bytes and the code running about 10x slower than native.

I had a look today at z80 code from the current version of the SDCC compiler.

Code: [Select]

unsigned fib(unsigned n){
  if (n < 2) return n;
  return fib(n-1) + fib(n-2);
}

This produced:

Code: [Select]

      000000                         47 _fib::
      000000 EB               [ 4]   48         ex      de, hl
                                     49 ;fib.c:2: if (n < 2) return n;
      000001 7B               [ 4]   50         ld      a, e
      000002 D6 02            [ 7]   51         sub     a, #0x02
      000004 7A               [ 4]   52         ld      a, d
      000005 DE 00            [ 7]   53         sbc     a, #0x00
      000007 D8               [11]   54         ret     C
                                     55 ;fib.c:3: return fib(n-1) + fib(n-2);
      000008 6B               [ 4]   56         ld      l, e
      000009 62               [ 4]   57         ld      h, d
      00000A 2B               [ 6]   58         dec     hl
      00000B D5               [11]   59         push    de
      00000C CDr00r00         [17]   60         call    _fib
      00000F EB               [ 4]   61         ex      de, hl
      000010 D1               [10]   62         pop     de
      000011 1B               [ 6]   63         dec     de
      000012 1B               [ 6]   64         dec     de
      000013 EB               [ 4]   65         ex      de, hl
      000014 D5               [11]   66         push    de
      000015 CDr00r00         [17]   67         call    _fib
      000018 E1               [10]   68         pop     hl
      000019 19               [11]   69         add     hl, de
      00001A EB               [ 4]   70         ex      de, hl
                                     71 ;fib.c:4: }
      00001B C9               [10]   72         ret

That's .... not awful. I can do better, but it's not awful. It's nice of the compiler to list the execution time of each instruction in brackets.

Note that in this ABI a 16 bit argument is passed in HL and a 16 bit result is returned in DE.

The part between the two recursive calls can obviously be improved by dropping the two "EX DE,HL":

Code: [Select]

pop hl
dec hl
dec hl
push de

A little more can be saved by pushing the once-decremented version of the argument rather than the original argument, thus saving a DEC later on. But that's honestly about it.

Code: [Select]

_fib::
        ex      de, hl
;fib.c:2: if (n < 2) return n;
        ld      a, e
        sub     a, #0x02
        ld      a, d
        sbc     a, #0x00
        ret     C
;fib.c:3: return fib(n-1) + fib(n-2);
        ex      de, hl
        dec     hl
        push    hl
        call    _fib
        pop     hl
        dec     hl
        push    de
        call    _fib
        pop     hl
        add     hl, de
        ex      de, hl
;fib.c:4: }
        ret

I can't see anything else to improve, given the ABI. If argument and return value were both in HL then all three "EX DE,HL" could be dropped and "EX (SP),HL" used between the two recursive calls...

Code: [Select]

_fib::
        ld      a, l
        sub     a, #0x02
        ld      a, h
        sbc     a, #0x00
        ret     C
        dec     hl
        push    hl
        call    _fib
        ex     (sp), hl
        dec     hl
        call    _fib
        pop     de
        add     hl, de
        ret

The z80 completely kills the 6502 on code size on 16 bit code like in this function. Still a lot more clock cycles though. I haven't worked out exactly how many -- not enough to make up for the clock speed ratio I think, so z80 wins here.

The z80's 20 bytes of code here (28 in the compiler-generated version) also completely kills x86_64, ARMv7, and RISC-V, all of which are around 50 bytes of code (±2 or so!) on this, at least out of gcc.

peter-h · « **Reply #86 on:** May 03, 2022, 09:43:31 am »

Code size was however not a big issue on the Z80/Z180/Z280 because the IAR compiler supported a "large model" and you just paged-in any number of pages - I think up to 1MB total code size. The limit was that no function could be bigger than the page size. Typically the page size was 4k which left you 60k for the "base" code+RAM, so you could have 4k base code, 56k RAM, and 4k page size (each page banked-in when a function was called) giving you 56k RAM and 1MB code.

Of course 56k RAM is not enough for many modern apps e.g. ETH, USB, etc. I think the "embedded world" polarises between "16/32k RAM is plenty" and "56k RAM is nowhere near enough"

Also RAM used solely by ISRs could be banked-in/out by the ISR, so you could have loads of 16/32k buffers in RAM. Just not a contiguous area.

SiliconWizard · « **Reply #87 on:** May 03, 2022, 05:40:40 pm »

Quote from: brucehoult on May 02, 2022, 10:23:57 pm

And compilers now are of course far better than they were in the 70s and early 80s, if you even had access to one then. The code produced by, say, Turbo Pascal 1.0 is pretty awful.

Dunno about TP 1.0, but I used TP 3.0 on CP/M back then and it was fine. Sure the compiler was very simple and generated meh code, but it was perfectly usable.

brucehoult · « **Reply #88 on:** May 03, 2022, 10:06:17 pm »

Quote from: SiliconWizard on May 03, 2022, 05:40:40 pm

Quote from: brucehoult on May 02, 2022, 10:23:57 pm
And compilers now are of course far better than they were in the 70s and early 80s, if you even had access to one then. The code produced by, say, Turbo Pascal 1.0 is pretty awful.

Dunno about TP 1.0, but I used TP 3.0 on CP/M back then and it was fine. Sure the compiler was very simple and generated meh code, but it was perfectly usable.

I didn't say it wasn't usable, I used it a lot. I said the code produced was a lot worse than a human could do, if they had time. Slow and big. But of course massively faster than any interpreted language. A truly great product in its time. And cheap.

SiliconWizard · « **Reply #89 on:** May 04, 2022, 02:23:55 am »

Quote from: brucehoult on May 03, 2022, 10:06:17 pm

Quote from: SiliconWizard on May 03, 2022, 05:40:40 pm
Quote from: brucehoult on May 02, 2022, 10:23:57 pm
And compilers now are of course far better than they were in the 70s and early 80s, if you even had access to one then. The code produced by, say, Turbo Pascal 1.0 is pretty awful.

Dunno about TP 1.0, but I used TP 3.0 on CP/M back then and it was fine. Sure the compiler was very simple and generated meh code, but it was perfectly usable.

I didn't say it wasn't usable, I used it a lot. I said the code produced was a lot worse than a human could do, if they had time.

The compiler was very simple. It all fitted with the editor within some 30KB or so. =) And yes, I did write a significant amount of Z80 assembly back then for when the compiler would just not cut it.

To be fair, the code produced by most compilers until about the 2000's was significantly worse than what could be done by a human directly in assembly, generally speaking. I've written some code in assembly still in the early 2000's on x86, for speedups of 2, 3, 4 times, with still very simple hand-written assembly, nothing really fancy. In 2022, I would have a hard time seriously beating a C compiler doing that, or it would take a lot of time and effort. Optimizing compilers have become very good in the last 20 years.

peter-h · « **Reply #90 on:** June 20, 2022, 08:57:12 am »

Another Q on compiler "optimisation":

What are the rules for removing code, after a construct like

while (true)
{
some code
}

which will obviously never execute.

Some of it can be quite subtle, and removal of one thing can lead to removal of everything it calls, and so on. The compiler must build a tree of all related code and work up that tree and if it finds a branch gone it then goes back down and removes all the others that are affected.

But it doesn't always seem to happen. A colleague is working on the same Cube IDE (32F417) project but on a linux machine (I use win7-64) and he's just had half his code go missing, just by commenting out one FreeRTOS task

Presumably we have different compiler options somewhere...

brucehoult · « **Reply #91 on:** June 20, 2022, 10:19:28 am »

Certainly the compiler can (and should!) remove unreachable code. And variables that are used only by unreachable code. And variables that are read but never set, or set but never read.

If a compiler removes code that surprises you, and the compiler is a current version of gcc or llvm, then there is a 99.999% chance that it is you that doesn't understand your program, not the compiler.
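A rough illustration of what "unreachable" means for a while (true) loop. The function and variable names are made up for the example.

```c
#include <stdbool.h>

/* Hypothetical counter; volatile so the increments themselves are kept. */
volatile int ticks;

/* The loop has an exit path (break), so the code after it is reachable
 * and the compiler has to keep it. */
int poll_until(int limit)
{
    while (true) {
        ticks++;
        if (ticks >= limit)
            break;
    }
    return ticks;
}

/* No break, return or goto leaves this loop, so the statement after it
 * can never run. The compiler is free to emit nothing for it. */
void spin_forever(void)
{
    while (true) {
        ticks++;
    }
    ticks = -1;   /* unreachable */
}
```

The body of the loop is always kept; only what follows a loop with no way out is a candidate for removal.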

SiliconWizard · « **Reply #92 on:** June 20, 2022, 07:01:23 pm »

Quote from: peter-h on June 20, 2022, 08:57:12 am

Another Q on compiler "optimisation":

What are the rules for removing code, after a construct like

while (true)
{
some code
}

which will obviously never execute.

Are you sure?

Quote from: peter-h on June 20, 2022, 08:57:12 am

Some of it can be quite subtle, and removal of one thing can lead to removal of everything it calls, and so on. The compiler must build a tree of all related code and work up that tree and if it finds a branch gone it then goes back down and removes all the others that are affected.

Any code that is statically analyzed as unreachable during compilation will just not yield emitted code from the compiler. That usually happens even at the first level of optimization.

Now for any function call that would be unreachable, the function call itself will not be emitted, but the code of the function itself may still remain, even if it's never called anywhere, as long as said function has external linkage (in other words, if it's not a static function - static-qualified functions that are not called in their compilation unit will get pruned, but compilers usually give you a warning about those anyway.)

As we already talked about, the code of functions that have external linkage and that are never called anywhere will get removed, not by the compiler, but by the linker, and *only* if you have set the corresponding options (which consist of instructing the compiler to put each function in a separate section, and instructing the linker to prune unused sections.) Otherwise, it'll remain in the final object code as dead beef.

OTOH, code removed by a compiler while it should NOT get removed (meaning it is called or there is an execution path that should execute it) should never happen. If it does, this is a compiler bug, and then just open a ticket.
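To make the compiler/linker split concrete, here is a small sketch. The options in the comment are real GCC and GNU ld options; the file name and function names are hypothetical.

```c
/* Build sketch (file name prune.c is made up):
 *   gcc -Os -ffunction-sections -fdata-sections -c prune.c
 *   gcc -Wl,--gc-sections -o app prune.o other.o
 * -ffunction-sections gives each function its own section, and
 * --gc-sections lets the linker drop sections nothing refers to.
 */

/* External linkage, never called: the compiler must still emit it, because
 * another file might call it. Only the linker can discard it, and only
 * when built with the options above. */
int never_called(int x)
{
    return x + 1;
}

/* External linkage and called from elsewhere: always kept. */
int called(int x)
{
    return x * 2;
}

/* Static and not referenced by name in this file. Without the attribute the
 * compiler would drop it; "used" tells the compiler to emit it anyway. That
 * alone does not necessarily protect it from --gc-sections; for that the
 * usual tool is KEEP() around the input section in the linker script. */
__attribute__((used)) static int reached_only_by_address(int x)
{
    return x - 1;
}
```

Checking the map file (-Wl,-Map=app.map) shows which sections the linker actually discarded.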

peter-h · « **Reply #93 on:** June 20, 2022, 07:19:41 pm »

OK; the key word is "statically", which is what got me into trouble last time, creating a function located by the linker at a given address and then creating a jump to that address, but the compiler obviously didn't realise what was going on, and on a static analysis the function was not called by anything. So I had to do some hacks to stop that function being removed. In the end all it took was a
dw function-name
statement in an assembler file, to prevent the removal. Doing it from C was difficult because the referring C code was also not called by anything so the whole lot was still getting removed

But assembler code is not removed.

How does one implement function tables (not sure of the right word) where you use an index to jump to one of a list of functions? I have never done this in C and normally use a case statement, but if you had lots of cases then a table makes sense. I used to do this extensively in assembler. Obviously the index needs to be range checked

SiliconWizard · « **Reply #94 on:** June 20, 2022, 07:47:12 pm »

There are a number of ways of doing that.
If you just want to access functions through an index, declare an array of function pointers. Of course, that means that all functions you want to access this way should have the same prototype. Otherwise, I don't see the point or the feasibility.

Then it can be something like: (assuming your functions have the prototype defined below, adapt to your use case:)

Code: [Select]

typedef int (*myFunctions_t)(int n, char *s);

int foo1(int n, char *s) { /* ... */ }
int foo2(int n, char *s) { /* ... */ }
/* ... */

myFunctions_t myFunctionTable[] = {
    foo1,
    foo2,
    /* ... */
};

And calling one of those:

Code: [Select]

int ret = myFunctionTable[index](n, s);
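Since the index needs to be range checked, a complete version might look something like the sketch below. The handler names, the simplified one-argument prototype and the error convention (0 for success, -1 for a bad index) are assumptions for illustration.

```c
#include <stddef.h>

/* All handlers share one prototype so they fit in one table. */
typedef int (*handler_t)(int n);

static int op_double(int n) { return n * 2; }
static int op_negate(int n) { return -n; }
static int op_square(int n) { return n * n; }

/* const lets the table live in flash on a typical MCU. */
static const handler_t handlers[] = {
    op_double,
    op_negate,
    op_square,
};

#define HANDLER_COUNT (sizeof handlers / sizeof handlers[0])

/* Returns 0 and stores the handler's result in *result on success,
 * or -1 if the index is outside the table or result is NULL. */
int dispatch(size_t index, int n, int *result)
{
    if (index >= HANDLER_COUNT || result == NULL)
        return -1;
    *result = handlers[index](n);
    return 0;
}
```

Using an unsigned index type means a single comparison covers both ends of the range. Because the functions are referenced from the table, neither the compiler nor the linker will treat them as unused.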

Nominal Animal · « **Reply #95 on:** June 20, 2022, 08:23:22 pm »

I occasionally use an array of structures containing the name, optionally a type identifier, and a function pointer:

Code: [Select]

#include <stddef.h>  /* NULL */
#include <math.h>    /* sinf */

enum {
    FUNCTYPE_NONE = 0,
    FUNCTYPE_FLOAT_UNARY,  /* Returns a float, takes one float as an argument */
    FUNCTYPE_FLOAT_BINARY, /* Returns a float, takes two floats as arguments */
};

typedef struct {
    const char  *name;
    int    type;
    union {
        float (*float_unary)(float);          /* .type = FUNCTYPE_FLOAT_UNARY */
        float (*float_binary)(float, float);  /* .type = FUNCTYPE_FLOAT_BINARY */
    };
} function_descriptor;

const function_descriptor  func[] =
{
    { .name = "sin", .type = FUNCTYPE_FLOAT_UNARY, .float_unary = sinf },
    /* Other functions omitted */
    { .name = NULL, .type = FUNCTYPE_NONE }
};
#define  funcs  ((sizeof func / sizeof func[0]) - 1)

Valid indexes are 0 through funcs-1, inclusive; or you can loop until .name==NULL or .type==FUNCTYPE_NONE.

This is useful when one needs to execute a function when given its name as a string:

Code: [Select]

int  exec_float_unary(const char *name, float *result, float arg);
int  exec_float_binary(const char *name, float *result, float leftarg, float rightarg);

where the return value is 0 if successful and nonzero for error codes, result points to where the result is stored, and arg, leftarg, and rightarg are arguments passed to the function.
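One possible shape for exec_float_unary(), as a sketch only. The descriptor layout follows the post above; the table entries halve and add are made-up stand-ins (so the example needs no libm), and the specific error codes are assumptions.

```c
#include <stddef.h>
#include <string.h>

enum {
    FUNCTYPE_NONE = 0,
    FUNCTYPE_FLOAT_UNARY,
    FUNCTYPE_FLOAT_BINARY,
};

typedef struct {
    const char *name;
    int         type;
    union {
        float (*float_unary)(float);
        float (*float_binary)(float, float);
    };
} function_descriptor;

/* Hypothetical stand-ins for sinf() and friends. */
static float halve(float x)        { return x / 2.0f; }
static float add(float a, float b) { return a + b; }

static const function_descriptor func[] = {
    { .name = "halve", .type = FUNCTYPE_FLOAT_UNARY,  .float_unary  = halve },
    { .name = "add",   .type = FUNCTYPE_FLOAT_BINARY, .float_binary = add },
    { .name = NULL,    .type = FUNCTYPE_NONE }
};

/* Returns 0 on success, -1 for bad arguments, -2 if no unary function
 * with that name exists (error codes are an assumption). */
int exec_float_unary(const char *name, float *result, float arg)
{
    if (name == NULL || result == NULL)
        return -1;

    /* Walk the table up to the NULL-name sentinel. */
    for (size_t i = 0; func[i].name != NULL; i++) {
        if (func[i].type == FUNCTYPE_FLOAT_UNARY &&
            strcmp(func[i].name, name) == 0) {
            *result = func[i].float_unary(arg);
            return 0;
        }
    }
    return -2;
}
```

Checking the type before calling through the union is what keeps a binary function from being invoked with a single argument.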

(It is pretty obvious that the most recent case I used this at was a calculator/expression evaluator...)

In case of an extensible calculator/expression evaluator -type thingy on a fully featured OS, I like to use

Code: [Select]

static size_t               funcs_max = 0;
static size_t               funcs = 0;
static function_descriptor *func = NULL;

int register_float_unary(const char *name, float (*func)(float));
int register_float_binary(const char *name, float (*func)(float, float));

so that plugins can provide new functions ELF-magically using

Code: [Select]

__attribute__ ((__constructor__))
static void register_functions(void)
{
    register_float_unary("sin", sinf);
    register_float_unary("cos", cosf);
    register_float_unary("tan", tanf);
    register_float_binary("atan2", atan2f);
}

where the constructor function attribute causes the linker to add the address of that function into an ELF section, so that either the startup code (if static) or dynamic linker (if dynamic) will execute it when the ELF object is loaded.
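A cut-down sketch of the registration side. It uses a fixed-size table instead of the dynamically grown one implied above, and the names MAX_FUNCS, unary_entry, registered_count, halve and twice are invented for the example.

```c
#include <stddef.h>

#define MAX_FUNCS 16   /* assumed fixed capacity, to keep the sketch short */

typedef struct {
    const char *name;
    float     (*fn)(float);
} unary_entry;

/* Zero-initialized static storage is set up before any constructor runs. */
static unary_entry table[MAX_FUNCS];
static size_t      table_used = 0;

/* Returns 0 on success, -1 if the arguments are bad or the table is full. */
int register_float_unary(const char *name, float (*fn)(float))
{
    if (name == NULL || fn == NULL || table_used >= MAX_FUNCS)
        return -1;
    table[table_used].name = name;
    table[table_used].fn   = fn;
    table_used++;
    return 0;
}

size_t registered_count(void)
{
    return table_used;
}

/* Hypothetical functions a plugin might provide. */
static float halve(float x) { return x / 2.0f; }
static float twice(float x) { return x * 2.0f; }

/* Runs automatically before main() (or when the shared object is loaded). */
__attribute__((__constructor__))
static void register_functions(void)
{
    register_float_unary("halve", halve);
    register_float_unary("twice", twice);
}
```

On a bare-metal target this only works if the startup code actually walks the init array; many vendor startup files do, but it is worth checking.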

peter-h · « **Reply #96 on:** August 21, 2022, 03:52:02 pm »

How about this one. Just wasted a bit of time on it.

struct fred joe = {0};

converts into a memset(&joe,0,sizeof(joe)); or some such.

This happens even if stdlib or whatever is not #included.

Nominal Animal · « **Reply #97 on:** August 21, 2022, 04:20:13 pm »

Quote from: peter-h on August 21, 2022, 03:52:02 pm

How about this one. Just wasted a bit of time on it.

struct fred joe = {0};

converts into a memset(&joe,0,sizeof(joe)); or some such.

This happens even if stdlib or whatever is not #included.

Yep, I've mentioned this in other threads, when discussing freestanding environments.

It is documented by GCC:

Quote

Most of the compiler support routines used by GCC are present in libgcc, but there are a few exceptions. GCC requires the freestanding environment provide memcpy, memmove, memset and memcmp. Finally, if __builtin_trap is used, and the target does not implement the trap pattern, then GCC emits a call to abort.

I added the links, to the Linux man-pages project, but don't let the Linux in the name distract you. I use them as a C reference because they're actively maintained by Michael Kerrisk, and do describe (in the Conforming to sections) which standards define them, plus the possible Notes and Bugs sections often include useful information not mentioned elsewhere.

Compiler support routines include things like __udivdi3 (64-bit unsigned integer division on 32-bit systems).
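If no C library is linked at all, one option is to supply the required routines yourself. Below is a minimal, unoptimized sketch for memset only. It is deliberately named my_memset here so it can be compiled next to a normal libc; in a real freestanding build it would be called memset, and struct fred's members are invented for the example.

```c
#include <stddef.h>

/* Byte-at-a-time fill. When this is the real memset, GCC at -O2 may
 * recognise the loop and replace it with a call to memset, i.e. to itself.
 * Compiling this file with -fno-tree-loop-distribute-patterns is the usual
 * guard against that. */
void *my_memset(void *dest, int value, size_t count)
{
    unsigned char *p = dest;
    while (count--)
        *p++ = (unsigned char)value;
    return dest;
}

/* Made-up members; the thread only names the struct. */
struct fred {
    int  id;
    char label[24];
};

/* Roughly what the compiler may generate for `struct fred joe = {0};`,
 * written out as an explicit call. */
void reset_fred(struct fred *f)
{
    my_memset(f, 0, sizeof *f);
}
```

The same flag also addresses the copy loop turning into memcpy, without dropping the whole module to -O0.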

peter-h · « **Reply #98 on:** August 21, 2022, 10:08:54 pm »

This is quite a gotcha for embedded work.

I know it was pointed out here in the past but it didn't quite sink in that this is hard to avoid. I think zero optimisation (-O0) might work though. I had a look through some of my code and I used that as an attrib on a function which was copying data with a loop, and the compiler discovered that the loop could be done with memcpy, at least partially.

It might mean compiling the whole of a project, or a module, with optimisation set to zero.

brucehoult · « **Reply #99 on:** August 21, 2022, 10:28:19 pm »

Quote from: peter-h on August 21, 2022, 10:08:54 pm

This is quite a gotcha for embedded work.

I know it was pointed out here in the past but it didn't quite sink in that this is hard to avoid. I think zero optimisation (-O0) might work though. I had a look through some of my code and I used that as an attrib on a function which was copying data with a loop, and the compiler discovered that the loop could be done with memcpy, at least partially.

It might mean compiling the whole of a project, or a module, with optimisation set to zero.

What on earth are you trying to achieve here?

You're willing to kill the size and speed of your entire program in order to avoid having a 10 or 20 byte memset() function in it?

