CPU instruction utilization with gcc

Scratch.HTF:
The "Introduction to RISC" article in the 1988 Cypress CMOS Data Book mentions that the Sun C compiler uses only about 30% of the available Motorola 68020 instructions, and that 80% of the computation in a typical program requires only about 20% of the available instructions. That sparked my curiosity: what percentage of the AVR (and other supported architecture) instructions does the gcc compiler used by the official Arduino IDE actually generate, and which instructions go unused?

When I contacted someone behind the gcc compiler, their reply was:

* GCC does not generate supervisor instructions as used by certain CPU types such as the Motorola 68000 series.
* Very simple RISC architectures have a high percentage of user-mode instruction utilization, while on CISC architectures a special instruction is only generated when the code matches a specific pattern, and CISC architectures have the most unused instructions.
* A large survey would be required, and the available instructions will vary per CPU architecture.

I've read that instructions for BCD operations are among the (now) rarely used ones, and that AVR is one of the architectures that does not implement them. (The infrared carrier generator I am building uses BCD-coded IR codes to set the frequency, for ease of use with LIRC, which, if I am correct, only accepts hexadecimal function codes.)

ataradov:
There is no construct for BCD in C, so how would the compiler use that instruction? You can still make an assembly section and use the instruction manually.

Compilers also don't support things like enabling/disabling interrupts. All that stuff is supported by intrinsic functions that are defined through inline assembly.

EDIT: Also, I don't think there are special BCD instructions in AVR. There is a half-carry flag in the status register, which can be used to accelerate BCD math. But there is nothing the compiler can do with that flag.

T3sl4co1l:
Regarding BCD specifically, I don't know what conditions trigger the compiler to use features when available.  That would have to be one of those "specific patterns" mentioned.  (Anyway, for AVR in particular, you can do division by a constant very easily, by multiplying by a shifted constant.)
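
As a minimal sketch of the division trick in the parenthesis above (an illustration only, not avr-gcc output): dividing by 10 can be replaced by a multiply and a right shift. The function names are hypothetical, and the constants 205 and 52429 are reciprocals of 10 rounded up and scaled by 2^11 and 2^19 respectively.

```c
#include <stdint.h>

/* x / 10 for an 8-bit x. 205 / 2^11 is slightly above 1/10, and the
   result matches x / 10 for every x in 0..255. The product fits in
   16 bits (255 * 205 = 52275). */
static inline uint8_t div10_u8(uint8_t x)
{
    return (uint8_t)(((uint16_t)x * 205u) >> 11);
}

/* x / 10 for a 16-bit x. 52429 / 2^19 is slightly above 1/10; the
   product needs a 32-bit intermediate. */
static inline uint16_t div10_u16(uint16_t x)
{
    return (uint16_t)(((uint32_t)x * 52429u) >> 19);
}
```

On a part with a hardware multiplier the 8-bit version is one MUL plus a few shifts; extracting BCD digits is then a matter of repeated div10 and a subtract for the remainder.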

Seems to me, avr-gcc doesn't generate postincrement instructions very often, though I still need to try more access patterns and semantics on a recent case, to see if something's holding that up...

Compiler interaction for some internals may be special-cased or supported by libraries; for example, util/atomic.h for AVR creates macros, I believe, which simply resolve to cli/sei, or buffering SREG.  Conceivably, some compilers might implement that kind of functionality at a higher or lower level, so there isn't a clear threshold as to where a thing should be implemented.  (That said, the clearest motivation I can think of would be: implement everything in libraries that can be.  Special instructions like cli/sei aren't subject to optimization, so hard-coded asm is perfectly adequate.  Whereas things like pointer arithmetic and memory access will be subject to redundancy, differencing, interleaving and such, and so the compiler will need to be aware of them.)

Regarding frequency, it's quite natural that some instructions will be used more than others.  Almost everything you're doing is either moving around data (MOV, LD, ST..), checking data, doing basic arithmetic (ADD/SUB, CMP, TST), conditional bit-expansion or manipulation (set/clear, sign extend, shift..), or doing basic state machine stuff (conditional jumps, loops, calls..).  The few (well, 5 to 20 say) percent left includes everything else -- more in-depth math (MUL and DIV, floating point, SIMD..), fancier bit operations (move, copy, shuffle..), IO (sometimes memory mapped, sometimes special instructions; the AVR, even though it doesn't have a separate IO space like the Z80 or x86, does have IN/OUT instructions for quick access to low addresses), API calls (INT?) and OS/kernel functions (privileged, when applicable).

Incidentally, GCC definitely never generates AVR's FMUL instructions on its own; they are provided as builtins only.  You might be better off writing those operations out with MUL and shifts anyway, as GCC doesn't seem to perform any optimization around those instructions (again, based off very limited experience at present..).  So even on the humble AVR, we have a few examples of that situation.

Tim

brucehoult:

--- Quote from: ataradov on April 21, 2020, 01:44:28 am ---There is no construct for BCD in C, so how would the compiler use that instruction? You can still make an assembly section and use the instruction manually.

Compilers also don't support things like enabling/disabling interrupts. All that stuff is supported by intrinsic functions that are defined through inline assembly.

EDIT: Also, I don't think there are special BCD instructions in AVR. There is a half-carry flag in the status register, which can be used to accelerate BCD math. But there is nothing the compiler can do with that flag.

--- End quote ---

It makes sense to have hardware half-carry if it can feed into a dedicated and fast DAA instruction, but very weird to have it without that! Half carry is, I think, very easy to synthesize: (A ^ B ^ (A+B)) & 0x10, if I haven't screwed up. But actually using it would seem to require a complete matrix of all four cases of whether carry and/or half-carry is set, to decide whether to do nothing or add 0x06, 0x60, or 0x66 to get the correct result.

In fact it's worse than that because what you need is more like:


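--- Code: ---carry_out = carry;  // sum holds the low 8 bits of a + b in a wider variable
if (half_carry || (sum & 0x0F) > 0x09) sum += 0x06;  // fix the low digit
if (carry || sum > 0x9F){sum += 0x60; carry_out = 1;}  // fix the high digit; this always produces a decimal carry
sum &= 0xFF;  // keep only the two BCD digits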

--- End code ---

And do that without the first line disturbing the state of carry -- which probably means saving and restoring the condition codes.
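
A minimal sketch of the same correction in portable C, for illustration only: here the binary sum is kept in a 16-bit variable so the carry lives in bit 8 and no condition codes need to be saved, and the half carry is synthesized with the XOR expression from earlier. The name bcd_add8 and its parameters are hypothetical, and both inputs are assumed to be valid packed BCD.

```c
#include <stdint.h>

/* Add two packed-BCD bytes (two digits each) plus a carry-in of 0 or 1.
   Returns the packed-BCD result; *carry_out receives the decimal carry. */
static uint8_t bcd_add8(uint8_t a, uint8_t b, uint8_t carry_in,
                        uint8_t *carry_out)
{
    /* Binary sum, at most 0x99 + 0x99 + 1 = 0x133, so bit 8 is the carry. */
    uint16_t sum = (uint16_t)(a + b + carry_in);

    /* a ^ b ^ sum leaves the carry that entered each bit position;
       bit 4 is the carry out of the low digit (the half carry). */
    uint8_t half_carry = (uint8_t)(((a ^ b ^ sum) & 0x10) != 0);

    /* Low digit overflowed past 9, either visibly or via the half carry. */
    if (half_carry || (sum & 0x0F) > 0x09)
        sum += 0x06;

    /* High digit (including bit 8) is 10 or more. */
    if (sum > 0x9F)
        sum += 0x60;

    *carry_out = (uint8_t)(sum > 0xFF);
    return (uint8_t)(sum & 0xFF);
}
```

For example, bcd_add8(0x55, 0x55, 0, &c) yields 0x10 with c set to 1, i.e. 55 + 55 = 110.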

It turns out that if you have bigger registers you can do BCD adds quite efficiently without any hardware support at all. For example for a 64 bit machine working with 16 decimal digits in BCD (and assuming that's enough so you don't need carry-in or carry-out):


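--- Code: ---#include <stdint.h>
typedef uint64_t reg; // 64 bits = 16 packed BCD digits

reg BCDadd(reg a, reg b){
  reg sum = a + b; // plain binary sum
  reg sum_c = sum + 0x6666666666666666; // adding 6 per digit turns each decimal carry into a binary carry
  reg carries = ((a ^ b ^ sum_c) >> 4) & 0x1111111111111111; // internal carries
  carries |= (reg)(sum_c < a) << 60; // carry from MSB
  return sum + carries * 6; // add 6 only to the digits that carried
}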

--- End code ---

AVR has 16 bit adds, so actually you could use this technique for 4 digit BCD on it. Or bigger using multi-precision adds.
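
A sketch of that 4-digit case, assuming uint16_t operands that hold valid packed BCD. The name bcd_add16 is hypothetical, and the carry-out parameter is an addition to the 64-bit version so that multi-precision chaining has something to work with.

```c
#include <stdint.h>

/* Add two 4-digit packed-BCD values. Returns the 4-digit result;
   *carry_out receives the decimal carry out of the top digit. */
static uint16_t bcd_add16(uint16_t a, uint16_t b, uint8_t *carry_out)
{
    uint16_t sum = (uint16_t)(a + b);            /* plain binary sum */
    uint16_t sum_c = (uint16_t)(sum + 0x6666u);  /* decimal carries become binary carries */

    /* Carries into digits 1..3, moved down to the digit that produced them. */
    uint16_t carries = (uint16_t)(((a ^ b ^ sum_c) >> 4) & 0x1111u);

    /* Carry out of the top digit shows up as 16-bit wraparound. */
    uint16_t msb = (uint16_t)(sum_c < a);
    carries |= (uint16_t)(msb << 12);

    *carry_out = (uint8_t)msb;
    return (uint16_t)(sum + carries * 6u);       /* add 6 to each digit that carried */
}
```

For example, bcd_add16(0x1234, 0x5678, &c) returns 0x6912 with c equal to 0, and bcd_add16(0x9999, 0x0001, &c) returns 0x0000 with c equal to 1.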

David Hess:

--- Quote from: ataradov on April 21, 2020, 01:44:28 am ---There is no construct for BCD in C, so how would the compiler use that instruction? You can still make an assembly section and use the instruction manually.
--- End quote ---

C compilers dedicated to specific processors may support things like BCD directly.  Many years ago TI's compiler for their fixed point DSP processors implemented fixed point radix tracking as part of a native fixed point data type.  These are also the same compilers which might transparently produce a 64 bit result from a 32 bit multiply without casting gymnastics on the part of the user.
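
For comparison, the casting gymnastics in standard C look roughly like this. It is a sketch with hypothetical function names, and it assumes a target where int is at most 32 bits wide.

```c
#include <stdint.h>

/* The product is computed in 32 bits and wraps before it is widened. */
static inline uint64_t mul_truncated(uint32_t a, uint32_t b)
{
    return a * b;
}

/* Casting one operand first makes the whole multiply 64 bits wide. */
static inline uint64_t mul_widening(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

A compiler that recognizes the second pattern can map it onto a single widening multiply instruction where the hardware has one.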


--- Quote from: brucehoult on April 21, 2020, 12:10:15 pm ---It makes sense to have hardware half-carry if it can feed into a dedicated and fast DAA instruction, but very weird to have it without that! Half carry is, I think, very easy to synthesize: (A ^ B ^ (A+B)) & 0x10, if I haven't screwed up. But actually using it would seem to require a complete matrix of all four cases of whether carry and/or half-carry is set, to decide whether to do nothing or add 0x06, 0x60, or 0x66 to get the correct result.

...

It turns out that if you have bigger registers you can do BCD adds quite efficiently without any hardware support at all. For example for a 64 bit machine working with 16 decimal digits in BCD (and assuming that's enough so you don't need carry-in or carry-out):
--- End quote ---

I had this discussion over on the RWT forums a couple of years ago.  It makes sense to me to preserve all stateful flags like carry and half carry (expanding the register width to store them instead of using a single flags register), but RISC processors generally do not; instead they rely on executing a series of basic integer instructions to implement in software either the flags or the results that depend on them, like BCD arithmetic.

Even if this did not make sense from a performance perspective, and I do not think it does, then if the compilers do not take advantage of it, it is irrelevant so it will not be implemented.

In case it is not clear from the above, I think not reflecting how real hardware works is a design flaw in C, one which Java took to an extreme.
