Author Topic: CPU instruction utilization with gcc  (Read 792 times)


Offline Scratch.HTF

  • Regular Contributor
  • *
  • Posts: 77
  • Country: au
CPU instruction utilization with gcc
« on: April 21, 2020, 01:15:17 am »
In the "Introduction to RISC" article in the 1988 Cypress CMOS Data Book, it is mentioned that the Sun C compiler uses only about 30% of the available Motorola 68020 instructions, and that 80% of the computations in a typical program require only about 20% of the available instructions. That sparked my curiosity about what percentage of the AVR (and other supported architecture) instructions the gcc compiler used by the official Arduino IDE actually generates, and which instructions go unused.

When I contacted someone behind the gcc compiler, their reply was:
  • GCC does not generate supervisor instructions as used by certain CPU types such as the Motorola 68000 series.
  • Very simple RISC architectures have a high percentage of user-mode instruction utilization, while CISC architectures require a specific code pattern before a special instruction is used, so CISC architectures have the most unused instructions.
  • A large survey would be required, and the number of available instructions varies per CPU architecture.

I've read that among the (now) rarely used instructions are those for BCD operations, and AVR is one of the architectures where they are not implemented. (For an infrared carrier generator I am building, BCD-coded IR codes set the frequency, for ease of use with LIRC, which (if I am correct) only accepts hexadecimal function codes.)
If it runs on Linux, there is some hackability in it.
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 6489
  • Country: us
    • Personal site
Re: CPU instruction utilization with gcc
« Reply #1 on: April 21, 2020, 01:44:28 am »
There is no construct for BCD in C, so how would the compiler use that instruction? You can still write an assembly section and use the instruction manually.

Compilers also don't support things like enabling/disabling interrupts. All that stuff is provided by intrinsic functions, which are defined through inline assembly.

EDIT: Also, I don't think there are special BCD instructions in AVR. There is a half-carry flag in the status register, which can be used to accelerate BCD math, but there is nothing the compiler can do with that flag.
« Last Edit: April 21, 2020, 01:48:56 am by ataradov »
Alex
 

Online T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 15213
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: CPU instruction utilization with gcc
« Reply #2 on: April 21, 2020, 06:23:05 am »
Regarding BCD specifically, I don't know what conditions trigger the compiler to use features when available.  That would have to be one of those "specific patterns" mentioned.  (Anyway, for AVR in particular, you can do division by a constant very easily, by multiplying by a shifted constant.)
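For example, here is the multiply-by-a-shifted-constant trick Tim mentions, as a sketch of my own (the constant 205/2048 is the usual reciprocal approximation of 1/10, not taken from any particular compiler's output):

```c
#include <stdint.h>

/* x / 10 computed without a divide: multiply by 205 and shift right
   by 11, since 205/2048 is close enough to 1/10 to be exact for all
   8-bit inputs.  Compilers do the same thing for division by a
   constant on targets without hardware divide. */
uint8_t div10(uint8_t x) {
    return (uint8_t)(((uint16_t)x * 205u) >> 11);
}
```

On AVR, the widening 8x8 multiply maps onto a single MUL, so this costs a couple of instructions instead of a software divide loop.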

Seems to me, avr-gcc doesn't generate postincrement instructions very often, though I still need to try more access patterns and semantics on a recent case, to see if something's holding that up...

Compiler interaction for some internals may be special-cased or supported by libraries; for example, util/atomic.h for AVR creates macros, I believe, which simply resolve to cli/sei, or to saving and restoring SREG.  Conceivably, some compilers might implement that kind of functionality at a higher or lower level, so that there isn't a clear threshold as to where a thing should be implemented.  (That said, the clearest motivation I can think of would be: implement everything in libraries that can be.  Special instructions like cli/sei aren't subject to optimization, so hard-coded asm is perfectly adequate.  Whereas things like pointer arithmetic and memory access will be subject to redundancy elimination, differencing, interleaving and such, and so the compiler will need to be aware of them.)

Regarding frequency, it's quite natural that some instructions will be used more than others.  Almost everything you're doing is either moving around data (MOV, LD, ST..), checking data and doing basic arithmetic (ADD/SUB, CMP, TST), conditional bit expansion or manipulation (set/clear, sign extend, shift..), or doing basic state machine stuff (conditional jumps, loops, calls..).  The few (well, say 5 to 20) percent left includes everything else: more in-depth math (MUL and DIV, floating point, SIMD..), fancier bit operations (move, copy, shuffle..), IO (sometimes memory mapped, sometimes special instructions; the AVR, for instance, even though it doesn't have a separate IO space the way Z80 or x86 do, has IN/OUT instructions for quick access to low addresses), API calls (INT?) and OS/kernel functions (privileged, when applicable).

Incidentally, GCC definitely doesn't generate AVR's FMUL instructions at all, providing them as builtins only.  You might be better off writing those out with MUL and shifts anyway, as it doesn't seem to perform any optimization around those instructions (again, based on very limited experience at present..).  So even on the humble AVR, we have a few examples of that situation.

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 1517
  • Country: us
  • Formerly SiFive, Samsung R&D
Re: CPU instruction utilization with gcc
« Reply #3 on: April 21, 2020, 12:10:15 pm »
There is no construct for BCD in C, so how would the compiler use that instruction? You can still write an assembly section and use the instruction manually.

Compilers also don't support things like enabling/disabling interrupts. All that stuff is provided by intrinsic functions, which are defined through inline assembly.

EDIT: Also, I don't think there are special BCD instructions in AVR. There is a half-carry flag in the status register, which can be used to accelerate BCD math, but there is nothing the compiler can do with that flag.

It makes sense to have hardware half-carry if it can feed into a dedicated and fast DAA instruction, but very weird to have it without that! Half carry is I think very easy to synthesize (A ^ B ^ (A+B)) & 0x10 if I haven't screwed up, but actually using it would seem to require a complete matrix of all four cases of whether carry and/or half-carry is set to decide whether to do nothing or add 0x06, 0x60, or 0x66 to get the correct result.
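That synthesized expression can be sanity-checked exhaustively in C (my sketch, not part of the original post):

```c
#include <stdint.h>

/* Half-carry (carry out of bit 3) of an 8-bit add, synthesized as
   (A ^ B ^ (A+B)) & 0x10: the XOR of the operands and the sum at
   bit 4 is exactly the carry into bit 4, i.e. nonzero iff the low
   nibbles of A and B sum past 15. */
int half_carry(uint8_t a, uint8_t b) {
    return ((a ^ b ^ (uint8_t)(a + b)) & 0x10) != 0;
}
```

Looping over all 65536 operand pairs confirms it agrees with the direct low-nibble test.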

In fact it's worse than that because what you need is more like:

Code: [Select]
carry_out = carry;
if (half_carry || (sum & 0x0F) > 0x09) sum += 0x06;
if (carry || sum > 0x9F) { sum += 0x60; carry_out = 1; }

And do that without the first line disturbing the state of carry -- which probably means saving and restoring the condition codes.

It turns out that if you have bigger registers you can do BCD adds quite efficiently without any hardware support at all. For example for a 64 bit machine working with 16 decimal digits in BCD (and assuming that's enough so you don't need carry-in or carry-out):

Code: [Select]
typedef uint64_t reg;   /* needs <stdint.h>; one full-width register */

reg BCDadd(reg a, reg b){
  reg sum = a + b;
  reg sum_c = sum + 0x6666666666666666; // bias so digits >= 10 carry out of their nibble
  reg carries = ((a ^ b ^ sum_c) >> 4) & 0x1111111111111111; // internal carries
  carries |= (reg)(sum_c < a) << 60; // carry from MSB
  return sum + carries * 6; // add 6 to every digit that carried
}

AVR has 16 bit adds, so actually you could use this technique for 4 digit BCD on it. Or bigger using multi-precision adds.
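A 16-bit rendition of that function would look like this (my adaptation of bruce's 64-bit version, so treat it as a sketch; like the original, it assumes no carry-in or carry-out is needed):

```c
#include <stdint.h>

/* 4-digit packed-BCD add using plain binary arithmetic: detect
   per-nibble decimal carries via a biased add, then add 6 to each
   nibble that carried. */
uint16_t bcd_add16(uint16_t a, uint16_t b) {
    uint16_t sum   = a + b;                              /* binary sum */
    uint16_t sum_c = sum + 0x6666;                       /* bias: digit >= 10 carries out of its nibble */
    uint16_t carries = ((a ^ b ^ sum_c) >> 4) & 0x0111;  /* internal carries */
    carries |= (uint16_t)(sum_c < a) << 12;              /* carry out of the MSD */
    return sum + carries * 6;
}
```

For example, bcd_add16(0x1234, 0x5678) yields 0x6912, matching the decimal sum 1234 + 5678.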
 
The following users thanked this post: rhodges

Online David Hess

  • Super Contributor
  • ***
  • Posts: 11178
  • Country: us
  • DavidH
Re: CPU instruction utilization with gcc
« Reply #4 on: April 21, 2020, 04:29:39 pm »
There is no construct for BCD in C, so how would the compiler use that instruction? You can still write an assembly section and use the instruction manually.

C compilers dedicated to specific processors may support things like BCD directly.  Many years ago TI's compiler for their fixed point DSP processors implemented fixed point radix tracking as part of a native fixed point data type.  These are also the same compilers which might transparently produce a 64 bit result from a 32 bit multiply without casting gymnastics on the part of the user.

It makes sense to have hardware half-carry if it can feed into a dedicated and fast DAA instruction, but very weird to have it without that! Half carry is I think very easy to synthesize (A ^ B ^ (A+B)) & 0x10 if I haven't screwed up, but actually using it would seem to require a complete matrix of all four cases of whether carry and/or half-carry is set to decide whether to do nothing or add 0x06, 0x60, or 0x66 to get the correct result.

...

It turns out that if you have bigger registers you can do BCD adds quite efficiently without any hardware support at all. For example for a 64 bit machine working with 16 decimal digits in BCD (and assuming that's enough so you don't need carry-in or carry-out):

I had this discussion over on the RWT forums a couple of years ago.  It makes sense to me to preserve all stateful flags like carry and half-carry (expand the register width to store them instead of using a single flags register), but RISC processors generally do not, and instead rely on being able to execute a series of basic integer instructions to implement them, or their results, like BCD arithmetic in software.

Even if this made sense from a performance perspective, and I do not think it does, it is irrelevant if the compilers do not take advantage of it, so it will not be implemented.

In case it is not clear from the above: I think not reflecting how real hardware works is a design flaw in C, one which Java took to an extreme.
« Last Edit: April 21, 2020, 04:32:13 pm by David Hess »
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 5316
  • Country: fr
Re: CPU instruction utilization with gcc
« Reply #5 on: April 21, 2020, 04:35:22 pm »
Some code patterns can be recognized so that the compiler can use BCD instructions, but doing so is not necessarily trivial. I currently have no example in mind of a target for which GCC would do this, for instance.

But GCC may have some builtin functions for some targets. For instance, I think GCC has at least the cdtbcd() and cbcdtd() builtin functions for PowerPC targets.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 1703
  • Country: fi
    • My home page and email address
Re: CPU instruction utilization with gcc
« Reply #6 on: April 21, 2020, 05:52:58 pm »
Here are the generic built-ins provided by GCC; here are the AVR-specific built-ins; here are the named address space extensions for AVR (__flash, __flashN, and __memx);  here are the AVR-specific variable attributes; here are the <stdfix.h> definitions used to implement the "Embedded C" fixed-point stuff in ISO/IEC TR 18037; and here is how to use inline assembly with GCC (on any processor; the machine constraints are particularly useful).

For bignum stuff (larger than 64-bit, or say 24- or 48-bit on AVRs, or specific fixed-point formats like Q47.7), one still kinda-sorta needs to write the low-level base operation functions in inline assembly, if performance and code size are an issue.  The same goes for vectorising math (using single-instruction-multiple-data extensions).  I've vectorized stuff on SSE and AVX, and often find myself using a different approach (mathematically more like reparametrization, or reordering integrals or differentials and/or data) when I realize I can utilize the data parallelism differently at a higher level, making the lower-level calculations much more efficient, so I would not expect compilers to become much better at this than they are right now.  (Just ask me about augmented Verlet neighbor lists, for example; it's a logical conundrum involving SIMD, cache behaviour, and avoiding calculation whenever possible... but fun!)
 

Online David Hess

  • Super Contributor
  • ***
  • Posts: 11178
  • Country: us
  • DavidH
Re: CPU instruction utilization with gcc
« Reply #7 on: April 23, 2020, 02:12:05 am »
Some code patterns can be recognized so that the compiler can use BCD instructions, but doing so is not necessarily trivial. I currently have no example in mind of a target for which GCC would do this, for instance.

That sort of thing must do wonders for reliability and proper operation.
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 6489
  • Country: us
    • Personal site
Re: CPU instruction utilization with gcc
« Reply #8 on: April 23, 2020, 02:17:41 am »
That sort of thing must do wonders for reliability and proper operation.
That's why there are optimization flags. Want readability? Disable optimization. Optimized code is sometimes hard to read anyway, even without clever use of instructions. And obviously compiler authors won't do it just for fun; there must be a performance reason.

Misread "reliability". How would that affect reliability? It is on the compiler to ensure that the final code does what the C source says. Who cares how it is achieved?
Alex
 
The following users thanked this post: SiliconWizard

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 1517
  • Country: us
  • Formerly SiFive, Samsung R&D
Re: CPU instruction utilization with gcc
« Reply #9 on: April 23, 2020, 02:49:12 am »
That sort of thing must do wonders for reliability and proper operation.
That's why there are optimization flags. Want readability? Disable optimization. Optimized code is sometimes hard to read anyway, even without clever use of instructions. And obviously compiler authors won't do it just for fun; there must be a performance reason.

Misread "reliability". How would that affect reliability? It is on the compiler to ensure that the final code does what the C source says. Who cares how it is achieved?

Exactly so.

I doubt there are any compilers that infer use of BCD hardware support from patterns in C code. For a start, there is very little support by good compilers for 8-bit CPUs, and I don't know of any 32-bit or 64-bit ISAs that provide hardware support for full-register BCD arithmetic. If they still just have it on 8-bit values, then it's inferior to the arbitrary register-size code I posted above anyway.

Secondly, even if a CPU provided hardware support, that function I posted above is a very large and hairy pattern to recognize. It would be better to have users use an intrinsic.

There are some common CPU instructions that don't correspond directly to C operators but that are actually routinely generated by pattern matching chunks of C. One obvious example is rotate instructions, and another is instructions to get the top bits (only) of a signed or unsigned multiply without needing to do a full NxN -> 2N multiply and then extract the upper bits.
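For illustration, here are C sketches of those two patterns (function names are mine; whether a given compiler recognizes them is target- and version-dependent, though both idioms are widely matched):

```c
#include <stdint.h>

/* Rotate-right idiom: compilers commonly collapse this shift-or pair
   into a single ROR instruction where the target has one.  Masking n
   avoids undefined behaviour for n == 0 or n >= 32. */
uint32_t rotr32(uint32_t x, unsigned n) {
    n &= 31;
    return (x >> n) | (x << ((32 - n) & 31));
}

/* High half of an unsigned 32x32 multiply: written as a widening
   multiply plus shift, often compiled to a "multiply high" type
   instruction without materializing the full 64-bit product. */
uint32_t mulhi32(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}
```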
 

Online David Hess

  • Super Contributor
  • ***
  • Posts: 11178
  • Country: us
  • DavidH
Re: CPU instruction utilization with gcc
« Reply #10 on: April 23, 2020, 01:26:06 pm »
Misread "reliability". How would that affect reliability? It is on the compiler to ensure that the final code does what the C source says. Who cares how it is achieved?

If that were the case, then why does BCD arithmetic exist at all?  Using fixed point in place of BCD arithmetic can lead to incredibly obscure bugs from rounding inconsistency.
 

Online Kleinstein

  • Super Contributor
  • ***
  • Posts: 7502
  • Country: de
Re: CPU instruction utilization with gcc
« Reply #11 on: April 23, 2020, 03:13:00 pm »
AFAIK BCD was a thing in COBOL code, and one of the reasons COBOL was / is still around even well after C++ started.
AFAIK not many CPUs support extra BCD instructions. An old example is the 6502, which has them. Some code used them for decimal / binary conversions (I don't remember the direction), so one may also find such code in optimized (e.g. with added manual ASM code) library code for the standard libs.

For pure compiler-generated code I would not expect BCD instructions to appear from C source. There may be an exception via peephole optimization that could use some BCD functionality on some CPUs - but I kind of doubt it.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 5316
  • Country: fr
Re: CPU instruction utilization with gcc
« Reply #12 on: April 23, 2020, 05:31:10 pm »
As I said above, there is at least the PowerPC architecture which embeds BCD instructions, and which is not that old.

Of course they were especially useful back in the days of very slow processors, but I guess they can still be useful for heavy manipulation of decimal numbers. Whether BCD instructions are justified for a given processor is the architects' decision.

And it's not just a matter of COBOL or of financial applications in general either. Decimal numbers are used in a large number of applications.

Not just the 6502 had them. AFAIR, almost all CPUs from the 70's and 80's had BCD instructions. Even the 68k had them (which may be the reason the PPC kept this heritage?)

But even these days, on small/slow MCUs with no hardware divide, for instance, on which you're going to use decimal numbers for some reason, BCD instructions can make a significant difference - sometimes on the order of a 100x speedup or so (say, if without them you would need software divide/modulo...)
 

Online David Hess

  • Super Contributor
  • ***
  • Posts: 11178
  • Country: us
  • DavidH
Re: CPU instruction utilization with gcc
« Reply #13 on: April 23, 2020, 06:53:56 pm »
BCD is useful wherever predictable rounding is required which makes it especially important in financial applications.  Fixed and floating point code can be constructed to give predictable rounding but BCD makes it easier or at least easier to understand.

There have been a variety of bugs over the past decades where fixed or floating point math was used inappropriately:

https://en.wikipedia.org/wiki/MIM-104_Patriot#Failure_at_Dhahran
https://slate.com/technology/2019/10/round-floor-software-errors-stock-market-battlefield.html
https://en.wikipedia.org/wiki/Binary-coded_decimal#Advantages

To be clear, I do not advocate hardware support for BCD in modern processors and in general hardware support has been deprecated where it existed.  IBM's PowerPC is one of the exceptions.  In the past, almost every processor supported BCD because so many hardware interfaces relied on it.

But I do think stateful flags like carry should be retained and that might extend to half-carry which is used to support BCD.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 1517
  • Country: us
  • Formerly SiFive, Samsung R&D
Re: CPU instruction utilization with gcc
« Reply #14 on: April 24, 2020, 03:18:12 am »
But I do think stateful flags like carry should be retained and that might extend to half-carry which is used to support BCD.

I completely disagree.

There was perhaps some purpose on an 8-bit processor that *also* has a "DAA" instruction (6800, 8080, Z80, 8086) to process the accumulator and flags immediately afterwards, but even then it's basically just as easy to have a dedicated BCD add instruction (or a decimal mode, as on the 6502).

Having a half-carry flag but *not* a DAA instruction, as on AVR, is pretty much useless as far as I can tell.

Once you go to a 16-bit CPU you would need *three* half-carry flags, a 32-bit CPU would need *seven*, and a 64-bit CPU *fifteen*, in addition to the normal carry out of the MSB. Either add a proper BCD add instruction (which you are free to do on RISC-V, for example, if you really want one) or use the sequence of a dozen standard instructions I showed in a previous post.

Otherwise you're down to processing one byte at a time (as in the 68000's ABCD instruction) which will be slower on 64 bit than doing them all in parallel using normal instructions, and not much faster for 32 bit.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 5316
  • Country: fr
Re: CPU instruction utilization with gcc
« Reply #15 on: April 24, 2020, 01:03:30 pm »
Agreed. To sum it up: BCD operations can of course be perfectly well implemented with binary operations only. The only difference in the end is performance, and I agree that depending on the width on which specific BCD instructions can operate, they would not necessarily be a bonus performance-wise.

I guess David Hess's point was not really about the ISA having BCD instructions, but about implementing calculations that require BCD for correctness, whereas there have been some projects in which they were just implemented using, for instance, FP, which is obviously completely wrong. The culprit of course was using FP. BCD can always be implemented "by hand" with pure binary operations. (But with reduced performance compared to FP, of course...)

As to the AVR, I don't know its instruction set well enough to tell. I guess a half-carry flag would be better than nothing, to implement BCD operations slightly more efficiently, but yes, without some kind of DAA instruction the benefit would be limited.
 

Online Yansi

  • Super Contributor
  • ***
  • Posts: 3277
  • Country: 00
  • STM32, STM8, AVR, 8051
Re: CPU instruction utilization with gcc
« Reply #16 on: April 24, 2020, 01:26:44 pm »
The only time I came across the need for BCD numbering was a clever trick to convert large (16-, 32- or even 64-bit) binary numbers to text on an 8051 microcontroller.  It was a shift-add algorithm that converted binary to BCD, and then BCD to text, very easily and fast, without the need for any divide instructions.  It is effectively Horner's scheme for evaluating polynomial values, which can of course be used to convert between number bases.

Never ever seen any practical use of the DA instruction on the 8051 beyond the example above.
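The shift-add algorithm Yansi describes (often called "double dabble") looks roughly like this in C (my sketch; the original was 8051 assembly):

```c
#include <stdint.h>

/* Binary to packed BCD by repeated shift-and-adjust ("double dabble"):
   no divide instructions needed.  Converts a 16-bit value into up to
   5 BCD digits packed in a 32-bit word. */
uint32_t bin16_to_bcd(uint16_t bin) {
    uint32_t bcd = 0;
    for (int i = 0; i < 16; i++) {
        /* Add 3 to every BCD digit that is >= 5, so the shift below
           carries it into the next digit correctly (2*(d+3) = d*2+6,
           i.e. the decimal carry is produced in binary). */
        for (int d = 0; d < 5; d++) {
            if (((bcd >> (4 * d)) & 0xF) >= 5)
                bcd += (uint32_t)3 << (4 * d);
        }
        bcd = (bcd << 1) | ((bin >> (15 - i)) & 1);  /* shift in the next bit */
    }
    return bcd;
}
```

From the packed BCD result, text output is just adding '0' to each nibble.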
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 1517
  • Country: us
  • Formerly SiFive, Samsung R&D
Re: CPU instruction utilization with gcc
« Reply #17 on: April 24, 2020, 02:31:02 pm »
I guess David Hess's point was not really about the ISA having BCD instructions, but about implementing calculations that require BCD for correctness, whereas there have been some projects in which they were just implemented using, for instance, FP, which is obviously completely wrong. The culprit of course was using FP. BCD can always be implemented "by hand" with pure binary operations. (But with reduced performance compared to FP, of course...)

Why do you say that  using FP "is obviously completely wrong"?

On a 32-bit machine that supports a double-precision FPU, it makes perfect sense to use FP doubles for higher-precision operations. An IEEE double can represent all integers from 0 to +/- 2^53 (~9x10^15) exactly, which means it has almost exactly the same range as storing BCD in a 64-bit integer. Addition, subtraction, and multiplication of integers in this range are guaranteed to be exact.

Of course 64 bit binary integers will be better, but on a 32 bit machine doing double precision integer operations is likely to be slower than the FPU -- certainly for multiply, but maybe addition and subtraction also.
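A quick check of that exactness claim (my sketch):

```c
/* Integers with magnitude up to 2^53 are exactly representable in an
   IEEE-754 double, so integer add/subtract/multiply staying within
   that range is exact. */
#define TWO53 9007199254740992.0   /* 2^53 */

/* 2^53 + 1 is not representable: it rounds back to 2^53. */
int doubles_collide_at_2_53(void) { return (TWO53 + 1.0) == TWO53; }

/* Just below the limit, consecutive integers are still distinct. */
int doubles_distinct_below(void)  { return (TWO53 - 1.0) != TWO53; }
```

Both functions return 1, illustrating that the exact range ends precisely at 2^53.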
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 1703
  • Country: fi
    • My home page and email address
Re: CPU instruction utilization with gcc
« Reply #18 on: April 24, 2020, 02:38:03 pm »
Why do you say that  using FP "is obviously completely wrong"?
Not SiliconWizard, but: decimals.  Exact in BCD, inexact in floating point (unless your radix is a power of 10).
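The classic demonstration (my sketch):

```c
/* 0.1 has no finite binary representation, so summing it ten times
   does not give exactly 1.0 in binary floating point -- each added
   0.1 carries a tiny representation error. */
int ten_tenths_is_one(void) {
    double sum = 0.0;
    for (int i = 0; i < 10; i++)
        sum += 0.1;
    return sum == 1.0;   /* 0: the accumulated sum misses 1.0 */
}
```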
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 1517
  • Country: us
  • Formerly SiFive, Samsung R&D
Re: CPU instruction utilization with gcc
« Reply #19 on: April 24, 2020, 04:10:51 pm »
As to AVR, I don't know the IS well enough to tell. I guess a half-carry flag would be better than nothing to implement BCD operations slightly more efficiently, but yes, without some kind of DAA instruction, the benefit would be limited.

Here is Atmel's official sample code to do an 8-bit BCD add, by doing a binary add and then adjusting based on the carry and half-carry flags. Note this routine returns the carry-out in the same register as the 2nd argument (r17), but there is no provision for carry-in. I think simply substituting ADC for ADD in the 2nd instruction would allow a carry-in via the carry flag. So the routine would probably be better written to do that, and to return the carry-out in the carry flag. But, whatever...

Code: [Select]
;***************************************************************************
;*
;* "BCDadd" - 2-digit packed BCD addition
;*
;* This subroutine adds the two unsigned 2-digit BCD numbers
;* "BCD1" and "BCD2". The result is returned in "BCD1", and the overflow
;* carry in "BCD2".
;*
;* Number of words :21
;* Number of cycles :23/25 (Min/Max)
;* Low registers used :None
;* High registers used  :3 (BCD1,BCD2,tmpadd)
;*
;***************************************************************************

;***** Subroutine Register Variables

.def BCD1 =r16 ;BCD input value #1
.def BCD2 =r17 ;BCD input value #2
.def tmpadd =r18 ;temporary register

;***** Code

BCDadd:
ldi tmpadd,6 ;value to be added later
add BCD1,BCD2 ;add the numbers binary
clr BCD2 ;clear BCD carry
brcc add_0 ;if carry not clear
ldi BCD2,1 ;    set BCD carry
add_0:
brhs add_1 ;if half carry not set
add BCD1,tmpadd ;    add 6 to LSD
brhs add_2 ;    if half carry not set (LSD <= 9)
subi BCD1,6 ;        restore value
rjmp add_2 ;else
add_1:
add BCD1,tmpadd ;    add 6 to LSD
add_2:
brcc add_2a
ldi BCD2,1
add_2a:
swap tmpadd
add BCD1,tmpadd ;add 6 to MSD
brcs add_4 ;if carry not set (MSD <= 9)
sbrs BCD2,0 ;    if previous carry not set
subi BCD1,$60 ; restore value
add_3:
ret ;else
add_4:
ldi BCD2,1 ;    set BCD carry
ret

Note that if there were a DAA instruction as on the 6800, 8080, Z80 and 8086, then this would simply be:

Code: [Select]
adc BCD1,BCD2
daa BCD1

The C BCDadd() function I posted previously, modified to work on a single byte, uses 22 instructions (one more!), has only one conditional branch, and doesn't use either the carry or half-carry flags.

The flags are actually useless.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 1517
  • Country: us
  • Formerly SiFive, Samsung R&D
Re: CPU instruction utilization with gcc
« Reply #20 on: April 24, 2020, 04:13:22 pm »
Why do you say that  using FP "is obviously completely wrong"?
Not SiliconWizard, but: decimals.  Exact in BCD, inexact in floating point (unless your radix is a power of 10).

You don't use decimals. You use integers, just as BCD is fundamentally integer, and insert a "." character only when printing.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 1703
  • Country: fi
    • My home page and email address
Re: CPU instruction utilization with gcc
« Reply #21 on: April 24, 2020, 05:39:43 pm »
You don't use decimals. You use integers, just as BCD is fundamentally integer, and insert a "." character only when printing.
Then, your abstract decimal data type needs to have a separate field for the base power of ten -- which literally makes it a radix-10 floating point.
You can obviously have radix-10 fixed point types as well.

AFAIK, financial software uses 2- and 4-decimal numeric types.  Mathematically, there is no difference between using an integer type that represents the numeric value in units of 1/100 or 1/10,000 in binary or in BCD, but verification for correctness may be easier when the underlying datatype uses BCD.  I would not use BCD at the application programming level, but in an interpreter or runtime I might.  (Apologies for the conditionals, but I just don't have experience with financial software to know how important additional verification would be.  I also wonder if any financial software uses checksums or hashes internally (not just when in storage, but during calculations) to verify the results.)

Feel free to disagree; this is just my current understanding.  To simplify: I am not sure if BCD is necessary for current processors to support, I certainly would not miss it if they didn't.  But I do understand why it was seen as important in the past.



All that said, I'm sure there are instructions that are not used by any C compilers, but that nevertheless are indispensable for the hardware architecture to be useful.  A lot of code is written at the assembly level -- locking primitives and complex atomic operations --, for example; similarly for hardware interrupt handlers/service routines, privilege-domain crossing functionality (like kernel/userspace boundary), and others.

I don't think the answer to the question in the title is that interesting, nor anywhere near as important as the answer to "how could we do better?"

That answer includes both the instruction set development side and the compiler code generation side.  I'm sure there are millions of unused patterns that, if used by the compilers, could yield measurable performance increases.  Similarly, I am sure there are things existing instruction sets support, like the carry/overflow flag, that compilers cannot exploit to our advantage, and instead recreate via extra arithmetic-logic tests.  SIMD/vectorization is one of these (and a particularly difficult one, because it affects the order of operations and is highly dependent on data order/arrangement in memory).

One fundamental problem is that the C standard itself isn't that great: in many cases, it allows a compiler to generate completely idiotic, unusable code, even when there is a single code pattern that makes practical sense.  (Just look at some discussions between GCC and Linux kernel folks: one side argues from the standard, the other from practice.  Even the C standard has its failures and bugs.)
« Last Edit: April 24, 2020, 05:42:51 pm by Nominal Animal »
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 5316
  • Country: fr
Re: CPU instruction utilization with gcc
« Reply #22 on: April 24, 2020, 06:31:41 pm »
Why do you say that  using FP "is obviously completely wrong"?
Not SiliconWizard, but: decimals.  Exact in BCD, inexact in floating point (unless your radix is a power of 10).

You don't use decimals. You use integers, just as BCD is fundamentally integer, and insert a "." character only when printing.

As I said above, you don't need BCD to do BCD. Integers are fine.

OTOH, I don't see the point of using FP strictly as integers. It would require proper care on the developers' part to begin with, and wouldn't make much sense unless you target CPUs with a much faster FPU than integer ALU. It's just asking to shoot yourself in the foot, and there's a very large probability it would lead to bugs from improper use.

Just use integers. OK, you were trying to make a point with FP, but frankly this doesn't make much sense in this context IMHO. Developers tempted to use FP for financial applications usually do so to make their lives easier, and we have a few examples of this leading to very bad software. If they have to use FP with even more caution than integers, that's pretty twisted.

As to using BCD instead of pure binary integers, that can be debated to no end. Obviously for financial stuff, exactness is key, and properly used integers can give you this. (FP used strictly as integers, I again don't see much point, but it would fall into the same category.)

There's nothing you can do with pure BCD you can't do with binary integers. So it's mainly a matter of performance. And of course having BCD instructions serves no purpose if the language you're using can't make use of them (back to the topic.)

Nominal Animal: You may be making a good point about verification. It *might* be easier to verify correctness of financial calculations with BCD compared to binary (basically because a pure binary implementation will have slightly more overhead to properly deal with the fractional part, which has to be limited to a certain number of *decimal* digits, so that's more potential for bugs, even though it's not rocket science.)
« Last Edit: April 24, 2020, 06:33:18 pm by SiliconWizard »
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 1703
  • Country: fi
    • My home page and email address
Re: CPU instruction utilization with gcc
« Reply #23 on: April 24, 2020, 08:00:48 pm »
It *might* be easier to verify correctness of financial calculations with BCD compared to binary
One interesting thing would be to poison-values-after-use with an invalid BCD pattern (each nibble ≥10).  It is easy to detect (the same as per-digit carry check after addition) before they are used, and the detection will work even if the value was only partially overwritten.  (The runtime would hide these, only providing an error (C) or exception (C++) if such are ever detected.  Kinda-sorta like IEEE-754 floating-point operations with NANs, for example.)

I am pretty sure there are some patterns that would be even more effective for binary data (any error-correction scheme, for example); it is just that BCD is the simplest for us humans to grok in this region of the solution space.

However, I don't actually believe our current financial software is anywhere near that careful.  I would be surprised if it even does basic checks, like verifying a sum of two values via subtraction.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 1517
  • Country: us
  • Formerly SiFive, Samsung R&D
Re: CPU instruction utilization with gcc
« Reply #24 on: April 24, 2020, 09:51:33 pm »
You don't use decimals. You use integers, just as BCD is fundamentally integer, and insert a "." character only when printing.
Then, your abstract decimal data type needs to have a separate field for the base power of ten -- which literally makes it a radix-10 floating point.

No. It is only necessary that you as the programmer know how many digits there are after the decimal. Usually all variables for e.g. currency have the same number: 2 for dollars, pounds, rubles and so forth in ordinary accounting applications. In certain financial (rather than accounting) settings where you are doing things such as interest calculations it is specified by regulation that you use 4 or 6 digits after the decimal.

The only operations in accounting are normally adding and subtracting currency values, and multiplying a currency value by a plain integer. Rarely, you might have to multiply a price per unit by a number of units that is not an integer. Probably in exactly one place in your codebase :-)

Quote
AFAIK, financial software uses 2- and 4-decimal numeric types.  Mathematically, there is no difference between using an integer type that represents the numeric value in units of 1/100 or 1/10,000 in binary or in BCD, but verification for correctness may be easier when the underlying datatype uses BCD.  I would not use BCD at the application programming level, but in an interpreter or runtime I might.  (Apologies for the conditionals, but I just don't have experience with financial software to know how important additional verification would be.)

I've been writing financial software since the mid 80s. Well, mostly from 1985 to 1995 or so when I worked for a series of related stockbroking / fixed interest / foreign exchange / mergers&acquisitions companies. But sometimes since then too.

I absolutely did use FP values to represent money in all that software running on 68020+68882 and up to PowerPC. And, rarely, x86.

All calculations are absolutely exact as long as you stay within the range of 15 decimal digits. If you do something that results in an inexact answer, then a flag is set in the FPU that stays set until you deliberately reset it. It is perfectly safe if you take the amount of care that you should take with *any* financial software.
 
The following users thanked this post: I wanted a rude username

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 1517
  • Country: us
  • Formerly SiFive, Samsung R&D
Re: CPU instruction utilization with gcc
« Reply #25 on: April 24, 2020, 10:01:56 pm »
OTOH, I don't see the point of using FP if you're strictly using FP as integers. It would require proper care from the developers side to begin with, and wouldn't make much sense unless you target CPUs with much faster FPU than their integer ALU. It's just asking for shooting yourself in the foot, and there's a very large probability this would lead to bugs due to improper use.

ALL programming, especially financial programming, requires proper care.

What is the difference between using 53-bit integers in an FP value and 32-bit integers in an integer value?  *Nothing*, except an extra 6 digits of precision. Using two 32-bit integers explicitly is just a PITA. OK, modern C compilers have "long long", but that's relatively new.

If you have a 64 bit CPU then of course you have a better working range. But there are still other advantages of FP. See below.

Quote
Just use integers. OK you were trying to make a point with FP, but frankly this doesn't make much sense in this context IMHO. Developers tempted to use FP for financial applications will usually do this to make their life easier, and we have a few examples of this leading to very bad software. If they have to use FP with even more caution than if they were using integers, that's pretty twisted.

It is precisely the same amount of caution -- you need to make sure you do operations that don't overflow the available range, and that don't produce inexact results, such as division.

The differences are:

1) an FP double holding an integer gives you 21 bits more precision -- more than 6 decimal digits.

2) if an integer operation overflows then you'll never know unless you very expensively check the result of every single operation. If an FP operation overflows the exact integer range or produces an inexact value in any other way then a flag is set that *stays* set until you clear it, so you can verify a whole series of calculations with a single check.

Using FP is not some slow dangerous kludge -- it is *safer*.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 1703
  • Country: fi
    • My home page and email address
Re: CPU instruction utilization with gcc
« Reply #26 on: April 24, 2020, 11:44:44 pm »
You don't use decimals. You use integers, just as BCD is fundamentally integer, and insert a "." character only when printing.
Then, your abstract decimal data type needs to have a separate field for the base power of ten -- which literally makes it a radix-10 floating point.
No.
Nice of you to not include the next sentence in that paragraph, "You can obviously have radix-10 fixed point types as well.", so you can say you used radix-10 fixed-point instead; i.e. integer values in units of 10⁻ⁿ of the base currency unit.

And I'm the one who gets called a troll here.  Hmph.

I absolutely did use FP values to represent money in all that software running on 68020+68882 and up to PowerPC. And, rarely, x86.
Explains a lot about the state of our financial system.  (Kidding!)

No, of course you can write safe and mathematically correct software using BCD, binary integers, floating-point numbers, or even ASCII decimal strings.

It is the obvious or naïve way of using floating-point numbers (not pure integers, but using decimals to represent fractional amounts of the base currency unit) in a financial system that is bad -- because often used values cannot be represented exactly in binary floating point.

Yes, you could even work out how to do all needed operations (including interest calculations) that work mathematically correctly even when you use floating-point numbers the obvious way, although it is nontrivial and slow, and you need to be Kahan-sum-level careful when implementing each.

The fundamental problem lies, as usual, completely within us humans: what you mean exactly when you say "use floating-point types", and so on.
Even if you, the original developer, understand it, you also need to understand the rules by which your compiler works; and anyone modifying or maintaining the code later on must do so too.  I like to use the term "maintainability", because it is not just readability; it is about how easy or hard it is to understand the entire logic structure that governs the functioning of the program, and to work within it.

For financial software, you don't get to do approximate totals; everything must match to the smallest currency unit recorded.  If you use floating-point values like you would in normal numerical computation, with fractional parts representing fractional currency units, that won't happen unless your hardware FP is radix-10, because commonly used values cannot be represented exactly.  You can work around it, and brucehoult has outlined some experience of how that has been done using integers only, in units of the smallest fraction needed, while still technically using FP and FP operations.  Another way to do it is via BCD.  Yet another way is via pure binary integers.  You can even use ASCII strings representing the numerical values -- it's easier than one might think.

The issue is that just writing the code and claiming it does what is asked is only a small part of software engineering.  Sometimes you use an inefficient implementation, because the tradeoff between efficiency and maintainability warrants it.  If you need something to be efficient and reliable, you test the hell out of it, and acknowledge that if it needs to be "modified", it is better rewritten from scratch using the full set of (changed) requirements, and ensure the requirements defining the operational rules are well documented.

It would be so nice to write code just for myself, and to hell with maintainability -- but then we wouldn't be able to stand on the shoulders of giants; we'd just all be standing out there in the field, reinventing the same wheels again and again.

To circle back at the original topic, this also means that the subset of instructions say GCC uses isn't really indicative of much: it is difficult to tell if the unused subset is unused because GCC is inefficient, or because that subset is just not useful.  Similarly for BCD: just because a bunch of programmers cannot immediately see a killer use case for it, does not mean there isn't one.
« Last Edit: April 24, 2020, 11:47:12 pm by Nominal Animal »
 

