Non-IEEE (software) float option

**peter-h**:

In the old Z80 etc days, you had e.g.

- Hisoft Pascal, non-IEEE floats, FP divide taking say 500us, and Borland Turbo Pascal may have been the same

- IAR Pascal or C, IEEE floats, FP divide taking say 10ms

Now, aside from IAR compilers back then generating bloated code (the runtimes were often written in C, which is a rubbish way to do stuff like software floats), the IEEE float format does produce much longer runtimes. Part of this may be the specific requirements on e.g. rounding.

On the more basic CPUs today there are no hardware floats (I think those are always IEEE compliant), so getting software floats to run several times faster would be useful to a lot of people.

The non-IEEE code is also a lot more compact. In another thread someone mentioned 20k for software floats (arm32?). The Z80 non-IEEE version was under 4k, and Z80 code is not that compact.

Another curious thing is that in Hisoft Pascal, the simple code meant that the 24-bit mantissa was an "exact integer" if you did whole-number add/subtract, i.e. if you wanted a 24-bit counter (their ints were only int16) you could use a float and just keep adding 1, and it would do it right all the way to (2^24)-1, whereas I am not sure IEEE floats do exactly that.

I do see a problem with generic tools like GCC not wanting to do this, instead writing their software floats in C, but that will immediately produce a big performance hit.

**brucehoult**:

--- Quote from: peter-h on June 12, 2024, 02:47:13 pm ---In the old Z80 etc days, you had e.g.

- Hisoft Pascal, non-IEEE floats, FP divide taking say 500us, and Borland Turbo Pascal may have been the same

- IAR Pascal or C, IEEE floats, FP divide taking say 10ms

--- End quote ---

There is for example the FP library used by Arduino on AVR. It's around 5us for add/sub/mul on a 16 MHz chip.

--- Quote ---Another curious thing is that in Hisoft Pascal, the simple code meant that the 24-bit mantissa was an "exact integer" if you did whole-number add/subtract, i.e. if you wanted a 24-bit counter (their ints were only int16) you could use a float and just keep adding 1, and it would do it right all the way to (2^24)-1, whereas I am not sure IEEE floats do exactly that.

--- End quote ---

Of course they do. Any IEEE implementation (hardware or software) is absolutely guaranteed to give exact results for add/sub/mul of integers out to the limits of 23 bits on single precision or 53 bits in double precision. And in fact to a result of 2^23 or 2^53. After that the odd numbers can't be represented but you get all the even numbers out to 2^24 or 2^54.

This is a consequence of IEEE implementations being required to give bit exact results for EVERY operation for which the result is representable -- and the nearest value for the rest. I mean for the fundamental operations -- this doesn't apply to trig and logs.

**ejeffrey**:

--- Quote from: brucehoult on June 12, 2024, 03:01:10 pm ---

--- Quote ---Another curious thing is that in Hisoft Pascal, the simple code meant that the 24-bit mantissa was an "exact integer" if you did whole-number add/subtract, i.e. if you wanted a 24-bit counter (their ints were only int16) you could use a float and just keep adding 1, and it would do it right all the way to (2^24)-1, whereas I am not sure IEEE floats do exactly that.

--- End quote ---

Of course they do. Any IEEE implementation (hardware or software) is absolutely guaranteed to give exact results for add/sub/mul of integers out to the limits of 23 bits on single precision or 53 bits in double precision. And in fact to a result of 2^23 or 2^53. After that the odd numbers can't be represented but you get all the even numbers out to 2^24 or 2^54.

--- End quote ---

Hence the use of a single base numeric data type in languages like JavaScript and Matlab. Of course the "overflow" behavior is different...

**brucehoult**:

--- Quote from: brucehoult on June 12, 2024, 03:01:10 pm ---Of course they do. Any IEEE implementation (hardware or software) is absolutely guaranteed to give exact results for add/sub/mul of integers out to the limits of 23 bits on single precision or 53 bits in double precision. And in fact to a result of 2^23 or 2^53. After that the odd numbers can't be represented but you get all the even numbers out to 2^24 or 2^54.

--- End quote ---

Oops I miscalculated.

With the exponent at the value where the difference between successive numbers is 1.0 (no fractions any more), you get 2^23 integers between 2^23 and 2^24-1, or 2^52 integers between 2^52 and 2^53-1. It is the next higher exponent value, above 2^24 and 2^53, where odd numbers can't be represented.

So, yeah, fp32 gives you exact arithmetic on integers to 16,777,216 (2^24) and fp64 to 9,007,199,254,740,992 (2^53). And the same for negative.

**peter-h**:

What I was getting at is whether there is a big overhead in coding IEEE-compliant floats.

All those 1970s coders implementing non-IEEE floats must have been doing it for a reason. I knew a number of them personally and they were super bright coders. People like Clive Smith-Stubbs (Hitech C) and Dave Nutkins (HiSoft).
