Author Topic: Non-IEEE (software) float option  (Read 2926 times)


Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14922
  • Country: fr
Re: Non-IEEE (software) float option
« Reply #25 on: June 15, 2024, 09:41:42 pm »
One way of not having to bother with this in C (but make sure that's what you want of course) is to include tgmath.h instead of math.h. It provides 'generic' math functions, which will call the corresponding variant of the function depending on the type of the argument(s). Particularly handy if you want to make some piece of code agnostic to the FP type and reserve the possibility to change types with just a typedef or macro. Sure you can roll that yourself with a bunch of macros, but it can get ugly, while this is already made, and standard. Requires C11 minimum.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4242
  • Country: us
Re: Non-IEEE (software) float option
« Reply #26 on: June 16, 2024, 08:19:50 am »
Quote
The more you spam the f suffix, the more you remind yourself and others the importance of it. And sometimes it's very important for performance.
Particularly on the recent crop of microcontrollers with HW support for single precision floats, but not for doubles.

A non-IEEE variation I'm wondering about would be something that produces near-double precision results in two single-precision variables, and thus uses single-precision hardware.  (48bit mantissa, 8bit exponent, essentially.  Pi = 3.14159e0 + 2.65359e-6, or binary equivalent thereof.)
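That idea is usually called double-single (or float-float) arithmetic: a value is the unevaluated sum of two floats. The building block is an error-free addition such as Knuth's two-sum. A rough sketch, with hypothetical names, which requires strict IEEE evaluation (no -ffast-math):

```c
/* "Double-single" arithmetic sketch: a value is hi + lo, two floats.
   two_sum is Knuth's error-free addition: it returns the rounded sum
   and the exact rounding error.  Names are hypothetical. */
typedef struct { float hi, lo; } ff_t;

static ff_t two_sum(float a, float b)
{
    float s   = a + b;
    float bb  = s - a;                        /* the part of b that made it in  */
    float err = (a - (s - bb)) + (b - bb);    /* the part that was rounded away */
    return (ff_t){ s, err };                  /* s + err == a + b, exactly      */
}

/* Fold a plain float into a float-float pair. */
static ff_t ff_add(ff_t x, float y)
{
    ff_t s = two_sum(x.hi, y);
    s.lo += x.lo;                 /* accumulate the low-order tail  */
    return two_sum(s.hi, s.lo);   /* renormalize so |lo| stays tiny */
}
```

Multiplication needs a similar error-free product (Dekker splitting, or a single fma), which is where the single-precision hardware gets reused.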
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8963
  • Country: gb
Re: Non-IEEE (software) float option
« Reply #27 on: June 16, 2024, 03:47:44 pm »
It's a good idea to always use the f suffix with every literal when you want single-precision floats.
If you turn the warning level up to maximum with some C and most C++ compilers, you need to add all those f suffixes, or you get a lot of warnings. :)
 

Offline Twoflower

  • Frequent Contributor
  • **
  • Posts: 739
  • Country: de
Re: Non-IEEE (software) float option
« Reply #28 on: June 16, 2024, 04:04:38 pm »
If you turn the warning level up to maximum with some C and most C++ compilers, you need to add all those f suffixes, or you get a lot of warnings. :)
For gcc
Code: [Select]
-Wdouble-promotion
will be sufficient to inform you about that. As mentioned, it's very useful if your hardware has single-precision support only.
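A minimal example of code that trips that warning (hypothetical function, not from the thread):

```c
/* Hypothetical Celsius-to-Fahrenheit helper.  Compiling with
   gcc -Wdouble-promotion flags the first return line with
   "implicit conversion from 'float' to 'double'", because the
   literal 1.8 is a double and drags the whole expression up. */
float c_to_f(float c)
{
    return c * 1.8 + 32;      /* warns: 1.8 promotes c to double   */
    /* return c * 1.8f + 32;    stays entirely in single precision */
}
```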
 

Offline eutectique

  • Frequent Contributor
  • **
  • Posts: 423
  • Country: be
Re: Non-IEEE (software) float option
« Reply #29 on: June 16, 2024, 04:37:52 pm »
There are certainly places in my code where stuff is done as a double, unintentionally, but none of them are even remotely time-critical.
[...]
Is there some easy way to search for all instances of such code?

In "Cross Reference Table" of your map file, search for functions __aeabi_f2d (float to double) and __aeabi_d2f (double to float). Also check for __adddf3 (add two doubles), __divdf3 (divide two doubles), and similar. This will get you to the compilation unit.

To find the line number, disassemble the corresponding object file with suitable options: arm-none-eabi-objdump -Sl file-with-doubles.o
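A sketch of that search, using a fabricated map-file excerpt just to show the shape of the output (a real map file comes from linking with -Wl,-Map=output.map -Wl,--cref; paths and object names here are made up):

```shell
# Fabricated excerpt of a GNU ld cross-reference table, for illustration.
cat > example.map <<'EOF'
Cross Reference Table

Symbol                File
__aeabi_f2d           libgcc.a(_arm_addsubdf3.o)
                      build/sensor.o
__aeabi_d2f           libgcc.a(_arm_truncdfsf2.o)
                      build/display.o
EOF

# The line after each helper symbol names the object that pulled it in:
grep -A1 -E '__aeabi_(f2d|d2f)' example.map
```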
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3808
  • Country: gb
  • Doing electronics since the 1960s...
Re: Non-IEEE (software) float option
« Reply #30 on: June 16, 2024, 05:28:57 pm »
Right;  __aeabi_f2d in the .map takes me to the .o files of the very .c modules where I was doing the dodgy doubles :)

Same for __aeabi_d2f.

None of them are performance-related, though. And almost all are impossible to fix, because the value eventually goes to a printf-type function (which promotes float arguments to double anyway).
« Last Edit: June 16, 2024, 05:34:44 pm by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3210
  • Country: ca
Re: Non-IEEE (software) float option
« Reply #31 on: June 16, 2024, 05:48:54 pm »
Floating point gives up some precision because it needs space to store the exponent. If you compare single-precision floating point to a 32-bit integer (both take the same space), integer values from 0 to about 16 million can be represented exactly by both; in the range from 16 million to 4 billion the 32-bit integer will be more accurate. Of course, 32-bit integers cannot store values above about 4 billion.

But 4 billion is a huge range. For example, if you measure voltage in uV, it will take you up to 4000 V. Even if you use things with a big dynamic range, such as capacitors, it's still not that bad: measuring in pF you can represent any capacitance from 1 pF to 4 mF.

Of course, if you're doing something like inverting nearly singular matrices, you must use floats. But for simple operations, such as calculating averages, filters etc., using floats will only lose precision without any positive effect.
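A quick check of that 2^24 boundary (the helper is hypothetical, just to illustrate the claim):

```c
#include <stdint.h>

/* Hypothetical helper: round-trip a 32-bit integer through float;
   equality means the float held it exactly. */
static int exact_in_float(uint32_t v)
{
    return (uint32_t)(float)v == v;
}
/* 16777215 (2^24 - 1) is exact; 16777217 (2^24 + 1) rounds to 2^24;
   4000000001 loses its low bits entirely (the float ULP there is 256). */
```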
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8963
  • Country: gb
Re: Non-IEEE (software) float option
« Reply #32 on: June 16, 2024, 05:57:18 pm »
Floating point gives up some precision because it needs space to store the exponent. If you compare single-precision floating point to a 32-bit integer (both take the same space), integer values from 0 to about 16 million can be represented exactly by both; in the range from 16 million to 4 billion the 32-bit integer will be more accurate. Of course, 32-bit integers cannot store values above about 4 billion.

But 4 billion is a huge range. For example, if you measure voltage in uV, it will take you up to 4000 V. Even if you use things with a big dynamic range, such as capacitors, it's still not that bad: measuring in pF you can represent any capacitance from 1 pF to 4 mF.

Of course, if you're doing something like inverting nearly singular matrices, you must use floats. But for simple operations, such as calculating averages, filters etc., using floats will only lose precision without any positive effect.
Floating point gives you a reasonably consistent level of precision over a wide dynamic range. Integers give you a highly variable level of precision, depending on where you are in their number range. When the integer is 2, the options of 1 and 3 as the finest possible changes you can make represent bloody awful precision.

 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3210
  • Country: ca
Re: Non-IEEE (software) float option
« Reply #33 on: June 16, 2024, 06:37:00 pm »
When the integer is 2, the options of 1 and 3 as the finest possible changes you can make represent bloody awful precision.

If you measure voltage from 0 to 3 V and you decide to measure it in volts, with choices of 0, 1, 2, or 3, that is of course terrible. However, if you switch to nV then the resolution is excellent, even though there's nothing between 2 and 3 nV, because your ADC step is in the hundreds of nV at best.
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3808
  • Country: gb
  • Doing electronics since the 1960s...
Re: Non-IEEE (software) float option
« Reply #34 on: June 16, 2024, 06:50:12 pm »
This has been a super bit of learning. In this project it has not mattered (due to eventual snprintf etc) but in others it will matter. There must be a 100x perf difference...

Thank you all.

Yes I know about integer maths :) Many years ago, Z80, I did various products which all used 32 bit integer maths. It does work but you have to know what is going on in great detail.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8963
  • Country: gb
Re: Non-IEEE (software) float option
« Reply #35 on: June 16, 2024, 06:57:18 pm »
When the integer is 2, the options of 1 and 3 as the finest possible changes you can make represent bloody awful precision.

If you measure voltage from 0 to 3 V and you decide to measure it in volts, with choices of 0, 1, 2, or 3, that is of course terrible. However, if you switch to nV then the resolution is excellent, even though there's nothing between 2 and 3 nV, because your ADC step is in the hundreds of nV at best.
If you range switch you have a form of floating point. That's exactly what floating point does.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3210
  • Country: ca
Re: Non-IEEE (software) float option
« Reply #36 on: June 16, 2024, 07:42:08 pm »
When the integer is 2, the options of 1 and 3 as the finest possible changes you can make represent bloody awful precision.

If you measure voltage from 0 to 3 V and you decide to measure it in volts, with choices of 0, 1, 2, or 3, that is of course terrible. However, if you switch to nV then the resolution is excellent, even though there's nothing between 2 and 3 nV, because your ADC step is in the hundreds of nV at best.
If you range switch you have a form of floating point. That's exactly what floating point does.

Floating point calculations switch ranges at runtime (hence processing overhead). With fixed point integers you set the range manually at design time (hence higher resolution and lower overhead).
 
The following users thanked this post: Siwastaja

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3808
  • Country: gb
  • Doing electronics since the 1960s...
Re: Non-IEEE (software) float option
« Reply #37 on: June 16, 2024, 10:03:56 pm »
That's a good way to put it.

The problem with fixed point is that in many cases you need to put in a whole load of your own time to make sure the required dynamic range of the data is maintained. If, say, you're filtering the output of an ADC (taking the last 100 values, adding them up, and dividing by 100 - a moving average, which is an FIR filter AFAIK), then this is trivial using uint32. The PID control loop of a temperature controller will be a lot less trivial (although PID controllers are old hat now and everybody and their dog knows how to do them efficiently), and the control loop of an autopilot will be much less trivial still. In Apollo they did it all using basically int32 (I've read the books), but they had a bunch of super clever guys there who came in with algorithms from the Polaris missile project etc. In modern times, working to a commercial schedule, that would just be masochism. Single floats are a great solution: 24 bits is way more accuracy (or even real noise-free resolution) than any manufacturable analog system can achieve, and at 7 ns for a multiply... ;)
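The trivial uint32 moving-average case might look like this (all names hypothetical):

```c
#include <stdint.h>

/* Hypothetical 100-tap moving average in pure uint32 arithmetic:
   keep a running sum of the last N samples and divide once on output. */
#define MA_TAPS 100u

static uint32_t ma_buf[MA_TAPS];   /* last N samples, initially zero */
static uint32_t ma_sum;            /* running sum of ma_buf          */
static uint32_t ma_idx;            /* next slot to overwrite         */

uint32_t ma_update(uint32_t sample)
{
    ma_sum -= ma_buf[ma_idx];          /* drop the oldest sample */
    ma_sum += sample;                  /* add the newest         */
    ma_buf[ma_idx] = sample;
    ma_idx = (ma_idx + 1u) % MA_TAPS;
    return ma_sum / MA_TAPS;           /* integer average        */
}
```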


Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6544
  • Country: fi
    • My home page and email address
Re: Non-IEEE (software) float option
« Reply #38 on: June 16, 2024, 11:28:01 pm »
I was thinking of a soft-fp format on 8-bit MCUs with 40-bit mantissa \$m\$, radix 256, and a 7-bit exponent \$p\$, as in \$m \cdot 256^{p - 67} = m \cdot 2^{8 p - 536}\$ with a separate sign bit.

While it would mean the actual precision would vary in steps, cyclically, between 32 and 40 bits, it would eliminate bit shifts.  It would seriously speed up additions and subtractions, but also multiplications.  Calculating c += a×b, if we label the five bytes 0 (least significant) to 4 (most significant), then
    c4:c3 += a4×b4
    c3:c2 += a4×b3 + a3×b4
    c2:c1 += a4×b2 + a3×b3 + a2×b4
    c1:c0 += a4×b1 + a3×b2 + a2×b3 + a1×b4
    c0:u4 += a4×b0 + a3×b1 + a2×b2 + a1×b3 + a0×b4
    u4:u3 += a3×b0 + a2×b1 + a1×b2 + a0×b3
    u3:u2 += a2×b0 + a1×b1 + a0×b2
    u2:u1 += a1×b0 + a0×b1
    u1:u0 += a0×b0
While c3 will be nonzero for nonzero products, c4 may become zero (whenever a4×b4 < 256).  Then, u4:u3 products may need to be calculated, to obtain the eight bits "shifted in" (while decrementing the exponent).  Typically 15 multiplications and 30 byte-wise additions are needed (not counting carry rippling, which isn't that common).

The problem isn't speed, but the amount of machine instructions this kind of code generates, in my opinion.  Most 8-bitters don't have that much Flash.  Even if you implement the above using loops, you end up generating many instructions to compute the addresses.  One could write the multiply-add in assembly by hand, but even then it is quite a few instructions.  It ends up bulky, not slow.



On 32-bit MCUs with a fast 32×32-bit multiplication returning the high or low 32 bits of the product, software floating-point is usually not worth it, because you can do multi-word products like above.  For example, if you have a fixed-point format with 32 fractional bits, 31 integral bits, and a sign bit, then c = a×b is
    c3 = HI(a1×b1)
    c2 = HI(a1×b0) + HI(a0×b1)
       + LO(a1×b1)
    c1 = HI(a0×b0)
       + LO(a0×b1) + LO(a1×b0)
    c0 = LO(a0×b0)
of which only c2:c1 forms the result c.  We only need the high bit of c0 for half-ULP correct rounding, i.e. bit 31 of LO(a0×b0).  (When that bit and the least significant bit of c1 are both set, increment c1, and carry over to c2 and c3 on overflow.  Otherwise it is ignored.)  If c3 is nonzero, the product overflows.

Thus, for modular arithmetic, you only need 6 32-bit unsigned multiplications (three low, three high), some additions, and a couple of bit checks for half-ULP correct rounding.  Addition and subtraction is just two 32-bit operations.  Division can be implemented either bitwise, or via 32-bit/32-bit division-and-remainder.
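For the 32-fractional-bit case above, a sketch in C, using uint64_t to hold each HI:LO pair of 32×32 partial products (unsigned variant for clarity; fixmul is a hypothetical name, it truncates rather than doing the half-ULP rounding step, and it omits the c3 overflow check):

```c
#include <stdint.h>

/* Sketch of the fixed-point multiply described above: both operands
   have 32 fractional bits; the result keeps bits 32..95 (c2:c1) of
   the full 128-bit product. */
typedef uint64_t fix32_32;   /* 32 integral bits : 32 fractional bits */

fix32_32 fixmul(fix32_32 a, fix32_32 b)
{
    uint32_t a0 = (uint32_t)a, a1 = (uint32_t)(a >> 32);
    uint32_t b0 = (uint32_t)b, b1 = (uint32_t)(b >> 32);

    uint64_t p00 = (uint64_t)a0 * b0;   /* contributes to c1:c0 */
    uint64_t p01 = (uint64_t)a0 * b1;   /* contributes to c2:c1 */
    uint64_t p10 = (uint64_t)a1 * b0;   /* contributes to c2:c1 */
    uint64_t p11 = (uint64_t)a1 * b1;   /* contributes to c3:c2 */

    /* Sum the middle column, rippling the carry into the top. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;
    uint64_t top = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);

    /* Keep c2:c1, i.e. bits 32..95 of the 128-bit product. */
    return (top << 32) | (uint32_t)mid;
}
```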

Extending to 96 bits,
    c5 = HI(a2×b2)
    c4 = HI(a2×b1) + HI(a1×b2)
       + LO(a2×b2)
    c3 = HI(a2×b0) + HI(a1×b1) + HI(a0×b2)
       + LO(a2×b1) + LO(a1×b2)
    c2 = HI(a1×b0) + HI(a0×b1)
         + LO(a2×b0) + LO(a1×b1) + LO(a0×b2)
    c1 = HI(a0×b0)
         + LO(a0×b1) + LO(a1×b0)
    c0 = LO(a0×b0)
For 32 fractional bits, you use c1, c2, and c3.  Again, only the most significant bit of c0 matters, and if c4 or c5 are nonzero, the calculation overflows.  For modular arithmetic, you only need 13 multiplications.
For 64 fractional bits, you use c2, c3, and c4.  For half-ULP correct rounding, you need to calculate the most significant bit of c1, but it again only matters when the least significant bit of c2 is also set.
For branchless ripple carry, after adding to e.g. c2, you need to add zero with carry to c3 (and if using 64 fractional bits, to c4), so there are quite a few additions as well, but they tend to be very fast.

13 multiplications can sound like a lot, but 96 bits is a crapton of precision, easily beating even IEEE 754 double-precision floating point, unless you for some strange reason need the range.  I rarely do, even in atomic simulations.



From above, we can easily extend into arbitrary precision numbers, simply by making the number of integral and fractional words – "limbs" – variable, in both directions.

Arbitrary precision numbers are not floating-point, because the decimal point is always where the two branches (integral and fractional) meet.

Combining both, you can represent any real number as a continuous sequence of (say 32-bit unsigned) words, i.e.
$$v = \left(-1\right)^S \, \sum_{k=0}^{N-1} m_k \, 2^{32 ( k + B) }$$
where \$S\$ is the sign bit, \$B\$ is the position of the least significant bit (in units of 32 bits), \$N\$ is the number of words in the value, and \$m_0\$ through \$m_{N-1}\$ contain the bit representation of the value.

Arbitrary precision fixed- and floating-point types are generally called multiprecision numbers.

These are scary powerful: no limits on what you can do, really.  For example, you can easily use these – say, as provided by GNU MPFR (the multiple-precision floating-point library) or GNU MPC (its complex-arithmetic counterpart), or any of the many alternatives – to calculate exactly-correct IEEE Binary32 (float), Binary64 (double), Binary128 (quad), or whatever precision you need.  In fact, most Linux/BSD/Unix machines have bc installed, which is a command-line arbitrary-precision calculator.  For example, if you want the first 101 decimal digits of pi, you use the arctan(1) function:
    echo 'scale=100; 4*a(1)' | bc -l
(Yes, bc shortens arctan() to a(), sin() to s(), cos() to c(), natural logarithm to l(), and exponential to e(), when the math library is enabled via the -l option.)

Do let me know if you see any bugs in above.  I've been buggy lately!
« Last Edit: June 16, 2024, 11:35:07 pm by Nominal Animal »
 
The following users thanked this post: oPossum

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3210
  • Country: ca
Re: Non-IEEE (software) float option
« Reply #39 on: June 16, 2024, 11:48:58 pm »
that would just be masochism.

Many (if not all) manufacturers offer Q31 libraries which can be used for coordinate transforms in machines, robots, solar trackers etc. They will have sin, cos and other common functions. So it should not be too much harder than with floats.
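The core Q31 primitive is a 64-bit intermediate multiply followed by a 31-bit shift; vendor libraries such as CMSIS-DSP build their sin/cos and transform routines on roughly this (the helper name here is hypothetical):

```c
#include <stdint.h>

/* Q31 fixed point: a 32-bit signed integer scaled by 2^-31, so the
   representable range is [-1, 1).  q31_mul is a hypothetical helper;
   real libraries typically also saturate the -1 * -1 corner case. */
typedef int32_t q31_t;

static inline q31_t q31_mul(q31_t a, q31_t b)
{
    /* Widen to 64 bits, then drop the extra 31 fractional bits. */
    return (q31_t)(((int64_t)a * b) >> 31);
}
```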
 

Offline iMo

  • Super Contributor
  • ***
  • Posts: 4897
  • Country: vc
Re: Non-IEEE (software) float option
« Reply #40 on: June 17, 2024, 05:32:52 am »
that would just be masochism.

Many (if not all) manufacturers offer Q31 libraries which can be used for coordinate transforms in machines, robots, solar trackers etc. They will have sin, cos and other common functions. So it should not be too much harder than with floats.

Guys, you do not know what the real masochism  actually means, indeed..  ;D
 

