IEEE 754 Binary32

`float` storage representations:

`0x00000000 = `+0.0f

`0x00000001 .. 0x007FFFFF = `positive subnormals, 2⁻¹⁴⁹ .. 8388607×2⁻¹⁴⁹, inclusive; < 2⁻¹²⁶

`0x00800000 = `2⁻¹²⁶, smallest positive normal

` :`

`0x3F800000 = `+1.0f

`0x4B800000 = `+16777216.0f, largest integer in the range where every consecutive integer is exactly representable

`0x4B800001 = `+16777218.0f

`0x7F7FFFFF = `16777215×2¹⁰⁴ < 2¹²⁸, largest finite positive value representable

`0x7F800000 = `+Infinity

`0x7F800001 .. 0x7FFFFFFF = `+NaN

`0x80000000 = `-0.0f

`0x80000001 .. 0x807FFFFF = `negative subnormals, -2⁻¹⁴⁹ .. -8388607×2⁻¹⁴⁹, inclusive; > -2⁻¹²⁶

`0x80800000 = `-2⁻¹²⁶, largest negative normal

` :`

`0xBF800000 = `-1.0f

`0xCB800000 = `-16777216.0f, smallest integer in the range where every consecutive integer is exactly representable

`0xFF7FFFFF = `-16777215×2¹⁰⁴ > -2¹²⁸, smallest (most negative) finite value representable

`0xFF800000 = `-Infinity

`0xFF800001 .. 0xFFFFFFFF = `-NaN

Zero is the only repeated numeric value, and only because +0.0f and -0.0f have their own separate representations.
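To poke at the table above without the full examination tool, a minimal sketch along these lines should do. The helper names `float_from_bits()`, `bits_from_float()`, and `float_bits_class()` are made up for this example, and it assumes `float` really is IEEE 754 Binary32 with the same byte order as `uint32_t`:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof (float) == sizeof (uint32_t), "float must be 32 bits");

/* Reinterpret a 32-bit storage representation as a float (no numeric conversion). */
static float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Return the storage representation of a float. */
static uint32_t bits_from_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Name the range a storage representation falls in, ignoring the sign bit. */
static const char *float_bits_class(uint32_t bits)
{
    const uint32_t magnitude = bits & UINT32_C(0x7FFFFFFF);

    if (magnitude == 0)
        return "zero";
    if (magnitude < UINT32_C(0x00800000))
        return "subnormal";
    if (magnitude < UINT32_C(0x7F800000))
        return "normal";
    if (magnitude == UINT32_C(0x7F800000))
        return "infinity";
    return "nan";
}
```

For example, `float_from_bits(0x00000000) == float_from_bits(0x80000000)` compares equal even though `bits_from_float(-0.0f)` is 0x80000000, which is exactly the repeated-zero case.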

There is exactly one positive infinity and one negative infinity, but 8388607 positive NaNs and 8388607 negative NaNs, and thus 16777214 NaN representations. Because any arithmetic operation with a NaN yields NaN (but retains the sign of the operation), you have millions of non-numeric "marker" values you can use. For example, a stack-based single-precision floating-point calculator might use the NaNs to describe the operators (say positive NaNs), and functions (say negative NaNs), and thus only need a single stack of `float` values.
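A rough sketch of how such marker values could be built and decoded follows. The names `marker_make()`, `marker_is_marker()`, `marker_is_negative()`, and `marker_id()` are hypothetical. As an extra precaution that is my assumption rather than a requirement, it only uses quiet NaNs (bit 22 set), since some hardware may quiet a signaling NaN when it passes through a floating-point register; that leaves 22 bits of id per sign:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof (float) == sizeof (uint32_t), "float must be 32 bits");

#define MARKER_SIGN     UINT32_C(0x80000000)  /* sign bit */
#define MARKER_EXPONENT UINT32_C(0x7F800000)  /* all exponent bits set */
#define MARKER_QUIET    UINT32_C(0x00400000)  /* quiet NaN bit */
#define MARKER_ID_MASK  UINT32_C(0x003FFFFF)  /* remaining 22 mantissa bits */

/* Build a quiet NaN carrying a 22-bit id; nonzero 'negative' sets the sign bit. */
static float marker_make(int negative, uint32_t id)
{
    uint32_t bits = MARKER_EXPONENT | MARKER_QUIET | (id & MARKER_ID_MASK);
    float f;

    if (negative)
        bits |= MARKER_SIGN;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Storage representation of a float. */
static uint32_t marker_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* True if f is one of the quiet NaNs produced by marker_make(). */
static int marker_is_marker(float f)
{
    const uint32_t mask = MARKER_EXPONENT | MARKER_QUIET;
    return (marker_bits(f) & mask) == mask;
}

/* True if the marker has its sign bit set (say, a function rather than an operator). */
static int marker_is_negative(float f)
{
    return (marker_bits(f) & MARKER_SIGN) != 0;
}

/* The 22-bit id stored in the marker. */
static uint32_t marker_id(float f)
{
    return marker_bits(f) & MARKER_ID_MASK;
}
```

A calculator stack entry would then be tested with `marker_is_marker()` first; anything that is not a marker is an ordinary numeric value.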

On architectures like x86, x86-64, any ARM with hardware floating-point support, and any Linux architecture with hardware or software floating-point support, you can use the `float` representation examination tool I posted in reply #37 to explore and check these; I use it constantly when working with `float`s.

If you want to add a check for architecture support to it, I'd add

    /* Same size; +0.0f equals -0.0f; known bit patterns decode to known values. */
    if (sizeof (float) != sizeof (uint32_t) ||
        ((word32){ .u = 0x00000000 }).f != ((word32){ .u = 0x80000000 }).f ||
        ((word32){ .u = 0xC0490FDB }).f != -3.1415927410125732421875f ||
        ((word32){ .u = 0x4B3C614E }).f != 12345678.0f) {
        fprintf(stderr, "Unsupported 'float' type: not IEEE 754 Binary32.\n");
        exit(EXIT_FAILURE);
    }

to the beginning of `main()`. (Note that 0xC0490FDB has the sign bit set, so it is compared against the negative of the `float` closest to π.) I do not normally bother, because I haven't had access to a system where that would trigger in decades. That includes all my SBCs and microcontrollers. (Notably, I do not have any DSPs, which are the exception, and actually could still use a non-IEEE 754 Binary32 `float` type.)

The function in reply #50 yields the difference between any two non-NaN `float` values, as the number of steps between unique non-NaN `float` values separating the two. If they are the same value, it returns 0; if they are consecutive representable numeric `float` values, it returns 1; if there is one representable numeric `float` between the two values, it returns 2, and so on. For example, for the difference between the smallest positive subnormal and the largest negative subnormal (storage representations 0x00000001 and 0x80000001) it returns 2, because only zero lies between them.

The key idea in that difference is that the storage representation of each negative `float` value is subtracted from 0x80000000, which turns it into the integer negative of the storage representation of the corresponding positive `float` value. Note that this also maps -0.0f to the same integer representation as +0.0f, 0x00000000. The difference of such modified signed integer storage representations is the difference in number of representable `float` values as described in the previous paragraph.
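To make the mapping concrete, here is a minimal sketch of the idea. It is not the function from reply #50; the names `float_ordinal()` and `float_steps()` are my own, and it assumes IEEE 754 Binary32 `float` and non-NaN inputs. The result is widened to 64 bits because the distance between the two largest-magnitude finite values does not fit in a signed 32-bit integer:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof (float) == sizeof (uint32_t), "float must be 32 bits");

/* Map a non-NaN float to a signed integer that orders the same way as the
   float values do. Both +0.0f and -0.0f map to 0. */
static int32_t float_ordinal(float f)
{
    uint32_t u;

    memcpy(&u, &f, sizeof u);
    if (u & UINT32_C(0x80000000)) {
        /* Same value as 0x80000000 - u in two's complement, but written
           so that no out-of-range unsigned-to-signed conversion occurs. */
        return -(int32_t)(u - UINT32_C(0x80000000));
    }
    return (int32_t)u;
}

/* Signed number of representable steps from a to b; positive when b > a. */
static int64_t float_steps(float a, float b)
{
    return (int64_t)float_ordinal(b) - (int64_t)float_ordinal(a);
}
```

With this, `float_steps(16777216.0f, 16777218.0f)` is 1, and stepping from the largest negative subnormal to the smallest positive subnormal gives 2.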

For radix sorting, the storage representation is kept as a 32-bit unsigned integer. For representations that have the sign bit clear, you set the most significant bit. For all other representations, you invert all bits (~). This keeps the representations of +0.0f and -0.0f separate (0x80000000 and 0x7FFFFFFF, respectively), puts the representations of the positive `float`s above the negative ones, and inverts the order of the representations of the negative ones, essentially ensuring that *all* non-NaN `float` values order the same way as their unsigned 32-bit modified storage representations.

After radix sorting, the inverse operation needs to invert all bits (~) if the most significant bit is clear, and only clear the most significant bit if it is set. This undoes the storage representation modification, restoring the exact original `float` representations.
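The forward and inverse transforms, plus one way to use them, can be sketched as follows. The names `key_from_bits()`, `bits_from_key()`, and `radix_sort_floats()` are hypothetical, and the 8-bits-per-pass LSD radix sort is just one straightforward choice, not necessarily what you would use in production. With this ordering, negative NaNs end up before -Infinity and positive NaNs after +Infinity:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static_assert(sizeof (float) == sizeof (uint32_t), "float must be 32 bits");

/* Storage representation -> unsigned key that sorts like the float value. */
static uint32_t key_from_bits(uint32_t u)
{
    return (u & UINT32_C(0x80000000)) ? (uint32_t)~u : (u | UINT32_C(0x80000000));
}

/* Unsigned key -> original storage representation. */
static uint32_t bits_from_key(uint32_t k)
{
    return (k & UINT32_C(0x80000000)) ? (k & UINT32_C(0x7FFFFFFF)) : (uint32_t)~k;
}

/* Sort n floats in ascending order. Returns 0 on success,
   -1 if the temporary buffers could not be allocated. */
static int radix_sort_floats(float *data, size_t n)
{
    uint32_t *src, *dst, *tmp;
    size_t i;
    int shift, b;

    if (n < 2)
        return 0;

    src = malloc(n * sizeof *src);
    dst = malloc(n * sizeof *dst);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1;
    }

    /* Convert each float to its sortable key. */
    for (i = 0; i < n; i++) {
        uint32_t u;
        memcpy(&u, &data[i], sizeof u);
        src[i] = key_from_bits(u);
    }

    /* Four stable counting-sort passes, least significant byte first. */
    for (shift = 0; shift < 32; shift += 8) {
        size_t count[257] = { 0 };

        for (i = 0; i < n; i++)
            count[((src[i] >> shift) & 0xFF) + 1]++;
        for (b = 0; b < 256; b++)
            count[b + 1] += count[b];   /* count[v] = first output index for byte v */
        for (i = 0; i < n; i++)
            dst[count[(src[i] >> shift) & 0xFF]++] = src[i];

        tmp = src; src = dst; dst = tmp;
    }

    /* Undo the key transform to restore the exact original representations. */
    for (i = 0; i < n; i++) {
        uint32_t u = bits_from_key(src[i]);
        memcpy(&data[i], &u, sizeof u);
    }

    free(src);
    free(dst);
    return 0;
}
```

Because the keys keep +0.0f and -0.0f distinct, an array containing both comes out with -0.0f immediately before +0.0f.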

I, too, use fixed-point types quite often. However, this thread is about the specific case when you have *hardware float support*, and want to leverage that instead. So, commenting that one should use fixed-point here instead of hardfloat is an apples-to-oranges replacement suggestion.