(some atomistic-modeling scientist might disagree)

For all the ones I know, including at least LAMMPS, Gromacs, VASP, Dalton, SIESTA, and CHARMM, `double` suffices.

There is the GNU Quad-Precision Math Library, libquadmath, which provides 128-bit `__float128` and `__complex128` types and a wide range of math functions. I sometimes use it to calculate coefficients for double-precision approximations, and to compare my approximations against the best known values, but that's about it.

When I suspect cancellation error may be an issue, as in a sum where large terms cancel out and tiny terms remain, either sorting the terms in descending order of magnitude or Kahan summation will deal with it.

The more you spam the f suffix, the more you remind yourself and others of its importance. And sometimes it's very important for performance.

It also reminds one to use the f-suffixed functions from `<math.h>`: `sinf()`, `fabsf()`, `expf()`, and so on.

Typical speed-enhanced non-IEEE 754 approximately-single-precision floating-point libraries use an 8-bit signed exponent and a 24-bit signed mantissa, often in two's-complement format, with all bits explicit. Multiplication yields a 48-bit temporary, of which only the 25 most significant bits matter: 24 for the actual result, and one for rounding. Similarly, approximately-double-precision formats can use either an 8-bit or a 16-bit signed exponent, with a 56-bit or 48-bit mantissa, respectively.
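The multiplication step could look something like the following. This is purely a hypothetical sketch of a format like the one described (value = mantissa × 2^exponent); the struct name, the rounding rule, and the complete absence of normalization and exponent-overflow handling are my own simplifications, not any particular library:

```c
#include <stdint.h>
#include <math.h>

/* Hypothetical format: 8-bit signed exponent, 24-bit signed
   two's-complement mantissa, all bits explicit.
   Value represented = mant * 2^exp. */
typedef struct { int32_t mant; int8_t exp; } sfloat;

/* For checking the sketch against native doubles only. */
double sfloat_value(sfloat x)
{
    return ldexp((double)x.mant, x.exp);
}

sfloat sfloat_mul(sfloat a, sfloat b)
{
    /* 24-bit x 24-bit -> 48-bit temporary; only the 25 most
       significant bits matter. */
    int64_t temp = (int64_t)a.mant * (int64_t)b.mant;

    /* Keep 25 bits: 24 for the result plus one rounding bit, */
    int32_t hi   = (int32_t)(temp >> 23);
    /* then round on that last bit (ties toward +infinity here). */
    int32_t mant = (hi >> 1) + (hi & 1);

    /* 24 product bits were dropped, so the exponent grows by 24.
       Overflow of the 8-bit exponent is ignored in this sketch. */
    sfloat r = { mant, (int8_t)(a.exp + b.exp + 24) };
    return r;
}
```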

It is also useful to know that you can represent the magnitude of any finite `float` value (ignoring sign), including subnormals, using a 277-bit fixed-point format, Q128.149. Since the integer and fractional parts are converted to decimal separately, you only need a 160-bit unsigned-integer scratch pad for the conversion, or about 20 bytes. We had a thread about the details some time ago.

`double` may need up to 2098 bits, Q1024.1074, but that too is doable with a 1088-bit or 136-byte scratch pad. The fractional part happens to be easier to convert to decimal, via repeated multiplication by 10: each multiplication pushes the next fractional digit into the units place, which is why the scratch pad needs about four more bits than the full fractional range (i.e. Q4.1074). Because of rounding, the fractional part may round up to the next larger integer, so it should be converted before the integer part. The conversion of the integer part is easiest to implement via repeated division by 10 with remainder, each consecutive remainder yielding the decimal digits from right to left.

"AI" calculations often use either integers or a half-precision floating-point format with an 11-bit mantissa (implicit most significant bit, which is 1 for all normals and 0 for subnormals) and a 5-bit exponent. It has the nice feature that all its finite values, including subnormals, can be represented exactly in magnitude (i.e. ignoring the sign bit) using a 40-bit Q16.24 fixed-point format. Multiplication is only 11-bit by 11-bit with a 22-bit result, of which the 11 high bits remain in the result, and the highest dropped bit determines rounding; addition and subtraction require a 21-bit scratchpad.
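The exact Q16.24 property is easy to verify by decoding the bit pattern directly. A sketch, assuming IEEE 754 binary16 bit layout; the function name is mine, and infinities/NaNs (exponent field 31) are deliberately left unhandled:

```c
#include <stdint.h>

/* Decode the magnitude of an IEEE 754 binary16 bit pattern into the
   40-bit Q16.24 fixed-point format described above, returned as the
   value scaled by 2^24.  Exponent field 31 (inf/NaN) not handled. */
uint64_t half_to_q16_24(uint16_t h)
{
    uint32_t exp  = (h >> 10) & 0x1F;   /* 5-bit exponent field     */
    uint32_t frac = h & 0x3FF;          /* 10 explicit mantissa bits */

    if (exp == 0)                       /* subnormal: implicit bit 0 */
        return frac;                    /* frac * 2^-24, pre-scaled  */

    /* normal: (1024 + frac)/1024 * 2^(exp - 15), scaled by 2^24,
       which simplifies to (1024 + frac) << (exp - 1).  The largest
       finite value, 65504, still fits: 2047 << 29 < 2^40. */
    return (uint64_t)(1024 + frac) << (exp - 1);
}
```

For instance, the pattern `0x3C00` (1.0) decodes to exactly 1 << 24, and `0x0001` (the smallest subnormal, 2^-24) decodes to exactly 1, so every finite half-precision magnitude lands on a distinct Q16.24 grid point.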