A fixed point format such as Q8.24 has \$\log_{10}(2^{24}) \approx 7.2\$ decimal fractional digits (right of the decimal point), with total precision being \$\log_{10}(2^{32}) \approx 9.6\$ decimal digits.
A decimal fixed point format using d=8 in a 32-bit signed integer has range -21.47483648 to +21.47483647, with exactly eight fractional decimal digits.
Yes, fixed point can represent numbers with this accuracy, but you cannot keep that accuracy through most mathematical operations without using more bits. The clearest example is division.
True, except I'd argue the clearest example is multiplication.
TL;DR: Even floating-point arithmetic is affected by this, but the implementations deal with it internally, by rounding the high-precision intermediate results to the floating-point type.
When you multiply an A-bit value with a B-bit value, the result has A+B bits, arithmetically speaking. In other words, for multiplication, you need a temporary value that is as wide in bits as the sum of the widths of the multiplicands.
For Q8.24, the temporary result has 16 integer bits (including sign bit), and 48 fractional bits.
This is part and parcel of both integer and fixed point arithmetic: an unavoidable fact.
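A sketch of a Q8.24 multiply in C (the `q824_mul` name is mine): the multiplicands are widened to 64 bits so the full Q16.48 intermediate fits, then shifted back down to Q8.24, truncating the extra fractional bits.

```c
#include <stdint.h>

/* Multiply two Q8.24 values: widen to 64 bits so the full
   Q16.48 intermediate fits, then shift back down to Q8.24
   (this truncates the low 24 fractional bits). */
static int32_t q824_mul(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a * (int64_t)b;  /* Q16.48 intermediate */
    return (int32_t)(wide >> 24);            /* back to Q8.24 */
}
```

For example, 1.5 in Q8.24 is 1.5·2^24 = 25165824, and 2.5 is 41943040; `q824_mul` of the two yields 62914560, which is 3.75·2^24. Note that the result can still overflow the Q8.24 range; a production version would saturate or flag that case.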
Floating-point arithmetic works around this by using a mantissa-exponent format to describe each value, v = m·B^x, where m is the mantissa (left-aligned, so without superfluous leading zeroes), x is the exponent, and B is the radix (2 for binary floating-point formats, 10 for decimal floating-point formats).
The product of two such values has a double-wide mantissa which is immediately (except in cases like fused multiply-add) rounded to the normal size; and the exponent is the sum of the terms' exponents.
So, in a way, even floating-point arithmetic does not completely avoid this issue, especially because IEEE 754 requires exact correct rounding; it's just that the floating-point arithmetic implementations take care of it internally.
Addition and subtraction between two values can only underflow or overflow by one bit. Division, however, is complicated, because many rational values cannot be expressed exactly in binary. The most common example is 0.1 = 1/10, which in binary is 0b0.000110011001100... (the pattern 0011 repeating forever) and so cannot be exactly represented in a finite number of binary digits. So, some kind of rounding is needed. Arithmetically, we usually implement integer division in terms of quotient and remainder, i.e.
v / d = n with remainder r, such that d·n + r = v (the remainder r is crucial when implementing BigNum division, i.e. division for arithmetic types larger than the native register size), with abs(r) < abs(d), and preferably (but not necessarily) r and n having the same sign. With floating-point division, n will be rounded to the stated precision, but it is correct to within half a unit in the last place (ULP), i.e. within half a mantissa bit for IEEE 754 Binary32 (float) and Binary64 (double).
It is because of this rounding that one should not compare floating-point values for exact equality, but use a range instead, i.e. abs(a - b) ≤ eps, where eps is the largest difference with respect to a and b that one considers "zero"; small enough to ignore.