Let's recap. (I promise, this will be useful!)
A linear mapping from X0..X1 to Y0..Y1 is
y = Y0 + (x - X0) × (Y1 - Y0) / (X1 - X0)
and its inverse is
x = X0 + (y - Y0) × (X1 - X0) / (Y1 - Y0)
Let's assume we limit x to X0..X1, inclusive, and y to Y0..Y1, inclusive, and that X0 < X1 and Y0 < Y1. Then each mapping needs one unsigned integer multiplication and one unsigned integer division, in an unsigned integer type capable of describing values from zero to (Y1-Y0)×(X1-X0), inclusive. (The initial subtraction of X0 or Y0 and the final addition of Y0 or X0 are done in the type of x or y.)
Let's consider a situation where x represents a 12-bit ADC reading (unsigned integer counts between 0 and 4095) and y represents voltage in tens of microvolts (unsigned integer count between 0 and 499877, inclusive, with the truncating division used below; 500000 = 5.0 V). In other words, X0=0, X1=4096, Y0=0, Y1=500000. The product (Y1-Y0)×(X1-X0) = 2048000000 < 2³¹, so we can use 32-bit integer math here.
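The overflow argument above can also be checked by the compiler, so that changing the endpoints later cannot silently break 32-bit math. A minimal sketch, assuming a C11 compiler (for `_Static_assert`); the macro names are hypothetical, chosen only for this example:

```c
#include <stdint.h>

/* Hypothetical names for the endpoints of the ADC example. */
#define ADC_X0   UINT32_C(0)
#define ADC_X1   UINT32_C(4096)
#define VOLTS_Y0 UINT32_C(0)
#define VOLTS_Y1 UINT32_C(500000)

/* Largest intermediate value of either mapping: (Y1-Y0)*(X1-X0).
   Computed in 64 bits so the check itself cannot overflow. */
#define ADC_VOLTS_PRODUCT ((uint64_t)(VOLTS_Y1 - VOLTS_Y0) * (ADC_X1 - ADC_X0))

/* Fails the build if 32-bit unsigned math could overflow. */
_Static_assert(ADC_VOLTS_PRODUCT <= UINT32_MAX,
               "intermediate product does not fit in uint32_t");

/* The stricter bound from the text: the product is even below 2^31. */
_Static_assert(ADC_VOLTS_PRODUCT < (UINT64_C(1) << 31),
               "intermediate product does not fit below 2^31");
```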
```c
#include <stdint.h>

static inline uint32_t adc_to_internal_voltage_units(const uint16_t adc)
{
    return ((uint32_t)adc * UINT32_C(500000)) / 4096;
}

static inline uint32_t internal_voltage_units_to_adc(const uint32_t units)
{
    return units * 4096 / UINT32_C(500000);
}
```
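Both functions above truncate, so a round trip can lose a count: for example, 4095 maps to 499877 units, which maps back to 4094. If round trips matter, one option (not part of the original code) is to round to nearest by adding half the divisor before dividing. A sketch with hypothetical function names; the largest intermediate value, 500000 × 4096 + 250000 = 2048250000, still fits in uint32_t:

```c
#include <stdint.h>

/* Hypothetical rounding variants of the two conversions above.
   Adding half the divisor before an unsigned division rounds to nearest. */
static inline uint32_t adc_to_units_rounded(const uint16_t adc)
{
    /* adc is 0..4095; 4095 * 500000 + 2048 fits in 32 bits. */
    return ((uint32_t)adc * UINT32_C(500000) + 2048) / 4096;
}

static inline uint32_t units_to_adc_rounded(const uint32_t units)
{
    /* units is 0..500000; 500000 * 4096 + 250000 fits in 32 bits. */
    return (units * 4096 + UINT32_C(250000)) / UINT32_C(500000);
}
```

With rounding, 4095 maps to 499878 and back to 4095. Because one ADC count is about 122 units, the error introduced by the first rounding (at most half a unit) is far too small to change the second rounding, so every reading 0..4095 survives the round trip.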
Because the next power of ten above 500000 is one million, which is less than 2²⁰, converting the internal voltage units to a string needs only 20-bit unsigned integer multiplication by ten, plus 20-bit unsigned integer comparison against and subtraction of 100000. (After a digit is extracted, the remaining value is below 100000, so multiplying it by ten keeps it below one million.) Equivalently, one could convert the internal units to decimal as an ordinary unsigned integer, then insert a decimal point between the fifth and sixth digits, counting from the right. An example implementation:
```c
#include <stdint.h>

static inline char *internal_voltage_units_to_string(uint32_t units)
{
    static char buffer[7]; /* V.vvvv plus end-of-string terminator. */
    /* units must be below 1000000, so that each digit is 0..9. */
    for (char i = 1; i < 6; i++) {
        char digit = '0';
        while (units >= 100000) {
            units -= 100000;
            digit++;
        }
        buffer[i] = digit;
        if (i < 5)
            units *= 10;
    }
    buffer[0] = buffer[1]; /* Move units digit left of decimal point */
    buffer[1] = '.';       /* Decimal point */
    buffer[6] = '\0';      /* End-of-string mark */
    return (char *)buffer;
}
```
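Note that 100000 internal units is 1.0 V. The function above emits only the first five digits, so for example 499877 becomes "4.9987" (100 µV resolution, with the last digit truncated).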
Compiling the above functions (removing static inline) with AVR-GCC 5.4.0 at -O2 for the ATmega32U4 generates 30-, 32-, and 150-byte functions, respectively; using -Os shrinks the last one to 90 bytes. The first function calls __muluhisi3 for 32-bit unsigned integer multiplication, and the second calls __udivmodsi4 for 32-bit unsigned integer division. The third is self-contained at -O2 but calls __muluhisi3 at -Os: the difference is whether the multiply-by-ten is a call to __muluhisi3 or inlined as bit shifts and additions, which accounts for roughly 60 bytes of code.
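The inlined multiply-by-ten relies on the identity 10u = 8u + 2u, i.e. two shifts and an addition. A minimal sketch of that identity in C; the function name is hypothetical, and it shows the arithmetic only, not the exact instruction sequence AVR-GCC emits:

```c
#include <stdint.h>

/* Hypothetical helper: multiply by ten using 10*u = 8*u + 2*u.
   Exact as long as 10*u fits in 32 bits; in the string conversion
   above, u < 100000 at each multiplication, so 10*u < 1000000 < 2^20. */
static inline uint32_t mul10_shift_add(const uint32_t u)
{
    return (u << 3) + (u << 1);
}
```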
This same logic extends to any internal units. If the string form is needed, it is useful for the unsigned integer units to have the same digits as the human-useful value (i.e., to be that value scaled by some power of ten, positive or negative), because that makes the string conversion easy and cheap. Otherwise, a second linear mapping is needed to convert the internal units to a decimal representation, which itself is not that costly code-wise, as the sizes above show.
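As an illustration of the second case, suppose (hypothetically) that the internal units were unsigned binary fixed-point volts with 16 fractional bits (Q16.16, so 65536 = 1.0 V). A second linear mapping, multiplying by 100000 and dividing by 65536, converts them to tens of microvolts, after which the decimal point goes five digits from the right. This sketch formats with division and modulo by ten instead of the repeated subtraction used above; on an 8-bit AVR that would likely pull in a 32-bit division routine, so it trades size and speed for brevity. The function names and the Q16.16 format are assumptions for this example:

```c
#include <stdint.h>

/* Hypothetical: volts in unsigned Q16.16 fixed point (65536 = 1.0 V).
   Second linear mapping to tens of microvolts (100000 = 1.0 V), truncating.
   Assumes q16 represents less than 10.0 V, so the result is below 1000000. */
static inline uint32_t q16_volts_to_units(const uint32_t q16)
{
    return (uint32_t)(((uint64_t)q16 * UINT32_C(100000)) >> 16);
}

/* Formats units (tens of microvolts, below 1000000) as "V.vvvvv".
   Digits are produced right to left with % and / by ten, and the
   decimal point is placed after the five least significant digits.
   Like the original, it returns a static buffer (not reentrant). */
static char *units_to_string_div10(uint32_t units)
{
    static char buf[8]; /* V.vvvvv plus end-of-string terminator */
    buf[7] = '\0';
    for (int i = 6; i >= 0; i--) {
        if (i == 1) {
            buf[i] = '.';
            continue;
        }
        buf[i] = (char)('0' + units % 10);
        units /= 10;
    }
    return buf;
}
```

For example, a Q16.16 value of 98304 (1.5 V) maps to 150000 units and formats as "1.50000".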
In particular, the fully variable (run-time calibratable) form,
```c
#include <stdint.h>

extern const int16_t y_0;
extern const uint16_t y_delta; /* y_delta = y_1 - y_0, y_delta > 0 */
extern const int32_t x_0;
extern const uint32_t x_delta; /* x_delta = x_1 - x_0, x_delta > 0 */

typedef uint64_t xy_type;

int32_t x(const int16_t y)
{
    return x_0 + ((xy_type)(y - y_0) * x_delta) / y_delta;
}

int16_t y(const int32_t x)
{
    return y_0 + ((xy_type)(x - x_0) * y_delta) / x_delta;
}
```
generates 140-byte and 160-byte functions using the same settings as above (for both -O2 and -Os). x() contains one call to __muldi3 and one call to __udivdi3, and y() one call to __mulsidi3 and one call to __udivdi3, so I would not consider them "slow". Note that xy_type must be an unsigned integer type that can describe values from 0 to x_delta×y_delta, inclusive; here, since x_delta is at most 32 bits and y_delta at most 16 bits, a 48-bit unsigned type would suffice. With x_delta=500000 and y_delta=4096, uint32_t would suffice; that would shrink the functions a bit and use faster calls.
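A 32-bit variant along those lines might look like the sketch below. The calibration values are hard-coded examples matching the ADC case (x in tens of microvolts, y in ADC counts), and `calibration_fits_u32()` is a hypothetical helper: with run-time calibration, `xy_type = uint32_t` is only safe if every possible calibration keeps x_delta × y_delta ≤ UINT32_MAX.

```c
#include <stdint.h>

/* Example calibration, hard-coded for this sketch (the original declares
   these extern so that they can be set at run time). */
const int16_t  y_0 = 0;
const uint16_t y_delta = 4096;    /* y_delta = y_1 - y_0 > 0 */
const int32_t  x_0 = 0;
const uint32_t x_delta = 500000;  /* x_delta = x_1 - x_0 > 0 */

/* 500000 * 4096 = 2048000000 <= UINT32_MAX, so 32 bits suffice here. */
typedef uint32_t xy_type;

/* Hypothetical run-time check: the largest intermediate product,
   x_delta * y_delta, must fit in xy_type. */
static int calibration_fits_u32(void)
{
    return (uint64_t)x_delta * y_delta <= UINT32_MAX;
}

/* Same mappings as above; callers must keep y_0 <= y <= y_0 + y_delta
   and x_0 <= x <= x_0 + x_delta, or the unsigned casts wrap around. */
int32_t x(const int16_t y)
{
    return x_0 + (int32_t)(((xy_type)(y - y_0) * x_delta) / y_delta);
}

int16_t y(const int32_t x)
{
    return (int16_t)(y_0 + (int32_t)(((xy_type)(x - x_0) * y_delta) / x_delta));
}
```

With these values, x(4095) gives 499877 and y(499877) gives 4094, matching the truncating 32-bit functions earlier in the post.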