Electronics > Microcontrollers

Lightweight sin(2·π·x), cos(2·π·x) for MCUs with float hardware

(1/14) > >>

Nominal Animal:
The following CC0-1.0 -licensed C99 code implements unity period sine (and cosine) functions using very few float operations, needing only a (144-byte) precalculated coefficient table.  The functions are
usinf(x) = sin(2·π·x)
ucosf(x) = cos(2·π·x)
While this is not the best, the fastest, or the smallest possible implementation, it is lightweight.  Compiling to Cortex-M4F or Cortex-M7F with hardware single-precision floating-point support, the code size is 250 to 300 bytes depending on your optimization options, plus the 144-byte lookup table, for a total of 400 to 440 bytes of Flash used.

All finite normal arguments x yields a result within ±7 ULPs of the expected value (verified for all 1,056,964,609 finite normal values between 0 and 1, i.e. one full period).  In other words, the result has at least 21 bits of precision; six significant decimal digits.  Also, for integer n, usinf(0.5·n) = 0 and usinf(0.5·n+0.5) = ±1; similarly for ucosf().

// Nominal Animal, 2024

#include <stdint.h>
#include <math.h>

const struct {
float  v;  // Value
float  t;  // One third of tangent
} usinf_coefficient[18] = {
{ .v = 0.0f,        .t = 0.032724924f  },
{ .v = 0.09801714f, .t = 0.032567345f  },
{ .v = 0.19509032f, .t = 0.03209612f   },
{ .v = 0.29028466f, .t = 0.0313158f    },
{ .v = 0.38268343f, .t = 0.030233886f  },
{ .v = 0.47139674f, .t = 0.028860806f  },
{ .v = 0.55557024f, .t = 0.02720978f   },
{ .v = 0.6343933f,  .t = 0.025296709f  },
{ .v = 0.70710677f, .t = 0.023140015f  },
{ .v = 0.77301043f, .t = 0.020760471f  },
{ .v = 0.8314696f,  .t = 0.018180992f  },
{ .v = 0.8819213f,  .t = 0.0154264225f },
{ .v = 0.9238795f,  .t = 0.012523286f  },
{ .v = 0.95694035f, .t = 0.009499544f  },
{ .v = 0.98078525f, .t = 0.006384316f  },
{ .v = 0.9951847f,  .t = 0.0032076035f },
{ .v = 1.0f,        .t = 0.0f          },
{ .v = 0.9951847f,  .t = 0.0032076035f },
};

// 64 sectors form a full turn.
// phase = 0..1, and corresponds to fraction within sector.
float usinf_sector(int32_t sector, float phase) {
int_fast8_t  index;

// Bit 4 of the sector indicates phase inversion.
if (sector & 16) {
phase = 1.0f - phase;
index = 15 - (sector & 15);
} else {
index = sector & 15;
}

// For the De Casteljau evaluation, we need the opposite phase.
const float  iphase = 1.0f - phase;

// Values at sector endpoints
const float  v0 = usinf_coefficient[index].v;
const float  v1 = usinf_coefficient[index+1].v;

// One thirds of tangents at sector endpoints
const float  t0 = usinf_coefficient[index].t;
const float  t1 = usinf_coefficient[index+1].t;

// Bezier points; b0 = v0, b3 = v1
const float  b1 = v0 + t0,
b2 = v1 - t1;

// First de Casteljau step, with z0 and z2 optimized.
const float  z0 = v0 + (phase * t0),
z1 = (iphase * b1) + (phase * b2),
z2 = v1 - (iphase * t1);

// Second de Casteljau step.
const float  y0 = (iphase * z0) + (phase * z1),
y1 = (iphase * z1) + (phase * z2);

// Final de casteljau step.
const float  x0 = (iphase * y0) + (phase * y1);

// Bit 5 of the sector indicates negation
return (sector & 32) ? -x0 : x0;
}

float usinf(float x) {
x *= 64.0f;  // Power-of-two multiplication is exact!

int32_t sector = (int32_t)x;
float   phase  = (x - (float)sector);
if (phase < 0.0f) {
phase += 1.0f;
sector--;
} else
if (phase >= 1.0f) {
phase -= 1.0f;
sector++;
}

return usinf_sector(sector, phase);
}

float ucosf(float x) {
x *= 64.0f;

int32_t sector = (int32_t)x;
float   phase  = (x - (float)sector);
if (phase < 0.0f) {
phase += 1.0f;
sector--;
} else
if (phase >= 1.0f) {
phase -= 1.0f;
sector++;
}

return usinf_sector(sector + 16, phase);
}

--- End code ---
The implementation is straightforward.  A full rotation is divided into 64 sectors.  One quarter sine wave is thus 16 sectors.  Each sector is described by the value at that point, and by the slope or tangent at that point (divided by three).  A cubic one-dimensional Bézier curve/function is used to interpolate within each sector, using the De Casteljau algorithm for better numerical stability and accuracy.

If we calculate the error distances in units in the least significant place in the expected value (as obtained by IEEE 754 double-precision sin() and cos() functions, multiplying the argument by two pi), the error counts among normal (subnormals excluded) IEEE-754 float values between 0 and 1 are as follows:
$$\begin{array}{l|r:r|r:r} \text{ ULPs } & \text{ Count } & \text{ Fraction } & \text{ Count } & \text{ Fraction } \\ \hline +7 & 90 & 0.000\% & 141 & 0.000\% \\ +6 & 11,250 & 0.001\% & 16,940 & 0.002\% \\ +5 & 181,952 & 0.017\% & 272,940 & 0.026\% \\ +4 & 982,297 & 0.093\% & 1,321,792 & 0.125\% \\ +3 & 3,346,595 & 0.317\% & 2,546,888 & 0.241\% \\ +2 & 19,657,519 & 1.860\% & 3,137,123 & 0.297\% \\ +1 & 325,906,619 & 30.834\% & 9,457,058 & 0.895\% \\ \hdashline +0 & 581,617,000 & 55.027\% & 975,874,671 & 92.328\% \\ \hdashline -1 & 76,779,488 & 7.264\% & 23,267,170 & 2.201\% \\ -2 & 25,569,180 & 2.419\% & 14,774,399 & 1.398\% \\ -3 & 15,692,769 & 1.485\% & 13,570,569 & 1.284\% \\ -4 & 5,872,646 & 0.556\% & 10,183,454 & 0.963\% \\ -5 & 1,229,700 & 0.116\% & 2,434,365 & 0.230\% \\ -6 & 115,202 & 0.011\% & 106,555 & 0.010\% \\ -7 & 2,297 & 0.000\% & 542 & 0.000\% \\ -8 & 3 & 0.000\% & 0 & 0.000\% \\ \end{array}$$
over all \$1,056,964,609\$ normal (non-subnormal) IEEE 754 single-precision (float aka Binary32) values between 0 and 1.

Instead of float, the same approach can be used with fixed-point types as well.  The De Casteljau is then best rewritten in terms of a linear interpolation helper function, as that can almost halve the number of multiplications needed.

Picuino:

--- Code: ---/*
C program for double floating point sin and cos.
Calls modf.
There are no error exits.
Coefficients are #3370 from Hart & Cheney (18.80D).
*/
#include "cmath.h"

static const double p0 =  0.1357884097877375669092680e8;
static const double p1 = -0.4942908100902844161158627e7;
static const double p2 =  0.4401030535375266501944918e6;
static const double p3 = -0.1384727249982452873054457e5;
static const double p4 =  0.1459688406665768722226959e3;
static const double q0 =  0.8644558652922534429915149e7;
static const double q1 =  0.4081792252343299749395779e6;
static const double q2 =  0.9463096101538208180571257e4;
static const double q3 =  0.1326534908786136358911494e3;

static double sinus(double arg, int quad) {
double x, x2, temp;
int integ;

/* Negative argument */
if (arg < 0) {
arg = -arg;
}

/* Argument > 90 degrees */
x = arg * TWO_PI;
integ = (int) floor(x);
x = x - integ;

x = 1.0 - x;
x = - x;

/* Compute sin of x */
x2 = x * x;
temp = ((((p4 * x2 + p3) * x2 + p2) * x2 + p1) * x2 + p0) * x;
temp = temp / ((((x2 + q3) * x2 + q2) * x2 + q1) * x2 + q0);
return(temp);
}

double cos(double arg) {
if (arg<0)  arg = -arg;
return(sinus(arg, 1));
}

double sin(double arg) {
return(sinus(arg, 0));
}

--- End code ---

EDIT:
9 multiplications and one division to get the sine or cosine of argument, in double precision.

Nominal Animal:

--- Quote from: Picuino on June 16, 2024, 08:01:42 pm ---9 multiplications and one division to get the sine or cosine of argument, in double precision.
--- End quote ---
Needs double precision support, and the argument is in radians, not in cycles, so that basically suggests replacing apples with oranges.

KE5FX:
Frequently when you're calculating both sin and cos, you're rotating something.  If the phase advances by the same delta each time, there's a really cool recurrence relation that gets you out of doing any sin/cos calls at all.

From Numerical Recipes 2nd ed:

This trick will be obvious to most Math Guys, I'm sure, but I was pretty glad to stumble across it when I needed it.

Picuino:

--- Quote from: Nominal Animal on June 16, 2024, 08:09:43 pm ---Needs double precision support, and the argument is in radians, not in cycles, so that basically suggests replacing apples with oranges.

--- End quote ---

Ok, another orange:
- Simple precision
- Only 7 multiplications, no division
- Only 7 coefficients (float numbers of simple precision)
- Input argument range from -1 to +1

--- Code: ---import numpy as np
import matplotlib.pyplot as plt

def cos_poly(x):
p0 = +1.61608342e-03
p1 = -2.54255600e-02
p2 = +2.35104787e-01
p3 = -1.33519736e+00
p4 = +4.05870398e+00
p5 = -4.93480191e+00
p6 = +1.00000000e+00
xx = x * x
return ((((((p0 * xx + p1)*xx + p2)*xx + p3)*xx + p4)*xx + p5)*xx + p6)

def main():
xx = np.linspace(-1, 1, 500)
yy = cos_true(xx) - cos_poly(xx)

fig, ax = plt.subplots()
ax.plot(xx, yy)
ax.grid(True)
plt.show()

def cos_true(x):
return np.cos(x * np.pi)

main()

--- End code ---

Attached: Error (+- 2 * 1e-8).