Author Topic: Lightweight sin(2·π·x), cos(2·π·x) for MCUs with float hardware  (Read 5960 times)

0 Members and 1 Guest are viewing this topic.

Online Nominal AnimalTopic starter

  • Super Contributor
  • ***
  • Posts: 6788
  • Country: fi
    • My home page and email address
The following CC0-1.0 -licensed C99 code implements unity period sine (and cosine) functions using very few float operations, needing only a (144-byte) precalculated coefficient table.  The functions are
    usinf(x) = sin(2·π·x)
    ucosf(x) = cos(2·π·x)
While this is not the best, the fastest, or the smallest possible implementation, it is lightweight.  Compiling to Cortex-M4F or Cortex-M7F with hardware single-precision floating-point support, the code size is 250 to 300 bytes depending on your optimization options, plus the 144-byte lookup table, for a total of 400 to 440 bytes of Flash used.

All finite normal arguments x yields a result within ±7 ULPs of the expected value (verified for all 1,056,964,609 finite normal values between 0 and 1, i.e. one full period).  In other words, the result has at least 21 bits of precision; six significant decimal digits.  Also, for integer n, usinf(0.5·n) = 0 and usinf(0.5·n+0.5) = ±1; similarly for ucosf().
Code: [Select]
// SPDX-License-Identifier: CC0-1.0
// Nominal Animal, 2024

#include <stdint.h>
#include <math.h>

const struct {
    float  v;  // Value
    float  t;  // One third of tangent
} usinf_coefficient[18] = {
    { .v = 0.0f,        .t = 0.032724924f  },
    { .v = 0.09801714f, .t = 0.032567345f  },
    { .v = 0.19509032f, .t = 0.03209612f   },
    { .v = 0.29028466f, .t = 0.0313158f    },
    { .v = 0.38268343f, .t = 0.030233886f  },
    { .v = 0.47139674f, .t = 0.028860806f  },
    { .v = 0.55557024f, .t = 0.02720978f   },
    { .v = 0.6343933f,  .t = 0.025296709f  },
    { .v = 0.70710677f, .t = 0.023140015f  },
    { .v = 0.77301043f, .t = 0.020760471f  },
    { .v = 0.8314696f,  .t = 0.018180992f  },
    { .v = 0.8819213f,  .t = 0.0154264225f },
    { .v = 0.9238795f,  .t = 0.012523286f  },
    { .v = 0.95694035f, .t = 0.009499544f  },
    { .v = 0.98078525f, .t = 0.006384316f  },
    { .v = 0.9951847f,  .t = 0.0032076035f },
    { .v = 1.0f,        .t = 0.0f          },
    { .v = 0.9951847f,  .t = 0.0032076035f },
};

// 64 sectors form a full turn.
// phase = 0..1, and corresponds to fraction within sector.
float usinf_sector(int32_t sector, float phase) {
    int_fast8_t  index;

    // Bit 4 of the sector indicates phase inversion.
    if (sector & 16) {
        phase = 1.0f - phase;
        index = 15 - (sector & 15);
    } else {
        index = sector & 15;
    }

    // For the De Casteljau evaluation, we need the opposite phase.
    const float  iphase = 1.0f - phase;

    // Values at sector endpoints
    const float  v0 = usinf_coefficient[index].v;
    const float  v1 = usinf_coefficient[index+1].v;

    // One thirds of tangents at sector endpoints
    const float  t0 = usinf_coefficient[index].t;
    const float  t1 = usinf_coefficient[index+1].t;

    // Bezier points; b0 = v0, b3 = v1
    const float  b1 = v0 + t0,
                 b2 = v1 - t1;

    // First de Casteljau step, with z0 and z2 optimized.
    const float  z0 = v0 + (phase * t0),
                 z1 = (iphase * b1) + (phase * b2),
                 z2 = v1 - (iphase * t1);

    // Second de Casteljau step.
    const float  y0 = (iphase * z0) + (phase * z1),
                 y1 = (iphase * z1) + (phase * z2);

    // Final de casteljau step.
    const float  x0 = (iphase * y0) + (phase * y1);

    // Bit 5 of the sector indicates negation
    return (sector & 32) ? -x0 : x0;
}

float usinf(float x) {
    x *= 64.0f;  // Power-of-two multiplication is exact!

    int32_t sector = (int32_t)x;
    float   phase  = (x - (float)sector);
    if (phase < 0.0f) {
        phase += 1.0f;
        sector--;
    } else
    if (phase >= 1.0f) {
        phase -= 1.0f;
        sector++;
    }

    return usinf_sector(sector, phase);
}

float ucosf(float x) {
    x *= 64.0f;

    int32_t sector = (int32_t)x;
    float   phase  = (x - (float)sector);
    if (phase < 0.0f) {
        phase += 1.0f;
        sector--;
    } else
    if (phase >= 1.0f) {
        phase -= 1.0f;
        sector++;
    }

    return usinf_sector(sector + 16, phase);
}
The implementation is straightforward.  A full rotation is divided into 64 sectors.  One quarter sine wave is thus 16 sectors.  Each sector is described by the value at that point, and by the slope or tangent at that point (divided by three).  A cubic one-dimensional Bézier curve/function is used to interpolate within each sector, using the De Casteljau algorithm for better numerical stability and accuracy.

If we calculate the error distances in units in the least significant place in the expected value (as obtained by IEEE 754 double-precision sin() and cos() functions, multiplying the argument by two pi), the error counts among normal (subnormals excluded) IEEE-754 float values between 0 and 1 are as follows:
$$\begin{array}{l|r:r|r:r}
\text{ ULPs } & \text{ Count } & \text{ Fraction } & \text{ Count } & \text{ Fraction } \\
\hline
+7 & 90 & 0.000\% & 141 & 0.000\% \\
+6 & 11,250 & 0.001\% & 16,940 & 0.002\% \\
+5 & 181,952 & 0.017\% & 272,940 & 0.026\% \\
+4 & 982,297 & 0.093\% & 1,321,792 & 0.125\% \\
+3 & 3,346,595 & 0.317\% & 2,546,888 & 0.241\% \\
+2 & 19,657,519 & 1.860\% & 3,137,123 & 0.297\% \\
+1 & 325,906,619 & 30.834\% & 9,457,058 & 0.895\% \\
\hdashline
+0 & 581,617,000 & 55.027\% & 975,874,671 & 92.328\% \\
\hdashline
-1 & 76,779,488 & 7.264\% & 23,267,170 & 2.201\% \\
-2 & 25,569,180 & 2.419\% & 14,774,399 & 1.398\% \\
-3 & 15,692,769 & 1.485\% & 13,570,569 & 1.284\% \\
-4 & 5,872,646 & 0.556\% & 10,183,454 & 0.963\% \\
-5 & 1,229,700 & 0.116\% & 2,434,365 & 0.230\% \\
-6 & 115,202 & 0.011\% & 106,555 & 0.010\% \\
-7 & 2,297 & 0.000\% & 542 & 0.000\% \\
-8 & 3 & 0.000\% & 0 & 0.000\% \\
\end{array}$$
over all \$1,056,964,609\$ normal (non-subnormal) IEEE 754 single-precision (float aka Binary32) values between 0 and 1.

Instead of float, the same approach can be used with fixed-point types as well.  The De Casteljau is then best rewritten in terms of a linear interpolation helper function, as that can almost halve the number of multiplications needed.
« Last Edit: June 16, 2024, 07:45:15 pm by Nominal Animal »
 
The following users thanked this post: oPossum, mikerj, newbrain, RoGeorge, iMo, harerod, Picuino, wek

Offline Picuino

  • Super Contributor
  • ***
  • Posts: 1032
  • Country: es
    • Picuino web
Code: [Select]
/*
  C program for double floating point sin and cos.
  Calls modf.
  There are no error exits.
  Coefficients are #3370 from Hart & Cheney (18.80D).
*/
#include "cmath.h"

static const double p0 =  0.1357884097877375669092680e8;
static const double p1 = -0.4942908100902844161158627e7;
static const double p2 =  0.4401030535375266501944918e6;
static const double p3 = -0.1384727249982452873054457e5;
static const double p4 =  0.1459688406665768722226959e3;
static const double q0 =  0.8644558652922534429915149e7;
static const double q1 =  0.4081792252343299749395779e6;
static const double q2 =  0.9463096101538208180571257e4;
static const double q3 =  0.1326534908786136358911494e3;


static double sinus(double arg, int quad) {
   double x, x2, temp;
   int integ;

   /* Negative argument */
   if (arg < 0) {
      arg = -arg;
      quad = quad + 2;
   }

   /* Argument > 90 degrees */
   x = arg * TWO_PI;
   integ = (int) floor(x);
   x = x - integ;

   quad = (quad + integ) & 0x03;
   if (quad & 01)
      x = 1.0 - x;
   if (quad > 1)   
      x = - x;

   /* Compute sin of x */
   x2 = x * x;
   temp = ((((p4 * x2 + p3) * x2 + p2) * x2 + p1) * x2 + p0) * x;
   temp = temp / ((((x2 + q3) * x2 + q2) * x2 + q1) * x2 + q0);
   return(temp);
}

double cos(double arg) {
   if (arg<0)  arg = -arg;
   return(sinus(arg, 1));
}

double sin(double arg) {
   return(sinus(arg, 0));
}


EDIT:
9 multiplications and one division to get the sine or cosine of argument, in double precision.
« Last Edit: June 16, 2024, 08:04:06 pm by Picuino »
 
The following users thanked this post: oPossum, LaserSteve

Online Nominal AnimalTopic starter

  • Super Contributor
  • ***
  • Posts: 6788
  • Country: fi
    • My home page and email address
9 multiplications and one division to get the sine or cosine of argument, in double precision.
Needs double precision support, and the argument is in radians, not in cycles, so that basically suggests replacing apples with oranges.
 

Offline KE5FX

  • Super Contributor
  • ***
  • Posts: 1991
  • Country: us
    • KE5FX.COM
Frequently when you're calculating both sin and cos, you're rotating something.  If the phase advances by the same delta each time, there's a really cool recurrence relation that gets you out of doing any sin/cos calls at all.

From Numerical Recipes 2nd ed:



This trick will be obvious to most Math Guys, I'm sure, but I was pretty glad to stumble across it when I needed it.
 
The following users thanked this post: newbrain, RoGeorge, RAPo

Offline Picuino

  • Super Contributor
  • ***
  • Posts: 1032
  • Country: es
    • Picuino web
Needs double precision support, and the argument is in radians, not in cycles, so that basically suggests replacing apples with oranges.

Ok, another orange:
 - Simple precision
 - Only 7 multiplications, no division
 - Only 7 coefficients (float numbers of simple precision)
 - Input argument range from -1 to +1

Code: [Select]
import numpy as np
import matplotlib.pyplot as plt


def cos_poly(x):
   p0 = +1.61608342e-03
   p1 = -2.54255600e-02
   p2 = +2.35104787e-01
   p3 = -1.33519736e+00
   p4 = +4.05870398e+00
   p5 = -4.93480191e+00
   p6 = +1.00000000e+00
   xx = x * x
   return ((((((p0 * xx + p1)*xx + p2)*xx + p3)*xx + p4)*xx + p5)*xx + p6)


def main():
   xx = np.linspace(-1, 1, 500)
   yy = cos_true(xx) - cos_poly(xx)

   fig, ax = plt.subplots()
   ax.plot(xx, yy)
   ax.grid(True)
   plt.show()


def cos_true(x):
   return np.cos(x * np.pi)

   
main()

Attached: Error (+- 2 * 1e-8).
« Last Edit: June 16, 2024, 09:53:21 pm by Picuino »
 

Offline nimish

  • Regular Contributor
  • *
  • Posts: 185
  • Country: us
Neat, here are some with better tradeoffs: https://gist.github.com/publik-void/067f7f2fef32dbe5c27d6e215f824c91

The rel error is 10^4 better than Picuino's for cos, given the same order (12).

Estrin's scheme is better than Horner's for OOO processors. And these should be coded using FMA intrinsics.


Frequently when you're calculating both sin and cos, you're rotating something.  If the phase advances by the same delta each time, there's a really cool recurrence relation that gets you out of doing any sin/cos calls at all.

From Numerical Recipes 2nd ed:

This trick will be obvious to most Math Guys, I'm sure, but I was pretty glad to stumble across it when I needed it.

I want to say this is basically software CORDIC since you're basically just applying the same Lie algebraic infinitesimal rotations, though CORDIC uses binary arctan to allow strength-reducing the operations.
« Last Edit: June 17, 2024, 03:08:35 am by nimish »
 

Online Nominal AnimalTopic starter

  • Super Contributor
  • ***
  • Posts: 6788
  • Country: fi
    • My home page and email address
Ok, another orange:
Needs fixing!  ;)

Python uses double precision, not single-precision.  Also, you implemented period 2, not period 1.  (Use np.cos(2*x*np.pi).)

Writing the fixed version in C (or some other language that supports single-precision without resorting to double precision), you'd see that it diverges quite soon, with significant deviations near 0.25, and it utterly fails above ±0.75 or so.

The idea is that we want to be able to use usinf(x - (int)x) or ucosf(x - (int)x) or similar expressions for any x within -16777216..16777216 or so, although the larger you go, the fewer phase/angle bits there are.  You have zero angle bits left at ±16777216, one at ±±8388608, but the point is that you can use say [-4, +4] or whatever fits your needs, and Just Not Worry About It.



Anyway, that one is based on the bog-standard polynomial series expansion for cosine:
$$\operatorname{ucos}(x) = \cos(2 \pi x) = \sum_{k=0}^\infty (-1)^k \frac{(2 \pi)^{2 k}}{(2 k)!} x^{2 k} = \sum_{k=0}^\infty C_k x^{2 k}$$
where
$$\begin{aligned}
C_0 &= 1 \\
C_1 &= -19.73920880217871 \\
C_2 &= 64.93939402266828 \\
C_3 &= -85.45681720669372 \\
C_4 &= 60.24464137187665 \\
C_5 &= -26.42625678337438 \\
C_6 &= 7.903536371318467 \\
C_7 &= -1.714390711088671 \\
C_8 &= 0.282005968455791 \\
C_9 &= -0.03638284114254565 \\
C_k &= (-1)^k \frac{(2 \pi)^{2 k}}{(2 k)!} \\
C_{k+1} &= - C_k \frac{4 \pi^2}{(2 k + 1)(2 k + 2)} \\
\end{aligned}$$
and so on.

Nothing wrong in using that, though; it is a well-known formulation, and useful expansion around zero.

However, it is nowhere near as good as you asserted.  You simply sampled it on a regular grid, and assumed that covers all the errors.  Not so; exhaustive tests are needed when rounding errors and such cause a small range or high errors.

This is why my tests are exhaustive: I literally test every single non-subnormal float between 0 and 1.  Subnormal floats are approximately in range \$-2^{126} \dots +2^{126}\$, and so close to zero we can safely consider them zeroes.  (Many embedded C math libraries do exactly that, with a note somewhere saying subnormal numbers are not supported or are rounded to zero.)  They are definitely smaller than any machine epsilon I tend to care about.

(And yes, I sometimes use nextafterf() to tweak one coefficient by one ULP at a time, to see if there is a nearby set of coefficients that yields smaller errors, simply because it changes the set of rounding coincidents that happen.)

If you evaluate the corrected first-seven-terms approximation it at x = 0.23118135, you'll find that it is 14 ULPs low (relative error 14/16777216 ≃ 0.0000008).  It's even more off at x = 0.24430753.

The corresponding sine series is
$$\operatorname{usin}(x) = \sin(2 \pi x) = \sum_{k=0}^\infty (-1)^k \frac{(2 \pi)^{2 k + 1}}{(2 k + 1)!} x^{2 k + 1} = \sum_{k=0}^\infty S_k x^{2 k + 1}$$
where
$$\begin{aligned}
S_0 &= 6.283185307179586 = 2 \pi \\
S_1 &= -41.34170224039975 \\
S_2 &= 81.60524927607504 \\
S_3 &= -76.70585975306136 \\
S_4 &= 42.05869394489765 \\
S_5 &= -15.09464257682298 \\
S_6 &= 3.81995258484828 \\
S_7 &= -0.7181223017785002 \\
S_8 &= 0.1042291622081397 \\
S_9 &= -0.01203158594212062 \\
S_k &= (-1)^k \frac{(2 \pi)^{2 k + 1}}{(2 k + 1)!} \\
S_{k+1} &= -S_k \frac{4 \pi^2}{(2 k + 2)(2 k + 3)} \\
\end{aligned}$$
and it fares slightly better, but not much.

I really should rewrite my crude bench program and post it, I guess.  I really should vectorize it for AVX/SSE, as that cuts the full check duration to one quarter.  And clean up the output to e.g. a nicely formatted HTML table.
 

Online Nominal AnimalTopic starter

  • Super Contributor
  • ***
  • Posts: 6788
  • Country: fi
    • My home page and email address
Frequently when you're calculating both sin and cos, you're rotating something.  If the phase advances by the same delta each time, there's a really cool recurrence relation that gets you out of doing any sin/cos calls at all.
For me, the typical use case is constructing an orientation quaternion/bivector (the math is the same whichever you pick) via
$$\mathbf{q} = \Biggl( \cos\left(\frac{\varphi}{2}\right), ~ ~ \frac{x}{n} \sin\left(\frac{\varphi}{2}\right), ~ ~ \frac{y}{n} \sin\left(\frac{\varphi}{2}\right), ~ ~ \frac{z}{n} \sin\left(\frac{\varphi}{2}\right) \Biggr), \quad n = \sqrt{x^2 + y^2 + z^2}$$
for rotating 3D data by angle \$\varphi\$ around axis \$(x, y, z)\$.  Unlike rotation matrices, one can safely chain orientation quaternions/bivectors, because there is no loss of orthogonality, or harm if rounding errors compound a bit.

Aside from testing and verification of code and algorithms – which I like to do thoroughly – I don't even recall when I needed a sequence of sine or cosine values, instead of semi-random specific values.

(I utterly despise Euler and Tait-Bryan angles.  Not only are they vague – there are literally dozens of them, and nobody bothers to specify the exact one they are referring to – but they all suffer from gimbal lock.  I'll do orientation quaternions/versors/bivectors any day instead, even if Hamilton product is a bit cumbersome and conversion to and from rotation matrices is a bit fiddly.)
« Last Edit: June 17, 2024, 04:45:57 am by Nominal Animal »
 

Offline peter-h

  • Super Contributor
  • ***
  • Posts: 4002
  • Country: gb
  • Doing electronics since the 1960s...
Quote
verified for all 1,056,964,609 finite normal values between 0 and 1, i.e. one full period

Impressive :)
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 
The following users thanked this post: RAPo

Offline Picuino

  • Super Contributor
  • ***
  • Posts: 1032
  • Country: es
    • Picuino web
I have a series that approximates the sine function with only 3 coefficients and 4 multiplications, with an error of 6e-7. It does not take full advantage of simple precision arithmetic, but it is very simple and fast.
I will make a program to test the errors in the series and post it.


EDIT:
For now I will post the python program to calculate the coefficients.
Code: [Select]
import numpy as np
import matplotlib.pyplot as plt


def main():
    nodes = 7
    x_nodes = cheby_nodes(-0.125, 0.125, nodes)
    y_nodes = sin(x_nodes)

    pol = np.polyfit(x_nodes, y_nodes, nodes-1)
    for i in range(0, nodes, 2):
        pol[i] = 0
    for i in range(1, nodes, 2):
        print(f"   p{i} = {pol[i]:+11.8e}")

    xx = np.linspace(-0.125, 0.125, 500)
    yy = np.polyval(pol, xx) - sin(xx)
   
    plt.plot(xx, yy)
    plt.ylabel('sin(x)')
    plt.show()


def sin(x):
    return np.sin(x * 2 * np.pi)


def cheby_nodes(a, b, nodes):
    i = np.array(range(nodes))
    x = np.cos((2 * i + 1) * np.pi / (2 * nodes))
    return 0.5 * (b - a) * x + 0.5 * (b + a)


main()

I am going to calculate the sine and cosine from the trigonometric properties that allow to calculate them having only the angles from 0 to 45º.


Code: [Select]
   p1 = +7.95301472e+01
   p3 = -4.13255427e+01
   p5 = +6.28315378e+00
« Last Edit: June 17, 2024, 08:47:29 am by Picuino »
 

Offline Picuino

  • Super Contributor
  • ***
  • Posts: 1032
  • Country: es
    • Picuino web
Re: Lightweight sin(2·π·x), cos(2·π·x) for MCUs with float hardware
« Reply #10 on: June 17, 2024, 08:53:53 am »
Cosine with error of 4e-8:

Code: [Select]
import numpy as np
import matplotlib.pyplot as plt


def main():
    nodes = 7
    x_nodes = cheby_nodes(-0.125, 0.125, nodes)
    y_nodes = cos(x_nodes)

    pol = np.polyfit(x_nodes, y_nodes, nodes-1)
    for i in range(1, nodes, 2):
        pol[i] = 0
    for i in range(0, nodes, 2):
        print(f"   p{i} = {pol[i]:+11.8e}")

    xx = np.linspace(-0.125, 0.125, 500)
    yy = np.polyval(pol, xx) - cos(xx)
   
    plt.plot(xx, yy)
    plt.ylabel('cos(x)')
    plt.show()


def cos(x):
    return np.cos(x * 2 * np.pi)

def sin(x):
    return np.sin(x * 2 * np.pi)


def cheby_nodes(a, b, nodes):
    i = np.array(range(nodes))
    x = np.cos((2 * i + 1) * np.pi / (2 * nodes))
    return 0.5 * (b - a) * x + 0.5 * (b + a)


main()


Coefficients:
Code: [Select]
   p0 = -8.38235436e+01
   p2 = +6.49266669e+01
   p4 = -1.97391840e+01
   p6 = +1.00000000e+00
 

Offline Picuino

  • Super Contributor
  • ***
  • Posts: 1032
  • Country: es
    • Picuino web
Re: Lightweight sin(2·π·x), cos(2·π·x) for MCUs with float hardware
« Reply #11 on: June 17, 2024, 10:55:32 am »
C file with implementation of the function sine(x * 2 * pi) between -1 and +1.

Code: [Select]
#include <math.h>
#include <stdio.h>

float sin_pol(float x);
float sin_pol_r(float x);
float cos_pol_r(float x);

main(void) {
   double x, y1, y2;

   for(x = -1.0; x <= 1.0; x += 1.0/100) {
      y1 = sin_pol(x);
      y2 = sin(x * M_PI * 2);
      printf("x=%+11.8e  sin(x)=%+11.8e  Error=%+11.8e\n", x, y1, y1-y2);
   }
}


float sin_pol(float x) {
    signed char sign = 0;
    if (x < 0) {
        sign = -1;
        x = -x;
    }
    if (x > 0.5) {
        x = x - 0.5;
        if (sign == 0)
            sign = -1;
        else
            sign = 0;
    }
    if (x > 0.25) {
        x = 0.5 - x;
    }
    if (x > 0.125) {
        if (sign == 0)
            return cos_pol_r(0.25 - x);
        else
            return -cos_pol_r(0.25 - x);
    }
    else {
        if (sign == 0)
            return sin_pol_r(x);
        else
            return -sin_pol_r(x);
    }
}


float cos_pol_r(float x) {
   float p0 = -8.38235436e+01;
   float p2 = +6.49266669e+01;
   float p4 = -1.97391840e+01;
   float p6 = +1.00000000e+00;
   float xx = x * x;
   return ((p0 * xx + p2) * xx + p4) * xx + p6;
}


float sin_pol_r(float x) {
   float p1 = +7.95301472e+01;
   float p3 = -4.13255427e+01;
   float p5 = +6.28315378e+00;
   float xx = x * x;
   return ((p1 * xx + p3) * xx + p5) * x;
}



Output (sin values and error):
Code: [Select]
x=-1.00000000e+000  sin(x)=+0.00000000e+000  Error=-2.44921271e-016
x=-9.90000000e-001  sin(x)=+6.27901627e-002  Error=-3.56813253e-007
x=-9.80000000e-001  sin(x)=+1.25332611e-001  Error=-6.22085800e-007
x=-9.70000000e-001  sin(x)=+1.87380587e-001  Error=-7.27921819e-007
x=-9.60000000e-001  sin(x)=+2.48689600e-001  Error=-2.86958976e-007
x=-9.50000000e-001  sin(x)=+3.09016932e-001  Error=-6.20652434e-008
x=-9.40000000e-001  sin(x)=+3.68124780e-001  Error=+2.27642126e-007
x=-9.30000000e-001  sin(x)=+4.25779745e-001  Error=+4.53413709e-007
x=-9.20000000e-001  sin(x)=+4.81754155e-001  Error=+4.80928170e-007
x=-9.10000000e-001  sin(x)=+5.35827018e-001  Error=+2.22972283e-007
x=-9.00000000e-001  sin(x)=+5.87785283e-001  Error=+3.02962440e-008
x=-8.90000000e-001  sin(x)=+6.37423556e-001  Error=-4.33577147e-007
x=-8.80000000e-001  sin(x)=+6.84546932e-001  Error=-1.73519201e-007
x=-8.70000000e-001  sin(x)=+7.28968644e-001  Error=+1.67323476e-008
x=-8.60000000e-001  sin(x)=+7.70513246e-001  Error=+2.78052437e-009
x=-8.50000000e-001  sin(x)=+8.09016927e-001  Error=-6.74160083e-008
x=-8.40000000e-001  sin(x)=+8.44327989e-001  Error=+6.32185192e-008
x=-8.30000000e-001  sin(x)=+8.76306696e-001  Error=+1.55469245e-008
x=-8.20000000e-001  sin(x)=+9.04827045e-001  Error=-7.06649284e-009
x=-8.10000000e-001  sin(x)=+9.29776475e-001  Error=-1.07940462e-008
x=-8.00000000e-001  sin(x)=+9.51056502e-001  Error=-1.44570964e-008
x=-7.90000000e-001  sin(x)=+9.68583142e-001  Error=-1.90036964e-008
x=-7.80000000e-001  sin(x)=+9.82287298e-001  Error=+4.71069131e-008
x=-7.70000000e-001  sin(x)=+9.92114725e-001  Error=+2.33133507e-008
x=-7.60000000e-001  sin(x)=+9.98026735e-001  Error=+6.19351159e-009
x=-7.50000000e-001  sin(x)=+1.00000000e+000  Error=+0.00000000e+000
x=-7.40000000e-001  sin(x)=+9.98026735e-001  Error=+6.19351181e-009
x=-7.30000000e-001  sin(x)=+9.92114725e-001  Error=+2.33133511e-008
x=-7.20000000e-001  sin(x)=+9.82287298e-001  Error=+4.71069137e-008
x=-7.10000000e-001  sin(x)=+9.68583142e-001  Error=-1.90036956e-008
x=-7.00000000e-001  sin(x)=+9.51056502e-001  Error=-1.44570955e-008
x=-6.90000000e-001  sin(x)=+9.29776475e-001  Error=-1.07940451e-008
x=-6.80000000e-001  sin(x)=+9.04827045e-001  Error=-7.06649150e-009
x=-6.70000000e-001  sin(x)=+8.76306696e-001  Error=+1.55469261e-008
x=-6.60000000e-001  sin(x)=+8.44327989e-001  Error=+6.32185209e-008
x=-6.50000000e-001  sin(x)=+8.09016927e-001  Error=-6.74160066e-008
x=-6.40000000e-001  sin(x)=+7.70513246e-001  Error=+2.78052625e-009
x=-6.30000000e-001  sin(x)=+7.28968644e-001  Error=+1.67323496e-008
x=-6.20000000e-001  sin(x)=+6.84546932e-001  Error=-1.73519199e-007
x=-6.10000000e-001  sin(x)=+6.37423556e-001  Error=-4.33577145e-007
x=-6.00000000e-001  sin(x)=+5.87785283e-001  Error=+3.02962467e-008
x=-5.90000000e-001  sin(x)=+5.35827018e-001  Error=+2.22972285e-007
x=-5.80000000e-001  sin(x)=+4.81754155e-001  Error=+4.80928173e-007
x=-5.70000000e-001  sin(x)=+4.25779745e-001  Error=+4.53413712e-007
x=-5.60000000e-001  sin(x)=+3.68124780e-001  Error=+2.27642129e-007
x=-5.50000000e-001  sin(x)=+3.09016932e-001  Error=-6.20652406e-008
x=-5.40000000e-001  sin(x)=+2.48689600e-001  Error=-2.86958973e-007
x=-5.30000000e-001  sin(x)=+1.87380587e-001  Error=-7.27921816e-007
x=-5.20000000e-001  sin(x)=+1.25332611e-001  Error=-6.22085797e-007
x=-5.10000000e-001  sin(x)=+6.27901627e-002  Error=-3.56813251e-007
x=-5.00000000e-001  sin(x)=-0.00000000e+000  Error=+2.78699589e-015
x=-4.90000000e-001  sin(x)=-6.27901629e-002  Error=+3.56667969e-007
x=-4.80000000e-001  sin(x)=-1.25332803e-001  Error=+4.31030204e-007
x=-4.70000000e-001  sin(x)=-1.87380776e-001  Error=+5.38776440e-007
x=-4.60000000e-001  sin(x)=-2.48689413e-001  Error=+4.74094179e-007
x=-4.50000000e-001  sin(x)=-3.09016943e-001  Error=+5.13970452e-008
x=-4.40000000e-001  sin(x)=-3.68124783e-001  Error=-2.30354412e-007
x=-4.30000000e-001  sin(x)=-4.25779730e-001  Error=-4.38516483e-007
x=-4.20000000e-001  sin(x)=-4.81754333e-001  Error=-6.58679120e-007
x=-4.10000000e-001  sin(x)=-5.35827160e-001  Error=-3.64902592e-007
x=-4.00000000e-001  sin(x)=-5.87785125e-001  Error=+1.27513729e-007
x=-3.90000000e-001  sin(x)=-6.37423575e-001  Error=+4.14824223e-007
x=-3.80000000e-001  sin(x)=-6.84546947e-001  Error=+1.58449443e-007
x=-3.70000000e-001  sin(x)=-7.28968620e-001  Error=+7.12112092e-009
x=-3.60000000e-001  sin(x)=-7.70513237e-001  Error=+6.25311691e-009
x=-3.50000000e-001  sin(x)=-8.09017003e-001  Error=-8.20760049e-009
x=-3.40000000e-001  sin(x)=-8.44327927e-001  Error=-1.13372511e-009
x=-3.30000000e-001  sin(x)=-8.76306593e-001  Error=+8.66257441e-008
x=-3.20000000e-001  sin(x)=-9.04827058e-001  Error=-5.84925586e-009
x=-3.10000000e-001  sin(x)=-9.29776490e-001  Error=-3.84639676e-009
x=-3.00000000e-001  sin(x)=-9.51056480e-001  Error=+3.58874400e-008
x=-2.90000000e-001  sin(x)=-9.68583167e-001  Error=-5.47064161e-009
x=-2.80000000e-001  sin(x)=-9.82287288e-001  Error=-3.69834078e-008
x=-2.70000000e-001  sin(x)=-9.92114723e-001  Error=-2.14142509e-008
x=-2.60000000e-001  sin(x)=-9.98026729e-001  Error=-2.01794137e-010
x=-2.50000000e-001  sin(x)=-1.00000000e+000  Error=+0.00000000e+000
x=-2.40000000e-001  sin(x)=-9.98026729e-001  Error=-2.01794581e-010
x=-2.30000000e-001  sin(x)=-9.92114723e-001  Error=-2.14142519e-008
x=-2.20000000e-001  sin(x)=-9.82287288e-001  Error=-3.69834093e-008
x=-2.10000000e-001  sin(x)=-9.68583167e-001  Error=-5.47064360e-009
x=-2.00000000e-001  sin(x)=-9.51056540e-001  Error=-2.37172074e-008
x=-1.90000000e-001  sin(x)=-9.29776490e-001  Error=-3.84639998e-009
x=-1.80000000e-001  sin(x)=-9.04827058e-001  Error=-5.84925952e-009
x=-1.70000000e-001  sin(x)=-8.76306653e-001  Error=+2.70210951e-008
x=-1.60000000e-001  sin(x)=-8.44327927e-001  Error=-1.13372967e-009
x=-1.50000000e-001  sin(x)=-8.09017003e-001  Error=-8.20760548e-009
x=-1.40000000e-001  sin(x)=-7.70513296e-001  Error=-5.33515332e-008
x=-1.30000000e-001  sin(x)=-7.28968620e-001  Error=+7.12111525e-009
x=-1.20000000e-001  sin(x)=-6.84546888e-001  Error=+2.18054082e-007
x=-1.10000000e-001  sin(x)=-6.37423456e-001  Error=+5.34033507e-007
x=-1.00000000e-001  sin(x)=-5.87785184e-001  Error=+6.79090769e-008
x=-9.00000000e-002  sin(x)=-5.35827160e-001  Error=-3.64902599e-007
x=-8.00000000e-002  sin(x)=-4.81754243e-001  Error=-5.69272160e-007
x=-7.00000000e-002  sin(x)=-4.25779790e-001  Error=-4.98121135e-007
x=-6.00000000e-002  sin(x)=-3.68124753e-001  Error=-2.00552097e-007
x=-5.00000000e-002  sin(x)=-3.09016854e-001  Error=+1.40804005e-007
x=-4.00000000e-002  sin(x)=-2.48689458e-001  Error=+4.29390688e-007
x=-3.00000000e-002  sin(x)=-1.87380761e-001  Error=+5.53677593e-007
x=-2.00000000e-002  sin(x)=-1.25332728e-001  Error=+5.05536002e-007
x=-1.00000000e-002  sin(x)=-6.27902225e-002  Error=+2.97063317e-007
x=+7.52869989e-016  sin(x)=+4.73039809e-015  Error=-2.35621255e-020
x=+1.00000000e-002  sin(x)=+6.27902211e-002  Error=-2.98413454e-007
x=+2.00000000e-002  sin(x)=+1.25332728e-001  Error=-5.05985060e-007
x=+3.00000000e-002  sin(x)=+1.87380759e-001  Error=-5.55495390e-007
x=+4.00000000e-002  sin(x)=+2.48689464e-001  Error=-4.23056269e-007
x=+5.00000000e-002  sin(x)=+3.09016865e-001  Error=-1.29230045e-007
x=+6.00000000e-002  sin(x)=+3.68124758e-001  Error=+2.05355240e-007
x=+7.00000000e-002  sin(x)=+4.25779788e-001  Error=+4.96089063e-007
x=+8.00000000e-002  sin(x)=+4.81754237e-001  Error=+5.63154022e-007
x=+9.00000000e-002  sin(x)=+5.35827179e-001  Error=+3.83625059e-007
x=+1.00000000e-001  sin(x)=+5.87785166e-001  Error=-8.62815578e-008
x=+1.10000000e-001  sin(x)=+6.37423482e-001  Error=-5.07816449e-007
x=+1.20000000e-001  sin(x)=+6.84546894e-001  Error=-2.11658960e-007
x=+1.30000000e-001  sin(x)=+7.28968644e-001  Error=+1.67323451e-008
x=+1.40000000e-001  sin(x)=+7.70513296e-001  Error=+5.36440776e-008
x=+1.50000000e-001  sin(x)=+8.09017030e-001  Error=+3.57696711e-008
x=+1.60000000e-001  sin(x)=+8.44327902e-001  Error=-2.38781680e-008
x=+1.70000000e-001  sin(x)=+8.76306652e-001  Error=-2.85009620e-008
x=+1.80000000e-001  sin(x)=+9.04827045e-001  Error=-7.06649450e-009
x=+1.90000000e-001  sin(x)=+9.29776475e-001  Error=-1.07940474e-008
x=+2.00000000e-001  sin(x)=+9.51056529e-001  Error=+1.26669111e-008
x=+2.10000000e-001  sin(x)=+9.68583165e-001  Error=+3.73461528e-009
x=+2.20000000e-001  sin(x)=+9.82287264e-001  Error=+1.28414841e-008
x=+2.30000000e-001  sin(x)=+9.92114713e-001  Error=+1.12808685e-008
x=+2.40000000e-001  sin(x)=+9.98026729e-001  Error=+3.08904458e-010
x=+2.50000000e-001  sin(x)=+1.00000000e+000  Error=+0.00000000e+000
x=+2.60000000e-001  sin(x)=+9.98026735e-001  Error=+6.19351204e-009
x=+2.70000000e-001  sin(x)=+9.92114701e-001  Error=-1.78636883e-010
x=+2.80000000e-001  sin(x)=+9.82287264e-001  Error=+1.28414860e-008
x=+2.90000000e-001  sin(x)=+9.68583188e-001  Error=+2.64729305e-008
x=+3.00000000e-001  sin(x)=+9.51056502e-001  Error=-1.44570944e-008
x=+3.10000000e-001  sin(x)=+9.29776475e-001  Error=-1.07940437e-008
x=+3.20000000e-001  sin(x)=+9.04827045e-001  Error=-7.06649017e-009
x=+3.30000000e-001  sin(x)=+8.76306607e-001  Error=-7.25488413e-008
x=+3.40000000e-001  sin(x)=+8.44327902e-001  Error=-2.38781624e-008
x=+3.50000000e-001  sin(x)=+8.09017030e-001  Error=+3.57696771e-008
x=+3.60000000e-001  sin(x)=+7.70513246e-001  Error=+2.78052836e-009
x=+3.70000000e-001  sin(x)=+7.28968644e-001  Error=+1.67323522e-008
x=+3.80000000e-001  sin(x)=+6.84546932e-001  Error=-1.73519196e-007
x=+3.90000000e-001  sin(x)=+6.37423556e-001  Error=-4.33577142e-007
x=+4.00000000e-001  sin(x)=+5.87785130e-001  Error=-1.22673747e-007
x=+4.10000000e-001  sin(x)=+5.35827179e-001  Error=+3.83625068e-007
x=+4.20000000e-001  sin(x)=+4.81754319e-001  Error=+6.45379884e-007
x=+4.30000000e-001  sin(x)=+4.25779745e-001  Error=+4.53413715e-007
x=+4.40000000e-001  sin(x)=+3.68124780e-001  Error=+2.27642132e-007
x=+4.50000000e-001  sin(x)=+3.09016932e-001  Error=-6.20652374e-008
x=+4.60000000e-001  sin(x)=+2.48689419e-001  Error=-4.68422023e-007
x=+4.70000000e-001  sin(x)=+1.87380771e-001  Error=-5.43933164e-007
x=+4.80000000e-001  sin(x)=+1.25332797e-001  Error=-4.36310193e-007
x=+4.90000000e-001  sin(x)=+6.27901627e-002  Error=-3.56813247e-007
x=+5.00000000e-001  sin(x)=+0.00000000e+000  Error=+6.09478830e-015
x=+5.10000000e-001  sin(x)=-6.27901629e-002  Error=+3.56667972e-007
x=+5.20000000e-001  sin(x)=-1.25332609e-001  Error=+6.24745303e-007
x=+5.30000000e-001  sin(x)=-1.87380582e-001  Error=+7.32491538e-007
x=+5.40000000e-001  sin(x)=-2.48689607e-001  Error=+2.80379087e-007
x=+5.50000000e-001  sin(x)=-3.09016943e-001  Error=+5.13970483e-008
x=+5.60000000e-001  sin(x)=-3.68124783e-001  Error=-2.30354409e-007
x=+5.70000000e-001  sin(x)=-4.25779730e-001  Error=-4.38516480e-007
x=+5.80000000e-001  sin(x)=-4.81754154e-001  Error=-4.79865182e-007
x=+5.90000000e-001  sin(x)=-5.35827041e-001  Error=-2.45693300e-007
x=+6.00000000e-001  sin(x)=-5.87785304e-001  Error=-5.13002035e-008
x=+6.10000000e-001  sin(x)=-6.37423575e-001  Error=+4.14824226e-007
x=+6.20000000e-001  sin(x)=-6.84546947e-001  Error=+1.58449445e-007
x=+6.30000000e-001  sin(x)=-7.28968620e-001  Error=+7.12112325e-009
x=+6.40000000e-001  sin(x)=-7.70513237e-001  Error=+6.25311936e-009
x=+6.50000000e-001  sin(x)=-8.09016943e-001  Error=+5.13970463e-008
x=+6.60000000e-001  sin(x)=-8.44327986e-001  Error=-6.07383683e-008
x=+6.70000000e-001  sin(x)=-8.76306713e-001  Error=-3.25835439e-008
x=+6.80000000e-001  sin(x)=-9.04827058e-001  Error=-5.84925475e-009
x=+6.90000000e-001  sin(x)=-9.29776490e-001  Error=-3.84639554e-009
x=+7.00000000e-001  sin(x)=-9.51056480e-001  Error=+3.58874409e-008
x=+7.10000000e-001  sin(x)=-9.68583167e-001  Error=-5.47064072e-009
x=+7.20000000e-001  sin(x)=-9.82287288e-001  Error=-3.69834071e-008
x=+7.30000000e-001  sin(x)=-9.92114723e-001  Error=-2.14142505e-008
x=+7.40000000e-001  sin(x)=-9.98026729e-001  Error=-2.01793915e-010
x=+7.50000000e-001  sin(x)=-1.00000000e+000  Error=+0.00000000e+000
x=+7.60000000e-001  sin(x)=-9.98026729e-001  Error=-2.01794803e-010
x=+7.70000000e-001  sin(x)=-9.92114723e-001  Error=-2.14142524e-008
x=+7.80000000e-001  sin(x)=-9.82287288e-001  Error=-3.69834099e-008
x=+7.90000000e-001  sin(x)=-9.68583167e-001  Error=-5.47064438e-009
x=+8.00000000e-001  sin(x)=-9.51056480e-001  Error=+3.58874364e-008
x=+8.10000000e-001  sin(x)=-9.29776490e-001  Error=-3.84640098e-009
x=+8.20000000e-001  sin(x)=-9.04827058e-001  Error=-5.84926096e-009
x=+8.30000000e-001  sin(x)=-8.76306713e-001  Error=-3.25835510e-008
x=+8.40000000e-001  sin(x)=-8.44327986e-001  Error=-6.07383762e-008
x=+8.50000000e-001  sin(x)=-8.09016943e-001  Error=+5.13970376e-008
x=+8.60000000e-001  sin(x)=-7.70513237e-001  Error=+6.25310936e-009
x=+8.70000000e-001  sin(x)=-7.28968620e-001  Error=+7.12111314e-009
x=+8.80000000e-001  sin(x)=-6.84546947e-001  Error=+1.58449434e-007
x=+8.90000000e-001  sin(x)=-6.37423575e-001  Error=+4.14824214e-007
x=+9.00000000e-001  sin(x)=-5.87785304e-001  Error=-5.13002157e-008
x=+9.10000000e-001  sin(x)=-5.35827041e-001  Error=-2.45693313e-007
x=+9.20000000e-001  sin(x)=-4.81754154e-001  Error=-4.79865196e-007
x=+9.30000000e-001  sin(x)=-4.25779730e-001  Error=-4.38516493e-007
x=+9.40000000e-001  sin(x)=-3.68124783e-001  Error=-2.30354423e-007
x=+9.50000000e-001  sin(x)=-3.09016943e-001  Error=+5.13970339e-008
x=+9.60000000e-001  sin(x)=-2.48689607e-001  Error=+2.80379073e-007
x=+9.70000000e-001  sin(x)=-1.87380582e-001  Error=+7.32491523e-007
x=+9.80000000e-001  sin(x)=-1.25332609e-001  Error=+6.24745288e-007
x=+9.90000000e-001  sin(x)=-6.27901629e-002  Error=+3.56667957e-007

« Last Edit: June 17, 2024, 11:00:27 am by Picuino »
 

Offline mark03

  • Frequent Contributor
  • **
  • Posts: 730
  • Country: us
Re: Lightweight sin(2·π·x), cos(2·π·x) for MCUs with float hardware
« Reply #12 on: June 17, 2024, 10:33:42 pm »
I don't even recall when I needed a sequence of sine or cosine values, instead of semi-random specific values.
Everyone's different.  Most of my work is signal-processing centric, so I most often deal with sequences of sin/cos, and "semi-random specific values" are the weird outlier.  Of course, there is a large body of literature on this... we call it Direct Digital Synthesis (DDS).  Performance is evaluated a bit differently, though.  Instead of absolute errors in the time domain we usually want to know the amplitude of the spurs relative to the intended sinusoid, for a given computational complexity.
 

Online Nominal AnimalTopic starter

  • Super Contributor
  • ***
  • Posts: 6788
  • Country: fi
    • My home page and email address
Re: Lightweight sin(2·π·x), cos(2·π·x) for MCUs with float hardware
« Reply #13 on: June 18, 2024, 07:32:39 am »
Quote
verified for all 1,056,964,609 finite normal values between 0 and 1, i.e. one full period
Impressive :)
No, the right word is useful.  It only takes a minute on my ye olde Intel Core i5 7200U laptop, even without vectorization.  (And if we count 0 and 1 both, then it is 1,056,964,610 tests.)

IEEE 754 Binary32 aka float is what microcontrollers with hardware support for floats implement.  float has
  • Two zeroes: -0.0 and +0.0.  These compare equal.
  • 16,777,214 subnormal values, 8,388,607 positive and 8,388,607 negative.  Their magnitude is less than 2-126 ≃ 1.1754944e-38, so they are all essentially zero.  Many math libraries do not support subnormals, and instead treat them as zero.
  • Two infinities: -INFINITY and +INFINITY.
  • 16,777,214 NaNs, 8,388,607 positive and 8,388,607 negative.  These compare inequal to everything, even themselves.  If they are used in an arithmetic operation, the result will also be NaN.
  • 4,261,412,864 normal values, 2,130,706,432 positive and 2,130,706,432 negative.  Zero is not counted as a normal value.
    1,056,964,608 of these are less than 1 in magnitude, and 1,073,741,823 are greater than 1 in magnitude.
Because the sign is a separate bit in float (that you can access with signbit(x) in C), you only need to test positive zeros and normals.  Thus, there are only 2,130,706,433 unique float values to test.  Or, if you are only interested in 0.0f <= x <= 1.0f, as is the case here, 1,056,964,610 unique values.

On architectures with IEEE 754 Binary32 support with float byte order the same as uint32_t byte order, the storage representation of positive zero is 0x00000000 (and negative zero is 0x80000000), and the storage representation of positive normals ranges from 0x00800000 to 0x7F7FFFFF.  +1.0f is represented as 0x3f800000.

If you construct a function that checks your approximation for some float x, and records any deviation from the correct value, you can use a very simple loop:
    for (float x = 0.0f; isfinite(x); x = nexttowardf(x, 1e64)) check(x);
or use the storage representation:
    union {
        uint32_t  u32;
        float   f;
    } x;
    check(0.0f);
    for (x.u32 = 0x00800000; x.u32 < 0x7F800000; x++) check(u.f);

If you are only interested in finite normals up to 1.0f, then use
    for (float x = 0.0f; x <= 1.0f; x = nextafterf(x, +HUGE_VALF)) check(x);
or use the storage representation:
    union {
        uint32_t  u32;
        float   f;
    } x;
    check(0.0f);
    for (x.u32 = 0x00800000; x.u32 <= 0x3F800000; x++) check(u.f);

While the above check the values in increasing order, the order obviously doesn't matter.

If you are interested in finding out if replacing a float division by a constant with a float multiplication with a constant having the reciprocal value, you can use the above to check.  If you need it to yield the exact same answer for an integer divisor, you can use the Markstein approach to turn the division into one multiplication and two fused multiply-adds (fmaf); my pseudocode there tells you how often the given integer divisor will yield a value different to the division, across all normal positive float dividends.

To leverage vector extensions using GCC (or Clang), you only need to declare a suitable type, say
    typedef  float  float4  __attribute__((vector_size (4 * sizeof (float))));
and basic arithmetic operators –– +, -, *, /, negation, ^ (XOR), | (OR), & (AND), ~ (binary NOT), % (modulus or remainder) -– will do the operation element-wise.  You can also access such vector as an array.  If you use a scalar for the other operand for +, -, *, /, ^, |, &, or %, GCC will automatically convert the scalar to a vector with all components the same value.

If you dynamically allocate vector types, you should use aligned_alloc(_Alignof (vectortype, (size_t)count * sizeof (vectortype)), because malloc()/calloc()/realloc() may not enforce sufficient alignment for hardware-supported vector types.  It is annoying, yes, but the compiler developers insist this is according to the C standards.  (The same also affects libquadmath.)

Conversion from a vector type to another vector type newvectortype with the same number of components is done using __builtin_convertvector(vector, newvectortype).  It behaves exactly like casting each member separately to newvectortype.

This works across all architectures.  If the target does not support such a vector type, GCC (and Clang) combine it from smaller vectors, or uses the plain scalar type, and duplicates the operations as many times as necessary.  It does not always generate optimal code then (especially for sequences of operations that need more than half the available vector registers), but it will work.

Unfortunately, for vector verification of the results, like comparing two vectors component-wise, you do need to use architecture-specific vector extension built-ins.  It is best to check what operations are available on the different architectures you use for verification, and hide the details behind static inline functions.  For each function, start with a version that does the checking component by component, so you can easily verify your vectorized and component-by-component checking functions always produce the same results.  I like to keep a compile-time switch to select between vectorized and non-vectorized checking for that reason.

Vectorized checking of piecewise polynomial functions are complicated and funky, so I suggest you don't bother with those until you have a firm grasp of to handle single operation multiple-data vectors in general.
« Last Edit: June 18, 2024, 08:03:55 am by Nominal Animal »
 
The following users thanked this post: glenenglish

Online Nominal AnimalTopic starter

  • Super Contributor
  • ***
  • Posts: 6788
  • Country: fi
    • My home page and email address
Re: Lightweight sin(2·π·x), cos(2·π·x) for MCUs with float hardware
« Reply #14 on: June 18, 2024, 07:49:25 am »
I don't even recall when I needed a sequence of sine or cosine values, instead of semi-random specific values.
Everyone's different.
Of course!  That's exactly why I started with "For me, the typical use case is".  As the OP, I thought that describing my typical use cases, and why I started a thread about calculating individual random arguments, instead of sequences of arguments, would be informative –– and help others decide whether it is relevant to them or their problem domain ––; that's all.

Most of my work is signal-processing centric, so I most often deal with sequences of sin/cos, and "semi-random specific values" are the weird outlier.  Of course, there is a large body of literature on this... we call it Direct Digital Synthesis (DDS).  Performance is evaluated a bit differently, though.  Instead of absolute errors in the time domain we usually want to know the amplitude of the spurs relative to the intended sinusoid, for a given computational complexity.
Exactly: the problem domain is quite different.

While I have fiddled with trackers (music software), especially the low-down details of resampling, I'm not very familiar with direct digital synthesis, especially not on the algorithmic front.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15192
  • Country: fr
Re: Lightweight sin(2·π·x), cos(2·π·x) for MCUs with float hardware
« Reply #15 on: June 18, 2024, 08:33:10 am »
Generating a sine wave incrementally, and computing arbitrary values are two use cases that are both useful in practice. The former is more for implementing some kind of oscillator, and a rotation-based algo allows to change the frequency very easily, so that makes it attractive compared to storing a pre-computed period, and it takes a lot less memory.

We had discussed many options too, as I remember, in a previous thread, but I can't find it anymore.
 

Online radiolistener

  • Super Contributor
  • ***
  • Posts: 3928
  • Country: ua
Re: Lightweight sin(2·π·x), cos(2·π·x) for MCUs with float hardware
« Reply #16 on: June 18, 2024, 11:31:26 am »
9 multiplications and one division to get the sine or cosine of argument, in double precision.

Well, here is my approach: 1 multiplication and 3 additions  :)

Code: [Select]
#define SR 48000        // Sample rate
#define FREQ 1000       // 1000 Hz sine

#define PI  3.1415926535897932384626433832795
#define PW  ((float)FREQ * PI / SR)

static float R = (float)(4 * PW * PW); // for better approximation you can use R = pow(2*sin(PW), 2);
static float V = 0;
static float X = 2000;//4096/3.3 * 0.5;      // init value
static float S = 2048;//1500; // bias offset

uint16_t getSineNextSample() {
V -= X * R;
X += V;
return (uint16_t)(X + S);
}
 

Offline Picuino

  • Super Contributor
  • ***
  • Posts: 1032
  • Country: es
    • Picuino web
Re: Lightweight sin(2·π·x), cos(2·π·x) for MCUs with float hardware
« Reply #17 on: June 18, 2024, 11:37:33 am »
By adjusting the ranges of each polynomial series a little more, a smaller error over the entire x-range of the function is achieved.

 - 4 multiplications per calc
 - Error max: 3e-7

Python program:
Code: [Select]
import numpy as np
import matplotlib.pyplot as plt


def main():
    nodes = 7
    xmax = 0.10546875
    calc_sin(nodes, xmax)
    calc_cos(nodes, 0.25-xmax)
   

def calc_cos(nodes, xmax):
    x_nodes = cheby_nodes(-xmax, +xmax, nodes)
    y_nodes = cos(x_nodes)

    pol = np.polyfit(x_nodes, y_nodes, nodes-1)
    for i in range(1, nodes, 2):
        pol[i] = 0
    print()
    for i in range(0, nodes, 2):
        print(f"   float p{i} = {pol[i]:+11.8e};")

    xx = np.linspace(-xmax, xmax, 500)
    yy = np.polyval(pol, xx) - cos(xx)
   
    plt.plot(xx, yy)
    plt.xlabel('cos(x)')
    plt.show()


def calc_sin(nodes, xmax):
    x_nodes = cheby_nodes(-xmax, +xmax, nodes)
    y_nodes = sin(x_nodes)

    pol = np.polyfit(x_nodes, y_nodes, nodes-1)
    for i in range(0, nodes, 2):
        pol[i] = 0
    print()
    for i in range(1, nodes, 2):
        print(f"   float p{i} = {pol[i]:+11.8e};")

    xx = np.linspace(-xmax, xmax, 500)
    yy = np.polyval(pol, xx) - sin(xx)
   
    plt.plot(xx, yy)
    plt.xlabel('sin(x)')
    plt.show()


def cos(x):
    return np.cos(x * 2 * np.pi)

def sin(x):
    return np.sin(x * 2 * np.pi)


def cheby_nodes(a, b, nodes):
    i = np.array(range(nodes))
    x = np.cos((2 * i + 1) * np.pi / (2 * nodes))
    return 0.5 * (b - a) * x + 0.5 * (b + a)


main()


c program:
Code: [Select]
#include <math.h>
#include <stdio.h>

float sin_pol(float x);
float sin_pol_r(float x);
float cos_pol_r(float x);


main(void) {
   double x, y1, y2;

   printf("x;sin(x);Error\n", x, y1, y1-y2);
   for(x = -1.0; x <= 1.0; x += 1.0/1000) {
      y1 = sin_pol(x);
      y2 = sin(x * M_PI * 2);
      printf("%+11.8e;%+11.8e;%+11.8e\n", x, y1, y1-y2);
   }
}


float sin_pol(float x) {
    signed char sign = 0;
    if (x < 0) {
        sign = -1;
        x = -x;
    }
    if (x > 0.5) {
        x = x - 0.5;
        if (sign == 0)
            sign = -1;
        else
            sign = 0;
    }
    if (x > 0.25) {
        x = 0.5 - x;
    }
    if (x > 0.10546875) {
        if (sign == 0)
            return cos_pol_r(0.25 - x);
        else
            return -cos_pol_r(0.25 - x);
    }
    else {
        if (sign == 0)
            return sin_pol_r(x);
        else
            return -sin_pol_r(x);
    }
}


float cos_pol_r(float x) {
   float p0 = -8.32795518e+01;
   float p2 = +6.49167315e+01;
   float p4 = -1.97391497e+01;
   float p6 = +1.00000000e+00;
   float xx = x * x;
   return ((p0 * xx + p2) * xx + p4) * xx + p6;
}


float sin_pol_r(float x) {
   float p1 = +8.01233966e+01;
   float p3 = -4.13334793e+01;
   float p5 = +6.28317388e+00;
   float xx = x * x;
   return ((p1 * xx + p3) * xx + p5) * x;
}
« Last Edit: June 18, 2024, 11:39:23 am by Picuino »
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 9291
  • Country: gb
Re: Lightweight sin(2·π·x), cos(2·π·x) for MCUs with float hardware
« Reply #18 on: June 18, 2024, 11:46:57 am »
9 multiplications and one division to get the sine or cosine of argument, in double precision.

Well, here is my approach: 1 multiplication and 3 additions  :)

Code: [Select]
#define SR 48000        // Sample rate
#define FREQ 1000       // 1000 Hz sine

#define PI  3.1415926535897932384626433832795
#define PW  ((float)FREQ * PI / SR)

static float R = (float)(4 * PW * PW); // for better approximation you can use R = pow(2*sin(PW), 2);
static float V = 0;
static float X = 2000;//4096/3.3 * 0.5;      // init value
static float S = 2048;//1500; // bias offset

uint16_t getSineNextSample() {
V -= X * R;
X += V;
return (uint16_t)(X + S);
}
Be real careful with that kind of design, if you need a persistent signal. Rounding errors can easily make the output blow up or fade to zero.
 

Offline AVI-crak

  • Regular Contributor
  • *
  • Posts: 133
  • Country: ru
    • Rtos
Re: Lightweight sin(2·π·x), cos(2·π·x) for MCUs with float hardware
« Reply #19 on: June 18, 2024, 02:11:15 pm »
https://github.com/AVI-crak/Rtos_cortex/blob/master/math_s.c
6 multiplications, 6 simple additions, and some logic, the error is 1 minor floating point bit.
 

Online Nominal AnimalTopic starter

  • Super Contributor
  • ***
  • Posts: 6788
  • Country: fi
    • My home page and email address
Re: Lightweight sin(2·π·x), cos(2·π·x) for MCUs with float hardware
« Reply #20 on: June 18, 2024, 03:42:12 pm »
Ok, now I'm getting annoyed.

If you want to post your normal sine or cosine approximations, or your code uses doubles, then at least say that they are not about what the original post is about.

If you suggest a solution because maximum error at a few thousand regularly sampled intervals is small, then you are only posting a guess.  To know the error bounds, you have to check all valid float input values.

No ifs, no buts.

https://github.com/AVI-crak/Rtos_cortex/blob/master/math_s.c
6 multiplications, 6 simple additions, and some logic, the error is 1 minor floating point bit.
Nope: only normal sin() and cos() implementations there, with period 2·π.  No sines or cosines with period 1.

If you think you can just divide the argument by two pi and get a good unit period sine or cosine, you're wrong: the errors will be much larger.
 
The following users thanked this post: T3sl4co1l

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3238
  • Country: ca
Re: Lightweight sin(2·π·x), cos(2·π·x) for MCUs with float hardware
« Reply #21 on: June 18, 2024, 05:47:20 pm »
If you want to post your normal sine or cosine approximations, or your code uses doubles, then at least say that they are not about what the original post is about.

Obviously, any algorithm can be implemented in singles, doubles, fractionals, or whatnot. Also algorithms can be scaled as pleased horizontally or vertically. The scaling changes coefficients, but not the principle.

With normal (Taylor) calculation, the error will depend on the number of coefficients you use. By using more Taylor members you can decrease the error to any arbitrary value. If I remember correctly, for single precision floats you only need 3 Taylor members to calculate sin() with errors smaller than the least significant digit of the IEEE presentation. This produces very compact, fast, and efficient calculation which is hard to beat in any respect.

There are certainly many ways to calculate sin(), which may be interesting to discuss, but in practical terms, Taylor is probably the best.
 

Offline Picuino

  • Super Contributor
  • ***
  • Posts: 1032
  • Country: es
    • Picuino web
Re: Lightweight sin(2·π·x), cos(2·π·x) for MCUs with float hardware
« Reply #22 on: June 18, 2024, 06:19:10 pm »
With Taylor you get to approximate a function in such a way that the approximation is more accurate the closer it is to x=0.
If what you want is the error to be constant and minimum in all the approximated range, it is better to use interpolating polynomials choosing the zeros of the polynomial with the chebyshev coefficients.
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3238
  • Country: ca
Re: Lightweight sin(2·π·x), cos(2·π·x) for MCUs with float hardware
« Reply #23 on: June 18, 2024, 06:36:20 pm »
With Taylor you get to approximate a function in such a way that the approximation is more accurate the closer it is to x=0.
If what you want is the error to be constant and minimum in all the approximated range, it is better to use interpolating polynomials choosing the zeros of the polynomial with the chebyshev coefficients.

There are two considerations for numbers close to 0.

Typically, in embedded application you will be concerned with absolute (rather than relative) errors. The fact that the presentation resolution increases as the number gets smaller is an idiosyncrasy of floating point numbers. If you use integers instead, the problem disappears.

Second, as you get closer to zero, sine function gets closer and closer to sin(x) = x which you can use for any number lower than certain threshold.
 

Online Nominal AnimalTopic starter

  • Super Contributor
  • ***
  • Posts: 6788
  • Country: fi
    • My home page and email address
Re: Lightweight sin(2·π·x), cos(2·π·x) for MCUs with float hardware
« Reply #24 on: June 18, 2024, 07:31:53 pm »
Six terms of the sin series expansion (as I listed in reply #6) suffices to get within ±3 ULPs for both unit-period sine and cosine,
    usinf(x) = (float)sin(x * 2 * Pi)
    ucosf(x) = (float)cos(x * 2 * Pi)

(The problem with using sinf(x * 6.2831855f) is that it has lots of error near x = ±1, at x = ±0.99999994f, ±0.9999999f, ±0.9999998f, ±0.99999976f, and so on.  6.2831855f is the closest value to two times pi you can get with floats.)

Example implementation:
Code: [Select]
// x = -0.25f .. 0.25f, returns (float)sin((double)x * 6.283185307179586)
static float usinf_internal(float x) {
    float  xx = x * x;
    return x * (6.2831855f - xx * (41.3417f - xx * (81.60525f - xx * (76.70586f - xx * (42.058693f - xx * 15.094643f)))));
}

// returns (float)sin((double)x * 6.283185307179586) with at most 3 ULPs of error from the expected value
float usinf(float x) {
    // Limit to one period
    x -= (int32_t)x;
    if (x < 0.0f)
        x += 1.0f;

    if (x >= 0.75f)
        return  usinf_internal(x - 1.0f);
    else
    if (x >= 0.25f)
        return -usinf_internal(x - 0.5f);
    else
        return  usinf_internal(x);
}

// returns (float)cos((double)x * 6.283185307179586) with at most 3 ULPs of error from the expected value
float ucosf(float x) {
    // Cosine is an even function
    if (x < 0.0f)
        x = -x;

    // Limit to one full period
    x -= (int32_t)x;

    if (x < 0.5f)
        return -usinf_internal(x - 0.25f);
    else
        return  usinf_internal(x - 0.75f);
}

Here is the error distribution for all non-subnormal float x between 0 and 1, inclusive:
$$\begin{array}{r|r:r|r:r}
~ & \text{usin} & ~ & \text{ucos} & ~ \\
\text{ULPs} & \text{Count} & \text{Fraction} & \text{Count} & \text{Fraction} \\
\hline
-3 &       720 &  0.00\% &     21020 &  0.00\% \\
-2 &    247531 &  0.02\% &   5355330 &  0.51\% \\
-1 &   4785888 &  0.45\% &  56436657 &  5.34\% \\
\hdashline
 0 & 698222628 & 66.06\% & 952910427 & 90.16\% \\
\hdashline
 1 & 353023465 & 33.40\% &  36374217 &  3.44\% \\
 2 &    682383 &  0.06\% &   5835909 &  0.55\% \\
 3 &      1995 &  0.00\% &     31050 &  0.00\% \\
\end{array}$$

This is in all ways superior to the solution in my initial post.
 


Share me

Digg  Facebook  SlashDot  Delicious  Technorati  Twitter  Google  Yahoo
Smf