Author Topic: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI  (Read 8086 times)


Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6265
  • Country: fi
    • My home page and email address
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #50 on: December 22, 2021, 12:41:17 pm »
In case anyone is interested, I found single-precision coefficients for a sine approximation (0.99968741f*x - 0.16566247f*x**3 + 0.0075119016f*x**5) with absolute error less than ±0.0000706 over all unique single-precision floating point numbers in the first quadrant (x from 0 to PI/2, or from -PI/2 to PI/2 since sin is an odd ("antisymmetric") function):
Code: [Select]
#include <math.h>  /* for roundf() used in sinip() below */

float sinf_first_quadrant(float x)
{
    const float  xx = x * x;
    float  result = xx * 0.0075119016f;
    result += -0.16566247f;
    result *= xx;
    result += 0.99968741f;
    return result * x;
}

float sinp(float phase, float scale)
{
    if (phase < 0.0f) {
        phase = -phase;
        scale = -scale;
    }
    if (phase >= 1.0f) {
        phase -= (int)phase;
    }
    if (phase >= 0.5f) {
        phase -= 0.5f;
        scale = -scale;
    }
    if (phase > 0.25f) {
        phase = 0.5f - phase;
    }
    return scale * sinf_first_quadrant(phase * 6.2831853f);
}

int sinip(float phase, float scale)
{
    if (phase < 0.0f) {
        phase = -phase;
        scale = -scale;
    }
    if (phase >= 1.0f) {
        phase -= (int)phase;
    }
    if (phase >= 0.5f) {
        phase -= 0.5f;
        scale = -scale;
    }
    if (phase > 0.25f) {
        phase = 0.5f - phase;
    }
    float  result = scale * sinf_first_quadrant(phase * 6.2831853f);
    /* return (result < 0.0f) ? (result - 0.5f) : (result + 0.5f); */
    return roundf(result);
}
This means that you can scale the result by up to 14178 or so, and when rounded, the result will be within ±1 of round(scale * sin(x)), regardless of the array size.

sinp(p, 1.0f) differs from sinf(p*6.2831853f) by at most ±0.0000706 within -2*PI to +2*PI, but
sinp(x/6.2831853f, 1.0f) can differ from sinf(x) by at most ±0.000071 within -2*PI to +2*PI.
For larger ranges, note that for sinp(), 2*PI is exactly 6.2831853f.  (Near ±200*PI, the error increases to almost ±0.00011.)

You can use sinip(p, s) to approximate (int)round(s*sin(p*2*PI)) with any finite p: but note the above accuracy limitation if you go far from zero.  I suggest using it to populate an array with a full sine wave, using a simple loop,
    for (int i = 0; i < N; i++) sintab[i] = sinip((float)i / (float)N, S);
where N is the size of the table (one full period), and S is the maximum amplitude.
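For anyone who wants to play with the numbers before porting to C, the same quadrant folding can be sketched in double-precision Python (the function and variable names here are mine, not from the C code above):

```python
import math

# Coefficients from the post, evaluated here in double precision.
C1, C3, C5 = 0.99968741, -0.16566247, 0.0075119016

def sin_first_quadrant(x):
    """Odd 5th-degree polynomial approximation, valid for x in [0, pi/2]."""
    xx = x * x
    return ((C5 * xx + C3) * xx + C1) * x

def sinp(phase, scale=1.0):
    """Approximate scale * sin(2*pi*phase) via quadrant folding."""
    if phase < 0.0:
        phase, scale = -phase, -scale
    phase -= int(phase)              # reduce to [0, 1)
    if phase >= 0.5:                 # second half of the period: negate
        phase -= 0.5
        scale = -scale
    if phase > 0.25:                 # second quadrant: mirror
        phase = 0.5 - phase
    return scale * sin_first_quadrant(phase * 2.0 * math.pi)

# Worst absolute error over one full period, sampled at 4096 points.
max_err = max(abs(sinp(i / 4096.0) - math.sin(2.0 * math.pi * i / 4096.0))
              for i in range(4096))
```

In double precision the worst sampled error stays below 1e-4, consistent with the ±0.0000706 bound quoted above.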

There are only 1,070,141,419 unique float values between 0 and PI/2, so I verified the absolute errors by brute force; and sinip() with a few hundred million pseudorandom inputs.
« Last Edit: December 22, 2021, 12:43:34 pm by Nominal Animal »
 
The following users thanked this post: oPossum, peter-h

Offline Picuino

  • Frequent Contributor
  • **
  • Posts: 730
  • Country: 00
    • Picuino web
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #51 on: December 22, 2021, 04:44:10 pm »
Quote from: Nominal Animal on December 22, 2021, 12:41:17 pm
(the sinf_first_quadrant() / sinp() / sinip() code from reply #50, quoted verbatim)

You can compute the coefficients with this program:
Code: (python) [Select]
import numpy as np

num_nodes = 6
interval = [-np.pi/2, np.pi/2]

def f(x):
    return np.sin(x)

def chebyshev_nodes(n, interval):
    """Return n chebyshev nodes over interval [a, b]"""
    i = np.arange(n)
    x = np.cos((2*i+1)*np.pi/(2*n)) # nodes over interval [-1,1]
    a, b = interval
    return 0.5*(b-a)*x+0.5*(b+a) # nodes over interval [a, b]

def main():
    x_dots = chebyshev_nodes(num_nodes, interval)
    y_dots = f(x_dots)
    degrees = np.arange(1, num_nodes, 2) # Only odd coefficients
    poly_coefs = np.polynomial.polynomial.polyfit(x_dots, y_dots, degrees)
    print(poly_coefs)

main()

Result:
Code: [Select]
[ 0.          0.99991153  0.         -0.16602     0.          0.00762666]
Edit: I have modified the program so that it only calculates the odd coefficients of the polynomial.
« Last Edit: December 22, 2021, 05:19:30 pm by Picuino »
 

Offline Picuino

  • Frequent Contributor
  • **
  • Posts: 730
  • Country: 00
    • Picuino web
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #52 on: December 22, 2021, 05:08:53 pm »
The advantage of my method is that you can get coefficients for any function, with the precision you want, over any range you want.
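As a quick illustration, the same Chebyshev-node recipe applies to other functions too; here is a sketch fitting a cubic to exp(x) over [0, 1] (the function and interval are just examples of mine, not from the thread):

```python
import numpy as np

def chebyshev_nodes(n, interval):
    """Return n Chebyshev nodes mapped onto [a, b]."""
    i = np.arange(n)
    x = np.cos((2 * i + 1) * np.pi / (2 * n))   # nodes on [-1, 1]
    a, b = interval
    return 0.5 * (b - a) * x + 0.5 * (b + a)

# Cubic fit to exp(x) through 4 Chebyshev nodes on [0, 1].
nodes = chebyshev_nodes(4, [0.0, 1.0])
coefs = np.polynomial.polynomial.polyfit(nodes, np.exp(nodes), 3)

# Worst absolute error on a dense grid.
grid = np.linspace(0.0, 1.0, 10001)
err = float(np.max(np.abs(
    np.polynomial.polynomial.polyval(grid, coefs) - np.exp(grid))))
```

With 4 nodes the fit interpolates exactly at the nodes, and the worst error stays near the milli-level, as the Chebyshev interpolation bound predicts for exp on this interval.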
 

Online Mechatrommer

  • Super Contributor
  • ***
  • Posts: 11653
  • Country: my
  • reassessing directives...
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #53 on: December 22, 2021, 05:23:45 pm »
I think there will be an unbreakable limit when using 5th-degree polynomials. Can you fit better than 0.1% over a larger range of angles with a 5th-degree (x^5) polynomial? I guess the solution is to increase the degree to, say, x^7 or x^9.
Nature: Evolution and the Illusion of Randomness (Stephen L. Talbott): It's now indisputable that... the organism's "expertise" contextualizes its genome, and it's nonsense to say that these powers are under the control of the genome being contextualized - Barbara McClintock
 

Offline Picuino

  • Frequent Contributor
  • **
  • Posts: 730
  • Country: 00
    • Picuino web
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #54 on: December 22, 2021, 05:28:26 pm »
Yes. That is what I meant.

Code: (python) [Select]
import numpy as np

num_nodes = 8
interval = [-np.pi/2, np.pi/2]

def f(x):
    return np.sin(x)

def chebyshev_nodes(n, interval):
    """Return n chebyshev nodes over interval [a, b]"""
    i = np.arange(n)
    x = np.cos((2*i+1)*np.pi/(2*n)) # nodes over interval [-1,1]
    a, b = interval
    return 0.5*(b-a)*x+0.5*(b+a) # nodes over interval [a, b]

def main():
    x_dots = chebyshev_nodes(num_nodes, interval)
    y_dots = f(x_dots)
    degrees = np.arange(1, num_nodes, 2) # Only odd coefficients
    poly_coefs = np.polynomial.polynomial.polyfit(x_dots, y_dots, degrees)
    for i in range(len(poly_coefs)):
        coef = poly_coefs[i]
        print(f"a_{i} = {coef:.14g}")

main()

Result:
Code: [Select]
a_0 = 0
a_1 = 0.99999923706153
a_2 = 0
a_3 = -0.16665676500414
a_4 = 0
a_5 = 0.0083131914143761
a_6 = 0
a_7 = -0.000185225393246


Code: (c) [Select]
float sinf_first_quadrant(float x)
{
    const float  xx = x * x;
    float  result = xx * -0.000185225393246f;
    result += 0.0083131914143761f;
    result *= xx;
    result += -0.16665676500414f;
    result *= xx;
    result += 0.99999923706153f;
    return result * x;
}
« Last Edit: December 22, 2021, 07:31:36 pm by Picuino »
 
The following users thanked this post: Mechatrommer

Offline emece67

  • Frequent Contributor
  • **
  • Posts: 614
  • Country: 00
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #55 on: December 22, 2021, 11:25:55 pm »
.
« Last Edit: August 19, 2022, 05:57:42 pm by emece67 »
 
The following users thanked this post: Mechatrommer, KE5FX, Picuino

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6265
  • Country: fi
    • My home page and email address
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #56 on: December 23, 2021, 12:38:33 am »
You can use the following C (C99 or later) program to find the maximum absolute error of a single-precision odd power series expansion of the sine function in the first quadrant, with at least three coefficients (fifth degree polynomial).

Because this considers the absolute error and not the relative error as is usual, this is best suited for comparing approximations that calculate integer-valued samples of a sine wave.  The reported amplitude limit is approximately the maximum integer amplitude where the sample error does not exceed ±1.  I would not use this to compare sine approximations for general use.
Code: [Select]
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <math.h>

#ifndef  MAX_COEFFS
#define  MAX_COEFFS  8
#endif

static inline float  approx(float  x, const float coeff[], int n)
{
    const float  xx = x * x;
    float    result = coeff[--n] * xx;
    while (n-->1) {
        result += coeff[n];
        result *= xx;
    }
    result += coeff[0];
    return result * x;
}

static const char *floatstring(float value)
{
    static char  buffer[1024];
    int          decimals = 1;
    float        parsed;

    while (1) {
        int  len = snprintf(buffer, sizeof buffer, "%.*f", decimals, value);
        if (len < 1 || len >= (int)sizeof buffer)
            break;

        errno = 0;
        parsed = strtof(buffer, NULL);
        if (errno)
            break;

        if (value == parsed)
            return (const char *)buffer;

        decimals++;
    }

    return "(error)";
}

static inline const char *skipws(const char *src)
{
    if (src)
        while (*src == '\t' || *src == '\n' || *src == '\v' ||
               *src == '\f' || *src == '\r' || *src == ' ')
            src++;
    return src;
}

static int  parse_float(const char *src, float *to)
{
    const char *end;
    float       val;

    src = skipws(src);
    if (!src || !*src)
        return -1;

    end = src;
    errno = 0;
    val = strtof(src, (char **)&end);
    if (errno)
        return -1;
    if (!end || end == src)
        return -1;

    end = skipws(end);
    if (*end == 'f')
        end = skipws(end + 1);
    if (*end)
        return -1;

    if (!isfinite(val))
        return -1;

    if (to)
        *to = val;
    return 0;
}

static inline float fabsmin2(const float a, const float b)
{
    const float  f1 = fabsf(a), f2 = fabsf(b);
    return (f1 < f2) ? f1 : f2;
}

int main(int argc, char *argv[])
{
    int    coeffs;
    float  coeff[MAX_COEFFS];

    if (argc < 4 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
        const char *arg0 = (argc > 0 && argv && argv[0] && argv[0][0]) ? argv[0] : "(this)";
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s [ -h | --help ]\n", arg0);
        fprintf(stderr, "       %s C1 C3 C5 [...]\n", arg0);
        fprintf(stderr, "\n");
        fprintf(stderr, "This program compares the sin approximation\n");
        fprintf(stderr, "       C1*x + C3*x*x*x + C5*x*x*x*x*x [ + ... ]\n");
        fprintf(stderr, "to the correct value of sin, at single precision (float).\n");
        fprintf(stderr, "for each possible x between 0 and PI/2, and reports\n");
        fprintf(stderr, "the maximum absolute deviation.\n");
        fprintf(stderr, "\n");
        fprintf(stderr, "Example:\n");
        fprintf(stderr, "       %s 0.99968741 -0.16566247 0.0075119016\n", arg0);
        fprintf(stderr, "\n");
        return EXIT_SUCCESS;
    }

    coeffs = argc - 1;
    if (coeffs > MAX_COEFFS) {
        fprintf(stderr, "Too many coefficients.  Maximum limit is %d.\n", MAX_COEFFS);
        return EXIT_FAILURE;
    }

    for (int i = 0; i < coeffs; i++) {
        if (parse_float(argv[i+1], coeff + i)) {
            fprintf(stderr, "%s: Invalid coefficient C%d.\n", argv[i+1], 2*i+1);
            return EXIT_FAILURE;
        }
    }

    int     n = 0;
    float   x = (float)(0.5 * 3.14159265358979323846);
    float   smin = 1.0f, s;
    double  err, err_min = -0.0, err_max = +0.0;
    do {
        s = approx(x, coeff, coeffs);
        err = (double)s - sin((double)x);
        if (err_min > err) { err_min = err; smin = 0.25f * fabsmin2(err_min, err_max); }
        if (err_max < err) { err_max = err; smin = 0.25f * fabsmin2(err_min, err_max); }
        ++n;
        if (!(n & 16777215)) {
            fprintf(stderr, "\r%+.9f .. %+.9f  (%.9f to PI/2) ", err_min, err_max, x);
            fflush(stderr);
        }
        x = nextafterf(x, -1.0f);
    } while (x > 0.0f && s > smin);

    fprintf(stderr, "\n");
    fflush(stderr);

    printf("Coefficients:");
    for (int i = 0; i < coeffs; i++)
        printf(" %s", floatstring(coeff[i]));
    printf("\nMaximum absolute error: %s", floatstring((float)err_min));
    printf(" .. %s\n", floatstring((float)err_max));
    if (err_max < -err_min) err_max = -err_min;
    printf("Approximate amplitude limit: %.0f\n", 1.0 / err_max - 0.5);

    return EXIT_SUCCESS;
}

It evaluates the polynomial at single precision, and compares to the standard C library sin() function evaluated at double precision.

This assumes that the standard C library sin() function uses correct rounding per IEEE 754, so its error in the first quadrant is within one unit in the least significant place.  Typically, standard C library trigonometric functions are accurate, but not the fastest possible; if you are using a commonly used one, you should be safe assuming its implementation is accurate.  I am using the GNU C library, which provides this (unless you tell the compiler to use unsafe math optimizations).  I verified this with a brute-force __float128 (quadmath) sine power series approximation, too.

The iteration starts at PI/2 and checks every unique float value in decreasing order, until the series yields a value less than a quarter of the smaller error bound.  At that point, for any sane approximation with first coefficient around one (say, 0.75 to 1.5), the argument is so small that a new maximum absolute error can no longer be reached, even for the smaller error bound.  That is, this stopping limit is based on mathematical analysis of sine approximations, and not an "I think this suffices" guess.

On my machine, this can evaluate the error bounds for three coefficients in about two and a half seconds; four coefficients in about three seconds.

To compile it using GCC, I use
    gcc -Wall -O2 approx.c -lm -o approx
and using clang,
    clang -Wall -O2 approx.c -lm -o approx
They produce identical results for the few cases I tried.

To produce better approximations, I use a polynomial fit to x = 0.5*PI*(1 - (i/N)**K), y = sin(x) for N=100000, i=0..N, varying K around 1.8 or so.  This way the fit emphasizes larger values of x, with K acting as the weight.
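That weighting trick looks like this in NumPy (a sketch; N and K are as described above, the helper name is mine):

```python
import numpy as np

def weighted_odd_fit(K, N=100000):
    """Least-squares fit of c1*x + c3*x^3 + c5*x^5 to sin(x) over [0, pi/2],
    with sample points clustered toward pi/2 as K grows above 1."""
    i = np.arange(N + 1) / N
    x = 0.5 * np.pi * (1.0 - i ** K)     # denser near pi/2 for K > 1
    return np.polynomial.polynomial.polyfit(x, np.sin(x), [1, 3, 5])

coefs = weighted_odd_fit(1.84697)        # the K reported as best below

# Worst absolute error of the fitted polynomial over the first quadrant.
grid = np.linspace(0.0, np.pi / 2, 20001)
max_abs_err = float(np.max(np.abs(
    np.polynomial.polynomial.polyval(grid, coefs) - np.sin(grid))))
```

The resulting coefficients land very close to the three-term values listed below; the exact digits depend on N and K.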

Results from my testing runs with three coefficients:
  • Edited: Best coefficients I've found (K=1.84697)
    Coefficients: 0.99968743 -0.16566254 0.0075119236
    Maximum absolute error: -0.000070561255 .. 0.000070453505
    Approximate amplitude limit: 14172
     
  • My suggested coefficients
    Coefficients: 0.99968743 -0.16566247 0.0075119017
    Maximum absolute error: -0.00007055788 .. 0.000070572176
    Approximate amplitude limit: 14169
     
  • Picuino
    Coefficients: 0.99991155 -0.16602 0.00762666
    Maximum absolute error: -0.00011782573 .. 0.0001343489
    Approximate amplitude limit: 7443
     
  • emece67
    Coefficients: 0.99991304 -0.1660249 0.007628645
    Maximum absolute error: -0.00011876599 .. 0.00013673306
    Approximate amplitude limit: 7313
     
  • Abramowitz & Stegun 4.3.96
    Coefficients: 1.0 -0.16605 0.00761
    Maximum absolute error: -0.00016428306 .. 0.00012107952
    Approximate amplitude limit: 6087
     
  • Truncated power series
    Coefficients: 1.0 -0.16666667 0.008333334
    Maximum absolute error: -0.000000011740082 .. 0.004524946
    Approximate amplitude limit: 220
Results from my testing with four coefficients:
  • Edited: New best coefficients I've found (K=2.109)
    Coefficients: 0.9999963 -0.16664736 0.008305594 -0.00018346867
    Maximum absolute error: -0.00000068000065 .. 0.00000068940767
    Approximate amplitude limit: 1450520
     
  • (K=1.8390)
    Coefficients: 0.9999965 -0.166648 0.008306158 -0.00018360758
    Maximum absolute error: -0.00000074344683 .. 0.0000007067032
    Approximate amplitude limit: 1345086
     
  • Picuino
    Coefficients: 0.9999992 -0.16665676 0.008313191 -0.00018522539
    Maximum absolute error: -0.0000013112358 .. 0.0000012125032
    Approximate amplitude limit: 762639
     
  • Truncated power series
    Coefficients: 1.0 -0.16666667 0.008333334 -0.0001984127
    Maximum absolute error: -0.00015699863 .. 0.00000003811595
    Approximate amplitude limit: 6369

Note that the three-term polynomial takes four multiplications and two additions, and the four-term polynomial five multiplications and three additions, using single-precision floating-point; with single-precision hardware multiplication and addition, these are surprisingly fast.
« Last Edit: December 23, 2021, 01:47:02 am by Nominal Animal »
 
The following users thanked this post: emece67, Picuino

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6265
  • Country: fi
    • My home page and email address
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #57 on: December 23, 2021, 07:37:14 am »
I use brucehoult's true Taylor series loop as reference (considered as true sin).
You need higher precision to evaluate the power series accurately.  Each term you calculate is subject to rounding and cancellation error due to limited precision, and the only way around that is more precision.

That is, the series expansion does NOT give a correct approximation up to the precision available in the floating-point type, unlike library sin() and cos() functions are supposed to.

I use GCC 7.5.0 and its quad-precision math library with
Code: [Select]
static __float128  sin_series(__float128 x)
{
    __float128  xx = -(x*x);
    __float128  result = x, old;
    int         terms = 0;

    do {
        terms++;
        old = result;
        x /= (__float128)(2*terms*(2*terms + 1));
        x *= xx;
        result += x;
    } while (result != old);

    /* Omitted: Update min and max 'terms' */

    return result;
}
with
    sin power series expansion using __float128 terms that affect the result:      x / (2*PI)
        37 terms (x - x^3/3! + x^5/5! ... - x^71/71! + x^73/73!) up to 2*PI     (0.500 .. 1.000)
        29 terms (x - x^3/3! + x^5/5! ... - x^55/55! + x^57/57!) up to PI       (0.250 .. 0.500)
        18 terms (x - x^3/3! + x^5/5! ... + x^33/33! - x^35/35!) up to PI/2     (0.125 .. 0.250)
      [ 15 terms (x - x^3/3! + x^5/5! ... - x^27/27! + x^29/29!) up to PI/4     (0.000 .. 0.125) ]
   
    cos power series expansion using __float128 terms that affect the result:
        31 terms (1 - x^2/2! + x^4/4! ... - x^58/58! + x^60/60!) up to 2*PI     (0.500 .. 1.000)
        24 terms (1 - x^2/2! + x^4/4! ... + x^44/44! - x^46/46!) up to PI       (0.250 .. 0.500)
        24 terms (1 - x^2/2! + x^4/4! ... + x^44/44! - x^46/46!) up to PI/2     (0.125 .. 0.250)
      [ 15 terms (1 - x^2/2! + x^4/4! ... - x^26/26! + x^28/28!) up to PI/4     (0.000 .. 0.125) ]
   
but obviously only verified around the most problematic key values of x, and only uniform-randomly checked elsewhere (a few hundred million points per number of terms, in the given ranges of x).

Fewer terms are needed to match correctly-rounded IEEE-754 double-precision sin() and cos(), though.  (For the first quadrant of the sine approximation, i.e. x in 0 to PI/2, or -PI/2 to PI/2, 11 terms seems to be enough.  Cosine needs 17 near ±PI/2, something like 14 around 1.  The degree, obviously, is 2*terms-1 for sine, and 2*terms-2 for cosine.)

I could compare the series obtained this way to libquadmath sinq() and cosq(), as they are supposed to produce results with at most 1 ULP of error, but only by sampling; the parameter space is simply too large, by many orders of magnitude, to check thoroughly.
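The convergence behaviour is easy to reproduce at plain double precision (a Python sketch mirroring the __float128 loop above; term counts at double precision are naturally smaller than the quad-precision ones listed):

```python
import math

def sin_series(x):
    """Taylor series for sin(x) at double precision, summed until adding
    another term no longer changes the result.  Returns (value, terms),
    where terms counts the terms after the leading x."""
    xx = -(x * x)
    term = x
    result = x
    terms = 0
    while True:
        terms += 1
        old = result
        term *= xx / (2 * terms * (2 * terms + 1))
        result += term
        if result == old:
            break
    return result, terms

val_half_pi, n_half_pi = sin_series(math.pi / 2)
# At double precision roughly eleven terms suffice up to pi/2, consistent
# with the observation above about matching double-precision sin().
```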
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4040
  • Country: nz
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #58 on: December 23, 2021, 08:13:31 am »
I use brucehoult's true Taylor series loop as reference (considered as true sin).
You need higher precision to evaluate the power series accurately.  Each term you calculate is subject to rounding and cancellation error due to limited precision, and the only way around that is more precision.

That is, the series expansion does NOT give a correct approximation up to the precision available in the floating-point type, unlike library sin() and cos() functions are supposed to.

Sure it's not as exact as library functions are supposed to be, but the OP didn't need that. He only needed a 16 bit result (15 plus sign, effectively) and wasn't even asking for that to be fully accurate.

My code in single precision never needs more than 6 terms (in addition to the angle itself) to converge in the first quadrant. Each term only does one multiply, one divide, one add, so shouldn't be adding more than 1.5 ULP error per term on average, and probably less than 1.0. So that's around maybe 15 ULP (i.e. 4 bits) worst case, on a 23 bit result when only 15 are needed.

As we have already seen, this results in rounding the 15 bit result the wrong way for only about 50 out of 65536 values.

If you've got an FPU with only single precision hardware support as the OP's STM has then it would be much faster to keep a table of the 50 values that are off by 1 bit and do a binary search on it than it would be to use a higher precision calculation.

Note that using a Chebyshev polynomial is still going to give you the same 1.5 ULP error per term: one in the constant, one from the multiply, one from the add. It just might be slightly fewer terms.
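That correction-table idea can be sketched like this (the indices and offsets below are made up for illustration; the real ~50 entries would come from the brute-force comparison):

```python
from bisect import bisect_left

# Hypothetical correction table: sorted indices whose rounded 15-bit result
# is known (from an offline brute-force comparison) to be off by one, plus
# the +-1 fix-up for each.  These values are invented for illustration.
BAD_INDEX = [1234, 5678, 23456]
FIX_UP = [1, -1, 1]

def corrected(i, raw_value):
    """Return raw_value, adjusted if index i appears in the table."""
    pos = bisect_left(BAD_INDEX, i)      # binary search in the sorted table
    if pos < len(BAD_INDEX) and BAD_INDEX[pos] == i:
        return raw_value + FIX_UP[pos]
    return raw_value
```

With about 50 entries the binary search costs at most six comparisons, which is plausibly cheaper than redoing the calculation at higher precision on single-precision-only hardware.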
 
The following users thanked this post: Mechatrommer

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6265
  • Country: fi
    • My home page and email address
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #59 on: December 23, 2021, 09:47:28 am »
I use brucehoult's true Taylor series loop as reference (considered as true sin).
You need higher precision to evaluate the power series accurately.  Each term you calculate is subject to rounding and cancellation error due to limited precision, and the only way around that is more precision.

That is, the series expansion does NOT give a correct approximation up to the precision available in the floating-point type, unlike library sin() and cos() functions are supposed to.

Sure it's not as exact as library functions are supposed to be, but the OP didn't need that. He only needed a 16 bit result (15 plus sign, effectively) and wasn't even asking for that to be fully accurate.
Yup.  I only objected to using it as the reference, and showed what I myself used for a reference.

I basically just wanted to show how I verified the discoveries that go beyond the OP's needs:
  • With four single-precision floating-point multiplications and two additions, we have an approximation of sin(x) for x = -PI/2..+PI/2 with less than ±0.0000706 absolute error, or about 14.79 bits of precision (13.79 plus sign) in fixed-point format
  • Adding one more multiplication and addition drops the absolute error to ±0.00000069, giving 21.47 bits of precision (20.47 plus sign) in fixed point format.
If we want to get more accurate than that, we need enough extra bits of precision to avoid the rounding error affecting the result.  Comparing approximations to approximations (and not to a known accurate function) is, well, silly.

I consider the OP's original question well answered already; this is just extending the subject past the stated, arbitrary limits.
As usual, I'm trying to add "useful" information in case someone else has a similar approximation need, but with say different error bounds.

I have not experimented with, say, Q.29 (two's complement fixed-point integers with 29 fractional bits), which might be interesting on integer-only hardware for sine approximation; or, say, Q.60 for accurate sine/cosine approximation.  But, if I were to, the reference machinery I showed can easily be adapted to check those against standard library sin()/cos(), too.
« Last Edit: December 23, 2021, 09:50:55 am by Nominal Animal »
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3701
  • Country: gb
  • Doing electronics since the 1960s...
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #60 on: December 23, 2021, 09:50:26 am »
There are so many incredibly clever people here!

I read some clever stuff like this in a book about the Apollo guidance system, which was done in fixed-point integers. They did add floating point later, implemented with some kind of interpreted language, but since that CPU ran at something like 2 MHz, it would have been slow, on the order of 100 ms.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline Picuino

  • Frequent Contributor
  • **
  • Posts: 730
  • Country: 00
    • Picuino web
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #61 on: December 23, 2021, 10:10:59 am »
In the case of calculators and computers, it is preferable to use the CORDIC code that was mentioned in an earlier post in the thread.

Code: (c) [Select]
// Cordic in 16 bit signed fixed point math
// Function is valid for arguments in range -pi/2 -- pi/2
// for values pi/2--pi: value = half_pi-(theta-half_pi) and similarly for values -pi---pi/2
//
// Constants

#define cordic_1K16 0x26DD
#define half_pi16 0x6487
#define MUL16 16384.0


int cordic_ctab16 [] = {
   0x3243, 0x1DAC, 0x0FAD, 0x07F5,
   0x03FE, 0x01FF, 0x00FF, 0x007F,
   0x003F, 0x001F, 0x000F, 0x0007,
   0x0003, 0x0001, 0x0000, 0x0000,
};

void cordic16(int theta, int *s, int *c) {
   int tx, ty;
   int x=cordic_1K16;
   int y=0;
   int z=theta;
   for (int k=0; k<16; ++k) {
      tx = (x>>k);
      ty = (y>>k);
      if (z>0) {
         x -= ty;
         y += tx;
         z -= cordic_ctab16[k];
      }
      else {
         x += ty;
         y -= tx;
         z += cordic_ctab16[k];
      }
   }
   *c = x;
   *s = y;
}


// Cordic in 24 bit signed fixed point math
// Function is valid for arguments in range -pi/2 -- pi/2
// for values pi/2--pi: value = half_pi-(theta-half_pi) and similarly for values -pi---pi/2
//
// Constants
#define cordic_1K24 0x0026DD3B
#define half_pi24 0x006487ED
#define MUL24 4194304.00
int cordic_ctab24 [] = {
   0x003243F6, 0x001DAC67, 0x000FADBA, 0x0007F56E,
   0x0003FEAB, 0x0001FFD5, 0x0000FFFA, 0x00007FFF,
   0x00003FFF, 0x00001FFF, 0x00000FFF, 0x000007FF,
   0x000003FF, 0x000001FF, 0x000000FF, 0x0000007F,
   0x0000003F, 0x0000001F, 0x0000000F, 0x00000007,
   0x00000003, 0x00000001, 0x00000000, 0x00000000,
};

void cordic24(int theta, int *s, int *c) {
   int tx, ty;
   int x=cordic_1K24;
   int y=0;
   int z=theta;
   for (int k=0; k<24; ++k) {
      tx = (x>>k);
      ty = (y>>k);
      if (z>0) {
         x -= ty;
         y += tx;
         z -= cordic_ctab24[k];
      }
      else {
         x += ty;
         y -= tx;
         z += cordic_ctab24[k];
      }
   }
   *c = x;
   *s = y;
}



// Cordic in 32 bit signed fixed point math
// Function is valid for arguments in range -pi/2 -- pi/2
// for values pi/2--pi: value = half_pi-(theta-half_pi) and similarly for values -pi---pi/2
//
// Constants
#define cordic_1K32 0x26DD3B6A
#define half_pi32 0x6487ED51
#define MUL32 1073741824.0
int cordic_ctab32 [] = {
   0x3243F6A8, 0x1DAC6705, 0x0FADBAFC, 0x07F56EA6,
   0x03FEAB76, 0x01FFD55B, 0x00FFFAAA, 0x007FFF55,
   0x003FFFEA, 0x001FFFFD, 0x000FFFFF, 0x0007FFFF,
   0x0003FFFF, 0x0001FFFF, 0x0000FFFF, 0x00007FFF,
   0x00003FFF, 0x00001FFF, 0x00000FFF, 0x000007FF,
   0x000003FF, 0x000001FF, 0x000000FF, 0x0000007F,
   0x0000003F, 0x0000001F, 0x0000000F, 0x00000008,
   0x00000004, 0x00000002, 0x00000001, 0x00000000,
};

void cordic32(int theta, int *s, int *c) {
   int tx, ty;
   int x=cordic_1K32;
   int y=0;
   int z=theta;
   for (int k=0; k<32; ++k) {
      tx = (x>>k);
      ty = (y>>k);
      if (z>0) {
         x -= ty;
         y += tx;
         z -= cordic_ctab32[k];
      }
      else {
         x += ty;
         y -= tx;
         z += cordic_ctab32[k];
      }
   }
   *c = x;
   *s = y;
}



#include <math.h> // for testing only!
#include <stdio.h> // for testing only!

// Print out sin(x) vs fp CORDIC sin(x)
int main(int argc, char **argv) {
   double p, err, serr;
   int s,c;

   printf("\nCordic 16 bits\n");
   serr = 0;
   for(int i=0;i<=50;i++) {
      p = (i/50.0)*M_PI/2;
      cordic16((p*MUL16), &s, &c);
      err = 1000000*(s/MUL16-sin(p));
      serr += fabs(err);  /* fabs(), not abs(): err is a double */
      printf("   sin(%f)=%f  err=%5.0f ppm\n", p, s/MUL16, err);
   }
   printf("   err_medio=%f ppm\n", serr/51);

   printf("\nCordic 24 bits\n");
   serr = 0;
   for(int i=0;i<=50;i++) {
      p = (i/50.0)*M_PI/2;
      cordic24((p*MUL24), &s, &c);
      err = 1000000000*(s/MUL24-sin(p));
      serr += fabs(err);
      printf("   sin(%f)=%f  err=%5.0f ppb\n", p, s/MUL24, err);
   }
   printf("   err_medio=%f ppb\n", serr/51);

   printf("\nCordic 32 bits\n");
   serr = 0;
   for(int i=0;i<=50;i++) {
      p = (i/50.0)*M_PI/2;
      cordic32((p*MUL32), &s, &c);
      err = 1000000000*(s/MUL32-sin(p));
      serr += fabs(err);
      printf("   sin(%f)=%f  err=%5.0f ppb\n", p, s/MUL32, err);
   }
   printf("   err_medio=%f ppb\n", serr/51);
}
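A direct Python port of the 16-bit variant (my sketch, same constants and table) confirms the error level without compiling anything:

```python
import math

CORDIC_1K16 = 0x26DD          # scaling constant K = prod(cos(atan(2^-k))), times 2^14
MUL16 = 16384.0
# atan(2^-k) scaled by 2^14, same table as the C version above
CTAB16 = [0x3243, 0x1DAC, 0x0FAD, 0x07F5, 0x03FE, 0x01FF, 0x00FF, 0x007F,
          0x003F, 0x001F, 0x000F, 0x0007, 0x0003, 0x0001, 0x0000, 0x0000]

def cordic16(theta):
    """theta is an angle in radians scaled by 2^14; returns (sin, cos) scaled
    by 2^14.  Python's >> on negative ints is an arithmetic shift, matching
    the behaviour the C code assumes."""
    x, y, z = CORDIC_1K16, 0, theta
    for k in range(16):
        tx, ty = x >> k, y >> k
        if z > 0:
            x, y, z = x - ty, y + tx, z - CTAB16[k]
        else:
            x, y, z = x + ty, y - tx, z + CTAB16[k]
    return y, x               # (sin, cos)

# Worst absolute sine error over the first quadrant, 51 sample points.
worst = max(abs(cordic16(int(p * MUL16))[0] / MUL16 - math.sin(p))
            for p in (i * math.pi / 100 for i in range(51)))
```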
 

Offline nfmax

  • Super Contributor
  • ***
  • Posts: 1562
  • Country: gb
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #62 on: December 23, 2021, 10:47:34 am »
Another wrinkle: not all errors are equally bad.

For instance, it would be a bad thing for an approximation over the interval [0..pi/2] to give a non-zero value for an argument of zero, because that gives rise to a discontinuity when the approximation is extended to negative values by applying the symmetries of the sin() function. Similarly (depending on the application) it could be really bad news if the approximation ever returned a value > 1.0 for any argument.

Chebyshev approximation, which tends to attain its maximum absolute error at the bounds of the interval, may be liable to such problems. As noted by @emece67, the original authors of the A&S approximation used a technique which forced the error at the bounds to be zero, presumably in an attempt to avoid them.
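One way to force zero endpoint error, sketched in NumPy (my construction, not necessarily the A&S authors' method): fit only the x^3 and x^5 coefficients by least squares and solve the x coefficient from the constraint p(pi/2) = 1; the odd basis already forces p(0) = 0.

```python
import numpy as np

h = np.pi / 2
x = np.linspace(0.0, h, 2001)

# Substitute the constraint p(h) = 1 into p(x) = c1*x + c3*x^3 + c5*x^5:
#   c1 = (1 - c3*h^3 - c5*h^5) / h
# which leaves a least-squares problem in c3 and c5 only.
A = np.column_stack([x**3 - x * h**2, x**5 - x * h**4])
b = np.sin(x) - x / h
(c3, c5), *_ = np.linalg.lstsq(A, b, rcond=None)
c1 = (1.0 - c3 * h**3 - c5 * h**5) / h

def p(t):
    return c1 * t + c3 * t**3 + c5 * t**5

max_err = float(np.max(np.abs(p(x) - np.sin(x))))
```

The fit then satisfies p(0) = 0 exactly and p(pi/2) = 1 to rounding, at the cost of a somewhat larger maximum error than the unconstrained minimax fit.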

Numerical analysis is fun!
 

Offline Picuino

  • Frequent Contributor
  • **
  • Posts: 730
  • Country: 00
    • Picuino web
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #63 on: December 23, 2021, 11:21:19 am »
Updated program with error measurement.

Code: (Python) [Select]
import numpy as np

num_nodes = 6
interval = [-np.pi/2, np.pi/2]

def f(x):
    return np.sin(x)

def chebyshev_nodes(n, interval):
    """Return n chebyshev nodes over interval [a, b]"""
    i = np.arange(n)
    x = np.cos((2*i+1)*np.pi/(2*n)) # nodes over interval [-1, 1]
    a, b = interval
    return 0.5*(b - a)*x + 0.5*(b + a) # nodes over interval [a, b]

def print_coefs(coefs):
    for i in range(len(coefs)):
        print(f"a_{i} = {coefs[i]:.14g}")
   
def test_errors(num_dots, interval, poly_coefs):
    i = np.arange(num_dots+1)/num_dots
    a, b = interval
    x_dots = a + i*(b-a)
    y_dots = f(x_dots)
    y_poly_dots = np.polyval(poly_coefs, x_dots)
    max_err_abs = 0
    max_err_rel = 0
    for i in range(len(x_dots)):
        err_abs = np.abs(y_dots[i] - y_poly_dots[i])
        if err_abs > max_err_abs:
            max_err_abs = err_abs
        if y_dots[i] != 0:
            err_rel = err_abs / y_dots[i]
            if err_rel > max_err_rel:
                max_err_rel = err_rel
        elif y_poly_dots[i] != 0:
            max_err_rel = np.inf

    print(f"\nMax absolute error = {max_err_abs:.14g}")
    print(f"Max relative error = {max_err_rel:.14g}")
    print(f"Max polynomial value = {max(y_poly_dots):.14g}")
    print(f"Min polynomial value = {min(y_poly_dots):.14g}")
   
def main():
    x_dots = chebyshev_nodes(num_nodes, interval)
    y_dots = f(x_dots)
    degrees = np.arange(1, num_nodes, 2) # Only odd coefficients
    poly_coefs = np.polynomial.polynomial.polyfit(x_dots, y_dots, degrees)
    print_coefs(poly_coefs)
    test_errors(10000, [0, interval[1]], np.flip(poly_coefs))

main()


Result:
Code: [Select]
a_0 = 0
a_1 = 0.9999115283796
a_2 = 0
a_3 = -0.16602000425895
a_4 = 0
a_5 = 0.0076266621511779

Max absolute error = 0.0001342309422232
Max relative error = 0.0001342309422232
Max polynomial value = 1.0001342309422
Min polynomial value = 0
« Last Edit: December 23, 2021, 11:29:12 am by Picuino »
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6265
  • Country: fi
    • My home page and email address
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #64 on: December 23, 2021, 12:54:05 pm »
Similarly (depending on the application) it could be really bad news if the approximation ever returned a value > 1.0 for any argument.
Good point.

This is also related to absolute error as a measurement.  For basic sine wave synthesis, it is the absolute error that matters most, and exceeding 1.0 is not a problem; even the rounding differences just add a tiny bit of noise to the output.
For most other use cases, relative error is what actually matters.

As an example, when calculating versors (unit quaternions) or rotation matrices from axis and angle, sine approximation exceeding 1.0 would definitely be a problem.

The odd power series expansion, \$f(x) = \sum_{k=1}^{N} C_{2 k - 1} x^{2 k - 1}\$, has the nice feature that it is guaranteed to be odd (and thus be zero at zero), given nonzero coefficients \$C\$.  You can define the highest-degree coefficient explicitly as $$C_{2 N - 1} = \frac{2^{2 N - 1}}{\pi^{2 N - 1}} - \sum_{k=1}^{N-1} \frac{2^{2 N - 2 k} }{\pi^{2 N - 2 k}} C_{2 k - 1}$$when defining the polynomial to fit to, to ensure the series yields 1 at \$\pm\pi/2\$.  (I prefer notation where the subscript matches the power of x.)
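In Python, that constraint can be sketched as follows (my own illustration, not code from the thread; `top_coefficient` is a hypothetical helper name). Given the lower odd coefficients, it solves for the highest one so the series is exactly 1 at x = pi/2:

```python
import numpy as np

# Sketch of the constraint above: given the lower odd coefficients
# C_1, C_3, ..., C_{2N-3}, choose C_{2N-1} so that the odd series
# f(x) = sum C_{2k-1} x**(2k-1) evaluates to exactly 1 at x = pi/2.
def top_coefficient(lower):
    """lower = [C_1, C_3, ..., C_{2N-3}]; returns C_{2N-1}."""
    N = len(lower) + 1
    h = np.pi / 2
    partial = sum(c * h**(2*k - 1) for k, c in enumerate(lower, start=1))
    return (1.0 - partial) / h**(2*N - 1)

# Example with the two lower coefficients from reply #65 (whose fit
# placed a node at pi/2, so the series already passes through 1 there):
lower = [0.99982457406782, -0.16573991207075]
C5 = top_coefficient(lower)   # recovers a_5 of that fit
```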
« Last Edit: December 23, 2021, 12:58:33 pm by Nominal Animal »
 

Offline Picuino

  • Frequent Contributor
  • **
  • Posts: 730
  • Country: 00
    • Picuino web
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #65 on: December 23, 2021, 01:41:22 pm »
By putting interpolation nodes at the beginning and end of the interval, the polynomial has a little more relative error, but it is guaranteed not to exceed 1.

Code: (Python) [Select]
import numpy as np

num_nodes = 6
interval = [-np.pi/2, np.pi/2]

def f(x):
    return np.sin(x)

def chebyshev_nodes(n, interval):
    """Return n chebyshev nodes over interval [a, b]"""
    i = np.arange(n)
    x = np.cos((2*i)*np.pi/(2*(n-1))) # nodes over interval [-1, 1]
    a, b = interval
    return 0.5*(b - a)*x + 0.5*(b + a) # nodes over interval [a, b]

def print_coefs(coefs):
    for i in range(len(coefs)):
        print(f"a_{i} = {coefs[i]:.14g}")
   
def test_errors(num_dots, interval, poly_coefs):
    i = np.arange(num_dots+1)/num_dots
    a, b = interval
    x_dots = a + i*(b-a)
    y_dots = f(x_dots)
    y_poly_dots = np.polyval(poly_coefs, x_dots)
    max_err_abs = 0
    max_err_rel = 0
    for i in range(len(x_dots)):
        err_abs = np.abs(y_dots[i] - y_poly_dots[i])
        if err_abs > max_err_abs:
            max_err_abs = err_abs
        if y_dots[i] != 0:
            err_rel = err_abs / y_dots[i]
            if err_rel > max_err_rel:
                max_err_rel = err_rel
        elif y_poly_dots[i] != 0:
            max_err_rel = np.inf
           
    print(f"\nMax absolute error = {max_err_abs:.14g}")
    print(f"Max relative error = {max_err_rel:.14g}")
    print(f"Max polynomial value = {max(y_poly_dots):.14g}")
    print(f"Min polynomial value = {min(y_poly_dots):.14g}")

def main():
    x_dots = chebyshev_nodes(num_nodes, interval)
    y_dots = f(x_dots)
    degrees = np.arange(1, num_nodes, 2) # Only odd coefficients
    poly_coefs = np.polynomial.polynomial.polyfit(x_dots, y_dots, degrees)
    print_coefs(poly_coefs)
    test_errors(10000, [0, interval[1]], np.flip(poly_coefs))

main()


Output:
Code: [Select]
a_0 = 0
a_1 = 0.99982457406782
a_2 = 0
a_3 = -0.16573991207075
a_4 = 0
a_5 = 0.0075133914860062

Max absolute error = 0.00013030270177095
Max relative error = 0.00017542591003003
Max polynomial value = 1
Min polynomial value = 0
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4040
  • Country: nz
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #66 on: December 23, 2021, 03:05:27 pm »
Similarly (depending on the application) it could be really bad news if the approximation ever returned a value > 1.0 for any argument.
Good point.

This is also related to absolute error as a measurement.  For basic sine wave synthesis, it is the absolute error that matters most, and exceeding 1.0 is not a problem; even the rounding differences just add a tiny bit of noise to the output.
For most other use cases, relative error is what actually matters.

For sin() you can easily control the relative error in the series expansion by instead calculating a series for sin(x)/x and multiplying by x at the end.

When calculating using Horner's method, this is as simple as setting the first term to 1.0 instead of x, which has the effect of dividing every term by x.

Then multiply the final sum by x.
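The shape brucehoult describes can be sketched like this (my own illustration, using the plain Taylor coefficients of sin(x)/x rather than any of the fitted coefficients from this thread):

```python
# Horner evaluation of a series for sin(x)/x: the constant term is 1.0
# instead of x, which divides every term by x; x then multiplies the sum
# once at the end, so the relative error of the series is controlled.
def sin_taylor(x):
    xx = x * x
    s = 1.0 / 362880.0          # x**8 term of sin(x)/x, i.e. 1/9!
    s = s * xx - 1.0 / 5040.0   # 1/7!
    s = s * xx + 1.0 / 120.0    # 1/5!
    s = s * xx - 1.0 / 6.0      # 1/3!
    s = s * xx + 1.0            # constant term of sin(x)/x
    return s * x                # multiply by x at the end
```

Note that sin_taylor(0.0) is exactly 0, with no special-casing.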
 

Offline Picuino

  • Frequent Contributor
  • **
  • Posts: 730
  • Country: 00
    • Picuino web
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #67 on: December 24, 2021, 01:48:09 pm »
Updated program with graphical representation of errors (absolute and relative):

Code: (Python) [Select]
import matplotlib.pyplot as plt
import numpy as np

num_nodes = 6
interval = [-np.pi/2, np.pi/2]

def f(x):
    return np.sin(x)

def chebyshev_nodes(n, interval, closed_interval=False):
    """Return n chebyshev nodes over interval [a, b]"""
    i = np.arange(n)
    if closed_interval:
       x = np.cos((2*i)*np.pi/(2*(n-1))) # nodes over closed interval [-1, 1]
    else:
       x = np.cos((2*i+1)*np.pi/(2*n)) # nodes over open interval (-1, 1)
    a, b = interval
    return 0.5*(b - a)*x + 0.5*(b + a) # nodes over interval [a, b]

def print_coefs(coefs):
    print("\nCoefficients:")
    for i in range(len(coefs)):
        print(f"    a_{i} = {coefs[i]:.14g}")

def tics(interval, numtics):
    a, b = interval
    return np.linspace(a, b, numtics)

def plot_func(x, y, err_abs, err_rel):
    fig, ax = plt.subplots(3)
    fig.subplots_adjust(left=0.1, bottom=0.05, right=0.95, top=0.95, wspace=None, hspace=0.35)

    ax[0].plot(x, y, linewidth=1)
    ax[0].set_title('Function')
    ax[0].spines['left'].set_position('zero')
    ax[0].spines['right'].set_color('none')
    ax[0].spines['bottom'].set_position('zero')
    ax[0].spines['top'].set_color('none') 

    ax[1].plot(x, err_abs, linewidth=1)
    ax[1].set_title('Polynomial absolute error')
    ax[1].spines['left'].set_position('zero')
    ax[1].spines['right'].set_color('none')
    ax[1].spines['bottom'].set_position('zero')
    ax[1].spines['top'].set_color('none')   

    ax[2].plot(x, err_rel, linewidth=1)
    ax[2].set_title('Polynomial relative error')
    ax[2].spines['left'].set_position('zero')
    ax[2].spines['right'].set_color('none')
    ax[2].spines['bottom'].set_position('zero')
    ax[2].spines['top'].set_color('none')

    plt.show()

def test_errors(interval, poly_coefs, num_dots=1000):
    x_dots = np.linspace(interval[0], interval[1], num_dots)
    y_dots = f(x_dots)
    y_poly_dots = np.polyval(poly_coefs, x_dots)

    # Compute errors
    err_abs = y_dots - y_poly_dots
    err_abs_max = max(np.abs(err_abs))
    err_rel = np.empty(num_dots)  # filled element-by-element below
    for i in range(len(x_dots)):
        if y_dots[i] == 0:
            if y_poly_dots[i] == 0:
                err_rel[i] = 0.0
            else:
                err_rel[i] = np.inf
        else:
            err_rel[i] = err_abs[i] / y_dots[i]
    err_rel_max = max(np.abs(err_rel))

    # Show errors
    print(f"\nMax absolute error = {err_abs_max:.14g}")
    print(f"Max relative error = {err_rel_max:.14g}")
    print(f"Max polynomial value = {max(y_poly_dots):.14g}")
    print(f"Min polynomial value = {min(y_poly_dots):.14g}")
    plot_func(x_dots, y_dots, err_abs, err_rel)

def main():
    x_dots = chebyshev_nodes(num_nodes, interval, closed_interval=True)
    print(f"x nodes = {x_dots}")
    y_dots = f(x_dots)
    degrees = np.arange(1, num_nodes, 2) # step of 2: fit only odd coefficients
    poly_coefs = np.polynomial.polynomial.polyfit(x_dots, y_dots, degrees)
    print_coefs(poly_coefs)
    test_errors([0, interval[1]], np.flip(poly_coefs))

main()

Output:
Code: [Select]
x nodes = [ 1.51727274  1.11072073  0.40655201 -0.40655201 -1.11072073 -1.51727274]

Coefficients:
    a_0 = 0
    a_1 = 0.9999115283796
    a_2 = 0
    a_3 = -0.16602000425895
    a_4 = 0
    a_5 = 0.0076266621511779

Max absolute error = 0.0001342309422232
Max relative error = 0.0001342309422232
Max polynomial value = 1.0001342309422
Min polynomial value = 0
« Last Edit: December 24, 2021, 01:53:01 pm by Picuino »
 

Offline emece67

  • Frequent Contributor
  • **
  • !
  • Posts: 614
  • Country: 00
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #68 on: December 24, 2021, 07:09:44 pm »
.
« Last Edit: August 19, 2022, 05:57:52 pm by emece67 »
 
The following users thanked this post: Nominal Animal

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6265
  • Country: fi
    • My home page and email address
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #69 on: December 25, 2021, 03:51:39 am »
Excellent work!

TL;DR:

emece67's Remez-based coefficients give the single-precision, four-multiplication, two-addition first-quadrant sine approximation with the smallest absolute error so far, -0.00006786226 to 0.00006776527.  In C, we can write it as
Code: [Select]
float sin_approx(const float x)
{
    const float  xx = x * x;
    float  result =  0.0075143874f * xx;   /*  0.0075143873691558837890625   */
    result += -0.16567312f;                /* -0.16567312180995941162109375  */
    result *= xx;
    result += 0.9996968f;                  /*  0.999696791172027587890625    */
    return result * x;
}
where the commented numeric constants are the exact decimal values the shorter floating point constants refer to, in case anyone wants to do some exact numerical error analysis.
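The claimed error bounds are easy to check numerically; here is a quick independent sketch of mine (not the thread's own test harness), evaluating the polynomial in float32 over a dense grid of the first quadrant against double-precision sin():

```python
import numpy as np

# Evaluate the Remez-based polynomial with float32 Horner arithmetic,
# mirroring the C code above, and compare against float64 sin().
x = np.linspace(0.0, np.pi / 2, 1_000_001, dtype=np.float32)
xx = x * x
y = ((np.float32(0.0075143874) * xx + np.float32(-0.16567312)) * xx
     + np.float32(0.9996968)) * x
err = y.astype(np.float64) - np.sin(x.astype(np.float64))
print(err.min(), err.max())   # roughly -6.8e-5 .. +6.8e-5
```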

Edited to add: there is a tiny bit better solution at -178+19-3 ULPs,
Code: [Select]
float sin_approx(const float x)
{
    const float  xx = x * x;
    float  result =  0.0075143045f * xx;   /*  0.007514304481446743011474609375 */
    result += -0.16567284f;                /* -0.165672838687896728515625 */
    result *= xx;
    result += 0.9996966f;                  /*  0.99969661235809326171875 */
    return result * x;
}
which yields maximum absolute error -0.0000677852 .. 0.000067783265.



In exact values, the Remez approximation is
  yremez(x) = x * ( 0.999696791172027587890625 + x * x * ( - 0.16567312180995941162109375 + x * x * 0.0075143873691558837890625 ) )
and the best I found is
  yna(x) = x * ( 0.9996874332427978515625 + x * x * ( - 0.16566254198551177978515625 + x * x * 0.0075119235552847385406494140625 ) )
and their difference is
  yna(x) - yremez(x) = x * ( -0.000009357929229736328125 + x * x * ( 0.0000105798244476318359375 + x * x * 0.0000024638138711452484130859375 ) )
That is, if the short-form floating-point coefficients refer to the above exact decimal values.  (I use a helper program to calculate and display these, of course.)
I chose the order of the functions in the difference to make the early differences negative, to leave room for the key in the upper left; not because even I am that conceited.

As a plot, the errors (and their difference, which is also the difference in their errors):

The difference uses the right vertical axis, and phase is x in units of 2*PI.

Not only do the Remez-based coefficients by emece67 (0.9996968 -0.16567312 0.0075143874) yield smaller absolute error, but they also provide more symmetric absolute error.

The three extrema of the error in the Remez-based approximation are at x ≈ 0.347884, x ≈ 0.976371, and x ≈ 1.413914 (phases 0.05537, 0.15539, and 0.22503), where the absolute errors are approximately -0.0000677, +0.0000677, and -0.0000677: basically the same (or at least very, very close).  Even at the upper limit, x = PI/2 ≈ 1.57080, the absolute error is the same, approximately +0.0000677.  This is a very, very nice three-term sine approximation at single precision; I really like it.

The errors in the best approximation I found earlier (via simple least squares polynomial fitting with controlled distribution of x within 0 to PI/2) are quite asymmetric in comparison.

The errors in the -178+19-3 solution are obviously close to the Remez-based approximation, because the coefficients differ only by a tiny bit; the modified coefficients just happen to quantize better to single-precision floating point numbers.  At this scale, the maximum absolute error function is not really a continuous function of the coefficients anymore (because of those per-term rounding errors canceling out now and then; basically a quantization problem), and I suspect only brute force or similar discrete searches in the parameter space can "improve" the coefficients wrt. absolute error. I would not be at all surprised if there are even better solutions in the ±500±100±10 ULP region near the Remez coefficients (or one of the million possible unique single-precision triplets where C1 is between 0.9996960163116455078125 and 0.999697208404541015625, C3 is between -0.165672838687896728515625 and -0.16567134857177734375, and C5 is between 0.007514071650803089141845703125 and 0.007514304481446743011474609375).
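Such a discrete search can be sketched as follows (my own illustration, not the search actually used: the grid density and the ±2 ULP step counts are arbitrary choices, and a real search would scan a much wider region):

```python
import numpy as np

# Brute-force scan of float32 coefficient triplets a few ULPs around the
# Remez coefficients, scoring each by max absolute error on a dense grid.
x = np.linspace(0.0, np.pi / 2, 100_001, dtype=np.float32)
ref = np.sin(x.astype(np.float64))

def max_abs_err(c1, c3, c5):
    """Float32 Horner evaluation, compared against float64 sin()."""
    xx = x * x
    y = ((np.float32(c5) * xx + np.float32(c3)) * xx + np.float32(c1)) * x
    return np.max(np.abs(y.astype(np.float64) - ref))

def ulp_steps(v, n):
    """v and its n nearest float32 neighbours in each direction."""
    out = [np.float32(v)]
    up = down = np.float32(v)
    for _ in range(n):
        up = np.nextafter(up, np.float32(np.inf))
        down = np.nextafter(down, np.float32(-np.inf))
        out += [up, down]
    return out

best = min((max_abs_err(c1, c3, c5), c1, c3, c5)
           for c1 in ulp_steps(0.9996968, 2)
           for c3 in ulp_steps(-0.16567312, 2)
           for c5 in ulp_steps(0.0075143874, 2))
```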
« Last Edit: December 25, 2021, 12:47:45 pm by Nominal Animal »
 

Offline T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21697
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #70 on: December 25, 2021, 03:55:54 pm »
Edited to add: there is a tiny bit better solution at -178+19-3 ULPs,
Code: [Select]
float sin_approx(const float x)
{
    const float  xx = x * x;
    float  result =  0.0075143045f * xx;   /*  0.007514304481446743011474609375 */
    result += -0.16567284f;                /* -0.165672838687896728515625 */
    result *= xx;
    result += 0.9996966f;                  /*  0.99969661235809326171875 */
    return result * x;
}
which yields maximum absolute error -0.0000677852 .. 0.000067783265.

Which, FWIW, are equivalent to fractions of:
3253427/432964489
2779529/16777216
8386063/8388608
if that's any help to those of you working in fixed point.  The latter two of which are over powers of 2, which makes sense enough (lg(16777216) = 24 bits, sounds suspiciously float-y) but the first one is actually quite a bit more.  (I'm just using my JS calculator for this, so it's working in that default (double) precision.)  The nearest float of it is a modest 742/98745 = 0.007514304521.., and two steps down is the highest gain term, which... well, play with it yourself, I'm not gonna read off all the probably-useless stats here anyway. :)
https://www.seventransistorlabs.com/Calc/Frac.html
Also, asking WolframAlpha about the number gives a good approximation (to full digits), and asking about convergents or continued fraction thereof gets the terms of the above expansion.
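The fraction forms above can also be reproduced with Python's standard fractions module (a sketch of mine, not Tim's method). The long decimals are the exact dyadic values of the float32 constants, so Fraction() recovers them exactly, and limit_denominator() then finds best small-denominator approximations:

```python
from fractions import Fraction

# The two power-of-two denominators fall straight out of the exact
# float32 values; the third coefficient needs a continued-fraction
# style approximation, which limit_denominator() provides.
c3 = Fraction(0.165672838687896728515625)        # 2779529/16777216
c1 = Fraction(0.99969661235809326171875)         # 8386063/8388608
c5 = Fraction(0.007514304481446743011474609375)  # exact dyadic value
print(c3, c1, c5.limit_denominator(500_000_000))
```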

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 
The following users thanked this post: Mechatrommer

Offline emece67

  • Frequent Contributor
  • **
  • !
  • Posts: 614
  • Country: 00
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #71 on: December 25, 2021, 11:26:47 pm »
.
« Last Edit: August 19, 2022, 04:55:29 pm by emece67 »
 
The following users thanked this post: Nominal Animal

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4040
  • Country: nz
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #72 on: December 25, 2021, 11:52:07 pm »
Unfortunately, not all functions are as well behaved as sin() for these polynomial approximations. I'm looking now for ways to efficiently compute log2(), but it is not as nice as sin().

Obviously, you only want to do that for x between 1.0 and 2.0, just as you only want to do sin() between 0 and pi/2. i.e. both x-1 and log2(x) lie between 0.0 and 1.0.
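That range reduction can be sketched like this (my own illustration, reusing the Chebyshev-node fitting idea from earlier in the thread: reduce x to a mantissa m in [1, 2), fit log2 on that interval, and add the exponent back):

```python
import math
import numpy as np

# Degree-5 interpolation of log2 at 6 Chebyshev nodes on [1, 2].
n = 6
i = np.arange(n)
nodes = 1.5 + 0.5 * np.cos((2*i + 1) * np.pi / (2*n))  # nodes on [1, 2]
coefs = np.polynomial.polynomial.polyfit(nodes, np.log2(nodes), n - 1)

def log2_approx(x):
    """log2(x) = e + log2(m), with x = m * 2**e and m in [1, 2)."""
    m, e = math.frexp(x)   # frexp gives m in [0.5, 1), so rescale by 2
    return (e - 1) + np.polynomial.polynomial.polyval(2.0 * m, coefs)
```

Even this low-order fit lands within roughly 1e-4 of log2 on the reduced interval, though as noted, that is noticeably worse than what the same degree achieves for sin().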

 

Offline emece67

  • Frequent Contributor
  • **
  • !
  • Posts: 614
  • Country: 00
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #73 on: December 25, 2021, 11:55:35 pm »
Of course, but the approximation is still worse.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14488
  • Country: fr
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #74 on: December 26, 2021, 02:37:34 am »
Do you need something fast, or something light in code and not requiring any math library?
 

