-
Currently I am using the standard sin(x), but math.h takes up about 6k (STM32F417) and I would like to reduce this. There is a lot of code online, but most of it uses functions like pow(x,y) which also pull in math.h, so that's no help unless you just want more speed. My use of sin(x) is for table precomputation only and it doesn't matter how slow it is. Also, a lot of the code posted online simply doesn't work :)
Does anyone have a compact C function which has actually been tested?
It is to generate a table of signed 16 bit ints, scaled so that +/- 30000 (approx) is the max value. So I don't need floats at all really.
One problem is that the table size (64/128/256/etc) is set at compile time, otherwise I could just precompute the table; a 128-entry table would be just 256 bytes and that would be it. Yes, of course anything is possible: I could get somebody on freelancer.com, most likely in the former USSR, to knock up a program for £50 which reads the .h file where the table size is defined and generates the table, which could then be #included. But then I have to make sure the chain of dependencies is correctly set up (I am using ST Cube IDE), etc. A float sin(x) would be a lot neater.
I am also aware that one needs to compute only the octant, and the other 7/8 of the circle can be obtained by swapping things around. I did this (the Horn algorithm) on a Z80 in the 1980s, but it was primitive and inaccurate - it was for drawing circles on a 512x512 display.
-
Try googling: CORDIC in C
Wikipedia has a decent page on the CORDIC (https://en.wikipedia.org/wiki/CORDIC) algorithm itself.
-
You also don't need to parse anything if the number of possible sizes is limited (and it would be limited by memory size, of course). Just generate tables for all sizes and #include the one you need depending on the define. All the code except for the tables themselves would be hand-written and would not require any special maintenance or external tools in the workflow.
And if you can relax the accuracy requirement to < 2%, then there is this absolutely trivial formula: https://en.wikipedia.org/wiki/Bhaskara_I%27s_sine_approximation_formula
And as usual, Taylor series is a good approach if you just need something that is simple to implement, but not necessarily the fastest.
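For reference, a minimal C sketch of the Bhaskara I formula linked above (my own rendering, not taken from that page):

#define PI 3.14159265358979323846

/* Bhaskara I: sin(x) ~= 16*x*(PI - x) / (5*PI*PI - 4*x*(PI - x)), valid for x in [0, PI].
   Absolute error is around 0.0016; relative error near 0 and PI approaches the ~2% figure above. */
static double bhaskara_sin(double x)
{
    double t = x * (PI - x);
    return 16.0 * t / (5.0 * PI * PI - 4.0 * t);
}

For a full cycle you would fold the second half: sin(x) = -bhaskara_sin(x - PI) for x in (PI, 2*PI).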
-
I already gave the runtime function in the other thread, so calculating the table I guess is the remaining part?
https://www.seventransistorlabs.com/sintables.html
Pop open the console, adjust the obvious parameters and run the update function again for whatever you want.
Tim
-
Have a look at how the Yamaha DX7 chip implemented a fast sine function (actually the logarithm of the sine) using the chip resources available in 1983. It may give you some ideas :)
http://www.righto.com/2021/12/yamaha-dx7-reverse-engineering-part-iii.html
-
Thank you all for the suggestions.
I am being lazy and was looking for something that is usable straight off and works :)
It was CORDIC that I implemented in Z80 assembler. The version published by Mr Horn, c. 1976, was used in the French Thomson-CSF (the forerunner of ST that we know today) 9365 graphics chip.
I have also seen a sine generator which uses a recirculating shift register principle. It generates integer sin(x) to any desired precision, with a direct digital output, but the output is continuous; you can't just feed in any given x.
I suspect a Taylor implementation, with say 10 iterations, will do, but I have not found code online which looked like it actually works.
-
I'm not sure what the difficulty is here. Sine is rather easy to calculate, if it doesn't have to be fast or accurate to the last bit.
Off the top of my head ...
#include <stdio.h>
#define TBLSZ 64
#define PI 3.14159265358979323846
#define SCALE 32767
int my_sin(int i){
    double rad = 2*PI*i/TBLSZ, term=rad, sin=rad, old_sin=0;
    for (int n=3; sin != old_sin; n+=2){
        old_sin = sin;
        term *= (rad*rad) / (-n * (n-1));
        sin += term;
    }
    return (sin * SCALE) + (sin >= 0 ? 0.5 : -0.5);
}

int main(){
    printf("short sintbl[] = {\n");
    for (int i=0; i<TBLSZ; ++i)
        printf(" %6d,\n", my_sin(i));
    printf("};\n");
    return 0;
}
As I understand, the 32F417 has an FPU so you don't have to worry about using fixed point to create your table.
-
Generating tables at compile time sounds trivial enough. Why would you need to hire someone to do it? It should be a few minutes' work to write the utility.
-
What, you want to custom implement sin() only to run at the init, generating a fixed LUT, only because the size of that LUT is a compile-time constant?
I fail to see the problem in just creating the tooling. You'd have done it in the same time it took to write that post. For example, a small C program that #includes the same header (or takes the same command line parameters) where the desired size is. Then run sin() in a loop, write results to a file in C array syntax. #include that generated file where needed, and do a safety check that the size matches (#if DESIRED_SIZE != sizeof sin_table #error Workflow error). Just add that generator as a step in a makefile before the compilation of the file(s) that need this table. If Cube IDE makes it hard or impossible to modify makefile, then I'm afraid you have no other options than not use it. If some manager demands it, you can "keep using" it without actually using it.
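One wrinkle with that safety check: the preprocessor cannot evaluate sizeof, so the size test has to live in the C code itself, e.g. as a C11 static assertion. A minimal sketch, reusing the names from the post above ("sin_table.h" being the hypothetical generated file):

#include "sin_table.h"   /* generated file, e.g. const short sin_table[] = { ... }; */

/* sizeof cannot be used in #if, so check the generated table size here instead. */
_Static_assert(sizeof sin_table / sizeof sin_table[0] == DESIRED_SIZE,
               "Workflow error: generated sine table size does not match DESIRED_SIZE");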
-
Sorry, guys, of course you are right. I should just generate the LUTs for 64/128/256/512/1024 values per cycle and #include them according to the SPC (samples per cycle) value which is #defined in the embedded code.
I just don't know how to write a win32 command line program. I should find out how, but right now I don't have a win32 command line C compiler on my PC. Last time I did that sort of thing was in the days of Borland C++ :) Or I would have used Basic, which came with every "IBM PC". I have found this: https://www.freebasic.net/
Re "#error Workflow error" that is something else I wondered about: how does one generate warnings, or fatal compilation errors, in GCC. That's very useful.
-
I just don't know how to write a win32 command line program. I should find out how, but right now I don't have a win32 command line C compiler on my PC.
if you can live with the dirty feeling, you could just use a spreadsheet ...
-
What, you want to custom implement sin() only to run at the init, generating a fixed LUT, only because the size of that LUT is a compile-time constant?
The goal was to create a constexpr function for the sin table, and not all C++ compilers allow you to use the math functions, sin(), cos(), sqrt(), etc., in constexpr expressions/functions.
GCC is an exception in this case, and has implemented the std math functions as constexpr (when it is possible):
https://stackoverflow.com/questions/27744079/is-it-a-conforming-compiler-extension-to-treat-non-constexpr-standard-library-fu
https://stackoverflow.com/questions/17347935/constexpr-math-functions
I just don't know how to write a win32 command line program. I should find out how, but right now I don't have a win32 command line C compiler on my PC. Last time I did that sort of thing was in the days of Borland C++ :) Or I would have used Basic, which came with every "IBM PC". I have found this: https://www.freebasic.net/
For one-time compilation there are several online compilers available, e.g.:
https://www.onlinegdb.com/online_c_compiler
https://rextester.com/l/cpp_online_compiler_gcc
-
I should just generate the LUTs for 64/128/256/512/1024 values per cycle
Ehh? Don't duplicate things many times; just shift/skip entries and use only one table.
My sine.h file includes the lookup tables for all sizes, selected with the right define.
I add the file and say how big I want it: 2048, or maybe 4096, used without interpolation.
It can also be copied to RAM for better performance.
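A minimal sketch of the skip/stride idea (sine4096[] and TABLE_MAX are hypothetical names; one full cycle is stored at the largest size, and SPC must divide it):

#include <stdint.h>

#define TABLE_MAX 4096                       /* entries in the single stored table  */
extern const int16_t sine4096[TABLE_MAX];    /* one full cycle at full resolution   */

/* Read the big table with a stride so it behaves like a table of spc entries. */
static inline int16_t sin_lookup(unsigned i, unsigned spc)
{
    return sine4096[(i * (TABLE_MAX / spc)) % TABLE_MAX];
}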
-
I just don't know how to write a win32 command line program. I should find out how, but right now I don't have a win32 command line C compiler on my PC. Last time I did that sort of thing was in the days of Borland C++ :) Or I would have used Basic, which came with every "IBM PC".
MINGW is a quick install. Cross-compiling between Windows/Linux is a bit more tedious to set up, but luckily you don't need to do that.
I really prefer my tooling in C, not just because I like C, but because you can include shared headers with datatypes, constants and macros, even share functions or complete compilation units between PC/MCU side, etc. Of course need to be careful with data type widths (always use stdint), implementation defined areas, endianness etc., but with great responsibility comes great power.
If you have just those few different options and you are quite sure they don't change often, just create the files once by whatever means. Matlab, Octave, Python, spreadsheet, heck, Qbasic or how about 8088 assembly or maybe COBOL? The point is, create all the files once and include the right one automagically (#if LEN == 128 #include lut128.inc #elif LEN == 256... ... ... #elif #error Illegal LEN #endif)
-
As you may've noticed, I have a tendency to write my tooling in JS... I mean, a browser is basically anywhere you are, in fact you're even using it right now! ;D
Tim
-
Do it in Excel and save as CSV :P
Then it's a simple substitution in Notepad to get C code.
-
It's easy enough to use worksheet formulae to get Excel to emit valid C code for an array as a column, then copy/paste that column into a source file.
-
I just don't know how to write a win32 command line program. I should find out how, but right now I don't have a win32 command line C compiler on my PC.
Any C or C++ compiler for Windows can generate command line programs. If you ever have Visual Studio, for instance, it absolutely can (and btw, its compilers are command line too!)
But you can use many other compilers (GCC, Clang, whatever...) available on Windows. Or you could use an interpreted language if you prefer/find it easier in your case. Some will suggest Python, I personally mostly use Lua instead, and can do this kind of things with a few lines of code.
-
One problem is that the table size (64/128/256/etc) is set at compile time, otherwise I could just precompute the table; a 128-entry table would be just 256 bytes and that would be it. Yes, of course anything is possible: I could get somebody on freelancer.com, most likely in the former USSR, to knock up a program for £50 which reads the .h file where the table size is defined and generates the table, which could then be #included. But then I have to make sure the chain of dependencies is correctly set up (I am using ST Cube IDE), etc. A float sin(x) would be a lot neater.
If you generate all the options as const tables, with a little bit of python code for example, and then add all of them to your project but only reference one of them, the rest will be removed by the linker.
You could even, if you wanted to, have the python generate the table based on what is in the #define, if you add the python script as a build step.
import math
def calculateTable(length):
    table = []
    for i in range(length):
        value = math.cos(2 * math.pi * i / length)
        table.append(value)
    return table

n = 256
table = calculateTable(n)
f = open("table" + str(n) + ".c", "w", encoding="ansi")
f.write("float table" + str(n) + "[" + str(n) + "] = {\n")
for x in table:
    f.write(" " + str(x) + "f,\n")
f.write("};\n")
f.close()
I do that all the time. This is why python is amazing.
-
I did it in Freebasic, in about 10 mins
dim twopi as double
dim x as double
twopi = 2*3.1415926
for x = 0 to twopi step twopi/1024
print x, sin(x)
next x
Now I just need to generate the C const format.
Took me a while to realise that real men don't use line numbers anymore :) :)
-
Here is what I'd use, assuming single-precision floating-point hardware is available, noting that
sinscale(p, s) ≃ (int)round(s * sin(2 * PI * p))
/* phase = x / (2*PI) */
int sinscale(float phase, float scale)
{
/* Optional: Handle negative phases */
if (phase < 0.0f) {
phase = -phase;
scale = -scale;
}
/* Optional: Handle phases outside a full period */
if (phase >= 1.0f) {
phase -= (float)(int)phase;
}
/* Second half of each period is the first half negated */
if (phase >= 0.5f) {
phase -= 0.5f;
scale = -scale;
}
/* Second quarter is first quarter mirrored: */
if (phase > 0.25f) {
phase = 0.50f - phase;
}
/* Convert phase to radians. */
float x = phase * 6.28318530717958647692528676655900576839433879875021164f;
const float xx = x*x;
x *= scale;
/* Power series expansion. First term is x. */
const float term1 = x;
/* term -x^3/3! */
x *= xx;
x /= 2*3;
const float term3 = x;
/* +x^5/5! */
x *= xx;
x /= 4*5; /* 4*5 */
const float term5 = x;
/* -x^7/7! */
x *= xx;
x /= 6*7;
const float term7 = x;
/* +x^9/9! */
x *= xx;
x /= 8*9;
/* Sum terms in increasing order of magnitudes, to minimize rounding errors. */
x -= term7;
x += term5;
x -= term3;
x += term1;
/* Rounding. */
return (x < 0.0f) ? (x - 0.5f) : (x + 0.5f);
}
If you are interested in why one would implement the power series expansion of sin() (https://en.wikipedia.org/wiki/Trigonometric_functions#Power_series_expansion) that way: we want to do the addition of the terms in order of increasing magnitude, to minimize domain cancellation errors.
For N=65536 and S=32000, i.e. sinscale((float)i/(float)N, S) ≃ (int)round(S*sin(2*PI*i/N)), 99% of the values are exact, and the rest only differ by ±1. Same for N=65536 and S=32767.
For N=1048576 and S=262144, 91% of the values are exact, and the rest only differ by ±1.
In general, for S ≤ 262144, I do believe the maximum difference is ±1.
This works well for non-power of two N, including primes.
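For the OP's signed 16-bit, roughly ±30000 table, usage would be something like this (sintab[] and SPC being assumptions about the OP's names, not part of the code above):

for (int i = 0; i < SPC; i++)
    sintab[i] = sinscale((float)i / (float)SPC, 30000.0f);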
If you are using C++, then consider expanding what emece67 (https://www.eevblog.com/forum/programming/compact-sin(x)-which-gets-to-within-0-1-x-0-to-2xpi/msg3880913/#msg3880913) suggested, into something like the following:
#define HALFPI 1.57079632679489661923132169163975144209858469968755291
#define PI 3.14159265358979323846264338327950288419716939937510582
#define TWOPI 6.28318530717958647692528676655900576839433879875021164
template <class T, int N, T M>
class sine_table {
public:
T value[N];
constexpr sine_table(): value() {
for (int i = 0; i < N; i++) {
double x = TWOPI * i / N;
double s = (double)M;
if (x >= PI) {
x -= PI;
s = -s;
}
if (x >= HALFPI) {
x = PI - x;
}
const double xx = x*x;
/* Power series begins with the x term */
x *= s;
const double t1 = x;
/* -x**3/3! */
x *= xx;
x /= 2*3;
const double t3 = x;
/* +x**5/5! */
x *= xx;
x /= 4*5;
const double t5 = x;
/* -x**7/7! */
x *= xx;
x /= 6*7;
const double t7 = x;
/* +x**9/9! */
x *= xx;
x /= 8*9;
const double t9 = x;
/* -x**11/11! */
x *= xx;
x /= 10*11;
const double t11 = x;
/* +x**13/13! */
x *= xx;
x /= 12*13;
const double t13 = x;
/* -x**15/15! */
x *= xx;
x /= 14*15;
const double t15 = x;
/* +x**17/17! */
x *= xx;
x /= 16*17;
const double t17 = x;
/* -x**19/19! */
x *= xx;
x /= 18*19;
const double t19 = x;
/* +x**21/21! */
x *= xx;
x /= 20*21;
x -= t19;
x += t17;
x -= t15;
x += t13;
x -= t11;
x += t9;
x -= t7;
x += t5;
x -= t3;
x += t1;
value[i] = (x < 0.0) ? (x - 0.5) : (x + 0.5);
}
}
};
#undef HALFPI
#undef PI
#undef TWOPI
gives about 50-bit approximation at compile time (±8 ULP in double precision). For example,
constexpr sine_table<int16_t, 1048573, 32767> table1;
constexpr sine_table<int16_t, 65536, 32000> table2;
take annoyingly long to compile, but the generated tables have
table1.value[i] == round(32767*sin(TWOPI*i/1048573));
table2.value[j] == round(32000*sin(TWOPI*j/65536));
with exact values including correct rounding (towards nearest integer, halfway away from zero).
If you use precalculated tables, I recommend you define preprocessor macros for the size (N) and scale (S), and use e.g.
#if N-0 == 0
#error Sine table size (N) is zero
#elif S-0 == 0
#error Sine table scale (S) is zero
#elif N == 4096 && S == 32000
#include "sin-4096-32000.h"
#elif N == 16384 && S == 32000
#include "sin-16384-32000.h"
#elif N == 65536 && S == 32000
#include "sin-65536-32000.h"
#else
#error Unknown sine table size (N) and/or scale (S).
#endif
with whatever macro names (N and S) you prefer. In the preprocessor tests, N-0 yields zero for both N=0 and when N is undefined, and generates an error if N is not an integral constant.
-
If you are interested in why one would implement the power series expansion of sin() (https://en.wikipedia.org/wiki/Trigonometric_functions#Power_series_expansion) that way: we want to do the addition of the terms in order of increasing magnitude, to minimize domain cancellation errors.
For N=65536 and S=32000, i.e. sinscale((float)i/(float)N, S) ≃ (int)round(S*sin(2*PI*i/N)), 99% of the values are exact, and the rest only differ by ±1. Same for N=65536 and S=32767.
My much simpler and smaller code in ...
https://www.eevblog.com/forum/programming/compact-sin(x)-which-gets-to-within-0-1-x-0-to-2xpi/msg3880130/#msg3880130
... gets the exact same 16 bit integer value as using the C library sin() function in 100% of cases (N=65536, S=32000 or 32767), despite not bothering with quadrants etc (it just uses the raw radian value up to 2*Pi in the power series) and not bothering to add up smallest terms first.
It seems using a sufficient number of terms is more important than the other factors.
My code is slower (calculates more terms), but the OP says that's not important.
-
Now I still think a small tool to generate this at compile time makes more sense if the N is fixed at compile time. But, this can still be a fun exercise to do this efficiently at run time. Or could be mandatory if N can be dynamic.
Apart from the approaches already shown above, my "favorite" one for incremental generation of sine/cosine is to use a rotation matrix. Pretty efficient, *and* accurate, useful in particular for real-time generation!
For initializing the generation, you still need to compute the two constants: sin(pi/N) and sin(2pi/N). But if the number of different values of N is fixed and reasonably small, those can be computed in advance and put in a small constant array. Benefit of this algorithm is that you get both sin() and cos() at each iteration. You can discard the cos value, or use it.
Regarding the sin(pi/N) and sin(2pi/N) constants, if N is not too small, you can even approximate both by just pi/N and 2pi/N, not requiring any constant array. I computed tables with both approaches and for N >= 64, the error over the whole table is pretty acceptable!
#define kPI 3.14159265358979323846

static void MakeSinCosTable(double *pSinTable, double *pCosTable, unsigned int n)
{
    if (n == 0)
        return;

    double x = kPI / (double) n, y = sin(x), Alpha = 2.0 * y * y, Beta = sin(2.0 * x),
           Si = 0.0, Ci = 1.0;

    for (unsigned int i = 0; i < n; i++)
    {
        if (pSinTable != NULL)
            pSinTable[i] = Si;
        if (pCosTable != NULL)
            pCosTable[i] = Ci;

        double Si1 = Si - (Alpha*Si - Beta*Ci);
        double Ci1 = Ci - (Alpha*Ci + Beta*Si);
        Si = Si1;
        Ci = Ci1;
    }
}
As said above, you can replace the two sin() values required for computing the coefficients either by the approximations x and 2*x (try it and see what you get!), or by using a table for the various values of n, as long as n has a finite (and reasonable) number of possible values.
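A possible usage sketch for the OP's case (SPC, sintab and the ±30000 scale are assumptions, not part of the code above; the two sin() calls inside MakeSinCosTable can be replaced as described):

#include <stdint.h>

#define SPC 256
static int16_t sintab[SPC];

void init_sintab(void)
{
    double s[SPC];
    MakeSinCosTable(s, NULL, SPC);          /* cosine output discarded here */
    for (unsigned int i = 0; i < SPC; i++)
        sintab[i] = (int16_t)(30000.0 * s[i] + (s[i] >= 0.0 ? 0.5 : -0.5));
}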
-
My much simpler and smaller code in ...
True, but yours uses double-precision floating-point (double), while mine uses only single precision (float).
-
My much simpler and smaller code in ...
True, but yours uses double-precision floating-point (double), while mine uses only single precision (float).
Changing my series expansion to float instead of double gives (N=65536, S=32767):
411 -1
64694 0
431 1
-
I actually get slightly worse results if I add up the terms from smallest to largest:
float my_sin_inner(int n, float tot, float term, float rad2){
    term *= rad2 / (-n * (n-1));
    float new_tot = tot + term;
    if (new_tot != tot)
        return term + my_sin_inner(n+2, new_tot, term, rad2);
    else
        return term;
}

int my_sin(int i){
    float rad = 2*PI*i/TBLSZ;
    float sin = rad + my_sin_inner(3, 0.0, rad, rad*rad);
    return (sin * SCALE) + (sin >= 0 ? 0.5 : -0.5);
}
Gives ...
451 -1
64643 0
442 1
-
Restricting my single-precision routine to the first quadrant does improve it quite a lot.
int my_sin(int i){
    int neg = 0;
    if (i > TBLSZ/2){
        neg = 1;
        i = TBLSZ-i;
    }
    if (i > TBLSZ/4) i = TBLSZ/2 - i;
    float rad = 2*PI*i/TBLSZ;
    int sin = SCALE * (rad + my_sin_inner(3, 0.0, rad, rad*rad)) + 0.5;
    return neg ? -sin : sin;
}
28 -1
65480 0
28 1
So that's 0.085% with ±1 and 99.915% exact.
But adding them up forwards ...
float my_sin_inner(int n, float tot, float term, float rad2){
    term *= rad2 / (-n * (n-1));
    float new_tot = tot + term;
    if (new_tot != tot)
        return my_sin_inner(n+2, new_tot, term, rad2);
    else
        return tot;
}
... is *still* better (and also makes my_sin_inner() an iteration not a recursion) ...
24 -1
65488 0
24 1
-
If you are interested in why one would implement the power series expansion of sin() (https://en.wikipedia.org/wiki/Trigonometric_functions#Power_series_expansion) that way: we want to do the addition of the terms in order of increasing magnitude, to minimize domain cancellation errors.
Didn't know that. But now I need to test this method against the Horner's scheme (https://en.wikipedia.org/wiki/Horner%27s_method) I usually prefer... :-DD
My functions effectively implement Horner's method. You'd be crazy not to. But I'm finding forward addition of the terms to be slightly more accurate than reverse addition. There's very little in it. We're only using 15 of the 23 bits in the final sum, after all -- so we're not really concerned with the last bit or two.
-
@brucehoult your code works perfectly :) Many thanks!
SPC = samples per cycle.
// Compact sin(x) function which doesn't need math.h
// i is from 0 to SPC-1
#define PI 3.14159265358979323846
#define SCALE 13107
int sin2(int i)
{
    double rad = 2*PI*i/SPC, term=rad, sin=rad, old_sin=0;
    for (int n=3; sin != old_sin; n+=2)
    {
        old_sin = sin;
        term *= (rad*rad) / (-n * (n-1));
        sin += term;
    }
    return (sin * SCALE) + (sin >= 0 ? 0.5 : -0.5);
}

// Precompute a sin(x) table, for a whole cycle (0 to 2*PI) with signed 16 bit int
// values from -13107 to +13107. Uses int16 rather than int32 to save RAM.
// 13107 is 16384 * 2 / 2.5, to get DAC swing from +0.25V to +2.25V (2V P-P)
// The table is shifted to the left by SHIFT samples, to compensate for the 2-sample
// delay in the wave gen, plus the phase delay in the DAC output filter (probably
// arranged to be equal to 1 sample period)

/*
 * This version needs #include "math.h" which adds about 5k to the FLASH usage
double xinc=2*3.1415926/SPC;
double x=xinc*SHIFT;
for (int i=0;i<SPC;i++)
{
    sintab[i] = sin(x)*13107.0;
    x+=xinc;
    if (x>(2*3.1415926)) x=0;
}
*/

// This uses the simple sin2(x) computation
for (int i=0;i<SPC;i++)
{
    sintab[i] = sin2(i+SHIFT);
}
To my surprise it works fine through the 2*PI cycle end; I am adding SHIFT to i to generate a shifted table to compensate for some 32F417 quirks and for the time delay in the 3 pole lowpass filter.
-
But adding them up forwards [...] is *still* better (and also makes my_sin_inner() an iteration not a recursion) ...
I assume you verified FLT_EVAL_METHOD==0 (so that float operations are done at float precision and range)?
In that case, the rounding errors in the individual terms happen to cancel "better"!
(If you were to check, you'd find that "incorrect" 24/28 cases occur for completely different arguments, too. The ±1 differences we cannot get completely rid of without extending precision somehow, because they occur due to rounding errors at individual terms.)
The smaller the difference between successive terms with alternating signs, the more it helps with domain cancellation errors to sum them in descending order of magnitude. Ascending order of magnitude is usually better if the terms all have the same sign.
I suppose five float terms are too few to really tell which one is truly better here. The 24 or 28 cases out of 65536, are due to rounding errors in individual terms not canceled out by rounding errors in other terms.
Now, since the target table sizes are powers of two, I wonder if a simple CORDIC (https://en.wikipedia.org/wiki/CORDIC) approach using phase = x/TWOPI as the argument, using say Q.30 fixed point arithmetic, would yield even better results?
But now I need to test this method against the Horner's scheme (https://en.wikipedia.org/wiki/Horner%27s_method) I usually prefer... :-DD
Mine also uses Horner's method. It simply postpones the additions by saving them in temporary variables, and then does the additions from right to left, because that happens to be the order of increasing magnitude.
It is not "mathematically obvious" which order (increasing or decreasing in magnitude) is superior, when the terms have alternating signs. At least, I don't know of any good argument for either.
In case somebody does not recognize the term "domain cancellation" when summing/subtracting floating-point numbers:
Domain cancellation is what happens, when you try to naïvely calculate e.g. 1e100 + 2e-100 - 1e100 - 2e-100. You expect zero, but due to the limited range and precision in the floating point type, the result you get is -2e-100: The second term is too small in magnitude to affect the sum (of it and the first term), and this is what we call "domain cancellation". If you use Kahan sum, or sum terms in descending order of magnitude, you get the expected zero result.
That sum is a contrived example, where the terms cancel out exactly: in a real life set or series, they only cancel out somewhat. With descending order of magnitude, at some point the new term no longer affects the result, because the magnitude of the term is too small to affect the sum. With increasing order of magnitude, the small terms are accumulated first, so that with floating point, they are accounted for in the temporary sum.
When all the terms have the same sign, summing in order of increasing magnitude means that the temporary sum is as close to the magnitude of the next term to be added to the sum as possible.
Summing the negative and positive terms separately, would lead to maximum domain cancellation when adding the two sums together, so that's no good.
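In code, the Kahan sum mentioned above looks like this (a generic sketch, not specific to the sine series):

/* Kahan (compensated) summation: recovers low-order bits that plain summation drops. */
float kahan_sum(const float *term, int count)
{
    float sum = 0.0f;
    float c   = 0.0f;                  /* running compensation for lost low-order bits */
    for (int i = 0; i < count; i++) {
        float y = term[i] - c;         /* apply the previous compensation              */
        float t = sum + y;             /* low-order bits of y may be lost here...      */
        c = (t - sum) - y;             /* ...and are captured here                     */
        sum = t;
    }
    return sum;
}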
-
Analyzing them: all the Taylor-series methods proposed here use Horner's scheme, don't worry, so you people don't need to smash each other ;)
1) emece67 is using terms up to X^5 with tuned (fitted?) constants instead of the true inverse-factorial values (I guess to reduce the error as much as possible in the 1st quadrant (0-pi/2)). This should run fastest in real/run time, I guess...
2) Nominal Animal is using an approximation up to the X^9 term, unrolled (instead of a loop). I guess slower than (1) but faster than (3).
3) brucehoult loops until convergence, hence a truer Taylor series. It runs slowest, but theoretically should give the true sin value. The code is smallest too, if that matters. One possible problem is if the loop doesn't converge, e.g. due to a hardware FPU flaw such as a round-off error in the last decimal place, but that's easily tested on the particular machine, and especially in the OP's case of a static table calculated once at boot-up.
Looking at the OP's requirement, I would choose (3). But if I needed a runtime sine value at acceptable accuracy, with no table allowed and fast execution important, I would choose (1); and if I had more room (faster processing) and needed more accuracy, I would use (2). YMMV...
ps: since last night I have been trying to get a fast interpolated table-based sin value for real/run time use (not what the OP wants), including a novel deviation-correction trick ::) I still can't beat emece67's code even with the simplest interpolation in VB (his code also beats the VB built-in Sin function). Maybe I haven't tried hard enough :-// Hence I surrender fondling this thread, until next time, and cheers ;)...
-
Just out of interest, I verified that with scale = 13107, the following will produce exact results for power of two SPC, up to 8,388,608, with phase between 0 and 1:
#define TWOPI 6.28318530717958647692528676655900576839433879875021164
int sinp(double phase, double scale)
{
if (phase >= 0.5) {
phase -= 0.5;
scale = -scale;
}
if (phase > 0.25) {
phase = 0.5 - phase;
}
double x = phase * TWOPI;
const double xx = x*x;
x *= scale;
double result = x;
for (int i = 2; i < 14; i+=2) {
x /= (double)(-i*(i+1));
x *= xx;
result += x;
}
return (result < 0) ? (int)(result - 0.5) : (int)(result + 0.5);
}
with the table calculated via
for (i = 0; i < SPC; i++) {
sintab[i] = sinp((double)i / (double)(SPC), 13107.0);
}
That is, the sintab will be identical to when computed using standard C library sin() and
for (i = 0; i < SPC; i++) {
sintab[i] = round(13107.0 * sin(TWOPI * (double)i / (double)(SPC)));
}
Interestingly, if SPC is a power of two not greater than 65536, and you only have single-precision floating-point hardware, then
#define TWOPI 6.28318530717958647692528676655900576839433879875021164
static int sinpf(float phase, float scale)
{
if (phase >= 0.5f) {
phase -= 0.5f;
scale = -scale;
}
if (phase > 0.25f) {
phase = 0.5f - phase;
}
const int p = 65536.0f * phase;
const float half = (p == 7014 || p == 7437 || p == 7689 || p == 8774 || p == 9447 ||
p == 11382 || p == 13188 || p == 14289) ? 0.0f : 0.5f;
float x = phase * (float)(TWOPI);
const float xx = x*x;
x *= scale;
float result = x;
for (int i = 2; i < 12; i+=2) {
x /= (float)(-i*(i+1));
x *= xx;
result += x;
}
return (result < 0) ? (int)(result - half) : (int)(result + half);
}
with the table calculated via
for (i = 0; i < SPC; i++) {
sintab[i] = sinpf((float)i / (float)(SPC), 13107.0f);
}
will match the previous sintab[] exactly.
(For power of two periods up to 8192, no special cases for half are needed. For period 16384, there is one special case. For period 32768, there are 4 special cases. For period 65536, there are 8 special cases. Above that, both +1 and -1 do occur.)
When half is fixed to 0.5f, sinpf(x, 13107.0f) is correct in 99.998% of all possible float x between 0 and 1. The rest, 24624 of 1065353218 cases, differ by at most ±1.
Interestingly, sinp(x, 13107.0) does have 11 cases of float x between 0 and 1 (1,065,353,218, total) where the result is incorrect: 0x1.c06d98p-3 (0.21895903), 0x1.cd1914p-3 (0.22514549), 0x1.e702ccp-3 (0.23779830), 0x1.ee37c4p-3 (0.24131730), 0x1.f4f90ap-3 (0.24461563), 0x1.08e41ep-2 (0.25868270), 0x1.0c7e9ap-2 (0.26220170), 0x1.197376p-2 (0.27485451), and 0x1.1fc934p-2 (0.28104097) all yield a result that is too large by +1; and 0x1.701b66p-1 (0.71895903) and 0x1.8fe49ap-1 (0.78104097) yield a result that is too small by -1.
-
Interesting observation about
for (int n=3; sin != old_sin; n+=2)
doing a floating point inequality comparison. Perhaps one should limit n to say 100, which must be a ridiculously large value anyway, but it would prevent the loop hanging?
for (int n=3; sin != old_sin; n+=2)
{
    old_sin = sin;
    term *= (rad*rad) / (-n * (n-1));
    sin += term;
    if (n>100) break;
}
-
But adding them up forwards [...] is *still* better (and also makes my_sin_inner() an iteration not a recursion) ...
I assume you verified FLT_EVAL_METHOD==0 (so that float operations are done at float precision and range)?
I did better than that -- looked at the code.
0000000100003d20 _my_sin_inner:
100003d20: e9 23 be 6d stp d9, d8, [sp, #-32]!
100003d24: fd 7b 01 a9 stp x29, x30, [sp, #16]
100003d28: fd 43 00 91 add x29, sp, #16
100003d2c: 03 1c a0 4e mov.16b v3, v0
100003d30: 08 04 00 51 sub w8, w0, #1
100003d34: 08 fd 00 1b mneg w8, w8, w0
100003d38: 00 01 22 1e scvtf s0, w8
100003d3c: 40 18 20 1e fdiv s0, s2, s0
100003d40: 08 08 21 1e fmul s8, s0, s1
100003d44: 00 29 23 1e fadd s0, s8, s3
100003d48: 00 20 23 1e fcmp s0, s3
100003d4c: c0 00 00 54 b.eq #24 <_my_sin_inner+0x44>
100003d50: 00 08 00 11 add w0, w0, #2
100003d54: 01 1d a8 4e mov.16b v1, v8
100003d58: f2 ff ff 97 bl #-56 <_my_sin_inner>
100003d5c: 00 29 20 1e fadd s0, s8, s0
100003d60: 02 00 00 14 b #8 <_my_sin_inner+0x48>
100003d64: 00 1d a8 4e mov.16b v0, v8
100003d68: fd 7b 41 a9 ldp x29, x30, [sp, #16]
100003d6c: e9 23 c2 6c ldp d9, d8, [sp], #32
100003d70: c0 03 5f d6 ret
It's using s register names not d register names, so it's single precision.
The smaller the difference between successive terms with alternating signs, the more it helps with domain cancellation errors to sum them in descending order of magnitude. Ascending order of magnitude is usually better if the terms all have the same sign.
With factorial on the bottom line and powers of something never much bigger than 1.0 on the top line (1.57 if we restrict to the first quadrant) each successive term is much smaller than the previous one.
It would be a completely different matter for something that converges slowly, such as pi = 4 * (1 - 1/3 + 1/5 - 1/7 ...)
-
doing a floating point inequality comparison. Perhaps one should limit n to say 100, which must be a ridiculously large value anyway, but it would prevent the loop hanging?
It can't hang. The peculiar properties of computer floating point arithmetic are the whole reason for doing it like this -- finding out when A + B == A so that you know B has become too small to matter.
It also doesn't get anywhere near N=100. The maximum value in the first quadrant is N=15 i.e. 7 iterations/recursions on about 10% of the values.
-
1) emece67 is using terms up to X^5 with tuned (fitted?) constants instead of the true inverse-factorial values (I guess to reduce the error as much as possible in the 1st quadrant (0-pi/2)). This should run fastest in real/run time, I guess...
In fact using three terms of 1, 3 & 5 degree. Coeffs were computed using a variant of the Remez algorithm that uses as weighting function the same function to be approximated, thus turning the equi-ripple behavior of the basic Remez method into an "equi-correct_digits" one.
yup, I stand corrected in my earlier post. This is a FU moment from trying to post before bed time... Looking back at the posts: yours uses an X^5 tuned-coefficient approximation, and Nominal Animal's goes up to X^9 with the textbook 1/factorial coefficients. Earlier post corrected... thanks.
Interesting observation about
for (int n=3; sin != old_sin; n+=2)
doing a floating point inequality comparison. Perhaps one should limit n to say 100, which must be a ridiculously large value anyway, but it would prevent the loop hanging?
yes, since performance is not mandatory and this is calculated only once, imho you may be wise to add some safety guard. As bruce said, 100 is maybe too large; maybe something like X^19 is good enough? I've not been coding for a while, so maybe this is the syntax?
for (int n=3; sin != old_sin, n < 20; n+=2)
-
Error (GCC): left-hand operand of comma expression has no effect [-Wunused-value]
This works:
for (int n=3; (sin != old_sin) && (n < 30); n+=2)
-
This:
x2 = x * x;
sinx = x2 * 7.628645e-03f; // 7.628644736609089e-03
sinx -= 0.1660249f; // 0.1660248982778021
sinx *= x2;
sinx += 0.9999130f; // 0.9999130398499662
sinx *= x;
being all variables 32 bit floats, gives a maximum error about 0.013 % over [0, π/2], x in radians. You can extend the allowed input range for x (to 0~2π) prior to entering such code.
Hope this helps, regards.
Underrated comment. By far the simplest and most elegant solution, given op's accuracy requirements. No loops, no branching, just a few hard coded newton-raphson iterations.
-
By n/2, do you mean PI/2 ?
I know N-R converges very fast on a square root.
-
The Taylor series polynomial of a transcendental function, truncated at some finite term, is generally not the best approximating polynomial of that order. Approximating polynomials of lower degree giving errors less than the truncated Taylor series are called 'economised polynomials'
Abramowitz & Stegun give economised polynomial approximations for the sine and (many) other functions. The simplest such (4.3.96), using only two coefficients, gives more than enough accuracy for the OP's requirements, and with suitable scaling, can be implemented in 32 bit fixed-point arithmetic. Note that it gives an approximation for sin(x)/x - when you finish, don't forget to multiply by the number you first thought of! ;)
-
The Taylor series polynomial of a transcendental function, truncated at some finite term, is generally not the best approximating polynomial of that order. Approximating polynomials of lower degree giving errors less than the truncated Taylor series are called 'economised polynomials'
Abramowitz & Stegun give economised polynomial approximations for the sine and (many) other functions. The simplest such (4.3.96), using only two coefficients, gives more than enough accuracy for the OP's requirements, and with suitable scaling, can be implemented in 32 bit fixed-point arithmetic. Note that it gives an approximation for sin(x)/x - when you finish, don't forget to multiply by the number you first thought of! ;)
Note that the code I posted works the same way as the Abramowitz & Stegun example you cite (and the coefficients are almost the same).
You are quite right - there is a strong family resemblance! However, the A&S formulation is better suited to fixed-point implementation. I think the first computers with floating-point hardware didn't arrive until after 1955, and the original paper (which I don't have a copy of) was probably aimed at users of desk calculating machines
-
Just put in a table of the first octant that is a power of two long, at the highest resolution you want. The array argument has a range of 0 to 2^n.
You would have to scale the input values to be max at 2^n - 1.
Determine the octant of the argument by shifting right by n.
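A related minimal sketch using quadrant (quarter-wave) folding with a single table (quarter_sine[] and QUARTER are hypothetical names; the table holds 0..pi/2 inclusive, i.e. QUARTER+1 entries):

#include <stdint.h>

#define QUARTER 256                              /* entries per quarter cycle        */
extern const int16_t quarter_sine[QUARTER + 1];  /* sin() over 0..pi/2, inclusive    */

/* i counts 0 .. 4*QUARTER-1 over one full cycle. */
static int16_t sin_from_quarter_table(unsigned i)
{
    unsigned quadrant = (i / QUARTER) & 3;
    unsigned j = i % QUARTER;
    switch (quadrant) {
    case 0:  return  quarter_sine[j];
    case 1:  return  quarter_sine[QUARTER - j];
    case 2:  return -quarter_sine[j];
    default: return -quarter_sine[QUARTER - j];
    }
}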
-
Just put in a table of the first octant that is a power of two long, at the highest resolution you want. The array argument has a range of 0 to 2^n.
You would have to scale the input values to be max at 2^n - 1.
Determine the octant of the argument by shifting right by n.
quadrant?
-
In case anyone is interested, I found single-precision coefficients for a sine approximation (0.99968741f*x - 0.16566247f*x**3 + 0.0075119016f*x**5) with absolute error less than ±0.0000706 over all unique single-precision floating point numbers in the first quadrant (x from 0 to PI/2, or from -PI/2 to PI/2 since sin is an odd ("antisymmetric") function):
float sinf_first_quadrant(float x)
{
const float xx = x * x;
float result = x * 0.0075119016f;
result += -0.16566247f;
result *= xx;
result += 0.99968741f;
return result * x;
}
float sinp(float phase, float scale)
{
if (phase < 0.0f) {
phase = -phase;
scale = -scale;
}
if (phase >= 1.0f) {
phase -= (int)phase;
}
if (phase >= 0.5f) {
phase -= 0.5f;
scale = -scale;
}
if (phase > 0.25f) {
phase = 0.5f - phase;
}
return scale * sinf_first_quadrant(phase * 6.2831853f);
}
int sinip(float phase, float scale)
{
if (phase < 0.0f) {
phase = -phase;
scale = -scale;
}
if (phase >= 1.0f) {
phase -= (int)phase;
}
if (phase >= 0.5f) {
phase -= 0.5f;
scale = -scale;
}
if (phase > 0.25f) {
phase = 0.5f - phase;
}
float result = scale * sinf_first_quadrant(phase * 6.2831853f);
/* return (result < 0.0f) ? (result - 0.5f) : (result + 0.5f); */
return roundf(result);
}
This means that you can scale the result by up to 14178 or so, and when rounded, the result will be within ±1 of round(scale * sin(x)), regardless of the array size.
sinp(p, 1.0f) differs from sinf(p*6.2831853f) by at most ±0.0000706 within -2*PI to +2*PI, but
sinp(x/6.2831853f, 1.0f) can differ from sinf(x) by at most ±0.000071 within -2*PI to +2*PI.
For larger ranges, note that for sinp(), 2*PI is exactly 6.2831853f. (Near ±200*PI, the error increases to almost ±0.00011.)
You can use sinip(p, s) to approximate (int)round(s*sin(p*2*PI)) with any finite p: but note the above accuracy limitation if you go far from zero. I suggest using it to populate an array with a full sine wave, using a simple loop,
for (int i = 0; i < N; i++) sintab[i] = sinip((float)i / (float)N, S);
where N is the size of the table (one full period), and S is the maximum amplitude.
There are only 1,070,141,419 unique float values between 0 and PI/2, so I verified the absolute errors brute-force; and sinip() with a few hundred million pseudorandom numbers.
-
float sinf_first_quadrant(float x)
{
    const float xx = x * x;
    float result = xx * 0.0075119016f;
    result += -0.16566247f;
    result *= xx;
    result += 0.99968741f;
    return result * x;
}
[...]
You can compute the coefficients with this program:
import numpy as np

num_nodes = 6
interval = [-np.pi/2, np.pi/2]

def f(x):
    return np.sin(x)

def chebyshev_nodes(n, interval):
    """Return n chebyshev nodes over interval [a, b]"""
    i = np.arange(n)
    x = np.cos((2*i+1)*np.pi/(2*n))   # nodes over interval [-1,1]
    a, b = interval
    return 0.5*(b-a)*x + 0.5*(b+a)    # nodes over interval [a, b]

def main():
    x_dots = chebyshev_nodes(num_nodes, interval)
    y_dots = f(x_dots)
    degrees = np.arange(1, num_nodes, 2)   # Only odd coefficients
    poly_coefs = np.polynomial.polynomial.polyfit(x_dots, y_dots, degrees)
    print(poly_coefs)

main()
Result:
[ 0. 0.99991153 0. -0.16602 0. 0.00762666]
Edit: I have modified the program so that it only calculates the odd coefficients of the polynomial.
-
The advantage of my method is that you can get coefficients of any function with the precision you want in any range you want.
-
I think there will be an unbreakable limit if using 5th-degree polynomials... can you fit better than 0.1% over a larger range of angle with 5th degree (x^5)? I guess the solution is to increase the degree, to say x^7 or x^9.
-
Yes. That is what I meant.
import numpy as np

num_nodes = 8
interval = [-np.pi/2, np.pi/2]

def f(x):
    return np.sin(x)

def chebyshev_nodes(n, interval):
    """Return n chebyshev nodes over interval [a, b]"""
    i = np.arange(n)
    x = np.cos((2*i+1)*np.pi/(2*n))   # nodes over interval [-1,1]
    a, b = interval
    return 0.5*(b-a)*x + 0.5*(b+a)    # nodes over interval [a, b]

def main():
    x_dots = chebyshev_nodes(num_nodes, interval)
    y_dots = f(x_dots)
    degrees = np.arange(1, num_nodes, 2)   # Only odd coefficients
    poly_coefs = np.polynomial.polynomial.polyfit(x_dots, y_dots, degrees)
    for i in range(len(poly_coefs)):
        coef = poly_coefs[i]
        print(f"a_{i} = {coef:.14g}")

main()
Result:
a_0 = 0
a_1 = 0.99999923706153
a_2 = 0
a_3 = -0.16665676500414
a_4 = 0
a_5 = 0.0083131914143761
a_6 = 0
a_7 = -0.000185225393246
float sinf_first_quadrant(float x)
{
    const float xx = x * x;
    float result = -xx * 0.000185225393246f;
    result += 0.0083131914143761f;
    result *= xx;
    result += -0.16665676500414f;
    result *= xx;
    result += 0.99999923706153f;
    return result * x;
}
-
You can use the following C (C99 or later) program to find the maximum absolute error of a single-precision odd power series expansion of the sine function in the first quadrant, with at least three coefficients (fifth degree polynomial).
Because this considers the absolute error and not the relative error as is usual, this is best suited for comparing approximations that calculate integer-valued samples of a sine wave. The reported amplitude limit is approximately the maximum integer amplitude where the sample error does not exceed ±1. I would not use this for comparing sine approximations for general use.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <math.h>
#ifndef MAX_COEFFS
#define MAX_COEFFS 8
#endif
static inline float approx(float x, const float coeff[], int n)
{
const float xx = x * x;
float result = coeff[--n] * xx;
while (n-->1) {
result += coeff[n];
result *= xx;
}
result += coeff[0];
return result * x;
}
static const char *floatstring(float value)
{
static char buffer[1024];
int decimals = 1;
float parsed;
while (1) {
int len = snprintf(buffer, sizeof buffer, "%.*f", decimals, value);
if (len < 1 || len >= (int)sizeof buffer)
break;
errno = 0;
parsed = strtof(buffer, NULL);
if (errno)
break;
if (value == parsed)
return (const char *)buffer;
decimals++;
}
return "(error)";
}
static inline const char *skipws(const char *src)
{
if (src)
while (*src == '\t' || *src == '\n' || *src == '\v' ||
*src == '\f' || *src == '\r' || *src == ' ')
src++;
return src;
}
static int parse_float(const char *src, float *to)
{
const char *end;
float val;
src = skipws(src);
if (!src || !*src)
return -1;
end = src;
errno = 0;
val = strtof(src, (char **)&end);
if (errno)
return -1;
if (!end || end == src)
return -1;
end = skipws(end);
if (*end == 'f')
end = skipws(end + 1);
if (*end)
return -1;
if (!isfinite(val))
return -1;
if (to)
*to = val;
return 0;
}
static inline float fabsmin2(const float a, const float b)
{
const float f1 = fabsf(a), f2 = fabsf(b);
return (f1 < f2) ? f1 : f2;
}
int main(int argc, char *argv[])
{
int coeffs;
float coeff[MAX_COEFFS];
if (argc < 4 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
const char *arg0 = (argc > 0 && argv && argv[0] && argv[0][0]) ? argv[0] : "(this)";
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s [ -h | --help ]\n", arg0);
fprintf(stderr, " %s C1 C3 C5 [...]\n", arg0);
fprintf(stderr, "\n");
fprintf(stderr, "This program compares the sin approximation\n");
fprintf(stderr, " C1*x + C3*x*x*x + C5*x*x*x*x*x [ + ... ]\n");
fprintf(stderr, "to the correct value of sin, at single precision (float).\n");
fprintf(stderr, "for each possible x between 0 and PI/2, and reports\n");
fprintf(stderr, "the maximum absolute deviation.\n");
fprintf(stderr, "\n");
fprintf(stderr, "Example:\n");
fprintf(stderr, " %s 0.99968741 -0.16566247 0.0075119016\n", arg0);
fprintf(stderr, "\n");
return EXIT_SUCCESS;
}
coeffs = argc - 1;
if (coeffs > MAX_COEFFS) {
fprintf(stderr, "Too many coefficients. Maximum limit is %d.\n", MAX_COEFFS);
return EXIT_FAILURE;
}
for (int i = 0; i < coeffs; i++) {
if (parse_float(argv[i+1], coeff + i)) {
fprintf(stderr, "%s: Invalid coefficient C%d.\n", argv[i+1], 2*i+1);
return EXIT_FAILURE;
}
}
int n = 0;
float x = (float)(0.5 * 3.14159265358979323846);
float smin = 1.0f, s;
double err, err_min = -0.0, err_max = +0.0;
do {
s = approx(x, coeff, coeffs);
err = (double)s - sin((double)x);
if (err_min > err) { err_min = err; smin = 0.25f * fabsmin2(err_min, err_max); }
if (err_max < err) { err_max = err; smin = 0.25f * fabsmin2(err_min, err_max); }
++n;
if (!(n & 16777215)) {
fprintf(stderr, "\r%+.9f .. %+.9f (%.9f to PI/2) ", err_min, err_max, x);
fflush(stderr);
}
x = nextafterf(x, -1.0f);
} while (x > 0.0f && s > smin);
fprintf(stderr, "\n");
fflush(stderr);
printf("Coefficients:");
for (int i = 0; i < coeffs; i++)
printf(" %s", floatstring(coeff[i]));
printf("\nMaximum absolute error: %s", floatstring((float)err_min));
printf(" .. %s\n", floatstring((float)err_max));
if (err_max < -err_min) err_max = -err_min;
printf("Approximate amplitude limit: %.0f\n", 1.0 / err_max - 0.5);
return EXIT_SUCCESS;
}
It evaluates the polynomial at single precision, and compares to the standard C library sin() function evaluated at double precision.
This assumes that the standard C library sin() function uses correct rounding per IEEE 754, so its error in the first quadrant is within one unit in the least significant place. Typically, standard C library trigonometric functions are accurate, but not the fastest possible; if you are using a commonly used one, you should be safe assuming its implementation is accurate. I am using the GNU C library, which provides this (unless you tell the compiler to use unsafe math optimizations). I verified this with a brute-force __float128 (quadmath) sine power series approximation, too.
The iteration starts at PI/2, and checks every unique float value in decreasing order, until the series yields a value less than a quarter of the smaller error bound. At this point, for any sane approximation with first coefficient around one (say, 0.75 to 1.5), the argument is so small that a new maximum absolute error cannot be reached anymore even for the smaller error bound. That is, this limit is based on mathematical analysis of sine approximations, and not a "I think this suffices" one.
On my machine, this can evaluate the error bounds for three coefficients in about two and a half seconds; four coefficients in about three seconds.
To compile it using GCC, I use
gcc -Wall -O2 approx.c -lm -o approx
and using clang,
clang -Wall -O2 approx.c -lm -o approx
They produce identical results for the few cases I tried.
To produce better approximations, I use polynomial fit to x = 0.5*PI*(1 - (i/N)**K), y = sin(x) for N=100000, i=0..N, varying K around 1.8 or so. This way the fit emphasizes larger values of x, with K acting as the weight.
Results from my testing runs with three coefficients:
- Edited: Best coefficients I've found (K=1.84697)
Coefficients: 0.99968743 -0.16566254 0.0075119236
Maximum absolute error: -0.000070561255 .. 0.000070453505
Approximate amplitude limit: 14172
- My suggested coefficients
Coefficients: 0.99968743 -0.16566247 0.0075119017
Maximum absolute error: -0.00007055788 .. 0.000070572176
Approximate amplitude limit: 14169
- Picuino
Coefficients: 0.99991155 -0.16602 0.00762666
Maximum absolute error: -0.00011782573 .. 0.0001343489
Approximate amplitude limit: 7443
- emece67
Coefficients: 0.99991304 -0.1660249 0.007628645
Maximum absolute error: -0.00011876599 .. 0.00013673306
Approximate amplitude limit: 7313
- Abramowitz & Stegun 4.3.96
Coefficients: 1.0 -0.16605 0.00761
Maximum absolute error: -0.00016428306 .. 0.00012107952
Approximate amplitude limit: 6087
- Truncated power series
Coefficients: 1.0 -0.16666667 0.008333334
Maximum absolute error: -0.000000011740082 .. 0.004524946
Approximate amplitude limit: 220
Results from my testing with four coefficients:
- Edited: New best coefficients I've found (K=2.109)
Coefficients: 0.9999963 -0.16664736 0.008305594 -0.00018346867
Maximum absolute error: -0.00000068000065 .. 0.00000068940767
Approximate amplitude limit: 1450520
- (K=1.8390)
Coefficients: 0.9999965 -0.166648 0.008306158 -0.00018360758
Maximum absolute error: -0.00000074344683 .. 0.0000007067032
Approximate amplitude limit: 1345086
- Picuino
Coefficients: 0.9999992 -0.16665676 0.008313191 -0.00018522539
Maximum absolute error: -0.0000013112358 .. 0.0000012125032
Approximate amplitude limit: 762639
- Truncated power series
Coefficients: 1.0 -0.16666667 0.008333334 -0.0001984127
Maximum absolute error: -0.00015699863 .. 0.00000003811595
Approximate amplitude limit: 6369
Note that the three-term polynomial takes four multiplications and two additions, and the four-term polynomial five multiplications and three additions, using single-precision floating-point; with single-precision hardware multiplication and addition, these are surprisingly fast.
-
i use brucehoult true taylor series loop as reference (considered as true sin).
You need higher precision to evaluate the power series accurately. Each term you calculate is subject to rounding and domain cancellation error due to limited precision, and the only way around that is more precision.
That is, the series expansion does NOT give a correct approximation up to the precision available in the floating-point type, unlike library sin() and cos() functions are supposed to.
I use GCC 7.5.0 and its quad-precision math library (https://gcc.gnu.org/onlinedocs/gcc-7.5.0/libquadmath/) with
static __float128 sin_series(__float128 x)
{
    __float128 xx = -(x*x);
    __float128 result = x, old;
    int terms = 0;
    do {
        terms++;
        old = result;
        x /= (__float128)(2*terms*(2*terms + 1));
        x *= xx;
        result += x;
    } while (result != old);
    /* Omitted: Update min and max 'terms' */
    return result;
}
with
sin power series expansion using __float128 terms that affect the result: x / (2*PI)
37 terms (x - x^3/3! + x^5/5! ... - x^71/71! + x^73/73!) up to 2*PI (0.500 .. 1.000)
29 terms (x - x^3/3! + x^5/5! ... - x^55/55! + x^57/57!) up to PI (0.250 .. 0.500)
18 terms (x - x^3/3! + x^5/5! ... + x^33/33! - x^35/35!) up to PI/2 (0.125 .. 0.250)
[ 15 terms (x - x^3/3! + x^5/5! ... - x^27/27! + x^29/29!) up to PI/4 (0.000 .. 0.125) ]
cos power series expansion using __float128 terms that affect the result:
31 terms (1 - x^2/2! + x^4/4! ... - x^58/58! + x^60/60!) up to 2*PI (0.500 .. 1.000)
24 terms (1 - x^2/2! + x^4/4! ... + x^44/44! - x^46/46!) up to PI (0.250 .. 0.500)
24 terms (1 - x^2/2! + x^4/4! ... + x^44/44! - x^46/46!) up to PI/2 (0.125 .. 0.250)
[ 15 terms (1 - x^2/2! + x^4/4! ... - x^26/26! + x^28/28!) up to PI/4 (0.000 .. 0.125) ]
but obviously only verified around the most problematic key values of x, and only uniform-randomly checked elsewhere (a few hundred million points per number of terms, in the given ranges of x).
Fewer terms are needed to match correctly-rounded IEEE-754 double-precision sin() and cos(), though. (For the first quadrant of the sine approximation, i.e. x in 0 to PI/2, or -PI/2 to PI/2, 11 terms seems to be enough. Cosine needs 17 near ±PI/2, something like 14 around 1. The degree, obviously, is 2*terms-1 for sine, and 2*terms-2 for cosine.)
I could compare the series obtained thus way to libquadmath sinq() and cosq(), as they are supposed to produce results with at most 1 ULP of error, but only by sampling; the parameter space is just too large by many orders of magnitude to check thoroughly.
-
i use brucehoult true taylor series loop as reference (considered as true sin).
You need higher precision to evaluate the power series accurately. Each term you calculate is subject to rounding and domain cancellation error due to limited precision, and the only way around that is more precision.
That is, the series expansion does NOT give a correct approximation up to the precision available in the floating-point type, unlike library sin() and cos() functions are supposed to.
Sure it's not as exact as library functions are supposed to be, but the OP didn't need that. He only needed a 16 bit result (15 plus sign, effectively) and wasn't even asking for that to be fully accurate.
My code in single precision never needs more than 6 terms (in addition to the angle itself) to converge in the first quadrant. Each term only does one multiply, one divide, one add, so shouldn't be adding more than 1.5 ULP error per term on average, and probably less than 1.0. So that's around maybe 15 ULP (i.e. 4 bits) worst case, on a 23 bit result when only 15 are needed.
As we have already seen, this results in rounding the 15 bit result the wrong way for only about 50 out of 65536 values.
If you've got an FPU with only single precision hardware support as the OP's STM has then it would be much faster to keep a table of the 50 values that are off by 1 bit and do a binary search on it than it would be to use a higher precision calculation.
Note that using a Chebyshev polynomial is still going to give you the same 1.5 ULP error per term: one in the constant, one from the multiply, one from the add. It just might be slightly fewer terms.
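For illustration, a sketch of that fix-up-table idea (the index/delta pairs are placeholders, not real measured values):

#include <stdint.h>
#include <stdlib.h>

struct fix { uint16_t index; int8_t delta; };

/* Entries must be sorted by index for bsearch(); values here are placeholders. */
static const struct fix fixups[] = { {123, +1}, {456, -1} };

static int cmp_fix(const void *key, const void *elem)
{
    uint16_t k = *(const uint16_t *)key;
    uint16_t e = ((const struct fix *)elem)->index;
    return (k > e) - (k < e);
}

int16_t corrected(int16_t raw, uint16_t i)
{
    const struct fix *f = bsearch(&i, fixups, sizeof fixups / sizeof fixups[0],
                                  sizeof fixups[0], cmp_fix);
    return f ? (int16_t)(raw + f->delta) : raw;
}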
-
i use brucehoult true taylor series loop as reference (considered as true sin).
You need higher precision to evaluate the power series accurately. Each term you calculate is subject to rounding and domain cancellation error due to limited precision, and the only way around that is more precision.
That is, the series expansion does NOT give a correct approximation up to the precision available in the floating-point type, unlike library sin() and cos() functions are supposed to.
Sure it's not as exact as library functions are supposed to be, but the OP didn't need that. He only needed a 16 bit result (15 plus sign, effectively) and wasn't even asking for that to be fully accurate.
Yup. I only objected to using it as the reference, and showed what I myself used for a reference.
I basically just wanted to show how I verified the discoveries that go beyond the OP's needs:
- With four single-precision floating-point multiplications and two additions, we have a sine approximation for single-precision sin(x) with x=-PI/2..+PI/2 with less than ±0.0000706 absolute error, or about 14.79 bits of precision (13.79 plus sign) in fixed point format
- Adding one more multiplication and addition drops the absolute error to ±0.00000069, giving 21.47 bits of precision (20.47 plus sign) in fixed point format.
If we want to get more accurate than that, we need enough extra bits of precision to avoid the rounding error affecting the result. Comparing approximations to approximations (and not to a known accurate function) is, well, silly.
I consider the OP's original question well answered already; this is just extending the subject past the stated, arbitrary limits.
As usual, I'm trying to add "useful" information in case someone else has a similar approximation need, but with say different error bounds.
I have not experimented with say Q.29 (twos complement fixed-point integers with 29 fractional bits), which might be interesting on integer-only hardware for sine approximation; or say Q.60 for accurate sine/cosine approximation. But, if I were to, the reference machinery I showed can easily be adapted to check those against standard library sin()/cos(), too.
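Purely as an illustration, an untested sketch of what such a Q.29 evaluation of a three-term odd polynomial might look like, with the plain Taylor values as stand-in coefficients (fitted coefficients would of course do better):
#include <stdint.h>

/* Sketch only: odd 3-term sine polynomial in Q.29 (two's complement,
   29 fractional bits), x in [-pi/2, pi/2] in Q.29 radians.
   Assumes arithmetic right shift of negative values, as on common compilers. */
static inline int32_t q29_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 29);
}

int32_t sin_q29(int32_t x)
{
    const int32_t c1 = 536870912;   /*  1.0   in Q.29 (stand-in) */
    const int32_t c3 = -89478485;   /* -1/6   in Q.29 (stand-in) */
    const int32_t c5 =   4473924;   /*  1/120 in Q.29 (stand-in) */
    int32_t xx = q29_mul(x, x);
    int32_t p  = q29_mul(c5, xx);
    p = q29_mul(p + c3, xx);
    return q29_mul(p + c1, x);
}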
-
There are so many incredibly clever people here!
I read some clever stuff like this in a book about the Apollo guidance system, which was done in fixed-point integers. They did have floats later, implemented with some kind of interpreted language, but since that CPU ran at something like 2 MHz it would have been slow, on the order of 100 ms.
-
In the case of calculators and computers, it is preferable to use the CORDIC code that was mentioned in another post at the beginning.
// Cordic in 16 bit signed fixed point math
// Function is valid for arguments in range -pi/2 -- pi/2
// for values pi/2--pi: value = half_pi-(theta-half_pi) and similarly for values -pi---pi/2
//
// Constants
#define cordic_1K16 0x26DD
#define half_pi16 0x6487
#define MUL16 16384.0
int cordic_ctab16 [] = {
0x3243, 0x1DAC, 0x0FAD, 0x07F5,
0x03FE, 0x01FF, 0x00FF, 0x007F,
0x003F, 0x001F, 0x000F, 0x0007,
0x0003, 0x0001, 0x0000, 0x0000,
};
void cordic16(int theta, int *s, int *c) {
int tx, ty;
int x=cordic_1K16;
int y=0;
int z=theta;
for (int k=0; k<16; ++k) {
tx = (x>>k);
ty = (y>>k);
if (z>0) {
x -= ty;
y += tx;
z -= cordic_ctab16[k];
}
else {
x += ty;
y -= tx;
z += cordic_ctab16[k];
}
}
*c = x;
*s = y;
}
// Cordic in 24 bit signed fixed point math
// Function is valid for arguments in range -pi/2 -- pi/2
// for values pi/2--pi: value = half_pi-(theta-half_pi) and similarly for values -pi---pi/2
//
// Constants
#define cordic_1K24 0x0026DD3B
#define half_pi24 0x006487ED
#define MUL24 4194304.00
int cordic_ctab24 [] = {
0x003243F6, 0x001DAC67, 0x000FADBA, 0x0007F56E,
0x0003FEAB, 0x0001FFD5, 0x0000FFFA, 0x00007FFF,
0x00003FFF, 0x00001FFF, 0x00000FFF, 0x000007FF,
0x000003FF, 0x000001FF, 0x000000FF, 0x0000007F,
0x0000003F, 0x0000001F, 0x0000000F, 0x00000007,
0x00000003, 0x00000001, 0x00000000, 0x00000000,
};
void cordic24(int theta, int *s, int *c) {
int tx, ty;
int x=cordic_1K24;
int y=0;
int z=theta;
for (int k=0; k<24; ++k) {
tx = (x>>k);
ty = (y>>k);
if (z>0) {
x -= ty;
y += tx;
z -= cordic_ctab24[k];
}
else {
x += ty;
y -= tx;
z += cordic_ctab24[k];
}
}
*c = x;
*s = y;
}
// Cordic in 32 bit signed fixed point math
// Function is valid for arguments in range -pi/2 -- pi/2
// for values pi/2--pi: value = half_pi-(theta-half_pi) and similarly for values -pi---pi/2
//
// Constants
#define cordic_1K32 0x26DD3B6A
#define half_pi32 0x6487ED51
#define MUL32 1073741824.0
int cordic_ctab32 [] = {
0x3243F6A8, 0x1DAC6705, 0x0FADBAFC, 0x07F56EA6,
0x03FEAB76, 0x01FFD55B, 0x00FFFAAA, 0x007FFF55,
0x003FFFEA, 0x001FFFFD, 0x000FFFFF, 0x0007FFFF,
0x0003FFFF, 0x0001FFFF, 0x0000FFFF, 0x00007FFF,
0x00003FFF, 0x00001FFF, 0x00000FFF, 0x000007FF,
0x000003FF, 0x000001FF, 0x000000FF, 0x0000007F,
0x0000003F, 0x0000001F, 0x0000000F, 0x00000008,
0x00000004, 0x00000002, 0x00000001, 0x00000000,
};
void cordic32(int theta, int *s, int *c) {
int tx, ty;
int x=cordic_1K32;
int y=0;
int z=theta;
for (int k=0; k<32; ++k) {
tx = (x>>k);
ty = (y>>k);
if (z>0) {
x -= ty;
y += tx;
z -= cordic_ctab32[k];
}
else {
x += ty;
y -= tx;
z += cordic_ctab32[k];
}
}
*c = x;
*s = y;
}
#include <math.h> // for testing only!
#include <stdio.h> // for testing only!
// Print out sin(x) vs fp CORDIC sin(x)
int main(int argc, char **argv) {
double p, err, serr;
int s,c;
printf("\nCordic 16 bits\n");
serr = 0;
for(int i=0;i<=50;i++) {
p = (i/50.0)*M_PI/2;
cordic16((p*MUL16), &s, &c);
err = 1000000*(s/MUL16-sin(p));
serr += fabs(err);
printf(" sin(%f)=%f err=%5.0f ppm\n", p, s/MUL16, err);
}
printf(" err_medio=%f ppm\n", serr/51);
printf("\nCordic 24 bits\n");
serr = 0;
for(int i=0;i<=50;i++) {
p = (i/50.0)*M_PI/2;
cordic24((p*MUL24), &s, &c);
err = 1000000000*(s/MUL24-sin(p));
serr += fabs(err);
printf(" sin(%f)=%f err=%5.0f ppb\n", p, s/MUL24, err);
}
printf(" err_medio=%f ppb\n", serr/51);
printf("\nCordic 32 bits\n");
serr = 0;
for(int i=0;i<=50;i++) {
p = (i/50.0)*M_PI/2;
cordic32((p*MUL32), &s, &c);
err = 1000000000*(s/MUL32-sin(p));
serr += fabs(err);
printf(" sin(%f)=%f err=%5.0f ppb\n", p, s/MUL32, err);
}
printf(" err_medio=%f ppb\n", serr/51);
}
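The header comments describe the fold from the outer quadrants into [-pi/2, pi/2], but the functions above don't perform it themselves. As a sketch, a wrapper for cordic16() doing that reduction, using the constants above (note the cosine sign must also be flipped in the outer quadrants), could look like:
/* Sketch: fold any angle in [-pi, pi] (same scaling as above,
   half_pi16 = pi/2) into the valid range before calling cordic16(). */
void cordic16_full(int theta, int *s, int *c) {
    int flip = 0;
    if (theta > half_pi16) {              /* angle in (pi/2, pi]   */
        theta = 2*half_pi16 - theta;      /* reflect about pi/2    */
        flip = 1;                         /* cosine changes sign   */
    } else if (theta < -half_pi16) {      /* angle in [-pi, -pi/2) */
        theta = -2*half_pi16 - theta;
        flip = 1;
    }
    cordic16(theta, s, c);
    if (flip)
        *c = -*c;
}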
-
Another wrinkle - not all errors are equally bad.
For instance, it would be a bad thing for an approximation over the interval [0..pi/2] to give a non-zero value for an argument of zero, because it gives rise to a discontinuity when the approximation is extended to negative values by applying the symmetries of the sin() function. Similarly (depending on the application) it could be really bad news if the approximation ever returned a value > 1.0 for any argument.
Chebyshev approximation, which guarantees maximum absolute error at the bounds of the interval, may be liable to such problems. As noted by @emece67, the original authors of the A&S approximation used a technique which forced the error at the bounds to be zero, presumably in an attempt to avoid them.
Numerical analysis is fun!
-
Updated program with error measurement.
import numpy as np
num_nodes = 6
interval = [-np.pi/2, np.pi/2]
def f(x):
return np.sin(x)
def chebyshev_nodes(n, interval):
"""Return n chebyshev nodes over interval [a, b]"""
i = np.arange(n)
x = np.cos((2*i+1)*np.pi/(2*n)) # nodes over interval [-1, 1]
a, b = interval
return 0.5*(b - a)*x + 0.5*(b + a) # nodes over interval [a, b]
def print_coefs(coefs):
for i in range(len(coefs)):
print(f"a_{i} = {coefs[i]:.14g}")
def test_errors(num_dots, interval, poly_coefs):
i = np.arange(num_dots+1)/num_dots
a, b = interval
x_dots = a + i*(b-a)
y_dots = f(x_dots)
y_poly_dots = np.polyval(poly_coefs, x_dots)
max_err_abs = 0
max_err_rel = 0
for i in range(len(x_dots)):
err_abs = np.abs(y_dots[i] - y_poly_dots[i])
if err_abs > max_err_abs:
max_err_abs = err_abs
if y_dots[i] != 0:
err_rel = err_abs / y_dots[i]
if err_rel > max_err_rel:
max_err_rel = err_rel
elif y_poly_dots[i] != 0:
max_err_rel = np.inf
print(f"\nMax absolute error = {max_err_abs:.14g}")
print(f"Max relative error = {max_err_rel:.14g}")
print(f"Max polynomial value = {max(y_poly_dots):.14g}")
print(f"Min polynomial value = {min(y_poly_dots):.14g}")
def main():
x_dots = chebyshev_nodes(num_nodes, interval)
y_dots = f(x_dots)
degrees = np.arange(1, num_nodes, 2) # Only odd coefficients
poly_coefs = np.polynomial.polynomial.polyfit(x_dots, y_dots, degrees)
print_coefs(poly_coefs)
test_errors(10000, [0, interval[1]], np.flip(poly_coefs))
main()
Result:
a_0 = 0
a_1 = 0.9999115283796
a_2 = 0
a_3 = -0.16602000425895
a_4 = 0
a_5 = 0.0076266621511779
Max absolute error = 0.0001342309422232
Max relative error = 0.0001342309422232
Max polynomial value = 1.0001342309422
Min polynomial value = 0
-
Similarly (depending on the application) it could be really bad news if the approximation ever returned a value > 1.0 for any argument.
Good point.
This is also related to absolute error as a measurement. For basic sine wave synthesis, it is the absolute error that matters most, and exceeding 1.0 is not a problem; even the rounding differences just add a tiny bit of noise to the output.
For most other use cases, relative error is what actually matters.
As an example, when calculating versors (unit quaternions) or rotation matrices from axis and angle, sine approximation exceeding 1.0 would definitely be a problem.
The odd power series expansion, \$f(x) = \sum_{k=1}^{N} C_{2 k - 1} x^{2 k - 1}\$, has the nice feature that it is guaranteed to be odd (and thus be zero at zero), given nonzero coefficients \$C\$. You can define the highest-degree coefficient explicitly as $$C_{2 N - 1} = \frac{2^{2 N - 1}}{\pi^{2 N - 1}} - \sum_{k=1}^{N-1} \frac{2^{2 N - 2 k} }{\pi^{2 N - 2 k}} C_{2 k - 1}$$when defining the polynomial to fit to, to ensure the series yields 1 at \$\pm\pi/2\$. (I prefer notation where the subscript matches the power of x.)
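As a small sketch of using that constraint, here the plain Taylor values 1 and -1/6 stand in for the lower coefficients, purely as an example:
#include <math.h>
#include <stdio.h>

/* Sketch: compute the highest-degree coefficient C_(2N-1) from the
   lower odd coefficients so that the series is exactly 1 at x = pi/2,
   per the formula above. */
int main(void)
{
    const double two_over_pi = 2.0 / 3.14159265358979323846;
    const double c_low[] = { 1.0, -1.0 / 6.0 };   /* C1, C3 (example values) */
    const int N = 3;                              /* total number of odd terms */

    double c_top = pow(two_over_pi, 2 * N - 1);
    for (int k = 1; k <= N - 1; k++)
        c_top -= pow(two_over_pi, 2 * N - 2 * k) * c_low[k - 1];

    printf("C_%d = %.17g\n", 2 * N - 1, c_top);   /* about 0.00786 for this example */
    return 0;
}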
-
By putting the interpolation nodes at the beginning and at the end of the interval, the polynomial has a little bit more relative error, but it ensures that it does not exceed 1.
import numpy as np
num_nodes = 6
interval = [-np.pi/2, np.pi/2]
def f(x):
return np.sin(x)
def chebyshev_nodes(n, interval):
"""Return n chebyshev nodes over interval [a, b]"""
i = np.arange(n)
x = np.cos((2*i)*np.pi/(2*(n-1))) # nodes over interval [-1, 1]
a, b = interval
return 0.5*(b - a)*x + 0.5*(b + a) # nodes over interval [a, b]
def print_coefs(coefs):
for i in range(len(coefs)):
print(f"a_{i} = {coefs[i]:.14g}")
def test_errors(num_dots, interval, poly_coefs):
i = np.arange(num_dots+1)/num_dots
a, b = interval
x_dots = a + i*(b-a)
y_dots = f(x_dots)
y_poly_dots = np.polyval(poly_coefs, x_dots)
max_err_abs = 0
max_err_rel = 0
for i in range(len(x_dots)):
err_abs = np.abs(y_dots[i] - y_poly_dots[i])
if err_abs > max_err_abs:
max_err_abs = err_abs
if y_dots[i] != 0:
err_rel = err_abs / y_dots[i]
if err_rel > max_err_rel:
max_err_rel = err_rel
elif y_poly_dots[i] != 0:
max_err_rel = np.inf
print(f"\nMax absolute error = {max_err_abs:.14g}")
print(f"Max relative error = {max_err_rel:.14g}")
print(f"Max polynomial value = {max(y_poly_dots):.14g}")
print(f"Min polynomial value = {min(y_poly_dots):.14g}")
def main():
x_dots = chebyshev_nodes(num_nodes, interval)
y_dots = f(x_dots)
degrees = np.arange(1, num_nodes, 2) # Only odd coefficients
poly_coefs = np.polynomial.polynomial.polyfit(x_dots, y_dots, degrees)
print_coefs(poly_coefs)
test_errors(10000, [0, interval[1]], np.flip(poly_coefs))
main()
Output:
a_0 = 0
a_1 = 0.99982457406782
a_2 = 0
a_3 = -0.16573991207075
a_4 = 0
a_5 = 0.0075133914860062
Max absolute error = 0.00013030270177095
Max relative error = 0.00017542591003003
Max polynomial value = 1
Min polynomial value = 0
-
Similarly (depending on the application) it could be really bad news if the approximation ever returned a value > 1.0 for any argument.
Good point.
This is also related to absolute error as a measurement. For basic sine wave synthesis, it is the absolute error that matters most, and exceeding 1.0 is not a problem; even the rounding differences just add a tiny bit of noise to the output.
For most other use cases, relative error is what actually matters.
For sin() you can easily control the relative error in the series expansion by instead calculating a series for sin(x)/x and multiplying by x at the end.
When calculating using Horner's method, this is as simple as setting the first term to 1.0 instead of x, which has the effect of dividing every term by x.
Then multiply the final sum by x.
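A sketch of that structure with plain Taylor coefficients, just to illustrate the mechanics (not tuned values):
/* Sketch: Taylor series of sin(x)/x evaluated with Horner's method,
   multiplied by x at the end, so the relative error stays bounded
   near x = 0. */
float sin_via_sinc(float x)
{
    const float xx = x * x;
    float p =  1.0f / 362880.0f;        /* + x^8/9!  term of sin(x)/x */
    p = p * xx - 1.0f / 5040.0f;        /* - x^6/7!                   */
    p = p * xx + 1.0f / 120.0f;         /* + x^4/5!                   */
    p = p * xx - 1.0f / 6.0f;           /* - x^2/3!                   */
    p = p * xx + 1.0f;                  /* first term is 1.0, not x   */
    return p * x;                       /* multiply by x at the end   */
}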
-
Updated program with graphical representation of errors (absolute and relative):
import matplotlib.pyplot as plt
import numpy as np
num_nodes = 6
interval = [-np.pi/2, np.pi/2]
def f(x):
return np.sin(x)
def chebyshev_nodes(n, interval, closed_interval=False):
"""Return n chebyshev nodes over interval [a, b]"""
i = np.arange(n)
if closed_interval:
x = np.cos((2*i)*np.pi/(2*(n-1))) # nodes over closed interval [-1, 1]
else:
x = np.cos((2*i+1)*np.pi/(2*n)) # nodes over open interval (-1, 1)
a, b = interval
return 0.5*(b - a)*x + 0.5*(b + a) # nodes over interval [a, b]
def print_coefs(coefs):
print("\nCoefficients:")
for i in range(len(coefs)):
print(f" a_{i} = {coefs[i]:.14g}")
def tics(interval, numtics):
a, b = interval
return np.linspace(a, b, numtics)
def plot_func(x, y, err_abs, err_rel):
fig, ax = plt.subplots(3)
fig.subplots_adjust(left=0.1, bottom=0.05, right=0.95, top=0.95, wspace=None, hspace=0.35)
ax[0].plot(x, y, linewidth=1)
ax[0].set_title('Function')
ax[0].spines['left'].set_position('zero')
ax[0].spines['right'].set_color('none')
ax[0].spines['bottom'].set_position('zero')
ax[0].spines['top'].set_color('none')
ax[1].plot(x, err_abs, linewidth=1)
ax[1].set_title('Polynomial absolute error')
ax[1].spines['left'].set_position('zero')
ax[1].spines['right'].set_color('none')
ax[1].spines['bottom'].set_position('zero')
ax[1].spines['top'].set_color('none')
ax[2].plot(x, err_rel, linewidth=1)
ax[2].set_title('Polynomial relative error')
ax[2].spines['left'].set_position('zero')
ax[2].spines['right'].set_color('none')
ax[2].spines['bottom'].set_position('zero')
ax[2].spines['top'].set_color('none')
plt.show()
def test_errors(interval, poly_coefs, num_dots=1000):
x_dots = np.linspace(interval[0], interval[1], num_dots)
y_dots = f(x_dots)
y_poly_dots = np.polyval(poly_coefs, x_dots)
# Compute errors
err_abs = y_dots - y_poly_dots
err_abs_max = max(np.abs(err_abs))
err_rel = np.arange(num_dots).astype(float)
for i in range(len(x_dots)):
if y_dots[i] == 0:
if y_poly_dots[i] == 0:
err_rel[i] = 0.0
else:
err_rel[i] = np.inf
else:
err_rel[i] = err_abs[i] / y_dots[i]
err_rel_max = max(np.abs(err_rel))
# Show errors
print(f"\nMax absolute error = {err_abs_max:.14g}")
print(f"Max relative error = {err_rel_max:.14g}")
print(f"Max polynomial value = {max(y_poly_dots):.14g}")
print(f"Min polynomial value = {min(y_poly_dots):.14g}")
plot_func(x_dots, y_dots, err_abs, err_rel)
def main():
x_dots = chebyshev_nodes(num_nodes, interval, closed_interval=True)
print(f"x nodes = {x_dots}")
y_dots = f(x_dots)
degrees = np.arange(1, num_nodes, 2) # 2 = compute only odd coefficients
poly_coefs = np.polynomial.polynomial.polyfit(x_dots, y_dots, degrees)
print_coefs(poly_coefs)
test_errors([0, interval[1]], np.flip(poly_coefs))
main()
Output:
x nodes = [ 1.51727274 1.11072073 0.40655201 -0.40655201 -1.11072073 -1.51727274]
Coefficients:
a_0 = 0
a_1 = 0.9999115283796
a_2 = 0
a_3 = -0.16602000425895
a_4 = 0
a_5 = 0.0076266621511779
Max absolute error = 0.0001342309422232
Max relative error = 0.0001342309422232
Max polynomial value = 1.0001342309422
Min polynomial value = 0
-
Excellent work!
TL;DR:
emece67's Remez-based coefficients yield the single-precision, four-multiplication and two-addition first-quarter sine approximation function that thus far yields the smallest absolute error, -0.00006786226 to 0.00006776527. In C, we can write it as
float sin_approx(const float x)
{
const float xx = x * x;
float result = 0.0075143874f * xx; /* 0.0075143873691558837890625 */
result += -0.16567312f; /* -0.16567312180995941162109375 */
result *= xx;
result += 0.9996968f; /* 0.999696791172027587890625 */
return result * x;
}
where the commented numeric constants are the exact decimal values the shorter floating point constants refer to, in case anyone wants to do some exact numerical error analysis.
Edited to add: there is a tiny bit better solution at -178+19-3 ULPs,
float sin_approx(const float x)
{
const float xx = x * x;
float result = 0.0075143045f * xx; /* 0.007514304481446743011474609375 */
result += -0.16567284f; /* -0.165672838687896728515625 */
result *= xx;
result += 0.9996966f; /* 0.99969661235809326171875 */
return result * x;
}
which yields maximum absolute error -0.0000677852 .. 0.000067783265.
In exact values, the Remez approximation is
yremez(x) = x * ( 0.999696791172027587890625 + x * x * ( - 0.16567312180995941162109375 + x * x * 0.0075143873691558837890625 ) )
and the best I found is
yna(x) = x * ( 0.9996874332427978515625 + x * x * ( - 0.16566254198551177978515625 + x * x * 0.0075119235552847385406494140625 ) )
and their difference is
yna(x) - yremez(x) = x * ( -0.000009357929229736328125 + x * x * ( 0.0000105798244476318359375 + x * x * 0.0000024638138711452484130859375 ) )
That is, if the short-form floating-point coefficients refer to the above exact decimal values. (I use a helper program to calculate and display these, of course.)
I chose the order of the functions in the difference to make the early differences negative, to leave room for the key in the upper left; not because even I am that conceited.
As a plot, the errors (and their difference, which is also the difference in their errors):
(https://www.nominal-animal.net/answers/remez-na.svg)
The difference uses the right vertical axis; phase is x expressed in terms of 2*PI.
Not only do the Remez-based coefficients by emece67 (0.9996968 -0.16567312 0.0075143874) yield smaller absolute error, but they also provide more symmetric absolute error.
The three extrema of the error in the Remez-based approximation are at x≃0.347884, x≃0.976371, and x≃1.413914, (phases 0.05537, 0.15539, and 0.22503) where the absolute errors are approximately -0.0000677, +0.0000677, and -0.0000677: basically the same (or at least very, very close). Even at the upper limit, x=PI/2≃1.57080, the absolute error is the same, approximately +0.0000677. This is a very, very nice three-term sine approximation at single precision; I really like it.
The errors in the best approximation I found earlier (via simple least squares polynomial fitting with controlled distribution of x within 0 to PI/2) are quite asymmetric in comparison.
The errors in the -178+19-3 solution are obviously close to the Remez-based approximation, because the coefficients differ only by a tiny bit; the modified coefficients just happen to quantize better to single-precision floating point numbers. At this scale, the maximum absolute error function is not really a continuous function of the coefficients anymore (because of those per-term rounding errors canceling out now and then; basically a quantization problem), and I suspect only brute force or similar discrete searches in the parameter space can "improve" the coefficients wrt. absolute error. I would not be at all surprised if there are even better solutions in the ±500±100±10 ULP region near the Remez coefficients (or one of the million possible unique single-precision triplets where C1 is between 0.9996960163116455078125 and 0.999697208404541015625, C3 is between -0.165672838687896728515625 and -0.16567134857177734375, and C5 is between 0.007514071650803089141845703125 and 0.007514304481446743011474609375).
-
Edited to add: there is a tiny bit better solution at -178+19-3 ULPs,
float sin_approx(const float x)
{
const float xx = x * x;
float result = 0.0075143045f * xx; /* 0.007514304481446743011474609375 */
result += -0.16567284f; /* -0.165672838687896728515625 */
result *= xx;
result += 0.9996966f; /* 0.99969661235809326171875 */
return result * x;
}
which yields maximum absolute error -0.0000677852 .. 0.000067783265.
Which, FWIW, are equivalent to fractions of:
3253427/432964489
2779529/16777216
8386063/8388608
if that's any help to those of you working in fixed point. The latter two of which are over powers of 2, which makes sense enough (lg(16M) = 24 bits, sounds suspiciously float-y) but the first one is actually quite a bit more. (I'm just using my JS calculator for this, so it's working in that default (double) precision.) The nearest float of it is a modest 742/98745 = 0.007514304521.., and two steps down is the highest gain term, which... well, play with it yourself, I'm not gonna read off all the probably-useless stats here anyway. :)
https://www.seventransistorlabs.com/Calc/Frac.html (https://www.seventransistorlabs.com/Calc/Frac.html)
Also, asking WolframAlpha about the number gives a good approximation (to full digits), and asking about convergents or continued fraction thereof gets the terms of the above expansion.
Tim
-
Unfortunately, not all functions are as well behaved as sin() for these polynomial approximations. I'm now looking for ways to efficiently compute log2(), but it is not as nice as sin().
Obviously, you only want to do that for x-1 with x between 1.0 and 2.0, just as you only want to do sin() between 0 and pi/2; i.e. both x-1 and log2(x) lie between 0.0 and 1.0.
-
Of course, but the approximation is still worse.
-
Do you need something fast, or something light in code and not requiring any math library?
-
Yeah, note that lg2(x) can be reduced to counting the maximum bit position (which is simply the integer part of the logarithm; some ISAs even have an instruction for this), then computing on the range [1.0, 2.0). Also, presumably this is fixed point, since floating point already gives you the first part, and libraries cover the rest?
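For instance, with the GCC/Clang builtin (just a sketch):
#include <stdint.h>

/* Sketch: the integer part of log2 of a nonzero 32-bit value is the
   index of its highest set bit; __builtin_clz maps to the
   count-leading-zeros instruction mentioned above. */
static inline int ilog2_u32(uint32_t v)   /* v must be nonzero */
{
    return 31 - __builtin_clz(v);
}
The remaining fractional part is then computed on the value scaled into [1.0, 2.0), or its fixed-point equivalent.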
Or also, if you only have basic arithmetic handy, just use Feynman's algorithm or the like; it's computed directly, and in not much more time than a few longhand multiplications or divisions. Polynomials are only attractive when fast hardware multiplication is available, with bonus points if division is also available.
It's interesting that polynomial approximations are notoriously poor (the (natural) series expansion goes as x^n/n); a rational series is a better fit (i.e. something like 1 - 1/x - 1/x^2 - 1/x^3 + ..., or some combination of p(x)/q(x) as a polynomial ratio, or whatever form you wish to represent it as). Err, what is such a series even called? I'm sure I've seen it before but I can't recall... Not quite Pade approximants, though that generates similar expressions.
Also, offhand, a hybrid approach seems better..?
\[ 1 - \frac{1}{x} + \frac{(x-1)^2}{3} - \frac{(x-1)^3}{7.5} \]
does surprisingly well. log has an infinite repeated pole at 0 (in effect; I think? But the answer from complex analysis is of course more interesting than a repeated pole, i.e., branch cut stuff) but if you aren't modeling arbitrarily close to 0, that really doesn't mean much to you, and any polynomial will do.
I can probably do a lot more with this. The above was just poking some numbers into WolframAlpha; perhaps for every positive power added, a complementary negative power gets added, too? Hmm...
And as usual -- if division is impractical on your platform, disregard rational approaches; a higher-order Taylor series (or better-optimized alternative) will simply perform better. And for that, we can use whatever transforms and tweaks as above.
Tim
-
This is a real log(x) implementation in C:
/*
log returns the natural logarithm of its floating
point argument.
The coefficients are #2705 from Hart & Cheney. (19.38D)
It calls frexp.
*/
#include <errno.h>
#include "cmath.h"
static const double p0 = -0.240139179559210510e2;
static const double p1 = 0.309572928215376501e2;
static const double p2 = -0.963769093368686593e1;
static const double p3 = 0.421087371217979714e0;
static const double q0 = -0.120069589779605255e2;
static const double q1 = 0.194809660700889731e2;
static const double q2 = -0.891110902798312337e1;
double log(double arg)
{
double x,z, zz, temp;
int exp;
if(arg <= 0.0)
{
errno = EDOM;
return(-HUGE);
}
x = frexp(arg, &exp);
if(x < INV_SQRT2)
{
x *= 2;
exp--;
}
z = (x-1)/(x+1);
zz = z*z;
temp = ((p3*zz + p2)*zz + p1)*zz + p0;
temp = temp/(((1.0*zz + q2)*zz + q1)*zz + q0);
temp = temp*z + exp*LN2;
return(temp);
}
double log10(double arg)
{
return(log(arg)*INV_LN10);
}
It makes the transformation z = (x-1)/(x+1) before computing the polynomial. (With that substitution, log(x) = 2*atanh(z) = 2*(z + z^3/3 + z^5/5 + ...), which is why the fitted odd coefficients come out close to 2, 2/3, 2/5, ...)
The interval of approximation is [0.7071067, 1.4142135].
Edit:
Book Hart & Cheney https://books.google.es/books/about/Computer_Approximations.html?id=jPVQAAAAMAAJ&redir_esc=y&hl=es
-
Approximation of log(x) over [sqrt(2)/2, sqrt(2)]:
def log(x):
a_1 = 2.0000004743492
a_3 = 0.6664575555535
a_5 = 0.41516419048219
z = (x-1)/(x+1)
zz = z * z
return z * (a_1 + zz * (a_3 + zz * a_5))
Absolute error < 4e-8
Edit:
Python program to do the calculations:
import matplotlib.pyplot as plt
import numpy as np
def transf(x):
return (x-1)/(x+1)
num_nodes = 6
interval = [transf(np.sqrt(2)/2), transf(np.sqrt(2))]
def f(x):
return np.log(-(x+1)/(x-1))
def chebyshev_nodes(n, interval, closed_interval=False):
"""Return n chebyshev nodes over interval [a, b]"""
i = np.arange(n)
if closed_interval:
x = np.cos((2*i)*np.pi/(2*(n-1))) # nodes over closed interval [-1, 1]
else:
x = np.cos((2*i+1)*np.pi/(2*n)) # nodes over open interval (-1, 1)
x = np.flip(x)
a, b = interval
return 0.5*(b - a)*x + 0.5*(b + a) # nodes over interval [a, b]
def print_coefs(coefs):
print("\nCoefficients:")
for i in range(len(coefs)):
print(f" a_{i} = {coefs[i]:.14g}")
def tics(interval, numtics):
a, b = interval
return np.linspace(a, b, numtics)
def plot_func(x, y, err_abs, err_rel):
fig, ax = plt.subplots(3)
fig.subplots_adjust(left=0.1, bottom=0.05, right=0.95, top=0.95, wspace=None, hspace=0.35)
ax[0].plot(x, y, linewidth=1)
ax[0].set_title('Function')
#ax[0].spines['left'].set_position('zero')
ax[0].spines['right'].set_color('none')
ax[0].spines['bottom'].set_position('zero')
ax[0].spines['top'].set_color('none')
ax[1].plot(x, err_abs, linewidth=1)
ax[1].set_title('Polynomial absolute error')
#ax[1].spines['left'].set_position('zero')
ax[1].spines['right'].set_color('none')
ax[1].spines['bottom'].set_position('zero')
ax[1].spines['top'].set_color('none')
ax[2].plot(x, err_rel, linewidth=1)
ax[2].set_title('Polynomial relative error')
#ax[2].spines['left'].set_position('zero')
ax[2].spines['right'].set_color('none')
ax[2].spines['bottom'].set_position('zero')
ax[2].spines['top'].set_color('none')
plt.show()
def test_errors(interval, poly_coefs, num_dots=1000):
x_dots = np.linspace(interval[0], interval[1], num_dots)
y_dots = f(x_dots)
y_poly_dots = np.polyval(poly_coefs, x_dots)
# Compute errors
err_abs = y_dots - y_poly_dots
err_rel = np.arange(num_dots).astype(float)
for i in range(len(x_dots)):
if y_dots[i] == 0:
if y_poly_dots[i] == 0:
err_rel[i] = 0.0
else:
err_rel[i] = np.inf
else:
err_rel[i] = err_abs[i] / y_dots[i]
# Show errors
print(f"\nMax absolute error = {max(err_abs):.14g}")
print(f"Min absolute error = {min(err_abs):.14g}")
print(f"Max relative error = {max(err_rel):.14g}")
print(f"Min relative error = {min(err_rel):.14g}")
print(f"Max polynomial value = {max(y_poly_dots):.14g}")
print(f"Min polynomial value = {min(y_poly_dots):.14g}")
plot_func(x_dots, y_dots, err_abs, err_rel)
def main():
x_dots = chebyshev_nodes(num_nodes, interval, closed_interval=True)
y_dots = f(x_dots)
print(f"x nodes = {x_dots}")
print(f"y values = {y_dots}")
degrees = np.arange(1, num_nodes, 2)
poly_coefs = np.polynomial.polynomial.polyfit(x_dots, y_dots, degrees)
print_coefs(poly_coefs)
test_errors([interval[0], interval[1]], np.flip(poly_coefs))
main()
-
There is more than one way of doing it, of course, and interestingly, almost the only approach shown here has been some kind of polynomial approximation, which doesn't do all that well for log() unless you use a lot of coefficients.
T3sl4co1l's approach is not polynomial and requires a reciprocal (which may or may not be practical depending on your requirements), but anyway, I tried it and could not make it evaluate to anything close to a log? So could you elaborate?
One typical way of computing log2() is to first extract the integer part of the log2, then work on the fractional part. The integer part can be found using various approaches. But if you're dealing with IEEE 754 FP numbers, and can assume those are normalized, it's actually just a matter of extracting the exponent, which is trivial, and then set the exponent to 0 for working on the fractional part. Oh yes, manipulating IEEE-754 is baddd, but it's actually what most libraries do under the hood.
Then, getting the fractional part can be done using a binary log approach, which allows you to control how many bits of precision you want. The more bits, the longer it takes of course.
Here is a fully functional approach which works on IEEE 754 double precision numbers; you can of course adapt it to single precision (just adjust the part dealing with the exponent), or use it as a starting point if you're going to use any other number format (I wrote a variant of this for fixed-point numbers, for instance). Depending on the number of bits of precision it's not ultra-fast, but it can be made as accurate as needed, and it only requires FP multiplication, compare and addition. And it will work for any x. Again, you can translate this for other formats too.
#include <stdint.h> // for uint64_t
#include <float.h> // for DBL_MIN
double MyLog2(double x)
{
if (x <= 0.0)
return DBL_MIN;
// Assume x is normalized.
double dX = x;
uint64_t *px = (uint64_t *) &dX; // Oh yeah, that's not pretty.
// Integer part.
int16_t nExp = (int16_t)((*px >> 52) & 0x7FF) - 1023; // IEEE 754 exponent
double dY = (double) nExp, dB = 0.5;
// Fractional part.
*px |= 0x3FF0000000000000; // Set IEEE 754 exponent to 0 (1023)
*px &= 0xBFFFFFFFFFFFFFFF;
const unsigned int nFracBits = 32;
for (unsigned int i = 0; i < nFracBits; i++, dB *= 0.5)
{
dX = dX * dX;
if (dX >= 2.0)
{
dY += dB;
dX *= 0.5;
}
}
return dY;
}
-
There is more than one way of doing it of course, and interestingly, the almost only way exposed here was some kind of polynomial approximation, which doesn't do all that well for log(), unless you use a lot of coefficients.
It doesn't take many coefficients to get a good approximation to log(x).
With only 3 coefficients you can obtain an approximation with a max relative error of 1.2e-7.
With 6 coefficients you can obtain an approximation with a max relative error of 3e-14:
#include <errno.h>
#include <math.h>
#define INV_SQRT2 (0.70710678118654752440084436210) /* sqrt(1/2) */
#define LN2 (0.69314718055994530941723212146) /* log(2) */
static const double a_1 = 1.999999999999945;
static const double a_3 = 0.6666666667970299;
static const double a_5 = 0.3999999486999626;
static const double a_7 = 0.2857216790491805;
static const double a_9 = 0.2217418680272728;
static const double a_11 = 0.1960893014786098;
double logpol(double arg) {
double x, z, zz, temp;
int exp;
if(arg <= 0.0) {
errno = EDOM;
return(-HUGE_VAL);
}
x = frexp(arg, &exp);
if(x < INV_SQRT2) {
x *= 2;
exp--;
}
z = (x-1)/(x+1);
zz = z*z;
temp = ((((a_11*zz + a_9)*zz + a_7)*zz + a_5)*zz + a_3)*zz + a_1;
temp = temp*z + exp*LN2;
return(temp);
}
-
T3sl4co1l's approach is not polynomial and requires a reciprocal (which may or may not be practical depending on your requirements), but anyway, I tried it and could not make it evaluate to anything close to a log? So could you elaborate?
It wasn't supposed to; well, the display-math one was (fitted to ln(x) from 1 to 2, which... wait, that's not log2 anyway, that should be 1 to e -- well, it was a _very_ basic attempt, you see--). The other suggested series were merely forms to try and fit coefficients to (emphasis on "something like"!).
I'm not very familiar with the (x+1)/(x-1) substitution, but it sounds like it's doing a much better job, so I'm not too bothered about [not] expanding on my ideas. It's probably a better form of ultimately the same sort of thing. :-+
Tim
-
OK no problem!
And yes, what Picuino showed is not polynomial either (in x!)
emece67 tried to use polynomial approx. and could see for themselves it was not the right way of tackling it.
I haven't checked the results myself for Picuino's code - but I do think it's a popular way of doing it (seems to be used in some C lib implementations), so I guess it does the job. But it requires a division, which may cost quite a bit depending on the architecture.
The one I suggested has the benefit of not requiring any division, and being as accurate as you want it to be. It's not as fast obviously. For the 32-bit fractional version, it's on average 8 times slower per log2() than the C lib one (GCC on x86_64). Of course most of the time is spent in the loop. There may be ways to optimize it.
Oh, and you may find the pointer aliasing for extracting the exponent ugly, but if you're using frexp(), as in Picuino's code, then you're going to need to use the math library. And then you may as well use the log2() function directly. :)
-
Oh, and you may find the pointer aliasing for extracting the exponent ugly, but if you're using frexp(), as in Picuino's code, then you're going to need to use the math library. And then you may as well use the log2() function directly. :)
frexp() is very easy to do yourself, without the math library.
Ignoring zero, infinities, and NaNs it's just:
#include <stdint.h> // for uint64_t
double my_frexp(double d, int *exp) {
uint64_t d_bits = *(uint64_t*)&d;
uint64_t exp_mask = 2047L << 52;
*exp = ((d_bits & exp_mask) >> 52) - 1022;
d_bits = (d_bits & ~exp_mask) | (1022L << 52);
return *(double*)&d_bits;
}
On my Mac that's just...
0000000100003e94 _my_frexp:
100003e94: 08 00 66 9e fmov x8, d0
100003e98: 09 f9 74 d3 ubfx x9, x8, #52, #11
100003e9c: 29 f9 0f 51 sub w9, w9, #1022
100003ea0: 09 00 00 b9 str w9, [x0]
100003ea4: 08 d1 41 92 and x8, x8, #0x800fffffffffffff
100003ea8: 08 21 4b b2 orr x8, x8, #0x3fe0000000000000
100003eac: 00 01 67 9e fmov d0, x8
100003eb0: c0 03 5f d6 ret
-
Oh, and you may find the pointer aliasing for extracting the exponent ugly, but if you're using frexp(), as in Picuino's code, then you're going to need to use the math library. And then you may as well use the log2() function directly. :)
frexp() is very easy to do yourself, without the math library.
Yes, have you seen my code? :)
Apparently you did not quite get what I meant above.
-
Excellent thread with many good suggestions. For those looking for starting points on these types of problems it is hard to do better than "Handbook of Mathematical Functions" by Abramowitz and Stegun. It has tables of values for all functions normally encountered in mathematics and suggested rational approximations for varying levels of accuracy. There is a Dover reprint of the original US Government Printing Office version.
-
Which, FWIW, are equivalent to fractions of:
3253427/432964489
2779529/16777216
8386063/8388608
Actually, the smallest term is 8068423/1073741824.
Put another way, they are 8068423*2**-30, 2779529*2**-24, and 8386063*2**-23.
If one uses truncating (and not rounding) math, I fully believe that the exact best solution is very near the Remez coefficients, but just some ULPs away, due to the quantization to single-precision floating-point and rounding differences.
If anyone is wondering:
When approximating logarithms, we are relying on the key observation: logb(a b) = logb(a) + logb(b).
Although base b=2 makes the most sense when dealing with binary-radix floating point numbers, it extends to any base via loga(x) = log2(x)/log2(a). And since log2(e) ≃ 1.4426950408889634073599246810 and log2(10) ≃ 3.32192809488736234787031942948939, using any other base is just one multiplication by a constant away (ln(x) = log2(x)*0.6931471805599453..., log10(x) = log2(x)*0.3010299956639812...), so we don't even need to worry about natural or base-10 logarithms; we get them basically for free if we handle base 2.
Because log2(2) = 1 and log2(1/2) = -1, we can multiply the argument x by any integer power of two 2^y and use log2(x*2^y) = log2(x) + y. This is why we only really need to handle a small range of arguments.
The reason 0.7071 ≃ sqrt(1/2) ≤ x ≤ sqrt(2) ≃ 1.4142 is chosen as the range is that 2*sqrt(1/2) = sqrt(2) (and 1/sqrt(2) = sqrt(1/2), and 1/sqrt(1/2) = sqrt(2)). That is, if you have a positive real number r, you can always multiply it by an integer power of two 2^y, where y is an integer, to get sqrt(1/2) ≤ r*2^y ≤ sqrt(2). Of course you can pick an even larger range to approximate (just like you don't need to restrict to |x| ≤ PI/2 for approximating sine), but the approximation will be more complex than it has to be.
What I am wondering myself, is whether absolute or relative error is the "better" error metric here. log2(sqrt(1/2)) = -1/2, and log2(sqrt(2)) = +1/2, with log2(1) = 0. Also, log2(x) = -log2(1/x), so we definitely have symmetry here.
-
Here are some interesting links about the CORDIC method:
Elementary functions for embedded systems (https://www.drdobbs.com/microcontrollers-cordic-methods/184404244)
https://www.hpl.hp.com/hpjournal/72jun/jun72a2.pdf (https://www.hpl.hp.com/hpjournal/72jun/jun72a2.pdf)
https://www.quinapalus.com/efunc.html (https://www.quinapalus.com/efunc.html)
Does anybody know a good CORDIC algorithm for a single-precision log(x) function?
-
Program to generate CORDIC tables for sine and cosine functions, for several word precision lengths:
http://www.dcs.gla.ac.uk/~jhw/cordic/ (http://www.dcs.gla.ac.uk/~jhw/cordic/)
-
Another "CORDIC"-like approach to logarithm calculation is Feynman's algorithm (https://en.wikipedia.org/wiki/Logarithm#Feynman's_algorithm), which uses the fact that every real \$x\$, \$1 \lt x \lt 2\$, can be approximated with
$$x \approx \prod_{k=1}^{N} \left(1 + 2^{-k}\right)^{B_k}, \quad B_k \in \{0, 1\}$$
where \$B_k\$ is the \$k\$'th fractional bit in \$x\$, and therefore
$$\log(x) = \sum_{k=1}^{N} B_k \log\left(1 + 2^{-k}\right)$$
Since single-precision floating point (IEEE 754 Binary32) has 24 significant bits in its mantissa, \$N = 24\$ should suffice.
That is, we'll only need 24 single-precision floating point numbers, pair-wise.
This is particularly well suited for IEEE 754 binary floating point types.
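As a concrete sketch of this (my own greedy version, not taken from the article), for an argument already reduced to [1.0, 2.0), with the table built offline:
#include <math.h>   /* only used to precompute the table */

/* Sketch of Feynman's algorithm for log2(x), x in [1.0, 2.0).
   lg_tab[k] holds log2(1 + 2^-k); in a real build it would be a
   precomputed constant array rather than filled at run time. */
static float lg_tab[25];

static void build_lg_tab(void)
{
    for (int k = 1; k <= 24; k++)
        lg_tab[k] = log2f(1.0f + ldexpf(1.0f, -k));
}

static float feynman_log2(float x)   /* expects 1.0f <= x < 2.0f */
{
    float p = 1.0f, result = 0.0f;
    for (int k = 1; k <= 24; k++) {
        float q = p + ldexpf(p, -k);   /* p * (1 + 2^-k) */
        if (q <= x) {                   /* greedily take this factor */
            p = q;
            result += lg_tab[k];
        }
    }
    return result;
}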
-
What I am wondering myself, is whether absolute or relative error is the "better" error metric here.
Yeah. I think that would really depend on what you're using the function for.
With that said, as always, before using any kind of "expensive" function or operation, make sure you really need it. Sometimes changing the computation approach lets you get rid of it entirely. Probably obvious for most people here, but I've seen too many people using log() or sqrt() when you could definitely do without them.
-
Or the really low-hanging fruit, like integer-argument pow()... :D
Speaking of, anyone know how often GCC or other compilers are able to simplify those? I don't know offhand whether it's smart enough to spot an integer (or say within 1 ULP of an integer cast to float) literal and produce alternative code, or if it really matters much/at all (because the argument has to be cast to float and no assumptions can be made), or if it can even end-run around the semantics and detect that an integer literal has been used? I would assume the latter, not so much, but all are reasonable to varying degrees...
Tim
-
As far as I know, if you pass an integer to a function which has a FP argument, the integer will get promoted to a FP, and then the function will get called. Due to the promotion rule in C, I don't think a compiler would be allowed to optimize this out - at least by using an integer version of the function instead, for instance.
Now you're talking about literals - meaning you have in mind calling such functions with literals (constants). So, in general, would the value of the literal matter? Could the compiler even compute the result at compile-time?
Now feel free to experiment. :)
A small test showed that common math functions seem to be evaluated at compile time if they are called with a literal argument. For instance, the simple:
#include <math.h>
double Foo(double x)
{
return sqrt(2)*x;
}
Yields:
Foo:
.seh_endprologue
mulsd .LC0(%rip), %xmm0
ret
.seh_endproc
.section .rdata,"dr"
.align 8
.LC0:
.long 1719614413
.long 1073127582
Whether that works with all standard math functions, I have no clue!
-
Certainly easy enough to experiment...
https://godbolt.org/z/xcaPnq5YW
I get the same results on this independent of ISA or gcc/clang.
-
Certainly easy enough to experiment...
https://godbolt.org/z/xcaPnq5YW
I get the same results on this independent of ISA or gcc/clang.
Ahh, interesting that it simplifies 2, but not 3...
Ah, that makes sense, -funsafe-math-optimizations will generate the [naively] expected code (but it might not distribute errors exactly the way you like, hence the name of the option). I suppose squaring being a binary operation (x * x), order doesn't matter, but for other powers it does. Also works for negative powers (emitting a division op, and whatever muls to get it there).
Tim
-
Since binary logarithm (https://en.wikipedia.org/wiki/Binary_logarithm#Calculation) requires a large-ish number of squarings, I suppose the alternative is a piecewise cubic approximation:
#include <math.h>
#define LOG2_PARTS 256
#define LOG2_ERROR -HUGE_VALF
float log2_poly[LOG2_PARTS][4];
float log2_approx(float x)
{
if (x <= .0f)
return LOG2_ERROR;
int p, i;
x = 2 * LOG2_PARTS * (frexpf(x, &p) - 0.5f);
i = (int)x;
x -= (float)i;
float result = x * log2_poly[i][3];
result += log2_poly[i][2];
result *= x;
result += log2_poly[i][1];
result *= x;
result += log2_poly[i][0];
return result + p;
}
If we use Cn as shorthand for log2_poly[i][n], then i = 0..N-1 is the piece 0.5+i/(2N) ≤ x < 0.5+(i+1)/(2N), C0 is the base-2 logarithm at the lower boundary, and C0+C1+C2+C3 is the base-2 logarithm at the upper boundary, approximately. The individual polynomial coefficients can be calculated using e.g. the Remez algorithm. N = LOG2_PARTS is the number of polynomials in the entire interval 0.5 ≤ x < 1, and is one of the parameters we can control to tune the fit. For obvious reasons, powers of two N are preferable if we need to extract the mantissa bits from the IEEE 754 Binary32 storage representation.
Of course, each polynomial takes 4×4 bytes = 16 bytes of ROM/Flash, so this is just another method of trading ROM/Flash for speed/accuracy, up to a limit.
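In case a full Remez fit is overkill, here is a sketch of one simpler way to fill log2_poly[][] (it assumes the LOG2_PARTS and log2_poly definitions from the listing above): a cubic Hermite fit per piece, matching log2() and its derivative at both piece boundaries, so the result is continuous in value and slope across the pieces. It won't be quite as tight as Remez coefficients.
#include <math.h>   /* log2() for the offline table build only */

void build_log2_poly(void)
{
    const double ln2 = log(2.0);
    for (int i = 0; i < LOG2_PARTS; i++) {
        const double a  = 0.5 + (double)i       / (2.0 * LOG2_PARTS);
        const double b  = 0.5 + (double)(i + 1) / (2.0 * LOG2_PARTS);
        const double f0 = log2(a), f1 = log2(b);
        const double m0 = (b - a) / (a * ln2);   /* d/dt of log2(x(t)) at t=0 */
        const double m1 = (b - a) / (b * ln2);   /* d/dt of log2(x(t)) at t=1 */
        const double d  = f1 - f0;
        log2_poly[i][0] = (float)f0;
        log2_poly[i][1] = (float)m0;
        log2_poly[i][2] = (float)(3.0 * d - 2.0 * m0 - m1);
        log2_poly[i][3] = (float)(m0 + m1 - 2.0 * d);
    }
}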
-
Late to the party, but I knocked up a CORDIC implementation for you. It generates the tables needed, which in an actual implementation can be initialized with constants rather than by calling build_table(), avoiding all floating point operations.
mysin() takes a 32-bit integer representing the phase angle (0 = 0 degrees, 0x40000000 = 90 degrees and so on) and returns the sine at that point. Adjust the value of "start" to scale the output range - the "CORDIC gain" is about 1.647, so the current value of 19813 gives a full scale of +/-32627
Here's the output:
$ ./cordic
table[0] is 536870912
table[1] is 316933405
table[2] is 167458907
table[3] is 85004756
table[4] is 42667331
table[5] is 21354465
table[6] is 10679838
table[7] is 5340245
table[8] is 2670163
table[9] is 1335086
table[10] is 667544
table[11] is 333772
table[12] is 166886
table[13] is 83443
table[14] is 41721
table[15] is 20860
table[16] is 10430
table[17] is 5215
table[18] is 2607
table[19] is 1303
table[20] is 651
table[21] is 325
table[22] is 162
table[23] is 81
Actual Calced Error
0 0 0
6365 6365 0
12485 12486 1
18126 18126 0
23070 23070 0
27128 27128 0
30143 30143 0
32000 32000 0
32627 32627 0
32000 32000 0
30143 30143 0
27128 27128 0
23070 23071 1
18126 18126 0
12485 12486 1
6365 6365 0
0 0 0
-6365 -6365 0
-12485 -12486 -1
-18126 -18126 0
-23070 -23070 0
-27128 -27128 0
-30143 -30143 0
-32000 -32000 0
-32627 -32627 0
-32000 -32000 0
-30143 -30143 0
-27128 -27128 0
-23070 -23070 0
-18126 -18126 0
-12485 -12486 -1
-6365 -6365 0
Here's the code:
#include <stdio.h>
#include <stdint.h>
#include <math.h>
static uint32_t table[24] = {0};
#define TABLE_SIZE (sizeof(table)/sizeof(table[1]))
int build_table(void) {
double total = 0;
for(int i = 0; i < TABLE_SIZE; i++) {
double angle = pow(2.0,-i);
double v = atan(angle);
v /= 2*M_PI;
v *= (1<<30);
v *= 4;
table[i] = (int)v;
printf("table[%d] is %6d\n", i, table[i]);
total += v;
}
return 0;
}
int32_t mysin(int32_t in) {
int32_t start = 19813*16;
int32_t out_s = 0;
int32_t out_c = 0;
// Get 'in' to within +/- PI/2
if(in > 0x40000000) {
out_c = -start;
in -= 0x80000000;
} else if(in <= -0x40000000) {
out_c = -start;
in += 0x80000000;
} else {
out_c = start;
}
for(int i = 0; i < TABLE_SIZE; i++) {
int32_t next_c, next_s;
if(in > 0) {
in -= table[i];
next_c = out_c - (out_s>>i);
next_s = out_s + (out_c>>i);
} else {
in += table[i];
next_c = out_c + (out_s>>i);
next_s = out_s - (out_c>>i);
}
out_s = next_s;
out_c = next_c;
}
return out_s/16;
}
int main(int argc, char *argv[]) {
build_table();
int test_points = 32;
printf("\nActual Calced Error\n");
for(int32_t i = 0; i < test_points; i++) {
int actual = (int)(sin(i*M_PI*2/test_points)*32627);
int calc = mysin(0x10000000/test_points*16*i);
printf("%6d %6d %4d\n", actual, calc, calc-actual);
}
}
-
Certainly easy enough to experiment...
https://godbolt.org/z/xcaPnq5YW
I get the same results on this independent of ISA or gcc/clang.
Ahh, interesting that it simplifies 2, but not 3...
Ah, that makes sense, -funsafe-math-optimizations will generate the [naively] expected code (but it might not distribute errors exactly the way you like, hence the name of the option). I suppose squaring being a binary operation (x * x), order doesn't matter, but for other powers it does. Also works for negative powers (emitting a division op, and whatever muls to get it there).
Yeah. Of course, there is no hard rule about this at all and so it's 100% compiler-dependent. I would also expect it to support only a restricted range of functions and possible values.
I guess only if you use C++ "constexpr" can you be sure that what can be computed at compile time will be, and that it gives the same result as it would at run time.