### Author Topic: Low-accuracy fast CORDIC  (Read 1753 times)


#### Picuino

• Super Contributor
• Posts: 1026
• Country:
##### Low-accuracy fast CORDIC
« on: June 12, 2024, 12:54:11 pm »
Sometimes it is necessary to calculate sines and cosines relatively quickly, and more accurately than a small look-up table can provide.
One solution is to compute the sine and cosine with the math library, but in many cases (especially on small microcontrollers) this involves a heavy computational load and a long calculation time.

The solution in these cases is the CORDIC algorithm, which can give results with arbitrary precision (the number of correct bits depends on the number of iterations) and does so quickly, since it needs only additions and shifts.

Starting from the algorithm described on Wikipedia, I have prepared a Python file that calculates the sine and cosine of an angle in integer format with 14 bits of accuracy (a 2-byte integer value):

Code: [Select]
import math

BITS = 14
# round() (not int()) reproduces the table values shown in the output below
theta_table = [round(math.atan2(1, 2**i) * (2**BITS)) for i in range(BITS)]

def compute_Kn():
    k = 2 ** BITS
    for i in range(BITS):
        k /= math.sqrt(1 + 2 ** (-2 * i))
    return round(k)

def CORDIC_integer(alpha):
    theta = 0
    x = compute_Kn()  # Constant value for a certain number of bits
    y = 0
    shift = 0
    for arc_tangent in theta_table:
        sx = x >> shift
        sy = y >> shift
        shift += 1
        if theta < alpha:
            theta += arc_tangent
            x -= sy
            y += sx
        else:
            theta -= arc_tangent
            x += sy
            y -= sx
    return x, y

def main():
    print("BITS =", BITS)
    print("Kn =", compute_Kn())
    print("theta_table = [")
    for theta in theta_table:
        print(f"    {theta},")
    print("    ]")
    print("  x        sin(x)        Error           cos(x)        Error")
    for x in range(-90, 91, 5):
        alpha = int(math.radians(x) * (2 ** BITS))  # angle in radians, scaled by 2^BITS
        cordic_cos, cordic_sin = CORDIC_integer(alpha)
        cordic_cos /= (2 ** BITS)
        cordic_sin /= (2 ** BITS)
        sin = math.sin(math.radians(x))  # reference values
        cos = math.cos(math.radians(x))
        print(f"{x:+05.1f}°  {cordic_sin:+.9f} ({cordic_sin-sin:+.9f})   {cordic_cos:+.9f} ({cordic_cos-cos:+.9f})")

if __name__ == "__main__":
    main()

Output:
Code: [Select]
BITS = 14
Kn = 9949
theta_table = [
12868,
7596,
4014,
2037,
1023,
512,
256,
128,
64,
32,
16,
8,
4,
2,
]
x        sin(x)        Error           cos(x)        Error
-90.0°  -0.999938965 (+0.000061035)   -0.000183105 (-0.000183105)
-85.0°  -0.996032715 (+0.000161983)   +0.087097168 (-0.000058575)
-80.0°  -0.984741211 (+0.000066542)   +0.173583984 (-0.000064193)
-75.0°  -0.965942383 (-0.000016557)   +0.258850098 (+0.000031053)
-70.0°  -0.939514160 (+0.000178461)   +0.341796875 (-0.000223268)
-65.0°  -0.906433105 (-0.000125318)   +0.422668457 (+0.000050195)
-60.0°  -0.866149902 (-0.000124499)   +0.499755859 (-0.000244141)
-55.0°  -0.819213867 (-0.000061823)   +0.573486328 (-0.000090108)
-50.0°  -0.766174316 (-0.000129873)   +0.642517090 (-0.000270520)
-45.0°  -0.707092285 (+0.000014496)   +0.706970215 (-0.000136566)
-40.0°  -0.642761230 (+0.000026379)   +0.765991211 (-0.000053232)
-35.0°  -0.573669434 (-0.000092997)   +0.818847656 (-0.000304388)
-30.0°  -0.499877930 (+0.000122070)   +0.866210938 (+0.000185534)
-25.0°  -0.422546387 (+0.000071875)   +0.906372070 (+0.000064283)
-20.0°  -0.342163086 (-0.000142943)   +0.939941406 (+0.000248785)
-15.0°  -0.258789062 (+0.000029983)   +0.965881348 (-0.000044479)
-10.0°  -0.173889160 (-0.000240982)   +0.984802246 (-0.000005507)
-05.0°  -0.087280273 (-0.000124531)   +0.996215820 (+0.000021122)
+00.0°  -0.000061035 (-0.000061035)   +1.000244141 (+0.000244141)
+05.0°  +0.087280273 (+0.000124531)   +0.996032715 (-0.000161983)
+10.0°  +0.173767090 (+0.000118912)   +0.984741211 (-0.000066542)
+15.0°  +0.258789062 (-0.000029983)   +0.965942383 (+0.000016557)
+20.0°  +0.342102051 (+0.000081907)   +0.939514160 (-0.000178461)
+25.0°  +0.422485352 (-0.000132910)   +0.906433105 (+0.000125318)
+30.0°  +0.499816895 (-0.000183105)   +0.866149902 (+0.000124499)
+35.0°  +0.573425293 (-0.000151143)   +0.819213867 (+0.000061823)
+40.0°  +0.642639160 (-0.000148450)   +0.766174316 (+0.000129873)
+45.0°  +0.707153320 (+0.000046539)   +0.707092285 (-0.000014496)
+50.0°  +0.766052246 (+0.000007803)   +0.642761230 (-0.000026379)
+55.0°  +0.819091797 (-0.000060247)   +0.573547363 (-0.000029073)
+60.0°  +0.866149902 (+0.000124499)   +0.499816895 (-0.000183105)
+65.0°  +0.906433105 (+0.000125318)   +0.422485352 (-0.000132910)
+70.0°  +0.939514160 (-0.000178461)   +0.342102051 (+0.000081907)
+75.0°  +0.965942383 (+0.000016557)   +0.258789062 (-0.000029983)
+80.0°  +0.984741211 (-0.000066542)   +0.173889160 (+0.000240982)
+85.0°  +0.996032715 (-0.000161983)   +0.087280273 (+0.000124531)
+90.0°  +0.999938965 (-0.000061035)   +0.000244141 (+0.000244141)

Translating it to C code is not complicated.
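As a sketch of such a translation (my own function and constant names; the arctangent table and Kn value are taken from the program output above, and the angle argument is radians scaled by 2^14):

```c
#include <stdint.h>

#define CORDIC_BITS 14
#define CORDIC_KN   9949   /* compute_Kn() result for 14 bits, from the output above */

/* round(atan(2^-i) * 2^14), from theta_table in the output above */
static const int16_t theta_table[CORDIC_BITS] = {
    12868, 7596, 4014, 2037, 1023, 512, 256, 128,
    64, 32, 16, 8, 4, 2
};

/* alpha is the angle in radians scaled by 2^14.
   On return, *cos_out and *sin_out are cos/sin scaled by 2^14. */
void cordic_sincos(int32_t alpha, int32_t *cos_out, int32_t *sin_out)
{
    int32_t theta = 0;
    int32_t x = CORDIC_KN;   /* gain-compensated start vector (K * 2^14, 0) */
    int32_t y = 0;
    for (int i = 0; i < CORDIC_BITS; i++) {
        int32_t sx = x >> i;
        int32_t sy = y >> i;
        if (theta < alpha) {    /* rotate towards the target angle */
            theta += theta_table[i];
            x -= sy;
            y += sx;
        } else {
            theta -= theta_table[i];
            x += sy;
            y -= sx;
        }
    }
    *cos_out = x;
    *sin_out = y;
}
```

For 45° (alpha = 12867), this lands within a couple of LSBs of 0.7071 × 16384 ≈ 11585, matching the error column in the output above.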
« Last Edit: June 12, 2024, 01:17:27 pm by Picuino »

The following users thanked this post: RAPo

#### Picuino

• Super Contributor
• Posts: 1026
• Country:
##### Re: Low-accuracy fast CORDIC
« Reply #1 on: June 12, 2024, 05:03:49 pm »
New version with integer math.

Code: [Select]
import math

BITS = 15
RANGE = (2 ** BITS) - 1
theta_table = [int(math.atan2(1, 2**i) * RANGE * 2 / math.pi) for i in range(BITS)]

def compute_Kn():
    k = RANGE
    for i in range(BITS):
        k *= 1.0 / math.sqrt(1 + 2 ** (-2 * i))
    return int(k)

def CORDIC_integer(alpha):
    theta = 0
    x = compute_Kn()  # Constant value for a certain number of bits
    if theta < alpha:  # first iteration unrolled: rotate by +/-45 degrees
        theta += theta_table[0]
        y = x
    else:
        theta -= theta_table[0]
        y = -x
    shift = 1
    for arc_tangent in theta_table[1:]:
        sx = x >> shift
        sy = y >> shift
        shift += 1
        if theta < alpha:
            theta += arc_tangent
            x -= sy
            y += sx
        else:
            theta -= arc_tangent
            x += sy
            y -= sx
    return x, y

def main():
    print("BITS =", BITS)
    print("Kn =", compute_Kn())
    print("theta_table = [")
    for theta in theta_table:
        print(f"    {theta},")
    print("    ]")

    sin_desvest = 0
    cos_desvest = 0
    num_deg = 0
    print("\n  x         x_int            sin(x)  Error         cos(x)  Error")
    for x in range(-90, 91, 5):
        alpha = int(math.radians(x) * RANGE * 2 / math.pi)
        cordic_cos, cordic_sin = CORDIC_integer(alpha)
        sin = round(math.sin(math.radians(x)) * RANGE)  # reference values
        cos = round(math.cos(math.radians(x)) * RANGE)
        sin_desvest += (cordic_sin - sin) ** 2
        cos_desvest += (cordic_cos - cos) ** 2
        num_deg += 1
        print(f"{x:+05.1f}°  {alpha:+12d}   {cordic_sin:+12d} ({cordic_sin-sin:+4d})   {cordic_cos:+12d} ({cordic_cos-cos:+4d})")
    print()
    print("sin_desvest =", math.sqrt(sin_desvest/(num_deg-1)))
    print("cos_desvest =", math.sqrt(cos_desvest/(num_deg-1)))

if __name__ == "__main__":
    main()

Results:
Code: [Select]
BITS = 15
Kn = 19897
theta_table = [
16383,
9671,
5110,
2594,
1302,
651,
325,
162,
81,
40,
20,
10,
5,
2,
1,
]

x         x_int            sin(x)  Error         cos(x)  Error
-90.0°        -32767         -32766 (  +1)             -5 (  -5)
-85.0°        -30946         -32640 (  +2)          +2846 (  -9)
-80.0°        -29126         -32267 (  +2)          +5685 (  -4)
-75.0°        -27305         -31650 (  +0)          +8485 (  +5)
-70.0°        -25485         -30788 (  +2)         +11197 (  -9)
-65.0°        -23665         -29697 (  -1)         +13854 (  +7)
-60.0°        -21844         -28379 (  -2)         +16380 (  -3)
-55.0°        -20024         -26842 (  -1)         +18793 (  -1)
-50.0°        -18203         -25103 (  -3)         +21058 (  -4)
-45.0°        -16383         -23170 (  -1)         +23165 (  -4)
-40.0°        -14563         -21058 (  +4)         +25103 (  +3)
-35.0°        -12742         -18795 (  -1)         +26838 (  -3)
-30.0°        -10922         -16384 (  -1)         +28378 (  +1)
-25.0°         -9101         -13852 (  -5)         +29695 (  -1)
-20.0°         -7281         -11205 (  +1)         +30796 (  +6)
-15.0°         -5461          -8483 (  -3)         +31648 (  -2)
-10.0°         -3640          -5691 (  -2)         +32269 (  +0)
-05.0°         -1820          -2854 (  +1)         +32644 (  +2)
+00.0°            +0             -2 (  -2)         +32771 (  +4)
+05.0°         +1820          +2852 (  -3)         +32639 (  -3)
+10.0°         +3640          +5689 (  +0)         +32266 (  -3)
+15.0°         +5461          +8483 (  +3)         +31649 (  -1)
+20.0°         +7281         +11205 (  -1)         +30788 (  -2)
+25.0°         +9101         +13852 (  +5)         +29697 (  +1)
+30.0°        +10922         +16380 (  -3)         +28379 (  +2)
+35.0°        +12742         +18791 (  -3)         +26842 (  +1)
+40.0°        +14563         +21058 (  -4)         +25103 (  +3)
+45.0°        +16383         +23167 (  -2)         +23170 (  +1)
+50.0°        +18203         +25103 (  +3)         +21058 (  -4)
+55.0°        +20024         +26840 (  -1)         +18795 (  +1)
+60.0°        +21844         +28378 (  +1)         +16384 (  +1)
+65.0°        +23665         +29697 (  +1)         +13852 (  +5)
+70.0°        +25485         +30788 (  -2)         +11205 (  -1)
+75.0°        +27305         +31649 (  -1)          +8483 (  +3)
+80.0°        +29126         +32266 (  -3)          +5691 (  +2)
+85.0°        +30946         +32639 (  -3)          +2854 (  -1)
+90.0°        +32767         +32768 (  +1)             -1 (  -1)

sin_desvest = 2.4094720491334933
cos_desvest = 3.7155828016013257

#### Sensorcat

• Regular Contributor
• Posts: 73
• Country:
• Freelance Sensor Consultant
##### Re: Low-accuracy fast CORDIC
« Reply #2 on: June 12, 2024, 07:51:38 pm »
Nice, but in which way is this more accurate than a LUT? The accuracy that can be achieved with a LUT is limited only by the amount of memory that you can (or are willing to) set aside for the table.

#### peter-h

• Super Contributor
• Posts: 3853
• Country:
• Doing electronics since the 1960s...
##### Re: Low-accuracy fast CORDIC
« Reply #3 on: June 13, 2024, 05:57:59 am »
This may be a similar thing. Super fast and super accurate. I use it for a waveform generator. No idea where I found it.

SPC is "samples per cycle" e.g. 1024.

Code: [Select]

// Compact sin(x) function which doesn't need math.h
// Uses Horner's method on the Taylor series for sine
// i is from 0 to SPC-1

#define PI    3.14159265358979323846
#define SPC   1024      // samples per cycle
#define SCALE 13107

int sin2(int i)
{
    double x = (2.0 * PI * i) / SPC;   // phase in radians
    double term = x;                   // current series term
    double sin = x;                    // running sum
    double old_sin = 0.0;

    //for (int n=3; sin != old_sin; n+=2)
    for (int n = 3; (sin != old_sin) && (n < 30); n += 2)
    {
        old_sin = sin;
        term *= -(x * x) / (double)(n * (n - 1));   // next Taylor term
        sin += term;
    }
    return (int)((sin * SCALE) + (sin >= 0 ? 0.5 : -0.5));
}

Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417

#### Picuino

• Super Contributor
• Posts: 1026
• Country:
##### Re: Low-accuracy fast CORDIC
« Reply #4 on: June 13, 2024, 07:47:10 am »
Nice, but in which way is this more accurate than a LUT? The accuracy that can be achieved with a LUT is limited only by the amount of memory that you can (or are willing to) set aside for the table.
To achieve the same accuracy you would need a LUT with approximately 16384 × 2-byte entries (32 KiB total). It could be reduced by exploiting trigonometric symmetries, but the table would still be too large.
Such a LUT is unacceptable on most small (and not so small) micros.

This may be a similar thing. Super fast and super accurate. I use it for a waveform generator. No idea where I found it

SPC is "samples per cycle" e.g. 1024.
Note that on micros without a hardware multiplier, the CORDIC method is still much faster, because it avoids multiplications, which cost many machine cycles.
The CORDIC method performs only shifts and additions/subtractions, and needs few iterations: only as many as there are bits of precision.

#### iMo

• Super Contributor
• Posts: 4951
• Country:
##### Re: Low-accuracy fast CORDIC
« Reply #5 on: June 13, 2024, 08:21:26 am »
FYI: a long time back I ran Xilinx's MicroBlaze CPU with its hardware CORDIC library on a Spartan-6, and published some of it on the then-famous "RetroBSD" site (which no longer exists).
I output the data off the FPGA and compared it against Excel numbers; here are a couple of pictures I found on my HDD.

PS: I found the Excel sheets with the data and graphs; perhaps you will find them interesting.

The largest size that fit into my XC6SLX9 was Q1.18 for the sin/cos outputs, and as far as I remember it could generate X million sine/cosine pairs per second (its hardware CORDIC outputs both sine and cosine at once).

« Last Edit: June 13, 2024, 08:50:20 am by iMo »

#### peter-h

• Super Contributor
• Posts: 3853
• Country:
• Doing electronics since the 1960s...
##### Re: Low-accuracy fast CORDIC
« Reply #6 on: June 13, 2024, 11:23:47 am »
I think Cordic is also called a Horn algorithm (after Mr Horn). It generates values for an octant only so you have to swap them around. I used this in a graphics library on a Z80 c. 1983. It was pretty fast. To get a better result you had to do anti aliasing which was a lot more work, but if you just want a sin(x) value it was "integer exact".
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417

#### Picuino

• Super Contributor
• Posts: 1026
• Country:
##### Re: Low-accuracy fast CORDIC
« Reply #7 on: June 13, 2024, 11:28:33 am »
https://en.wikipedia.org/wiki/CORDIC
Quote
CORDIC was conceived in 1956 by Jack E. Volder at the aeroelectronics department of Convair out of necessity to replace the analog resolver in the B-58 bomber's navigation computer with a more accurate and faster real-time digital solution. Therefore, CORDIC is sometimes referred to as a digital resolver.

#### tggzzz

• Super Contributor
• Posts: 20117
• Country:
##### Re: Low-accuracy fast CORDIC
« Reply #8 on: June 13, 2024, 11:52:36 am »
I think Cordic is also called a Horn algorithm (after Mr Horn). It generates values for an octant only so you have to swap them around. I used this in a graphics library on a Z80 c. 1983. It was pretty fast. To get a better result you had to do anti aliasing which was a lot more work, but if you just want a sin(x) value it was "integer exact".

In the summer of 1976 (or 7?) I had a vacation job where I implemented a 4-byte floating point library from scratch. I used CORDIC to implement sin/cos/tan and IIRC on a 6800 it took 30ms to generate the result.

I still have the code somewhere, on fanfold line printer paper, of course
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less

#### iMo

• Super Contributor
• Posts: 4951
• Country:
##### Re: Low-accuracy fast CORDIC
« Reply #9 on: June 13, 2024, 09:27:49 pm »
My HP-25 (also from around 1976) used CORDIC for its trigonometry, AFAIK, with a 2K-instruction ROM for the entire calculator.

https://archived.hpcalc.org/laporte/TheSecretOfTheAlgorithms.htm
https://archived.hpcalc.org/laporte/Trigonometry.htm
« Last Edit: June 13, 2024, 10:14:56 pm by iMo »

#### coppice

• Super Contributor
• Posts: 9052
• Country:
##### Re: Low-accuracy fast CORDIC
« Reply #10 on: June 13, 2024, 09:47:37 pm »
The older Intel and Motorola FPUs (MC68881, 8087, 80287, 80387, 486, etc.) used CORDIC for many of their functions. I'm not sure whether that continued into the SSE and AVX era; I don't think they have single-instruction trig or log functions.

#### Sensorcat

• Regular Contributor
• Posts: 73
• Country:
• Freelance Sensor Consultant
##### Re: Low-accuracy fast CORDIC
« Reply #11 on: June 14, 2024, 11:23:45 pm »
To achieve the same accuracy you would need a LUT of approximately 16384 x 2 Byte entries (32kBytes total). It could be reduced by taking into account some trigonometric properties, but the table is still too large.
That LUT is unacceptable for most small and not so small micros.
All true; it is the assertion in your first post that is wrong.

#### SiliconWizard

• Super Contributor
• Posts: 14985
• Country:
##### Re: Low-accuracy fast CORDIC
« Reply #12 on: June 14, 2024, 11:39:58 pm »
Now, CORDIC on a processor is worth it only if it doesn't have an FPU, or if said FPU is very slow. Otherwise, with floating point, there are simpler, faster (if the FPU is decent), and more accurate ways of approximating sin/cos.
I remember a whole thread that was dealing with this very topic. If someone remembers/finds it, they can post a link as a reference.

#### AussieBruce

• Regular Contributor
• Posts: 58
• Country:
##### Re: Low-accuracy fast CORDIC
« Reply #13 on: June 15, 2024, 03:30:33 am »
Chebyshev polynomials are also used for calculating trig functions. They work best if you have an FPU.

#### iMo

• Super Contributor
• Posts: 4951
• Country:
##### Re: Low-accuracy fast CORDIC
« Reply #14 on: June 15, 2024, 08:45:03 am »
Now, CORDIC on a processor is worth it only if it doesn't have a FPU, or if said FPU is very slow. Otherwise, with floating point, there are simpler, faster (if the FPU is decent) and more accurate ways of approximating sin/cos.
I remember a whole thread that was dealing with this very topic. If someone remembers/finds it, they can post a link as a reference.

The FPUs in today's MCUs work differently from the 80x87, 68881, etc.
Those old FPU chips were huge engines running large chunks of code internally to generate sin/cos/tan and so on.
Today's MCU FPUs typically provide only a handful of basic operations, like add/sub/mul/div/MAC and perhaps some conversions. The sin/cos and similar functions are still computed by a software library.

#### Nominal Animal

• Super Contributor
• Posts: 6607
• Country:
##### Re: Low-accuracy fast CORDIC
« Reply #15 on: June 15, 2024, 11:55:46 am »
This may be a similar thing.
That one calculates the sine using the series definition,
$$\sin(x) = x - \frac{x^3}{2 \cdot 3} + \frac{x^5}{2 \cdot 3 \cdot 4 \cdot 5} - \frac{x^7}{2 \cdot 3 \cdot 4 \cdot 5 \cdot 6 \cdot 7} + \dots = \sum_{k=0}^\infty \frac{(-1)^k}{(2 k + 1)!} x^{2 k + 1}$$
using up to the first fifteen terms, i.e. as a polynomial.

Horner's method is a way to optimize the evaluation of the summed polynomial:
$$f(x) = C_0 + x C_1 + x^2 C_2 + x^3 C_3 + x^4 C_4 + \dots = C_0 + x ( C_1 + x ( C_2 + x ( C_3 + x ( C_4 + \dots ) ) ) )$$
Here, it means that term $k$ can be calculated from term $k-1$ via
$$s_0 = x, \quad s_{k} = -\frac{x^2}{2 k (2 k + 1)} s_{k-1}$$
In the code shown, n starts at 3 for the second term (corresponding to $k = 1$) and increments by 2 each iteration, and the division includes the negation.  In other words, the divisor (-n * (n - 1)) is equivalent to $-(2 k)(2 k + 1)$, and n = $2 k + 1$.

This form is most commonly used for only the half wave centered around zero, i.e. for $x$ in $[-\pi/2, \pi/2]$, as $-\sin(-x) = \sin(x)$.  The other half wave is $\sin(x) = -\sin(x \pm \pi)$.  Cosine is sine advanced by a quarter wave, $\cos(x) = \sin(x + \pi/2)$.
When you use integer angles, $x = 2 \pi n / N$ with $N = 2^B$, the modulo operation to limit the index $n$ to this half wave is simple: add $2^{B-2}$ to the index; if the highest bit (bit $B-1$) is then set, the result needs to be negated; do a binary AND with $2^{B-1}-1$, and subtract $2^{B-2}$.  This ensures $-N/4 \le n < N/4$.
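The index reduction just described can be sketched as follows (my own function and macro names; $B = 16$, i.e. a full turn of 65536 index steps, chosen for illustration):

```c
#include <stdint.h>

#define B         16                      /* full turn = 2^B index steps */
#define QUARTER   (1u << (B - 2))         /* 90 degrees */
#define SIGN_BIT  (1u << (B - 1))
#define HALF_MASK ((1u << (B - 1)) - 1u)

/* Map a full-turn index n to the half wave centered on zero:
   returns m in [-2^(B-2), 2^(B-2)) and sets *negate when the result
   of the half-wave evaluation must be sign-flipped, so that
   sin(2*pi*n / 2^B) == (*negate ? -1 : +1) * sin(2*pi*m / 2^B).     */
int32_t halfwave_reduce(uint32_t n, int *negate)
{
    uint32_t t = n + QUARTER;             /* shift so the half wave is contiguous */
    *negate = (t & SIGN_BIT) != 0;        /* highest bit selects the mirrored half */
    return (int32_t)(t & HALF_MASK) - (int32_t)QUARTER;
}
```

Since only the low $B$ bits matter, the wrap is automatically periodic in the full turn; for example 270° (index 49152) reduces to the same half-wave index as −90°.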

Look-up based methods typically only use the first quarter wave, $x = 0 \dots \pi/2$, divided into $N = 2^B$ equal-size steps, $x = n \pi / (2 N) = n \pi 2^{-B-1}$.  To map an arbitrary integer to this range ($0 \dots N$, i.e. $2^B + 2$ entries; the first extra entry has value 1.0, and the last extra entry is a copy of the last valid entry), you use the $B$ least significant bits as an unsigned integer $i$.  If bit $B+1$ is set, the result is negative, otherwise it is positive.  If bit $B$ is set, then $n = 2^B - i$, else $n = i$.

Interpolated look-up methods use some of the least significant bits of the original index as a fractional part in $[0, 1)$.  The fractional part then interpolates between the entry calculated as above and the adjacent one (the previous one when the quarter-wave index runs backwards).  (The second extra entry is only used when the angle is exactly ±90°, or 360°±n×90°.)

Linear interpolation is trivial.  If $v_0$ is the value at the base entry, $v_1$ is the value in the other entry, and $p$ is the fractional part, $[0, 1)$, then the interpolated value $v$ is
$$v = (1 - p) v_0 + p v_1 = v_0 + p (v_1 - v_0)$$
where the first form is numerically stable (and preferred if you use floating-point numbers).  When $p = 0$, $v = v_0$, and when $p = 1$, $v = v_1$.
If $p$ is the fractional part of the index as an integer in $[0, P)$, i.e. $p = 0 \dots P - 1$, then
$$v = \frac{(P - p) v_0 + p v_1}{P} = v_0 + \frac{p (v_1 - v_0)}{P}$$
where the division is just a bit shift.  Note that you need to use a large enough integer type for the multiplication.

Cubic interpolation is not much more complex, and only requires two values per entry: the value at that point, $v_n$, and the tangent or slope $t_n$ at that point.  Then,
\begin{aligned} v &= v_0 (1 - p)^3 + (3 v_0 + t_0) (1 - p)^2 p + (3 v_1 - t_1) (1 - p) p^2 + v_1 p^3 \\ ~ &= (1 - p)^2 \bigl( (1-p) v_0 + p (3 v_0 + t_0 ) \bigr) + p^2 \bigl( (1-p)( 3 v_1 - t_1 ) + p v_1 \bigr) \\ ~ &= (1 - p)^2 \bigl( v_0 + p ( 2 v_0 + t_0 ) \bigr) + p^2 \bigl( v_1 + (1 - p)(2 v_1 - t_1) \bigr) \\ \end{aligned}
which is pretty stable numerically.  If you use fixed-point numbers, you can calculate $p^2$ first, then $(1-p)^2 = 1 - 2 p + p^2$.  In total, you need five fixed-point multiplications to calculate the polynomial.

For a look-up table with $2^B$ entries in the first quarter wave,
\begin{aligned} v_n &= K \sin\left(\frac{\pi n}{2 N}\right) \\ t_n &= \frac{\pi K}{2 N} \cos\left(\frac{\pi n}{2 N}\right) \\ \end{aligned}
where $K$ is the peak value ($1.0$). $v_n$ and $t_n$ are positive (or zero) for $n = 0 \dots N$.

If you use a fixed point type with 24 fractional bits and 8 integral bits including sign, and a full turn corresponds to 1.0, 66 look-up entries (528 bytes) with the cubic interpolation method will yield sines and cosines within ±7/16777216 of the expected value.  If the multiplication rounds halfway away from zero (by adding 8388608 if the product is positive, -8388608 if negative, before the right shift), the error reduces to ±4/16777216 (in any representable angle). Thus, the limiting factor is not the size of the look-up table, but the fixed-point precision.

With single-precision floating-point, 34 look-up entries (dividing the circle into 128 segments), using 264 bytes, the cubic approximation should yield sines and cosines to within ±0.0000003 of the correct value.  Again, the precision is the limit, and a larger table won't help much.  Using double precision for the interpolation calculation (but still single-precision look-up table and results), it yields sines and cosines to within ±0.00000021.
« Last Edit: June 15, 2024, 12:00:21 pm by Nominal Animal »

#### Nominal Animal

• Super Contributor
• Posts: 6607
• Country:
##### Re: Low-accuracy fast CORDIC
« Reply #16 on: June 16, 2024, 02:11:35 am »
To summarize or reword my above, way too long post:
• CORDIC is useful when you don't have a fast multiplication, as it uses only bit shifts and additions and subtractions.

• When you have a fast integer multiplication, you can use fixed point arithmetic (arithmetic shift right after multiplication) and a 500-byte lookup table to get better than 20 bits of precision via piecewise cubic interpolation.

• If you have fast single-precision float support (Cortex-M4F, Cortex-M7), you need only 34 precalculated floats to get sine and cosine at about 22 bits of precision, for the cost of four table lookups, five multiplications, and some additions and subtractions.

• If you don't have the memory for precalculating constants, you can use single- or double-precision floating point and one of the series definitions, and use the half wave centered at 0 for sine.  For cosine, add a quarter wave to the argument, and then calculate as for sine.  The easiest series definition is
$$s_0 = x, \quad s_{k+1} = \frac{- s_0^2 \, s_k}{(2 k + 2)(2 k + 3)}$$
If you use a scaled $x$, for example so that $F$ corresponds to full wave or turn, then
$$s_0 = \frac{2 \pi x }{F}, \quad s_{k+1} = \frac{ - s_0^2 \, s_k }{(2 k + 2)(2 k + 3)}$$

• For cubic interpolation, you need the sine value at each look-up point, as well as its tangent or slope at each look-up point.  When $F$ corresponds to the length of a full wave or one full rotation in angular terms, and $A$ is the amplitude ($A = 2^B$, later on), these are
\begin{aligned} v_x &= A \sin\left(\frac{2 \pi x}{F}\right) \\ t_x &= \frac{2 \pi A}{F} \cos\left(\frac{2 \pi x}{F}\right) \\ \end{aligned}

• Cubic interpolation of $v$ between $v_0, t_0$ and $v_1, t_1$ using $p = 0 \dots 1$ is
$$v = (1 - p)^3 v_0 + (1 - p)^2 p ( 3 v_0 + t_0 ) + (1 - p) p^2 ( 3 v_1 - t_1 ) + p^3 v_1$$
which can be calculated in form
$$v = (1 - p)^2 \biggl( v_0 + p (2 v_0 + t_0 ) \biggr) + p^2 \biggl( v_1 + (1 - p) (2 v_1 - t_1 ) \biggr)$$
where the idea is that you calculate $p^2$, $(1 - p)^2 = 1 - 2 p + p^2$, and $(1 - p)$ first, so you only need five multiplications and about 11 additions or subtractions.  Multiplications and additions/subtractions interleave well (so that an operation doesn't need the result of the immediately preceding operation), which helps on architectures with dual-issue ALUs or separate FP ALUs for multiplication and addition/subtraction.
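The series bullet above can be sketched in C like this (my own function name; double precision, with the argument assumed already reduced to roughly $[-\pi, \pi]$ for fast convergence):

```c
/* sin(x) via the Taylor series recurrence
       s0 = x,  s(k+1) = -x^2 * s(k) / ((2k+2)(2k+3)),
   summed until an added term no longer changes the result. */
double sine_series(double x)
{
    double term = x;        /* s0 */
    double sum  = x;
    double xx   = x * x;
    for (int k = 0; k < 16; k++) {
        term *= -xx / (double)((2*k + 2) * (2*k + 3));  /* next series term */
        double next = sum + term;
        if (next == sum)
            break;          /* converged to double precision */
        sum = next;
    }
    return sum;
}
```

For cosine, as the bullet list notes, you would add a quarter wave to the argument (reducing again if needed) and call the same function.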

An alternate, possibly more accurate way to evaluate the above piecewise cubic polynomial for $v$ is to use De Casteljau's algorithm for a single-dimensional Bézier curve.  If we calculate (edit: fixed errors)
$$B_0 = v_0, ~ B_1 = v_0 + \frac{1}{3} t_0, ~ B_2 = v_1 - \frac{1}{3} t_1, ~ B_3 = v_1$$
then our interpolation can be written in Bézier form,
$$v = (1 - p)^3 B_0 + 3 (1 - p)^2 p B_1 + 3 (1 - p) p^2 B_2 + p^3 B_3$$
and the application of De Casteljau's algorithm yields steps
\begin{aligned} t_{20} &= (1 - p) B_0 + p B_1 \\ t_{21} &= (1 - p) B_1 + p B_2 \\ t_{22} &= (1 - p) B_2 + p B_3 \\ t_{10} &= (1 - p) t_{20} + p t_{21} \\ t_{11} &= (1 - p) t_{21} + p t_{22} \\ v &= (1 - p) t_{10} + p t_{11} \\ \end{aligned} \quad \text{where} \quad \begin{aligned} (1 - p) B_1 &= B_1 - p B_1 \\ (1 - p) B_2 &= B_2 - p B_2 \\ (1 - p) t_{21} &= t_{21} - p t_{21} \\ \end{aligned}
so you end up needing about 11 multiplications (9 if you pre-divide $t_k$ by three) and 12 additions or subtractions, or so; but this should have excellent numerical stability and precision, and is also well suited for fixed-point arithmetic implementation.  For the fixed-point arithmetic, I recommend implementing
$$\operatorname{linear\_interpolate}\left(\frac{i}{2^B}, c_0, c_1\right) = \left(1 - \frac{i}{2^B}\right) c_0 + \frac{i}{2^B} c_1 = c_0 + \frac{i ( c_1 - c_0 )}{2^B}$$
with rounding, i.e. something like
Code: [Select]
// B is the number of fractional bits in i, and i = 0 .. 2**B-1 = ((1 << B) - 1), inclusive.

// c0 is the value at i = 0, and c1 is the value at i = 1<<B.
int32_t linear_interpolate(uint32_t i, int32_t c0, int32_t c1) {
    return (c0 > c1) ? c0 - (int32_t)(((uint64_t)(c0 - c1) * (uint64_t)i + (1 << (B-1))) >> B)
                     : c0 + (int32_t)(((uint64_t)(c1 - c0) * (uint64_t)i + (1 << (B-1))) >> B);
}

// v0 is the value and t0 is one third of the slope at i = 0, and
// v1 is the value and t1 is one third of the slope at i = 1<<B.
// That is, this expects t0 and t1 to have been divided by three.
int32_t cubic_interpolate(uint32_t i, int32_t v0, int32_t t0, int32_t v1, int32_t t1) {
    const int32_t z0 = linear_interpolate(i, v0, v0 + t0);
    const int32_t z1 = linear_interpolate(i, v0 + t0, v1 - t1);
    const int32_t z2 = linear_interpolate(i, v1 - t1, v1);
    const int32_t x0 = linear_interpolate(i, z0, z1);
    const int32_t x1 = linear_interpolate(i, z1, z2);
    return linear_interpolate(i, x0, x1);
}
where B is the number of fractional bits in i, and the t[] values have been pre-divided by 3.  It does not matter whether c0 and c1 are plain integers or signed fixed-point values.  (Fixed-point addition and subtraction is as cheap as on same-size integers, so those don't matter much.)

(Edit: Fixed math error in the coefficients, and did minor rewording.  And added the cubic_interpolate() function.)
« Last Edit: June 16, 2024, 12:24:50 pm by Nominal Animal »

The following users thanked this post: iMo

#### radiolistener

• Super Contributor
• Posts: 3626
• Country:
##### Re: Low-accuracy fast CORDIC
« Reply #17 on: June 16, 2024, 01:22:40 pm »
Here is a better way:
Code: [Select]
#include <stdint.h>

#define SR   48000      // Sample rate
#define FREQ 1000       // 1000 Hz sine

#define PI  3.1415926535897932384626433832795
#define PW  ((float)FREQ * PI / SR)

static float R = (float)(4 * PW * PW);  // for a better approximation use R = pow(2*sin(PW), 2);
static float V = 0;
static float X = 2000;  // 4096/3.3 * 0.5;  init value
static float S = 2048;  // 1500;  bias offset

uint16_t getSineNextSample(void) {
    V -= X * R;
    X += V;
    return (uint16_t)(X + S);
}
« Last Edit: June 16, 2024, 01:32:01 pm by radiolistener »

#### Nominal Animal

• Super Contributor
• Posts: 6607
• Country:
##### Re: Low-accuracy fast CORDIC
« Reply #18 on: June 16, 2024, 07:51:01 pm »
Better how?  I personally don't often need to periodically sample a sine wave, which is what that code does; I more often need random sine and cosine values.

For those with hardware float support in their MCUs, I started a separate thread describing a lightweight example implementation of $\operatorname{usin}(x) = \sin(2\pi x)$ and $\operatorname{ucos}(x) = \cos(2 \pi x)$: less than 440 bytes total on Cortex-M4F and -M7F, with about 21 bits (six decimal digits) of precision.
