Author Topic: Atan2 lookup table, for 256 part circle?  (Read 1409 times)

0 Members and 1 Guest are viewing this topic.

Offline InfravioletTopic starter

  • Super Contributor
  • ***
  • Posts: 1171
  • Country: gb
Atan2 lookup table, for 256 part circle?
« on: July 30, 2024, 10:57:22 pm »
I'm looking for a really fast way to calculate an atan2 function, but not to perfect accuracy.

My inputs are two integers, each guaranteed to be in the range -200 to +200. I need an integer output between 1 and 256 (a circle divided in to 256 parts, rather than 360 degrees or 2*pi radians) accurate to 1 part in those 256, or perhaps accurate to +/-0.5.

int16_t Angle=(256.0/360.0)*(180.0/3.1415926535)*atan2(lengthOne,lengthTwo) ,  with the int16_t later subjected to clamping to the 0<=output<256 range, matches the mapping I'm trying to create, but I need to make it run much quicker.

Does anyone by any chance have existing code which does this from a lookup table?

Generating such a lookup table from scratch doesn't seem too obvious, as firstly: atan2 takes two arguments not just one, so such a table could be very large sized with the square of the input resolution, but secondly: if one works through breaking atan2 down in to atan wrapped up within the various signed logic, you still end up trying to generate a lookup table for which you tend to need entries much more closely spaced in some regions than others again making it bigger than ideal.

Most of the existing discussion though, of the difficulties of lookup tables for atan2 and of the use of cordic algorithm alternatives, assumes you want tighter accuracy than, in effect, 1.4 degrees and that you're wanting to work with floating point integers for giving radian units in the output. But with a 256 part circle as the output, and limits on the resolution of my input numbers I wondered whether lookup table definitions for this already exist?

Thanks

 

Offline langwadt

  • Super Contributor
  • ***
  • Posts: 4803
  • Country: dk
 

Offline ledtester

  • Super Contributor
  • ***
  • Posts: 3248
  • Country: us
Re: Atan2 lookup table, for 256 part circle?
« Reply #2 on: July 30, 2024, 11:31:50 pm »
Does anyone by any chance have existing code which does this from a lookup table?

Generating such a lookup table from scratch doesn't seem too obvious, as firstly: atan2 takes two arguments not just one, so such a table could be very large sized with the square of the input resolution, but secondly: if one works through breaking atan2 down in to atan wrapped up within the various signed logic, you still end up trying to generate a lookup table for which you tend to need entries much more closely spaced in some regions than others again making it bigger than ideal.

I found this:

https://github.com/xiezhq-hermann/atan_lookup/blob/master/atan.cpp

Note that it basically only has a lookup table for 0 to 45 degrees and uses trig identities for the other sectors. It uses floating point but all you have to compute is round(slope*100) -- it uses a 102-element lookup table, so maybe you'll want  to compute round(slope*256)

It says it has "enough accuracy".

 

Offline radiolistener

  • Super Contributor
  • ***
  • Posts: 4089
  • Country: 00
Re: Atan2 lookup table, for 256 part circle?
« Reply #3 on: July 30, 2024, 11:43:22 pm »
Does anyone by any chance have existing code which does this from a lookup table?

Code: [Select]
int fast_atan2(int y, int x)
/* pre scaled for int16 */
{
int yabs, angle;
int pi4=(1<<12), pi34=3*(1<<12);  // note pi = 1<<14
if (x==0 && y==0) {
return 0;
}
yabs = y;
if (yabs < 0) {
yabs = -yabs;
}
if (x >= 0) {
angle = pi4  - pi4 * (x-yabs) / (x+yabs);
} else {
angle = pi34 - pi4 * (x+yabs) / (yabs-x);
}
if (y < 0) {
return -angle;
}
return angle;
}

With LUT:
Code: [Select]
static int *atan_lut = NULL;
static int atan_lut_size = 131072; /* 512 KB */
static int atan_lut_coef = 8;

int atan_lut_init(void)
{
int i = 0;

atan_lut = malloc(atan_lut_size * sizeof(int));

for (i = 0; i < atan_lut_size; i++) {
atan_lut[i] = (int) (atan((double) i / (1<<atan_lut_coef)) / 3.14159 * (1<<14));
}

return 0;
}

int lut_atan2(int cj, int cr)
/* special cases */
if (cr == 0 || cj == 0) {
if (cr == 0 && cj == 0)
{return 0;}
if (cr == 0 && cj > 0)
{return 1 << 13;}
if (cr == 0 && cj < 0)
{return -(1 << 13);}
if (cj == 0 && cr > 0)
{return 0;}
if (cj == 0 && cr < 0)
{return 1 << 14;}
}

/* real range -32768 - 32768 use 64x range -> absolute maximum: 2097152 */
x = (cj << atan_lut_coef) / cr;
x_abs = abs(x);

if (x_abs >= atan_lut_size) {
/* we can use linear range, but it is not necessary */
return (cj > 0) ? 1<<13 : -(1<<13);
}

if (x > 0) {
return (cj > 0) ? atan_lut[x] : atan_lut[x] - (1<<14);
} else {
return (cj > 0) ? (1<<14) - atan_lut[-x] : -atan_lut[-x];
}

return 0;
}
 

Offline InfravioletTopic starter

  • Super Contributor
  • ***
  • Posts: 1171
  • Country: gb
Re: Atan2 lookup table, for 256 part circle?
« Reply #4 on: July 31, 2024, 12:56:36 am »
langwadt, the fast cordic 8 bit version definitely works, but isn't much faster than the default atan2 function. On a 16MHz atmega I get 120us for it, vs 192us for the default function.

radiolistener, what units is your fast_atan2 function written in? I tested the sped, it is wondefully fast, 24us, but I can't understand it's output. When compared against a normal atan2 function it gives very unusual results, unless perhaps your bit order is backwards relative to mine? When I take an angle, use it to generate example sin and cos components, then provide these components in to the function I get 4096 out for all pairs of components which would give an angle in the 0 to 90 degree region, 12288 for components correpsonding to angles in the 90 to 180 region, then -12288 and -4096 for the next quadrants.  As for the lookup table version, how would I go about reconstructing that for an 8 bit integere output, I suspect that would massively reduce the size of the table needed too?

ledtester, thanks too. I'm going to give that one a go, but need to translate it to use ints instead of floats and wrap it up in atan2 logic before I can test it.

EDIT: ledtester, looks like that one can get down to 74us when converted to mostly using ints and just the one instance of a float division in each call of it. Accuracy seems to be consistently to within 2/256 parts. It's very good indeed, though I'd still like to see if a lookup table can be yet faster, and accurate to 1 part in 256.
« Last Edit: July 31, 2024, 01:33:34 am by Infraviolet »
 

Offline ledtester

  • Super Contributor
  • ***
  • Posts: 3248
  • Country: us
Re: Atan2 lookup table, for 256 part circle?
« Reply #5 on: July 31, 2024, 03:36:01 am »
For the method I linked to you only have to perform enough of the division to get 8 bits of the quotient.

This page:

http://www.rjhcoding.com/avr-asm-8bit-division.php

shows how to do it in 97 clocks by dropping down to assembly. It also mentions app note AVR200:

https://ww1.microchip.com/downloads/en/AppNotes/doc0936.pdf

which contains a bunch of AVR division (and multiplication) routines.
 
The following users thanked this post: RAPo

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7125
  • Country: fi
    • My home page and email address
Re: Atan2 lookup table, for 256 part circle?
« Reply #6 on: July 31, 2024, 10:59:21 am »
This does use __udivmodhi4 (unsigned 16-bit division, 16:8=8-bit), for 172 bytes of code and 33 bytes of Flash lookup data using avr-gcc 5.4.0 -Os, but yields 75% correct results, 12.5% over by one, and 12.5% under by one, with absolute error within ±1.09:
Code: [Select]
// SPDX-License-Identifier: CC0-1.0
#include <stdint.h>

static const int8_t octant_angle[33] = {
     0,  1,  3,  4,  5,  6,  8,  9, 10, 11, 12, 13, 15, 16, 17, 18,
    19, 20, 21, 22, 23, 24, 25, 25, 26, 27, 28, 29, 29, 30, 31, 31,
    32
};

static int8_t within_octant(uint8_t major, uint8_t minor) {
    if (!major)
        return 0;
    else
        return octant_angle[((uint_fast16_t)minor * 32 + major / 2)/ major];
}

int8_t atan2i(int_fast16_t y, int_fast16_t x) {
    if (y < 0) {
        const uint8_t ay = -y;
        if (x < 0) {
            const uint8_t ax = -x;
            if (ay < ax) {
                return -128 + within_octant(ax, ay);
            } else {
                return -64 - within_octant(ay, ax);
            }
        } else {
            const uint8_t ax = +x;
            if (ay < ax) {
                return -within_octant(ax, ay);
            } else {
                return -64 + within_octant(ay, ax);
            }
        }
    } else {
        const int_fast16_t ay = +y;
        if (x < 0) {
            const int_fast16_t ax = -x;
            if (ay < ax) {
                return 128 - within_octant(ax, ay);
            } else {
                return 64 + within_octant(ay, ax);
            }
        } else {
            const int_fast16_t ax = +x;
            if (ay < ax) {
                return within_octant(ax, ay);
            } else {
                return 64 - within_octant(ay, ax);
            }
        }
    }
}
The atan2i() picks the correct octant, and within_octant(major, minor) yields the angle within the octant, given major ≥  minor0.

The CORDIC-based version,
Code: [Select]
// SPDX-License-Identifier: CC0-1.0
#include <stdint.h>

const uint16_t  cordic_angle[] = { 8192, 4836, 2555, 1297, 651, 326, 163, 81, 41, 20, 10, 5, 3, 1, 1 };

static int8_t within_quadrant(uint16_t uy, uint16_t ux) {
    if (!(uy | ux))
        return 0;

    // Scale ux and uy so that the larger one is between 16384..32767.
    while (!((uy|ux) & 16384)) {
        ux <<= 1;
        uy <<= 1;
    }
    // Ensure both are less than 32768.
    if (((uy|ux) & 32768)) {
        ux >>= 1;
        uy >>= 1;
    }

    uint16_t  py = uy;
    uint16_t  px = ux;

    uint16_t  a = 127; // Rounding!
    uint8_t   i = 0;
    while (1) {
        if (uy >= px) {
            ux += py;
            uy -= px;
            if (ux & 32768) {
                ux >>= 1;
                uy >>= 1;
            }
            a += cordic_angle[i];
            py = uy >> i;
            px = ux >> i;
        }

        if (++i >= sizeof cordic_angle / sizeof cordic_angle[0])
            return (int8_t)(a >> 8);

        px >>= 1;
        py >>= 1;
    }
}

int8_t atan2i(int_fast16_t y, int_fast16_t x) {
    if (y < 0) {
        if (x < 0) {
            return -128+within_quadrant(-y, -x);
        } else {
            return     -within_quadrant(-y, +x);
        }
    } else {
        if (x < 0) {
            return 128-within_quadrant(+y, -x);
        } else {
            return    +within_quadrant(+y, +x);
        }
    }
}
yields an absolute error range of -0.505234 .. +0.505234, with 99.8% of the results correct, and 0.1% (194 of the 401×401 cases) being incorrectly rounded to just above the expected integer value, and 0.1% (194 of the 401×401 cases) incorrectly rounded to just below the expected integer value.  Using avr-gcc 5.4.0 -Os, this generates 226 bytes of code and 30 bytes of look-up table in Flash.  (cordic_angle are the angles corresponding to \$\operatorname{atan2}\left(1,2^i\right)\$ for \$i=0, 1, 2, \dots\$, with full turn being 65536 angular units, since the angles need extra precision.)

Both atan2i(y,x) functions return the angle in -128 .. 127, where positive x axis is at angle 0, and positive y axis is at angle +64.
« Last Edit: July 31, 2024, 11:05:35 am by Nominal Animal »
 

Offline radiolistener

  • Super Contributor
  • ***
  • Posts: 4089
  • Country: 00
Re: Atan2 lookup table, for 256 part circle?
« Reply #7 on: July 31, 2024, 07:13:28 pm »
radiolistener, what units is your fast_atan2 function written in? I tested the sped, it is wondefully fast, 24us, but I can't understand it's output. When compared against a normal atan2 function it gives very unusual results, unless perhaps your bit order is backwards relative to mine? When I take an angle, use it to generate example sin and cos components, then provide these components in to the function I get 4096 out for all pairs of components which would give an angle in the 0 to 90 degree region, 12288 for components correpsonding to angles in the 90 to 180 region, then -12288 and -4096 for the next quadrants. 

it was written for DSP FM demodulator used in rtl_fm, see source code (see function fm_demod):
https://github.com/osmocom/rtl-sdr/blob/master/src/rtl_fm.c

it has a switch, so you can change between fast_atan, lut and classic atan2.

This source also has many other interesting hacks, it apply filters and demodulation with incredible low CPU load :)

As for the lookup table version, how would I go about reconstructing that for an 8 bit integere output, I suspect that would massively reduce the size of the table needed too?

yes, talble size is defined with atan_lut_size and depends on required angle resolution.

Note that it was written for int16_t input samples.


« Last Edit: July 31, 2024, 07:19:55 pm by radiolistener »
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7125
  • Country: fi
    • My home page and email address
Re: Atan2 lookup table, for 256 part circle?
« Reply #8 on: July 31, 2024, 08:26:23 pm »
If you don't mind using 20301 bytes for the lookup table (first octant) and 218 bytes for code (avr-gcc 5.4.0 -Os), then you can use a triangular table where entry (x,y) is at index i=y+x*(x+1)/2 for 0≤yx.  The following atan2i(y,x) will yield correct integer angles -128..127 for x=-200..+200, y=-200..+200.  Note that the first entry in the table corresponds to the angle for atan2i(0,0), second to atan2i(0,1), third to atan2i(1,1), fourth to atan2i(0,2), fifth to atan2i(1,2), sixth to atan2i(2,2), and so on.
Code: [Select]
// SPDX-License-Identifier: CC0-1.0
#include <stdint.h>

static const int8_t angle[20301] = {
    // size_t i = 0;
    // for (size_t x = 0; x <= 200; x++)
    //    for (size_t y = 0; y <= x; y++)
    //        angle[i++] = round(atan((double)y, (double)x) * 40.74366543152520595683424342336367668082166930955685087940284);
};

static inline int8_t within_octant(uint_fast16_t x, uint_fast16_t y) {
#ifdef MIGHT_EXCEED_200
    // Note: dividing x and y by their GCD would be better.
    while (x > 200) {
        x >>= 1;
        y >>= 1;
    }
#endif
    return angle[y + x * (x + 1) / 2];
}

int8_t atan2i(int_fast16_t y, int_fast16_t x) {
    if (y < 0) {
        const uint8_t ay = -y;
        if (x < 0) {
            const uint8_t ax = -x;
            if (ay < ax) {
                return -128 + within_octant(ax, ay);
            } else {
                return -64 - within_octant(ay, ax);
            }
        } else {
            const uint8_t ax = +x;
            if (ay < ax) {
                return -within_octant(ax, ay);
            } else {
                return -64 + within_octant(ay, ax);
            }
        }
    } else {
        const int_fast16_t ay = +y;
        if (x < 0) {
            const int_fast16_t ax = -x;
            if (ay < ax) {
                return 128 - within_octant(ax, ay);
            } else {
                return 64 + within_octant(ay, ax);
            }
        } else {
            const int_fast16_t ax = +x;
            if (ay < ax) {
                return within_octant(ax, ay);
            } else {
                return 64 - within_octant(ay, ax);
            }
        }
    }
}
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7125
  • Country: fi
    • My home page and email address
Re: Atan2 lookup table, for 256 part circle?
« Reply #9 on: August 01, 2024, 01:24:16 am »
Here is another variant of atan2i(y,x) that gives the same answers as round(atan2(y, x)*128/Pi) for all x=-255..+255 and y=-255..+255 (ignoring x=y=0):
Code: [Select]
// SPDX-License-Identifier: CC0-1.0
#include <stdint.h>

static const struct {
    uint8_t  x;
    uint8_t  y;
} upper_limit[] = {
    { .x = 163, .y =   2 },
    { .x = 163, .y =   6 },
    { .x = 114, .y =   7 },
    { .x = 151, .y =  13 },
    { .x = 253, .y =  28 },
    { .x =  81, .y =  11 },
    { .x = 230, .y =  37 },
    { .x = 188, .y =  35 },
    { .x = 137, .y =  29 },
    { .x = 219, .y =  52 },
    { .x = 129, .y =  34 },
    { .x = 169, .y =  49 },
    { .x = 161, .y =  51 },
    { .x = 125, .y =  43 },
    { .x = 113, .y =  42 },
    { .x = 253, .y = 101 },
    { .x =   7, .y =   3 },
    { .x = 131, .y =  60 },
    { .x = 209, .y = 102 },
    { .x = 239, .y = 124 },
    { .x = 129, .y =  71 },
    { .x = 151, .y =  88 },
    { .x =  99, .y =  61 },
    { .x = 186, .y = 121 },
    { .x =  86, .y =  59 },
    { .x = 101, .y =  73 },
    { .x = 255, .y = 194 },
    { .x =   5, .y =   4 },
    { .x = 227, .y = 191 },
    { .x =  95, .y =  84 },
    { .x = 197, .y = 183 },
    { .x = 206, .y = 201 },
};
#define  upper_limits  (sizeof upper_limit / sizeof upper_limit[0])

static inline int8_t within_octant(uint8_t x, uint8_t y) {
    uint8_t  b = 0;
    uint8_t  q = upper_limits;
    while (1) {
        uint8_t           i = (b + q) / 2;
        const uint16_t  lhs = (uint16_t)y * upper_limit[i].x;
        const uint16_t  rhs = (uint16_t)x * upper_limit[i].y;
        if (lhs < rhs) {
            if (q == i)
                return i;
            else
                q = i;
        } else
        if (lhs > rhs) {
            if (b == i)
                return i + 1;
            else
                b = i;
        } else {
            return i;
        }
    }
}

int8_t atan2i(int_fast16_t y, int_fast16_t x) {
    if (y < 0) {
        const uint8_t ay = -y;
        if (x < 0) {
            const uint8_t ax = -x;
            if (ay < ax) {
                return -128 + within_octant(ax, ay);
            } else {
                return -64 - within_octant(ay, ax);
            }
        } else {
            const uint8_t ax = +x;
            if (ay < ax) {
                return -within_octant(ax, ay);
            } else {
                return -64 + within_octant(ay, ax);
            }
        }
    } else {
        const int_fast16_t ay = +y;
        if (x < 0) {
            const int_fast16_t ax = -x;
            if (ay < ax) {
                return 128 - within_octant(ax, ay);
            } else {
                return 64 + within_octant(ay, ax);
            }
        } else {
            const int_fast16_t ax = +x;
            if (ay < ax) {
                return within_octant(ax, ay);
            } else {
                return 64 - within_octant(ay, ax);
            }
        }
    }
}
This one is based on finding the point in (0,0)-(255,255) that is closest to but does not exceed the angle that rounds to 0, 1, 2, .., 32; these points have been saved to the upper_limit[] array.
If we have P0(x0,y0) and P1(x1,y1) in the first quadrant (nonnegative coordinates), and x0*y1 > y0*x1, then P0 is at a bigger angle than P0.  If x0*y1 < y0*x1, then P1 is at a bigger angle than P0.  If x0*y1 = y0*x1, then P0 and P1 are on the same line passing through origin.

While this does up to 64 8×8=16-bit multiplications, they only take 2 clocks each on AVR.  avr-gcc 5.4.0 -Os generates 242 bytes of code, and the table takes an additional 64 bytes.

I haven't microbenchmarked any of the atan2i() functions  –– should time all (-200..+200)×(-200..+200) = 160,801 cases –– so I don't really know which ones are "fast".  I do have an ATmega32u4 and an AT90USB1286 somewhere, if I could just find them...
« Last Edit: August 01, 2024, 01:27:28 am by Nominal Animal »
 
The following users thanked this post: SiliconWizard

Offline InfravioletTopic starter

  • Super Contributor
  • ***
  • Posts: 1171
  • Country: gb
Re: Atan2 lookup table, for 256 part circle?
« Reply #10 on: August 05, 2024, 01:20:43 am »
Thanks for the extra options.

Nominal Animal, I'm going to give your new ones a go in the next few days, they look very promising.
 

Offline InfravioletTopic starter

  • Super Contributor
  • ***
  • Posts: 1171
  • Country: gb
Re: Atan2 lookup table, for 256 part circle?
« Reply #11 on: August 18, 2024, 11:51:17 pm »
Nominal Animal, your Reply #6 octant angle version is excellent. I found it accurate to within +/- 1 part for all but 432 out of 160801 tested x,y combinations. Accurate to within +/-2 for these 432 combinations. I added an extra line to ensure the output goes to a default 0 when both inputs are zero, and then the whole thing ran in <18 microseconds on a 16MHz AVR. Absolutely brilliant, thanks.
 
The following users thanked this post: Nominal Animal


Share me

Digg  Facebook  SlashDot  Delicious  Technorati  Twitter  Google  Yahoo
Smf