Author Topic: Realistically implementing trig functions on an embedded system with FPU  (Read 4155 times)


Offline BurnedResistorTopic starter

  • Regular Contributor
  • *
  • Posts: 192
  • Country: at
Hello all!

I am going to preface this by saying that I have very little experience with embedded systems, so bear with me, and don't be afraid to tell me I am going down a stupid rabbit hole. I am interested in learning the theory behind what follows, but I would like to implement this in an efficient manner.

I am looking into learning about sensor fusion for a 6-DOF (or maybe later 9-DOF) IMU. Before I start digging into that, I thought I would lay out the essentials, which I am sure I will need down the road (bad approach? Let me know!). One big essential is going to be trig functions.

Technical details: I am using an STM32F446 with hardware FPU, running at max speed.
Code size really is not limited (but I would rather not implement a lookup table. Again, stupid? Good idea? Let me know!)
I am looking for relatively fast execution, as I will try to run the sensor fusion algorithm at as many kHz as possible :D  >:D (in all seriousness, I am looking at a 1 kHz, maybe 4 kHz update rate if at all possible; that's what the sensor can go up to! I can of course scale that down a bit if it is unrealistic!)

My first concern is what degree of precision I will need for my trig functions. I realize this is a little hard to say without having settled on a sensor fusion algorithm, but a ballpark would be a good place to start, and I have no idea what a good starting point is. One decimal place? 2? 3? 5? 10?

Then comes the question of how this should be implemented. Should I just use the built-in C sin functions? I hear they are quite bloated. Plus, I like the idea of this being a bit of a learning opportunity!

So, as mentioned before, the first option would be a lookup table. I would really only need to store the values from zero to pi/2, as the rest can be determined from those quite easily by symmetry; cosine is just a phase shift, and tangent is sin/cos. However, this seems to be a good approach only on small devices doing integer math. I think (correct me if I'm wrong) that with a hardware FPU available this is pointless and outdated.

I have done enough research to realize Taylor series are probably not the right approach (correct me!)

So it seems that with an FPU available my best option would be Chebyshev polynomials/series (?), and the best book on the topic seems to be John Hart's "Computer Approximations". Firstly, this brings me back to my point about accuracy, since it seems I will have to pin that down if I want to follow the book. Secondly, I am more than willing to read the book, but I really don't want to waste my time. Is it worth it? Is there a better approach that is not a book?

Again, am I looking at this problem completely the wrong way, and is everything above pointless? I want to hear it all :)

E: Sorry for the spelling and grammar above! Typed on mobile....
« Last Edit: January 06, 2017, 11:40:08 pm by BurnedResistor »
 

Offline Feynman

  • Regular Contributor
  • *
  • Posts: 192
  • Country: ch
Since you are using an ARM Cortex-M4, I would start with the CMSIS trig functions and see if they meet your requirements. Fixed- and floating-point versions of every function are available, and they basically use a combination of a lookup table and interpolation.

I have done enough research to realize Taylor series are probably not the right approach (correct me!)
Well, it depends on your needs. Taylor series are fine; one just has to tune the Taylor coefficients a little, since the textbook coefficients for an ideal Taylor polynomial cause larger maximum errors. Hart's Computer Approximations is full of such coefficients for almost every imaginable math function.
But from my experience: the CMSIS trig functions are usually faster and more accurate than Taylor approximations if you have an FPU.
« Last Edit: January 07, 2017, 12:21:21 am by Feynman »
 
The following users thanked this post: BurnedResistor

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
I'd just use the standard C functions and floating-point library. No need to re-invent the wheel.
At some point you'll want to switch from floats to integers. There are 'magic numbers' with which you can do the conversion quickly, by adding a number and casting the float to an integer.
http://stereopsis.com/FPU.html
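A minimal sketch of that trick (the constants assume IEEE-754 single precision and the default round-to-nearest mode, and it is only valid for roughly |x| < 2^22; the name fast_round is made up):

```c
#include <stdint.h>
#include <string.h>

/* Float-to-int rounding via the "magic number" trick described in the
 * stereopsis article: adding 1.5 * 2^23 pushes the value into the mantissa,
 * already rounded by the FPU; subtracting the magic constant's bit pattern
 * recovers the integer. */
static int32_t fast_round(float x)
{
    float biased = x + 12582912.0f;      /* 1.5 * 2^23 */
    int32_t bits;
    memcpy(&bits, &biased, sizeof bits); /* well-defined type pun */
    return bits - 0x4B400000;            /* bit pattern of 12582912.0f */
}
```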
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 
The following users thanked this post: BurnedResistor

Online mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 13745
  • Country: gb
    • Mike's Electric Stuff
Don't forget that floating point gives you range, not precision. A fixed-point value occupying the same number of bits will give you more precision, and may be faster than floating point.
You need to look carefully at the range of all your inputs and intermediate values to see if a fixed-point implementation may be better. It may be that some mix of fixed and floating point is the optimal solution.
Very few things in the embedded world have enough range to actually need floating point.
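As an illustration of the fixed-point side, here is a minimal Q16.16 sketch (the 16/16 split and the names are arbitrary choices for the example; in practice you pick the split to match your actual value ranges):

```c
#include <stdint.h>

/* Q16.16 fixed point: 16 integer bits, 16 fractional bits. */
typedef int32_t q16_16;
#define Q_ONE (1 << 16)

static inline q16_16 q_from_float(float f) { return (q16_16)(f * Q_ONE); }
static inline float  q_to_float(q16_16 q)  { return (float)q / Q_ONE; }

/* Widen to 64 bits for the product so the fractional shift can't overflow,
 * then shift back down to Q16.16. */
static inline q16_16 q_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * b) >> 16);
}
```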
 
Of course, if your MCU has hardware floating point, it may not be any slower than integer, and could even be faster if it has the ability to do FPU ops in parallel with the CPU's integer ALU.

Best bet is to start by implementing things the simplest, easiest way and see if it's good enough, then look at optimising anything that seems inefficient.
Library functions will generally be efficient on average, as they are written to be general-purpose, but could potentially be a lot less than optimal for your specific case.
Youtube channel: Taking weird stuff apart. Very apart.
Mike's Electric Stuff: High voltage, vintage electronics etc.
Day Job: Mostly LEDs
 
The following users thanked this post: BurnedResistor

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
The "small and efficient" software trig libraries I've seen (avr-libc http://svn.savannah.nongnu.org/viewvc/avr-libc/trunk/avr-libc/libm/fplib/ and qfplib for ARM http://www.quinapalus.com/qfplib.html) all seem to use common iterative code (power series or CORDIC) for all the trig and math functions, with a relatively small number of terms (sin in avr-libc calls a routine that does 5 terms of odd exponents). There's no reason you couldn't use one of these algorithms with HW floating point for the basic multiply/add operations.
 
The following users thanked this post: orolo, BurnedResistor

Offline krho

  • Regular Contributor
  • *
  • Posts: 223
  • Country: si
Don't worry. We are running 9-DOF at 100 Hz on an ATxmega at 32 MHz, plus some other functionality, and the code is in C with software floating point.
 
The following users thanked this post: BurnedResistor

Offline orolo

  • Frequent Contributor
  • **
  • Posts: 352
  • Country: es
I have done enough research to realize Taylor series are probably not the right approach (correct me!)

So it seems that with an fpu available my best option would be Chebyshev pilynomails/series (?) And the best book on the topic seems to be John Hart's "Computer Approximations".Firstly this brings me back to my point about accuracy as I will have to determine that if I want to follow that book it seems. Secondly I am more than willing to read the book, however I really don't want to waste my time. Is it worth it? Is there a better approach that is not a book?
I would agree with you: Taylor rules for local approximations, but the truncation error grows rather fast away from the expansion point. Looking at the avr-libc assembly code linked by westfw, they implement the sine function and use a phase shift to compute the cosine. The sine is computed with good old Taylor-style coefficients: p_taylor = x - 0.1666666664*x^3 + 0.83333315e-2*x^5 - 0.1984090e-3*x^7 + 0.27526e-5*x^9 - 2.39e-8*x^11. A quick computation of the error from 0 to pi (the library uses this for the cosine, so we can't stop at pi/2) gives the graph attached below (taylor.png).

On the other hand, an approximation by Legendre polynomials gives the best least-squares fit on an interval. To get comparable results, I expanded sin(x) in Legendre polynomials to order 9 on the interval [-pi, pi]. The result was:

w := 0.9999845942*x - 0.1666325951*x^3 + 0.83123889e-2*x^5 - 0.1931628e-3*x^7 + 0.21733e-5*x^9

The absolute error is below, in legendre_9.png. A comparison follows in comparative.png.

So a degree-9 Legendre fit beats the degree-11 Taylor polynomial (the quadratic error on the interval is two orders of magnitude lower for Legendre). I also computed the degree-11 Legendre fit, and its error curve is almost indistinguishable from the axis when plotted together with Taylor's. Degree 9 seemed more instructive.
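For completeness, that degree-9 fit is cheap to evaluate in Horner form on an FPU: four multiply-adds in x^2 plus one final multiply. A sketch using the coefficients above (the name sin_poly9 is made up; valid on [-pi, pi] only):

```c
#include <math.h>

/* Degree-9 odd polynomial for sin(x) on [-pi, pi], using the least-squares
 * (Legendre) coefficients quoted above, evaluated in Horner form in x^2. */
static float sin_poly9(float x)
{
    float x2 = x * x;
    float p = 0.21733e-5f;            /* x^9 coefficient */
    p = p * x2 - 0.1931628e-3f;       /* x^7 */
    p = p * x2 + 0.83123889e-2f;      /* x^5 */
    p = p * x2 - 0.1666325951f;       /* x^3 */
    p = p * x2 + 0.9999845942f;       /* x^1 */
    return p * x;
}
```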

About Chebyshev polynomials: they are orthogonal under a nonlinear weight function, and I'm not immediately sure that they would give a numerical advantage here.






 
The following users thanked this post: BurnedResistor

Offline BurnedResistorTopic starter

  • Regular Contributor
  • *
  • Posts: 192
  • Country: at
Don't worry. We are running 9DOF @ 100Hz on ATXmega @32MHz + some other functionality. And the code is in C with soft FPU

Great thing about hobby projects? Use a $10 processor that is totally overpowered? Why not :P
The board I designed and am now bringing up was really intended as a prototyping board for PID control, sensor fusion, etc., so I wanted a lot of headroom. I figured an FPU and a 180 MHz processor should be able to handle that :)
 

Offline BurnedResistorTopic starter

  • Regular Contributor
  • *
  • Posts: 192
  • Country: at

Thank you all for the very detailed explanations; they have helped a lot and lent themselves to many more hours of learning and Google rabbit holes :)

I was not aware that CMSIS implements trig functions, so I will probably start with those. I am still interested in the theory, however. I have to write a math paper about a topic of my choosing anyway, so I think this will be a very interesting topic!
 

