Topic: ATmega32U4 floating point calculations per second?  (Read 4644 times)


3dgeo (topic starter)
« on: February 19, 2019, 03:46:19 pm »
Hey,

I want to dynamically calculate animations for 105 RGB LEDs on an ATmega32U4 according to each LED's position on the board, and was wondering approximately how many floating point calculations per second the ATmega32U4 can do. It's probably a bit of an unusual question, but I've never had to worry about computing power before. Should I expect double-, triple- or quadruple-digit calculations per second?

Thanks.
« Last Edit: February 20, 2019, 11:34:00 am by 3dgeo »
 

nsrmagazin
« Reply #1 on: February 20, 2019, 11:59:00 am »
You have to check the datasheet or ask the manufacturer.
 

nick_d
« Reply #2 on: February 20, 2019, 12:07:43 pm »
From what I understand, this is a standard AVR part, similar to an ATmega8 but with more memory and/or pins?

If so, it cannot do floating point calculations, except by emulation in software.

All is not lost though. For what you want to do, fixed point calculations are probably all you need.

I'm pretty sure the ATmega series includes an 8x8 bit multiplier (this wasn't present in the earlier AVRs such as AT90S4433). The multiplier will make intensity/brightness calculations easier.

On the other hand, if you're still in the design stage and want easy floating point calculations, what about changing to an ARM, something like the 1bitsy or a Teensy?

I could help you more with this, but first it would be useful to get an idea of where you're at with computer and/or binary arithmetic in general.

cheers, Nick

 

amyk
« Reply #3 on: February 20, 2019, 12:22:05 pm »
Do you need floating point? This sounds like an application where simple fixed-point maths will do.
 

3dgeo (topic starter)
« Reply #4 on: February 20, 2019, 01:45:01 pm »
The ATmega32U4 is the MCU used in the Arduino Leonardo. I will need accurate calculations. I can probably multiply everything by 1000 and use integers. I haven't done such complex math on an MCU before.

For example, my idea is to pick a point, "draw" a circle that grows over time, calculate the feather amount around the circle's perimeter, and convert that math to the LED pixels' RGB colors according to each LED's position on the PCB. This example should look like raindrop ripples in a puddle; it's only one of the ideas I have in mind...
 

mikerj
« Reply #5 on: February 20, 2019, 02:18:37 pm »
You certainly don't need floating point math to achieve that.  Unless you need to keep the internal representation of numbers in powers of 10 for some reason, it's much more efficient to scale by a power of two (e.g. 1024 rather than 1000) since multiplication and division can easily be performed with shifts.
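To make the power-of-two point concrete, here is a minimal C sketch of a scale of 1024 (10 fractional bits). The macro and function names are hypothetical. Converting to and from the scaled form, and rescaling after a multiply, are all shifts, whereas a scale of 1000 needs a genuine division, which is slow on an AVR.

```c
#include <stdint.h>

#define FRAC_BITS 10                  /* scale = 2^10 = 1024 */
#define ONE       (1UL << FRAC_BITS)  /* 1.0 in this representation */

/* Whole number -> scaled value: same result as n * 1024, done with a shift. */
static inline uint32_t from_int(uint16_t n) { return (uint32_t)n << FRAC_BITS; }

/* Scaled value -> whole number (truncating): same as x / 1024 for unsigned x. */
static inline uint16_t to_int(uint32_t x) { return (uint16_t)(x >> FRAC_BITS); }

/* Product of two scaled values: the raw product carries a scale of 2^20,
 * so shift back by 10. Assumes a * b still fits in 32 bits. */
static inline uint32_t mul_scaled(uint32_t a, uint32_t b) { return (a * b) >> FRAC_BITS; }

/* For comparison: with a scale of 1000 the conversion needs a real division. */
static inline uint16_t to_int_1000(uint32_t x) { return (uint16_t)(x / 1000UL); }
```

For instance, 3.0 is stored as 3072, and multiplying it by 0.5 (512) gives 1536, i.e. 1.5.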
 

3dgeo (topic starter)
« Reply #6 on: February 20, 2019, 03:32:29 pm »
Quote
multiplication and division can easily be performed with shifts.

Shifts? You mean bitmath? Please provide a bit more info.
 

Nominal Animal
« Reply #7 on: February 20, 2019, 06:53:39 pm »
On a microcontroller without hardware floating-point support, you are better off using fixed-point arithmetic with some number of bits dedicated for the fraction.

Simply put, instead of using a floating point value, you store the value multiplied by 2^B as an integer, where B is the number of fractional bits.



The common notation for a fixed point type with n fractional bits is Qn.

You can convert R from floating point to Qn using round(R * (1 << n)). For example, 0.5 in Q4 is 0.5×2^4 = 8. In Q8, 0.5×2^8 = 128.

To convert a nonnegative integer K to fixed-point F with n fractional bits, use F = K << n. Inversely, K = F >> n.
(If you use GCC, you can use that for all integers; the C standard makes right-shifting a negative integer implementation-defined and left-shifting one undefined, so that may not work with all C compilers.)

Alternatively, you can calculate Q = 2^n by hand, and then use F = K*Q and K = F/Q, respectively. If Q is a power of two and you tell your C compiler to optimize (I always use -O2 with GCC), it will implement the division as a bit shift.

Note that when calculating K, you can apply rounding by adding/subtracting Q/2 to/from F before the division.
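A minimal C sketch of the Qn conversions above, including the rounding trick. The names are hypothetical, and it divides by Q rather than shifting right so that negative values behave the same on any compiler; the float conversion rounds by hand so it does not need libm.

```c
#include <stdint.h>

/* Q = 2^n, the value of 1.0 in Qn (n = number of fractional bits). */
#define Q_ONE(n) ((int32_t)1 << (n))

/* Float R -> Qn: round(R * 2^n), rounding half away from zero without libm. */
static int32_t float_to_q(float r, unsigned n)
{
    float scaled = r * (float)Q_ONE(n);
    return (int32_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Integer K -> Qn: F = K * Q (a constant power of two lets the compiler shift). */
static int32_t int_to_q(int32_t k, unsigned n) { return k * Q_ONE(n); }

/* Qn -> integer, rounded: add (or subtract) Q/2 before dividing. C division
 * truncates toward zero, so this rounds half away from zero for both signs. */
static int32_t q_to_int_rounded(int32_t f, unsigned n)
{
    int32_t q = Q_ONE(n);
    return (f >= 0 ? f + q / 2 : f - q / 2) / q;
}
```

With n = 8, 3.5 is stored as 896 and rounds back to 4, while -3.5 (-896) rounds to -4.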
 

westfw
« Reply #8 on: February 20, 2019, 07:10:02 pm »
https://www.nongnu.org/avr-libc/user-manual/benchmarks.html


The avr-libc software float code does basic math in about 150 to 500 cycles, and trig functions in about 3000 cycles. That's more than 5000 ops/s, using code that fits pretty easily in the 28k available on your Leonardo. Go ahead and try it out.
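The ops/s figure is just the clock rate divided by the cycles per operation. A tiny helper, assuming the Leonardo's 16 MHz clock (the function name is hypothetical), reproduces the range implied by the benchmark numbers above.

```c
#include <stdint.h>

/* Operations per second for a routine taking `cycles` clock cycles
 * on an MCU clocked at `f_cpu` Hz (16 MHz on a Leonardo). */
static uint32_t ops_per_second(uint32_t f_cpu, uint32_t cycles)
{
    return f_cpu / cycles;
}
```

With 150 to 500 cycles this gives roughly 32,000 to 106,000 basic operations per second, and about 5,300 per second for a 3000-cycle trig call.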


A float divide is somewhat faster than a 32-bit integer divide, btw...
(and I guess it follows that a float multiply by a floating point fraction is significantly faster than a 32-bit integer divide...)
« Last Edit: February 20, 2019, 11:08:27 pm by westfw »
 

Siwastaja
« Reply #9 on: February 21, 2019, 09:39:23 am »
Note that the AVR instruction set doesn't have a "shift by n places" instruction, only "shift by 1". If you shift by 5, the compiler uses five such instructions. Therefore, shifting by 8 is often more efficient than shifting by, say, 5, because the compiler can simply move whole bytes instead.
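As an illustration of why a shift by 8 is cheap, here is a hypothetical Q8.8 brightness-scaling helper in C. Dividing the 16-bit product by 256 is a right shift by 8, which avr-gcc can typically implement by just taking the high byte; it is worth checking the generated assembly rather than assuming it.

```c
#include <stdint.h>

/* Q8.8: 8 integer bits, 8 fractional bits, so 1.0 == 256. */
typedef uint16_t q8_8;

/* Scale an 8-bit brightness by a Q8.8 factor (assumed <= 1.0, i.e. <= 256).
 * The product carries a scale of 2^8; shifting right by exactly 8 keeps
 * only the upper byte of the 16-bit result. */
static uint8_t scale_brightness(uint8_t level, q8_8 factor)
{
    uint16_t product = (uint16_t)level * factor;   /* max 255 * 256 = 65280 */
    return (uint8_t)(product >> 8);
}
```

For example, scaling a level of 200 by 0.5 (128 in Q8.8) gives 100.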
 

Rerouter
« Reply #10 on: February 21, 2019, 09:47:19 am »
If you give us an idea of what kind of floating point math you need for the animations, we may be able to help reduce it.
 

3dgeo (topic starter)
« Reply #11 on: February 21, 2019, 11:11:12 am »
Wow, thank you all for the information.
Sadly, I'm not advanced enough to understand most of it  :-[

OK, I'll give a more specific example.
I don't know if it's the right way to do it, but for now it's the only way I can think of:
Let's say I have an LED matrix and want to create a puddle-ripple animation.
First I pick an origin point on the LED matrix and create a data table with each LED's distance from the origin.
Next, I pick a random point within the matrix's surface area and create a circle that grows over time.
Around that circle there will be a "glow" area.
The next step is to calculate each LED's glow value according to time, the circle's position and radius, and the "glow" value around the circle's perimeter.

I hope I wrote it clearly enough. My LED matrix isn't a grid matrix (the LEDs are positioned irregularly over the surface); that's why I want to use this approach.
As I said, I haven't done anything like this before. I can do the math to calculate and map all of this, even using only integers, but I don't know if this is the right way to do it.  ^-^
 

Rerouter
« Reply #12 on: February 21, 2019, 11:39:29 am »
Guessing this is for RGB keyboard backlighting, which means it's not a fixed grid, but you can at least break the math into groups with some offset to reduce it.

So let's begin by simplifying the output: you have 105 RGB LEDs, let's say 8 bits per colour per LED. OK, that's a 315-byte array for the whole array, and you have 2.5 KB of RAM, so not too bad so far.

Next we have groupings: keys that are all the same size, just with a slight offset from the previous row. You only need to store the offset per group, with a few special cases, e.g. the Enter key if it's L-shaped.

OK, so a few hundred bytes hold the output map (or frame buffer) and the offset table for each row of keys, more or less.

Now comes the math. For a ripple, you are expanding a circle: at a certain distance you begin increasing brightness until you match the circle's radius, then decrease to a point. So you just need to check the distance from your start point to each key; if it's within the animation range, you set its brightness.

OK, so how to approach this: you just need to calculate the distance between two points. You would immediately think of Pythagoras, but we don't actually need the square root; we know the radius of the circle and can just square it when comparing the numbers.

OK, so with a starting point of X5, Y2 on our grid, we run through all 105 keys and save their distance to that point, then begin the animation.
You find which keys are in the animation range and update their brightness based on how close they are to the ideal distance, then iterate this going forward for the animation. It should be a fairly low-cost method that only updates the LEDs when they are actually being affected by the animation.
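The approach above (store each key's squared distance once, then compare it against squared radii) could look roughly like this in C. The `led_pos` struct, the coordinate units and the function names are hypothetical; only the two radii are squared per frame, and no square root is taken.

```c
#include <stdint.h>

/* Hypothetical LED coordinates in arbitrary board units (fill from the PCB layout). */
typedef struct { int16_t x, y; } led_pos;

/* Squared distance from the ripple origin to every LED, computed once per ripple. */
static void precompute_dist2(const led_pos *leds, int n,
                             int16_t ox, int16_t oy, uint32_t *dist2)
{
    for (int i = 0; i < n; i++) {
        int32_t dx = (int32_t)leds[i].x - ox;
        int32_t dy = (int32_t)leds[i].y - oy;
        dist2[i] = (uint32_t)(dx * dx + dy * dy);   /* no square root needed */
    }
}

/* Per frame: is an LED with squared distance d2 inside the ring between
 * r_min and r_max? Both sides are compared as squares. */
static int in_ring(uint32_t d2, uint16_t r_min, uint16_t r_max)
{
    return d2 >= (uint32_t)r_min * r_min && d2 <= (uint32_t)r_max * r_max;
}
```

With the origin at (5, 2), an LED at (8, 6) gets a squared distance of 25 and lies in the ring between radii 4 and 6.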

Edit: with the method I gave, you may not even need a frame buffer, but it comes down to what other things you want it to do.
« Last Edit: February 21, 2019, 11:42:19 am by Rerouter »
 

amyk
« Reply #13 on: February 21, 2019, 12:24:14 pm »
Precalculate as much as you can. You definitely don't need FP math for this, maybe not even fixed-point.

Here's a sample of what can be done with an MCU similar to yours: https://www.linusakesson.net/scene/craft/
 

nick_d
« Reply #14 on: February 21, 2019, 02:00:02 pm »
If the centre position of the puddle is fixed for all time, then just measure the distance of each LED from this known point and enter it into a read-only distance table. If not, you'll have to recalculate the distance table each time the centre position changes.

To calculate the distance table (as opposed to measuring it) you will need to know the x,y position of each LED and the centre position say x',y'. The distance is then sqrt((x-x')^2+(y-y')^2). If you are writing in C and use signed int or signed char for the coordinates then all is easy except for the square root.

The easiest way to take the square root of, say, a 16-bit integer, giving an 8-bit result, would be a lookup table; however, since you don't want a 64 KB lookup table, range-reduction techniques can be used. Each time you divide the original number by 4 to get it into range, the square root from the lookup table must be multiplied by 2 to compensate. Other possible methods are Newton's method or bisection. It does not matter if this is slow, if the puddle centre changes infrequently.
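Of the methods listed above, bisection is the simplest to sketch in C (the function name is hypothetical). It builds the 8-bit result one bit at a time, from the top bit down, keeping each bit if the candidate's square does not exceed the input: eight multiply-and-compare steps and no division.

```c
#include <stdint.h>

/* floor(sqrt(x)) for a 16-bit x, returned in 8 bits, by bisection on the result bits. */
static uint8_t isqrt16(uint16_t x)
{
    uint8_t root = 0;
    for (uint8_t bit = 0x80; bit != 0; bit >>= 1) {
        uint8_t candidate = root | bit;
        if ((uint16_t)candidate * candidate <= x)   /* max 255 * 255 = 65025 */
            root = candidate;                       /* keep this bit */
    }
    return root;
}
```

For example, isqrt16(10000) is 100 and isqrt16(15) is 3 (rounded down).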

Next you need to generate the ripple. Say the distance from the centre is r. Then a formula for the brightness might be 127.5 + 127.5 sin(ar + bt), where t is time and a, b indicate how many ripples you want per distance unit and how fast they should ripple outwards, respectively. This would give a brightness from 0 to 255. Again, calculate ar + bt and use a lookup table for 127.5 + 127.5 sin x. Here x should be scaled so that 256 units, rather than 2π radians, is a full cycle. Range reduction is then achieved by "% 256", or equivalently "& 0xff" for unsigned values, in C code.
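A sketch of the sine lookup table in C, with 256 units per full cycle so the wrap-around is just an 8-bit truncation. Assumptions: the table and function names are hypothetical; the table is filled with a rotation recurrence (the angle-addition formulas) so no libm is needed, although on the AVR you would more likely generate it offline and store it in flash (avr-gcc's `double` is 32-bit by default, so on-target generation is less precise); and the phase is computed as a·r − b·t so that crests move outward as t grows (with a + sign they move inward unless b is negative).

```c
#include <stdint.h>

/* wave[i] = round(127.5 + 127.5 * sin(2*pi*i/256)): one full cycle in 256 steps. */
static uint8_t wave[256];

static void build_wave_table(void)
{
    const double c = 0.99969881869620424;   /* cos(2*pi/256) */
    const double s = 0.024541228522912288;  /* sin(2*pi/256) */
    double sn = 0.0, cs = 1.0;              /* sin and cos of the current angle */
    for (int i = 0; i < 256; i++) {
        double v = 127.5 + 127.5 * sn;      /* within [0, 255] up to rounding error */
        wave[i] = (uint8_t)(v + 0.5);       /* round to nearest */
        double next_sn = sn * c + cs * s;   /* rotate the angle by 2*pi/256 */
        cs = cs * c - sn * s;
        sn = next_sn;
    }
}

/* Brightness of an LED at distance r at time t. The cast to uint8_t performs
 * the "% 256" range reduction, including for negative intermediate values. */
static uint8_t ripple_brightness(uint8_t r, uint16_t t, uint8_t a, uint8_t b)
{
    uint8_t phase = (uint8_t)((uint16_t)a * r - (uint16_t)b * t);
    return wave[phase];
}
```

Call `build_wave_table()` once at startup; after that each LED costs two multiplies, a subtraction and one table read per frame.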

cheers, Nick
 

3dgeo (topic starter)
« Reply #15 on: February 21, 2019, 02:04:56 pm »

Yes, you get the idea. I will need 7 frame buffers (8 bits per color, 15 RGB LEDs per group, 105 RGB LEDs in total); only one frame buffer will be lit at a time. My plan is to calculate one frame buffer (15 RGB LEDs), wait until it's time to display it, then calculate the next frame buffer and repeat... The waiting is for a constant "FPS"; I think 60 "FPS" should be enough. If the goal is 60 "FPS", that means 420 frame-buffer calculations per second, i.e. 2.38 milliseconds per frame buffer (15 RGB LEDs) for the calculation, the actual LED color update, and other stuff that isn't so CPU-demanding.
Is it possible? I'm kind of surprised myself that I have no idea whether it is or not  ;D
P.S. It's a keyboard, but it has very little in common with a regular keyboard.
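These timing numbers can be turned into a cycle budget. Assuming a 16 MHz clock, 60 FPS and 7 multiplexed groups of 15 LEDs (all taken from the post above), the hypothetical helpers below give about 38,000 cycles per group and about 2,500 cycles per LED.

```c
#include <stdint.h>

/* Cycles available per multiplexed group: fps * groups group-frames per second. */
static uint32_t cycles_per_group(uint32_t f_cpu, uint16_t fps, uint8_t groups)
{
    return f_cpu / ((uint32_t)fps * groups);
}

/* The same budget split across the LEDs of one group. */
static uint32_t cycles_per_led(uint32_t f_cpu, uint16_t fps,
                               uint8_t groups, uint8_t leds_per_group)
{
    return cycles_per_group(f_cpu, fps, groups) / leds_per_group;
}
```

At the 150 to 500 cycles per basic float operation from the avr-libc benchmarks quoted earlier in the thread, that is very roughly 5 to 16 float operations per LED per frame before any other overhead, which suggests integer or fixed-point math is the safer choice. With `groups` set to 1, the same helper gives about 266,000 cycles per 60 Hz frame.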
 

3dgeo (topic starter)
« Reply #16 on: February 21, 2019, 02:39:57 pm »

So calculating the root will be the most challenging thing?
My idea was to compare each LED's distance to the ripple circle's origin: if the distance is equal to the circle's radius, the LED brightness is 100%; if the distance is less or more but still within the circle's "glow" area, reduce the LED intensity accordingly.
It's not the only animation style I have in mind, so don't let this animation example get all the attention. I can do the math; the question is whether the 32U4 can handle it, and if so, whether it can handle it fast enough.  ^-^




« Last Edit: February 21, 2019, 02:50:49 pm by 3dgeo »
 

NorthGuy
« Reply #17 on: February 21, 2019, 03:02:37 pm »
Quote
So calculating root will be most challenging thing?
My idea was to compare LEDs distance to a ripple circle origin – if distance is equal to a circle radius then LED brightness is 100%, if distance is less or more but still in a circles "glow" area reduce LED intensity accordingly.

You don't need to calculate square root for this. Use the square of the distance instead and compare it to the square of the radius.
 

RES
« Reply #18 on: February 21, 2019, 03:11:12 pm »
You can reduce overhead by calculating a quarter of each circle and using copy, mirror and flip for the rest. You can also reduce the size of the RGB LED chip matrix by 4 if you are only displaying circle patterns: 4 LED chips in parallel per quadrant of the circle. 304/4 -> an LED chip matrix of 19*4*4 LEDs (76 by 4 parallel branches).
If you were to apply charlieplexing, you would need 18 I/Os -> 18*(18-1) = 306 LEDs (2 extra LEDs), but with shorter on-times per LED (than with row/column multiplexing of a matrix); so use either very bright RGB LEDs directly on the I/Os, or 18 transistor push-pull drivers. Divide the on-time by 256 for brightness control (with an 8-bit log table, the LED intensity will look linear to the eye).
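The 8-bit log table mentioned above makes brightness steps look even to the eye. A minimal C sketch of such a perceptual table follows; it swaps in a cubic curve (out = in³ / 255²) as a simple, libm-free stand-in for a true logarithmic or gamma curve, and the table and function names are hypothetical.

```c
#include <stdint.h>

/* Maps a linear "how bright it should look" level (0..255) to a PWM value. */
static uint8_t perceptual[256];

static void build_perceptual_table(void)
{
    for (uint32_t i = 0; i < 256; i++)
        /* i^3 / 255^2, rounded to nearest; 255^3 fits easily in 32 bits */
        perceptual[i] = (uint8_t)((i * i * i + 65025UL / 2) / 65025UL);
}
```

Half of the perceived range (128) then maps to a duty value of about 32, which keeps the low end from looking washed out.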

3dgeo (topic starter)
« Reply #19 on: February 21, 2019, 03:37:20 pm »
Quote
You don't need to calculate square root for this. Use the square of the distance instead and compare it to the square of the radius.

Probably yes, I can't see why this wouldn't work.


Actually, I don't need to calculate a circle at all; I just need to compare distances and set the brightness accordingly, as described in my last post.
Again, this is not about making this animation style work, it's about whether the MCU has the horsepower to handle it  ;D
The hardware is mostly sorted out: 3 × TLC59116 and 7 P-FETs. I already have a working prototype; now it's coding time...  ^-^
 

edavid
« Reply #20 on: February 21, 2019, 05:54:41 pm »
To answer the original question, this page says:

Quote
Floating point arithmetic on the AVR Mega series is fairly fast in GCC if the library libm.a in linked in. An IEEE floating point multiply takes about 125 cycles and an IEEE floating add takes 75 to 275 cycles.

So, a 16MHz ATmega32U4 can do about 100KFLOPS.

P.S. OP, would you please fix the title of this thread?
« Last Edit: February 21, 2019, 07:24:25 pm by edavid »
 

NorthGuy
« Reply #21 on: February 21, 2019, 06:48:39 pm »
Quote
I just need to compare distances and set brightness accordingly as described at my last post.
Again, this is not about making this animation style to work, it's about MCU horse power to handle it  ;D

Your coordinates will most likely fit into 8-bit integers. So you need 3 8-bit additions/subtractions and 2 8-bit squarings. Squarings may be replaced by a table lookup. Thus the load is really small.
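A minimal C sketch of this estimate, with 8-bit coordinates and a table lookup in place of each squaring (table and function names are hypothetical). The 512-byte table fits in the 32U4's 2.5 KB of SRAM, or could be placed in flash; note that the final sum can exceed 16 bits, so it is returned as `uint32_t`.

```c
#include <stdint.h>

/* squares[i] = i * i for 0..255, so squaring an 8-bit difference is one lookup. */
static uint16_t squares[256];

static void build_squares(void)
{
    for (uint16_t i = 0; i < 256; i++)
        squares[i] = (uint16_t)(i * i);        /* max 65025 fits in 16 bits */
}

/* Squared distance between two points with 8-bit coordinates:
 * two subtractions, two lookups and one addition. */
static uint32_t dist2_u8(uint8_t x1, uint8_t y1, uint8_t x2, uint8_t y2)
{
    uint8_t dx = x1 > x2 ? x1 - x2 : x2 - x1;  /* |dx| always fits in 8 bits */
    uint8_t dy = y1 > y2 ? y1 - y2 : y2 - y1;
    return (uint32_t)squares[dx] + squares[dy];
}
```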

Of course, if you do the same with floating point doubles, and also add square-rooting and logarithm calculations, it is several hundred times greater load on the CPU.
 

Rerouter
« Reply #22 on: February 21, 2019, 07:54:06 pm »
OK, as for CPU load: if you're running the chip at 16 MHz you have roughly 16 million instructions per second, or about 266,000 per 60 Hz frame.

Now you just need to cut the math down so it takes less than that.
So we have our starting point; we now work through all the squares of the key distances. As this is done before the animation begins, it can have a whole frame to itself.

Now onto the animation. You have the radius, so you subtract and add the width of the animation to get the min and max radius; from that point it's a for loop that checks whether each key's distance is within the min and max.

Enter another chunk of code that then updates the frame buffer's brightness for that key, based on the distance between the main radius and the min or max, depending on whether it's under or over. If you're using a lookup table or some math for the brightness curve, just avoid division and square roots.
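A rough C sketch of this per-frame loop, under these assumptions: each LED's true distance from the ripple origin was computed once when the ripple started (where a slow square root is acceptable) and fits in 8 bits; the glow half-width is a power of two, so the linear falloff uses a shift instead of a division; the names and the width of 8 units are hypothetical.

```c
#include <stdint.h>

/* Half-width of the glowing band around the ring, as a power of two so the
 * falloff can use a shift instead of a division (8 units is an arbitrary choice). */
#define GLOW_SHIFT 3
#define GLOW_HALF  (1 << GLOW_SHIFT)

/* 255 on the ring itself, falling linearly towards 0 at radius +/- GLOW_HALF
 * (the min and max radius), and 0 outside that band. */
static uint8_t ring_brightness(uint8_t dist, uint8_t radius)
{
    int16_t off = (int16_t)dist - (int16_t)radius;  /* signed offset from the ring */
    if (off < 0)
        off = -off;                                 /* under or over: same falloff */
    if (off >= GLOW_HALF)
        return 0;                                   /* outside r_min..r_max */
    /* 255 * (1 - off / GLOW_HALF), with the division done as a right shift */
    return (uint8_t)(255 - ((off * 255) >> GLOW_SHIFT));
}

/* One animation step: dist[] holds each LED's precomputed distance from the
 * ripple origin; level[] receives the brightness for the frame buffer. */
static void ripple_frame(const uint8_t *dist, uint8_t *level, int n, uint8_t radius)
{
    for (int i = 0; i < n; i++)
        level[i] = ring_brightness(dist[i], radius);
}
```

Increasing `radius` by one (or by a fixed-point step) each frame makes the ring expand; LEDs outside the band cost one subtraction and two compares.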
 

3dgeo (topic starter)
« Reply #23 on: February 22, 2019, 03:30:06 am »
Thanks all for the help, I will have working code soon.
Though I've encountered the first hardware problem: the P-FETs I used are too slow and a slight flicker is visible. Most people probably wouldn't notice, but I do. It's not due to the code; I will try to inspect it with a scope tomorrow.
Also, my first idea was to use port manipulation to update the FETs: turn the FETs off, update the LED driver ICs with the new colors and turn the FETs on again, but this method reduces brightness way too much. I prefer slight color contamination from switching the FETs before setting the colors (I have to find out whether the TLC59116 has a latching method after sending all the colors to the IC). And I will probably need to create my own library for the TLCs...
 

nick_d
« Reply #24 on: February 22, 2019, 05:15:55 am »
The problem with the PFETs might be not so much the switching speed but rather the gate capacitance. If your micro cannot source a high current from an output pin for a brief time to charge or discharge the gate, then switching speed will be compromised.

High power, low on-resistance FETs have more gate capacitance. Thus paradoxically the solution might be to swap out the FETs for a less capable part. I have used a 2N7002 with success for this sort of thing in the past, it's an N-type that can only switch 300mA but the gate is easy to drive.

You can also buffer the gate with smaller FETs (but you need an N-type and a P-type for each gate) or as an integrated solution you can use something like 74AC04 or 74AC573.

By the way, make sure you are using logic-level FETs unless you have 10 V gate drive available. This can be a trap, as many popular high-power, low on-resistance FETs are not logic level.

cheers, Nick
 

