### Topic: Why is a PC so fast compared to BeagleBoard?  (Read 1729 times)


#### hamster_nz

• Super Contributor
• Posts: 2197
##### Re: Why is a PC so fast compared to BeagleBoard?
« Reply #25 on: August 10, 2019, 11:15:43 pm »
Revisit your code with the knowledge that sin() and cos() are very expensive on your platform and are to be avoided as much as possible.

On a hobby DSP project, using a lookup table made things 4x faster.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.

#### magic

• Super Contributor
• Posts: 1652
##### Re: Why is a PC so fast compared to BeagleBoard?
« Reply #26 on: August 11, 2019, 08:48:04 am »
Another big weakness of most ARM boards compared with an x86 PC is memory bus width, and perhaps frequency too, both of which limit RAM bandwidth.
Not the limiting factor in this case, though.

#### rstofer

• Super Contributor
• Posts: 6944
##### Re: Why is a PC so fast compared to BeagleBoard?
« Reply #27 on: August 11, 2019, 04:55:42 pm »
Another possible speedup is in pulling loop invariant code outside the loop.

do i = 1, 100
  ohmega(i) = 2.0 * Pi * f(i)  ! or some such  ** BAD **
end do

Here the 2.0 * Pi is loop invariant and should be pulled out of the loop.  Sometimes the compiler will do it, other times not.  Sometimes it is not obvious what is loop invariant because there is a bunch of code inside the loop.

twoPi = 2.0 * Pi  ! ** BETTER **
do i = 1, 100
  ohmega(i) = twoPi * f(i)
end do
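For readers working in C, the same transformation might look like the sketch below. The function names and the `PI` macro are made up for illustration and are not from the original poster's code.

```c
#include <stddef.h>

#define PI 3.14159265358979323846

/* Straightforward form: 2.0 * PI is written inside the loop body. */
void omega_naive(const double *f, double *omega, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        omega[i] = 2.0 * PI * f[i];
    }
}

/* Hoisted form: the invariant product is computed once, before the loop. */
void omega_hoisted(const double *f, double *omega, size_t n)
{
    const double two_pi = 2.0 * PI;
    for (size_t i = 0; i < n; i++) {
        omega[i] = two_pi * f[i];
    }
}
```

With plain constants like these, a C compiler will normally fold `2.0 * PI` at compile time, so both versions should produce the same machine code. Hoisting by hand is more likely to pay off when the invariant is a function call or a value the compiler cannot prove is unchanged inside the loop.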

Lookup tables for trig functions are a good idea.  You can even do a wee bit of interpolation between values if necessary.  You only need the table to cover 0..Pi/2 (90 degrees), so it doesn't necessarily have to take a lot of space.  You can also manipulate the argument for cos() such that it can use the same table, since cos(x) = sin(x + Pi/2).  With a full-period table you can improve on the idea by not adding Pi/2 but rather just starting your index 1/4 of the way through the table and dealing with wrap-around.  Much faster...

#### ralphrmartin

• Frequent Contributor
• Posts: 312
##### Re: Why is a PC so fast compared to BeagleBoard?
« Reply #28 on: August 14, 2019, 06:41:21 pm »
> Caches primarily, then effective branch prediction and speculative execution.

Shouldn't be any branch prediction in trig function calculation, or matrix multiplication.

#### coppice

• Super Contributor
• Posts: 5057
##### Re: Why is a PC so fast compared to BeagleBoard?
« Reply #29 on: August 14, 2019, 06:55:35 pm »
> Another possible speedup is in pulling loop invariant code outside the loop.

Modern compilers are usually pretty good at doing that for themselves, as long as you have optimisation enabled.

#### Manx

• Contributor
• Posts: 28
##### Re: Why is a PC so fast compared to BeagleBoard?
« Reply #30 on: August 30, 2019, 09:31:18 am »
Fun fact: I bought myself a new Nucleo board, the H743ZI. It's a 400 MHz Cortex-M7. The same test that took 145 us on the 180 MHz Nucleo F4 and 50 us on the 1 GHz Cortex-A8 PocketBeagle now takes... 45 us.

So without bothering with an operating system, boot-up time etc., I get the same calculation speed as on a processor clocked 2.5 times faster. And this happens with standard libraries and generic code, without pondering whether I could maybe write some highly optimized custom code for a specific processor. So to me it looks like the Cortex-A8 is not that good, but the Cortex-M7 is absolutely great.

As for using lookup tables or another faster method instead of the standard trig functions, I don't think it's really an option. The calculations are for the kinematics of a 6 DOF robotic arm, and I need precision, especially around singularities. Rather, I'm considering whether I should use double precision calculations. That's one of the advantages of the M7 over the M4. Still, I'm not rushing into it since the double precision FPU is slower than the single precision one. Anyway, with the current configuration I think I could easily get 10 000 kinematics calculations (i.e. 10k points in space processed) per second in the final application, which I take as a very good result.
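The figures above can be turned into cycle counts and a throughput ceiling with a little arithmetic. The C sketch below is only an illustration: the helper names are made up, and it assumes each core really ran at its quoted clock during the test.

```c
/* Clock cycles consumed by one run: microseconds times MHz gives cycles. */
unsigned long cycles_used(unsigned long time_us, unsigned long clock_mhz)
{
    return time_us * clock_mhz;
}

/* Upper bound on back-to-back runs per second for a given run time. */
unsigned long runs_per_second(unsigned long time_us)
{
    return 1000000UL / time_us;
}
```

Under that assumption, `cycles_used(145, 180)` gives 26,100 cycles for the F4, `cycles_used(50, 1000)` gives 50,000 for the Cortex-A8 and `cycles_used(45, 400)` gives 18,000 for the M7. `runs_per_second(45)` gives 22,222, so 10,000 calculations per second would use roughly 45% of the M7's time if the final application behaves like the test.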

#### NANDBlog

• Super Contributor
• Posts: 4563
• Current job: ATEX certified product design
##### Re: Why is a PC so fast compared to BeagleBoard?
« Reply #31 on: August 30, 2019, 10:18:39 am »
I can't believe this is the second page and nobody mentioned it:

Pipeline length
Instructions per cycle

Compare the Cortex-A7 with the Cortex-A15.
It just calculates more stuff at the same time. Clock is an almost meaningless value alone.

#### ogden

• Super Contributor
• Posts: 2956
##### Re: Why is a PC so fast compared to BeagleBoard?
« Reply #32 on: August 30, 2019, 04:42:18 pm »
> I can't believe this is the second page and nobody mentioned it:
>
> Pipeline length
> Instructions per cycle

Incorrect. In the second reply of this thread I did mention deep pipelines and out-of-order execution. That obviously means a better instructions/cycle ratio:

> Intel CPUs are heavily performance-optimized: deep pipelines, out-of-order execution, huge instruction and data cache memories.

#### magic

• Super Contributor
• Posts: 1652
##### Re: Why is a PC so fast compared to BeagleBoard?
« Reply #33 on: August 30, 2019, 09:33:44 pm »
Pipelining is evil; its value is in enabling higher clock speeds by reducing the time it takes to process one pipeline step.

A 2-stage "decode, execute" pipeline with all the reordering tricks and doubled/tripled execution units would be faster at the same clock speed, thanks to fewer instructions being flushed on mispredicted branches. But such a thing will never run at the same clock as the deeply pipelined cores.

Pentium 4 was famously a deeply pipelined core, designed to take advantage of multi-GHz clocks that Intel hoped to achieve. It had its ass kicked by the simpler Athlon XP and Intel ditched it in favor of new cores derived from Pentium III.
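A rough way to put numbers on the pipeline-depth trade-off is the usual textbook model, in which every mispredicted branch adds a flush penalty to the average cycles per instruction. The C sketch below uses made-up function names, and any values fed into it are illustrative assumptions, not measurements of a real core.

```c
/* Average cycles per instruction, including branch-misprediction flushes. */
double effective_cpi(double base_cpi, double branch_fraction,
                     double mispredict_rate, double flush_penalty_cycles)
{
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles;
}

/* Execution time in seconds for a given instruction count and clock rate. */
double run_time_s(double instructions, double cpi, double clock_hz)
{
    return instructions * cpi / clock_hz;
}
```

For example, with hypothetical values of 20% branches, 5% of them mispredicted and a base CPI of 1.0, a 20-cycle flush gives a CPI of 1.2 while a 2-cycle flush gives 1.02. At the same clock the shallow pipeline wins, but in this model the deep one comes out ahead as soon as it can be clocked more than about 18% faster.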
« Last Edit: August 30, 2019, 09:40:35 pm by magic »
