Author Topic: FreeRTOS performance penalty  (Read 32234 times)


Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
FreeRTOS performance penalty
« on: September 17, 2014, 01:41:46 am »
So, I have ported FreeRTOS (8.1.2) to a PIC24F (specifically, a PIC24FJ64GA102 running at 8 MHz, i.e. a 16 MHz crystal). The compiler is C30 3.25, no optimization (full optimization was also tested).

Test: I am comparing two blinkies,

1) no FreeRTOS / naked, flipping PB.0 and measuring the frequency of the flip;
2) with FreeRTOS, flipping PB.0 through 10 separate tasks (no messaging between them), and measuring the frequency of the flip.

I am taking your guesstimates:

1) What's the incremental flash usage?
2) What's the incremental (static) ram usage?
3) What's the frequency of the flip, as a percentage vs. that of the naked flip?

Yes, I do have the numbers, :)

BTW, the porting is fairly easy: I had done it before, with an LPC ARM chip on FreeRTOS 7.5.
================================
https://dannyelectronics.wordpress.com/
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: FreeRTOS performance penalty
« Reply #1 on: September 17, 2014, 02:02:42 am »
1) 5KB
2) 256 bytes
3) 100%

But I'm just guessing since I know nothing of the naked PIC24F nor fritos (I'm again hungry for a snack)
 

Offline mazurov

  • Frequent Contributor
  • **
  • Posts: 524
  • Country: us
Re: FreeRTOS performance penalty
« Reply #2 on: September 17, 2014, 02:13:15 am »
I've run FreeRTOS on PIC18 once (about 5 years ago, not sure if it's still possible). It is not fun if you need to do anything more complex than LED blinking. PIC24 should be slightly better since pointers are better supported by the instruction set and FreeRTOS uses pointers quite heavily.

If you need a multitasking framework for small micros, take a look at QP -> state-machine.com . The concept is different: basically, instead of blocking at will as in an RTOS, you never block, and it can be uncomfortable at first. But you'll get all the goodies like messages, queues, preemption, etc., and the code overhead is small. The PIC24 port runs in ~10K/1K and there is a nano version with 10 times less memory footprint. It will run well on any arch which supports function pointers, and the source/docs are much higher quality than FreeRTOS. An Arduino port for the full version is available, which means anything (better than a 328p) will be suitable.
With sufficient thrust, pigs fly just fine - RFC1925
 

Offline SirNick

  • Frequent Contributor
  • **
  • Posts: 589
Re: FreeRTOS performance penalty
« Reply #3 on: September 17, 2014, 03:32:28 am »
I don't really see the point in this. ???

You could compile a FreeRTOS demo with a main "task" of while(1) to get the flash and core RAM usage.  I think the docs tell you how much space it takes per task (IIRC, mostly sizeof the descriptor and stack), and per queue.

In terms of efficiency, you can flip I/O pins with a handful of ASM instructions with essentially no overhead beyond initial setup; or with optimized C with a high setup penalty, but around the same amount of code space for the actual loop; or put the pin change in a function and waste time with function call overhead (assuming it's not inlined by the compiler); or use an RTOS and incur a little more overhead by managing the ticks and process accounting.

But what does that actually tell you?  If you have more than one task to perform, there will be overhead of one sort or another, by whatever mechanism you use to determine when to do what.  In most cases, with an RTOS, either the work to be done will swamp the overhead incurred by the scheduler by so much that it becomes irrelevant, or the CPU is idle so much of the time that it's still irrelevant.  If the performance penalty is great enough that the difference is significant, you're either nearly out of resources anyway, or the project is implemented poorly.

If you need scheduling, you need scheduling.  What does comparing a bare-metal task with no scheduler tell you, other than the obvious fact that the scheduler requires a few cycles to do its thing?  (Which, again IIRC, is also addressed with some ballpark figures on the FreeRTOS site.)  It's all well and good to characterize these figures for grins, but I suspect this is related to the RTOS thread earlier, where there's some debate on whether you should use one.  Well, for blinky?  You can probably get by without one.
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 19279
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: FreeRTOS performance penalty
« Reply #4 on: September 17, 2014, 08:14:06 am »
Quote
3) What the frequency of the flip, as a percentage vs. that of the naked flip?
That's a poor measure of performance, since the incremental frequency loss is highly dependent on the specific microbenchmark operation.

It is normal (because it is useful and more generic) to specify how long specified operations take, e.g. time to post a message or to do a task switch.
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Offline mrflibble

  • Super Contributor
  • ***
  • Posts: 2051
  • Country: nl
Re: FreeRTOS performance penalty
« Reply #5 on: September 17, 2014, 09:01:35 am »
If you do that test for a progressively larger function and then plot it, that would indeed be interesting.

So instead of the above test + blinkie, you do the above test with blinkie substituted by a block of code. And the block of code is progressively 1 OP, 2 OP, 3 OP, .... , 100 OP. A block of NOPs or whatever you like with a thou-shalt-not-optimize flag should do the trick there.
 

Offline Jeroen3

  • Super Contributor
  • ***
  • Posts: 4067
  • Country: nl
  • Embedded Engineer
    • jeroen3.nl
Re: FreeRTOS performance penalty
« Reply #6 on: September 17, 2014, 09:05:22 am »
1. a few thousand instructions.
2. kernel and task structs will be limited to a few hundred bytes or so, overhead in unused context (stack) is much bigger.
3. Depends.
Code:
while (1) { pinToggle(PB0); }
10 times the above isn't a very good test. But 10 times the below might be.
Code:
while (1) { pinToggle(PB0); taskYIELD(); }
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4196
  • Country: us
Re: FreeRTOS performance penalty
« Reply #7 on: September 17, 2014, 10:10:03 am »
Quote
flipping PB.0 through 10 separate tasks
I don't think I understand what that means.

Quote
It is normal to specify how long specified operations take, e.g. time to post a message or to do a task switch.
That would be pretty difficult to compare to a "naked" system.
I'd suggest something like:
Set up an interrupt; say, a uart receive interrupt.
At the start of the ISR, toggle an output bit.
The "task" is to read the UART data, and toggle a different output.  The time between pin toggles tells you the task response time to the "event."
Presumably the "naked" implementation is busy-looping on a software buffer count becoming non-zero, so its response time is limited by the time the ISR takes to complete (and unwind), followed by whatever time it takes the driver code to pull data out of the buffer.
The RTOS case has all that, plus the time it takes to wake up the task.  (plus more, if it does the uart buffering in a separate task as well. (which IMO would be going too far...))
 

Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #8 on: September 17, 2014, 10:11:08 am »
Some hints, all under gcc:

1) successful port: STM32F103C8, STM32F100RB (tweaked configuration), STM32F051C8
2) unsuccessful port: STM32F030F, STM32F100RB (default configuration)

That tells you a lot as to where the physical requirements are.
 

Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #9 on: September 17, 2014, 10:37:38 am »
Quote
Code:
while (1) { pinToggle(PB0); taskYIELD(); }
It is too punitive to the OS and unrealistic.
 

Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #10 on: September 17, 2014, 11:10:20 am »
Any more guesses from our RTOS experts?

I will give you more time.
 

Offline Precipice

  • Frequent Contributor
  • **
  • Posts: 403
  • Country: gb
Re: FreeRTOS performance penalty
« Reply #11 on: September 17, 2014, 11:36:59 am »
It's a silly test and not very interesting, so I'm not guessing.

Much like saying the micro is a bad LED flasher when a 555 would do.

It's maybe an interesting indication of how much space a tiny OS takes to minimally initialise on a specific micro, but the timings - meh.
 

Offline macboy

  • Super Contributor
  • ***
  • Posts: 2252
  • Country: ca
Re: FreeRTOS performance penalty
« Reply #12 on: September 17, 2014, 12:13:04 pm »
Quote
It's a silly test and not very interesting, so I'm not guessing.
Completely agree.
Quote
Much like saying the micro is a bad LED flasher when a 555 would do.
Completely disagree.
The 555 timer requires external components including resistor, capacitor. It costs around $1. A little PIC10Fxx costs half that, has fewer pins, and requires zero external components to do the job, not even a current limiting resistor since the I/O pin current is hard limited. Getting an arbitrary blink rate and duty cycle is easier, and will always be more accurate. Overall, it is a better tool for the job of blinky.
 

Offline mrflibble

  • Super Contributor
  • ***
  • Posts: 2051
  • Country: nl
Re: FreeRTOS performance penalty
« Reply #13 on: September 17, 2014, 12:48:43 pm »
Quote
Any more guesses from our RTOS experts?

I will give you more time.

Don't wait for it. Rather uninspiring test as is. Make it more useful by adding an actual workload. Measure for a bunch of different workload sizes, measure for a bunch of different thread numbers. For bonus points you could also do mixes of workloads, and make a pretty scatter plot.

Right now this here A-star path finding thingy is more interesting than blink the led.

When I have some more time I'll do that money + mouth business for the above tests in chibios.
 

Offline Precipice

  • Frequent Contributor
  • **
  • Posts: 403
  • Country: gb
Re: FreeRTOS performance penalty
« Reply #14 on: September 17, 2014, 01:02:15 pm »
Quote
Completely disagree.
The 555 timer requires external components including resistor, capacitor. It costs around $1. A little PIC10Fxx costs half that, has fewer pins, and requires zero external components to do the job, not even a current limiting resistor since the I/O pin current is hard limited. Getting an arbitrary blink rate and duty cycle is easier, and will always be more accurate. Overall, it is a better tool for the job of blinky.

Yeah. I'm old...

although the last time I used 555s, it was on a project where safety wouldn't allow (or couldn't justify the cost of proving) software. For a blinky, not so much.
 

Offline paulie

  • Frequent Contributor
  • **
  • !
  • Posts: 849
  • Country: us
Re: FreeRTOS performance penalty
« Reply #15 on: September 17, 2014, 01:09:35 pm »
Sometimes we ignore the fact that not everybody sleeps, eats, and showers with MPLAB in their right hand and a PICKIT3 in the left. Often having to tack on a couple of extra jelly bean parts is the easy way out. Also consider that you can buy a dozen 555s for the cost of one of those small micros.
 

Online mikerj

  • Super Contributor
  • ***
  • Posts: 3233
  • Country: gb
Re: FreeRTOS performance penalty
« Reply #16 on: September 17, 2014, 04:20:37 pm »
Quote
Code:
while (1) { pinToggle(PB0); taskYIELD(); }
It is too punitive to the OS and unrealistic.

This is a pretty pointless thread as you haven't given enough information on what you are trying to actually measure and how you are doing it, let alone that comparing the execution speed of a simple loop with a full blown RTOS is ridiculous.

Are you saying you aren't doing the above because it's unrealistic?  If so, what are you doing in your ten tasks?

What stack size have you allocated to each task?
« Last Edit: September 17, 2014, 04:41:34 pm by mikerj »
 

Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #17 on: September 17, 2014, 08:12:14 pm »
Still no takers?

Come on. We have so many people specializing in RTOS and with so much prior experience that you should get this right.

More, please.
 

Online mikerj

  • Super Contributor
  • ***
  • Posts: 3233
  • Country: gb
Re: FreeRTOS performance penalty
« Reply #18 on: September 17, 2014, 08:32:29 pm »
My answer is:

You probably haven't yet learned to correctly configure or use FreeRTOS, so any results are irrelevant.
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 19279
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: FreeRTOS performance penalty
« Reply #19 on: September 17, 2014, 08:45:11 pm »
Quote
Still no takers?

Come on. We have so many people specializing in RTOS and with so much prior experience that you should get this right.
We have got this right.

It is a silly microbenchmark that wouldn't give any useful information, and we're not going to waste our time on it.

I've given a clue as to what would make a more useful set of measurements that could guide decision making when using the RTOS - but you've chosen to ignore it.
 

Offline SirNick

  • Frequent Contributor
  • **
  • Posts: 589
Re: FreeRTOS performance penalty
« Reply #20 on: September 17, 2014, 09:15:33 pm »
Alrighty, why not.  (Edited for clarity.)

How much RAM does FreeRTOS use?
Quote
This depends on your application. Below is a guide based on:

IAR STR71x ARM7 port, full optimisation, minimum configuration, four priorities.

Scheduler Itself: 236 bytes (can easily be reduced by using smaller data types)
Each queue: 76 bytes + queue storage area (see FAQ Why do queues use that much RAM?)
Each task: 64 bytes (includes 4 characters for the task name) + the task stack size.
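The quoted per-item figures can be turned into a rough RAM budget. The following is a minimal sketch, assuming the FAQ numbers for the IAR STR71x ARM7 port carry over (they will differ on a PIC24F or Cortex-M port); the function name `estimate_rtos_ram` and its parameters are hypothetical, and the stack size is whatever you configure, not a figure from the FAQ.

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted from the FreeRTOS FAQ for the IAR STR71x ARM7 port.
 * Other ports and configurations will give different numbers. */
enum {
    SCHEDULER_BYTES      = 236, /* scheduler itself */
    QUEUE_OVERHEAD_BYTES = 76,  /* per queue, excluding its storage area */
    TASK_OVERHEAD_BYTES  = 64   /* per task, excluding its stack */
};

/* Hypothetical helper: rough RAM estimate in bytes for a given task/queue mix. */
size_t estimate_rtos_ram(size_t n_tasks, size_t stack_bytes_per_task,
                         size_t n_queues, size_t storage_bytes_per_queue)
{
    /* A task without a stack is a configuration error. */
    assert(n_tasks == 0 || stack_bytes_per_task > 0);

    return SCHEDULER_BYTES
         + n_tasks  * (TASK_OVERHEAD_BYTES  + stack_bytes_per_task)
         + n_queues * (QUEUE_OVERHEAD_BYTES + storage_bytes_per_queue);
}
```

For the ten-task blinky in this thread, with an assumed 256-byte stack per task and no queues, `estimate_rtos_ram(10, 256, 0, 0)` comes to 3436 bytes, of which 2560 bytes are stack. That matches the point made earlier that unused stack dominates the kernel structures.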

How much ROM/Flash does FreeRTOS use?
Quote
This depends on your compiler, architecture, and RTOS kernel configuration.

The RTOS kernel itself required about 5 to 10 KBytes of ROM space when using the same configuration [as above]

How much overhead is involved?
Quote
The table below provides an indication [...]  Note that any numbers provided are indicative only [...]

Create application queues, semaphores and mutexes.
On a ARM Cortex-M3 device, using the ARM RVDS compiler with optimization set to 1 (low), creating a queue, semaphore or mutex will take approximately 500 CPU cycles.

Create application tasks.
On a ARM Cortex-M3 device, using the ARM RVDS compiler with optimization set to 1 (low), creating each task will take approximately 1100 CPU cycles.

Start the RTOS scheduler.
The RTOS scheduler is started by calling vTaskStartScheduler(). The start up process includes configuring the tick interrupt, creating the idle task, and then restoring the context of the first task to run.
On a ARM Cortex-M3 device, using the ARM RVDS compiler with optimization set to 1 (low), starting the RTOS scheduler will take approximately 1200 CPU cycles.

Context switch times.
A context switch time of 84 CPU cycles was obtained under the following test conditions:

FreeRTOS ARM Cortex-M3 port for the Keil compiler, stack overflow checking turned off, trace features turned off, compiler set to optimise for speed, "configUSE_PORT_OPTIMISED_TASK_SELECTION" set to '1' in FreeRTOSConfig.h.
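As a worked example of the quoted start-up figures, the sketch below adds them up for a given number of tasks and queues. It assumes the indicative Cortex-M3 / ARM RVDS numbers above apply unchanged, which they will not on other cores or compilers; the function names and the clock frequency passed in are hypothetical.

```c
#include <assert.h>

/* Indicative cycle counts quoted above (Cortex-M3, ARM RVDS, low optimization). */
enum {
    CYCLES_PER_TASK_CREATE  = 1100,
    CYCLES_PER_QUEUE_CREATE = 500,  /* queue, semaphore or mutex */
    CYCLES_SCHEDULER_START  = 1200
};

/* Hypothetical helper: total CPU cycles spent before the first task runs. */
unsigned long startup_cycles(unsigned long n_tasks, unsigned long n_queues)
{
    return n_tasks  * CYCLES_PER_TASK_CREATE
         + n_queues * CYCLES_PER_QUEUE_CREATE
         + CYCLES_SCHEDULER_START;
}

/* Same cost expressed in microseconds for a given CPU clock in Hz. */
double startup_microseconds(unsigned long n_tasks, unsigned long n_queues,
                            double cpu_hz)
{
    assert(cpu_hz > 0.0);
    return (double)startup_cycles(n_tasks, n_queues) * 1e6 / cpu_hz;
}
```

Ten tasks and no queues give `startup_cycles(10, 0)` = 12200 cycles, which is roughly 169 microseconds at an assumed 72 MHz clock. It is a one-off cost, so it says nothing about steady-state throughput.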

Exercise of comparing relative merits of instruction sets, registers, compilers, etc., is left to everyone/anyone else.
 

Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #21 on: September 17, 2014, 10:15:25 pm »
Any more guesses?

Drawing closes tomorrow morning so hurry up!

:)
 

Offline Kjelt

  • Super Contributor
  • ***
  • Posts: 6459
  • Country: nl
Re: FreeRTOS performance penalty
« Reply #22 on: September 18, 2014, 07:49:30 am »
Quote
3) What the frequency of the flip, as a percentage vs. that of the naked flip?
What is the frequency of the OS timertick you used?
 

Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #23 on: September 18, 2014, 11:01:56 am »
Quite a few posters got the numbers right, some closer than others.

So here are the numbers:

Quote
1) What's the incremental flash usage?

As you would expect, the precise number depends on compiler setting, kernel configuration and chips used.

On the PIC24F, it ranges from 5 KB to 8 KB (instructions only). On a Cortex-M3 (CM3), it ranges from 4.5 KB to 6 KB.

The size of total compiled flash space, with a reasonably sized stack, goes from 10KB to 30KB, however.

Quote
2) What's the incremental (static) ram usage?

This varies greatly. Under the most basic heap management strategy (= no release of RAM from terminated tasks), it ranges from a few hundred bytes plus heap at the smallest to a few KB, most of it in the heap you configure for the tasks.

Quote
3) What the frequency of the flip, as a percentage vs. that of the naked flip?

On an 8 MHz PIC24F, the frequency of the naked flip is about 400 kHz. Under FreeRTOS, it is about 394 kHz, i.e. in the high 98% range. That number dips as you slow down the MCU, to the low 98% range. So the switching takes about 80 - 120 instructions per switch. That's roughly in the ballpark of figures I have seen: 150 - 300 instructions, shorter for 32-bit chips and longer for 8-bit chips.
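The step from the two measured frequencies to a per-switch cost can be made explicit. This is a minimal sketch, assuming that 8 MHz is the instruction rate (16 MHz crystal divided by two), that the tick is the default 1000 Hz mentioned elsewhere in the thread, and that all of the lost time is spent in the tick interrupt and scheduler; the function name `cycles_lost_per_tick` is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: instruction cycles lost per tick, derived from the
 * toggle frequency measured without (f_bare_hz) and with (f_rtos_hz) the RTOS.
 * Assumes one tick interrupt every 1/tick_hz seconds and no other overhead. */
double cycles_lost_per_tick(double fcy_hz, double tick_hz,
                            double f_bare_hz, double f_rtos_hz)
{
    assert(fcy_hz > 0.0 && tick_hz > 0.0);
    assert(f_bare_hz > 0.0 && f_rtos_hz > 0.0 && f_rtos_hz <= f_bare_hz);

    /* Share of CPU time that no longer goes into toggling the pin. */
    double lost_fraction = 1.0 - f_rtos_hz / f_bare_hz;

    /* Instruction cycles available between two consecutive ticks. */
    double cycles_per_tick = fcy_hz / tick_hz;

    return lost_fraction * cycles_per_tick;
}
```

With the numbers above, `cycles_lost_per_tick(8e6, 1000.0, 400e3, 394e3)` is about 120 cycles (1.5% of the 8000 cycles in each 1 ms tick), which is the upper end of the quoted 80 - 120 range; a 99% result would correspond to 80 cycles.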

Now, great caution in using that number:

1) the particular test minimizes the cost of task switching. Each task in the test utilizes its allotted slot to the fullest extent, thus minimizing the cost of switching.

On the other extreme - as proposed by one of the posters earlier - you can run a couple instructions and then yield that task -> under this approach, the mcu spends its time mostly switching between tasks, the pin flip frequency is considerably lower, reflecting the significant cost of switching.

I would argue that the reality is closer to my test, in that most of the time the tasks fully utilize their time slots rather than lasting just a couple of instructions.
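The gap between the two extremes described above can be estimated with a simple model. This is a sketch under stated assumptions, not a measurement: it assumes a fixed switch cost in instruction cycles, the same cost for a tick-driven switch and a voluntary yield (in practice they can differ), and two pin toggles per output period. Both function names are hypothetical.

```c
#include <assert.h>

/* Toggle frequency when each task runs until the tick preempts it. */
double freq_time_sliced(double fcy_hz, double tick_hz,
                        double f_bare_hz, double switch_cycles)
{
    assert(fcy_hz > 0.0 && tick_hz > 0.0 && f_bare_hz > 0.0);
    assert(switch_cycles >= 0.0);

    /* Fraction of CPU time spent switching: one switch per tick. */
    double overhead = switch_cycles * tick_hz / fcy_hz;
    assert(overhead < 1.0);

    return f_bare_hz * (1.0 - overhead);
}

/* Toggle frequency when a task yields after every single pin toggle. */
double freq_yield_each_toggle(double fcy_hz, double f_bare_hz,
                              double switch_cycles)
{
    assert(fcy_hz > 0.0 && f_bare_hz > 0.0 && switch_cycles >= 0.0);

    /* Two toggles make one output period, so halve the cycles per period. */
    double cycles_per_toggle = fcy_hz / f_bare_hz / 2.0;

    /* Every toggle now also pays for one switch. */
    return fcy_hz / (2.0 * (cycles_per_toggle + switch_cycles));
}
```

With an 8 MHz instruction clock, a 400 kHz bare loop, a 1000 Hz tick and an assumed 120 cycles per switch, the time-sliced case stays near 394 kHz while the yield-per-toggle case drops to roughly 30.8 kHz, under 8% of the bare rate. Real workloads sit somewhere between the two.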

2) the kernel is configured to minimize overhead, for example, by disabling stack overflow checks, etc. In a real life application, you are likely to have turned them on.

Overall, I think a (reasonable) minimum spec for an MCU running FreeRTOS would be something like 16KB flash + 8KB RAM, 4MIPS. At that point, your ability to add tasks and take advantage of an RTOS is very limited, as there aren't many resources left.

A practical "minimum" might be 24 - 32KB flash and 10KB+ram, 8MIPS. At that point, the switching cost is not noticeable, and there is sufficient space, flash / ram, to implement a few reasonable tasks.

In conclusion, I am positively surprised how little processing power FreeRTOS took away from the chip, and find it quite helpful if you have a few low priority tasks with disparate or particularly long execution time (fft, lcd vs. buttons for example). With an rtos, you don't have to break them up into pieces and it is automagically done for you.

However, it is tricky to use interrupts and to "share" peripherals in an RTOS environment. And it represents another layer that a programmer has to deal with.

Hope it helps.
 

Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #24 on: September 18, 2014, 11:02:38 am »
Quote
What is the frequency of the OS timertick you used?

I used the default 1000 Hz / 1 ms tick.
 

