Topic: FreeRTOS performance penalty


Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
FreeRTOS performance penalty
« on: September 17, 2014, 01:41:46 am »
So, I have ported FreeRTOS 8.1.2 to a PIC24F (specifically, a PIC24FJ64GA102 running at 8 MHz from a 16 MHz crystal). The compiler is C30 3.25 with no optimization (full optimization was also tested).

Test: I am comparing two blinkies,

1) no FreeRTOS / naked, flipping PB.0 and measure the frequency of the flip;
2) with FreeRTOS, flipping PB.0 through 10 separate tasks (no messaging between them), and measure the frequency of the flip.

I am taking your guesstimates:

1) What's the incremental flash usage?
2) What's the incremental (static) ram usage?
3) What is the frequency of the flip, as a percentage vs. that of the naked flip?

Yes, I do have the numbers. :)

BTW, the port is fairly easy: I had done it before, with an LPC ARM chip on FreeRTOS 7.5.
================================
https://dannyelectronics.wordpress.com/
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5549
  • Country: us
Re: FreeRTOS performance penalty
« Reply #1 on: September 17, 2014, 02:02:42 am »
1) 5KB
2) 256 bytes
3) 100%

But I'm just guessing since I know nothing of the naked pic24f nor fritos (I'm again hungry for a snack)
 

Offline mazurov

  • Frequent Contributor
  • **
  • Posts: 448
  • Country: us
Re: FreeRTOS performance penalty
« Reply #2 on: September 17, 2014, 02:13:15 am »
I've run FreeRTOS on a PIC18 once (about 5 years ago; not sure if it's still possible). It is not fun if you need to do anything more complex than LED blinking. PIC24 should be slightly better, since pointers are better supported by the instruction set and FreeRTOS uses pointers quite heavily.

If you need a multitasking framework for small micros, take a look at QP -> state-machine.com. The concept is different: instead of blocking at will as in an RTOS, you never block, and that can be uncomfortable at first. But you'll get all the goodies like messages, queues, preemption, etc., and the code overhead is small. The PIC24 port runs in ~10K/1K, and there is a nano version with a 10 times smaller memory footprint. It will run well on any arch that supports function pointers, and the source/docs are of much higher quality than FreeRTOS's. An Arduino port of the full version is available, which means anything better than a 328P will be suitable.
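The never-block style mentioned above can be sketched in plain C as a run-to-completion event loop: events go into a queue (typically posted from ISRs), and a dispatcher hands each one to the current state-handler function through a function pointer. This is a minimal illustration under stated assumptions, not QP's actual API; every name in it (`post_event`, `dispatch_all`, `current_state`, the `SIG_*` signals) is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event signals; QP's real API is different. */
typedef enum { SIG_TICK, SIG_BUTTON } Signal;

typedef struct { Signal sig; } Event;

/* A state handler consumes one event and returns; it never blocks. */
typedef void (*Handler)(const Event *e);

#define QUEUE_LEN 8u

static Event queue[QUEUE_LEN];
static size_t q_head, q_tail, q_count;

/* Post an event. If called from an ISR on real hardware, interrupts
   would need to be masked around this update. */
static bool post_event(Signal sig) {
    if (q_count == QUEUE_LEN) return false;   /* queue full: event dropped */
    queue[q_tail].sig = sig;
    q_tail = (q_tail + 1u) % QUEUE_LEN;
    q_count++;
    return true;
}

static unsigned led_toggles, button_presses;

/* Each event is handled to completion; nothing here waits. */
static void blinky_handler(const Event *e) {
    switch (e->sig) {
    case SIG_TICK:   led_toggles++;    break;  /* e.g. toggle the LED */
    case SIG_BUTTON: button_presses++; break;  /* e.g. change blink mode */
    }
}

/* Swapping this pointer is how a state transition would be expressed. */
static Handler current_state = blinky_handler;

/* Drain the queue and return how many events were dispatched.
   A return of 0 means the CPU could sleep until the next interrupt. */
static unsigned dispatch_all(void) {
    unsigned n = 0;
    while (q_count > 0) {
        Event e = queue[q_head];
        q_head = (q_head + 1u) % QUEUE_LEN;
        q_count--;
        current_state(&e);
        n++;
    }
    return n;
}
```

In a real build `post_event()` would be called from ISRs and `dispatch_all()` from `main()`, sleeping whenever it returns 0. The design choice is that no handler ever holds the CPU waiting, so no per-task stacks are needed.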
 

Offline SirNick

  • Frequent Contributor
  • **
  • Posts: 589
Re: FreeRTOS performance penalty
« Reply #3 on: September 17, 2014, 03:32:28 am »
I don't really see the point in this. ???

You could compile a FreeRTOS demo with a main "task" of while(1) to get the flash and core RAM usage.  I think the docs tell you how much space it takes per task (IIRC, mostly sizeof the descriptor and stack), and per queue.

In terms of efficiency, you can flip I/O pins with a handful of ASM instructions with essentially no overhead beyond initial setup; or with optimized C with a high setup penalty, but around the same amount of code space for the actual loop; or put the pin change in a function and waste time with function call overhead (assuming it's not inlined by the compiler); or use an RTOS and incur a little more overhead by managing the ticks and process accounting.

But what does that actually tell you?  If you have more than one task to perform, there will be overhead of one sort or another, by whatever mechanism you use to determine when to do what.  In most cases, with an RTOS, either the work to be done will swamp the overhead incurred by the scheduler by so much that it becomes irrelevant, or the CPU is idle so much of the time that's still irrelevant.  If the performance penalty is great enough that the difference is significant, you're either nearly out of resources anyway, or the project is implemented poorly.

If you need scheduling, you need scheduling.  What does comparing a bare-metal task with no scheduler tell you, other than the obvious fact that the scheduler requires a few cycles to do its thing?  (Which, again IIRC, is also addressed with some ballpark figures on the FreeRTOS site.)  It's all well and good to characterize these figures for grins, but I suspect this is related to the RTOS thread earlier, where there's some debate on whether you should use one.  Well, for blinky?  You can probably get by without one.
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 10269
  • Country: gb
    • Having fun doing more, with less
Re: FreeRTOS performance penalty
« Reply #4 on: September 17, 2014, 08:14:06 am »
3) What is the frequency of the flip, as a percentage vs. that of the naked flip?
That's a poor measure of performance, since the incremental frequency loss is highly dependent on the specific microbenchmark operation.

It is normal (because it is useful and more generic) to specify how long specified operations take, e.g. time to post a message or to do a task switch.
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Offline mrflibble

  • Super Contributor
  • ***
  • Posts: 1947
  • Country: nl
Re: FreeRTOS performance penalty
« Reply #5 on: September 17, 2014, 09:01:35 am »
If you do that test for a progressively larger function and then plot it, that would indeed be interesting.

So instead of the above test with a blinky, you do the same test with the blinky replaced by a block of code, and that block is progressively 1 op, 2 ops, 3 ops, ..., 100 ops. A block of NOPs, or whatever you like, with a thou-shalt-not-optimize flag should do the trick there.
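One way to read such a sweep: if every W cycles of work are followed by one task switch costing S cycles, the fraction of time left for the work is W / (W + S). The helper below is a hypothetical sketch of that model, not part of any RTOS, and the 200-cycle switch cost in `print_sweep()` is an assumed value for illustration, not a measurement. It shows why a task that does a couple of instructions and then yields is dominated by switching, while a task that uses a whole tick barely notices it.

```c
#include <stdio.h>

/* Fraction of CPU time spent on useful work when every `work_cycles`
   of work is followed by exactly one switch costing `switch_cycles`. */
static double efficiency(double work_cycles, double switch_cycles) {
    return work_cycles / (work_cycles + switch_cycles);
}

/* Print a small sweep of workload sizes for an assumed switch cost. */
static void print_sweep(void) {
    const double switch_cycles = 200.0;   /* assumption, for illustration */
    const double work[] = { 1, 10, 100, 1000, 10000 };
    for (size_t i = 0; i < sizeof work / sizeof work[0]; i++)
        printf("work=%6.0f cycles -> %5.1f%% useful\n",
               work[i], 100.0 * efficiency(work[i], switch_cycles));
}
```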
 

Online Jeroen3

  • Super Contributor
  • ***
  • Posts: 3333
  • Country: nl
  • Embedded Engineer
    • jeroen3.nl
Re: FreeRTOS performance penalty
« Reply #6 on: September 17, 2014, 09:05:22 am »
1. A few thousand instructions.
2. Kernel and task structs will be limited to a few hundred bytes or so; the overhead in unused context (stack) is much bigger.
3. Depends.
Code: [Select]
while(1){ pinToggle(PBo); }
10 times the above isn't a very good test. But 10 times the below might be.
Code: [Select]
while(1){ pinToggle(PBo); taskYield(); }
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 3075
  • Country: us
Re: FreeRTOS performance penalty
« Reply #7 on: September 17, 2014, 10:10:03 am »
Quote
flipping PB.0 through 10 separate tasks
I don't think I understand what that means.

Quote
It is normal to specify how long specified operations take, e.g. time to post a message or to do a task switch.
That would be pretty difficult to compare to a "naked" system.
I'd suggest something like:
Set up an interrupt; say, a uart receive interrupt.
At the start of the ISR, toggle an output bit.
The "task" is to read the UART data, and toggle a different output.  The time between pin toggles tells you the task response time to the "event."
Presumably the "naked" implementation is busy-looping on a software buffer count becoming non-zero, so its response time is limited by the time the ISR takes to complete (and unwind), followed by whatever time it takes the driver code to pull data out of the buffer.
The RTOS case has all that, plus the time it takes to wake up the task (plus more if it does the UART buffering in a separate task as well, which IMO would be going too far).
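For the "naked" side of that comparison, the usual structure is a receive ISR filling a small ring buffer while the main loop polls for a non-zero count. A host-runnable sketch, assuming a hypothetical `uart_rx_isr()` that is handed the received byte (on real hardware it would read the UART data register itself) and exactly one producer (the ISR) and one consumer (the main loop):

```c
#include <stdint.h>
#include <stdbool.h>

#define RX_BUF_SIZE 16u   /* power of two so the index can wrap with a mask */

static volatile uint8_t rx_buf[RX_BUF_SIZE];
static volatile uint8_t rx_head;   /* written only by the ISR */
static volatile uint8_t rx_tail;   /* written only by the main loop */

/* Hypothetical ISR body: store the byte unless the buffer is full.
   One slot is always left empty so full and empty can be told apart. */
static void uart_rx_isr(uint8_t byte) {
    uint8_t next = (uint8_t)((rx_head + 1u) & (RX_BUF_SIZE - 1u));
    if (next != rx_tail) {
        rx_buf[rx_head] = byte;
        rx_head = next;
    }
}

/* Main-loop side: this is what the naked build busy-loops on. */
static bool rx_available(void) {
    return rx_head != rx_tail;
}

static uint8_t rx_read(void) {
    uint8_t b = rx_buf[rx_tail];
    rx_tail = (uint8_t)((rx_tail + 1u) & (RX_BUF_SIZE - 1u));
    return b;
}
```

In the naked build, `main()` would spin on `while (!rx_available()) {}`, then call `rx_read()` and toggle the second pin, so the measured latency is the ISR exit plus that loop. In the RTOS build the ISR would instead wake a task, which adds the wake-up and context switch on top.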
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #8 on: September 17, 2014, 10:11:08 am »
Some hints, all under GCC:

1) successful port: STM32F103C8, STM32F100RB (tweaked configuration), STM32F051C8
2) unsuccessful port: STM32F030F, STM32F100RB (default configuration)

That tells you a lot as to where the physical requirements are.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #9 on: September 17, 2014, 10:37:38 am »
Code: [Select]
while(1){ pinToggle(PBo); taskYield(); }
It is too punitive to the OS and unrealistic.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #10 on: September 17, 2014, 11:10:20 am »
Any more guesses from our RTOS experts?

I will give you more time.
 

Offline Precipice

  • Frequent Contributor
  • **
  • Posts: 403
  • Country: gb
Re: FreeRTOS performance penalty
« Reply #11 on: September 17, 2014, 11:36:59 am »
It's a silly test and not very interesting, so I'm not guessing.

Much like saying the micro is a bad LED flasher when a 555 would do.

It's maybe an interesting indication of how much space a tiny OS takes to minimally initialise on a specific micro, but the timings - meh.
 

Offline macboy

  • Super Contributor
  • ***
  • Posts: 1981
  • Country: ca
Re: FreeRTOS performance penalty
« Reply #12 on: September 17, 2014, 12:13:04 pm »
It's a silly test and not very interesting, so I'm not guessing.
Completely agree
Much like saying the micro is a bad LED flasher when a 555 would do.
Completely disagree.
The 555 timer requires external components including resistor, capacitor. It costs around $1. A little PIC10Fxx costs half that, has fewer pins, and requires zero external components to do the job, not even a current limiting resistor since the I/O pin current is hard limited. Getting an arbitrary blink rate and duty cycle is easier, and will always be more accurate. Overall, it is a better tool for the job of blinky.
 

Offline mrflibble

  • Super Contributor
  • ***
  • Posts: 1947
  • Country: nl
Re: FreeRTOS performance penalty
« Reply #13 on: September 17, 2014, 12:48:43 pm »
Any more guesses from our RTOS experts?

I will give you more time.

Don't wait for it. Rather uninspiring test as is. Make it more useful by adding an actual workload. Measure for a bunch of different workload sizes, measure for a bunch of different thread numbers. For bonus points you could also do mixes of workloads, and make a pretty scatter plot.

Right now this here A-star path finding thingy is more interesting than blinking the LED.

When I have some more time I'll do that money + mouth business for the above tests in ChibiOS.
 

Offline Precipice

  • Frequent Contributor
  • **
  • Posts: 403
  • Country: gb
Re: FreeRTOS performance penalty
« Reply #14 on: September 17, 2014, 01:02:15 pm »
Completely disagree.
The 555 timer requires external components including resistor, capacitor. It costs around $1. A little PIC10Fxx costs half that, has fewer pins, and requires zero external components to do the job, not even a current limiting resistor since the I/O pin current is hard limited. Getting an arbitrary blink rate and duty cycle is easier, and will always be more accurate. Overall, it is a better tool for the job of blinky.

Yeah. I'm old...

Although the last time I used 555s, it was on a project where safety wouldn't allow (or couldn't justify the cost of proving) software. For a blinky, not so much.
 

Offline paulie

  • Frequent Contributor
  • **
  • Banned!
  • Posts: 849
  • Country: us
Re: FreeRTOS performance penalty
« Reply #15 on: September 17, 2014, 01:09:35 pm »
Sometimes we ignore the fact that not everybody sleeps, eats, and showers with MPLAB in their right hand and PICKIT3 in the left. Often having to tack on a couple extra jelly bean parts is the easy way out. Also consider that you can buy a dozen 555 for the cost of one of those small micros.
 

Offline mikerj

  • Super Contributor
  • ***
  • Posts: 2178
  • Country: gb
Re: FreeRTOS performance penalty
« Reply #16 on: September 17, 2014, 04:20:37 pm »
Code: [Select]
while(1){ pinToggle(PBo); taskYield(); }
It is too punitive to the OS and unrealistic.

This is a pretty pointless thread as you haven't given enough information on what you are trying to actually measure and how you are doing it, let alone that comparing the execution speed of a simple loop with a full blown RTOS is ridiculous.

Are you saying you aren't doing the above because it's unrealistic?  If so, what are you doing in your ten tasks?

What stack size have you allocated to each task?
« Last Edit: September 17, 2014, 04:41:34 pm by mikerj »
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #17 on: September 17, 2014, 08:12:14 pm »
Still no takers?

Come on. We have so many people specializing in RTOS and with so much prior experience that you should get this right.

More, please.
 

Offline mikerj

  • Super Contributor
  • ***
  • Posts: 2178
  • Country: gb
Re: FreeRTOS performance penalty
« Reply #18 on: September 17, 2014, 08:32:29 pm »
My answer is:

You probably haven't yet learned to correctly configure or use FreeRTOS, so any results are irrelevant.
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 10269
  • Country: gb
    • Having fun doing more, with less
Re: FreeRTOS performance penalty
« Reply #19 on: September 17, 2014, 08:45:11 pm »
Still no takers?

Come on. We have so many people specializing in RTOS and with so much prior experience that you should get this right.
We have got this right.

It is a silly microbenchmark that wouldn't give any useful information, and we're not going to waste our time on it.

I've given a clue as to what would make a more useful set of measurements that could guide decision making when using the RTOS - but you've chosen to ignore it.
 

Offline SirNick

  • Frequent Contributor
  • **
  • Posts: 589
Re: FreeRTOS performance penalty
« Reply #20 on: September 17, 2014, 09:15:33 pm »
Alrighty, why not.  (Edited for clarity.)

How much RAM does FreeRTOS use?
Quote
This depends on your application. Below is a guide based on:

IAR STR71x ARM7 port, full optimisation, minimum configuration, four priorities.

Scheduler Itself: 236 bytes (can easily be reduced by using smaller data types)
Each queue: 76 bytes + queue storage area (see FAQ Why do queues use that much RAM?)
Each task: 64 bytes (includes 4 characters for the task name) + the task stack size.
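Those FAQ figures turn into a quick back-of-the-envelope RAM budget. A sketch assuming the quoted IAR STR71x numbers (236 bytes for the scheduler, 64 bytes per task plus its stack, 76 bytes per queue plus its storage area), with stack sizes given in bytes; other ports and configurations will differ, and the helper name is hypothetical.

```c
#include <stddef.h>

/* Indicative figures quoted above for the IAR STR71x ARM7 port. */
#define SCHED_BYTES      236u
#define TASK_TCB_BYTES    64u
#define QUEUE_HDR_BYTES   76u

/* Estimated kernel RAM for n_tasks tasks with equal stacks (in bytes)
   and n_queues queues, each holding queue_len items of item_size bytes. */
static size_t rtos_ram_estimate(size_t n_tasks, size_t stack_bytes,
                                size_t n_queues, size_t item_size,
                                size_t queue_len) {
    return SCHED_BYTES
         + n_tasks  * (TASK_TCB_BYTES + stack_bytes)
         + n_queues * (QUEUE_HDR_BYTES + item_size * queue_len);
}
```

For example, ten tasks with 128-byte stacks and no queues come to 236 + 10 × (64 + 128) = 2156 bytes, so the stacks dominate the budget.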

How much ROM/Flash does FreeRTOS use?
Quote
This depends on your compiler, architecture, and RTOS kernel configuration.

The RTOS kernel itself required about 5 to 10 KBytes of ROM space when using the same configuration [as above]

How much overhead is involved?
Quote
The table below provides an indication [...]  Note that any numbers provided are indicative only [...]

Create application queues, semaphores and mutexes.
On a ARM Cortex-M3 device, using the ARM RVDS compiler with optimization set to 1 (low), creating a queue, semaphore or mutex will take approximately 500 CPU cycles.

Create application tasks.
On a ARM Cortex-M3 device, using the ARM RVDS compiler with optimization set to 1 (low), creating each task will take approximately 1100 CPU cycles.

Start the RTOS scheduler.
The RTOS scheduler is started by calling vTaskStartScheduler(). The start up process includes configuring the tick interrupt, creating the idle task, and then restoring the context of the first task to run.
On a ARM Cortex-M3 device, using the ARM RVDS compiler with optimization set to 1 (low), starting the RTOS scheduler will take approximately 1200 CPU cycles.

Context switch times.
A context switch time of 84 CPU cycles was obtained under the following test conditions:

FreeRTOS ARM Cortex-M3 port for the Keil compiler, stack overflow checking turned off, trace features turned off, compiler set to optimise for speed, "configUSE_PORT_OPTIMISED_TASK_SELECTION" set to '1' in FreeRTOSConfig.h.

The exercise of comparing the relative merits of instruction sets, registers, compilers, etc., is left to everyone/anyone else.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #21 on: September 17, 2014, 10:15:25 pm »
Any more guesses?

Drawing closes tomorrow morning so hurry up!

:)
 

Online Kjelt

  • Super Contributor
  • ***
  • Posts: 5738
  • Country: nl
Re: FreeRTOS performance penalty
« Reply #22 on: September 18, 2014, 07:49:30 am »
3) What is the frequency of the flip, as a percentage vs. that of the naked flip?
What is the frequency of the OS timer tick you used?
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #23 on: September 18, 2014, 11:01:56 am »
Quite a few posters got the numbers right, some closer than others.

So here are the numbers:

Quote
1) What's the incremental flash usage?

As you would expect, the precise number depends on compiler setting, kernel configuration and chips used.

On the PIC24F it ranges from 5K to 8K (instructions only). On the CM3 it ranges from 4.5KB to 6KB.

The total compiled flash size, with a reasonably sized stack, ranges from 10KB to 30KB, however.

Quote
2) What's the incremental (static) ram usage?

This varies greatly. Under the most basic heap management strategy (no RAM is released from terminated tasks), it ranges from a few hundred bytes plus heap at the smallest to a few thousand KB, most of it in the heap you configure for the tasks.

Quote
3) What is the frequency of the flip, as a percentage vs. that of the naked flip?

On an 8 MHz PIC24F, the naked flip frequency is about 400 kHz. Under FreeRTOS it is about 394 kHz, i.e. about 98.5%. That figure dips to the low 98% range as you slow the MCU down. So the switching takes about 80 - 120 instructions per switch. That's roughly in the ballpark of the figures I have seen: 150 - 300 instructions, fewer for 32-bit chips and more for 8-bit chips.
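The step from the frequency ratio to a per-switch cost can be written out. Assuming one task switch per 1 ms tick (equal-priority tasks time-slicing) and that every cycle not spent switching goes into flipping the pin, the lost fraction (1 - f_rtos/f_naked) times the cycles per tick gives the cycles per switch; at 8 MHz that is (1 - 394/400) × 8000 = 120. The function name below is hypothetical.

```c
/* Cycles spent per task switch, inferred from blinky frequencies.
   Assumes exactly one switch per RTOS tick and that all time not
   spent switching goes into toggling the pin. */
static double switch_cycles(double f_rtos, double f_naked,
                            double cpu_hz, double tick_hz) {
    double lost_fraction   = 1.0 - f_rtos / f_naked;  /* share of time not toggling */
    double cycles_per_tick = cpu_hz / tick_hz;        /* e.g. 8e6 / 1000 = 8000 */
    return lost_fraction * cycles_per_tick;
}
```

Because the switch cost in cycles stays roughly fixed while a slower clock leaves fewer cycles per tick, the same formula also accounts for the percentage dipping as the MCU is slowed down. Applied to the 24 MHz STM32F100 runs later in the thread, e.g. (1 - 846.8/856.9) × 24000 ≈ 283, it reproduces the "Ticks on switch" figures.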

Now, great caution in using that number:

1) This particular test minimizes the cost of task switching: each task uses its allotted time slot to the fullest extent, so the relative cost of switching is as small as it can be.

At the other extreme, as one of the posters proposed earlier, each task runs a couple of instructions and then yields. Under that approach the MCU spends most of its time switching between tasks, and the pin flip frequency is considerably lower, reflecting the significant cost of switching.

I would argue that reality is closer to my test, in that most of the time tasks fully use their time slots rather than lasting just a couple of instructions.

2) The kernel is configured to minimize overhead, for example by disabling stack overflow checks. In a real-life application you are likely to have them turned on.

Overall, I think a (reasonable) minimum spec for an MCU running FreeRTOS would be something like 16KB flash + 8KB RAM and 4 MIPS. At that point your ability to add tasks and take advantage of an RTOS is very limited, as there aren't many resources left.

A practical "minimum" might be 24 - 32KB flash, 10KB+ RAM and 8 MIPS. At that point the switching cost is not noticeable, and there is enough flash and RAM to implement a few reasonable tasks.

In conclusion, I am pleasantly surprised by how little processing power FreeRTOS took away from the chip, and I find it quite helpful if you have a few low-priority tasks with disparate or particularly long execution times (FFT or LCD vs. buttons, for example). With an RTOS you don't have to break them up into pieces; it is automagically done for you.

However, it is tricky to use interrupts and to "share" peripherals in an RTOS environment, and it represents another layer that a programmer has to deal with.

Hope it helps.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #24 on: September 18, 2014, 11:02:38 am »
Quote
What is the frequency of the OS timer tick you used?

I used the default 1000 Hz (1 ms) tick.
 

Online Kjelt

  • Super Contributor
  • ***
  • Posts: 5738
  • Country: nl
Re: FreeRTOS performance penalty
« Reply #25 on: September 18, 2014, 12:40:21 pm »
However, it is tricky to ............."share" peripherals in an RTOS environment.
why is this more difficult than in a superloop? You can have both tasks using the same peripheral using a semaphore thus blocking when "in use" and directly starting when "free". While in a superloop you also have to take care of this with your own global flag and the next task (worst case) only gets started at the next round of the loop. Or do you mean something else?
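For comparison, the superloop version described here might look like the sketch below: a global owner flag guards the peripheral, and a job that finds it busy simply returns and retries on the next pass of the loop. All names are hypothetical. In a single-threaded superloop no atomic operations are needed, but an ISR touching the same flag would change that.

```c
#include <stdbool.h>

typedef enum { OWNER_NONE = 0, OWNER_LCD_JOB, OWNER_LOG_JOB } Owner;

static Owner spi_owner = OWNER_NONE;   /* hypothetical shared SPI peripheral */

/* Claim the peripheral if it is free (or already ours); never waits. */
static bool spi_try_acquire(Owner who) {
    if (spi_owner == OWNER_NONE || spi_owner == who) {
        spi_owner = who;
        return true;
    }
    return false;
}

static void spi_release(Owner who) {
    if (spi_owner == who) spi_owner = OWNER_NONE;
}

/* One superloop job: if the SPI is busy it gives up until the next
   pass of the loop, which is the worst-case delay mentioned above. */
static bool lcd_job(void) {
    if (!spi_try_acquire(OWNER_LCD_JOB)) return false;
    /* ... transfer a chunk of data to the LCD here ... */
    spi_release(OWNER_LCD_JOB);
    return true;
}
```

With an RTOS semaphore the waiting job would instead block and be woken as soon as the peripheral is released, rather than on the next loop pass.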
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #26 on: September 18, 2014, 03:40:09 pm »
I was more thinking about managing the requests via a queue and dealing with the overflow / underflow of that queue.

As to applying (Free)RTOS to a low-spec MCU (like an 8-bitter): if context switching takes 300 instructions per switch and you switch at 1 ms intervals, that alone is 1000 * 300 = 300,000 instructions per second, i.e. 0.3 MIPS spent on switching.

You probably need 3 MIPS+ to make it less noticeable, hence my 4 MIPS figure earlier.

Obviously, you can greatly improve the efficiency by switching less frequently, like every 10ms. However, that may not be fast enough for some "real time" applications.
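The same arithmetic in general form, assuming one instruction per cycle, S instructions per switch, f_sw switches per second, and a CPU executing R instructions per second:

```latex
\text{overhead} = \frac{S \cdot f_{\text{sw}}}{R},
\qquad
\frac{300 \times 1000}{3 \times 10^{6}} = 10\%,
\quad
\frac{300 \times 1000}{4 \times 10^{6}} = 7.5\%,
\quad
\frac{300 \times 100}{4 \times 10^{6}} = 0.75\%
```

The last case corresponds to switching every 10 ms instead of every 1 ms.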

Fortunately, most modern MCUs run faster than 4MIPS. It is flash / sram space that is more constraining.
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 10269
  • Country: gb
    • Having fun doing more, with less
Re: FreeRTOS performance penalty
« Reply #27 on: September 18, 2014, 04:13:24 pm »
However, it is tricky to ............."share" peripherals in an RTOS environment.
why is this more difficult than in a superloop? You can have both tasks using the same peripheral using a semaphore thus blocking when "in use" and directly starting when "free". While in a superloop you also have to take care of this with your own global flag and the next task (worst case) only gets started at the next round of the loop. Or do you mean something else?
For any small-scale simple microbenchmark or microapplication, a "superloop" is probably the best thing. I've used them myself.

Once the microapplication grows organically over time to become a milliapplication, then the supervisory logic tends to grow like topsy. Controlling that mess often requires something equivalent to a very simple RTOS, so the "designer" reinvents the wheel; unfortunately it is usually an elliptical wheel.
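The home-made wheel usually starts out as something like the sketch below: a table of jobs with periods, run from the superloop off a millisecond counter. It is a hypothetical minimal cooperative scheduler, not FreeRTOS, and all names are illustrative. Each job must return quickly, so a long job (an FFT, say) has to be split into pieces by hand, which is exactly the bookkeeping an RTOS takes over.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint32_t period_ms;   /* how often the job should run */
    uint32_t last_run;    /* tick count at the last run */
    void (*run)(void);    /* must return quickly: no blocking */
} Job;

static unsigned blink_count, button_count;
static void blink_job(void)  { blink_count++; }    /* e.g. toggle the LED */
static void button_job(void) { button_count++; }   /* e.g. debounce inputs */

static Job jobs[] = {
    { 500u, 0u, blink_job  },
    {  10u, 0u, button_job },
};

/* Call from the superloop with a millisecond counter (normally
   incremented by a timer ISR). Unsigned subtraction keeps the
   comparison correct when the counter wraps around. */
static void run_due_jobs(uint32_t now_ms) {
    for (size_t i = 0; i < sizeof jobs / sizeof jobs[0]; i++) {
        if ((uint32_t)(now_ms - jobs[i].last_run) >= jobs[i].period_ms) {
            jobs[i].last_run = now_ms;
            jobs[i].run();
        }
    }
}
```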

This is the embedded equivalent of "any sufficiently complicated C or Fortran program contains an ad hoc informally-specified bug-ridden slow implementation of half of Common Lisp."
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5549
  • Country: us
Re: FreeRTOS performance penalty
« Reply #28 on: September 18, 2014, 10:38:17 pm »
Quite a few posters got the numbers right, some closer than others.


1) 5KB
2) 256 bytes
3) 100%

But I'm just guessing since I know nothing of the naked pic24f nor fritos (I'm again hungry for a snack)

What did I win? A bag of Fritos?
I'm still craving them.

 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #29 on: September 18, 2014, 10:53:06 pm »
Did a little more testing on an STM32F100RB running at 24 MHz, using Standard Peripheral Library 3.5. Seven tasks running, each flipping the same LED.

GCC compiler, under different flags.

Code: [Select]
Optimization       O0      O1      O2      O3      Os
Freq_rtos (kHz)    846.8   1320    1321    1321    1321
Freq_naked (kHz)   856.9   1333    1333    1333    1333
Ticks on switch    283     234     216     216     216
Code size          7912    5112    5016    5316    3984


Observations:
1) Switching cost is pretty high for a 32-bit chip.
2) Lots of potential for code size reduction.
3) Not much to be gained beyond O1.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #30 on: September 18, 2014, 10:55:17 pm »
Note: Freq_rtos is the frequency of led flipping under FreeRTOS; Freq_naked is the frequency of led flipping without FreeRTOS, all in KHz.
 

Offline RichardBarry

  • Contributor
  • Posts: 6
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #31 on: September 19, 2014, 08:38:29 am »
I can't help but feel this thread is missing the whole point of an RTOS.  It is always advisable to choose the right tool for the right job - and flipping an LED on a PIC24 is not the right place for an RTOS.  In a semi complex system, especially with multiple communication interfaces, the performance penalty of an RTOS is *negative*, that is, you will get *much* more power out of your CPU and be able to include *much* more functionality by using an RTOS.

Why?  Because when you don't use an RTOS you will have to poll all your interfaces.  Polling consumes CPU time for no purpose.  You can of course sit in a loop waiting for interrupts, but the same applies, and you get horrible inter-dependencies between different pieces of functionality.  When you use an RTOS you can be completely event driven.  CPU time is only used when there is actually something to do, and spare time is spent in the idle task.  You can use the idle task as an automatic way of minimising power consumption - or use what was idle time to add additional functionality (or just use a smaller cheaper chip).

There are also lots of other reasons, the most important of which are related to maintainability - but really there are sooooooo many discussions of this on the interweb.

Also note when using a GCC based compiler, and some others, most of the code size actually comes from the libraries, not FreeRTOS.

http://www.freertos.org/FAQWhat.html#WhyUseRTOS
+ http://www.FreeRTOS.org + http://www.FreeRTOS.org/plus
The de facto standard, downloaded every 4.2 minutes during 2015.
IoT, Trace, Certification, TCP/IP, FAT FS, Training, and more...
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #32 on: September 19, 2014, 09:17:55 am »
The last readings were done with a 5-digit frequency meter, so rounding of the last digit may affect the measurements.

Now I have put an 8-digit frequency meter on it to get down to 1 Hz resolution.
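A rough sense of how much that last digit matters, assuming the "Ticks on switch" figures are derived as below (this reproduces the earlier GCC table, e.g. 283 at O0). With 24000 cycles per 1 ms tick at 24 MHz and a 1 kHz display resolution, one count in f_rtos shifts the inferred switch cost by about:

```latex
S = \frac{f_{\text{cpu}}}{f_{\text{tick}}}\left(1 - \frac{f_{\text{rtos}}}{f_{\text{naked}}}\right),
\qquad
|\Delta S| \approx \frac{f_{\text{cpu}}}{f_{\text{tick}}} \cdot \frac{\Delta f_{\text{rtos}}}{f_{\text{naked}}}
= \frac{24000 \times 1\,\text{kHz}}{1333\,\text{kHz}} \approx 18\ \text{cycles}
```

That is enough to account for the O1 figure moving from 234 to 225 between the two GCC runs.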

gcc vs. mdk, FreeRTOS 8.1.2., the same configuration file + test file used. The chip is the same - a 24Mhz STM32F100RB.

Code: [Select]
GCC                O0      O1      O2      O3      Os
Freq_rtos (kHz)    847     1,321   1,322   1,322   1,322
Freq_naked (kHz)   857     1,333   1,333   1,333   1,333
Efficiency         98.8%   99.1%   99.1%   99.2%   99.1%
Ticks on switch    283     225     206     204     212
Size               7912    5112    5016    5316    3984

Keil MDK           O0      O1      O2      O3      O3(time)
Freq_rtos (kHz)    918     1,327   1,327   1,327   1,327
Freq_naked (kHz)   926     1,338   1,338   1,338   1,338
Efficiency         99.1%   99.2%   99.2%   99.2%   99.2%
Ticks on switch    219     202     195     195     194
Size               3988    3304    3188    3188    3408

Quick observations:

1) The number of ticks spent per context switch is fairly consistent, in the low 200s. That is on the high end of the figures I have seen published by vendors.

A side note: Keil published a figure of 187 ticks for a context switch on an LPC1768, as a maximum for RTX.

2) GCC held its own well in terms of speed, with minimal difference between the two at the various compiler settings.

3) Keil does a better job at producing smaller code.

I will try to see what numbers I can get out of RTX/CMSIS-OS, when I get more time.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #33 on: September 19, 2014, 09:29:26 am »
Quote
flipping an LED on a PIC24 is not the right place for an RTOS.

I would argue otherwise.

On flipping an led: The goal here is to see how much time is wasted in switching between tasks.

You have two extremes here:
1) Each task uses its allotted time fully, so the switching cost is at its minimum. Flipping an LED simulates that situation: from the MCU's and OS's point of view it doesn't matter whether the task is flipping an LED, doing some math, or idling around; its processing power is going somewhere. Flipping an LED provides a convenient way to measure the processing power dedicated to running those jobs, since the LED isn't flipped while the MCU is busy switching tasks.

2) Each task takes minimal time and the MCU spends more of its time switching between jobs. Polling for buttons would fall into this category. Switching cost is maximized here, i.e. this is the least efficient case for the MCU.

Of the two, I would argue that reality is closer to 1) than 2).

As to the PIC24F, the right chip to run an RTOS on is the chip the programmer decides to use for a given task. It may not be the best chip to run a given RTOS, nor the best chip from which to infer the RTOS's performance on other chips.

However, as the tests so far have shown, the pattern of performance carries nicely from the PIC24F test to the STM32F100 test: two vastly different chips, almost identical performance - one is in the mid-high 98% and another in the low 99%.

The PIC24F did not exhibit ***identical*** performance to the CM3; however, it did exhibit ***indicative / comparable*** performance.

Hope it helps.
 

Offline Precipice

  • Frequent Contributor
  • **
  • Posts: 403
  • Country: gb
Re: FreeRTOS performance penalty
« Reply #34 on: September 19, 2014, 09:42:52 am »
Of the two, I would argue that reality is closer to 1) than 2).

Hmm, unconvinced. The micros I come in contact with tend to spend (wild guess) less than 1% of their time doing stuff, often far, far less. On my desk is a motor controller that runs hard for 10 seconds at startup, then tends to idle for a month. Of course, if I polled for button presses rather than sleeping and waiting for an edge interrupt, it would be the other way round.
The busiest micros I think I deal with are decoding video, and even then, they've usually got a lot of slack (>50%) because most frames are easier than the hardest frames that the CPU has to be sized to handle.

 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #35 on: September 19, 2014, 10:07:06 am »
FreeRTOS vs. RTX:

Here is a comparison between FreeRTOS and RTX. The same hardware (a 24 MHz STM32F100RB) and the same toolchain (Keil MDK, same compiler settings) are used. The only differences are the RTOS itself and its configuration, which is largely comparable but not identical because of the way each one is set up.

Code:
Keil MDK / FreeRTOS    O0      O1      O2      O3      O3(time)
  Freq_rtos            918     1,327   1,327   1,327   1,327
  Freq_naked           926     1,338   1,338   1,338   1,338
Efficiency             99.1%   99.2%   99.2%   99.2%   99.2%
  Ticks on switch      219     202     195     195     194
Size                   3988    3304    3188    3188    3408

Keil MDK / RTX         O0      O1      O2      O3      O3(time)
  Freq_rtos            918     1,326   1,326   1,326   1,326
  Freq_naked           926     1,338   1,338   1,338   1,338
Efficiency             99.1%   99.1%   99.1%   99.1%   99.1%
  Ticks on switch      217     218     219     217     220
Size                   4756    4456    4424    4428    4428

Quick observations:

1) the performance is largely comparable. Both are in the low 99% range for efficiency and flip the led at roughly the same frequency.
2) FreeRTOS has a slightly lower switching cost and a slightly smaller footprint.
3) both are comparably simple to set up and use.

It is a toss-up. FreeRTOS offers better portability across toolchains / chips. RTX is tied to Keil's offerings but comes with better support, at a monetary cost.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #36 on: September 19, 2014, 10:58:42 am »
Quote
Keil published a 187-tick figure for context switching on an LPC1768, as a maximum figure for RTX.

I didn't test an LPC1768, but the figure for the STM32F1 (~220-tick switching cost) is roughly comparable to the official 187-tick figure.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #37 on: September 19, 2014, 11:26:49 am »
CoOS vs. FreeRTOS:

CoIDE has its own rtos, CoOS. Fairly easy to use (1-click away from inclusion into your project).

Identical project (aside from the rtoses used), identical compiler / flags.

Now, the numbers:

Code:
GCC / CoOS             O0      O1      O2      O3      Os
  Freq_rtos            847     1,321   1,321   1,322   1,322
  Freq_naked           857     1,333   1,333   1,333   1,333
Efficiency             98.8%   99.1%   99.1%   99.2%   99.1%
  Ticks on switch      287     220     220     202     206
Size                   11564   7228    7256    7564    5432

GCC / FreeRTOS         O0      O1      O2      O3      Os
  Freq_rtos            847     1,321   1,322   1,322   1,322
  Freq_naked           857     1,333   1,333   1,333   1,333
Efficiency             98.8%   99.1%   99.1%   99.2%   99.1%
  Ticks on switch      283     225     206     204     212
Size                   7912    5112    5016    5316    3984

Quick observations:

1) practically identical performance.
2) CoOS takes considerably more space, though the comparison isn't exactly fair to CoOS there: it uses a fixed ram space for the stack of each individual task.

CoOS is difficult to justify, given its limited support for chips and toolchains, vs. what it offers over FreeRTOS.


 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #38 on: September 19, 2014, 11:30:49 am »
Quote
CPU time is only used when there is actually something to do, and spare time is spent in the idle task. 

That, to me, is spin: the "spare time" is cpu time too. It is either spent doing something (aka tasks) or doing nothing (aka the idle task in an RTOS, or looping around in a naked environment), identical to what the cpu is doing without an RTOS.

Quote
You can use the idle task as an automatic way of minimising power consumption - or use what was idle time to add additional functionality (or just use a smaller cheaper chip).

That can be easily and in my view more simply implemented in a non-RTOS environment.

And in a non-RTOS environment, it is much easier to go down to a much smaller / lower spec'd chip.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #39 on: September 19, 2014, 11:35:33 am »
On switching cost:

CooCox quoted a 1.5us/72 MHz number. That translates into 108 ticks (1.5 us x 72 MHz), vs. ~200 ticks measured.

FreeRTOS quoted an 84-tick number, vs. ~200 ticks measured.

Keil quoted a 1.6us/72 MHz number (115 ticks), vs. ~200 ticks measured.

Overall, the ~200-tick figure is pretty good.
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 10269
  • Country: gb
    • Having fun doing more, with less
Re: FreeRTOS performance penalty
« Reply #40 on: September 19, 2014, 11:59:37 am »
Quote
You can use the idle task as an automatic way of minimising power consumption - or use what was idle time to add additional functionality (or just use a smaller cheaper chip).

That can be easily and in my view more simply implemented in a non-RTOS environment.
That depends on your application, its architectural patterns, and the way in which it has been implemented. The point can be argued either way.

Of course, if you said "I could implement it more easily on my systems", I wouldn't argue.

Quote
And in a non-RTOS environment, it is much easier to go down to a much smaller / lower spec'd chip.
Only if either the chip was grossly overspecified or if the application was so poorly implemented that it spent too much of its time "inside" the RTOS.

Don't be too dogmatic!
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #41 on: September 19, 2014, 12:11:47 pm »
Quote
What did I win?

You get to feel good about your getting it right.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #42 on: September 19, 2014, 12:36:11 pm »
Got CoOS to work on STM32F030F.

Practically no ram left, :)

But it does blink a pin merrily.
 

Offline mikerj

  • Super Contributor
  • ***
  • Posts: 2178
  • Country: gb
Re: FreeRTOS performance penalty
« Reply #43 on: September 20, 2014, 09:58:15 am »
Are you simply toggling one of the ten LEDs as fast as it will go in each task during its allocated 1ms time slice, i.e. so each LED only toggles for 1ms at a time? Does your non-RTOS code do exactly the same thing?

If so then comparing the toggle frequency is a little pointless; it should be obvious they will either be the same, or at least can be made the same with some optimisation. The actual loop doing the toggling shouldn't need to differ unless you are adding extra functionality, and whilst a task is executing in its time slice the RTOS takes no CPU overhead (provided the task doesn't call an RTOS API function).

Aside from memory overhead, the task switching is really the only relevant performance parameter here, i.e. how much time is lost whilst no LEDs are toggling. If this were an important parameter then using an RTOS for such a trivial application would be daft.

Why are you completely avoiding replying to any questions or criticism regarding the implementation or relevance of this test?
« Last Edit: September 20, 2014, 10:01:53 am by mikerj »
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #44 on: September 20, 2014, 11:36:25 am »
"How about protothreads?"

In the original discussion that started me thinking about RTOS overhead, I mentioned that depending on your definition, an OS can be as simple as a switch/case statement.

That's precisely what protothreads is: a set of switch/case statements. A basic protothreads task looks roughly like this:

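Code:
int exit_condition(void);   /* placeholder: nonzero when the task should yield */
void do_something(void);    /* placeholder: the task's actual work */

void task0(void)
{
  while (1) {
    if (exit_condition()) return;   /* hand control back to the caller */
    do_something();
  }
}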
Because of this, it has a few interesting characteristics that make the comparison here difficult (unfair to protothreads):

1) it is immensely portable: any C compiler that supports macros and switch/case is good to go for protothreads;
2) it has practically zero flash / ram footprint: one switch/case and some tests is all there is.

Two issues with protothreads:
1) if exit_condition isn't met, execution moves on to the real task and there is no mechanism to "interrupt", or "switch away" from, that task. If you have a long task, you will not exit it until its execution has ended. For an application with lots of disparately long/short tasks, that's bad.
2) if you have a very short task, then the exit condition is tested frequently and the mcu spends more of its time, percentage-wise, determining whether the exit condition has been met.

That (a very short task) is unfortunately where we are. Blinking an led / flipping a pin doesn't take much time, so for each flip you have to test the exit condition. That's time ***wasted***.
    *** unlike in a real OS, where time is wasted switching context / jobs, protothreads have a very low cost of switching between tasks: just the overhead of exiting the previous task and calling the next one. The waste is generated within each thread testing the exit condition.
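For reference, Adam Dunkels' protothreads library builds its macros on a switch statement keyed by __LINE__, so a blocked thread's whole saved state is a single integer. The sketch below re-implements that idea with hypothetical MINI_PT_* macros (a simplified illustration, not the real pt.h API) to show where the per-call exit-condition test comes from: every time the round-robin loop calls a thread, the thread jumps back to its wait point and re-evaluates its condition, whether or not there is work to do. The simulated tick counter and the periods are assumptions.

```c
/* Hypothetical MINI_PT_* macros: a simplified re-implementation of the
 * protothreads idea (switch on a saved __LINE__), not the real pt.h API. */
struct mini_pt { int lc; };  /* local continuation: line we last waited on */

#define MINI_PT_INIT(p)  ((p)->lc = 0)
#define MINI_PT_BEGIN(p) switch ((p)->lc) { case 0:
/* Save the resume point, then hand control back until cond is true. */
#define MINI_PT_WAIT_UNTIL(p, cond)  \
    do {                             \
        (p)->lc = __LINE__;          \
        case __LINE__:               \
        if (!(cond)) return 0;       \
    } while (0)
#define MINI_PT_END(p)   } (p)->lc = 0; return 1;

static unsigned long sim_ticks;  /* simulated time base (assumption) */
static unsigned long pin_flips;  /* stands in for toggling PB.0 */

/* Blinky thread: wait for the deadline, flip the pin, repeat. State that
 * must survive a wait lives outside the function (*next), since locals
 * are not preserved across a protothread wait. */
static int blink_thread(struct mini_pt *pt, unsigned long *next,
                        unsigned long period)
{
    MINI_PT_BEGIN(pt);
    for (;;) {
        MINI_PT_WAIT_UNTIL(pt, sim_ticks >= *next);  /* exit-condition test */
        pin_flips++;
        *next += period;
    }
    MINI_PT_END(pt);
}

/* Round-robin loop: call each thread once per simulated tick and return
 * the total number of pin flips. */
unsigned long run_two_blinkies(unsigned long n_ticks)
{
    struct mini_pt a, b;
    unsigned long next_a = 0, next_b = 0;

    MINI_PT_INIT(&a);
    MINI_PT_INIT(&b);
    pin_flips = 0;
    for (sim_ticks = 0; sim_ticks < n_ticks; sim_ticks++) {
        blink_thread(&a, &next_a, 10);
        blink_thread(&b, &next_b, 25);
    }
    return pin_flips;
}
```

run_two_blinkies(100) gives 14 flips: the period-10 thread fires at ticks 0, 10, ..., 90 and the period-25 thread at 0, 25, 50 and 75. Every other call only re-tests its condition and returns, which is exactly the overhead described above.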

To conclude, if you ran the same test comparing protothreads vs. the other OSs mentioned earlier, you would expect protothreads to have a low memory footprint but a big performance hit, entirely due to how this test is set up.

When I get some time, I will see if I can set it up on a chip and run some numbers.
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 10269
  • Country: gb
    • Having fun doing more, with less
Re: FreeRTOS performance penalty
« Reply #45 on: September 20, 2014, 03:14:56 pm »
Why are you completely avoiding replying to any questions or criticism regarding the implementation or relevance of this test?
Quite. That's why I stopped actively contributing to this thread: I felt the conversation was unidirectional, and the phrase "there's none so deaf as thems won't hear" sprang to mind.

Maybe I'm too pessimistic, but we'll see.
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5549
  • Country: us
Re: FreeRTOS performance penalty
« Reply #46 on: September 20, 2014, 05:25:06 pm »
An RTOS has its place in very complex systems whose tasks need to be scheduled to meet certain time constraints. OS-9 was the best out there, even better than VxWorks or eCos, but it seems they have been fading out over the last decade, even though they did support ARM processors.

Anyhow, when you have to do many decoupled tasks, that's when an RTOS will save you greatly on development time. Sure, you can write a custom program that will perform better, but the more complex the system, the more its development time grows (exponentially), and debugging it will take too much time and resources.

Unless you do your own scheduler etc but then you will be running an RTOS.

The real purpose of an RTOS is time to market and development cost.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #47 on: September 20, 2014, 06:16:40 pm »
As promised, here is a test between protothreads and CoOS, on the ghetto board (STM32F030F running at 24 MHz). The compiler is gcc under CoIDE.

First, CoOS:

Code:
GCC / CoOS             O0      O1      O2      O3      Os
  Freq_rtos            789     1,320   1,319   1,319   1,485
  Freq_naked           800     1,332   1,331   1,331   1,498
Efficiency             98.6%   99.1%   99.1%   99.1%   99.1%
  Ticks on switch      324     204     215     211     215
Size                   7868    4920    4980    5340    4476

Generally in line with the numbers I had for STM32F100RB.

Now, the numbers for protothread:

Code:
GCC / Protothreads     O0      O1      O2      O3      Os
  Freq_rtos            352     705     877     878     1,027
  Freq_naked           800     1,332   1,331   1,331   1,498
Efficiency             44.0%   52.9%   65.9%   66.0%   68.5%
  Ticks on switch      13,437  11,295  8,185   8,169   7,556
Size                   3084    1796    1928    1912    1636

A few things:
1) the libraries used are identical;
2) the user tasks are "largely" identical: 6 tasks blinking the same led. Because of the way protothreads is configured, you have to set the conditions on each run, so the mcu is not blinking the led as fast as it could, and you can see that in the efficiency measurements.
3) The footprint of protothreads is minimal, as we had expected. The difference is approximately the size of CoOS.
4) Because of structural differences, "ticks on switch" measurements make no sense for protothreads. The more meaningful measurement is efficiency: how much of the time the mcu is actually doing your tasks vs. running the OS (switching context in the case of CoOS, or testing exit conditions in the case of protothreads).

In the end, I think a lightweight "OS" like protothreads has value on small devices where your tasks are quite similar in execution time. If that's indeed the case, writing your own scheduler or just sequentializing the tasks isn't a bad idea.

For larger chips, a real OS is likely to be more useful.
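For the "write your own scheduler" option, a common pattern is a table of run-to-completion tasks released from a superloop at fixed periods. This is a hypothetical minimal sketch: the task names, periods and the simulated tick source are assumptions. There is no preemption and no per-task stack, so a long task delays all the others, the same trade-off as protothreads.

```c
#include <stddef.h>

/* Hypothetical homegrown cooperative scheduler: each task runs to
 * completion (no preemption, no per-task stack) every `period` ticks. */
typedef void (*task_fn)(void);

struct task {
    task_fn fn;
    unsigned long period;    /* ticks between runs */
    unsigned long next_due;  /* tick of the next run */
};

static unsigned long runs_fast, runs_slow;
static void fast_task(void) { runs_fast++; }  /* e.g. poll a button */
static void slow_task(void) { runs_slow++; }  /* e.g. blink an LED */

static struct task tasks[] = {
    { fast_task, 1, 0 },
    { slow_task, 5, 0 },
};
#define N_TASKS (sizeof tasks / sizeof tasks[0])

/* Superloop body for one tick; on hardware `tick` would come from a
 * timer interrupt. Ignores counter wraparound for brevity. */
static void scheduler_step(unsigned long tick)
{
    for (size_t i = 0; i < N_TASKS; i++) {
        if (tick >= tasks[i].next_due) {
            tasks[i].fn();
            tasks[i].next_due += tasks[i].period;
        }
    }
}

/* Run the scheduler for n_ticks from a clean state; returns how many
 * times the slow task ran (the fast task's count is left in runs_fast). */
unsigned long simulate(unsigned long n_ticks)
{
    runs_fast = runs_slow = 0;
    for (size_t i = 0; i < N_TASKS; i++)
        tasks[i].next_due = 0;
    for (unsigned long t = 0; t < n_ticks; t++)
        scheduler_step(t);
    return runs_slow;
}
```

simulate(10) returns 2 (the slow task runs at ticks 0 and 5) and leaves runs_fast at 10.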
 

Offline mikerj

  • Super Contributor
  • ***
  • Posts: 2178
  • Country: gb
Re: FreeRTOS performance penalty
« Reply #48 on: September 20, 2014, 11:23:37 pm »

In the end, I think a lightweight "OS" like protothreads has value on small devices where your tasks are quite similar in execution time. If that's indeed the case, writing your own scheduler or just sequentializing the tasks isn't a bad idea.

Protothreads are simply a way of implementing state machines via the C pre-processor.  State machines are most useful when you spend a reasonable amount of time in any particular state if any significant processing is required e.g. the 1ms or so that a conventional RTOS might use as a time slice.  If you were only toggling an LED once per state then the overhead would be quite large.

Hand-crafted state machines with proper enumerated states will be more efficient than Protothreads in many situations, because the state labels can be made consecutive (permitting easy implementation as a small jump table) and will also be numerically small in the majority of cases. That lets the state value fit into an 8-bit integer, which can be a very useful saving in time and memory on the 8-bit micros where you would typically consider a state machine design. The code may not look as tidy as a Protothreads implementation, however.
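A minimal sketch of that hand-crafted style, using a hypothetical button debouncer: the states are consecutive small enum values stored in a uint8_t, so the switch is a good candidate for a compact jump table (whether the compiler actually emits one is up to the compiler). The 3-tick debounce threshold and all names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Consecutive, numerically small states: fit in one byte and suit a
 * jump-table switch. Names are illustrative. */
enum { ST_IDLE, ST_DEBOUNCE, ST_PRESSED, ST_RELEASE_WAIT };

typedef struct {
    uint8_t state;  /* one of the ST_* values */
    uint8_t count;  /* debounce counter, in ticks */
} button_fsm;

/* Call once per tick with the raw pin level; returns true once per press. */
bool button_step(button_fsm *b, bool pin_down)
{
    switch (b->state) {
    case ST_IDLE:
        if (pin_down) { b->count = 0; b->state = ST_DEBOUNCE; }
        break;
    case ST_DEBOUNCE:
        if (!pin_down) { b->state = ST_IDLE; }
        else if (++b->count >= 3) { b->state = ST_PRESSED; return true; }
        break;
    case ST_PRESSED:
        if (!pin_down) { b->count = 0; b->state = ST_RELEASE_WAIT; }
        break;
    case ST_RELEASE_WAIT:
        if (pin_down) { b->state = ST_PRESSED; }
        else if (++b->count >= 3) { b->state = ST_IDLE; }
        break;
    }
    return false;
}
```

Unlike a protothread, the resume point is an explicit named state rather than a __LINE__ value, which is what keeps the state variable small and the cases consecutive.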

Obviously I don't expect any kind of response, but maybe this will help someone.
 

Offline gxti

  • Frequent Contributor
  • **
  • Posts: 507
  • Country: us
Re: FreeRTOS performance penalty
« Reply #49 on: September 21, 2014, 01:12:11 am »
The main benefit to CoOS is that it is permissively licensed -- you can embed it into a proprietary application without worrying about license compliance. FreeRTOS and ChibiOS use a modified GPL license that allows you to link against it, but if you modify them then you have to open-source your modifications. And either way you still need the appropriate legal boilerplate in order to comply with the license.

That said, CoOS is pretty crappy. It mostly gets the job done but the code is stringy, poorly commented, and had at least one crippling race condition bug with semaphores that they claim to have fixed but I'm not entirely sure. I switched to it from chibios due to the license, but now I'm thinking about switching back because I'm hitting some strange scheduler-related bug that may or may not be the OS's fault, and nobody wants to spend time hunting down OS bugs.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #50 on: September 21, 2014, 10:19:54 am »
Quote
nobody wants to spend time hunting down OS bugs.

That's a plus for a commercial OS.
 

Offline gmb42

  • Regular Contributor
  • *
  • Posts: 174
  • Country: gb
Re: FreeRTOS performance penalty
« Reply #51 on: September 21, 2014, 10:25:54 am »
Test: I am comparing two blinkies,

1) no FreeRTOS / naked, flipping PB.0 and measure the frequency of the flip;
2) with FreeRTOS, flipping PB.0 through 10 separate tasks (no messaging between them), and measure the frequency of the flip.


As others have mentioned, the nature of the test is unclear, leading to difficulties in trying to replicate the experiment.  Posting code (or pseudo-code) would help.

Are the ten tasks being effectively round-robined to flip the pin on and off at each task cycle or do they just invert the current pin state?

Segger have produced a methodology (here) for measuring context switch times that may be of relevance.

Another point when benchmarking software, is permission to do so.  See Clause 2 in the FreeRTOS licence which implies (to me) that the user must obtain permission to publish the results seen in this thread.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #52 on: September 21, 2014, 11:07:33 am »
The basic principle is quite simple: when any of the tasks is running, it is flipping an led - how it flips does not really matter, as long as it does flip the pins; during the context switch, the mcu is not running any of the tasks so the pin is not being flipped.

Thus, if you count the number of pin flips during a given period of time (aka measuring its frequency), with and without an OS, you get to measure how much time is wasted during context switches.

Say that the pin is flipped 100K times / second without an OS; and 99K times / second with an OS. You know that you are missing 1k pulses, or 1% of a second spent in context switches, or 10ms per 1 second period.

Since your tasks run on 1ms time slices, you have switched 1000 times in 1 second. Thus your time in each context switch is 10ms / 1000 = 10us.

Pretty trivial.
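The arithmetic above can be written down as a small helper. This is a sketch under the same assumptions as the method itself: every gap in the toggling is a context switch, and there is exactly one switch per time slice. The function names are illustrative.

```c
/* Estimate the cost of one context switch from two measured pin-toggle
 * frequencies, assuming every gap in the toggling is a context switch and
 * exactly one switch happens per time slice. */
double switch_time_us(double f_naked_hz, double f_rtos_hz, double slice_ms)
{
    double lost_fraction = 1.0 - f_rtos_hz / f_naked_hz;  /* e.g. 1% */
    double switches_per_s = 1000.0 / slice_ms;            /* e.g. 1000 */
    return lost_fraction / switches_per_s * 1e6;          /* seconds -> us */
}

/* The same cost expressed in core clock cycles ("ticks"). */
double switch_time_cycles(double f_naked_hz, double f_rtos_hz,
                          double slice_ms, double f_cpu_hz)
{
    return switch_time_us(f_naked_hz, f_rtos_hz, slice_ms) * 1e-6 * f_cpu_hz;
}
```

switch_time_us(100000.0, 99000.0, 1.0) reproduces the 10 us of the example; at an assumed 24 MHz core clock, switch_time_cycles(100000.0, 99000.0, 1.0, 24e6) turns that into about 240 cycles.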

Quote
Segger have produced a methodology

The two approaches are identical in that both rely on the fact that the output does not change during a context switch. In my case, the output is not being flipped during a context switch; in Segger's case, the output remains the same (low) during a context switch.

Segger's approach requires a fast scope and for very fast switches its measurement precision may be limited by the scope's timing resolution. Mine requires simple measurement of frequencies.
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 10269
  • Country: gb
    • Having fun doing more, with less
Re: FreeRTOS performance penalty
« Reply #53 on: September 21, 2014, 11:58:20 am »
Thus, if you count the number of pin flips during a given period of time (aka measuring its frequency), with and without an OS, you get to measure how much time is wasted during context switches.

Segger's approach requires a fast scope and for very fast switches its measurement precision may be limited by the scope's timing resolution. Mine requires simple measurement of frequencies.
Finding context switch time is a useful measurement. Your way of doing it is unnecessarily bizarre and quite probably the results are obscured by effects that you haven't considered. For a start consider the effects of caches.

Your comment about scope's time resolution is unlikely to be valid unless either it is an extremely fast processor or you have an extremely slow scope. How fast is your scope?

The scope technique has many useful virtues and can be used for other measurements, e.g. interrupt latency. Your technique has many severe limitations and no overwhelming advantages.

Fine to invent a new technique, but not noting its limitations and why (you think) it is beneficial is very rude: it unnecessarily wastes other people's time.
 

Online Kjelt

  • Super Contributor
  • ***
  • Posts: 5738
  • Country: nl
Re: FreeRTOS performance penalty
« Reply #54 on: September 21, 2014, 12:46:08 pm »
Say that the pin is flipped 100K times / second without an OS; and 99K times / second with an OS. You know that you are missing 1k pulses, or 1% of a second spent in context switches, or 10ms per 1 second period.
That only holds for a project where your alternative to an OS is a superloop doing only one task, namely flashing a led.
Since there is no use for such a theoretical project handling only one task, in any real project the superloop will contain many more tasks, making the OS relatively less costly.
IMO you chose the worst possible test condition for the OS. Not that it is wrong, but it should be considered that in any real project the results for using an OS will be better.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #55 on: September 21, 2014, 02:04:14 pm »
Quote
That only holds for a project where your alternative to an OS is a superloop doing only one task, namely flashing a led.

Think of flashing the led as a proxy for the mcu's processing power: at any point, the mcu is either doing something useful or switching context.

In this case, the "useful" thing is simulated by flashing the led / flipping a pin. So any time during which the pin is not being flipped is consumed by context switching.

That's all there is to it.

In the case of a real RTOS, the "frequency" of the pin being flipped is actually identical, with or without the RTOS. Each "task", when it is running, flips the pin in a fashion identical to the loop that flips the pin without an OS, for its time slice.

So what you will see is that the pin is being flipped at 100 kHz for 1 ms, the flipping then stops for a few us while the mcu switches context, and then the flipping resumes, at 100 kHz, when the next task takes over.

Quote
IMO you chose the worst possible test condition for the OS,

We discussed this earlier, and the exact opposite is true: under this particular test the mcu spends most of its time running the tasks.

You can think of it this way: in the above example, each task runs for 1ms (i.e. fully utilizing its time slice), and then for another 10us the mcu switches context and no user code is run during that period of time.

The opposite would be to run a very simple task (flipping a pin, taking 1us or so) and then immediately switch to the next task, as I think one person suggested; the mcu then spends the next 10us doing context switching. You would observe a very low frequency: on that particular mcu (PIC24F), the frequency is 17 kHz, vs. 400 kHz running naked or 394 kHz running the full time slice.

So while the time spent in each context switch is the same, the more context switches you make, the more the efficiency suffers. The test we are doing utilizes the full time slice, so it has the highest efficiency possible, i.e. the best-case scenario.
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 10269
  • Country: gb
    • Having fun doing more, with less
Re: FreeRTOS performance penalty
« Reply #56 on: September 21, 2014, 02:18:25 pm »
dannyf: how fast is your scope?

(You previously discounted the standard direct measurement techniques in favour of your strange, indirect, imprecise technique because "Segger's approach requires a fast scope and for very fast switches its measurement precision may be limited by the scope's timing resolution.")
« Last Edit: September 21, 2014, 02:20:24 pm by tggzzz »
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #57 on: September 21, 2014, 02:44:34 pm »
Quote
So what you will see is that the pin is being flipped at 100 kHz for 1 ms, the flipping then stops for a few us while the mcu switches context, and then the flipping resumes, at 100 kHz, when the next task takes over.

We talked about this earlier: that those flippings take the form of "chunks", each followed by a period of "silence" due to context switching.

Here is a capture of a series of such "chunks". In the chart below, you will see little gaps separating 1ms of flippings.

 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #58 on: September 21, 2014, 02:46:23 pm »
We can blow up those gaps for closer examination:

As you may have noticed, some of the gaps seem to be slightly wider than others. That's due to our sampling at a fairly low rate of 1 MHz, so at each end we could be off by up to 1us.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #59 on: September 21, 2014, 02:47:55 pm »
At a higher sample rate, capturing the gap is more difficult but quantifying the gap is much easier.

Here is one at 12 MHz (timing resolution of 0.08us).

The gap here is 7.917us.

 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #60 on: September 21, 2014, 02:53:14 pm »
Quote
The gap here is 7.917us.

That's an STM32F100RB running RTX @ 24 MHz, O3 flag.

The measured efficiency I reported earlier is 99.1%. Or 9us on a 1ms time slice.

The two measurements are fairly consistent, considering the max error of 0.08us * 2 due to sampling.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #61 on: September 21, 2014, 02:56:32 pm »
Hopefully those pictures will help you visualize how the RTOS works in conjunction with the tasks.

 

Offline hans

  • Super Contributor
  • ***
  • Posts: 1042
  • Country: nl
Re: FreeRTOS performance penalty
« Reply #62 on: September 21, 2014, 05:07:36 pm »
To be more correct you need to subtract the low time of a pin toggle, as that's now included in the time measured.
To be completely correct you need to include the overhead of writing to a GPIO port as well, as some chips need to resolve pointers or do read-modify-write in software (hence the SET/CLR registers on some microcontrollers).

At that point you may as well run the same test as Segger does.

With 12MHz sample rate on a 24MHz clock rate you're still accurate to only 2 cycles.
I would suggest underclocking (instruction cache on an M0 or PIC24 is not a big issue, I think) or getting a scope. Even a half-crappy USB scope will do 100MS/s, which is decent for 100MHz / 1 clock cycle or 200MHz / 2 clock cycles.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #63 on: September 21, 2014, 05:29:50 pm »
Quote
To be more correct you need to subtract the low time of a pin toggle, as that's now included in the time measured.
To be completely correct you need to include the overhead of writing to a GPIO port as well, as some chips need to resolve pointers or do read-modify-write in software (hence the SET/CLR registers on some microcontrollers).

You may want to think it through.

Quote
At that point you may as well run the same test as Segger does.

Those "issues" you identified earlier apply equally to Segger's approach.

It takes a little bit of brain power to process, but those two approaches are really identical.
 

Online Kjelt

  • Super Contributor
  • ***
  • Posts: 5738
  • Country: nl
Re: FreeRTOS performance penalty
« Reply #64 on: September 21, 2014, 08:54:01 pm »
The measured efficiency I reported earlier is 99.1%. Or 9us on a 1ms time slice.
So the RTOS costs 1% cpu time, I can definitely live with that having the luxury of an RTOS taking care of all those other things for me.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #65 on: September 21, 2014, 09:34:09 pm »
That was the gist of the tests:

1) all the major RTOSs have comparable performance (context switch): ~200 clock cycles. I didn't test embOS, but the numbers provided by Segger would imply the same.

2) FreeRTOS did surprisingly well, putting aside its Clause 2 prohibiting benchmarking.

The 99% figure, however, needs to be taken with some caution, as it is the "best case" scenario: minimum switching for the given time slice. If you have lots of shorter tasks (reading a button, for example), your efficiency will suffer. On the flip side, we did the test at 1ms, on the aggressive end of the scale. Relaxing it to 10ms would be more realistic, I think.

In the end, I am unsure about FreeRTOS. I have used uCOS II/III for quite some time and find them quite reliable, with a much bigger footprint. I also have access to RTX so the appeal of FreeRTOS isn't that great for me.

But for someone needing an open source RTOS at a lower price point, you can use FreeRTOS knowing that from the point of context switching, you aren't losing to the big boys.
 

Online Jeroen3

  • Super Contributor
  • ***
  • Posts: 3333
  • Country: nl
  • Embedded Engineer
    • jeroen3.nl
Re: FreeRTOS performance penalty
« Reply #66 on: September 22, 2014, 07:05:56 am »
The test you performed is unrealistic. You assume each silence in the toggling is due to a context switch, and you forgot about a lot of other factors (as mentioned before):
caching, pre-fetching, scheduling method, bus wait states and, last but not least, interrupt jitter, since nested interrupts are not allowed in most applications.
The results of you test will not be valid if you enable any peripheral with an interrupt.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 3075
  • Country: us
Re: FreeRTOS performance penalty
« Reply #67 on: September 22, 2014, 07:28:38 am »
I'm not sure.  Yes, the test only captures the performance penalty incurred when a task is blocked due to its run quantum being used up.  But all other reasons for preemption would require an interrupt, and an interrupt would mess up the timing of the blink loop without the RTOS as well.  So, the test only measures the "minimum" overhead that an RTOS might add.  But I think that's what it was supposed to measure...
 

Offline Precipice

  • Frequent Contributor
  • **
  • Posts: 403
  • Country: gb
Re: FreeRTOS performance penalty
« Reply #68 on: September 22, 2014, 07:30:15 am »
Maybe the test is unrealistic but useful, in that it suggests that available RTOSes aren't utter cycle / space hogs, and it'll discourage a few people from rolling their own, which (from observation as a hardware guy watching software projects for decades) _always_ takes longer, and is more buggy / annoying than anyone expects.

If a case can even remotely and slightly wrongly be made for an RTOS underpinning blinky(), then why not make that your default. Learn one, use it. Move on and write your application.
By the time the project grows / changes and needs an RTOS, there'll be one under you already, and you won't suddenly have a huge screeching halt as you need to turn your code inside-out as you move from a superloop to something scheduled.

Or not. I'm just a hardware guy, I've got plenty of other jobs to be getting on with while you reinvent the wheel for the thousandth... frigging... time...

 

Offline Kremmen

  • Super Contributor
  • ***
  • Posts: 1283
  • Country: fi
Re: FreeRTOS performance penalty
« Reply #69 on: September 22, 2014, 07:57:13 am »
I tried to make just this point several posts back. An RTOS is not a silver bullet but it is a very handy tool. If you are already familiar with one the day the real need pops up, you are that far ahead in your project.
Nothing sings like a kilovolt.
Dr W. Bishop
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 10269
  • Country: gb
    • Having fun doing more, with less
Re: FreeRTOS performance penalty
« Reply #70 on: September 22, 2014, 08:54:52 am »
Quote
Maybe the test is unrealistic but useful, in that it suggests that available RTOSes aren't utter cycle / space hogs, and it'll discourage a few people from rolling their own, which (from observation as a hardware guy watching software projects for decades) _always_ takes longer, and is more buggy / annoying than anyone expects.
Quite, but that's a pretty weak and useless statement: people wouldn't be using RTOSs if they were performance hogs. But you knew that!

I'm puzzled why dannyf chose to occupy our time with his obtuse and ambiguous test. He has made statements to the effect of not needing "high speed" oscilloscopes, but hasn't quantified "high speed", despite repeated requests.
Quote
If a case can even remotely and slightly wrongly be made for an RTOS underpinning blinky(), then why not make that your default. Learn one, use it.
Always useful to know what tools can/can't do.
Quote
Move on and write your application.
By the time the project grows / changes and needs an RTOS, there'll be one under you already, and you won't suddenly have a huge screeching halt as you need to turn your code inside-out as you move from a superloop to something scheduled.
Just so.
Quote
Or not. I'm just a hardware guy, I've got plenty of other jobs to be getting on with while you reinvent the wheel for the thousandth... frigging... time...
You forgot to mention the reinvented wheel will be elliptical.
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8229
  • Country: 00
Re: FreeRTOS performance penalty
« Reply #71 on: September 22, 2014, 09:59:46 am »
Quote
The test you performed is unrealistic.

Any test is unrealistic, just as any theory / model is unrealistic - that's by design.

The purpose of a test isn't to provide a precise measurement for a particular application, but to provide an indication, sometimes even ballpark indication, for a set of applications.

If one of the test conditions isn't valid for your application, obviously the results of the test are invalid, in the sense that the measurements from the tests are not precisely applicable to you.

However, the results of the test are still valid indications of how your application is likely to perform.

How good an indication it is will depend on the materiality and relevance of those test conditions.

Quote
Caching, pre-fetching, scheduling method, bus wait states and last but not least interrupt jitter.

I would argue that all the above are either immaterial or irrelevant.

Take pre-fetch for example. You incur it twice in any context switch, first when fetching the code for the context switch itself and then when fetching the code for the next task. However, unless you have an OS that can pre-determine, with certainty, which piece of code will be executed next and pre-fetch that piece of code into the pipeline, you will always incur that cost.

i.e., pre-fetching, however ineffective or costly it may be, has no impact when comparing RTOSs that cannot pre-determine, with certainty, which piece of code will be executed next, as they all incur this cost equally.

And I would say that 100% of the RTOSs that have ever existed, and that will exist for a loooooooong time to come, fall into that category, unfortunately.
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 10269
  • Country: gb
    • Having fun doing more, with less
Re: FreeRTOS performance penalty
« Reply #72 on: September 22, 2014, 10:59:16 am »
Quote
The test you performed is unrealistic.

Any test is unrealistic, just as any theory / model is unrealistic - that's by design.

Yes, but some tests are more unrealistic than others - and (except for blinkies) your test is unnecessarily obtuse, indirect, unrealistic and therefore unhelpful.

Quote
The purpose of a test isn't to provide a precise measurement for a particular application, but to provide an indication, sometimes even ballpark indication, for a set of applications.

You are aiming far too low! When I write tests they are designed to give precise measurements for a wide range of applications. If not, then I don't waste other people's time by publishing them.

Why don't you simply use the standard techniques for measuring parameters that provide useful information for a wide range of applications?

Quote
Quote
Caching, pre-fetching, scheduling method, bus wait states and last but not least interrupt jitter.

I would argue that all the above are either immaterial or irrelevant.

Take pre-fetch for example. You incur it twice in any context switch, first when fetching the code for the context switch itself and then when fetching the code for the next task. However, unless you have an OS that can pre-determine, with certainty, which piece of code will be executed next and pre-fetch that piece of code into the pipeline, you will always incur that cost.

That's a silly misdirection: where it is done at all, it is the hardware that does the prefetching, caching, bus wait states, bus transaction scheduling, and interrupts. The OS schedules tasks.

Quote
i.e., pre-fetching, however ineffective or costly it may be, has no impact when comparing RTOSs that cannot pre-determine, with certainty, which piece of code will be executed next, as they all incur this cost equally.

That's strictly true, but misses the useful point. The impact of prefetching, caches etc can be critically important in hard-realtime systems. In addition the data structures used in RTOSs can be more or less friendly to those performance factors.
 

Offline mikerj

  • Super Contributor
  • ***
  • Posts: 2178
  • Country: gb
Re: FreeRTOS performance penalty
« Reply #73 on: September 22, 2014, 01:40:35 pm »

Quote
Caching, pre-fetching, scheduling method, bus wait states and last but not least interrupt jitter.

I would argue that all the above are either immaterial or irrelevant.

Interrupt jitter is certainly relevant when working with an RTOS since it can be higher than it would be in non-RTOS based code.  This is because interrupts that call RTOS functions will typically need to be disabled whilst RTOS code is being executed (e.g. during a context switch or some API calls).
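A toy model of the effect described above: if interrupts are masked during certain windows (for example RTOS critical sections), an interrupt that arrives inside a window must wait for it to end, so the spread between best- and worst-case ISR entry grows by the length of the longest window. The sketch sweeps every arrival time over one period. The window list, cycle counts and function names are hypothetical; it ignores interrupt priorities, nesting and other ISRs, and assumes windows do not overlap or touch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: interrupts are masked during [start, end), e.g. an
 * RTOS critical section. All times are in CPU cycles. */
typedef struct { unsigned start, end; } masked_window;

/* Delay before the ISR can start if the interrupt fires at cycle t. */
static unsigned isr_delay(unsigned t, const masked_window *w, size_t n,
                          unsigned entry_cycles)
{
    for (size_t i = 0; i < n; i++) {
        if (t >= w[i].start && t < w[i].end)
            return entry_cycles + (w[i].end - t); /* wait for unmask */
    }
    return entry_cycles; /* interrupts enabled: fixed hardware entry cost */
}

/* Sweep every arrival time in one period; report best and worst delay. */
static void latency_range(const masked_window *w, size_t n, unsigned period,
                          unsigned entry_cycles, unsigned *min_out,
                          unsigned *max_out)
{
    assert(period > 0);
    unsigned lo = ~0u, hi = 0;
    for (unsigned t = 0; t < period; t++) {
        unsigned d = isr_delay(t, w, n, entry_cycles);
        if (d < lo) lo = d;
        if (d > hi) hi = d;
    }
    *min_out = lo;
    *max_out = hi;
}
```

With masked windows at cycles 10 to 30 and 50 to 55, a 100-cycle period and a fixed 4-cycle entry cost, latency_range() reports a best case of 4 cycles and a worst case of 24, i.e. 20 cycles of jitter contributed entirely by the longest window.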
 

Online Jeroen3

  • Super Contributor
  • ***
  • Posts: 3333
  • Country: nl
  • Embedded Engineer
    • jeroen3.nl
Re: FreeRTOS performance penalty
« Reply #74 on: September 22, 2014, 01:51:37 pm »
A badly configured RTOS can kill your very expensive XYZ-machine by responding to the end switch or force sensor a few milliseconds late due to interrupt jitter. Certainly not irrelevant.
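As a back-of-the-envelope illustration of why a few milliseconds matter: at constant speed, the extra travel before the controller reacts is simply speed × latency. The helper and the example speeds below are hypothetical, and a real axis also needs a deceleration distance on top of this.

```c
#include <assert.h>

/* Hypothetical helper: distance an axis keeps moving before a limit-switch
 * event is acted on, assuming constant speed and ignoring deceleration. */
static double overtravel_mm(double speed_mm_per_s, double latency_ms)
{
    assert(speed_mm_per_s >= 0.0 && latency_ms >= 0.0);
    return speed_mm_per_s * (latency_ms / 1000.0);
}
```

At an assumed 200 mm/s, a response that is 3 ms late gives overtravel_mm(200.0, 3.0) of roughly 0.6 mm, whereas a 50 µs response would be about 0.01 mm.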
 

Offline andyturk

  • Frequent Contributor
  • **
  • Posts: 892
  • Country: us
Re: FreeRTOS performance penalty
« Reply #75 on: September 22, 2014, 01:54:12 pm »
Quote
[...] I also have access to RTX so the appeal of FreeRTOS isn't that great for me.

Everyone has access to RTX now. It's free too: http://www.keil.com/pr/article/1253.htm
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 10269
  • Country: gb
    • Having fun doing more, with less
Re: FreeRTOS performance penalty
« Reply #76 on: September 22, 2014, 02:42:41 pm »
Quote
A badly configured RTOS can kill your very expensive XYZ-machine by responding to the end switch or force sensor a few milliseconds late due to interrupt jitter. Certainly not irrelevant.
Very true, of course.

It also points to another serious weakness in dannyf's methods: it can only measure the mean times, whereas hard realtime systems require maximum times.
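To illustrate the mean-versus-maximum point, here is a minimal sketch that summarises a set of latency samples. The type and function names are hypothetical; the samples could come from any timer-based measurement of individual switches.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned min, max;
    double mean;
} latency_stats;

/* Summarise latency samples (e.g. timer ticks per switch). A hard real-time
 * budget has to be checked against max, not mean. */
static latency_stats summarise(const unsigned *samples, size_t n)
{
    assert(n > 0);
    latency_stats s = { samples[0], samples[0], 0.0 };
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (samples[i] < s.min) s.min = samples[i];
        if (samples[i] > s.max) s.max = samples[i];
        sum += samples[i];
    }
    s.mean = sum / (double)n;
    return s;
}
```

For the made-up samples {100, 100, 100, 100, 600} ticks, summarise() gives a mean of 200 but a maximum of 600, so a deadline checked only against the mean would be exceeded by the worst case by a factor of three.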

The same is, of course, true w.r.t. cache access times, but that's a completely different discussion.

BTW dannyf, what settings for priority inversion did you use when doing your measurements? They can significantly affect mean and max times, and hence jitter/latency.

 

Offline hans

  • Super Contributor
  • ***
  • Posts: 1042
  • Country: nl
Re: FreeRTOS performance penalty
« Reply #77 on: September 22, 2014, 02:45:09 pm »
Quote
To be more correct you need to subtract the low time of a pin toggle, as that's now included in the time measured.
To be completely correct you need to include the overhead of writing to a GPIO port as well, as some chips need to resolve pointers or do read-modify-write in software (hence the SET/CLR registers on some microcontrollers).

You may want to think it through.

Quote
At that point you may as well run the same test as Segger does.

Those "issues" you identified earlier apply equally to Segger's approach.

It takes a little bit of brain power to process, but those two approaches are really identical.

It seems like you haven't given any argument against my objections to your measurement method, so they still stand. You're just claiming you're right and smarter. I spot a trend that this is your usual way of arguing on this forum, and I find that highly disturbing in many ways.

You're still mixing up two things and doing nothing about it, even though it can easily be shown to be relevant. Segger's way of measuring deals with the same issues (because they are tied to hardware-related limitations) and therefore gives an accurate (or at least more accurate) view of performance.

It seems it's not that you are unable to perform these measurements; rather, once your measurements are equivalent to what is already out there, there would be no point in posting them.

Remember that if you don't measure things accurately, it's not engineering; it becomes art.
And when you realize all models are approximations of real-world phenomena, we're not into engineering but science. Science wants to know the complete picture with all preconditions.

I view an RTOS as a piece of software that enables you to do some things much more easily, albeit at the cost of a bit of CPU time and memory.
Some systems become so complex that you really can use the abstraction of tasks/threads/IPC to make the code readable, understandable and most importantly maintainable.
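The first correction quoted in this post (subtracting the low time of a pin toggle from the measured gap) is simple arithmetic. A minimal sketch, assuming the scope measures the gap between the last edge from one task and the first edge from the next, and that half the period of the naked toggle loop's square wave is a fair estimate of one GPIO update (it also contains a little loop overhead). The function name and figures are hypothetical.

```c
#include <assert.h>

/* Hypothetical correction: the measured gap contains one GPIO update from
 * the incoming task. Estimate that cost as half the period of the naked
 * toggle loop's square wave and subtract it. Returns nanoseconds. */
static double switch_time_ns(double measured_gap_ns, double naked_toggle_hz)
{
    assert(naked_toggle_hz > 0.0);
    double half_period_ns = 1e9 / (2.0 * naked_toggle_hz);
    assert(measured_gap_ns >= half_period_ns);
    return measured_gap_ns - half_period_ns;
}
```

For example, if the naked loop produced a 500 kHz square wave (1000 ns half-period) and the gap under the RTOS measured 12,000 ns, switch_time_ns(12000.0, 500000.0) estimates 11,000 ns for the switch itself.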
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 10269
  • Country: gb
    • Having fun doing more, with less
Re: FreeRTOS performance penalty
« Reply #78 on: September 22, 2014, 03:57:09 pm »
Quote
It seems like you (i.e. dannyf) haven't given any argument against my objections to your measurement method, so they still stand.

Not just your points and methods, of course. On this thread alone the same applies to myself and mikerj.

In the not-so-long run, if X ignores or refuses to answer other people's reasoned points, then it is detected by other people - and X's points/views are ignored. That's a shame, partly for the forum, but particularly for X.

Quote
You're just claiming you're right and smarter. I spot a trend that this is your usual way of arguing on this forum, and I find that highly disturbing in many ways.

Yes, precisely.

 

Offline Sal Ammoniac

  • Frequent Contributor
  • **
  • Posts: 920
  • Country: us
    • Embedded Tales Blog
Re: FreeRTOS performance penalty
« Reply #79 on: December 22, 2014, 07:15:39 pm »
dannyf: how are you measuring the parameter you call "Ticks on switch"?
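dannyf's method isn't stated in this part of the thread. One common way to get a tick count like this, offered purely as an assumption, is to read a free-running hardware timer before and after the switch and subtract; with a 16-bit counter, unsigned modular arithmetic handles a single wraparound. The helper names and the 8 MHz timer clock are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks between two reads of a free-running 16-bit counter. The cast back
 * to uint16_t makes the subtraction modulo 65536, so one wrap is handled;
 * intervals of 65536 ticks or more cannot be distinguished. */
static uint16_t ticks_elapsed(uint16_t start, uint16_t end)
{
    return (uint16_t)(end - start);
}

/* Convert a tick count to nanoseconds for a timer clocked at timer_hz. */
static uint64_t ticks_to_ns(uint16_t ticks, uint32_t timer_hz)
{
    assert(timer_hz > 0u);
    return (uint64_t)ticks * 1000000000u / timer_hz;
}
```

For example, a start reading of 65530 and an end reading of 10 gives ticks_elapsed() == 16, which at an 8 MHz timer clock is 2000 ns.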
 

