Quite a few posters got the numbers right, some closer than others.
So here are the numbers:
1) What's the incremental flash usage?
As you would expect, the precise number depends on compiler settings, kernel configuration and the chip used.
On PIC24F, it goes from 5KB to 8KB (instructions only). On CM3, it goes from 4.5KB to 6KB.
The total compiled flash footprint, with a reasonably sized stack, goes from 10KB to 30KB, however.
2) What's the incremental (static) RAM usage?
This varies greatly. Under the most basic heap management strategy (= no release of RAM from terminated tasks), it ranges from a few hundred bytes + heaps at the small end to a few thousand bytes, most of it in the heaps you configure for the tasks.
3) What's the frequency of the flip, as a percentage of that of the naked flip?
On an 8MHz PIC24F, the frequency of the naked flip is about 400kHz. Under FreeRTOS, it is about 394kHz -> high 98%. That number dips as you slow down the MCU, to about low 98%. So the switching takes about 80 - 120 instructions per switch. That's roughly in the ballpark of figures I have seen: 150 - 300 instructions, shorter for 32-bit chips and longer for 8-bit chips.
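For anyone who wants to redo the arithmetic, here is a rough sketch of how the two flip frequencies turn into a per-switch instruction count. It assumes one context switch per tick. The instruction rate and the tick rate are parameters because the numbers above do not state them, and the function names are made up for illustration.

```c
#include <assert.h>

/* Fraction of CPU time lost to the kernel, inferred from the two
 * measured pin-flip frequencies (both in the same unit). */
double overhead_fraction(double naked_hz, double rtos_hz)
{
    assert(naked_hz > 0.0 && rtos_hz >= 0.0 && rtos_hz <= naked_hz);
    return 1.0 - rtos_hz / naked_hz;
}

/* Instructions spent per context switch, ASSUMING one switch per tick.
 * instr_per_sec and tick_hz are assumptions supplied by the caller. */
double instr_per_switch(double naked_hz, double rtos_hz,
                        double instr_per_sec, double tick_hz)
{
    assert(instr_per_sec > 0.0 && tick_hz > 0.0);
    return overhead_fraction(naked_hz, rtos_hz) * instr_per_sec / tick_hz;
}
```

With 400kHz naked and 394kHz under the kernel, overhead_fraction() gives 1.5%. Assuming 8 million instructions per second and a 1kHz tick, instr_per_switch(400e3, 394e3, 8e6, 1000.0) comes out at about 120. At 4 million instructions per second the same call gives about 60, so the result is only as good as those two assumptions.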
Now, use that number with great caution:
1) the particular test minimizes the cost of task switching. Each task in the test utilizes its allotted slot to the fullest extent, so the switching cost is spread over the largest possible amount of useful work.
On the other extreme - as proposed by one of the posters earlier - you can run a couple instructions and then yield that task -> under this approach, the mcu spends its time mostly switching between tasks, the pin flip frequency is considerably lower, reflecting the significant cost of switching.
I would argue that reality is closer to my test, in that most of the time a task fully utilizes its time slot rather than running for just a couple of instructions.
2) the kernel is configured to minimize overhead, for example, by disabling stack overflow checks, etc. In a real life application, you are likely to have them turned on.
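To put rough numbers on the two extremes in caveat 1), here is a small sketch of the share of CPU time that goes to useful work when each run of task code is followed by one switch. It is a simplified model, not a measurement, and the function name is made up for illustration.

```c
#include <assert.h>

/* Share of CPU time doing useful work when every run of work_instr
 * instructions is followed by one context switch that costs
 * switch_instr instructions. */
double useful_share(double work_instr, double switch_instr)
{
    assert(work_instr > 0.0 && switch_instr >= 0.0);
    return work_instr / (work_instr + switch_instr);
}
```

Taking 120 instructions per switch (the top of the range above) and a hypothetical slot of 8000 instructions, useful_share(7880.0, 120.0) is 0.985. A task that runs 2 instructions and then yields gets useful_share(2.0, 120.0), which is under 2%, so nearly all of the time goes to switching.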
Overall, I think a (reasonable) minimum spec for an MCU running FreeRTOS would be something like 16KB flash + 8KB RAM, 4MIPS. At that point, your ability to add tasks and take advantage of an RTOS is very limited -> as there aren't many resources left.
A practical "minimum" might be 24 - 32KB flash and 10KB+ RAM, 8MIPS. At that point, the switching cost is not noticeable, and there is sufficient space, flash / RAM, to implement a few reasonable tasks.
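For reference, these are the FreeRTOSConfig.h settings that mostly decide where a build lands in those ranges. The macro names are the standard FreeRTOS ones, but every value below is an illustrative assumption for a part with about 8KB of RAM, not the configuration used in the test.

```c
/* FreeRTOSConfig.h excerpt: illustrative values only (assumptions). */
#define configUSE_PREEMPTION            1     /* time-sliced, preemptive scheduler */
#define configTICK_RATE_HZ              1000  /* assumed 1 ms tick */
#define configMAX_PRIORITIES            4     /* fewer levels = less RAM */
#define configMINIMAL_STACK_SIZE        128   /* in stack words, not bytes */
#define configTOTAL_HEAP_SIZE           5120  /* bytes; task stacks come out of this */
#define configCHECK_FOR_STACK_OVERFLOW  0     /* 0 = off (as in the test); 1 or 2 = on */
```

The "no release" heap strategy mentioned earlier corresponds to building with heap_1.c. Setting configCHECK_FOR_STACK_OVERFLOW to 1 or 2 also requires the application to provide vApplicationStackOverflowHook(), and adds some work to each switch.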
In conclusion, I am positively surprised by how little processing power FreeRTOS took away from the chip, and I find it quite helpful if you have a few low priority tasks with disparate or particularly long execution times (fft, lcd vs. buttons for example). With an RTOS, you don't have to break them up into pieces; it is automagically done for you.
However, it is tricky to use interrupts and to "share" peripherals in an RTOS environment. And it represents another layer that a programmer has to deal with.
Hope it helps.