That only holds for a project where your alternative to an OS is a superloop doing just one task, namely flashing an LED.
Think of flashing the LED as a proxy for the MCU's processing power: at any point, the MCU is either doing something useful or switching context.
In this case, the "useful" thing is simulated by flashing the LED, i.e., flipping a pin. So any time the pin is not being flipped is time consumed by context switching.
That's all there is to it.
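To make that concrete, the no-OS baseline is nothing more than a tight toggle loop. A minimal sketch, assuming XC16 and PIC24-style register names (TRISA/LATA); your pin assignment will differ:

```c
/* Bare-metal baseline: no OS, the MCU does nothing but flip the pin. */
#include <xc.h>   /* XC16 device header; PIC24-style registers assumed */

int main(void)
{
    TRISAbits.TRISA0 = 0;        /* configure RA0 as an output */
    for (;;) {
        LATAbits.LATA0 ^= 1;     /* every cycle not spent here is overhead */
    }
}
```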
In the case of a real RTOS, the frequency of the pin flipping is actually identical, with or without the RTOS. Each task, while it is running, flips the pin for its time slice in a fashion identical to the bare loop that flips the pin without an OS.
So what you will see is the pin being flipped at 100 kHz for 1 ms, the flipping then stopping for a few µs while the MCU switches context, and then the flipping resuming at 100 kHz when the next task takes over.
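Under the RTOS, each task body is that same tight loop; only the scheduler's tick interrupts it. Here is a sketch of the setup, using FreeRTOS purely as a stand-in (the RTOS used in the original test isn't named here, and the register names are again PIC24-style assumptions):

```c
/* Two identical tasks at equal priority. Assuming configUSE_PREEMPTION
 * and configUSE_TIME_SLICING are enabled, the scheduler round-robins
 * them every tick: the pin toggles flat out for one full slice, pauses
 * for the context switch, then resumes in the other task. */
#include <xc.h>
#include "FreeRTOS.h"
#include "task.h"

static void pin_task(void *arg)
{
    (void)arg;
    for (;;)
        LATAbits.LATA0 ^= 1;     /* same loop as the bare-metal case */
}

int main(void)
{
    TRISAbits.TRISA0 = 0;        /* pin as output */
    xTaskCreate(pin_task, "t1", configMINIMAL_STACK_SIZE, NULL, 1, NULL);
    xTaskCreate(pin_task, "t2", configMINIMAL_STACK_SIZE, NULL, 1, NULL);
    vTaskStartScheduler();       /* never returns */
    for (;;) ;
}
```

With both tasks at equal priority, this produces exactly the burst / gap / burst pattern described above.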
As for "IMO you chose the worst possible test condition for the OS":
We discussed this earlier, and the exact opposite is true. The MCU spends most of its time running the tasks under this particular test.
You can think of it this way: in the above example, each task runs for 1 ms (i.e., fully utilizing its time slice), and then for another 10 µs the MCU switches context, during which no user code runs.
The opposite would be to run a very simple task (flipping the pin for 1 µs or so) and immediately switch out to the next task, which I think someone here suggested. The MCU then spends the next 10 µs doing a context switch, so you would observe a very low frequency: on that particular MCU (a PIC24F), 17 kHz, vs. 400 kHz running naked (no OS) or 394 kHz using the full time slice.
So while the time spent in each context switch is the same, if you increase the number of context switches, efficiency suffers. The test we are doing utilizes the full time slice, so it has the highest efficiency possible, i.e., the best-case scenario.
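To put numbers on that, the fraction of time spent in user code is just slice / (slice + switch). A quick back-of-the-envelope sketch using the ~10 µs switch cost quoted above:

```c
#include <stdio.h>

int main(void)
{
    const double t_switch = 10e-6;           /* context-switch cost, ~10 us (figure quoted above) */
    const double slices[] = { 1e-3, 1e-6 };  /* full 1 ms slice vs. the ~1 us "flip and yield" case */

    for (int i = 0; i < 2; i++) {
        double eff = slices[i] / (slices[i] + t_switch);
        printf("slice %7.0f us -> %4.1f%% of CPU time in user code\n",
               slices[i] * 1e6, eff * 100.0);
    }
    return 0;
}
```

The full 1 ms slice comes out around 99%, consistent with 394 kHz vs. 400 kHz; the ~1 µs slice collapses to single-digit efficiency, in the same ballpark as the 17 kHz observation.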