Author Topic: Do i really need to use an RTOS? (Alternatives to finite state machines?) (Read 15671 times)

nctnico · « **Reply #75 on:** June 12, 2022, 08:12:51 pm »

Quote from: Simon on June 12, 2022, 03:26:34 pm

Quote from: tggzzz on June 12, 2022, 01:13:28 pm

Nothing should be done in any interrupt, except determine the event that occurred and mutate that into a message put into a queue for the scheduler to observe, and optionally kick the scheduler into life.

The task state has to be saved when a task switching occurs. What's in the task state is processor dependent and is invisible to C compilers. Typically it will include the PC, stack pointer, the condition codes, the register set including FP registers. Note that many of those can be automatically saved by hardware when the interrupt is recognised and processed.

There is only one way to execute mare than one thread of code, and that is that the second code comes in an interrupt. Unless the code in the interrupt handler does the swapping of the current stack for at least the stack of something that will then decide who's stack to reload you will just have one thread oaf execution with another interrupting it until it is done. Sounds like a mess but I guess you have to make the CPU appear to be more than one somehow.

It is not a mess; it is how every OS that supports pre-emptive multitasking works.

brucehoult · « **Reply #76 on:** June 12, 2022, 10:40:50 pm »

Quote from: tggzzz on June 12, 2022, 04:33:23 pm

The Cortex A series are definitely used as MCUs, e.g. inside Zync 7000 chips which also contain FPGA fabric. One of the more intriguing possibilities is that one core could run Linux and the other an RTOS, together with processing in the FPGA. Maybe that is done in the new low-end Tek scopes.

I haven't actually used Zynq, but unless they've done something special those Cortex-A9s aren't going to have all that tight real-time performance.

In the Microchip "PolarFire SoC" FPGAs there are five 64 bit RISC-V cores. Four of them are full strength "Linux capable" with TLB, cache, branch predictors, FPU. The 5th core is a much simpler RV64IMAC core with no branch prediction, no cache (it has ITIM and DTIM), no TLB, no FPU. It has much more predictable (but slower on average) performance, with tight real-time guarantees.

tggzzz · « **Reply #77 on:** June 12, 2022, 11:08:53 pm »

Quote from: brucehoult on June 12, 2022, 10:40:50 pm

Quote from: tggzzz on June 12, 2022, 04:33:23 pm
The Cortex A series are definitely used as MCUs, e.g. inside Zync 7000 chips which also contain FPGA fabric. One of the more intriguing possibilities is that one core could run Linux and the other an RTOS, together with processing in the FPGA. Maybe that is done in the new low-end Tek scopes.

I haven't actually used Zynq, but unless they've done something special those Cortex-A9s aren't going to have all that tight real-time performance.

In the Microchip "PolarFire SoC" FPGAs there are five 64 bit RISC-V cores. Four of them are full strength "Linux capable" with TLB, cache, branch predictors, FPU. The 5th core is a much simpler RV64IMAC core with no branch prediction, no cache (it has ITIM and DTIM), no TLB, no FPU. It has much more predictable (but slower on average) performance, with tight real-time guarantees.

I doubt it is necessary for there to be tight hard real time performance from the ArmA9s in the Zynq. That's the domain of the Artix 7 FPGA fabric surrounding the cores.

There are many ways to skin cats, and endless shades of grey. Whether decent softcores in FPGA fabric is a good use of silicon and design time is an interesting question. The partitioning between a "high level" ArmA9 and "low level" Artix fabric seems relatively straightforward.

Overall I must admit to being prejudiced against weedy softcores, but it wouldn't take much for me to reverse that opinion. Horses for courses.

NorthGuy · « **Reply #78 on:** June 13, 2022, 06:30:22 pm »

Quote from: JPortici on June 10, 2022, 03:52:18 pm

For example, i have to configure a BLE module via AT commands.
There is a sequence of actions to perform. Because doing it in the dumb way holds the CPU for a noticeable time i had to split everything in small steps, something like this
Code: [Select]
send_data(stateEnum_t set_previous_state, const uint8_t *str, uint8_t strLen); switch (state) { case CHECK_PRESENCE: return send_data(CHECK_PRESENCE,"AT",2); case SET_BAUD_RATE: return send_data(SET_BAUD_RATE,"AT+BAUD2",8); ....... case SEND_DATA: if (txFinished) { if (strPos >= strSize) { //Finished Sending Data return WAIT_FOR_ANSWER; } else { //Send More Data TXBUF = txData[strPos]; strPos++; return SEND_DATA; } } else { //Wait For uart TX to finish return SEND_DATA; } case WAIT_FOR_ANSWER: if (timeout) { //Timeout. Check content of RX buffer if (!strncmp(rxBuffer,"OK",2)) { switch(previous_state) { case CHECK_PRESENCE: //Module is present. Set baud rate return SET_BAUD_RATE; ....... } else { //Unexpected answer. Error handling ..... } else { //keep waiting for timeout return WAIT_FOR_ANSWER; }

Think in terms of data flows.

There's a data flow from your program to the UART or whatever else connects to BLE. So, you create a queue that is a buffer which holds the commands until they're transmitted. When you need to configure something (or otherwise communicate to BLE), you simply put the commands into this queue. They're sitting there waiting for their turn.

Meanwhile, a different process which is called from an interrupt, or otherwise, removes the data from the command queue and sends them to BLE. This process may or may not use DMA, may be simple or complex, as needed ...

Another process receives responses from the BLE and either interprets them immediately or creates another data flow to be consumed by a yet different process at a slower pace.

When you put this all together, the complexity won't increase as the number of commands grows. As you would say, the approach scales well, compared to the brute force approach to the state machine that you employed.

When you put commands into the buffer, there's a danger that there may not be enough space. Then you may need to block. Blocking would put you back into a situation where you would need an FSM, so you need to prevent buffer overflow and associated blocking somehow. Typically, it's easy to do - either by increasing the size of the buffer (as is suitable for command flow), or by using some sort of flow control (e.g. when you need to move big amounts of raw data).

Siwastaja · « **Reply #79 on:** June 13, 2022, 06:54:17 pm »

Quote from: NorthGuy on June 13, 2022, 06:30:22 pm

Think in terms of data flows.

There's a data flow from your program to the UART or whatever else connects to BLE. So, you create a queue that is a buffer which holds the commands until they're transmitted. When you need to configure something (or otherwise communicate to BLE), you simply put the commands into this queue. They're sitting there waiting for their turn.

Meanwhile, a different process which is called from an interrupt, or otherwise, removes the data from the command queue and sends them to BLE. This process may or may not use DMA, may be simple or complex, as needed ...

Another process receives responses from the BLE and either interprets them immediately or creates another data flow to be consumed by a yet different process at a slower pace.

When you put this all together, the complexity won't increase as the number of commands grows. As you would say, the approach scales well, compared to the brute force approach to the state machine that you employed.

When you put commands into the buffer, there's a danger that there may not be enough space. Then you may need to block. Blocking would put you back into a situation where you would need an FSM, so you need to prevent buffer overflow and associated blocking somehow. Typically, it's easy to do - either by increasing the size of the buffer (as is suitable for command flow), or by using some sort of flow control (e.g. when you need to move big amounts of raw data).

That's a good strategy, but what if you need to do decisions what to push, based on something you only know after pushing something else and getting a reply? What if the operations are more complex than just arbitrary byte stream writes - for example, involve toggling different IOs, or accessing more than one peripheral. After taking this into account, your queue becomes fancy and generic, and each element becomes a struct object with command fields, or maybe opcodes if you wish, and possibly pointers to processing/decision functions, and whatnot. At some point you are at crossroads whether this would have been easier with an OS allowing linear code flow, or just the switch-case "FSM" like the OP showed.

The benefit of the switch-case solution is, even if it's not the most elegant, it is most flexible as it allows any type of code execution and decision making logic. This also makes it prone to error, though.

Even with OS such as linux available, I have found myself writing switch-case FSMs and utilizing things like select(), and I definitely did not invent this paradigm; even though the alternative of spawning dozens of threads exists. But with all synchronization and resource management, it might not be any easier than the switch-case FSM.

nctnico · « **Reply #80 on:** June 13, 2022, 07:02:45 pm »

Quote from: Siwastaja on June 13, 2022, 06:54:17 pm

Quote from: NorthGuy on June 13, 2022, 06:30:22 pm
Think in terms of data flows.

There's a data flow from your program to the UART or whatever else connects to BLE. So, you create a queue that is a buffer which holds the commands until they're transmitted. When you need to configure something (or otherwise communicate to BLE), you simply put the commands into this queue. They're sitting there waiting for their turn.

Meanwhile, a different process which is called from an interrupt, or otherwise, removes the data from the command queue and sends them to BLE. This process may or may not use DMA, may be simple or complex, as needed ...

Another process receives responses from the BLE and either interprets them immediately or creates another data flow to be consumed by a yet different process at a slower pace.

When you put this all together, the complexity won't increase as the number of commands grows. As you would say, the approach scales well, compared to the brute force approach to the state machine that you employed.

When you put commands into the buffer, there's a danger that there may not be enough space. Then you may need to block. Blocking would put you back into a situation where you would need an FSM, so you need to prevent buffer overflow and associated blocking somehow. Typically, it's easy to do - either by increasing the size of the buffer (as is suitable for command flow), or by using some sort of flow control (e.g. when you need to move big amounts of raw data).

That's a good strategy, but what if you need to do decisions what to push, based on something you only know after pushing something else and getting a reply? What if the operations are more complex than just arbitrary byte stream writes - for example, involve toggling different IOs, or accessing more than one peripheral. After taking this into account, your queue becomes fancy and generic, and each element becomes a struct object with command fields, or maybe opcodes if you wish, and possibly pointers to processing/decision functions, and whatnot. At some point you are at crossroads whether this would have been easier with an OS allowing linear code flow, or just the switch-case "FSM" like the OP showed.

The benefit of the switch-case solution is, even if it's not the most elegant, it is most flexible as it allows any type of code execution and decision making logic. This also makes it prone to error, though.

I agree. During an initialisation phase anything can go wrong and you might even want to switch to different kinds of initialisation sequences depending on version / model detected. A high level queue only makes sense if you have a layer in between which deals with the initialisation and monitoring of the device. The latter is a better seperation between device handling and higher layer commands.

NorthGuy · « **Reply #81 on:** June 13, 2022, 08:43:11 pm »

Quote from: Siwastaja on June 13, 2022, 06:54:17 pm

That's a good strategy, but what if you need to do decisions what to push, based on something you only know after pushing something else and getting a reply?

It depends. There are many ways to program the same thing and everything depends on details. For example, the code that parses the response will make such decision, and may push the next commands as well.

Quote from: Siwastaja on June 13, 2022, 06:54:17 pm

What if the operations are more complex than just arbitrary byte stream writes - for example, involve toggling different IOs, or accessing more than one peripheral. After taking this into account, your queue becomes fancy and generic, and each element becomes a struct object with command fields, or maybe opcodes if you wish, and possibly pointers to processing/decision functions, and whatnot. At some point you are at crossroads whether this would have been easier with an OS allowing linear code flow, or just the switch-case "FSM" like the OP showed.

Sure, for queue elements, you can use advanced data structures, or you can fill in your buffer with textual commands ready to be passed to BLE, or anything in between. Whatever you do, you shouldn't pursue the solutions which make things more complex. You rather try to make things simpler.

TomS_ · « **Reply #82 on:** June 15, 2022, 09:37:21 am »

Quote from: Simon on June 12, 2022, 03:26:34 pm

Unless the code in the interrupt handler does the swapping of the current stack for at least the stack of something that will then decide who's stack to reload

That is what FreeRTOS does. You end an ISR with a macro called portEND_SWITCHING_ISR and supply a parameter (flag) that indicates whether any ISR safe FreeRTOS calls have resulted in a higher priority task being made ready to run.

At this point, the task contexts will be swapped such that when the ISR returns, you resume execution of a higher priority task.

(I use FreeRTOS as an example because its the only one I have any direct experience with)

JPortici · « **Reply #83 on:** June 15, 2022, 11:29:36 am »

Quote from: Siwastaja on June 13, 2022, 06:54:17 pm

Quote from: NorthGuy on June 13, 2022, 06:30:22 pm
Think in terms of data flows.

There's a data flow from your program to the UART or whatever else connects to BLE. So, you create a queue that is a buffer which holds the commands until they're transmitted. When you need to configure something (or otherwise communicate to BLE), you simply put the commands into this queue. They're sitting there waiting for their turn.

Meanwhile, a different process which is called from an interrupt, or otherwise, removes the data from the command queue and sends them to BLE. This process may or may not use DMA, may be simple or complex, as needed ...

Another process receives responses from the BLE and either interprets them immediately or creates another data flow to be consumed by a yet different process at a slower pace.

When you put this all together, the complexity won't increase as the number of commands grows. As you would say, the approach scales well, compared to the brute force approach to the state machine that you employed.

When you put commands into the buffer, there's a danger that there may not be enough space. Then you may need to block. Blocking would put you back into a situation where you would need an FSM, so you need to prevent buffer overflow and associated blocking somehow. Typically, it's easy to do - either by increasing the size of the buffer (as is suitable for command flow), or by using some sort of flow control (e.g. when you need to move big amounts of raw data).

That's a good strategy, but what if you need to do decisions what to push, based on something you only know after pushing something else and getting a reply? What if the operations are more complex than just arbitrary byte stream writes - for example, involve toggling different IOs, or accessing more than one peripheral. After taking this into account, your queue becomes fancy and generic, and each element becomes a struct object with command fields, or maybe opcodes if you wish, and possibly pointers to processing/decision functions, and whatnot. At some point you are at crossroads whether this would have been easier with an OS allowing linear code flow, or just the switch-case "FSM" like the OP showed.

Thanks. That's exactly why i started the thread (target scenario is more complicated than the "BLE stub of an example" i showed)

tggzzz · « **Reply #84 on:** June 15, 2022, 11:37:06 am »

Quote from: JPortici on June 15, 2022, 11:29:36 am

Quote from: Siwastaja on June 13, 2022, 06:54:17 pm
Quote from: NorthGuy on June 13, 2022, 06:30:22 pm
Think in terms of data flows.

There's a data flow from your program to the UART or whatever else connects to BLE. So, you create a queue that is a buffer which holds the commands until they're transmitted. When you need to configure something (or otherwise communicate to BLE), you simply put the commands into this queue. They're sitting there waiting for their turn.

Meanwhile, a different process which is called from an interrupt, or otherwise, removes the data from the command queue and sends them to BLE. This process may or may not use DMA, may be simple or complex, as needed ...

Another process receives responses from the BLE and either interprets them immediately or creates another data flow to be consumed by a yet different process at a slower pace.

When you put this all together, the complexity won't increase as the number of commands grows. As you would say, the approach scales well, compared to the brute force approach to the state machine that you employed.

When you put commands into the buffer, there's a danger that there may not be enough space. Then you may need to block. Blocking would put you back into a situation where you would need an FSM, so you need to prevent buffer overflow and associated blocking somehow. Typically, it's easy to do - either by increasing the size of the buffer (as is suitable for command flow), or by using some sort of flow control (e.g. when you need to move big amounts of raw data).

That's a good strategy, but what if you need to do decisions what to push, based on something you only know after pushing something else and getting a reply? What if the operations are more complex than just arbitrary byte stream writes - for example, involve toggling different IOs, or accessing more than one peripheral. After taking this into account, your queue becomes fancy and generic, and each element becomes a struct object with command fields, or maybe opcodes if you wish, and possibly pointers to processing/decision functions, and whatnot. At some point you are at crossroads whether this would have been easier with an OS allowing linear code flow, or just the switch-case "FSM" like the OP showed.

Thanks. That's exactly why i started the thread (target scenario is more complicated than the "BLE stub of an example" i showed)

You will probably find you have swapped one set of problems/advantages for a different set of problems/advantages. There is no silver-bullet: what you gain on the swings you lose on the roundabout.

In particular, consider how you will debug (possibly on installed systems) deadlock, livelock, performance, and priority inheritance issues.

NorthGuy · « **Reply #85 on:** June 15, 2022, 03:42:22 pm »

Quote from: nctnico on June 13, 2022, 07:02:45 pm

During an initialisation phase anything can go wrong and you might even want to switch to different kinds of initialisation sequences depending on version / model detected.

BTW, for "once at the beginning" initialization, I usually do things in a single thread sequentially, then when everything is initialized, I turn on interrupts and the multitasking starts. This is because the initialization often requires things which you don't need during normal execution and vice versa. This not only simplifies the initialization, but, more importantly, removes complexity from the multitasking part by freeing it from all the clutter related to the initialization.

Siwastaja · « **Reply #86 on:** June 15, 2022, 05:13:19 pm »

Quote from: NorthGuy on June 15, 2022, 03:42:22 pm

Quote from: nctnico on June 13, 2022, 07:02:45 pm
During an initialisation phase anything can go wrong and you might even want to switch to different kinds of initialisation sequences depending on version / model detected.

BTW, for "once at the beginning" initialization, I usually do things in a single thread sequentially, then when everything is initialized, I turn on interrupts and the multitasking starts. This is because the initialization often requires things which you don't need during normal execution and vice versa. This not only simplifies the initialization, but, more importantly, removes complexity from the multitasking part by freeing it from all the clutter related to the initialization.

Yes, basically by making everything else truly interrupt-based - and this means, no "flag in ISR, check in main loop" BS - leaves you with one "main loop thread" to run long/complex linear code, without still needing an OS. One is of course much less than nearly unlimited, but it's not a bad choice, I have done it a few times and it doesn't seem to break apart with increasing complexity.

One example would be a complicated mobile robot platform which did all timing-critical/sensor readout/motor control/DCDC control/FSM/control loop type things in interrupt handlers (which is a breeze in Cortex-M because of interrupt priority system, pre-empting and software interrupt priority demoting). This left a large linear loop for all that stupidly complex init stuff, and once it's done, image processing & point cloud generation of ten 3D Time-of-Flight cameras. So the init took like maybe 0.5 seconds, after which the cameras would always use whatever CPU time is left from everything else (so maybe 95%). But in retrospective, it would have been quite trivial to make the camera part interrupt based, too, for an empty while(1) "main loop", which could be then used for something else again. But making all the init a truly parallel FSM implementation - no thanks. I have done it in a later project and while it works just great (and allows arbitrary init / denit in parallel without affecting anything else), it's not as easy to write and maintain. Here, an OS would possibly help with maintainability of that init code, but the question is, is it worth bringing to a project just for this reason?

agehall · « **Reply #87 on:** June 15, 2022, 06:18:23 pm »

A bit of an OT comment here:
As is common in every RTOS thread I read here, there are a number of users trying to explain RTOS to people that jump to the conclusion that an RTOS is a magic bullet to make everything real time. It is not. You can still develop bad code that performs nowhere in real time on an RTOS.

However, in the right hands, an RTOS can be a very powerful tool to help a decent programmer develop complex applications that execute with predictable performance (which is really what RT implies) without using complicated patterns. Essentially, the RTOS is a framework for this and imho should be seen as that. Everything else they may (or may not) provide is just convenience.

I do enjoy reading the posts from the people that have obvious experience and understanding though - it is amazing how much knowledge that is floating around here.

Siwastaja · « **Reply #88 on:** June 15, 2022, 06:23:09 pm »

Quote from: agehall on June 15, 2022, 06:18:23 pm

A bit of an OT comment here:
As is common in every RTOS thread I read here, there are a number of users trying to explain RTOS to people that jump to the conclusion that an RTOS is a magic bullet to make everything real time. It is not. You can still develop bad code that performs nowhere in real time on an RTOS.

However, in the right hands, an RTOS can be a very powerful tool to help a decent programmer develop complex applications that execute with predictable performance (which is really what RT implies) without using complicated patterns. Essentially, the RTOS is a framework for this and imho should be seen as that. Everything else they may (or may not) provide is just convenience.

It's all about perspective! I thought you just wrote total BS and was about to write a "someone's wrong on the internet!" style reply, but then again, from a certain perspective, what you wrote makes perfect sense.

RT is a qualifier for the OS part! I can't stress this enough.

Like a red hammer. You need something red. A bucket of red paint is most flexible and efficient way to get most of the redness. This is bare metal. But you might be used to hammers, and frankly, hammers have many advantages. But a normal hammer can't do because you need red. So red hammer it is!

In level of realtime-ness:

worse <--> better
OS - RTOS - bare metal

This is where inexperienced people get confused. They think RTOS is key for real time things. It's true for Windows software developer, and complete opposite to someone who has written code for PIC.

If what you are used to is linux or BSD, RTOS will be total magic of being able to do timing critical things!

If you are used to write bare metal microcontroller code and are fluent writing interrupt based designs, using an RTOS will only make it perform worse, timing-wise, adding timing bloat and delays, and reducing the control you have.

The key is, RTOS is good enough for most real-time MCU applications, and it is an OS. The latter is the reason it is used. The RT part is just the "not THAT bad" qualifier.

Hope this helps people who struggle with the general idea.

tggzzz · « **Reply #89 on:** June 15, 2022, 06:31:31 pm »

Quote from: agehall on June 15, 2022, 06:18:23 pm

A bit of an OT comment here:
As is common in every RTOS thread I read here, there are a number of users trying to explain RTOS to people that jump to the conclusion that an RTOS is a magic bullet to make everything real time. It is not. You can still develop bad code that performs nowhere in real time on an RTOS.

However, in the right hands, an RTOS can be a very powerful tool to help a decent programmer develop complex applications that execute with predictable performance (which is really what RT implies) without using complicated patterns. Essentially, the RTOS is a framework for this and imho should be seen as that. Everything else they may (or may not) provide is just convenience.

I do enjoy reading the posts from the people that have obvious experience and understanding though - it is amazing how much knowledge that is floating around here.

Agreed.

The OP has stated "This state machine has become too complex for me to mantain and i don't see how changing the approach in implementing this state machine is going to solve the problem. It looks to me that [using an RTOS] may postpone the moment in which is going to be too complex to mantain".

Now that might be correct. But given the OP's description, I suspect it will turn out that simplification (if any) by using an RTOS will be largely due to completely re-factoring the solution, and that a complete refactoring of the FSM could achieve similar benefits.

Having said that, completely reimplementing any solution is a big risk factor, especially using unfamiliar technology. It can work, but it often fails.

SiliconWizard · « **Reply #90 on:** June 15, 2022, 06:44:20 pm »

Yep to the above. And as is common on forums, while many of the posts were interesting, each with a kinda different angle, most were actually not answering the original question. Which probably explains agehall's reply.

But people should use forums for what they are. A space for discussion. Not for getting definitive answers on personal questions. We're not here to do other people's work. We're just here to share our knowledge, experience or opinions. Now up to the OP to extract what will (or won't) help them decide.

But sure thing, the "*this* is screaming for *that* solution" approach is usually not a great start, but I guess by now the OP is at least convinced that it's not necessarily screaming for anything much.

westfw · « **Reply #91 on:** June 15, 2022, 09:18:51 pm »

The other thing that I don't think has been mentioned yet, is that while a multitasking environment is very convenient in a lot of cases, it DOES NOT follow that that means you need a "Real Time" Operating System, or even preemption, much less one of the complex commercial operating systems like FreeRTOS.

You can go a long way with a much simpler "Run to Completion" cooperative scheduler - essentially the OP's "State Machines", but with the ability to suspend each SM whenever you want, instead of having to contort the code to having "single exit points."
Yes, you have to be a bit careful about having any one bit of code run for "too long", but you can mostly do away with locks, which is a pretty good trade-off. (I might still be traumatized by the large percentage of SPRs that had "did something wrong with locking" as their root cause, back in the day.)

tggzzz · « **Reply #92 on:** June 15, 2022, 09:27:31 pm »

Quote from: westfw on June 15, 2022, 09:18:51 pm

The other thing that I don't think has been mentioned yet, is that while a multitasking environment is very convenient in a lot of cases, it DOES NOT follow that that means you need a "Real Time" Operating System, or even preemption, much less one of the complex commercial operating systems like FreeRTOS.

You can go a long way with a much simpler "Run to Completion" cooperative scheduler - essentially the OP's "State Machines", but with the ability to suspend each SM whenever you want, instead of having to contort the code to having "single exit points."
Yes, you have to be a bit careful about having any one bit of code run for "too long", but you can mostly do away with locks, which is a pretty good trade-off. (I might still be traumatized by the large percentage of SPRs that had "did something wrong with locking" as their root cause, back in the day.)

I completely agree. IMHO such schedulers have many benefits including being simple and encouraging simplicity in the application. That was the kind of thing relevant to my earlier examples involving message queues and yield() functions in various forms.

The other point to note is that softies too often think that threads are the way to store the state of computation that can block. That way leads to servers with tens of thousands of threads, which is grossly inefficient in many ways. Usually it is much better to store the state in an object, and have a worker thread pick up an event and process it in the context of that object. If you have more cores, you can have more worker threads operating in parallel.

brucehoult · « **Reply #93 on:** June 16, 2022, 01:43:19 am »

Quote from: tggzzz on June 15, 2022, 09:27:31 pm

The other point to note is that softies too often think that threads are the way to store the state of computation that can block. That way leads to servers with tens of thousands of threads, which is grossly inefficient in many ways. Usually it is much better to store the state in an object, and have a worker thread pick up an event and process it in the context of that object. If you have more cores, you can have more worker threads operating in parallel.

Objects with states are much much less convenient to program that threads.

There is more than one way to implement threads. You don't *have* to do it in a way that each thread is allocated a few MB of stack (or even a few KB) whether it needs it or not.

The Erlang language and runtime is specifically designed for systems with huge numbers of active threads, spread over many CPUs, and has been used commercially for several decades.

I worked on a system (also in telephone exchange software) that used Scheme continuations to implement one thread per in-progress phone call (from picking up the receiver and instructing the hardware to give them dial-tone, right through to eventually hanging up the call). In some places it was deployed (national operators in Belgium, Netherlands, Poland, and Indonesia, at least) peak call initiation rates were a few thousand per second with average call lengths 2 minutes, so up to several hundred thousand active threads at a time. Running on SPARC machines in the early 2000s (just looking at transitioning some production use to x86). The main code was all in compiled Chicken Scheme (some of it machine-generated from other things). A Scheme thread could call out to C code, and the C code could make callbacks to the Scheme, but you couldn't switch threads while in C code (or a callback from it).

SiliconWizard · « **Reply #94 on:** June 16, 2022, 02:03:00 am »

Yep. If you handle everything with states, some things will be much harder to implement. I gave the example of a file system library for instance. I implemented FAT/exFAT for some project: I was going to use FatFs, but as I explained, it's all made of "blocking" calls so it's a no-no for any kind of FSM-like cooperative multitasking. (Some suggested workarounds such as periodically calling the main FSM from within FatFs, which I also consider an ugly hack and a general no-no: I don't want "tasks" to sort of "preempt" the scheduler. Horrible. "Tasks" should "yield", but the scheduling should still be handled by the "scheduler", and nowhere else.) So I implemented my own library, with non-blocking calls as much as possible and thus a large number of internal states. (I wanted a library usable in situations for which blocking calls would be unacceptable.) It was absolutely NOT fun to implement.

And preemptive multitasking in particular makes it *a lot* easier for the developer. Sure you now have synchronization issues, but if you have full control over the system you design, you don't necessarily need to use anything that is potentially hard to deal with, such as resource sharing or memory sharing (at least directly.) You can use message passing only, both for synchronization and communication between threads and the end architecture is rather straightforward to develop for. This approach doesn't necessarily work for all situations, but it does for a surprising number of them.

I have developed a lot of projects using ad-hoc FSM/scheduling. But the above approach is interesting and I will certainly consider it more often in the future.

NorthGuy · « **Reply #95 on:** June 16, 2022, 03:38:41 am »

Quote from: SiliconWizard on June 16, 2022, 02:03:00 am

I implemented FAT/exFAT for some project

<snip>

And preemptive multitasking in particular makes it *a lot* easier for the developer.

Look at Microsoft Windows. You already have pre-emptive multitasking, and you already have FS implemented. Yet, Windows API offers overlapped IO, and lots of people use it.

tggzzz · « **Reply #96 on:** June 16, 2022, 07:45:10 am »

Quote from: brucehoult on June 16, 2022, 01:43:19 am

Quote from: tggzzz on June 15, 2022, 09:27:31 pm
The other point to note is that softies too often think that threads are the way to store the state of computation that can block. That way leads to servers with tens of thousands of threads, which is grossly inefficient in many ways. Usually it is much better to store the state in an object, and have a worker thread pick up an event and process it in the context of that object. If you have more cores, you can have more worker threads operating in parallel.

Objects with states are much much less convenient to program that threads.

There is more than one way to implement threads. You don't *have* to do it in a way that each thread is allocated a few MB of stack (or even a few KB) whether it needs it or not.

The Erlang language and runtime is specifically designed for systems with huge numbers of active threads, spread over many CPUs, and has been used commercially for several decades.

Er, no.

You appear to be confusing Erlang's processes (of which there are vast numbers) with operating system threads/processes (some of which would be "worker threads" for the Erlang processes). For good performance there should be around one worker thread for each core, and that thread should be doing useful computation 100% of the time, with each Erlang process timeshared within one worker thread.

Quote

I worked on a system (also in telephone exchange software) that used Scheme continuations to implement one thread per in-progress phone call (from picking up the receiver and instructing the hardware to give them dial-tone, right through to eventually hanging up the call). In some places it was deployed (national operators in Belgium, Netherlands, Poland, and Indonesia, at least) peak call initiation rates were a few thousand per second with average call lengths 2 minutes, so up to several hundred thousand active threads at a time. Running on SPARC machines in the early 2000s (just looking at transitioning some production use to x86). The main code was all in compiled Chicken Scheme (some of it machine-generated from other things). A Scheme thread could call out to C code, and the C code could make callbacks to the Scheme, but you couldn't switch threads while in C code (or a callback from it).

I too have worked on soft real-time telecom billing systems which killed calls when the credit ran out. People who had written previous systems in C were amazed by its performance.

Each call had it state in one object, which also contained a queue for that call's incoming events. When a worker thread had finished processing one call's event, it chose another call's object and processed the event in that object's queue. One worker thread per core, with a couple of cores dedicated to GC and general OS operations.

brucehoult · « **Reply #97 on:** June 16, 2022, 08:27:50 am »

Quote from: tggzzz on June 16, 2022, 07:45:10 am

Quote from: brucehoult on June 16, 2022, 01:43:19 am
Quote from: tggzzz on June 15, 2022, 09:27:31 pm
The other point to note is that softies too often think that threads are the way to store the state of computation that can block. That way leads to servers with tens of thousands of threads, which is grossly inefficient in many ways. Usually it is much better to store the state in an object, and have a worker thread pick up an event and process it in the context of that object. If you have more cores, you can have more worker threads operating in parallel.

Objects with states are much much less convenient to program that threads.

There is more than one way to implement threads. You don't *have* to do it in a way that each thread is allocated a few MB of stack (or even a few KB) whether it needs it or not.

The Erlang language and runtime is specifically designed for systems with huge numbers of active threads, spread over many CPUs, and has been used commercially for several decades.

Er, no.

You appear to be confusing Erlang's processes (of which there are vast numbers) with operating system threads/processes (some of which would be "worker threads" for the Erlang processes). For good performance there should be around one worker thread for each core, and that thread should be doing useful computation 100% of the time, with each Erlang process timeshared within one worker thread.

I'm not confusing anything.

I'm using the standard terminology of "thread" being an notionally independent locus of control flow to describe what Erlang calls a "process" in order not to confuse people unfamiliar with Erlang.

Erlang's "thread" and "process" terminology is the opposite of what people are normally used to with, say, Linux processes each with a number of Green threads within them.

SiliconWizard · « **Reply #98 on:** June 16, 2022, 07:21:28 pm »

While I don't know Erlang very well, I think it's closer to what Go also does with goroutines.

JPortici · « **Reply #99 on:** July 06, 2022, 05:05:07 pm »

Several days later, and a week into my first real freeRTOS project, i have to say this is really fun.
I have already ran into crazy bugs (like things stop working for no apparent reason, move the statement and it works again, find the reason)
Currently rewriting everything to run in tasks, each block in a separate one and see if things can be aggregated (or if there will be no need, there is plenty of memory and actually many many things are wake up -> do really short thing really fast -> sleep for lots of time, 1ms tick rate seems to be almost too fast)
And i've started adopting function pointers and extensive use of contexts for things that are repeated and would elsewhere called "classes" (for example, currently adapting a Lin stack that run in multiple parallel instances). I was never really keen on function pointers, mostly because they intimidated me.

It's really a breath of fresh air. A lot of planning and writing and replanning and rewriting but i LOVE treating so many things as separate programs. I have no doubt that adding changes will be easier. I have also no doubt that this new experience will finally change the way i approach FSMs in general


EEVblog Main Site	EEVblog on Youtube	EEVblog on Twitter	EEVblog on Facebook	EEVblog on Odysee

Author Topic: Do i really need to use an RTOS? (Alternatives to finite state machines?) (Read 15671 times)

Share me