Author Topic: Reducing RTOS complexity on micro, "never global vars"  (Read 8261 times)


Offline jnzTopic starter

  • Frequent Contributor
  • **
  • Posts: 593
Reducing RTOS complexity on micro, "never global vars"
« on: July 25, 2016, 06:20:14 pm »
I am just wrapping up my first RTOS project. Before starting, the most common recommendation was NEVER USE GLOBALS for information between threads. Here is what happened...

  • Thread 1 is a driver / controller for a SPI device
  • Thread 1 has a pretty big struct that contains modes, settings, read data from the device, timeout variables, etc
  • Thread 2 is a logical block, needs to know the states from different threads and make determinations on behavior
  • Because it's "poor form" to use globals, I used messages/mailboxes to send the data over, so I have a Thread1 copy and a Thread2 copy of the data; this creates its own issues with sync and timeouts, among other things
  • Serious complexity came with Threads 3, 4, 5, 6, etc

The issues I'm noticing:
  • I either have to make two complete copies of the original struct - or I have to have a trimmed down version for Thread2, which holds the shadow copy. Either I'm wasting memory or I have two different but similar structs. Wasteful or annoying.
  • If Thread2 wants to change a setting, I can't just set it in my shadow struct; I need a routine that sends a "command" mail request which is checked and processed inside Thread1 the next time it runs - fine - but that's a lot of code for things that would otherwise be fast toggles.
  • I like the idea of the shadow copy fine enough - but it seems that when you have three threads, and now one original and two shadow copies, things are getting out of hand. Would it have been the same issue to make a global shadow copy?
  • The main issue with globals seems to be reads and writes - How in Thread1 can I be sure that I haven't interrupted Thread2 during a write? But isn't this exactly what Mutexes are for!?

Basically, I think I adhered so strictly to "no globals!" that I made the program massively more complex, which was a mistake. It's serious work now to take a struct, pass a pointer to it in a mailbox, copy the data to the stack, cast it to a compatible struct, copy the data to its new shadow struct, then keep all of that in sync. A lot can go wrong with pointer passing and casting.

Am I wrong to think that despite "no globals" mantra - that I could have had a "read-only global shadow data" for some of these things that was still handled on a mutex/semaphore for reads/writes?

For example... I have a constant stream of data coming in over UART. The UART handler could have read that, copied it to a read-only global struct/array/buffer, and then that could have been managed with mutexes in the other threads, right? Because right now I have three copies of that data constantly being copied and transferred from UART to the other threads - making synchronization pretty complex.
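Something like this rough sketch is what I have in mind (not my actual code; the names and the CMSIS-RTOS calls are just for illustration):
Code: [Select]
#include "cmsis_os.h"
#include <string.h>

#define UART_SNAPSHOT_SIZE 64

/* One master copy, written only by the UART thread */
static uint8_t   uart_snapshot[UART_SNAPSHOT_SIZE];
static osMutexId uart_lock;
osMutexDef(uart_lock);

void uart_shared_init(void)
{
    uart_lock = osMutexCreate(osMutex(uart_lock));
}

/* UART thread: publish a freshly assembled frame */
void uart_publish(const uint8_t *frame, uint32_t len)
{
    osMutexWait(uart_lock, osWaitForever);
    memcpy(uart_snapshot, frame, len);
    osMutexRelease(uart_lock);
}

/* Any other thread: copy out a consistent snapshot to use at leisure */
void uart_read_snapshot(uint8_t *dst)
{
    osMutexWait(uart_lock, osWaitForever);
    memcpy(dst, uart_snapshot, UART_SNAPSHOT_SIZE);
    osMutexRelease(uart_lock);
}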

Anyone?
« Last Edit: July 25, 2016, 06:22:17 pm by jnz »
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9886
  • Country: us
Re: Reducing RTOS complexity on micro, "never global vars"
« Reply #1 on: July 25, 2016, 06:36:04 pm »
Maybe you have a global class that owns the data and provides functions for various threads to manipulate the data.  Mutual exclusion will apply, I suspect.
The idea would be to hide one copy of the data - maintaining duplicate copies, or even just shadow copies, is a scheme that is bound to fail.

In the case of a stream going to several threads, have the input handler put the data in the proper queue.  The threads can just slurp the data from their end of the queue.

There's really no such thing as 'never'.  What it really means is 'never' until you just have to.  But there shouldn't really be a need to use globals if you can figure out a class that will hide the data and provide restricted access to it.
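In plain C that might look something like the sketch below (names invented): one file owns the single copy, and everything else goes through accessor functions that take the lock.

Code: [Select]
/* spi_device.c - the only file that can see the data */
#include "cmsis_os.h"
#include <stdint.h>

typedef struct {
    uint32_t mode;
    uint32_t setting;
    uint32_t last_reading;
} spi_device_state_t;

static spi_device_state_t state;   /* the single copy, file scope only */
static osMutexId          state_lock;
osMutexDef(state_lock);

void spi_device_init(void)
{
    state_lock = osMutexCreate(osMutex(state_lock));
}

/* Accessors are the only way in or out, so the locking lives in one place */
uint32_t spi_device_get_reading(void)
{
    osMutexWait(state_lock, osWaitForever);
    uint32_t value = state.last_reading;
    osMutexRelease(state_lock);
    return value;
}

void spi_device_set_mode(uint32_t mode)
{
    osMutexWait(state_lock, osWaitForever);
    state.mode = mode;
    osMutexRelease(state_lock);
}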

 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26755
  • Country: nl
    • NCT Developments
Re: Reducing RTOS complexity on micro, "never global vars"
« Reply #2 on: July 25, 2016, 07:10:12 pm »
This is exactly why you should avoid having parallel threads which together do one job. It is always going to end in a mess with potential deadlocks. Better rewrite the threads into a single thread which uses one or more state machines.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 
The following users thanked this post: diyaudio, aandrew

Offline Jeroen3

  • Super Contributor
  • ***
  • Posts: 4067
  • Country: nl
  • Embedded Engineer
    • jeroen3.nl
Re: Reducing RTOS complexity on micro, "never global vars"
« Reply #3 on: July 25, 2016, 07:29:58 pm »
Things you share inter-thread are technically global. Unless you're passing pointers to local objects with finite lifetimes. But you'd need to be even more cautious with that.

The two things you need to look out for are:
- when both threads write to the shared data object.
- when one thread's write is interrupted and the second thread reads it. You could theoretically read a half-updated word.
So writing/reading shared data must always be exclusive!

Either:
- guard the operation with kernel locks. This is dirty, adds jitter. But you can't hang it.
- guard the operation with a mutex/semaphore. It's possible to hang it if you don't yield properly.
Or, and this is tricky since it's platform specific:
- use exclusive read/write.

One more thing to be aware of: if the compiler optimizes the variable, a sequence of reads might be consolidated into a smaller number of reads - maybe not what you intended to happen. Be more explicit to the compiler: use volatile to ensure the actual value is fetched for each statement the variable is used in.
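For example (minimal sketch of a flag shared between an ISR and a thread; note volatile only stops the compiler caching the reads, it is not a lock):
Code: [Select]
#include <stdint.h>
#include <stdbool.h>

/* Written by the UART ISR, polled by a thread. Without volatile the compiler
   may hoist the load out of the loop and read the flag only once. */
static volatile bool     rx_frame_ready = false;
static volatile uint32_t rx_frame_len;

void uart_rx_complete_isr(uint32_t len)
{
    rx_frame_len   = len;
    rx_frame_ready = true;
}

uint32_t wait_for_frame(void)
{
    while (!rx_frame_ready) {
        /* every iteration re-reads rx_frame_ready from memory */
    }
    rx_frame_ready = false;
    return rx_frame_len;
}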
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4196
  • Country: us
Re: Reducing RTOS complexity on micro, "never global vars"
« Reply #4 on: July 25, 2016, 07:37:48 pm »
My general observation is that cooperative ("run-to-completion") based OSes have most of their bugs due to either CPU hogging or unexpected task switches, while preemptive OSes have most of their bugs due to locking issues.  i.e. resource contention is a PITA, no matter how it's managed.
 

Offline jnzTopic starter

  • Frequent Contributor
  • **
  • Posts: 593
Re: Reducing RTOS complexity on micro, "never global vars"
« Reply #5 on: July 25, 2016, 08:47:12 pm »
Pretty good comments everyone...


In the case of a stream going to several threads, have the input handler put the data in the proper queue.  The threads can just slurp the data from their end of the queue.

Which won't work when you have a ton of incoming data and some threads run only once for every 40 times the data handler does. Right now I have a request-and-respond system. Thread2 runs every 100ms; at the end of its run it sends a request for new data, while Thread1, which supplies that data, needs to keep itself refreshed every 5ms. If I just poured all the data in, every 100ms Thread2 would have a ton of duplicate data to sort through, and I'd need a worst-case buffer size to deal with it. In the request model I know only one set of data is coming.


There's really no such thing as 'never'.  What it really means is 'never' until you just have to. 

:D

This is exactly why you should avoid having parallel threads which together do one job. It is always going to end in a mess with potential deadlocks. Better rewrite the threads into a single thread which uses one or more state machines.

That wouldn't be better in this case. I have data coming in over UART in a thread that is designed to be portable between this and other applications. I have SPI devices that are unique to this application, and they are self-contained state machines - but the data may be needed somewhere else. An RTOS means you'll need to transfer data between threads; it's not really avoidable.


Things you share inter-thread are technically global. Unless you're passing pointers to local objects with finite lifetimes. But you'd need to be even more cautious with that.

The two things you need to look out for are:
- when both threads write to the shared data object.
- when one thread's write is interrupted and the second thread reads it. You could theoretically read a half-updated word.
So writing/reading shared data must always be exclusive!

Well, using a CMSIS RTOS, you allocate memory the size of the object you're sending, it packs it (does a behind-the-scenes copy) and puts it in the mailbox queue for the receiving thread. So it's not really a shared object; it's an instance of the data put on the receiving thread's stack. The issue is that if you aren't careful and just start pouring data onto the mailbox, you can overrun it.

i.e. resource contention is a PITA, no matter how it's managed.

Seems to be.
 

Offline Kilrah

  • Supporter
  • ****
  • Posts: 1852
  • Country: ch
Re: Reducing RTOS complexity on micro, "never global vars"
« Reply #6 on: July 25, 2016, 09:16:40 pm »
Which won't work when you have a ton of incoming data and some threads run only once for every 40 times the data handler does. Right now I have a request-and-respond system. Thread2 runs every 100ms; at the end of its run it sends a request for new data, while Thread1, which supplies that data, needs to keep itself refreshed every 5ms. If I just poured all the data in, every 100ms Thread2 would have a ton of duplicate data to sort through, and I'd need a worst-case buffer size to deal with it. In the request model I know only one set of data is coming.

That wouldn't be better in this case. I have data coming in over UART in a thread that is designed to be portable between this and other applications. I have SPI devices that are unique to this application, and they are self-contained state machines - but the data may be needed somewhere else. An RTOS means you'll need to transfer data between threads; it's not really avoidable.

Not sure if it can translate to your application, but in ours, which seems similar, the data is received using interrupts, i.e. all UART receive events are interrupt-driven, and the ISRs for those just push the received bytes onto a FIFO buffer (a separate one for each UART).

When the thread that needs the data wakes up it just pops from the FIFO(s) until they're empty and does its parsing job.
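Stripped right down, the FIFO side looks roughly like this (a sketch of a single-producer/single-consumer ring buffer; the sizes and names are just examples):
Code: [Select]
#include <stdint.h>

#define FIFO_SIZE 256U   /* power of two so the wrap is a simple mask */

static volatile uint8_t  fifo[FIFO_SIZE];
static volatile uint16_t head;   /* written only by the ISR    */
static volatile uint16_t tail;   /* written only by the thread */

/* UART RX interrupt: stash the byte and get out */
void uart_rx_isr(uint8_t byte)
{
    uint16_t next = (head + 1U) & (FIFO_SIZE - 1U);
    if (next != tail) {          /* if the FIFO is full the byte is dropped */
        fifo[head] = byte;
        head = next;
    }
}

/* Parsing thread: returns -1 when the FIFO is empty */
int fifo_pop(void)
{
    if (tail == head)
        return -1;
    uint8_t byte = fifo[tail];
    tail = (tail + 1U) & (FIFO_SIZE - 1U);
    return byte;
}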
 

Offline helius

  • Super Contributor
  • ***
  • Posts: 3632
  • Country: us
Re: Reducing RTOS complexity on micro, "never global vars"
« Reply #7 on: July 25, 2016, 09:20:30 pm »
The issue isn't using globals; it's using globals from different threads without mutual exclusion. I expect that the better compilers have features to provide thread-local storage (TLS) and mutex-protected variables, since even Java has them.
 
The following users thanked this post: Kilrah

Offline jnzTopic starter

  • Frequent Contributor
  • **
  • Posts: 593
Re: Reducing RTOS complexity on micro, "never global vars"
« Reply #8 on: July 25, 2016, 09:36:18 pm »
Not sure if it can translate to your application, but in ours, which seems similar, the data is received using interrupts, i.e. all UART receive events are interrupt-driven, and the ISRs for those just push the received bytes onto a FIFO buffer (a separate one for each UART).

When the thread that needs the data wakes up it just pops from the FIFO(s) until they're empty and does its parsing job.

Now what if 5 threads need that data?
 

Offline jnzTopic starter

  • Frequent Contributor
  • **
  • Posts: 593
Re: Reducing RTOS complexity on micro, "never global vars"
« Reply #9 on: July 25, 2016, 09:42:49 pm »
The issue isn't using globals; it's using globals from different threads without mutual exclusion. I expect that the better compilers have features to provide thread-local storage (TLS) and mutex-protected variables, since even Java has them.

Which is kind of what I'm coming around to. Because currently, with each thread having its own stack, each time I need data from another thread I have to move a temp pointer to the mailbox location, match them, copy that message/mail to where it goes based on a message ID system, etc. etc. Annoying.
 

Offline Kilrah

  • Supporter
  • ****
  • Posts: 1852
  • Country: ch
Re: Reducing RTOS complexity on micro, "never global vars"
« Reply #10 on: July 25, 2016, 09:53:30 pm »
Now what if 5 threads need that data?
Then... you probably shouldn't have that many threads doing the same thing in the first place? Multithreading is good, but becomes pointless if you make dozens of threads that overlap a lot. It might be better and simpler to have only 3 that work on significantly different parts of the system.

Or you just have one thread process the received data, store the various processed forms the others need somewhere, and use semaphores to signal availability to them.
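Something along these lines (a sketch in CMSIS-RTOS v1 style with one consumer shown; the type and the names are made up):
Code: [Select]
#include "cmsis_os.h"

typedef struct { float voltage; float current; } measurement_t;

static measurement_t latest;       /* written only by the parser thread */
static osMutexId     latest_lock;
static osSemaphoreId fresh_data;   /* released once per new result */
osMutexDef(latest_lock);
osSemaphoreDef(fresh_data);

void shared_data_init(void)
{
    latest_lock = osMutexCreate(osMutex(latest_lock));
    fresh_data  = osSemaphoreCreate(osSemaphore(fresh_data), 0); /* start empty */
}

/* Parser thread: publish a result and wake the consumer */
void publish(const measurement_t *m)
{
    osMutexWait(latest_lock, osWaitForever);
    latest = *m;
    osMutexRelease(latest_lock);
    osSemaphoreRelease(fresh_data);
}

/* Consumer thread: block until something new arrives, then copy it out */
void wait_and_read(measurement_t *out)
{
    osSemaphoreWait(fresh_data, osWaitForever);
    osMutexWait(latest_lock, osWaitForever);
    *out = latest;
    osMutexRelease(latest_lock);
}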
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 7695
  • Country: de
  • A qualified hobbyist ;)
Re: Reducing RTOS complexity on micro, "never global vars"
« Reply #11 on: July 25, 2016, 09:54:03 pm »
The issue isn't using globals; it's using globals from different threads without mutual exclusion. I expect that the better compilers have features to provide thread-local storage (TLS) and mutex-protected variables, since even Java has them.

Aren't those features provided by the OS? I think it's called IPC.
 

Offline jnzTopic starter

  • Frequent Contributor
  • **
  • Posts: 593
Re: Reducing RTOS complexity on micro, "never global vars"
« Reply #12 on: July 25, 2016, 09:57:16 pm »
Or you just have one thread process the received data, store the various processed forms the others need somewhere, and use semaphores to signal availability to them.

So exactly the topic at hand ;)
 

Offline Sal Ammoniac

  • Super Contributor
  • ***
  • Posts: 1663
  • Country: us
Re: Reducing RTOS complexity on micro, "never global vars"
« Reply #13 on: July 25, 2016, 10:53:46 pm »
Pretty good comments everyone...

There's really no such thing as 'never'.  What it really means is 'never' until you just have to. 

:D

I'm sure the Toyota engineers had the same thoughts when they first decided they needed to use globals. Then they decided they needed a few more, and a few more, and eventually they ended up with 11,000 of them and their code was a mess of spaghetti.

This is exactly why you should avoid having parallel threads which together do one job. It is always going to end in a mess with potential deadlocks. Better rewrite the threads into a single thread which uses one or more state machines.

I think this is the best answer so far. Just because you have threads available doesn't mean you have to go overboard and create a separate thread for everything.

Quote from: jnz
That wouldn't be better in this case. I have data coming in over UART in a thread that is designed to be portable between this and other applications. I have SPI devices that are unique to this application, and they are self-contained state machines - but the data may be needed somewhere else. An RTOS means you'll need to transfer data between threads; it's not really avoidable.

Designing things to be portable, while laudable, often results in convoluted messes as you have to make too many accommodations to portability over clean architecture.
Complexity is the number-one enemy of high-quality code.
 
The following users thanked this post: hans

Offline andyturk

  • Frequent Contributor
  • **
  • Posts: 895
  • Country: us
Re: Reducing RTOS complexity on micro, "never global vars"
« Reply #14 on: July 26, 2016, 04:35:58 am »
I'd be surprised if you get thread-local storage (TLS) on a compiler for an MCU. It requires a lot of integration between the RTOS/threading library and the compiler's code generation.

However, I've gotten almost equivalent functionality for "free" using GCC with C++. You create a Thread class that encapsulates the thread API for your RTOS (I used CMSIS-RTX last time) with a virtual run() method that you override to implement your particular thread's loop. When you subclass from Thread, just add your thread-local stuff as instance variables. If you're using C++, it's pretty easy.

 

Offline hans

  • Super Contributor
  • ***
  • Posts: 1626
  • Country: nl
Re: Reducing RTOS complexity on micro, "never global vars"
« Reply #15 on: July 26, 2016, 09:41:25 am »
Not sure if it can translate to your application, but in ours, which seems similar, the data is received using interrupts, i.e. all UART receive events are interrupt-driven, and the ISRs for those just push the received bytes onto a FIFO buffer (a separate one for each UART).

When the thread that needs the data wakes up it just pops from the FIFO(s) until they're empty and does its parsing job.

Now what if 5 threads need that data?

In that case, have only one thread work on a given hardware peripheral. That's the easiest solution. E.g. a UART parser/packetizer that formats outgoing packets, parses incoming packets and spreads commands via mailboxes/IPC to other threads. It can make sense to also do this for CAN or Ethernet, although both have different challenges.

However, sometimes you will have multiple threads accessing one peripheral, e.g. SPI or I2C. In that case I would guard each I/O access with a mutex. But take care that you are not also using that peripheral from within an interrupt, because an ISR cannot take the mutex. If so, I would seriously reconsider the hardware peripheral allocation. Modern large MCUs come with half a dozen SPI buses these days, so I always take care to make use of them while I still can.
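For the shared SPI case I mean something like this sketch, where spi_hw_transfer() is a stand-in for whatever your real driver call is:
Code: [Select]
#include "cmsis_os.h"
#include <stdint.h>

/* Placeholder for the real (non-thread-safe) low-level driver call */
extern void spi_hw_transfer(const uint8_t *tx, uint8_t *rx, uint32_t len);

static osMutexId spi1_lock;
osMutexDef(spi1_lock);

void spi1_lock_init(void)
{
    spi1_lock = osMutexCreate(osMutex(spi1_lock));
}

/* Every thread that touches SPI1 goes through this wrapper.
   Never call it from an ISR: osMutexWait is not allowed there. */
osStatus spi1_transfer(const uint8_t *tx, uint8_t *rx, uint32_t len)
{
    osStatus st = osMutexWait(spi1_lock, osWaitForever);
    if (st != osOK)
        return st;
    spi_hw_transfer(tx, rx, len);
    return osMutexRelease(spi1_lock);
}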


Technically, if you create a mailbox in CMSIS, you are still defining a global object. But I think that is more acceptable, because any read/write action on that object is done via OS routines which should be thread-safe, reentrant, etc.
The problem with global integers etc. is that once you start writing to them from multiple places, or have some thread priority 'issue', other threads end up working on old or corrupted data.

Another issue with global objects is testability (unit testing). You need to take special care that between each test the object states can be reset completely to zero. This usually involves some extra code added only for testing, so it can access static globals within drivers, etc. This can become very messy very quickly.
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26755
  • Country: nl
    • NCT Developments
Re: Reducing RTOS complexity on micro, "never global vars"
« Reply #16 on: July 26, 2016, 02:44:20 pm »
Slightly off-topic: One of the interesting things I have observed about programmers is that most go through the following stages:
- Program everything as a single task
- Discover parallel threads, use them too much and realise data synchronisation between threads can turn into a world of pain quickly
- Go back to programming using a single task as much as possible
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline Kalvin

  • Super Contributor
  • ***
  • Posts: 2145
  • Country: fi
  • Embedded SW/HW.
Re: Reducing RTOS complexity on micro, "never global vars"
« Reply #17 on: July 26, 2016, 02:54:50 pm »
Slightly off-topic: One of the interesting things I have observed about programmers is that most go through the following stages:
- Program everything as a single task
- Discover parallel threads, use them too much and realise data synchronisation between threads can turn into a world of pain quickly
- Go back to programming using a single task as much as possible
- Discover co-operative, run-to-completion scheduler and realize the system using simple (possibly event-driven) state machines.
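That last stage can be as small as this sketch (no RTOS at all; the task bodies are placeholders):
Code: [Select]
#include <stdint.h>

/* Each "task" is a run-to-completion state machine: it is called, does one
   small step based on its state and any pending events, and returns. */
typedef void (*task_fn)(void);

static void uart_task(void)  { /* parse whatever is sitting in the RX FIFO */ }
static void spi_task(void)   { /* advance the SPI device state machine     */ }
static void logic_task(void) { /* act on the latest parsed/driver state    */ }

static task_fn tasks[] = { uart_task, spi_task, logic_task };

int main(void)
{
    for (;;) {
        /* co-operative scheduler: every task runs to completion each pass */
        for (uint32_t i = 0; i < sizeof tasks / sizeof tasks[0]; i++) {
            tasks[i]();
        }
    }
}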
 
The following users thanked this post: Bassman59

Offline borjam

  • Supporter
  • ****
  • Posts: 908
  • Country: es
  • EA2EKH
Re: Reducing RTOS complexity on micro, "never global vars"
« Reply #18 on: July 26, 2016, 02:55:17 pm »
    • The main issue with globals seems to be reads and writes - How in Thread1 can I be sure that I haven't interrupted Thread2 during a write? But isn't this exactly what Mutexes are for!?
    The "no globals" rule means "avoid them except whenever they are unavoidable". Look at any OS, there are global variables of course. You have a process table, a file descriptor table, memory tables, etc, etc.

    The problem is: globals are troublesome unless you isolate them properly in a module, defining an access protocol (yes, that's what mutual exclusion primitives are useful for). You have to be extremely careful with them, because it's not as simple as protecting each access with a mutex.

    Quote
    Basically, I think I adhered so strictly to "no globals!" that I made the program massively more complex, which was a mistake. It's serious work now to take a struct, pass a pointer to it in a mailbox, copy the data to the stack, cast it to a compatible struct, copy the data to its new shadow struct, then keep all of that in sync. A lot can go wrong with pointer passing and casting.
    The problem with the shadow copy is keeping it consistent with the master copy. Do you have mechanisms in place to
    prevent the master from being updated while you do some work on the copy?

    Concurrent programming is an amazing tool, but it must be used properly. Back in the 90s, when I did a lot of this, I found this paper to be a real life saver. The simple rules for determining which tasks should be concurrent processes are really useful, and it will help you structure things properly.

    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.19.6735&rep=rep1&type=pdf

    But remember that concurrency brings several problems of its own: not only mutual exclusion, but deadlocks. And such problems can be really tough to debug. I just remember when a friend sent a pointer to a local variable to another thread... it took a week or two to find the error. It was such a stupid mistake that my brain kind of filtered it out as impossible the countless times I read the code. Finally, during one step-by-step run, I realized: gosh, is this a pointer to a local variable????? :D
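    The classic shape of that bug, in CMSIS terms since that's what the OP is using (just a sketch; read_sensor() and msg_q are made up):
    Code: [Select]
    #include "cmsis_os.h"
    #include <stdint.h>

    extern osMessageQId msg_q;       /* created elsewhere */
    extern int read_sensor(void);    /* made-up data source */

    void producer_thread(void const *argument)
    {
        int sample = read_sensor();                  /* lives on THIS thread's stack */
        osMessagePut(msg_q, (uint32_t)&sample, 0);   /* BUG: pointer to a local */
        /* 'sample' is dead (or overwritten) long before the receiving thread
           dereferences the pointer. Allocate from a pool or copy by value instead. */
    }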
    « Last Edit: July 26, 2016, 03:06:26 pm by borjam »
     
    The following users thanked this post: SimonR

    Offline SimonR

    • Regular Contributor
    • *
    • Posts: 122
    • Country: gb
    Re: Reducing RTOS complexity on micro, "never global vars"
    « Reply #19 on: July 26, 2016, 11:06:48 pm »
    Slightly off-topic: One of the interesting things I have observed about programmers is that most go through the following stages:
    - Program everything as a single task
    - Discover parallel threads, use them too much and realise data synchronisation between threads can turn into a world of pain quickly
    - Go back to programming using a single task as much as possible

    Been there, done that, you're spot on.
    The lesson is: Don't make things complicated if you don't have to.
     

    Offline aandrew

    • Frequent Contributor
    • **
    • Posts: 277
    • Country: ca
    Re: Reducing RTOS complexity on micro, "never global vars"
    « Reply #20 on: July 26, 2016, 11:45:31 pm »
    This is exactly why you should avoid having parallel threads which together do one job. It is always going to end in a mess with potential deadlocks. Better rewrite the threads into a single thread which uses one or more state machines.

    Exactly, exactly, exactly!

    I see this so many times... people break up their code into modules which have no business being separated, and then contort their code into knots trying to get the modules to play nicely with each other.

    KISS; if you need to break code into threads you'll know it because it will simply not make a lot of sense to have it any other way. Until then... one process with interrupt handlers which do absolutely as little as possible. :-)
     

    Offline SimonR

    • Regular Contributor
    • *
    • Posts: 122
    • Country: gb
    Re: Reducing RTOS complexity on micro, "never global vars"
    « Reply #21 on: July 27, 2016, 12:34:17 am »
    Basically, I think I adhered so strictly to "no globals!" that I made the program massively more complex, which was a mistake. It's serious work now to take a struct, pass a pointer to it in a mailbox, copy the data to the stack, cast it to a compatible struct, copy the data to its new shadow struct, then keep all of that in sync. A lot can go wrong with pointer passing and casting.

    I think you are right, you have taken the "no globals!" rule too literally. What this so-called rule is trying to avoid is the case where you have one variable for every feature of every peripheral, etc.

    eg.
       int uart1_baud, uart1_stop, uart1_bits;
       int uart2_baud, uart2_stop, uart2_bits;
       
       InitUart1();
       InitUart2();

    Doing this can get out of hand very quickly.

    Quote
    Things you share inter-thread are technically global. Unless you're passing pointers to local objects with finite lifetimes. But you'd need to be even more cautious with that.

    If you think about it, within one module of functionality local data is effectively global throughout the module. So in practice
    the module init function creates a control structure, for instance, and then passes it as an argument to each thread that needs it
    when the thread is created. Writes to this structure may or may not need protecting with some kind of lock.

    The structure has to be maintained for the lifetime of the threads. One way to solve this is to have a third thread create the first two (or more); it dies when and if the others complete.
    Alternatively you could use a global, but pass it into the code by reference.

    Code: [Select]
    struct Uart {
        int baud;
        int stop;
        int bits;
    };

    void main(void)
    {
        struct Uart uart1;
        struct Uart uart2;

        mainApplicationEntry(&uart1, &uart2);
    }
    From this point on the structure should only be accessed using the reference pointer and never directly.
    The result is you have only one copy shared between threads and none of the clutter of a global.
    This method also solves the unit testing issue mentioned earlier.
     

    Offline SimonR

    • Regular Contributor
    • *
    • Posts: 122
    • Country: gb
    Re: Reducing RTOS complexity on micro, "never global vars"
    « Reply #22 on: July 27, 2016, 12:47:35 am »
    Managing data flow from a communication peripheral can get very tricky, as you have just found out. The solutions to this can take many forms and are generally dependent on what your application is trying to do. But you may be interested in the way it was done in a video processing application I worked on.
    It took video frames from a capture card and passed them on to numerous threads for simultaneous display in multiple places. Its job was to pass the frame to the following:
    • A  video display board (1 or more),
    • A software frame decoder for display in a window
    • A broadcast stream to Ethernet for display in a window on another PC
    All displays were exactly in sync.

    In a simple system where one thread gets the data and one processes it, the FIFO solution is the way to go, although you would really only need one thread as the other should actually be an ISR. But you still use a FIFO.
    If you need to share data across several threads then things are a bit more complicated. Again the solution depends on how you are going to use the data. Are you just going to act on its content, or are you going to write back to the buffer that contains it? This affects whether you need to take a copy or not, but generally copies are bad if you have limited processing power.

    In my application I just had one provider and many consumers. The data was not changed between them. In its simplest form the architecture was one ISR per input or output and a single thread joining them all together. The whole thing was done using simple message queues, one per consumer.
    So here’s how it was done.
    A number of buffers were created at power up and placed in a pool. The pool is global among the threads/ISRs  that need it. The buffers each have a reference counter and a count of how much data they  contain. They also had stuff like mutexes etc. as required.
    A Capture initialiser takes a buffer from the pool and configures the DMA to point at the buffer. From this point on the sequence is:

    • The Capture card ISR is called when DMA is complete. It places the ref to the buffer into the message queue, gets another buffer from the pool, sets up the next DMA and exits. It's about 5 lines of code.
      In a UART driver you can use an empty pool to set the flow control.
    • The main buffer handler is at the end of the message queue and it has one output queue for each consumer, however many there are. Its job is to take the buffer (by its reference) from the input queue, then set its reference count to the number of outputs and place a copy of the reference in each of the output queues.
    • The video card driver is a DMA driven ISR. It takes the next buffer in the queue and starts a DMA operation. The previous now used buffer has its reference counter decremented and if it is zero the buffer is put back in to the pool, completing the cycle. Only when all consumers have finished with the buffer is it released.
      Each consumer is slightly different, and may or may not copy the data at this point.

    There is very little code at each stage and the buffers flow round in a circle. Basically you need one buffer for each stage in the sequence, plus one. Most of the code is actually for setup, not execution.

    The advantages of this system are:
    The connections between stages are very simple so it is easy to add extra stages if required. I had up to six as the system was dynamically configurable.
    The buffers don't actually have to be as big as a whole data packet. In this case they were big enough for a whole video frame but in another system we had a scatter/gather DMA so each buffer only had a fragment of a frame.
    In a UART system with an undefined stream, the ISR would gradually fill up the buffer and post it to the queue when it was full. There would need to be a timer to make sure slow data was always sent, so if after a few ms only a few characters had been received the buffer would be sent anyway. USB CDC (USB to COM port) drivers work this way.
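    The UART variant would look roughly like this (a sketch only; pool_get() and queue_put() are imaginary stand-ins for whatever pool and queue you use, and the flush timer is left out):
    Code: [Select]
    #include <stdint.h>
    #include <stddef.h>

    #define BUFF_SIZE 64

    typedef struct {
        uint32_t DataCount;
        uint8_t  Data[BUFF_SIZE];
    } uartBuffer_t;

    extern uartBuffer_t *pool_get(void);           /* imaginary pool API  */
    extern void queue_put(uartBuffer_t *buffer);   /* imaginary queue API */

    static uartBuffer_t *current;

    void uart_rx_isr(uint8_t byte)
    {
        if (current == NULL) {
            current = pool_get();                  /* grab a fresh buffer */
            if (current == NULL)
                return;                            /* pool empty: drop the byte */
            current->DataCount = 0;
        }
        current->Data[current->DataCount++] = byte;
        if (current->DataCount == BUFF_SIZE) {
            queue_put(current);                    /* hand the full buffer downstream */
            current = NULL;                        /* next byte starts a new buffer */
        }
    }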

    I don't know CMSIS RTX but a quick look shows that it has a memory pool facility that looks perfect for this job. You would just need to define a buffer. Maybe something like this:
    Code: [Select]
    typedef struct {
        mutex_t  lock;
        uint32_t RefCount;
        uint32_t DataCount;
        uint8_t  data[BUFF_SIZE];
    } poolBuffer_t;
     

    Offline jnzTopic starter

    • Frequent Contributor
    • **
    • Posts: 593
    Re: Reducing RTOS complexity on micro, "never global vars"
    « Reply #23 on: July 28, 2016, 11:40:27 pm »
    Technically, if you create a mailbox in CMSIS, you are still defining a global object. But I think that is more acceptable, because any read/write action on that object is done via OS routines which should be thread-safe, reentrant, etc.
    The problem with global integers etc. is that once you start writing to them from multiple places, or have some thread priority 'issue', other threads end up working on old or corrupted data.
    Sort of. I don't really consider them global, since access to them still goes through the OS, but I get what you're saying with the rest of it. That's the issue I'm having right now: it wasn't even multiple writers, nor would it have mattered if I wasn't reading the exact latest data. I basically spent a ton of time and work on protection in what is really a pretty simple system. The issue is that complexity went way up and now I have really similar issues with sync, all because I was trying so hard to avoid globals. It was dumb.


    This is exactly why you should avoid having parallel threads which together do one job. It is always going to end in a mess with potential deadlocks. Better rewrite the threads into a single thread which uses one or more state machines.
    I see this so many times... people break up their code into modules which have no business being separated, and then contort their code into knots trying to get the modules to play nicely with each other.
    KISS; if you need to break code into threads you'll know it because it will simply not make a lot of sense to have it any other way. Until then... one process with interrupt handlers which do absolutely as little as possible. :-)

    I'm with you guys, except in this case I'm sticking with it. I have a SPI chip that's handling the power and communication, and I also have a few SPI devices that are 100% different from that. It's NO ISSUE in my case to have a mutex handling SPI - it WOULD be an issue in my case to try and combine these, because even if I did that, both threads would still need to get data to a separate thread that's doing a lot of the application work. I get what you're saying, but in this case it makes almost no difference, as I would still need global and local variables with syncing.

    That said.... I'll keep this in mind and try to thread only when it's absolutely necessary. I'm feeling like when I really want to write blocking code for simplicity, that's when I'm going to prefer threads.



    I think you are right, you have taken the "no globals!" rule too literally. What this so-called rule is trying to avoid is the case where you have one variable for every feature of every peripheral, etc.

    For your example, I hadn't really thought about defining the vars in Main(), but I'm not sure that changes much - all the same issues as globals. But I need to read that example more carefully, so I may be missing something. One thing to note is that right now none of my threads shut down. They're all looping all the time; they "sleep" for periods, but for the most part I have yet to need to shut a thread down.

    I also need to read your next example about the video capture better - and I am using RTX - but like threads shutting down, this is a fairly deterministic system, I'm not creating and allocating pools because I'd have no reason to ever shut them down.
     

    Offline SimonR

    • Regular Contributor
    • *
    • Posts: 122
    • Country: gb
    Re: Reducing RTOS complexity on micro, "never global vars"
    « Reply #24 on: July 29, 2016, 10:44:17 am »
    Quote
    For your example, I hadn't really thought about defining the vars in Main(), but I'm not sure that changes much - all the same issues as globals. But I need to read that example more carefully, so I may be missing something. One thing to note is that right now none of my threads shut down. They're all looping all the time; they "sleep" for periods, but for the most part I have yet to need to shut a thread down.

    My video example was dynamically re-configurable and ran on Linux, so yes, it did shut down and the pools, variables etc. were returned to the heap afterwards. However, I used the exact same technique in a Wireless USB application, and that was a true embedded system which never shut down. And I do mean the exact same technique. If you are not shutting down then, even though the lifetime of the thread/variable is forever, the start-up is the same.
    The general guidance for embedded applications is to allocate storage only once at power up and avoid using alloc() if possible. In fact, if you are designing a high-reliability system and using something like MISRA C, then features like alloc are completely banned anyway.

    Quote
    I also need to read your next example about the video capture better - and I am using RTX - but like threads shutting down, this is a fairly deterministic system, I'm not creating and allocating pools because I'd have no reason to ever shut them down.

    By all means read my second example carefully, I am happy to draw a diagram or answer questions about it. You are definitely missing the point here though.
    The pool is created only once at start up and the buffers are created only once at start up then used to populate the pool. The technique relies on the flow of buffers round an imaginary loop. From the pool to the data source, through the processing stages to the data sinks and then back to the pool. It’s what makes sharing the same data across multiple threads possible and it facilitates flow control if that is required.
     
    You could manage buffers in other ways. For instance you could still create a number of buffers and then use some sort of reference list to manage the system and keep track of where the buffers are being used. You could use the mailbox, but that will result in a copy of the data for every thread it goes to whether a copy is needed or not. If you needed separate copies then it’s the right design choice.

    Using the pool in this case is NOT a memory allocation mechanism it’s a really neat way of managing the flow of data without writing any code.

    Going from a flat design with global data to a multi-threaded design for the first time can really mess with your head, it did me. I’m not suggesting that you change working code to match my example, it’s just an example you may find interesting for future reference.
     

    Offline jnzTopic starter

    • Frequent Contributor
    • **
    • Posts: 593
    Re: Reducing RTOS complexity on micro, "never global vars"
    « Reply #25 on: July 29, 2016, 08:57:52 pm »
    By all means read my second example carefully, I am happy to draw a diagram or answer questions about it. You are definitely missing the point here though.
    The pool is created only once at start up and the buffers are created only once at start up then used to populate the pool. The technique relies on the flow of buffers round an imaginary loop. From the pool to the data source, through the processing stages to the data sinks and then back to the pool. It’s what makes sharing the same data across multiple threads possible and it facilitates flow control if that is required.

    Trying to follow this here...

    If I have a struct like:

    Code: [Select]
    typedef struct {
        mutex_t lock;
        uint8_t data[BUFF_SIZE];
    } NOT_A_poolBuffer_t;

    And I make that global and extern it into my other threads/files, so now I have my data global and an attached mutex in the struct. I lock that up during writes and reads. Let's assume I don't cause any crazy hangups with the mutex, this is a fairly simple application.

    How is this effectively different than using the pool features built into RTX?
     

    Offline SimonR

    • Regular Contributor
    • *
    • Posts: 122
    • Country: gb
    Re: Reducing RTOS complexity on micro, "never global vars"
    « Reply #26 on: July 30, 2016, 12:52:57 am »
    You are using the pool built into RTX, but you still have to define the objects that you put in the pool. In this case it's buffers.

    I'll write some example code to illustrate, but it might take a couple of days.
     

    Offline SimonR

    • Regular Contributor
    • *
    • Posts: 122
    • Country: gb
    Re: Reducing RTOS complexity on micro, "never global vars"
    « Reply #27 on: August 01, 2016, 01:32:14 pm »
    First, let me thank you for creating this topic, as I wasn't aware of the CMSIS RTOS API before and it looks like it's worth further investigation. I've also learned a lot from trying to explain things. I had forgotten how quickly things can get complicated if you are not careful. You can end up explaining the structure of your program and not the technique you are using.
    You didn't actually say that you are using CMSIS RTOS; I just assumed from the way the replies are worded that you are, and from the fact that you used mailboxes. Anyway, I've done my examples assuming that you are.
    There are, however, a couple of issues with using CMSIS RTOS for my solution to be aware of, but I'll explain those separately so as not to confuse things.

    I've done two versions: one that uses global variables, as it's simpler, and one that reduces the number of globals.
     

    Offline SimonR

    • Regular Contributor
    • *
    • Posts: 122
    • Country: gb
    Re: Reducing RTOS complexity on micro, "never global vars"
    « Reply #28 on: August 01, 2016, 01:52:59 pm »
    In reading the CMSIS RTOS documentation I realised that the Mailbox is actually using the technique I am about to describe; it's just that all the detail is hidden. The downside is that it doesn't solve the problem of sending the same data to more than one thread, as far as I can see.
    I started with the code example for osMessageCreate(), about halfway down the message queue documentation here:-

    https://www.keil.com/pack/doc/CMSIS/RTOS/html/group___c_m_s_i_s___r_t_o_s___message.html

    Code: [Select]
    #include "cmsis_os.h"
     
    osThreadId tid_thread1;                          // ID for thread 1
    osThreadId tid_thread2;                          // for thread 2
     
    typedef struct {                                 // Message object structure
      float    voltage;                              // AD result of measured voltage
      float    current;                              // AD result of measured current
      int      counter;                              // A counter value
    } T_MEAS;
     
    osPoolDef(mpool, 16, T_MEAS);                    // Define memory pool
    osPoolId  mpool;
    osMessageQDef(MsgBox, 16, T_MEAS);               // Define message queue
    osMessageQId  MsgBox;
     
    void send_thread (void const *argument);         // forward reference
    void recv_thread (void const *argument);         // forward reference
                                                     // Thread definitions
    osThreadDef(send_thread, osPriorityNormal, 1, 0);
    osThreadDef(recv_thread, osPriorityNormal, 1, 2000);
     
    //
    //  Thread 1: Send thread
    //
    void send_thread (void const *argument) {
      T_MEAS    *mptr;
     
      mptr = osPoolAlloc(mpool);                     // Allocate memory for the message
      mptr->voltage = 223.72;                        // Set the message content
      mptr->current = 17.54;
      mptr->counter = 120786;
      osMessagePut(MsgBox, (uint32_t)mptr, osWaitForever);  // Send Message
      osDelay(100);
     
      mptr = osPoolAlloc(mpool);                     // Allocate memory for the message
      mptr->voltage = 227.23;                        // Prepare a 2nd message
      mptr->current = 12.41;
      mptr->counter = 170823;
      osMessagePut(MsgBox, (uint32_t)mptr, osWaitForever);  // Send Message
      osThreadYield();                               // Cooperative multitasking
                                                     // We are done here, exit this thread
    }
     
    //
    //  Thread 2: Receive thread
    //
    void recv_thread (void const *argument) {
      T_MEAS  *rptr;
      osEvent  evt;
       
      for (;;) {
        evt = osMessageGet(MsgBox, osWaitForever);  // wait for message
        if (evt.status == osEventMessage) {
          rptr = evt.value.p;
          printf ("\nVoltage: %.2f V\n", rptr->voltage);
          printf ("Current: %.2f A\n", rptr->current);
          printf ("Number of cycles: %d\n", rptr->counter);
          osPoolFree(mpool, rptr);                  // free memory allocated for message
        }
      }
    }
     
    void StartApplication (void) {
      mpool = osPoolCreate(osPool(mpool));                 // create memory pool
      MsgBox = osMessageCreate(osMessageQ(MsgBox), NULL);  // create msg queue
       
      tid_thread1 = osThreadCreate(osThread(send_thread), NULL);
      tid_thread2 = osThreadCreate(osThread(recv_thread), NULL);
      :
    }

    It works exactly how the mailbox works, except the message queue is visible. I took this example and modified it to share data across 3 threads without the need to copy. I also changed the names of variables to match my buffer concept.
    Code: [Select]
    #include "cmsis_os.h"
     
    osThreadId tid_threadSource;                     // ID for main thread (data source)
    osThreadId tid_thread1;                          // for thread 1
    osThreadId tid_thread2;                          // for thread 2
    osThreadId tid_thread3;                          // for thread 3
     
    typedef struct poolBuffer_s {
        osMutexId  RefCountLock;
        uint32_t RefCount;
        uint32_t DataCount;
        uint8_t Data[BUFF_SIZE];
    } poolBuffer_t;
     
    osPoolDef(buffer_pool, 16, poolBuffer_t);       // Define a pool of buffers
    osPoolId  buffer_pool;
    osMessageQDef(MsgBox1, 16, poolBuffer_t);        // Define message queue1
    osMessageQId  MsgBox1;
    osMessageQDef(MsgBox2, 16, poolBuffer_t);        // Define message queue2
    osMessageQId  MsgBox2;
    osMessageQDef(MsgBox3, 16, poolBuffer_t);        // Define message queue3
    osMessageQId  MsgBox3;
     
    void send_thread (void const *argument);         // forward reference
    void recv_thread1 (void const *argument);        // forward reference
    void recv_thread2 (void const *argument);        // forward reference
    void recv_thread3 (void const *argument);        // forward reference
                                                     // Thread definitions
    osThreadDef(send_thread, osPriorityNormal, 1, 0);
    osThreadDef(recv_thread1, osPriorityNormal, 1, 1000);
    osThreadDef(recv_thread2, osPriorityNormal, 1, 1000);
    osThreadDef(recv_thread3, osPriorityNormal, 1, 1000);

    //
    //  Thread 1: Send thread
    //
    void send_thread (void const *argument) {
      poolBuffer_t    *buffptr;
     
        for(;;)
        {
            if(checkDataReady == true)                    // placeholder flag: true when fresh data is ready to send
            {
                buffptr = osPoolAlloc(buffer_pool);           // Allocate memory for the message
                buffptr->RefCount = 3;                        // sending to 3 threads; no mutex needed yet, nothing else can see this buffer
                    // Set the message content here maybe a video frame
                osMessagePut(MsgBox1, (uint32_t)buffptr, osWaitForever);  // Send Message to thread 1
                osMessagePut(MsgBox2, (uint32_t)buffptr, osWaitForever);  // Send same Message to thread 2
                osMessagePut(MsgBox3, (uint32_t)buffptr, osWaitForever);  // Send same Message to thread 3
                osThreadYield(); 
            }                             // Cooperative multitasking
        }
    }

    void myBufferFree(poolBuffer_t *buffer)
    {
        osMutexWait(buffer->RefCountLock, osWaitForever);
        buffer->RefCount--;
        if(buffer->RefCount ==0)
        {
            osPoolFree(buffer_pool, buffer);
        }
        osMutexRelease(buffer->RefCountLock);
    }
     
    //
    //  Thread 1: Receive thread1
    //
    void recv_thread1 (void const *argument) {
      poolBuffer_t  *bufferReference;
      osEvent  evt;
       
      for (;;) {
        evt = osMessageGet(MsgBox1, osWaitForever);  // wait for message
        if (evt.status == osEventMessage) {
          bufferReference = evt.value.p;
          // process the buffer here eg give data to video card
          myBufferFree(bufferReference);                  // free memory allocated for message
        }
      }
    }
    //
    //  Thread 2: Receive thread2
    //
    void recv_thread2 (void const *argument) {
      poolBuffer_t  *bufferReference;
      osEvent  evt;
       
      for (;;) {
        evt = osMessageGet(MsgBox2, osWaitForever);  // wait for message
        if (evt.status == osEventMessage) {
          bufferReference = evt.value.p;
          // process the buffer here eg save videodata to a file
          myBufferFree(bufferReference);                  // free memory allocated for message
        }
      }
    }
    //
    //  Thread 3: Receive thread3
    //
    void recv_thread3 (void const *argument) {
      poolBuffer_t  *bufferReference;
      osEvent  evt;
       
      for (;;) {
        evt = osMessageGet(MsgBox3, osWaitForever);  // wait for message
        if (evt.status == osEventMessage) {
          bufferReference = evt.value.p;
          // process the buffer here eg send video date to ethernet
          myBufferFree(bufferReference);                  // free memory allocated for message
        }
      }
    }


    void StartApplication (void) {
      buffer_pool = osPoolCreate(osPool(buffer_pool));                 // create memory pool
      MsgBox1 = osMessageCreate(osMessageQ(MsgBox1), NULL);  // create msg queue
      MsgBox2 = osMessageCreate(osMessageQ(MsgBox2), NULL);  // create msg queue
      MsgBox3 = osMessageCreate(osMessageQ(MsgBox3), NULL);  // create msg queue
       
      tid_threadSource = osThreadCreate(osThread(send_thread), NULL);
      tid_thread1 = osThreadCreate(osThread(recv_thread1), NULL);
      tid_thread2 = osThreadCreate(osThread(recv_thread2), NULL);
      tid_thread3 = osThreadCreate(osThread(recv_thread3), NULL);
    }

    As you can see it's almost exactly the same, except that we send the same message to 3 queues instead of 1. Each thread decrements the reference count before returning the buffer to the pool, and that's it. I have written a higher-level buffer free function so that the mutex code only appears in one place.
    There is an issue in this code to do with initialising the pool, but I don't want to complicate things yet.
     

    Offline SimonR

    • Regular Contributor
    • *
    • Posts: 122
    • Country: gb
    Re: Reducing RTOS complexity on micro, "never global vars"
    « Reply #29 on: August 02, 2016, 11:42:20 am »
    Considerations when using CMSIS RTOS API

    CMSIS RTOS API is supposed to be an implementation-independent API that sits on top of whatever RTOS you choose to use. As a result, C macros are used to create the structures for a given application.
    For instance, to create a message queue you would use the #define osMessageQDef like this, as per the example in the documentation.
    Code: [Select]
    osMessageQDef(MsgBox, 16, T_MEAS);               // Define message queue
    osMessageQId  MsgBox;
    void main(void)
    {
    }

    The #define is implementation dependent, which means that if you wanted to include one as part of another structure, for example, you will find that it may or may not compile. It effectively means that every RTOS component has to be defined as a global.
    My requirement for a mutex is defined like this.
    Code: [Select]
    typedef struct poolBuffer_s {
        osMutexId  RefCountLock;
        uint32_t RefCount;
        uint32_t DataCount;
        uint8_t Data[BUFF_SIZE];
    } poolBuffer_t;
    But using CMSIS RTOS it should be defined like this.
    Code: [Select]
    typedef struct poolBuffer_s {
        osMutexDef( RefCountLock);
        uint32_t RefCount;
        uint32_t DataCount;
        uint8_t Data[BUFF_SIZE];
    } poolBuffer_t;
    We can’t guarantee that the macro will result in correct code unless we know that the implementation underneath is correct.

    Why is this important?
    Well in my buffer pool example the buffers need to be initialised when they are put into the pool.

    In my pthreads application the pool creation code looks like this.
    Code: [Select]
    Generic_Pool_t * CreateBufferPool(void)
    {
        Generic_Pool_t *bufPool = PoolCreate();
        for(int count = 0; count < 10; count++)
        {
            Buffer_t * newBuffer = BufferCreate(bufferSize);
            newBuffer->Lock = MutexCreate();
            PoolAdd(newBuffer);
        }
        return bufPool;
    }
    The problem with the MemoryPool osPoolCreate is that it creates the pool and allocates the space for each object in the pool, but the objects are not initialised. As you can see, it's important for the buffers to be initialised before they are added to the pool, because they each have their own mutex.
    To initialise them in CMSIS RTOS you would have to create the pool, then take every single object out of the pool, initialise them, and then put them back in.
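    One way round it is sketched below: drain the pool once at start-up, initialise every object, then hand them all back. I have used a single lock shared by all the reference counts here, which sidesteps the per-buffer osMutexDef problem entirely (and it assumes osPoolFree leaves the object memory untouched, which is true for RTX but strictly speaking implementation defined).
    Code: [Select]
    #include "cmsis_os.h"
    #include <stdint.h>

    #define POOL_SIZE 16
    #define BUFF_SIZE 64

    typedef struct {
        uint32_t RefCount;
        uint32_t DataCount;
        uint8_t  Data[BUFF_SIZE];
    } poolBuffer_t;

    osPoolDef(buffer_pool, POOL_SIZE, poolBuffer_t);
    osPoolId  buffer_pool;

    osMutexDef(refcount_lock);       /* one lock shared by every buffer's RefCount */
    osMutexId refcount_lock;

    void InitBufferPool(void)
    {
        poolBuffer_t *bufs[POOL_SIZE];

        buffer_pool   = osPoolCreate(osPool(buffer_pool));
        refcount_lock = osMutexCreate(osMutex(refcount_lock));

        /* Drain the pool once, zero every object, then hand them all back */
        for (int i = 0; i < POOL_SIZE; i++) {
            bufs[i] = osPoolAlloc(buffer_pool);
            bufs[i]->RefCount  = 0;
            bufs[i]->DataCount = 0;
        }
        for (int i = 0; i < POOL_SIZE; i++) {
            osPoolFree(buffer_pool, bufs[i]);
        }
    }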
     

