Topic: Do I really need to use an RTOS? (Alternatives to finite state machines?)

newbrain · « **Reply #100 on:** July 07, 2022, 02:51:43 pm »

Quote

LOVE treating so many things as separate programs. I have no doubt that adding changes will be easier.

QFT, well said.
These are exactly the two things I (as a hobbyist, remember) find to be the biggest advantages of using an RTOS.

Different functions of the system (e.g. HMI and data processing) can be treated as completely separate - once the right priorities have been assigned and the amount of resources used verified.
I am sure that if a new audio frame has arrived, the processing will happen in time - even if the GUI might slow down.

And, for the same reason, changing/adding functions is easier.

If resources are limited, of course, the overhead might be too much.

Some personal examples:

* Linear PSU - HD44780 4×20 display, 2 encoders, 2 buttons, V, I, P measurements, OCP with programmable delay, I & V DAC control, etc.: with an STM32F042K6 (6 KB RAM) I went for a superloop and did not even try FreeRTOS. Changing something can be done, but it needs a lot of care.

* AD9834-based sine generator: on an STM32F072, FreeRTOS lets me add both interface tasks (parallel 480×320 TFT, one encoder) and real-time tasks (most recently: a couple of hours to add SSB modulation) without too much hassle.

* i.MX RT1021 SDR: here FreeRTOS shines. The audio processing chain (FIRs, FFT, various demodulations) is served at the highest priority, the UI (two encoders, some buttons, SPI 480×320 TFT) at (almost) the lowest. Other tasks (slow decoders such as RTTY, Morse etc.) can sit in between. An FFT+waterfall display was easily added without touching the rest.

I found I'm usually more comfortable with pre-emption with no time slicing.
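For FreeRTOS users, "pre-emption with no time slicing" corresponds, as far as I know, to two settings in FreeRTOSConfig.h. This is only a config fragment, not a complete configuration:

```c
/* FreeRTOSConfig.h excerpt (sketch): a higher-priority task that becomes
 * ready preempts the running one immediately, but equal-priority tasks are
 * NOT rotated on every tick; they only switch when the running task blocks
 * or yields. */
#define configUSE_PREEMPTION    1
#define configUSE_TIME_SLICING  0
```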

brucehoult · « **Reply #101 on:** July 07, 2022, 11:42:42 pm »

Quote from: newbrain on July 07, 2022, 02:51:43 pm

I found I'm usually more comfortable with pre-emption with no time slicing.

That's fine if you have 0 or 1 CPU-bound tasks.

More than 1 and you *need* either time slicing or yield() calls in your loops.

If you already have pre-emption rather than simple interrupt handlers (i.e. an interrupt handler can cause control to return to a different, higher-priority, newly unblocked task than the one that was interrupted), then time slicing adds nothing more than a clock interrupt and a fairness algorithm.
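To make "a clock interrupt and a fairness algorithm" concrete, here is a minimal sketch of the fairness part only. It assumes a hypothetical `task_t` holding just a non-negative priority and a ready flag (not any real RTOS API). On each tick, the scheduler picks the highest-priority ready task and rotates among equal-priority ones, starting just after the task currently running:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task control block, for illustration only. */
typedef struct {
    int  priority;   /* >= 0, higher number = more important */
    bool ready;      /* true if runnable (not blocked) */
} task_t;

/* Returns the index of the task to run next, or -1 if nothing is ready
 * (run the idle loop). Among tasks sharing the best priority, the search
 * starts after `current`, so equal-priority tasks take turns. */
int pick_next(const task_t *tasks, size_t n, int current)
{
    int best_prio = -1;
    for (size_t i = 0; i < n; i++)
        if (tasks[i].ready && tasks[i].priority > best_prio)
            best_prio = tasks[i].priority;
    if (best_prio < 0)
        return -1;

    size_t start = (current < 0) ? 0 : (size_t)current + 1;
    for (size_t k = 0; k < n; k++) {
        size_t i = (start + k) % n;
        if (tasks[i].ready && tasks[i].priority == best_prio)
            return (int)i;
    }
    return -1; /* not reached: at least one task has best_prio */
}
```

Called from the tick ISR, this lets two CPU-bound tasks at the same priority alternate without either calling yield(), while a higher-priority task that becomes ready still wins immediately.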

westfw · « **Reply #102 on:** July 08, 2022, 12:17:51 am »

Quote

you *need* either time slicing or yield() calls in your loops.

Or other calls that implicitly yield...

SiliconWizard · « **Reply #103 on:** July 08, 2022, 12:27:16 am »

Well uh. A preemptive scheduler is uh... preemptive. It must be able to preempt. Sure it can do so by other means than a timer, but you still need something that preempts, externally from the tasks themselves. If tasks are solely responsible for "yielding", that's not preemptive scheduling, that's merely cooperative.

Now of course with a preemptive scheduler, tasks can always "yield" earlier than their allocated time slot if they have nothing more to do (usually waiting on some event) - which is the only way of making a scheduler work at all if the number of tasks times the time slot is greater than 100%. So in that regard, the time slot is the *max* time slot for a given task. But there still needs to be an external event that can preempt. If it's not a timer, it needs to be something else, but not something that would entirely depend on each task, otherwise you're again not in the preemptive territory anymore.

I haven't looked at everything FreeRTOS offers, but I've implemented a preemptive scheduler in which you can define a different (max) time slot for each task, instead of a fixed one. I haven't checked whether you can do that with FreeRTOS.
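A per-task maximum slice can be as simple as a countdown that the tick interrupt, not the task, decrements. The struct and function names below are hypothetical and the context switch itself is left out; a task that blocks early simply gives up the rest of its slice:

```c
#include <stdbool.h>

/* Hypothetical per-task slice bookkeeping. */
typedef struct {
    unsigned slice_ticks;   /* configured maximum slice for this task */
    unsigned remaining;     /* ticks left in the current slice */
} slice_t;

/* Called whenever the task is (re)scheduled: grant a full slice. */
void slice_start(slice_t *s)
{
    s->remaining = s->slice_ticks;
}

/* Called from the periodic tick interrupt for the running task.
 * Returns true when the slice is used up and the scheduler should preempt;
 * the task itself has no say in this, which is what makes it preemptive. */
bool slice_tick(slice_t *s)
{
    if (s->remaining > 0)
        s->remaining--;
    return s->remaining == 0;
}
```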

Siwastaja · « **Reply #104 on:** July 08, 2022, 06:13:03 am »

Quote from: newbrain on July 07, 2022, 02:51:43 pm

Quote
LOVE treating so many things as separate programs. I have no doubt that adding changes will be easier.
QFT, well said.
These are exactly the two things I (as a hobbyist, remember) find to be the biggest advantages of using an RTOS.
...
I found I'm usually more comfortable with pre-emption with no time slicing.

The feelings of you two sound pretty much similar to the enjoyment I was having when I realized, on Cortex-M, that you don't have to set some flags and sequentially process them in a superloop, but you can make a fully ISR-driven design, with pre-emption with IRQ priorities, plus software interrupts to trigger lower priority interrupts from higher ones.

Everything just... works, running in parallel. Modules can be treated as "separate programs", and state machine is implemented as just functions, triggered by something - usually HW timer or other peripheral. No problem doing "long" ISRs - just make more important stuff higher priority, and pre-emption works.

And it doesn't need to be a generic 1ms tick, and you don't need to write any kind of scheduler, at all, the CPU can take care of it: just set a timer peripheral for whatever delay you actually need, and make it trigger the next state function directly.
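A hosted sketch of "set a timer for whatever delay you actually need and make it trigger the next state function". The hardware is replaced by a simulated one-shot timer (`timer_arm` and `simulate` are hypothetical stand-ins for a compare register and its ISR), so the state logic can be run on a PC:

```c
#include <stdint.h>

typedef void (*state_fn)(void);

/* Hosted stand-in for a one-shot hardware timer (hypothetical names). */
static uint32_t now_us;        /* simulated time */
static uint32_t fire_at;       /* programmed compare value */
static state_fn on_fire;       /* what the "timer ISR" will call */

static void timer_arm(uint32_t delay_us, state_fn fn)
{
    fire_at = now_us + delay_us;
    on_fire = fn;
}

/* Each state does its work, then arms the timer for exactly the delay it
 * needs and names the state to run next. No 1 ms tick, no scheduler. */
int led_on_count;
static void led_off(void);
static void led_on(void)  { led_on_count++; timer_arm(200000u, led_off); }
static void led_off(void) { timer_arm(800000u, led_on); }

/* Plays the role of the hardware: jump to each expiry and "run the ISR". */
void simulate(uint32_t until_us)
{
    now_us = 0;
    led_on_count = 0;
    timer_arm(0, led_on);
    while (fire_at <= until_us) {
        now_us = fire_at;
        on_fire();
    }
}
```

On a real MCU, `timer_arm` would write the compare register and the timer ISR would just call `on_fire()`.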

The most obvious advantages of an OS would be the addition of time slicing, and the fact that a trigger can bring the program flow into the middle of a function (wait for event / semaphore call). With a simple ISR-based design, while long pieces of code are possible and can be pre-empted, trigger mechanisms always bring you to the start of a function. I don't think this is a bad thing: coupling trigger mechanisms with handler functions improves readability, just as in JavaScript you use a small onClick() handler instead of some massive loop where you wait for a click.

tggzzz · « **Reply #105 on:** July 08, 2022, 09:06:26 am »

Quote from: Siwastaja on July 08, 2022, 06:13:03 am

Quote from: newbrain on July 07, 2022, 02:51:43 pm
Quote
LOVE treating so many things as separate programs. I have no doubt that adding changes will be easier.
QFT, well said.
These are exactly the two things I (as a hobbyist, remember) find to be the biggest advantages of using an RTOS.
...
I found I'm usually more comfortable with pre-emption with no time slicing.

The feelings of you two sound pretty much similar to the enjoyment I was having when I realized, on Cortex-M, that you don't have to set some flags and sequentially process them in a superloop, but you can make a fully ISR-driven design, with pre-emption with IRQ priorities, plus software interrupts to trigger lower priority interrupts from higher ones.

Everything just... works, running in parallel.

Except that they aren't actually running in parallel, and there can be "gotchas" that appear in very well-disciplined high-reliability environments. Start by understanding why the first space shuttle launch attempt (1981-04-10) was scrubbed and the software patch installed before the first launch occurred (1981-04-12).

Quote

Modules can be treated as "separate programs", and state machine is implemented as just functions, triggered by something - usually HW timer or other peripheral. No problem doing "long" ISRs - just make more important stuff higher priority, and pre-emption works.

If and only if you get everything right. If not, livelock (for example) can occur. Understand why the Mars Pathfinder computer repeatedly reset itself, had to be debugged remotely, and had a config bit changed (enabling priority inheritance on a mutex to cure a priority inversion). (That also showed the value of logging events+states in the running system; don't turn debugging off!)

There is a lot of well-understood theory about RTOSs, and practical examples of the subtle intermittent problems that do occur when the theory is ignored.

Summary: no, they don't "just work" - but they might appear to "just work".

Siwastaja · « **Reply #106 on:** July 08, 2022, 10:27:07 am »

The problems of managing shared resources, proving that enough CPU is available at worst-case event rates, and limiting the rate at which events can happen are common to both OS-based and bare-metal designs. Only the available tools and terminology differ somewhat. There is no silver bullet.

Parallelism is notoriously difficult to get right. And in my opinion, purely event-driven code is easier to understand and get right than time-slicing linear code with a lot of mutexes. And, of course, avoiding sharing of resources by design as much as possible. FIFOs are generally a better idea than sharing a resource with mutexes, for example.

Learning from past mistakes is always a good idea. For example, the classic Apollo "computer overload" was basically the equivalent of wiring a CPU's interrupt pin to an external source that is not proven to respect a minimum interrupt period. The takeaway: if there is any uncertainty about the minimum period (or even about signal integrity causing false edges), a pattern where the IRQ is temporarily disabled and later re-enabled by a timer IRQ can be used to limit the worst-case rate.
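The disable-then-re-enable-by-timer pattern can be simulated to see the bound it gives. This sketch assumes an edge-triggered IRQ whose pending flag latches while the IRQ is masked (as an NVIC pending bit would), so a burst of edges costs at most one extra handler run per `min_gap`. The function and its parameters are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulates an IRQ that masks itself for `min_gap` time units after each
 * service and is re-enabled by a timer. Edges arriving while masked only set
 * the pending flag, so a burst collapses into one service at re-enable.
 * t[] holds sorted edge times; returns the number of handler invocations. */
size_t count_services(const uint32_t *t, size_t n, uint32_t min_gap)
{
    size_t services = 0;
    uint32_t enabled_at = 0;   /* IRQ unmasked from this time on */
    bool pending = false;

    for (size_t i = 0; i < n; i++) {
        /* Timer re-enabled the IRQ before this edge: the latched edge is
         * serviced at that moment, and the IRQ masks itself again. */
        if (pending && enabled_at <= t[i]) {
            services++;
            enabled_at += min_gap;
            pending = false;
        }
        if (t[i] >= enabled_at) {      /* unmasked: service immediately */
            services++;
            enabled_at = t[i] + min_gap;
        } else {                       /* masked: just latch it */
            pending = true;
        }
    }
    if (pending)                       /* serviced when the timer fires */
        services++;
    return services;
}
```

For a burst of 100 edges one time unit apart with `min_gap` = 10, the handler runs 11 times instead of 100, while well-spaced edges are each serviced once.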

tggzzz · « **Reply #107 on:** July 08, 2022, 10:39:29 am »

Quote from: Siwastaja on July 08, 2022, 10:27:07 am

Parallelism is notoriously difficult to get right. And in my opinion, purely event-driven code is easier to understand and get right than time-slicing linear code with a lot of mutexes. And, of course, avoiding sharing of resources by design as much as possible. FIFOs are generally a better idea than sharing a resource with mutexes, for example.

That's pretty much my belief.

Event->interrupt->capture event put in fifo->return from interrupt. Forever loop: wait until event in fifo->process event to completion. Completion can be putting an event in the same queue or sending it to another FSM for processing.

That is simple and, coupled with the half-sync-half-async design pattern, is efficient and predictable. It does require suitable events that can allow "processing event to completion", but that is usually not a major problem in real-world systems.
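The "interrupt puts the event in a FIFO, forever loop processes it to completion" structure could be built on a single-producer/single-consumer ring buffer like this. It is a sketch assuming a single-core MCU where only the ISR writes `head` and only the main loop writes `tail`; multi-core parts or weakly ordered CPUs would need C11 atomics or explicit barriers instead of plain `volatile`:

```c
#include <stdbool.h>
#include <stdint.h>

/* Size must be a power of two so the free-running indices wrap cleanly. */
#define EVQ_SIZE 16u

typedef struct {
    volatile uint8_t  buf[EVQ_SIZE];
    volatile uint32_t head;   /* next slot to write (ISR only) */
    volatile uint32_t tail;   /* next slot to read (main loop only) */
} evq_t;

/* Called from the interrupt handler: capture the event, then return. */
bool evq_put(evq_t *q, uint8_t ev)
{
    if (q->head - q->tail == EVQ_SIZE)
        return false;                 /* full: caller should count an overrun */
    q->buf[q->head % EVQ_SIZE] = ev;
    q->head++;                        /* publish only after the data is stored */
    return true;
}

/* Called from the forever loop: fetch one event to process to completion. */
bool evq_get(evq_t *q, uint8_t *ev)
{
    if (q->head == q->tail)
        return false;                 /* empty: sleep until the next interrupt */
    *ev = q->buf[q->tail % EVQ_SIZE];
    q->tail++;                        /* free the slot only after reading it */
    return true;
}
```

In use, the ISR would call something like `evq_put(&q, EV_UART_RX)` and the forever loop `while (evq_get(&q, &ev)) dispatch(ev);` before sleeping again (`EV_UART_RX` and `dispatch` are hypothetical).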

tellurium · « **Reply #108 on:** July 08, 2022, 10:47:06 am »

Quote from: Siwastaja on July 08, 2022, 06:13:03 am

Everything just... works, running in parallel. Modules can be treated as "separate programs", and state machine is implemented as just functions, triggered by something - usually HW timer or other peripheral. No problem doing "long" ISRs - just make more important stuff higher priority, and pre-emption works.

I'd like to see a simple project, e.g. blinky with UART based control (e.g. to change blink intervals, blink counts, or pwm), implemented in 3 paradigms:
1. superloop
2. os
3. ISRs with priorities

Each approach should not use any external dependency and be as small and simple as possible.
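As a small illustration of paradigm 1 only, here is a superloop body reduced to a step function so it can run on a host. The UART and LED are replaced by parameters (`rx` is a received byte or -1, the return value is the LED level); all names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t interval_ms;   /* half-period of the blink */
    uint32_t last_toggle;   /* tick of the last LED toggle */
    bool     led;
} blinky_t;

/* One pass of the superloop. `now_ms` stands for a millisecond tick. */
bool superloop_step(blinky_t *b, uint32_t now_ms, int rx)
{
    /* UART "task": digits 1..9 select a half-period of 100..900 ms. */
    if (rx >= '1' && rx <= '9')
        b->interval_ms = (uint32_t)(rx - '0') * 100u;

    /* LED "task": non-blocking; compares elapsed time instead of delay(),
     * and the unsigned subtraction stays correct across tick wrap-around. */
    if (now_ms - b->last_toggle >= b->interval_ms) {
        b->last_toggle = now_ms;
        b->led = !b->led;
    }
    return b->led;
}
```

On target, the loop would call `superloop_step()` with the current tick and whatever byte the UART has, then drive the LED pin with the result.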

tggzzz · « **Reply #109 on:** July 08, 2022, 02:53:12 pm »

Quote from: tellurium on July 08, 2022, 10:47:06 am

Quote from: Siwastaja on July 08, 2022, 06:13:03 am
Everything just... works, running in parallel. Modules can be treated as "separate programs", and state machine is implemented as just functions, triggered by something - usually HW timer or other peripheral. No problem doing "long" ISRs - just make more important stuff higher priority, and pre-emption works.

I'd like to see a simple project, e.g. blinky with UART based control (e.g. to change blink intervals, blink counts, or pwm), implemented in 3 paradigms:
1. superloop
2. os
3. ISRs with priorities

Each approach should not use any external dependency and be as small and simple as possible.

While that sounds like a good idea, just about any approach, e.g. assembler, is sufficient for a trivial project.

The problem is understanding how architectures do and don't scale as the number of inputs and outputs increases, the number of sources of inputs and outputs increases, interactions between them become trickier, processing complexity increases, asynchronous vs synchronous APIs are encountered, extra requirements are added late in the implementation process, years elapse and people forget, new people are added to a project, etc.

The value of "hello world" and "blinky" projects is that they are a first step in implementing ~~something~~ anything in your chosen architecture, to gain confidence that you have your toolchain working from end to end. They do not and cannot show the strengths/weaknesses of architectures.

Siwastaja · « **Reply #110 on:** July 08, 2022, 04:07:21 pm »

Portability is also a point. If you base your project on the premise of having an interrupt controller with pre-emptive prioritized interrupts, then it won't be trivial to port it to a simpler microcontroller that lacks this feature. An OS has the benefit that, as long as the same OS is ported to the different platform, basic concurrency features work without any porting. Though the hardware reality still leaks through the abstractions: performance can differ a lot depending on what HW resources are available to the OS.

tellurium · « **Reply #111 on:** July 08, 2022, 05:20:59 pm »

I actually see huge value in such a project.

Many people are accustomed to only one or two approaches: they got familiar with them and stuck to them, even when another approach is better for the task. And I know that working examples are a big deal; they are one of the best learning tools in software engineering.

The main goal is to show the code structure and execution flow, because THAT is what differentiates the approaches, not scalability/portability/whatever. Even if implemented on a single arch, e.g. on the ubiquitous STM32 bluepill.

Siwastaja · « **Reply #112 on:** July 08, 2022, 06:17:22 pm »

Quote from: tellurium on July 08, 2022, 05:20:59 pm

I actually see huge value in such a project.

Many people are accustomed to only one or two approaches: they got familiar with them and stuck to them, even when another approach is better for the task. And I know that working examples are a big deal; they are one of the best learning tools in software engineering.

The main goal is to show the code structure and execution flow, because THAT is what differentiates the approaches, not scalability/portability/whatever. Even if implemented on a single arch, e.g. on the ubiquitous STM32 bluepill.

It would be a good demonstration, i.e. an explanation in the form of code of what these paradigms actually mean, but due to the points tggzzz raised, obviously not a totally fair comparison. You can't find a "winner" that way.

1 and 3 also tend to merge into some kind of hybrid. Also note that even within 2 (the OS), there are many options, like superloop/select()/poll() vs. threads.

Personally, I use all three approaches quite equally, but in different types of projects.

Nominal Animal · « **Reply #113 on:** July 08, 2022, 06:56:58 pm »

Quote from: Siwastaja on July 08, 2022, 06:13:03 am

And it doesn't need to be a generic 1ms tick, and you don't need to write any kind of scheduler, at all, the CPU can take care of it: just set a timer peripheral for whatever delay you actually need, and make it trigger the next state function directly.

For tasks at the same priority level, a single timer and a binary min-heap to hold the next firing time works well. I use it frequently in Linux, for all sorts of timeouts.

(For those who are unaware: given uniformly random keys, such a binary min-heap does only an average of e ≈ 2.7 percolations per insert/deleteMin, and the data representation is a simple array. On an X-bit architecture, if you can live with, say, (X-N)-bit timestamps and use the low N bits to indicate the cause, you can treat the combined value as an X-bit timestamp, i.e. no masking is needed. Whenever the actual interrupt fires, you dispatch all tasks that have elapsed thus far, so the implicit ordering of the N-bit causes among identical timestamps only orders the dispatching within the same interrupt, which would usually happen anyway. The only "dirty" bit is that you need to handle the case where the next interrupt would be due almost immediately, while still inside the previous one, to account for the interrupt setup overhead. So there is one "critical" interrupts-disabled section of a few cycles, where you check ARM_DWT_CYCCNT and, if it has not yet progressed too far, arm the next interrupt; note that this assumes a single-shot timer whose next firing interval is cheap to set up. Otherwise the code is simple, clean, lightweight, and concise to implement.)
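A minimal sketch of such a timer queue, assuming 32-bit keys with 28-bit timestamps and a 4-bit cause field in the low bits. The sizes, names and the recording `log_cause` dispatcher are illustrative only, and timestamp wrap-around is ignored:

```c
#include <stddef.h>
#include <stdint.h>

#define CAUSE_BITS 4u
#define CAUSE_MASK ((1u << CAUSE_BITS) - 1u)
#define HEAP_MAX   32u

static uint32_t heap[HEAP_MAX];   /* key = (time << CAUSE_BITS) | cause */
static size_t   heap_len;

static void heap_swap(uint32_t *a, uint32_t *b) { uint32_t t = *a; *a = *b; *b = t; }

/* Schedule `cause` at tick `when` (< 2^28). Returns -1 if the queue is full. */
int timer_add(uint32_t when, uint32_t cause)
{
    if (heap_len == HEAP_MAX) return -1;
    size_t i = heap_len++;
    heap[i] = (when << CAUSE_BITS) | (cause & CAUSE_MASK);
    while (i > 0 && heap[(i - 1) / 2] > heap[i]) {      /* percolate up */
        heap_swap(&heap[(i - 1) / 2], &heap[i]);
        i = (i - 1) / 2;
    }
    return 0;
}

static uint32_t heap_pop(void)
{
    uint32_t top = heap[0];
    heap[0] = heap[--heap_len];
    for (size_t i = 0;;) {                               /* percolate down */
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < heap_len && heap[l] < heap[m]) m = l;
        if (r < heap_len && heap[r] < heap[m]) m = r;
        if (m == i) break;
        heap_swap(&heap[i], &heap[m]);
        i = m;
    }
    return top;
}

/* What the timer ISR would do: dispatch every entry that is due, then return
 * the next firing time to program, or UINT32_MAX if the queue is empty.
 * Whole keys compare by time first, so no masking is needed in the heap. */
uint32_t timer_run(uint32_t now, void (*dispatch)(uint32_t cause))
{
    while (heap_len > 0 && (heap[0] >> CAUSE_BITS) <= now)
        dispatch(heap_pop() & CAUSE_MASK);
    return heap_len ? (heap[0] >> CAUSE_BITS) : UINT32_MAX;
}

/* Example dispatcher: records causes in firing order. */
uint32_t fired[HEAP_MAX];
size_t   fired_count;
void log_cause(uint32_t cause)
{
    if (fired_count < HEAP_MAX)
        fired[fired_count++] = cause;
}
```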

As mentioned by others above, this is very much co-operative time-sharing: each invocation must complete within an acceptable number of cpu cycles, and they do not really run in parallel at all. It is very much an event-driven asynchronous model, which makes things easier if you approach the development from that angle (instead of thread/task/procedural angle).

Now, the only thing one really needs to be able to switch between tasks with the same privileges ("userspace threads") or coroutines, is to switch the stack and the register file. C does specify setjmp()/longjmp(), but the jmp_buf structure is opaque.

After recent experimentation (and using terminology where the 'bottom' of the stack is where the stack is empty, and 'top' is where new data is added, even for downwards-growing stacks), I've found that if each 'stack' actually consists of register file storage before/just beyond/outside the bottom of the stack, storing the register state when the stack is not in use, it is very compatible with freestanding C/C++. You get extremely lightweight same-privilege task switches, and you can use a pointer to the 'stack' (really, the register file storage of that stack) as the task identifier. Even yield() simplifies to storing the register state in the file before the bottom of the stack and looking for something else to run. It is not an RTOS, but it might be useful for bare-metal developers for cooperative multitasking and pre-emptive multitasking (if they write their own task scheduler), and also for educational purposes.

You can even add additional fields there, say one describing the priority of whatever the task is doing now (with peripheral accesses bounded by setting that to "critical, please don't yield"), without disabling interrupts. Then, the scheduler noting that the task is "critical", could simply set another flag, check that the task is not stuck, and let the task progress. When the next time slice elapses, or the task resets the priority to non-critical, it auto-yields due to the scheduler-set flag.
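The "critical, please don't yield" flag could work roughly like this. It is a sketch with hypothetical names; the stuck-task check mentioned above and the actual context switch are left out, and interrupts stay enabled throughout:

```c
#include <stdbool.h>

/* Hypothetical per-task flags, stored next to the saved register file. */
typedef struct {
    volatile bool critical;       /* task says "please don't switch now" */
    volatile bool yield_pending;  /* scheduler wanted to switch meanwhile */
} task_flags_t;

/* Scheduler side, at a time-slice boundary. Returns true if it may switch
 * away now; otherwise it records that a switch is owed and lets the task run. */
bool sched_may_switch(task_flags_t *f)
{
    if (f->critical) {
        f->yield_pending = true;
        return false;
    }
    return true;
}

/* Task side, on leaving the critical section. Returns true if the task
 * should now call its yield(), because the scheduler deferred a switch.
 * If a tick slips in between the two statements, the worst case is one
 * extra yield, which is harmless. */
bool task_end_critical(task_flags_t *f)
{
    f->critical = false;
    if (f->yield_pending) {
        f->yield_pending = false;
        return true;
    }
    return false;
}
```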

(You do need to replace some/most of newlib (the base C library you use), though. Which doesn't bother me, because I do want to replace it with something better; I have a topic about that here somewhere already.)

What has stopped me thus far from even starting this as a proper project, is stack safety/collisions. I really don't have any tools (except for hardware ones, like MMU or protection registers) to ensure tasks won't overflow their stacks... I just get very unsure when having to just "trust" code to behave well :-\

TC · « **Reply #114 on:** July 08, 2022, 10:10:10 pm »

Disclaimer... I haven't read every reply to this topic. But the posted question was about coding state machines.

I found this book to be an excellent resource on the topic...

Practical UML Statecharts in C/C++ - Miro Samek

rstofer · « **Reply #115 on:** July 08, 2022, 10:51:18 pm »

Quote from: Nominal Animal on July 08, 2022, 06:56:58 pm

What has stopped me thus far from even starting this as a proper project, is stack safety/collisions. I really don't have any tools (except for hardware ones, like MMU or protection registers) to ensure tasks won't overflow their stacks... I just get very unsure when having to just "trust" code to behave well :-\

FreeRTOS can do a stack check before it dispatches a task:

https://www.freertos.org/Stacks-and-stack-overflow-checking.html

rstofer · « **Reply #116 on:** July 08, 2022, 11:18:13 pm »

Quote from: tellurium on July 08, 2022, 10:47:06 am

Quote from: Siwastaja on July 08, 2022, 06:13:03 am
Everything just... works, running in parallel. Modules can be treated as "separate programs", and state machine is implemented as just functions, triggered by something - usually HW timer or other peripheral. No problem doing "long" ISRs - just make more important stuff higher priority, and pre-emption works.

I'd like to see a simple project, e.g. blinky with UART based control (e.g. to change blink intervals, blink counts, or pwm), implemented in 3 paradigms:
1. superloop
2. os
3. ISRs with priorities

Each approach should not use any external dependency and be as small and simple as possible.

While not exactly what you want, FreeRTOS comes with a LOT of ports and if the board is also 'mbed' compatible there are even more ports at mbed.org.

Try to pick a chip with an ARM NVIC (Nested Vectored Interrupt Controller) peripheral. Write your interrupt handlers as simple C functions.

Here is the code to set up the NVIC on an LPC1768:

Code:
NVIC_SetVector(SPI_IRQn, (uint32_t) &spi_slave_handler);
NVIC_SetPriority(SPI_IRQn, 1);
NVIC_EnableIRQ(SPI_IRQn);

Here is the beginning of 'spi_slave_handler()'. Note that it is an ordinary function with no special attributes.

Code:

void spi_slave_handler(void) {
    unsigned char status;
    unsigned char value;
    ...

westfw · « **Reply #117 on:** July 09, 2022, 12:49:26 am »

Quote

I'd like to see a simple project, e.g. blinky with UART based control (e.g. to change blink intervals, blink counts, or pwm), implemented in 3 paradigms

The thing is, "simple projects" are really easy. Short, well-behaved "tasks" with no resource contention - no problem.
It's when they get more complicated that you run into trouble.

For instance, I recently had cause to look at the "micros()" implementation on the RP2040 in Arduino, using the Philhower core vs. the Arduino (MBed) core.
As it turns out, there's a hardware timer that counts microseconds in a 64bit register, so all the code really needs to do is return the low 32bits of that counter.
The Philhower core calls the SDK function to get the 64bit time, and returns 32bits of it. That's ... fair.

The MBed core has decided that "time" is a protected OS resource, and has multiple levels of "protections" to make sure there are no conflicts in accessing it. There may be additional protections to prevent conflicts between the two CPUs. As a result, micros() takes about 4 us to execute (on a 120MHz CPU)! That's AWFUL. And yet, the problem statement "clock values should be consistent across multiple tasks" isn't an obviously awful prerequisite...

brucehoult · « **Reply #118 on:** July 09, 2022, 03:58:39 am »

Quote from: westfw on July 09, 2022, 12:49:26 am

As a result, micros() takes about 4 us to execute (on a 120MHz CPU)! That's AWFUL.

Ugh! I just tried on an Uno, calling micros() 400 times in a loop and storing the result in an unsigned long array (i.e. 4 bytes).

It normally incremented by 4 us each time, but about once in 60-70 times got the same micros() result twice in a row.

Getting a pointer to the array outside the loop increased the "same twice in a row" frequency to once in 16 times. Storing only the low 8 bits of micros() increased the "same twice in a row" frequency to once in 7 times.

So, yeah, definitely being limited by the 16 MHz CPU and needing 4 instructions to move a 32 bit value around. The loop is taking a little less than 64 clock cycles to execute.

Trying exactly the same sketch on the HiFive1 (FE310 RISC-V from 2016), also running at 16 MHz (but a 32 bit CPU), the result of micros() changes by 1 unit every call, except once in 16 calls the same value is returned twice. So the loop takes very slightly less than 1 us to execute -- 15 clock cycles in fact.

With the clock changed to 256 MHz, 376 out of 400 calls to micros() return the same value as the previous call. The micros() value increments by only 23 in 400 calls, so the loop executes in 57.5 ns or 15 clock cycles.
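For reference, the loop timings above follow from simple arithmetic on the post's own numbers (the 256 MHz case divides the 23 µs advance by 400 calls, as the post does):

$$
\begin{aligned}
\text{Uno @ 16 MHz:}\quad & 4\ \mu\text{s} \times 16\ \text{MHz} = 64\ \text{cycles (upper bound per loop)}\\
\text{FE310 @ 16 MHz:}\quad & \tfrac{15}{16}\ \mu\text{s} \times 16\ \text{MHz} = 15\ \text{cycles}\\
\text{FE310 @ 256 MHz:}\quad & \tfrac{23\ \mu\text{s}}{400} \approx 57.5\ \text{ns},\qquad 57.5\ \text{ns} \times 256\ \text{MHz} \approx 14.7 \approx 15\ \text{cycles}
\end{aligned}
$$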

The loop calling micros() looks like:

Code:

2040015c:       64040993                addi    s3,s0,1600  // note: s0 was already gp-2020, the address of a[]
20400160:       81c18a13                addi    s4,gp,-2020

20400164:       28f1                    jal     20400240 <micros>
20400166:       00aa2023                sw      a0,0(s4)
2040016a:       0a11                    addi    s4,s4,4
2040016c:       ff3a1ce3                bne     s4,s3,20400164 <loop+0x6c>

And micros() looks like:

Code:

20400240 <micros>:
20400240:       b8002573                csrr    a0,mcycleh
20400244:       b00027f3                csrr    a5,mcycle
20400248:       b8002773                csrr    a4,mcycleh
2040024c:       fee51ae3                bne     a0,a4,20400240 <micros>
20400250:       01851293                slli    t0,a0,0x18
20400254:       0087d313                srli    t1,a5,0x8
20400258:       0062e533                or      a0,t0,t1
2040025c:       8082                    ret

So that gets the 64 bit cycle count, divides it by 256, and returns the lower 32 bits: the hi 24 bits of the lo word, shifted right by 8, combined with the lo 8 bits of the hi word, shifted left by 24 (0x18).
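The same two steps in C: a hi/lo/hi read of a 64-bit counter exposed as two 32-bit halves, and the shift-and-or that divides by 256. The pointer parameters are hypothetical stand-ins for the `mcycleh`/`mcycle` CSR reads, and dividing by 256 yields microseconds only if the core runs at 256 MHz:

```c
#include <stdint.h>

/* Read a 64-bit counter whose halves can only be read one at a time.
 * If the high half changed between the two reads, the low half wrapped
 * in between, so retry (the bne back to <micros> in the listing). */
uint64_t read_counter64(const volatile uint32_t *hi, const volatile uint32_t *lo)
{
    uint32_t h1, l, h2;
    do {
        h1 = *hi;
        l  = *lo;
        h2 = *hi;
    } while (h1 != h2);
    return ((uint64_t)h1 << 32) | l;
}

/* Low 32 bits of (hi:lo) / 256 without 64-bit arithmetic:
 * the low 8 bits of hi become the top byte, lo supplies the other 24 bits.
 * This is the slli 0x18 / srli 0x8 / or sequence. */
uint32_t cycles_to_us_div256(uint32_t hi, uint32_t lo)
{
    return (hi << 24) | (lo >> 8);
}
```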

In total, 12 instructions in 15 cycles because the three CSR reads take 2 cycles each.

My Arduino code (identical for Uno and HiFive1):

Code:

#define N 400
unsigned long a[N];

void setup() {
  pinMode(LED_BUILTIN, OUTPUT);
  Serial.begin(115200);
  delay(100);
  Serial.println("Starting");
}

int first = 1;

void loop() {
  if (first){
    Serial.println("first");
    unsigned long *p = a;
    int dups = 0;
    for (int i=0; i<N; ++i) p[i] = micros();
    for (int i=0; i<N; ++i){
      Serial.println((unsigned long)p[i]);
      if (i>0 && p[i] == p[i-1]){
        Serial.println("====");
        ++dups; 
      }
    }
    Serial.print("Dups = ");
    Serial.println(dups);
    first = 0;
  }
  digitalWrite(LED_BUILTIN, HIGH);   // turn the LED on (HIGH is the voltage level)
  delay(1000);                       // wait for a second
  digitalWrite(LED_BUILTIN, LOW);    // turn the LED off by making the voltage LOW
  delay(1000);                       // wait for a second
}

westfw · « **Reply #119 on:** July 09, 2022, 08:58:25 am »

Quote

I just tried on an Uno ...
It normally incremented by 4 us each time, but about once in 60-70 times got the same micros() result twice in a row.

On an Uno, the micros() function only has a resolution of 4us. So the function itself (which does 32-bit math on the timer0 overflow count and the timer register) is taking slightly less than 4us to execute. 60-odd clocks including the loop and store doesn't seem unreasonable.

Code:

unsigned long micros() {
        unsigned long m;
        uint8_t oldSREG = SREG, t;
 3b8:   3f b7           in      r19, 0x3f       ; 63
        
        cli();
 3ba:   f8 94           cli
        m = timer0_overflow_count;
 3bc:   80 91 39 01     lds     r24, 0x0139     ;  <timer0_overflow_count>
 3c0:   90 91 3a 01     lds     r25, 0x013A     ;  <timer0_overflow_count+0x1>
 3c4:   a0 91 3b 01     lds     r26, 0x013B     ;  <timer0_overflow_count+0x2>
 3c8:   b0 91 3c 01     lds     r27, 0x013C     ;  <timer0_overflow_count+0x3>
        t = TCNT0;
 3cc:   26 b5           in      r18, 0x26       ; 38
        if ((TIFR0 & _BV(TOV0)) && (t < 255))
 3ce:   a8 9b           sbis    0x15, 0 ; 21
 3d0:   05 c0           rjmp    .+10            ; 0x3dc <micros+0x24>
 3d2:   2f 3f           cpi     r18, 0xFF       ; 255
 3d4:   19 f0           breq    .+6             ; 0x3dc <micros+0x24>
                m++;
 3d6:   01 96           adiw    r24, 0x01       ; 1
 3d8:   a1 1d           adc     r26, r1
 3da:   b1 1d           adc     r27, r1
        SREG = oldSREG;
 3dc:   3f bf           out     0x3f, r19       ; 63
        return ((m << 8) + t) * (64 / clockCyclesPerMicrosecond());
 3de:   ba 2f           mov     r27, r26
 3e0:   a9 2f           mov     r26, r25
 3e2:   98 2f           mov     r25, r24
 3e4:   88 27           eor     r24, r24
 3e6:   bc 01           movw    r22, r24
 3e8:   cd 01           movw    r24, r26
 3ea:   62 0f           add     r22, r18
 3ec:   71 1d           adc     r23, r1
 3ee:   81 1d           adc     r24, r1
 3f0:   91 1d           adc     r25, r1
 3f2:   42 e0           ldi     r20, 0x02       ; 2
 3f4:   66 0f           add     r22, r22
 3f6:   77 1f           adc     r23, r23
 3f8:   88 1f           adc     r24, r24
 3fa:   99 1f           adc     r25, r25
 3fc:   4a 95           dec     r20
 3fe:   d1 f7           brne    .-12            ; 0x3f4 <micros+0x3c>
}
 400:   08 95           ret

For the same functionality to take the same time on a 32bit processor with a hardware counter counting in the right units is disgraceful!
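For comparison, the AVR arithmetic above restated as plain C, with the hardware registers passed in as hypothetical parameters so it runs on a host. Timer0 in the Arduino core counts every 64 CPU cycles (prescaler 64), which is where the 4 us resolution at 16 MHz comes from:

```c
#include <stdbool.h>
#include <stdint.h>

/* overflows     - timer0 overflow count maintained by the overflow ISR
 * tcnt          - current TCNT0 value (0..255)
 * tov_pending   - TOV0 still set: an overflow the ISR has not counted yet
 * cycles_per_us - clockCyclesPerMicrosecond(), 16 on a 16 MHz Uno */
uint32_t avr_micros(uint32_t overflows, uint8_t tcnt, bool tov_pending,
                    uint32_t cycles_per_us)
{
    if (tov_pending && tcnt < 255)
        overflows++;              /* account for the not-yet-serviced wrap */
    /* 256 counts per overflow, each count worth 64 / cycles_per_us us */
    return ((overflows << 8) + tcnt) * (64 / cycles_per_us);
}
```

The real function also disables interrupts around the reads so the overflow count and TCNT0 are sampled consistently; that part has no host equivalent here.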

Nominal Animal · « **Reply #120 on:** July 09, 2022, 08:59:37 am »

Quote from: rstofer on July 08, 2022, 10:51:18 pm

Quote from: Nominal Animal on July 08, 2022, 06:56:58 pm
What has stopped me thus far from even starting this as a proper project, is stack safety/collisions. I really don't have any tools (except for hardware ones, like MMU or protection registers) to ensure tasks won't overflow their stacks... I just get very unsure when having to just "trust" code to behave well :-\
FreeRTOS can do a stack check before it dispatches a task:

So can I, but 1) how much is enough stack space, and 2) that's at only a point in the task time slice.

Making hand-wavy guesses about how much stack is actually needed feels... icky. I can instrument the stack (fill it with 0xdeadbeef) and provide functions to check how far into the stack things were touched. It shows that I'm used to having an MMU (and virtual memory)!
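Stack painting as described could look like this minimal sketch for a downward-growing stack (function names are hypothetical). It only measures the high-water mark after the fact; it does not prevent an overflow, and a pushed word that happens to equal the fill pattern would be miscounted:

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xDEADBEEFu

/* Paint the whole stack region before the task first runs. */
void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++)
        base[i] = STACK_FILL;
}

/* The stack grows down from base[words - 1] toward base[0], so the words
 * never touched are a run of fill pattern at the low end. Returns the
 * number of words that were used at some point (the high-water mark). */
size_t stack_used_words(const uint32_t *base, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && base[untouched] == STACK_FILL)
        untouched++;
    return words - untouched;
}
```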

brucehoult · « **Reply #121 on:** July 09, 2022, 09:18:42 am »

Quote from: westfw on July 09, 2022, 08:58:25 am

Quote
I just tried on an Uno ...
It normally incremented by 4 us each time, but about once in 60-70 times got the same micros() result twice in a row.
On an Uno, the micros() function only has a resolution of 4us. So the function itself (which does 32-bit math on the timer0 overflow count and the timer register) is taking slightly less than 4us to execute.

That's ... exactly what I said.

Quote

60-odd clocks including the loop and store doesn't seem unreasonable.

As I said, it needs four instructions to manipulate a 32 bit value, so yes that seems reasonable, for an 8 bit CPU at 16 MHz.

Quote

For the same functionality to take the same time on a 32bit processor with a hardware counter counting in the right units is disgraceful!

Depends on the clock rate.

Taking a little under 1 µs (four times faster) on a 32 bit processor (E31) that is running at the same 16 MHz as the AVR seems also fine.

Taking 4 µs on a 32 bit processor running at 120 MHz would indeed be awful. 480 clock cycles. Hard to see how you could even do that.

The E31, as noted, scales perfectly from 15/16 µs at 16 MHz to 15/256 µs at 256 MHz.

Nominal Animal · « **Reply #122 on:** July 09, 2022, 05:44:18 pm »

Quote from: evb149 on July 09, 2022, 04:06:21 pm

One can often enable some kinds of static as well as dynamic / instrumented analysis / checking to look for stack overflows instead of or in addition to whatever kinds of stack safety may be able to be garnered by use of the MMU or memory / page / whatever protection mechanisms that may exist:

True. Because I usually work in a hosted environment (a full OS with MMU and virtual memory), I didn't know GCC supports -fstack-limit-symbol=sym, which I can probably synthesize. Thanks for pointing it out! Now I have no excuses left...

Welp, I think I next need to do some extensive testing to find out which makes more sense on ARM: making each stack a power-of-two size aligned to that size (so that clearing the low n bits of the current stack pointer, i.e. masking it with ~(2ⁿ-1), yields the smallest allowed stack address), or reserving a register for this (via -ffixed-reg -fstack-limit-register=reg; the register then acts as both the stack limit and a base address, minus a compile-time constant, of the task structure containing the register file). It definitely sounds intriguing, and the latter seems much more doable; but again, one must recompile all libraries, including the HAL and the C library one uses, with those flags specified for this to work.
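For the power-of-two option, the limit computation is a single mask. This sketch assumes 4 KiB stacks, each aligned to its own size (`STACK_SHIFT` and the function name are illustrative choices):

```c
#include <stdint.h>

#define STACK_SHIFT 12u                          /* 4 KiB stacks (assumption) */
#define STACK_SIZE  ((uintptr_t)1 << STACK_SHIFT)

/* Lowest allowed address of the stack that `sp` points into. Valid while
 * sp lies in [limit, limit + STACK_SIZE). An empty full-descending stack has
 * sp == limit + STACK_SIZE, which would mask to the NEXT block, so keep the
 * top few bytes of each block reserved (or special-case that value). */
uintptr_t stack_limit_from_sp(uintptr_t sp)
{
    return sp & ~(STACK_SIZE - 1);
}
```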

Quote from: evb149 on July 09, 2022, 04:36:20 pm

Speaking of FSMs, RTOS, embedded systems architecture, et al.
What kinds of useful things / tools are people here using or are familiar with for related architecture / design pattern / framework / design / implementation etc. relating to system design and elaboration?

Graphviz has become indispensable for me. The way it just eats human-readable, easily machine generated text, and spits out graphs, directed graphs, flowcharts, and so on, has made a big difference for me. I use it both in verification/debugging/unit testing, as well as in the actual planning stage. (Plus, since it is just plain text, you can easily support it in your preferred markdown language.)
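To illustrate the "easily machine generated" point, here is a minimal C sketch that dumps an FSM transition table as a Graphviz digraph. The fsm_transition type and fsm_write_dot() are hypothetical names for this sketch; to stay short, it rejects names containing quotes or backslashes instead of escaping them.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical transition-table row for an FSM. */
typedef struct {
    const char *from;   /* source state name */
    const char *event;  /* triggering event, used as the edge label */
    const char *to;     /* destination state name */
} fsm_transition;

/* Returns 1 if s can be placed inside a DOT double-quoted ID as-is. */
static int dot_safe(const char *s)
{
    return s != NULL && strpbrk(s, "\"\\") == NULL;
}

/* Writes the table as a Graphviz digraph. Returns 0 on success, -1 on error. */
static int fsm_write_dot(FILE *out, const char *graph_name,
                         const fsm_transition *t, size_t n)
{
    if (!dot_safe(graph_name))
        return -1;
    if (fprintf(out, "digraph \"%s\" {\n", graph_name) < 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (!dot_safe(t[i].from) || !dot_safe(t[i].event) || !dot_safe(t[i].to))
            return -1;
        if (fprintf(out, "  \"%s\" -> \"%s\" [label=\"%s\"];\n",
                    t[i].from, t[i].to, t[i].event) < 0)
            return -1;
    }
    return fprintf(out, "}\n") < 0 ? -1 : 0;
}
```

Write the output to a file and render it with e.g. `dot -Tsvg fsm.dot -o fsm.svg`, or compare the generated text directly in unit tests.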

SiliconWizard · « **Reply #123 on:** July 09, 2022, 06:03:58 pm »

Graphviz is good, but of course its layout is fully automatic (which is also a benefit), so it doesn't really allow placing anything by hand. If you want something more flexible, there is yEd: https://www.yworks.com/products/yed

I use it when I need something that can be edited manually. It also supports a number of automatic placement algorithms, but you can then arrange things further by hand. It's free, though not open source.

westfw · « **Reply #124 on:** July 10, 2022, 08:57:58 am »

Quote

Taking 4 µs on a 32 bit processor running at 120 MHz would indeed be awful. 480 clock cycles. Hard to see how you could even do that.

Here it is in all its ugliness!
Call chain, where (crit) and (core_crit) mark critical sections:
micros()->(crit)elapsed_time->(crit)slicetime->(crit)ticker_read_us->initialize/(core_crit)/update_present_time->math
Code:
10004484 <micros>:

unsigned long micros() {
10004484: b507 push {r0, r1, r2, lr}
return timer.elapsed_time().count();
10004486: 4903 ldr r1, [pc, #12] ; (10004494 <micros+0x10>)
10004488: 4668 mov r0, sp
1000448a: f002 fc80 bl 10006d8e <mbed::TimerBase::elapsed_time() const>
}
1000448e: 9800 ldr r0, [sp, #0]
10004490: bd0e pop {r1, r2, r3, pc}
10004492: 46c0 nop ; (mov r8, r8)
10004494: 20000fc8 .word 0x20000fc8

---------

10006d8e <mbed::TimerBase::elapsed_time() const>:
10006d8e: b530 push {r4, r5, lr}
10006d90: 000d movs r5, r1
10006d92: b085 sub sp, #20
10006d94: 0004 movs r4, r0
10006d96: a803 add r0, sp, #12
10006d98: f002 f820 bl 10008ddc <mbed::CriticalSectionLock::CriticalSectionLock()>
10006d9c: 0029 movs r1, r5
10006d9e: 4668 mov r0, sp
10006da0: f7ff ffda bl 10006d58 <mbed::TimerBase::slicetime() const>
10006da4: 68a8 ldr r0, [r5, #8]
10006da6: 68e9 ldr r1, [r5, #12]
10006da8: 9a00 ldr r2, [sp, #0]
10006daa: 9b01 ldr r3, [sp, #4]
10006dac: 1812 adds r2, r2, r0
10006dae: 414b adcs r3, r1
10006db0: a803 add r0, sp, #12
10006db2: 6022 str r2, [r4, #0]
10006db4: 6063 str r3, [r4, #4]
10006db6: f002 f817 bl 10008de8 <mbed::CriticalSectionLock::~CriticalSectionLock()>
10006dba: 0020 movs r0, r4
10006dbc: b005 add sp, #20
10006dbe: bd30 pop {r4, r5, pc}

-------

10006d58 <mbed::TimerBase::slicetime() const>:
10006d58: b537 push {r0, r1, r2, r4, r5, lr}
10006d5a: 0004 movs r4, r0
10006d5c: a801 add r0, sp, #4
10006d5e: 000d movs r5, r1
10006d60: f002 f83c bl 10008ddc <mbed::CriticalSectionLock::CriticalSectionLock()>
10006d64: 2300 movs r3, #0
10006d66: 2200 movs r2, #0
10006d68: 6022 str r2, [r4, #0]
10006d6a: 6063 str r3, [r4, #4]
10006d6c: 7d6b ldrb r3, [r5, #21]
10006d6e: 2b00 cmp r3, #0
10006d70: d008 beq.n 10006d84 <mbed::TimerBase::slicetime() const+0x2c>
10006d72: 6928 ldr r0, [r5, #16]
10006d74: f001 fff0 bl 10008d58 <ticker_read_us>
10006d78: 682a ldr r2, [r5, #0]
10006d7a: 686b ldr r3, [r5, #4]
10006d7c: 1a80 subs r0, r0, r2
10006d7e: 4199 sbcs r1, r3
10006d80: 6020 str r0, [r4, #0]
10006d82: 6061 str r1, [r4, #4]
10006d84: a801 add r0, sp, #4
10006d86: f002 f82f bl 10008de8 <mbed::CriticalSectionLock::~CriticalSectionLock()>
10006d8a: 0020 movs r0, r4
10006d8c: bd3e pop {r1, r2, r3, r4, r5, pc}

------

10008d58 <ticker_read_us>:
10008d58: b570 push {r4, r5, r6, lr}
10008d5a: 0004 movs r4, r0
10008d5c: f7ff ff2e bl 10008bbc <initialize>
10008d60: f000 fc24 bl 100095ac <core_util_critical_section_enter>
10008d64: 0020 movs r0, r4
10008d66: f7ff fe41 bl 100089ec <update_present_time>
10008d6a: 6863 ldr r3, [r4, #4]
10008d6c: 6a9c ldr r4, [r3, #40] ; 0x28
10008d6e: 6add ldr r5, [r3, #44] ; 0x2c
10008d70: f000 fc32 bl 100095d8 <core_util_critical_section_exit>
10008d74: 0029 movs r1, r5
10008d76: 0020 movs r0, r4
10008d78: bd70 pop {r4, r5, r6, pc}

-------

100089ec <update_present_time>:
100089ec: b5f7 push {r0, r1, r2, r4, r5, r6, r7, lr}
100089ee: 6846 ldr r6, [r0, #4]
100089f0: 0033 movs r3, r6
100089f2: 3332 adds r3, #50 ; 0x32
100089f4: 781c ldrb r4, [r3, #0]
100089f6: 2c00 cmp r4, #0
100089f8: d132 bne.n 10008a60 <update_present_time+0x74>
100089fa: 6803 ldr r3, [r0, #0]
100089fc: 685b ldr r3, [r3, #4]
100089fe: 4798 blx r3
10008a00: 6a32 ldr r2, [r6, #32]
10008a02: 0003 movs r3, r0
10008a04: 4282 cmp r2, r0
10008a06: d02b beq.n 10008a60 <update_present_time+0x74>
10008a08: 1a82 subs r2, r0, r2
10008a0a: 6930 ldr r0, [r6, #16]
10008a0c: 6233 str r3, [r6, #32]
10008a0e: 4010 ands r0, r2
10008a10: 2233 movs r2, #51 ; 0x33
10008a12: 56b2 ldrsb r2, [r6, r2]
10008a14: 2a00 cmp r2, #0
10008a16: db24 blt.n 10008a62 <update_present_time+0x76>
10008a18: 0021 movs r1, r4
10008a1a: f7f7 feb1 bl 10000780 <__aeabi_llsl>
10008a1e: 2734 movs r7, #52 ; 0x34
10008a20: 57f7 ldrsb r7, [r6, r7]
10008a22: 0004 movs r4, r0
10008a24: 000d movs r5, r1
10008a26: 2f00 cmp r7, #0
10008a28: d014 beq.n 10008a54 <update_present_time+0x68>
10008a2a: 2100 movs r1, #0
10008a2c: 6a70 ldr r0, [r6, #36] ; 0x24
10008a2e: 1824 adds r4, r4, r0
10008a30: 414d adcs r5, r1
10008a32: 9400 str r4, [sp, #0]
10008a34: 9501 str r5, [sp, #4]
10008a36: 428f cmp r7, r1
10008a38: db1b blt.n 10008a72 <update_present_time+0x86>
10008a3a: 003a movs r2, r7
10008a3c: 0020 movs r0, r4
10008a3e: 0029 movs r1, r5
10008a40: f7f7 fe92 bl 10000768 <__aeabi_llsr>
10008a44: 003a movs r2, r7
10008a46: 0004 movs r4, r0
10008a48: 000d movs r5, r1
10008a4a: f7f7 fe99 bl 10000780 <__aeabi_llsl>
10008a4e: 9b00 ldr r3, [sp, #0]
10008a50: 1a18 subs r0, r3, r0
10008a52: 6270 str r0, [r6, #36] ; 0x24
10008a54: 6ab2 ldr r2, [r6, #40] ; 0x28
10008a56: 6af3 ldr r3, [r6, #44] ; 0x2c
10008a58: 1912 adds r2, r2, r4
10008a5a: 416b adcs r3, r5
10008a5c: 62b2 str r2, [r6, #40] ; 0x28
10008a5e: 62f3 str r3, [r6, #44] ; 0x2c
10008a60: bdf7 pop {r0, r1, r2, r4, r5, r6, r7, pc}
10008a62: 68b1 ldr r1, [r6, #8]
10008a64: 0002 movs r2, r0
10008a66: 0023 movs r3, r4
10008a68: 0008 movs r0, r1
10008a6a: 0021 movs r1, r4
10008a6c: f7f7 ff34 bl 100008d8 <__aeabi_lmul>
10008a70: e7d5 b.n 10008a1e <update_present_time+0x32>
10008a72: 68f7 ldr r7, [r6, #12]
10008a74: 000b movs r3, r1
10008a76: 9800 ldr r0, [sp, #0]
10008a78: 9901 ldr r1, [sp, #4]
10008a7a: 003a movs r2, r7
10008a7c: f7f7 ff0c bl 10000898 <__aeabi_uldivmod>
10008a80: 4347 muls r7, r0
10008a82: 9b00 ldr r3, [sp, #0]
10008a84: 0004 movs r4, r0
10008a86: 1bdf subs r7, r3, r7
10008a88: 000d movs r5, r1
10008a8a: 6277 str r7, [r6, #36] ; 0x24
10008a8c: e7e2 b.n 10008a54 <update_present_time+0x68>

-------

10008ddc <mbed::CriticalSectionLock::CriticalSectionLock()>:
10008ddc: b510 push {r4, lr}
10008dde: 0004 movs r4, r0
10008de0: f000 fbe4 bl 100095ac <core_util_critical_section_enter>
10008de4: 0020 movs r0, r4
10008de6: bd10 pop {r4, pc}

Disassembly of section .text._ZN4mbed19CriticalSectionLockD2Ev:

10008de8 <mbed::CriticalSectionLock::~CriticalSectionLock()>:
10008de8: b510 push {r4, lr}
10008dea: 0004 movs r4, r0
10008dec: f000 fbf4 bl 100095d8 <core_util_critical_section_exit>
10008df0: 0020 movs r0, r4
10008df2: bd10 pop {r4, pc}

-------

100095ac <core_util_critical_section_enter>:
100095ac: b510 push {r4, lr}
100095ae: f7ff f979 bl 100088a4 <hal_critical_section_enter>
100095b2: 4a06 ldr r2, [pc, #24] ; (100095cc <core_util_critical_section_enter+0x20>)
100095b4: 6813 ldr r3, [r2, #0]
100095b6: 1c59 adds r1, r3, #1
100095b8: d104 bne.n 100095c4 <core_util_critical_section_enter+0x18>
100095ba: 223f movs r2, #63 ; 0x3f
100095bc: 4904 ldr r1, [pc, #16] ; (100095d0 <core_util_critical_section_enter+0x24>)
100095be: 4805 ldr r0, [pc, #20] ; (100095d4 <core_util_critical_section_enter+0x28>)
100095c0: f7ff ff46 bl 10009450 <mbed_assert_internal>
100095c4: 3301 adds r3, #1
100095c6: 6013 str r3, [r2, #0]
100095c8: bd10 pop {r4, pc}
100095ca: 46c0 nop ; (mov r8, r8)
100095cc: 2000aa4c .word 0x2000aa4c
100095d0: 10013a2f .word 0x10013a2f
100095d4: 10013a59 .word 0x10013a59

Disassembly of section .text.core_util_critical_section_exit:

100095d8 <core_util_critical_section_exit>:
100095d8: 4a05 ldr r2, [pc, #20] ; (100095f0 <core_util_critical_section_exit+0x18>)
100095da: b510 push {r4, lr}
100095dc: 6813 ldr r3, [r2, #0]
100095de: 2b00 cmp r3, #0
100095e0: d005 beq.n 100095ee <core_util_critical_section_exit+0x16>
100095e2: 3b01 subs r3, #1
100095e4: 6013 str r3, [r2, #0]
100095e6: 2b00 cmp r3, #0
100095e8: d101 bne.n 100095ee <core_util_critical_section_exit+0x16>
100095ea: f7ff f96f bl 100088cc <hal_critical_section_exit>
100095ee: bd10 pop {r4, pc}
100095f0: 2000aa4c .word 0x2000aa4c

-------

100088a4 <hal_critical_section_enter>:
100088a4: b510 push {r4, lr}
100088a6: f3ef 8010 mrs r0, PRIMASK
100088aa: b672 cpsid i
100088ac: 4a05 ldr r2, [pc, #20] ; (100088c4 <hal_critical_section_enter+0x20>)
100088ae: 7813 ldrb r3, [r2, #0]
100088b0: 2b00 cmp r3, #0
100088b2: d105 bne.n 100088c0 <hal_critical_section_enter+0x1c>
100088b4: 2101 movs r1, #1
100088b6: 000c movs r4, r1
100088b8: 4b03 ldr r3, [pc, #12] ; (100088c8 <hal_critical_section_enter+0x24>)
100088ba: 4384 bics r4, r0
100088bc: 701c strb r4, [r3, #0]
100088be: 7011 strb r1, [r2, #0]
100088c0: bd10 pop {r4, pc}
100088c2: 46c0 nop ; (mov r8, r8)
100088c4: 2000adff .word 0x2000adff
100088c8: 2000adfa .word 0x2000adfa

Disassembly of section .text.hal_critical_section_exit:

100088cc <hal_critical_section_exit>:
100088cc: b510 push {r4, lr}
100088ce: f3ef 8210 mrs r2, PRIMASK
100088d2: 2301 movs r3, #1
100088d4: 4393 bics r3, r2
100088d6: d004 beq.n 100088e2 <hal_critical_section_exit+0x16>
100088d8: 2236 movs r2, #54 ; 0x36
100088da: 4906 ldr r1, [pc, #24] ; (100088f4 <hal_critical_section_exit+0x28>)
100088dc: 4806 ldr r0, [pc, #24] ; (100088f8 <hal_critical_section_exit+0x2c>)
100088de: f000 fdb7 bl 10009450 <mbed_assert_internal>
100088e2: 4a06 ldr r2, [pc, #24] ; (100088fc <hal_critical_section_exit+0x30>)
100088e4: 7013 strb r3, [r2, #0]
100088e6: 4b06 ldr r3, [pc, #24] ; (10008900 <hal_critical_section_exit+0x34>)
100088e8: 781b ldrb r3, [r3, #0]
100088ea: 2b00 cmp r3, #0
100088ec: d000 beq.n 100088f0 <hal_critical_section_exit+0x24>
100088ee: b662 cpsie i
100088f0: bd10 pop {r4, pc}
100088f2: 46c0 nop ; (mov r8, r8)
100088f4: 10013620 .word 0x10013620
100088f8: 10013651 .word 0x10013651
100088fc: 2000adff .word 0x2000adff
10008900: 2000adfa .word 0x2000adfa
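For contrast, a minimal sketch (not mbed's or any vendor's API) of how short micros() can be when the hardware already provides a free-running 64-bit counter in microseconds, split across two 32-bit registers. fake_timer_hi and fake_timer_lo are hypothetical stand-ins: plain variables so the sketch runs on a host, where real hardware would use volatile MMIO registers from the vendor header. The 64-bit read uses the common hi/lo/hi retry sequence instead of a critical section.

```c
#include <stdint.h>

/* Hypothetical stand-ins for a free-running 64-bit microsecond counter
   exposed as two 32-bit registers (high and low word). */
static volatile uint32_t fake_timer_hi;
static volatile uint32_t fake_timer_lo;

/* Arduino-style 32-bit micros(): the low word already counts microseconds,
   so no scaling, 64-bit library math, or critical section is needed. */
static inline uint32_t micros_sketch(void)
{
    return fake_timer_lo;
}

/* Full 64-bit timestamp without disabling interrupts: if the high word
   changed while the low word was read (the low word wrapped and carried),
   read both again. */
static inline uint64_t micros64_sketch(void)
{
    uint32_t hi, lo, hi2;
    do {
        hi  = fake_timer_hi;
        lo  = fake_timer_lo;
        hi2 = fake_timer_hi;
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
}
```

The retry loop relies on the hardware carrying into the high word at the same moment the low word wraps; it normally runs once, and at most twice per wrap.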

