A few points for a painless process. With experience, any of these can be violated for a good reason, but I wouldn't recommend it.
1) Don't try to push microcontrollers to do literally cycle-accurate things in software. That's not their intended purpose.
Peripherals exist for this purpose, so take a careful look at which peripherals you need, including a sanity check that everything you need is actually available at the same time (pin mapping, and DMA channel mapping if that applies). Sometimes an external IC is what you need; the problem has likely already been solved for you.
Sometimes you cannot solve the cycle-accurate problem with on-chip peripherals; then look at microcontrollers that provide programmable glue logic, or go for an external CPLD. Or why not an XMOS product.
2) Don't confuse understanding timing constraints with being cycle-accurate. They are two different things, in completely different leagues.
Often the real-world specification is something like "do this thing within 1 us of a trigger signal" and/or "calculate thing A every 10 us with at most 2 us of jitter, and thing B every 100 us with at most 20 us of jitter". Neither requires cycle-accurate timing, not even close. It's enough to give interrupt A higher priority than interrupt B, make sure interrupt A completes in a few microseconds, and never disable interrupts for more than about 1 us, which is trivial if you follow the sane practice of only disabling interrupts for short atomic operations, typically a few cycles.
Note that on a typical mid-cost ARM MCU running at around 100 MHz, the interrupt latency is around 120 ns, with perhaps 50 ns of jitter. For example, if entry to a lower-priority interrupt is preempted by a higher-priority one, the CPU is "wise" enough to reuse the stack-push work already done, lowering the latency in that particular case and increasing the jitter. Some will argue it's a good feature, others will see it as a bad one. But the jitter is still in the tens of nanoseconds! Does that matter? I have yet to see a case where it does; all high-speed communication is standardized and runs through the peripherals. Peripherals designed to handle high-priority, high-accuracy safety signals, such as motor-controller peripherals, implement their own low-latency, low-jitter input channels directly into the peripheral (like a motor controller / power bridge overcurrent signal).
3) Leave margin. A lot of margin. The less margin you have, the closer to cycle-accurate your analysis needs to be, and the slower and more demanding the design becomes.
But CPU performance is cheap! An MCU that costs $1 more may save you $10000 in development, so you need to sell more than 10000 units before the cheaper part breaks even, and that only counts the direct employment cost. In the real world you see non-optimized solutions (i.e., "excess" CPU performance) even in large-scale mass products, because relying on optimization, with no option of adding software features later without a PCB respin, also stretches the calendar time in ways that a larger engineering budget cannot always fix; the product launch cannot wait. Hence, they throw more CPU power at it from the start, even if it costs more in components.
Margin allows you to leave minor details out of your timing estimate. Margin saves the day when you forgot about something you need to cover and have to add new, unexpected code.
4) Do actual projects to gain experience. With some experience, you can hand-wave a timing estimate accurate to, say, -50%/+100% in minutes. That's valuable: upon seeing the expected specification, you can instantly give the project a red or green (or maybe yellow) light, and estimate whether you need a $1 MCU, a $100 FPGA, or maybe a $10 application CPU to pull it off.
5) When #3 fails you, or more likely, you fail to follow #3: save an actual project with low-level optimization once or twice, so you learn that there usually are ways out of trouble even when you run out of margin. Don't make it a habit, though; you will likely miss the deadline, and regularly needing low-level manual optimization says something about your margins.