Author Topic: Best MCU for the lowest input capture interrupt latency  (Read 18087 times)


Online tggzzz

  • Super Contributor
  • ***
  • Posts: 19510
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Best MCU for the lowest input capture interrupt latency
« Reply #50 on: April 04, 2022, 05:44:32 pm »
The problem is that devices - whatever you call them - are becoming ever more capable and complex, at rapidly decreasing cost. The consequence is that nothing is black and white (it is shades of gray), the boundaries are becoming increasingly blurred, and there is no simple distinction that can be made.

I think this is a fair point. I think there is a 'crossover' market right now, but I don't think it really changes the definitions.

To me an MCU is a single chip computer, embedded within a system and not acting as a fully-fledged general purpose computer. The Zynq devices are used like that, e.g. in the Red Pitaya oscilloscope (and other embedded systems and, I believe, other scopes). Hence to me they constitute an MCU, albeit a very powerful one. They are also capable of much more.

What you're describing sounds more like a SOC to me. A Zynq can be used as a 'microcontroller' to talk to the PL, but it's a stretch. There's too much associated with a Zynq that is more APU-like (DDR RAM with its vagaries, for example); although you certainly can try to use it as an MCU, I don't think that's its real purpose - it's aimed somewhat higher IMHO. SOCs can range from something like the Zynq, or embedded IP on an FPGA, all the way up to the Apple M1, although that last doesn't really fit your 'not acting as a fully-fledged general purpose computer' clause.

And there's another term, SoC, that is ill-defined :)

Having the A9-ARM(s) operating very closely with the PL is standard usage. The design tools and examples clearly indicate that is the intention.

One way of using them is to have Linux running on one core, and an RTOS (or something simpler) on the other core.

Overall it is a fascinating, if unsurprising, development, and Xilinx appear to have executed it well.

Quote
If you look at how ST do it, they have their MP1 which has an MCU in it (Cortex M4) but it also has APU's (Cortex A7's) - I'd say the M4 was the MCU, and the A7's were not. The reason they include the M4 is basically the laundry list already discussed, where a predictable and deterministic device is required. The reason they include the A7's is for general-purpose computing and speed, but with determinism trade-offs. Anything 'A'-designated (including the Zynq) isn't really an MCU, at least IMHO.

I'd say the RP2040 was an MCU, I'd say the R-Pi was not. I'd say the PIC range, anything ARM-M-designated were MCU's, as are the XMOS chips. I'd say the SiFive SOCs (eg: U540) were not.

I'm not going to disagree, but I'll note that the mere existence of the various ranges of ARM cores indicates that everything is grey.

Quote
IMHO it's pretty simple: determinism is the defining factor of an MCU. If I know I can guarantee it'll take X clocks to service my interrupt under <condition> and I can set up my code to be in <condition>, then it's a microcontroller. If there's even a chance that this can't be done, it's not.
Shrug. I guess YMMV, but that's my $0.02.

That's internally consistent, but it raises the question of what degree of determinism is required.

To me an interesting system-level design decision is what to put in hardware and what in software. That needs an understanding of the boundaries and of how the boundaries are changing. And that means most distinctions between most processors are knowledge with a very short half-life. OTOH, being able to recognise the fundamental differences - and the lack of them - will stand the test of time.

I think I'll propose a rule-of-thumb. When comparing devices, the more letters/digits you have to use, the less interesting the distinctions become :)
« Last Edit: April 04, 2022, 05:50:08 pm by tggzzz »
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Online hans

  • Super Contributor
  • ***
  • Posts: 1641
  • Country: nl
Re: Best MCU for the lowest input capture interrupt latency
« Reply #51 on: April 04, 2022, 06:06:09 pm »
The problem is that devices - whatever you call them - are becoming ever more capable and complex, at rapidly decreasing cost. The consequence is that nothing is black and white (it is shades of gray), the boundaries are becoming increasingly blurred, and there is no simple distinction that can be made.

From that point of view, the key distinction is the definition of various terms - and neither of us has made our definition explicit.

To me an MCU is a single chip computer, embedded within a system and not acting as a fully-fledged general purpose computer. The Zynq devices are used like that, e.g. in the Red Pitaya oscilloscope (and other embedded systems and, I believe, other scopes). Hence to me they constitute an MCU, albeit a very powerful one. They are also capable of much more.

Similarly Z80s and their more integrated modern variants are also MCUs - but they are/were also general-purpose computers running general purpose operating systems. In my case I used Z80a+CP/M for cross-compiling C for an embedded Z80.

Now if you use MCU to mean something without caches or where caches can be disabled, that's one other valid definition of the term. But only one other.

I think the presence of caches is a BS indicator of MCU or MPU class, or of embedded or application cores. Caches are a side effect of another design decision. For example, FLASH accelerators are commonly used in modern MCUs, but are also notorious in worst-case execution time prediction (the upper bound of jitter), since they are effectively also caches. E.g. they may read 128-bit FLASH lines and serve them out in smaller chunks to the I-bus. This is fine for sequential access, but on random access (jumps) it will incur a small delay (wait states).
To make an accurate WCET prediction is hard. Another way is to assume a cache miss always occurs.

Is SOC an accurate term? I doubt it. I think that's more for a CPU with integrated peripherals that are secondary to running code - for example, on-board USB, Ethernet, etc.

Rather, I think that whether code runs from internal or external memory - only, optionally or primarily - is a decent indicator. Some MCUs/MPUs are getting into a 'crossover' region; e.g. a Cortex-M7 at 0.5GHz+ will run circles around desktop PCs from 25 years ago.

Running code from external ICs relies completely on caches to 'fix' performance. The external memory device is too slow, and we want to go faster. Likewise, modern PCs are also outright slow if you turn off speculative execution, even though it has caused many security problems in the last few years. We would be right back at MCU-level performance/MHz if we did. Many Cortex-A core variants distinguish themselves from MCUs by the presence of these features, along with out-of-order execution and superscalar designs. Out-of-order execution typically uses register renaming, i.e. the register file is effectively an L1, instead of a bunch of flipflops. These are quite clear design differences that make application processors distinct from microcontrollers, and that also, by design/choice, make them not very suitable for hard real-time systems with bounded system/interrupt latencies.
 
However, superscalar execution has just entered the Cortex-M7, and speculative and/or out-of-order execution will probably follow, so the 'crossover' region will blur even more in the future... more shades of gray.
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 19510
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Best MCU for the lowest input capture interrupt latency
« Reply #52 on: April 04, 2022, 07:00:28 pm »
To concentrate on a couple of your points...

To make an accurate WCET prediction is hard. Another way is to assume a cache miss always occurs.

Without the right architectural features, that is extremely difficult to predict. In practice people fall back on measuring and hoping they have come across the WCET. Or doing it in hardware.

Quote
However, superscalar execution has just entered the Cortex-M7, and speculative and/or out-of-order execution will probably follow, so the 'crossover' region will blur even more in the future... more shades of gray.

I didn't realise that, but it is just another example of why it is beneficial to concentrate on the fundamental basic features, and not to become obsessive about somewhat arbitrary names.
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14476
  • Country: fr
Re: Best MCU for the lowest input capture interrupt latency
« Reply #53 on: April 04, 2022, 08:03:18 pm »
The days when microcontrollers were 100% predictable timing-wise when executing code (and in a humanly-approachable way) are over, except for the vintage stuff and a very few exceptions.

Now, you can usually determine upper bounds for all timings, but even that can be tricky on modern MCUs, and the upper bounds you're going to determine are likely to leave you unimpressed. But if that meets your timing requirements, then that will be the way to go. For anything interrupt-based, you'll likely be on the order of a microsecond even on a fast MCU (and the faster it is, often the more complex the architecture, so it becomes difficult to get below that threshold). That's often more than good enough for real-time "tasks", as long as you're not trying to use your MCU as an FPGA.

But that's the reason why many MCUs these days come with hardware triggering, flexIO stuff, etc. That can fill the gap, because most often, when you need really very low latency, whatever needs to be done is pretty low-level.

I used cycle-counting per instruction on 8-bit PIC MCUs (and before that on old CPUs like the Z80, 8086, ...). I never did that on anything Cortex-based. It's certainly not what defines "MCUs" these days.
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 19510
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Best MCU for the lowest input capture interrupt latency
« Reply #54 on: April 04, 2022, 08:32:45 pm »
The days when microcontrollers were 100% predictable timing-wise when executing code (and in a humanly-approachable way) are over, except for the vintage stuff and a very few exceptions.

Yes, and beginners don't know the exceptions, nor why they are exceptions.

Quote
Now, you can usually determine upper bounds for all timings,

How? More specifically, how do you know your max figure is the upper bound?

Quote
but even that can be tricky on modern MCUs, and the upper bounds you're going to determine are likely to leave you unimpressed.

Exactly :( Beginners often don't realise the difference between mean and max.

Quote
But if that meets your timing requirements, then that will be the way to go.

Definitely

Quote
For anything interrupt-based, you'll likely be on the order of a microsecond even on a fast MCU (and the faster it is, often the more complex the architecture, so it becomes difficult to get below that threshold). That's often more than good enough for real-time "tasks", as long as you're not trying to use your MCU as an FPGA.

But that's the reason why many MCUs these days come with hardware triggering, flexIO stuff, etc. That can fill the gap, because most often, when you need really very low latency, whatever needs to be done is pretty low-level.

I used cycle-counting per instruction on 8-bit PIC MCUs (and before that on old CPUs like the Z80, 8086, ...). I never did that on anything Cortex-based. It's certainly not what defines "MCUs" these days.

Knowing when and why hardware is required is an essential skill. Ditto understanding the wide difference between mean and max.

Hand cycle counting always was a pain, even when practical.
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Online hans

  • Super Contributor
  • ***
  • Posts: 1641
  • Country: nl
Re: Best MCU for the lowest input capture interrupt latency
« Reply #55 on: April 05, 2022, 08:17:20 am »
You can very easily estimate a theoretical upper bound for the worst-case execution time of a system. Just assume nothing works: the pipeline gets flushed for each instruction, so a 3-stage CPU only executes 1 opcode per 3 cycles; all caches miss, so each memory access incurs the maximum penalty for that specific memory; etc. You will get an upper bound that is nowhere close to reality, but it's a theoretically 'correct' value for your assumptions.
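To make that concrete, here is a toy version of that calculation (a sketch; every cycle count below is a made-up placeholder, not a figure from any datasheet):

#include <stdint.h>

/* Hypothetical worst-case figures for some 3-stage MCU. */
#define PIPELINE_REFILL     3U  /* every instruction flushes the pipeline     */
#define FLASH_MISS_PENALTY  4U  /* every fetch misses the FLASH accelerator   */
#define RAM_PENALTY         2U  /* every data access pays maximum wait states */

/* Pessimistic WCET bound in cycles: each opcode pays a full pipeline
   refill plus a fetch stall; each load/store pays the maximum memory
   penalty. Nowhere close to reality, but provably an upper bound
   under these assumptions. */
static uint32_t wcet_bound(uint32_t n_instr, uint32_t n_mem)
{
    return n_instr * (PIPELINE_REFILL + FLASH_MISS_PENALTY)
         + n_mem   * RAM_PENALTY;
}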

But when theory needs to meet practice, are the assumptions realistic? The challenge then becomes finding a tight upper bound that is still valid when you do introduce design concepts like a pipelined CPU. This still has no relationship at all to the mean execution time of a routine. Mean is only a fun figure to have if you want to design a desktop program with a certain throughput aim. It's pretty much useless in the embedded space. After all, if we design a power supply to withstand certain limits, we are only interested in the min/max so it doesn't release the magic smoke. "Typical" values as an absolute maximum make as much sense as saying something is "more optimal".

Measuring WCETs is to some degree valid (though often with a large safety margin added on top, resulting in a non-tight bound), but remember that such a measurement is only worth as much as the stimuli that were applied during the test. You can't expect that repeating the exact same experiment will at some point yield a different result; if that happened, then some stimulus must have changed. As an extreme example: if you never tested the WCET while a nuclear bomb went off at distance x, then perhaps you don't have the WCET for all operating conditions of that system (unless the system is allowed to go down in the apocalypse, in which case you can abandon that test).

Perhaps you can see why computer science classifies algorithms and problems into things like P, NP and NP-hard. Sometimes you can know that a valid solution to a math problem exists, but finding that solution takes an extremely long time. Finding a more relaxed solution (a less tight upper bound) is then the more practical thing to do.
« Last Edit: April 05, 2022, 08:19:12 am by hans »
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8173
  • Country: fi
Re: Best MCU for the lowest input capture interrupt latency
« Reply #56 on: April 05, 2022, 08:19:13 am »
Interrupt latency on a Cortex-M7 at 400MHz is 30ns, and this includes automatic stacking of registers and fetching the vector address. With code in ITCM, the remaining source of jitter is logic synchronization - something that can't be avoided. Let's call it 1 clock cycle. So: average 30ns with 2.5ns of jitter. Given the easy-to-use interrupt priority system, heck, you can just use interrupts without blinking an eye! Not having ITCM/DTCM available and sharing the RAM with heavy DMA transfers? Just assume 100% DMA utilization: the bus arbiter gives half of the cycles to the CPU, so assume each load/store takes double the time. It's still blazing fast. So you have an if(condition) at the start of that critical ISR? OK, it might run a clock cycle or two faster on the second invocation - so your jitter is now up by 5ns! There is no magical way branch prediction would sometimes make it take longer than it takes on the first round, or cause a large difference. It's very fast even with the miss, and the miss isn't some super special occurrence; you will see it on the scope screen the first time you try it.

Now let's compare these numbers to the "simple" "good" old systems. A polling loop on a simple 8-bit AVR at 8MHz, alternating IO reads with compare and branch instructions, would be something like 5 clock cycles in length, with maybe 3 clock cycles of jitter, plus 1 unavoidable cycle from synchronization. So something like ~400ns average with another ~400ns of jitter. As you can see, I had a hard time coming up with an exact jitter number, so it's not easy to analyze. You thought you could write "cycle-accurate" code easily on a simple instruction set with predictable instruction timing, but in reality you didn't, because you still had to interface with the external world. And this applies to any xCORE as well: it can't magically predict when the external signal comes in.
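(For the record, the kind of loop I mean - a hypothetical avr-gcc sketch; the exact cycle counts depend on what the compiler emits:)

#include <avr/io.h>

/* Busy-wait until PD2 goes high. This compiles to roughly a
   sample-skip-branch sequence, so the input is only sampled once
   per loop pass - the loop length itself is the jitter. */
static void wait_for_pin(void)
{
    while (!(PIND & (1 << PD2)))
        ;
}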

Simple isn't always better. Identify what you really need. This sounds obvious, but if you need fast, then fast is fast. And MICROCONTROLLER cores, compared to APPLICATION cores, offer a fast worst case. They do it by having on-chip SRAM, with the higher-end ones partitioning the RAM into multiple interfaces, some only accessible by the CPU (with a predictable, exact 1-cycle latency). This is all blatantly obvious to everybody except guerilla marketing shills. Some uncertainty comes from the fact that the Cortex-M7 is sometimes even faster thanks to branch prediction. The result is a very fast worst case, and a tad better average case, with a few clock cycles of jitter. This is a big deal only if you really need to bit-bang an interface accurate to a few nanoseconds - which is extremely rare, given that the sheer performance lets you do the job much more easily if you can accept, say, 10-20ns of uncertainty. And you easily should be able to, if you were happy with the 8-bitters: their synchronization jitter alone was of the same order!

The actual reason we almost never count individual cycles on modern high-end microcontrollers (I have done it once in a project, and it's fully possible!) is not the difficulty of doing so, but the fact that there is no need. When the complexity went up, so did the performance. If the average case went up by 20x, then the worst case possibly went up by 10x. Gone are the days of having to resort to counting instruction cycles to bit-bang an interface; because interrupt entry now takes only the equivalent of what 0.5 clock cycles was on AVR/PIC, you can just make a timer generate interrupts and bit-bang the protocol there, and let everything else run in parallel. Oh, the joy of programming it.
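(An illustrative sketch of that pattern - CMSIS-style, with STM32-flavoured register names assumed rather than taken from any specific part:)

#include "stm32f7xx.h"  /* assumed device header */

static volatile uint32_t tx_word;  /* bits to send, LSB first */
static volatile int      tx_bits;

/* One protocol bit per timer tick: the timer sets the bit period,
   not hand-counted instruction cycles. */
void TIM2_IRQHandler(void)
{
    TIM2->SR = ~TIM_SR_UIF;                          /* clear the update flag */
    if (tx_bits > 0) {
        GPIOA->BSRR = (tx_word & 1u) ? (1u << 0)         /* PA0 high */
                                     : (1u << (0 + 16)); /* PA0 low  */
        tx_word >>= 1;
        tx_bits--;
    }
}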

And the differentiating factor between a Cortex-A running Linux and a Cortex-M running bare metal is exactly the predictability of worst-case timing. Claims about caches being relevant at all are made-up BS arguments, shown wrong countless times. Small sources of jitter do exist (M7 branch prediction; DMA and the CPU arbitrating RAM access; clock domain synchronization), but these are all just a few clock cycles, easily understood and dealt with. Most importantly, even if you don't understand every detail, the amount of uncertainty is still so small that it is trivially swamped by a very modest safety margin.

Compared to this, what is really slowing down beginners' attempts at timing-predictable code is overly complex software libraries and layers. Needless to say, for tight control of timing, you need to be in control of the code - and there is no limit to how slow and bloated it can get. A perfect example, which has nothing to do with MCUs getting more complex and everything to do with software-writing practices becoming bloated, is the horribly slow timing of Arduino's digitalWrite() and friends. You can totally have slow and uncertain timing by using bloated libraries, even on the simple and "predictable" MCUs.
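(The classic demonstration, on an AVR-based Uno where Arduino pin 13 is PB5 - a sketch, not a benchmark:)

void loop(void)
{
    digitalWrite(13, HIGH);  /* through the library: pin-to-port lookup,
                                timer/PWM checks, etc. - dozens of cycles */
    PORTB |= (1 << PB5);     /* direct write to the same physical pin:
                                a single sbi, 2 cycles */
}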
« Last Edit: April 05, 2022, 08:27:23 am by Siwastaja »
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 19510
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Best MCU for the lowest input capture interrupt latency
« Reply #57 on: April 05, 2022, 08:43:50 am »
Ignoring many valid points to concentrate on one...

Mean is only a fun figure to have if you want to design a desktop program with a certain throughput aim. It's pretty much useless in the embedded space.

As someone that is a strong advocate of processors that guarantee hard realtime operation, I'll make a contrary point from my own experience.

There are many applications which are soft realtime, i.e. where a deadline can be missed without it being a failure, provided that statistical guarantees are met. One example is the telephone system. Consider a cost-control PAYG application where a call should be terminated when the subscriber's credit falls to zero: such things are specified in terms of mean latencies, although I would prefer 95th percentile latencies.

Now, is that an embedded system? It depends on the scale at which you are looking. The computer is a small part of a large system, and is dedicated to a single function; it is not acting as a general purpose computer. From that angle, it is an embedded system :)

If you want others, consider the processor in a router. That's probably running a cut down linux, which can't (and doesn't need to) offer hard real time guarantees. Is it embedded?
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Offline T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21687
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Best MCU for the lowest input capture interrupt latency
« Reply #58 on: April 05, 2022, 10:09:22 am »
If you want others, consider the processor in a router. That's probably running a cut down linux, which can't (and doesn't need to) offer hard real time guarantees. Is it embedded?

Also a good illustration of hardware support, whether lightweight or heavy; many have some variety of hardware accelerator to switch packets automatically, handling the busywork that doesn't involve actually thinking about routing, deep packet inspection, etc. So the propagation delays can be quite stable indeed in the average case, with occasional bumps of a few ms if the CPU has to think about something. Every little bit adds up of course, but there are only a couple dozen routers between most nodes on the internet, so a few ms each is tolerable compared to the physical delay.

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 19510
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Best MCU for the lowest input capture interrupt latency
« Reply #59 on: April 05, 2022, 12:09:31 pm »
If you want others, consider the processor in a router. That's probably running a cut down linux, which can't (and doesn't need to) offer hard real time guarantees. Is it embedded?

Also a good illustration of hardware support, whether lightweight or heavy; many have some variety of hardware accelerator to switch packets automatically, handling the busywork that doesn't involve actually thinking about routing, deep packet inspection, etc. So the propagation delays can be quite stable indeed in the average case, with occasional bumps of a few ms if the CPU has to think about something. Every little bit adds up of course, but there are only a couple dozen routers between most nodes on the internet, so a few ms each is tolerable compared to the physical delay.

Tim

True.

There are other ways to minimise mean latency, when that is critical. The FinTech mob go to the lengths of coding everything - up to and including business trading rules - in FPGAs. Microseconds matter in the high frequency trading world.
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: Best MCU for the lowest input capture interrupt latency
« Reply #60 on: April 05, 2022, 02:26:41 pm »
*shrug* clock-cycle counting is indeed a thing. Whether it’s *your* thing is a different matter.

As part of a project, I wanted to provide 'memory apertures' on an old Atari XL/XE external bus, so that by writing to register-space in the 'external ROM' area of the memory map, I could configure an offset and length (a number of 256-byte pages) where memory I/O would be directed to external DRAM instead of internal RAM. Throw 64MB of RAM on the back, have several of these memory apertures (8 in this case), and there are a few fun tricks you can play.

OK, scene set: the Atari line had two signals that needed to be asserted to handle external memory (/MPD and /EXTSEL); these act to turn off access to the system RAM, leaving the external RAM to work unimpeded. They were supposed to be implemented in hardware logic plugged into the back of the machine, so from the point where the address is valid on the bus to the point where they are asserted low (if needed), you have ~40ns within the bus cycle according to the timing diagrams. In my case this part of the system had to:

- read the address off the bus
- compare with the memory ranges that indicate external access
- assert signals if required

All in a hard time budget of 40ns. It has to work every single time, or you get a memory-access failure. Oh, and since these are old boxes, you also need to take into account any delays for level-changing. Nanoseconds count.
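(Conceptually, each aperture is just a base/length compare performed on every bus cycle. A hypothetical C rendering of the compare-and-claim half of it, for clarity - the real thing ended up as FPGA logic, and all names and fields here are illustrative:)

#include <stdbool.h>
#include <stdint.h>

/* One aperture: a start page in the 6502 address space plus a
   window length in 256-byte pages. */
struct aperture {
    uint16_t base;
    uint16_t pages;
    bool     enabled;
};

/* Assert /MPD and /EXTSEL (claim the cycle) when the address falls
   inside any enabled aperture. */
static bool claim_cycle(const struct aperture ap[8], uint16_t addr)
{
    uint16_t page = addr >> 8;
    for (int i = 0; i < 8; i++)
        if (ap[i].enabled &&
            page >= ap[i].base && page < ap[i].base + ap[i].pages)
            return true;
    return false;
}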

Microcontrollers would be easier, so I looked at the STM32 (just about possible with polling) and at XMOS; the RP2040 with its PIO didn't really have enough pins. I considered the BeagleBone with its PRUs, but in the end went with an FPGA. A few tens of ns more and I'd have gone with a microcontroller. Anything running Linux wasn't in contention. Anything that couldn't guarantee a response latency, even if the CPU was doing something critical, wasn't an option.

And for the record, I wanted the CPU to do more than just provide memory - this was an expansion-slot system, and it had its own things to worry about. Going with polling meant extra complexity when managing the rest of the tasks I wanted the system to do. The timing was a lot more relaxed past that critical 40ns period, but it *had* to get that right.

« Last Edit: April 05, 2022, 04:15:51 pm by SpacedCowboy »
 

Offline T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21687
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Best MCU for the lowest input capture interrupt latency
« Reply #61 on: April 05, 2022, 03:43:03 pm »
I guess it's kind of interesting that, as simple as parallel buses are, and as powerful as MCUs get -- doing the one with the other is still one of the most difficult tasks for them to perform.

I suppose that kind of goes all the way up to the root of computing, and what it means: to be able to compute anything in finite time -- that is, to do it at all, not necessarily within a given time constraint!  MCUs break up complex operations into simpler bits repeated (and varied) many times over: the complexity is spread out over time, so that the hardware (the instant-to-instant computation) can be simpler.  So it's difficult to do much of anything in only a handful of clock cycles, when that's the case.  At least without very specially written instructions (like the absolute diversity of extensions, SIMD and whatnot, available on the most advanced CPUs) -- or support hardware.

Whereas combinatorial logic can potentially be very flat and wide, but is a pain to do much of anything with -- it must be configured, or built that way from scratch, and then it can only do that one narrow thing (or a few related things, given a bit more flexibility in the logic design).

And it also speaks to the purpose of such things.  MCUs are very egoistic: they aren't made to cooperate with other CPUs (or DMAs or other multi-master things) on a single shared bus; they're made to stand alone and do their one thing very well.  And acting as a parallel-bus receiver isn't one of them, heh.  Other than that, if you need more power, just get a better one - don't waste time trying to communicate between many.  Whereas the CPUs of old were often made that way: including parallel-bus interface logic right on the chip, obviously, but also not being hard to use on a multi-master bus, or in a multi-CPU system, etc.

So, being able to pull off such a function, with an MCU, is certainly a testament to its raw speed. :)

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline jemangedeslolosTopic starter

  • Frequent Contributor
  • **
  • Posts: 386
  • Country: fr
Re: Best MCU for the lowest input capture interrupt latency
« Reply #62 on: April 05, 2022, 03:57:55 pm »
Very well. Now switch to the trigger mode (OCTRIG = 1) and create an interrupt which clears the TRIGSTAT bit after the pulse is produced.

I haven't used dsPICs for a few years, so I'll have to dig a little deeper.
One first remark: you are apparently trying to trigger OC1 from IC1? Are you sure IC1 is properly configured to begin with?

Hello,

I just want to take my biggest hammer, and hit as hard as I can on my board ( with my head between them )  |O

I tried to set a flag in the IC1 interrupt handler and then clear OC1CON2.TRIGSTAT in the main loop, but it didn't work.
I tried to clear OC1CON2.TRIGSTAT inside the ISR, but that didn't work either.
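For reference, the ISR attempt looked roughly like this (an XC16-style sketch with register names as in the dsPIC33 headers, not my exact code):

#include <xc.h>

void __attribute__((__interrupt__, no_auto_psv)) _OC1Interrupt(void)
{
    IFS0bits.OC1IF = 0;        /* acknowledge the OC1 interrupt              */
    OC1CON2bits.TRIGSTAT = 0;  /* disarm, hoping the next IC1 edge re-triggers */
}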

In the datasheet, TRIGSTAT is cleared when IC1RS = OC1TMR if TRIGMODE = 1, so I'm not sure if I have to do this at all (but with TRIGMODE = 0, I have no more success).

I'm not 100% sure (since it didn't work), but I assume IC1 is configured correctly:
if I toggle a pin inside the IC1 handler, I get a pulse synced to the rising edge from the waveform generator,
and in synchronized mode there is a pulse on OC1 synced to IC1.

In trigger and single-shot mode, the OC1 interrupt fires only once.
Setting OC1CON1.OCM = 0b010 again, whether inside the ISR or in the main loop, changes nothing.

In synchronize mode, the OC1 interrupt fires on every falling edge of the OC1 pin, so it is working as expected in that mode.

I must be forgetting something, but I don't know what, and I can't find anything in the datasheet or the reference manual.
It's a shame, because in synchronized mode I only have around 80ns between the IC1 and OC1 rising edges.

If you have any ideas it is very welcome  :'(
« Last Edit: April 05, 2022, 04:01:43 pm by jemangedeslolos »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14476
  • Country: fr
Re: Best MCU for the lowest input capture interrupt latency
« Reply #63 on: April 05, 2022, 06:10:43 pm »
In the end, I'm not completely sure what you have tried here. In trigger mode, have you gotten anything on the OC output pin? Nothing? Or a pulse just once and never again?

To move forward, I would suggest setting the trigger source as INT1 or INT2 and try that first. (But you need to have access to one of the corresponding pins.)
 

Online Sal Ammoniac

  • Super Contributor
  • ***
  • Posts: 1673
  • Country: us
Re: Best MCU for the lowest input capture interrupt latency
« Reply #64 on: April 05, 2022, 09:20:36 pm »
*shrug* clock-cycle counting is indeed a thing. Whether it’s *your* thing is a different matter.

It's probably not a thing with most embedded systems, which don't need that level of performance. The vast majority of MCUs are used in devices that either spend most of their time waiting around for user input, or control slowly changing systems.

Some people have a distorted idea of how fast an embedded system needs to respond to external events. Sure, there are cases where it really does matter, but those aren't typical. And we're lulled by the vast amount of resources (CPU, RAM, flash) we have now compared to what we had in the past. When I started doing embedded decades ago, we had microprocessors running at just a few MHz and just a few KB of RAM and EPROM, yet we were able to do a lot even with those restrictions. Contrast that to the several hundred MHz, 64-256 KB of RAM, and a MB or two of flash we have in MCUs today. And if a fast MCU can't handle it, we have the option of using an FPGA or an ASIC.

Another historical comparison: the Apollo guidance computer used on the moon landing missions. This was certainly an embedded application, and it controlled a spacecraft hurtling towards the moon at thousands of MPH in real-time with little input from the astronauts. The AGC ran at perhaps 1 MHz and had 2K words of RAM and 36K of fixed storage, and most of the guidance and autopilot code was written in an interpreted language that ran even slower than native code. Cycle times were measured in milliseconds, not nanoseconds, yet the whole thing worked fine for its intended purpose. It was able to do this because even with a 100 millisecond control loop it was able to maintain control of the vehicle. Even if they had a modern MCU back then, they probably wouldn't have run the control loop any faster (because it wasn't necessary).
Complexity is the number-one enemy of high-quality code.
 

Online uer166

  • Frequent Contributor
  • **
  • Posts: 893
  • Country: us
Re: Best MCU for the lowest input capture interrupt latency
« Reply #65 on: April 05, 2022, 10:03:52 pm »
*shrug* clock-cycle counting is indeed a thing. Whether it’s *your* thing is a different matter.

It's probably not a thing with most embedded systems, which don't need that level of performance. The vast majority of MCUs are used in devices that either spend most of their time waiting around for user input, or control slowly changing systems.

Some people have a distorted idea of how fast an embedded system needs to respond to external events. Sure, there are cases where it really does matter, but those aren't typical. And we're lulled by the vast amount of resources (CPU, RAM, flash) we have now compared to what we had in the past. When I started doing embedded decades ago, we had microprocessors running at just a few MHz and just a few KB of RAM and EPROM, yet we were able to do a lot even with those restrictions. Contrast that to the several hundred MHz, 64-256 KB of RAM, and a MB or two of flash we have in MCUs today. And if a fast MCU can't handle it, we have the option of using an FPGA or an ASIC.

Another historical comparison: the Apollo guidance computer used on the moon landing missions. This was certainly an embedded application, and it controlled a spacecraft hurtling towards the moon at thousands of MPH in real-time with little input from the astronauts. The AGC ran at perhaps 1 MHz and had 2K words of RAM and 36K of fixed storage, and most of the guidance and autopilot code was written in an interpreted language that ran even slower than native code. Cycle times were measured in milliseconds, not nanoseconds, yet the whole thing worked fine for its intended purpose. It was able to do this because even with a 100 millisecond control loop it was able to maintain control of the vehicle. Even if they had a modern MCU back then, they probably wouldn't have run the control loop any faster (because it wasn't necessary).

Not only that, but everywhere it did matter (i.e. integrating the accelerometers and reading the gyros), it was done using hardware counters. Just like today: if you need some special ultra-fast response time in a current-mode controller, you'd use a dedicated peripheral, and not try to shoehorn general-purpose computing onto it by default. Nothing has fundamentally changed in the last 50 years, and I don't understand people pushing for xCORE or whatever newfangled thing comes up. The AGC wasn't even realtime, in the sense that it had a large number of peripherals stealing bus cycles, so the control code's execution had variable timing, and was done at a 1-second cadence.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Best MCU for the lowest input capture interrupt latency
« Reply #66 on: April 05, 2022, 10:14:46 pm »
I just want to take my biggest hammer, and hit as hard as I can on my board ( with my head between them )  |O

If you do things methodically, in little steps, the hammer may not be needed.

You already have the synchronization working, as you have shown in your screenshot. Hang on to that. Now you only need to switch to trigger mode. So keep everything in the setup you already have and make only one change: switch to trigger mode (OCTRIG = 1). Once you do this, the input pulse will reset and trigger the timer, which will then continue to run as-is; further pulses will not affect the timer. So when you start it there will be no signal at first, but once you get even a single pulse the signal should appear and pulses should be generated forever. Did you get to this point?
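In code, the single change is just this (dsPIC33-style names, for illustration):

/* Keep the working synchronized-mode configuration exactly as it is,
   and flip only this bit before enabling the module: */
OC1CON2bits.OCTRIG = 1;  /* trigger mode: the first sync event arms the timer */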
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Best MCU for the lowest input capture interrupt latency
« Reply #67 on: April 05, 2022, 10:21:48 pm »
Interrupt latency on a Cortex-M7 at 400MHz is 30ns, and this includes automatic stacking of registers and fetching the vector address. With code in ITCM, the remaining source of jitter is logic synchronization - something that can't be avoided. Let's call it 1 clock cycle. So: average 30ns with 2.5ns of jitter.

Are these theoretical, or have you tried this in the real world? If theoretical, I suggest you try to measure the reaction time and jitter and post the results. It'll be interesting to see.
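One simple way to do that: feed a pulse generator into an external-interrupt pin, raise another pin first thing in the ISR, and read the edge-to-edge delay and its spread off a scope. A sketch, with STM32-flavoured CMSIS names assumed for illustration:

void EXTI0_IRQHandler(void)
{
    GPIOA->BSRR = (1u << 1);         /* PA1 high as early as possible    */
    EXTI->PR = EXTI_PR_PR0;          /* clear pending (write 1 to clear) */
    GPIOA->BSRR = (1u << (1 + 16));  /* PA1 low again                    */
}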
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26907
  • Country: nl
    • NCT Developments
Re: Best MCU for the lowest input capture interrupt latency
« Reply #68 on: April 05, 2022, 11:01:28 pm »
*shrug* clock-cycle counting is indeed a thing. Whether it’s *your* thing is a different matter.

It's probably not a thing with most embedded systems, which don't need that level of performance. The vast majority of MCUs are used in devices that either spend most of their time waiting around for user input, or control slowly changing systems.

Some people have a distorted idea of how fast an embedded system needs to respond to external events. Sure, there are cases where it really does matter, but those aren't typical.
I fully agree! And often it is a much better idea to let hardware deal with critical timing instead of trying to shoehorn it into software, especially when you take into consideration that a lot of embedded code runs many parallel tasks nowadays.

Quote
Another historical comparison: the Apollo guidance computer used on the moon landing missions. This was certainly an embedded application, and it controlled a spacecraft hurtling towards the moon at thousands of MPH in real-time with little input from the astronauts. The AGC ran at perhaps 1 MHz and had 2K words of RAM and 36K of fixed storage, and most of the guidance and autopilot code was written in an interpreted language that ran even slower than native code. Cycle times were measured in milliseconds, not nanoseconds, yet the whole thing worked fine for its intended purpose. It was able to do this because even with a 100 millisecond control loop it was able to maintain control of the vehicle. Even if they had a modern MCU back then, they probably wouldn't have run the control loop any faster (because it wasn't necessary).
That is an excellent example!

I have another one: some people loathe the use of soft floating point. In one of my more recent projects I used just that, from within an interrupt that needs to update a realtime process with similar cycle times, and with more than enough processing power to spare. Hardware took care of the time-critical stuff, like updating the DAC at precise intervals. If a solution serves its purpose, it is good. Job done.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 19510
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Best MCU for the lowest input capture interrupt latency
« Reply #69 on: April 05, 2022, 11:42:53 pm »
Another historical comparison: the Apollo guidance computer used on the moon landing missions. This was certainly an embedded application, and it controlled a spacecraft hurtling towards the moon at thousands of MPH in real-time with little input from the astronauts. The AGC ran at perhaps 1 MHz and had 2K words of RAM and 36K of fixed storage, and most of the guidance and autopilot code was written in an interpreted language that ran even slower than native code. Cycle times were measured in milliseconds, not nanoseconds, yet the whole thing worked fine for its intended purpose. It was able to do this because even with a 100 millisecond control loop it was able to maintain control of the vehicle. Even if they had a modern MCU back then, they probably wouldn't have run the control loop any faster (because it wasn't necessary).

Er, not quite.

You should look up "program alarm 1202".

Good defensive system design saved the day; the 25-year-old controllers made the right call to ignore that alarm.
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: Best MCU for the lowest input capture interrupt latency
« Reply #70 on: April 06, 2022, 12:26:21 am »
Just to be clear here, I thought my first line "Whether it’s *your* thing is a different matter" was indicative, but ...

I'm not saying that all microcontrollers ought to be used in this way, my point was that it's a valid use-case for them. If something is predictable and has sufficiently deterministic latency to provide a solution, then it's a candidate - in my book at least. There may be *better* candidates, and if so, choose that. Sometimes the criterion is price, and specialist hardware doesn't win out over general purpose software....

In my case, the microcontroller was *almost* there, and would have been cheaper if it were. The FPGA needs more support chips - level-changers (non-H STMs are 5V-tolerant), reset supervisors, multiple sequenced voltage rails, etc. - and availability meant a BGA part, which is more expensive in manufacturing and more complex to design for.

It's good to have choices.
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8173
  • Country: fi
Re: Best MCU for the lowest input capture interrupt latency
« Reply #71 on: April 06, 2022, 05:58:25 am »
Are these theoretical, or have you tried this in the real world? If theoretical, I suggest you try to measure the reaction time and jitter and post the results. It'll be interesting to see.

The good thing about theory is that it matches reality. Otherwise, the theory is faulty.

ARM Cortex-M7 interrupt latency is guaranteed by ARM, by design. There is no need for me to test it.

But the thing is, we are not discussing any full real-world problem here, just limiting ourselves to the CPU. In reality we need inputs and outputs, and their synchronization to (and across) clock domains will be a source of delay and jitter. But this applies to any design. It would not be trivial to write a custom FPGA processing core that could run at 400MHz either, for example.

This is also what makes the claim of a "jitterless" and "predictable" xCORE funny. It is limited by the exact same mechanism as all other MCUs and even FPGAs: the nature of synchronous digital logic and its synchronization requirements. The jitter is of the same order of magnitude as on an ARM CPU.
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 19510
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Best MCU for the lowest input capture interrupt latency
« Reply #72 on: April 06, 2022, 07:28:20 am »
This is also what makes the claim of a "jitterless" and "predictable" xCORE funny. It is limited by the exact same mechanism as all other MCUs and even FPGAs: the nature of synchronous digital logic and its synchronization requirements. The jitter is of the same order of magnitude as on an ARM CPU.

Some ARM CPUs, not others. Some programs running on ARMs, not others.

Do any of the ARM toolchains state how long it takes for the program to get from this instruction to that instruction?

Of course there is irreducible jitter related to the clock period and clock jitter. I didn't think that needed to be stated explicitly.
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Offline jemangedeslolosTopic starter

  • Frequent Contributor
  • **
  • Posts: 386
  • Country: fr
Re: Best MCU for the lowest input capture interrupt latency
« Reply #73 on: April 06, 2022, 09:16:42 am »
In the end, I'm not completely sure what you have tried here. In trigger mode, have you gotten anything on the OC output pin? Nothing? Or a pulse just once and never again?

To move forward, I would suggest setting the trigger source as INT1 or INT2 and try that first. (But you need to have access to one of the corresponding pins.)

Hello, I will try with INT1 instead of IC1. As this MCU has peripheral pin select, I can assign INTx to almost any pin  :)

I just want to take my biggest hammer, and hit as hard as I can on my board ( with my head between them )  |O

If you do things methodically, in little steps, the hammer may not be needed.

You already have the synchronization working, as you have shown in your screenshot. Hang on to that. Now you only need to switch to trigger mode. So keep everything in the setup you already have and make only one change: switch to trigger mode (OCTRIG = 1). Once you do this, the input pulse will reset and trigger the timer, which will then continue to run as-is; further pulses will not affect the timer. So when you start it there will be no signal at first, but once you get even a single pulse the signal should appear and pulses should be generated forever. Did you get to this point?

Hello,
If I only switch OCTRIG to 1 (and keep OC1CON1.OCM at 0b101), I have nothing on OC1 (high level) until I enable pulses on the IC1 pin.
And then I have continuous 20us pulses on OC1, just like you said :-+

With OCTRIG = 1 and OC1CON1.OCM = 0b010, I have nothing on OC1 (low level), with or without pulses on the IC1 pin.
The OC1 interrupt fires only once and never fires again. It is the same with OC1CON1.TRIGMODE = 0 or OC1CON1.TRIGMODE = 1.

In both cases I tried to clear the OC1CON2.TRIGSTAT flag inside the OC1 ISR, but without any success  :'(

 

Offline jemangedeslolosTopic starter

  • Frequent Contributor
  • **
  • Posts: 386
  • Country: fr
Re: Best MCU for the lowest input capture interrupt latency
« Reply #74 on: April 06, 2022, 02:33:34 pm »
Hello again,

I have the exact same result with INT1 (external interrupt 1) instead of IC1 (input capture 1).
With OC1CON2.OCTRIG = 0 the results are as expected, but I can't get it working with OC1CON2.OCTRIG = 1:
the OC1 interrupt fires only once and never fires again.

The interesting thing is that the latency is lower with the external interrupt: I measure only around 50ns between the input and output rising edges, instead of 80ns with input capture.
« Last Edit: April 07, 2022, 07:39:45 am by jemangedeslolos »
 

