XMOS xCORE is actually a pretty interesting architecture that packs a lot of power.
It is just that their range of applications is a bit niche: they are too big, complex and power-hungry to be an MCU replacement, while they don't quite have the throughput of FPGAs. They tend to shine the most when you need weird interfaces at moderate data rates. This could be a fitting use for one.
But you can also get around the cache timing issues on modern MCUs. Most of them are ARM cores that can execute code from anywhere, so you can just put an ISR into the RAM closest to the CPU; that RAM typically runs at the same clock speed as the CPU, so there is no latency variability. But yeah, not when you have to watch for multiple pulses; timer peripherals are there for a reason.
I do want something between Xcore and a regular MCU.
I haven't used XMOS since around the 2nd-generation chips (around when they renamed threads to 'cores' and cores to 'tiles' for marketing reasons).
Back then they were a bit limited by having only 64KB of RAM per core (including instruction memory) and had basically no peripherals, so a lot of those impressive MIPS numbers got wasted on bit-banging things like UART, SPI, I2C, etc. Admittedly they are incredibly good at 'bitbanging' protocols due to the super-tight timing control they provide, but it would be nice to have some standard peripherals for the common stuff. It is also really cool how you can link multiple chips together and they act as 'one big computer'.
I do want something between Xcore and a regular MCU.
Yep, I've been messing with the Pi Pico and I quite liked it.
Didn't go too deep into it, but one thing I hated is how much dicking around is needed to get a C++ IDE with debugging working on Windows. I ended up using Visual Studio with VisualGDB and a spare Pico flashed into being an SWD interface dongle. They seemed to be pushing the MicroPython thing too much.
Never used the PIO functionality, but I was certainly really impressed with what others have done with it.
I'm considering a particular application and would very much appreciate some thoughts and/or recommendations.
I've used a particular MCU on a number of projects, and am now looking at a slight variation on a previous design, for which I'm considering the addition of an FPGA. I need to measure (frequency, duty cycle, etc.) a number of input signals (12+), and perform deterministic timestamping of each input at a specified rate (e.g. 100Hz). I've achieved something similar using the MCU with high-priority interrupts in the past, but the overhead of servicing these interrupts, as well as the jitter of the timestamping function, make this approach unsuitable in this particular application. I also looked at using the Timer Counter (TC) functionality of the MCU, but despite having 12 16-bit counters, there are only three available external inputs, which obviously doesn't suffice.

What timing resolution and bit depth do you need to capture? How many edges/second?
You mention 100Hz, which suggests modest capture rates? A HW capture module could be unloaded by SW just fine at those rates.
100Hz is just an example in this case. I might see requirements orders of magnitude faster and slower than that depending on the particular implementation. Even if the logging rate is relatively slow, the input signals themselves might be relatively fast, so what I'm trying to avoid is having numerous (12 to 20+) inputs, all with 'relatively' high frequency (kHz+), having to be measured in ISRs within a single-core MCU. Each rising and falling edge would need to be timestamped (if measuring duty cycle), which means multiple ISRs per channel, which can introduce a not insignificant amount of jitter in servicing the interrupts, timestamping, etc., as well as in servicing of the other MCU tasks (i.e. ADC result processing, non-DMA memory management, data staging, logging and transmission, etc.).

IMHO this approach is just horribly wrong. Actually in two ways: A) never use interrupts on inputs that can change at will, because they can and will lock up your application due to an unforeseen circumstance (which can be as simple as a loose wire, a nearby noise source, etc.). B) don't use interrupts when they interfere with each other; find a common denominator and combine multiple interrupts into one. IOW: use a single timer interrupt in which you first sample the GPIO port (preferably having all the input pins on the same port) and then process the state of the inputs as needed. Keep in mind that after the GPIO input register (containing ALL input levels in a single read operation) has been read, there is no source of jitter that can be added to the signal, so processing time doesn't matter as long as you finish before the timer interval has passed. On a dual-core MCU running at several hundred MHz (NXP's RT1000 series, for example) you should be able to achieve time resolution in the sub-microsecond region without much effort.
If interrupt jitter gives too much uncertainty, an option is to look for a controller that supports timer-triggered DMA transfers (one address to a double buffer), which lets you process the captured data 'en bloc' without any interrupt overhead either.
100Hz is just an example in this case. I might see requirements orders of magnitude faster and slower than that depending on the particular implementation. Even if the logging rate is relatively slow, the input signals themselves might be relatively fast, so what I'm trying to avoid is having numerous (12 to 20+) inputs, all with 'relatively' high frequency (kHz+) having to be measured in ISRs within a single core MCU. Each rising and falling edge would need to be timestamped (if measuring duty cycle), which means multiple ISRs per channel, which can introduce a not insignificant amount of jitter in servicing the interrupts, timestamping, etc., as well as in servicing of the other MCU tasks (i.e. ADC result processing, non-DMA memory management, data staging, logging and transmission, etc.).
...
The Parallax is actually a really interesting option. I've not used them before, but on paper they would certainly handle this particular task. Having not used that platform before, the assembly language is certainly a bit foreign!
There is, IMNSHO, a better alternative if you are prepared to accept single source ICs that you can buy at Digikey: the XMOS xCORE.
Overall my summary is that it slots into the area where traditional microcontrollers are being pushed to their limits but where FPGAs are overkill. You get the benefits of fast software iteration and timings guaranteed by design, not by measurement.
Why so good? The hardware and software are designed from the ground up for hard realtime operation:
- each i/o port has its own (easily accessible) timer and FPGA-like simple/strobed/clocked/SERDES with pattern matching
- dedicate one (of 32) core and task to each I/O or group of I/Os
- software typically looks like the obvious
- initialise
- loop forever, waiting for input or timeout or message then instantly resume and do the processing
You mention ISRs as introducing jitter, but omit to mention caches. The xCORE devices have neither. They do have up to 32 cores per chip and 4000 MIPS per chip (expandable).
Each IO port has its own timers, so guaranteeing output on a specific clock cycle, and measuring the specific clock cycle on which input arrived.
The equivalent of an RTOS (comms, scheduling) is implemented in silicon.
There is a long and solid theoretical and practical pedigree for the hardware (Transputer 1980s) and software (CSP/Occam 1970s).
The development tools (command line and IDE) will inspect the optimised code to determine exactly how many clock cycles it will take to get from here to there. None of this "measure and hope you have spotted the worst case" rubbish!
When I used it, I found it stunningly easy, having the first iteration of code working within a day. There were no hardware or software surprises, no errata sheets.
For a remarkably information dense flyer on the basic architecture, see https://www.xmos.ai/download/xCORE-Architecture-Flyer(1.3).pdf
For a glimpse of the software best/worst case timing calculations, see
Thanks for your input, I really appreciate it.
I have actually considered this use case as well. I could set aside an entire 32 bit IO port for these digital input signals, allowing for a single read of the entire port (either in a software interrupt or via DMA transfer to buffer as you've suggested).
One clarification I do have though. If the port read is driven by a timer interrupt, is the only way to guarantee that a given channel hasn't toggled between timer interrupts to select a timer frequency that is guaranteed to be faster than the input signals?
Thanks for your input, it's greatly appreciated.
The option of having additional 'front-end' MCUs, feeding into the larger, more capable MCU is actually quite an interesting one.
This is an excellent post, thank you for taking the time to share your knowledge. I haven't considered the xCORE platform before, but it sounds like it would be of considerable interest to me as I look to move from the MCU domain into FPGAs.
Thanks for your input, it's greatly appreciated.
The option of having additional 'front-end' MCUs, feeding into the larger, more capable MCU is actually quite an interesting one.
I have done it that way before with extra microcontrollers of the same type handling the display, keyboard, real time I/O, whatever. Microcontrollers are so inexpensive that it can make sense, and if they are of the same type, then you already have the development system and tools.
It seems like an awful waste to replace a bunch of discrete logic with an entire microcontroller, but the economics favor it.
For understanding how to program the devices, see https://www.xmos.ai/download/XMOS-Programming-Guide-(documentation)(F).pdf It is remarkably easy to read, and introduces the key concepts gently, but not too gently. Basically it assumes you know how to program in C, and want to find out the ways in which this ecosystem is an improvement.
For understanding the i/o ports' features and capabilities, see https://www.xmos.ai/download/Introduction-to-XS1-ports(3).pdf
I will contend that even if you don't use xCORE, using the high-level concepts will improve the structure of your designs. After all, they have been around since the 1970s (CAR Hoare's Communicating Sequential Processes), and some of the concepts keep reappearing in various processors and languages (e.g. Go, most recently).
I have actually considered this use case as well. I could set aside an entire 32 bit IO port for these digital input signals, allowing for a single read of the entire port (either in a software interrupt or via DMA transfer to buffer as you've suggested).
One clarification I do have though. If the port read is driven by a timer interrupt, is the only way to guarantee that a given channel hasn't toggled between timer interrupts to select a timer frequency that is guaranteed to be faster than the input signals?

If you timer sample, then yes, your aperture is the timer rate.
Typically, to reduce memory needed, you check for any change, and only save (Pin+Time) pairs on a change.
However, you do not have to use a timer tick; if your inputs are low-rate and you need better edge precision, you can use a pin-change / port-match interrupt that then captures the timer.
The granularity is the timer, but there is a rate-window/blanking window which is the time to service that interrupt.
Pin-Change interrupts can even capture very narrow pulses by inference.
As always, the hardware is relatively easy and cheap. Hell's teeth, doesn't a very simple MCU cost less than a 555 timer plus passives now?!
The software is yet to catch up. xC is one good starting point, but we need more.
But if programmable logic is being added to a microcontroller, then that requires "software" as well, and it will be developed in a completely different programming environment. At least if the same series of microcontrollers is used, the software development environment can be the same for all of them.
until softies can think in fine grained parallel
Actually FPGAs are not fine-grained parallel, they are just parallel, because FPGAs don't execute anything; they wire up and program logic gates. The term "fine-grained parallelism" means instruction-level parallel execution (an execution unit can execute instructions from a different instruction stream on each cycle), which is what GPUs and a lot of the various "AI accelerators" do. And with the proliferation of the likes of CUDA this is no longer out of the ordinary for software folks. But the term cannot be applied to an FPGA because there are no "execution units" (unless the FPGA designer puts one in), nor does it "calculate" anything (again, unless designed to do so); it wires up actual logic gates and other hardware blocks.