XMOS xCORE is actually a pretty interesting architecture that packs a lot of power.
It is just that their range of applications is a bit niche: they are too big, complex and power-hungry to be an MCU replacement, while they don't quite have the throughput of FPGAs. They tend to shine the most when you need weird interfaces at moderate data rates. This could be a fitting use for one.
But you can also get around the cache timing issues on modern MCUs. Most of them are ARM cores that can execute code from anywhere, so you can just put an ISR into the RAM closest to the CPU; that RAM typically runs at the same clock speed as the CPU, so there is no latency variability. But yeah, not when you have to watch for multiple pulses; timer peripherals are there for a reason.
I do want something between Xcore and a regular MCU.
I haven't used XMOS since around the 2nd-generation chips (around when they renamed threads to 'cores' and cores to 'tiles' for marketing reasons).
Back then they were a bit limited by having only 64KB of RAM per core (including instruction memory) and had basically no peripherals, so a lot of those impressive MIPS numbers got wasted on bit-banging things like UART, SPI, I2C, etc. Admittedly they are incredibly good at 'bitbanging' protocols due to the super-tight timing control they provide, but it would be nice to have some standard peripherals for the common stuff. It is also really cool how you can link multiple chips together and they act as 'one big computer'.
I do want something between Xcore and a regular MCU.
Yep, I've been messing with the Pi Pico and I quite liked it.
Didn't go too deep into it, but one thing I hated is how much dicking around is needed to get a C++ IDE with debugging working on Windows. I ended up using Visual Studio with VisualGDB and a spare Pico flashed into being an SWD interface dongle. They seemed to be pushing the MicroPython thing too much.
Never used the PIO functionality, but I was certainly really impressed with what others have done with it.
I'm considering a particular application and would very much appreciate some thoughts and/or recommendations.
I've used a particular MCU on a number of projects, and am now looking at a slight variation on a previous design, for which I'm considering the addition of an FPGA. I need to measure (frequency, duty cycle, etc.) a number of input signals (12+), and perform deterministic timestamping of each input at a specified rate (e.g. 100Hz). I've achieved something similar using the MCU with high-priority interrupts in the past, but the overhead of servicing these interrupts, as well as the jitter of the timestamping function, make this approach unsuitable in this particular application. I also looked at using the Timer Counter (TC) functionality of the MCU, but despite having 12 16-bit counters, there are only three available external inputs, which obviously doesn't suffice.

What timing resolution and bit depth do you need to capture? How many edges/second?
You mention 100Hz, which suggests modest capture rates? A HW capture module could be unloaded by SW just fine at those rates.
100Hz is just an example in this case. I might see requirements orders of magnitude faster and slower than that depending on the particular implementation. Even if the logging rate is relatively slow, the input signals themselves might be relatively fast, so what I'm trying to avoid is having numerous (12 to 20+) inputs, all with 'relatively' high frequency (kHz+), having to be measured in ISRs within a single-core MCU. Each rising and falling edge would need to be timestamped (if measuring duty cycle), which means multiple ISRs per channel, which can introduce a not insignificant amount of jitter in servicing the interrupts, timestamping, etc., as well as in servicing of the other MCU tasks (i.e. ADC result processing, non-DMA memory management, data staging, logging and transmission, etc.).

IMHO this approach is just horribly wrong. Actually in two ways: A) never use interrupts on inputs that can change at will, because they can and will lock up your application due to an unforeseen circumstance (which can be as simple as a loose wire, a nearby noise source, etc.). B) don't use interrupts when they interfere with each other; find a common denominator and combine multiple interrupts into one. IOW: use a single timer interrupt in which you first sample the GPIO port (preferably having all the input pins on the same port) and then process the state of the inputs as needed. Keep in mind that after the GPIO input register (containing ALL input levels in a single read operation) has been read, there is no source of jitter that can be added to the signal, so processing time doesn't matter as long as you finish before the timer interval has passed. On a dual-core MCU running at several hundred MHz (NXP's RT1000 series, for example) you should be able to achieve time resolution in the sub-microsecond region without much effort.
If interrupt jitter gives too much uncertainty, an option is to look for a controller that supports timer-triggered DMA transfers (one address to a double buffer), which lets you process the captured data 'en bloc' without any interrupt overhead either.
100Hz is just an example in this case. I might see requirements orders of magnitude faster and slower than that depending on the particular implementation. Even if the logging rate is relatively slow, the input signals themselves might be relatively fast, so what I'm trying to avoid is having numerous (12 to 20+) inputs, all with 'relatively' high frequency (kHz+) having to be measured in ISRs within a single core MCU. Each rising and falling edge would need to be timestamped (if measuring duty cycle), which means multiple ISRs per channel, which can introduce a not insignificant amount of jitter in servicing the interrupts, timestamping, etc., as well as in servicing of the other MCU tasks (i.e. ADC result processing, non-DMA memory management, data staging, logging and transmission, etc.).
...
The Parallax is actually a really interesting option. I've not used them before, but on paper they would certainly handle this particular task. Having not used that platform before, the assembly language is certainly a bit foreign!
There is, IMNSHO, a better alternative if you are prepared to accept single source ICs that you can buy at Digikey: the XMOS xCORE.
Overall my summary is that it slots into the area where traditional microcontrollers are being pushed to their limits but where FPGAs are overkill. You get the benefits of fast software iteration and timings guaranteed by design, not by measurement.
Why so good? The hardware and software are designed from the ground up for hard realtime operation:
- each i/o port has its own (easily accessible) timer and FPGA-like simple/strobed/clocked/SERDES with pattern matching
- dedicate one (of 32) core and task to each I/O or group of I/Os
- software typically looks like the obvious
- initialise
- loop forever, waiting for input or timeout or message then instantly resume and do the processing
You mention ISRs as introducing jitter, but omit to mention caches. The xCORE devices have neither. They do have up to 32 cores per chip and 4000 MIPS per chip (expandable).
Each IO port has its own timers, so guaranteeing output on a specific clock cycle, and measuring the specific clock cycle on which input arrived.
The equivalent of an RTOS (comms, scheduling) is implemented in silicon.
There is a long and solid theoretical and practical pedigree for the hardware (Transputer 1980s) and software (CSP/Occam 1970s).
The development tools (command line and IDE) will inspect the optimised code to determine exactly how many clock cycles it will take to get from here to there. None of this "measure and hope you have spotted the worst case" rubbish!
When I used it, I found it stunningly easy, having the first iteration of code working within a day. There were no hardware or software surprises, no errata sheets.
For a remarkably information dense flyer on the basic architecture, see https://www.xmos.ai/download/xCORE-Architecture-Flyer(1.3).pdf
For a glimpse of the software best/worst case timing calculations, see
Thanks for your input, I really appreciate it.
I have actually considered this use case as well. I could set aside an entire 32 bit IO port for these digital input signals, allowing for a single read of the entire port (either in a software interrupt or via DMA transfer to buffer as you've suggested).
One clarification I do have though. If the port read is driven by a timer interrupt, is the only way to guarantee that a given channel hasn't toggled between timer interrupts to select a timer frequency that is guaranteed to be faster than the input signals?
Thanks for your input, it's greatly appreciated.
The option of having additional 'front-end' MCUs, feeding into the larger, more capable MCU is actually quite an interesting one.
This is an excellent post, thank you for taking the time to share your knowledge. I haven't considered the xCORE platform before, but it sounds like it would be of considerable interest to me as I look to move from the MCU domain into FPGAs.
Thanks for your input, it's greatly appreciated.
The option of having additional 'front-end' MCUs, feeding into the larger, more capable MCU is actually quite an interesting one.
I have done it that way before with extra microcontrollers of the same type handling the display, keyboard, real time I/O, whatever. Microcontrollers are so inexpensive that it can make sense, and if they are of the same type, then you already have the development system and tools.
It seems like an awful waste to replace a bunch of discrete logic with an entire microcontroller, but the economics favor it.
For understanding how to program the devices, see https://www.xmos.ai/download/XMOS-Programming-Guide-(documentation)(F).pdf It is remarkably easy to read, and introduces the key concepts gently, but not too gently. Basically it assumes you know how to program in C, and want to find out the ways in which this ecosystem is an improvement.
For understanding the i/o ports' features and capabilities, see https://www.xmos.ai/download/Introduction-to-XS1-ports(3).pdf
I will contend that even if you don't use xCORE, using the high-level concepts will improve the structure of your designs. After all, they have been around since the 1970s (CAR Hoare's Communicating Sequential Processes), and some of the concepts keep reappearing in various processors and languages (e.g. Go, most recently).
I have actually considered this use case as well. I could set aside an entire 32 bit IO port for these digital input signals, allowing for a single read of the entire port (either in a software interrupt or via DMA transfer to buffer as you've suggested).
One clarification I do have though. If the port read is driven by a timer interrupt, is the only way to guarantee that a given channel hasn't toggled between timer interrupts to select a timer frequency that is guaranteed to be faster than the input signals?

If you timer sample, then yes, your aperture is the timer rate.
Typically, to reduce memory needed, you check for any change, and only save (Pin+Time) pairs on a change.
However, you do not have to use a timer tick; if your inputs are low-rate and you need better edge precision, you can use a pin-change / port-match interrupt that then captures the timer.
The granularity is the timer, but there is a rate-window/blanking window which is the time to service that interrupt.
Pin-Change interrupts can even capture very narrow pulses by inference.
As always, the hardware is relatively easy and cheap. Hell's teeth, doesn't a very simple MCU cost less than a 555 timer plus passives now?!
The software is yet to catch up. xC is one good starting point, but we need more.
But if programmable logic is being added to a microcontroller, then that requires "software" as well, and it will be developed in a completely different programming environment. At least if the same series of microcontrollers is used, the software development environment can be the same for all of them.
until softies can think in fine grained parallel
Actually FPGAs are not fine-grained parallel, they are just parallel, because FPGAs don't execute anything; they wire up and program logic gates. The term "fine-grained parallelism" means instruction-level parallel execution (an execution unit can execute instructions from a different instruction stream on each cycle), which is what GPUs and a lot of the various "AI accelerators" do. And with the proliferation of the likes of CUDA this is no longer out of the ordinary for software folks. But the term cannot be applied to an FPGA because there are no "execution units" (unless the FPGA designer puts one in), nor does it "calculate" anything (again, unless designed to do so); it wires up actual logic gates and other hardware blocks.