Author Topic: High speed bidirectional MPU interface (Read 1743 times)

ezalys · « **on:** March 28, 2022, 02:44:37 pm »

What’s the typical solution when you want to interface an MPU with an FPGA faster than SPI can handle, but your MPU has no explicit peripherals for doing so? Is it common for an FPGA to masquerade as NAND flash or eMMC or something else for communication?

ejeffrey · « **Reply #1 on:** March 28, 2022, 04:42:30 pm »

It mostly depends on what high speed peripherals your MCU has and the nature of the communication. One important question is whether it needs to operate in the background with DMA. Bit-banging a synchronous 32 bit parallel interface is going to be about as fast as you can get, but will keep the MCU busy.

SiliconWizard · « **Reply #2 on:** March 28, 2022, 05:39:55 pm »

Highly depends on the available peripherals on your "MPU". Can you describe what you mean exactly by MPU? Examples?

It's almost impossible to answer without knowing what exactly you're dealing with. Since you mentioned SPI, one can assume that said MPU would have at least a SPI peripheral, but that's about it. And even so, SPI comes in many forms, from basic SPI with 1 bit of data in each direction and a few tens of MHz max, to quad or even octal SPI with speeds of 100MHz+.

If whatever SPI you have doesn't have enough throughput (but check that first), other typical approaches would be "parallel" interfaces. Many MCUs do have them in various forms and various levels of configurability. But if your "MPU" doesn't have anything like that, we'll need to know what kind of interfaces it has, to begin with. We can't guess.

You mentioned "NAND flash" or eMMC. The interfaces for this are often QSPI these days. QSPI is close to SPI, often with 4 bits of data, or more. The difference is that the data bus is bidirectional, and the typical QSPI protocol is more involved, but again, most MCUs supporting QSPI have very configurable QSPI peripherals, so you could definitely configure that to implement your own protocol instead of just "masquerading" as existing memory chips.

Alternatives would be to implement an ethernet or even USB link on the FPGA. Quite a few MCUs support that, and most more complex "SoCs" as well (again, please detail what you mean by MPU.) Of course that's more complex than SPI, a simple parallel bus or even QSPI. Since that would be a direct link on what I suppose would be the same PCB, you usually wouldn't need dedicated PHYs for ethernet or USB, you could just use differential pairs directly.

Those are general ideas - knowing what kind of "processor" you have in mind and the kind of throughput you're after would help giving more precise answers.

ezalys · « **Reply #3 on:** March 28, 2022, 05:59:25 pm »

Well -- I'm just looking at the processors listed on the https://jaycarlson.net/embedded-linux page. There's a few with parallel interfaces -- namely the STM32MP1. I'm just curious if people work around not having such an interface if it means you can save a few bucks on the processor or if the parts with this feature are out of stock or some such.

What I really want is 8 MB/s average throughput, with 30 ms of latency tolerable. DMA would of course be good.
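As a rough back-of-the-envelope check on those numbers, the sketch below works out the minimum bus clock for a few bus widths and how much data accumulates in one latency window. It assumes 1 MB = 10^6 bytes, one bit per data line per clock, zero protocol overhead, and that the 30 ms figure can be read as a buffering window; none of that is stated in the post, and the function names are made up for illustration.

```python
def min_clock_hz(throughput_bytes_per_s: float, data_lines: int) -> float:
    """Minimum bus clock, assuming one bit per data line per clock and no overhead."""
    return throughput_bytes_per_s * 8 / data_lines


def bytes_per_window(throughput_bytes_per_s: float, window_s: float) -> float:
    """Bytes that arrive during one latency window at the average rate."""
    return throughput_bytes_per_s * window_s


TARGET = 8e6     # 8 MB/s, taking 1 MB = 10**6 bytes (assumption)
LATENCY = 30e-3  # 30 ms, treated here as a buffering window (assumption)

for lines in (1, 4, 8, 16):
    mhz = min_clock_hz(TARGET, lines) / 1e6
    print(f"{lines:2d} data line(s): at least {mhz:.0f} MHz")

kb = bytes_per_window(TARGET, LATENCY) / 1e3
print(f"data per {LATENCY * 1e3:.0f} ms window: about {kb:.0f} kB")
```

On these assumptions a plain 1-bit SPI link would need to run continuously at 64 MHz, while a 4-bit or 8-bit bus only needs 16 MHz or 8 MHz; real links need some margin on top for command and turnaround overhead.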

T3sl4co1l · « **Reply #4 on:** March 28, 2022, 08:06:56 pm »

I recall there are a few MCUs with parallel bus style GPIOs, I think the PIC32s are such a case? -- the whole parallel array of GPIOs on that channel can be driven at once, along with a clock/bus strobe signal, hands-free with DMA. This certainly isn't available on everything, so YMMV.

I would think anything with memory interface (e.g. ST's FSMC), you can just map the FPGA to a suitable slice of address space and go with it. Parallel RAM interfaces are quite simple, even SDRAM or DDR isn't beyond the pale (of course, you'll need quite a bit more logic inside the FPGA to interface that).

For example, one of the (older/now obsolete?) Discovery boards had (SD?)RAM and (16-bit parallel) LCD on the same FSMC bus, mapped appropriately. LCD shows up as a couple addresses (registers) I think? Very easy to dump lots of data to the LCD that way.

If you don't have a parallel bus interface, you're limited by bit-banging, and whatever rate that can be done at; usually IO propagates not just through multiple bus/clock domains but different clock rates as well, and the CPU may be stalled (wait states) during those delays. I'm not sure what is generally available/possible in the average device but most things can toggle pins at CPU clock rate, when CLK_CPU = CLK_PERIPH (so like, Cortex M0 etc. stuff).

When core runs faster, YMMV; it may wait, it may go through anyway (some devices allow astonishingly high pin toggle rates), it might be resynchronized (some transitions go missing??), or even cached (slowed down to local bus rate?). Like uh, I think x86/64 CPUs do IO cache, where a sequence of IO operations is propagated in-order, but I'm not sure what all exactly, and anyway you're likely not using much of that on such a system (that's low level driver stuff you'd rarely even see).

Like, even among AVRs, there's a few with a bus interface; which are, I think some very old ones, back when onboard SRAM was expensive and DIP packages ruled, so you could add external to beef it up as needed; and, among more recent families, I think just the top end (e.g. ATXMEGA256A3U?) had a bus interface peripheral. These could offer couple-cycle wait state performance, so, at up to 32MHz 8-bit CPU, you'd be looking at say 4MB/s without much trouble (even in C, maybe?), but just not having a lot of memory to buffer things in, let alone CPU power to do much raw data processing on. Whereas with GPIO writes, you have to address the ports, fetch data, write, strobe, wait, and so on, and you're lucky to get maybe 1MB/s that way. (I did a reverb effects box using external parallel RAM this way, with an XMEGA64D3; I think that's about right -- on the order of 20-30 cycles per word fetched, at 32MHz. Oh---was that per byte or per word, maybe it wasn't so bad after all? But then, I was also doing some DSP operations inlined with that, so it's hard to say just in terms of raw rate.)
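To put rough numbers on the figures above, here is a small sketch relating CPU clock and cycles spent per byte to throughput. It assumes a fixed cycle count per byte and nothing else competing for the CPU, which is a simplification; the helper names are hypothetical.

```python
def throughput_bytes_per_s(cpu_hz: float, cycles_per_byte: float) -> float:
    """Bytes per second if every byte costs a fixed number of CPU cycles."""
    return cpu_hz / cycles_per_byte


def cycle_budget(cpu_hz: float, target_bytes_per_s: float) -> float:
    """CPU cycles available per byte to sustain the target rate."""
    return cpu_hz / target_bytes_per_s


CPU_HZ = 32e6  # 32 MHz, the XMEGA clock mentioned above

# 8 cycles/byte is the figure that corresponds to "say 4MB/s" at 32 MHz;
# 20 and 30 cycles are the bit-banged GPIO estimates from the post.
for cycles in (8, 20, 30):
    rate = throughput_bytes_per_s(CPU_HZ, cycles) / 1e6
    print(f"{cycles:2d} cycles/byte -> {rate:.2f} MB/s")

print(f"budget for 8 MB/s: {cycle_budget(CPU_HZ, 8e6):.0f} cycles/byte")
```

At 32 MHz, an 8 MB/s target leaves only about 4 cycles per byte, which is why an 8-bit bit-banged bus on a part like this falls short and a wider bus, a faster core, or DMA becomes attractive.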

So yeh, for most like Cortex M0 things, running at nominal speeds, that should be doable even with GPIOs, but you may have to optimize it in ASM to get the timings cycle-exact, and it won't be DMA'd. If GPIOs can be DMA'd as if a bus, or if an external bus can be configured, or if the CPU can run somewhat faster, you'll also be set. There is no single solution here, too many things to choose from; you'll have to check out what MCUs have what features to offer, and go with that. Bit-bang GPIO is probably the most portable/universal, but even that can be subject to limitations.

Also... supposing a wide bus is acceptable, then taking a whole 16 or 32 GPIOs at once, obviously helps the bitrate. If not, then clock speeds will be that much more critical, and something like a multi-lane SPI bus may prove more attractive. Some devices have almost arbitrary numbers of lanes, I think? Like, if you want to treat 8 bits as a parallel bus driven like SPI, you can?

Tim

tszaboo · « **Reply #5 on:** March 28, 2022, 08:38:30 pm »

QSPI. While it sounds like SPI, the data rate is significantly higher. Or connect the FPGA to the memory bus, and map parts of its internal memory.

Someone · « **Reply #6 on:** March 28, 2022, 11:20:15 pm »

Quote from: ezalys on March 28, 2022, 02:44:37 pm

What’s the typical solution when you want to interface an MPU with an FPGA faster than SPI can handle, but your MPU has no explicit peripherals for doing so?

Get faster SPI, 30MHz+ is easily done and 32 bit word/packets are available in some devices.

Abusing audio/camera/MIPI interfaces is possible but usually a pain.

SiliconWizard · « **Reply #7 on:** March 29, 2022, 12:40:34 am »

Quote from: ezalys on March 28, 2022, 05:59:25 pm

Well -- I'm just looking at the processors listed on the https://jaycarlson.net/embedded-linux page. There's a few with parallel interfaces -- namely the STM32MP1. I'm just curious if people work around not having such an interface if it means you can save a few bucks on the processor or if the parts with this feature are out of stock or some such.

What I really want is 8 MB/s average throughput, with 30 ms of latency tolerable. DMA would of course be good.

8 MB/s is nothing to write home about really, except maybe if you were using ultra-simple MCUs at low clock rates. For the CPUs you see listed there, no issue whatsoever.
Most or all of them have ethernet and high-speed USB, which you could use as I said, but it would even be overkill for 8 MB/s.

Let's just take the SAM9X60: it has "Quad I/O SPI", which is basically SPI with 4-bit data instead of just 1-bit (as I suggested earlier.) I think most of those "MPUs" do have something similar.
I haven't seen what the max rate was for it, but I'm pretty sure it's at least 200 MHz - which is typical for accessing 4-bit QSPI flash and the like (PSRAM for instance) - which would mean at least 100 MB/s (4 bits per clock pulse). Way over what you need here. Even basic SPI (1-bit) should work with those MPUs, but I would definitely suggest going for quad SPI, which has become pretty standard, is not much more difficult to implement (except you need to bother with bus direction), and will give you more throughput room for the future.
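The 100 MB/s figure can be checked with a short sketch, which also shows how per-transaction overhead (command, address, dummy clocks) eats into the raw rate. The 200 MHz clock is the guess from the post, not a datasheet value, and the 256-byte payload and 16 overhead clocks are purely illustrative assumptions.

```python
def raw_throughput(clock_hz: float, data_lines: int) -> float:
    """Peak bytes per second with one bit per data line per clock."""
    return clock_hz * data_lines / 8


def effective_throughput(clock_hz: float, data_lines: int,
                         payload_bytes: int, overhead_clocks: int) -> float:
    """Bytes per second when each transfer spends extra clocks on overhead."""
    data_clocks = payload_bytes * 8 / data_lines
    seconds = (data_clocks + overhead_clocks) / clock_hz
    return payload_bytes / seconds


CLOCK_HZ = 200e6  # guessed in the post, not a verified SAM9X60 limit
TARGET = 8e6      # the 8 MB/s requirement

raw = raw_throughput(CLOCK_HZ, 4)
eff = effective_throughput(CLOCK_HZ, 4, payload_bytes=256, overhead_clocks=16)
print(f"raw quad rate:       {raw / 1e6:.1f} MB/s")
print(f"with overhead:       {eff / 1e6:.1f} MB/s")
print(f"headroom over 8 MB/s: {raw / TARGET:.1f}x")
```

Even with a much lower clock than assumed here, a 4-bit bus keeps a comfortable margin over 8 MB/s as long as transfers are reasonably long compared with the overhead clocks.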

Note that "QSPI" standardizes more than just a 4-bit synchronous bus, it defines a set of standard commands for memory chips, but you can absolutely use a QSPI peripheral without knowing all the details of that in your own application.

SpacedCowboy · « **Reply #8 on:** April 01, 2022, 03:34:21 pm »

I'm planning on putting together a microcontroller interfaced to an FPGA, and I want fast i/o between them. In this case, I chose the RP2040 for a few reasons:

- They're available - DigiKey had 80,000 or so of them last time I checked. Not an unimportant quality these days...
- They're cheap, well, as cheap as a dual-core ARM @ anything up to a couple of hundred MHz is going to be. They need external QSPI flash, but that's not a huge expense
- They have a couple of hundred KB of SRAM inside, useful for buffering
- They have programmable i/o state-machines, so you can implement your i/o protocol without having the ARM get involved until the data is ready. People have made dual-DVI output using these PIO blocks (!)

In my case, it's going to be

1. read from FPGA
2. process data
3. write to FPGA

... so the relatively small number (30) of GPIO on the chips doesn't matter to me - and I can set up one PIO block to do the read, one to do the write, and have dedicated 8-bit bus + clock in both directions - sort of dual unidirectional octa-spi. I'll probably use a simple protocol like:

byte 0 : packet-type
byte 1 : packet-length
... data bytes...
byte N+2 : checksum
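A minimal host-side model of that framing might look like the sketch below. The post does not say which checksum is used or what it covers, so this assumes an 8-bit sum (modulo 256) over the type, length and data bytes; the function names are hypothetical.

```python
def checksum(data: bytes) -> int:
    """8-bit additive checksum (assumed; the post does not specify one)."""
    return sum(data) & 0xFF


def encode_packet(packet_type: int, payload: bytes) -> bytes:
    """Build: byte 0 type, byte 1 length N, N data bytes, byte N+2 checksum."""
    if not 0 <= packet_type <= 0xFF:
        raise ValueError("packet type must fit in one byte")
    if len(payload) > 0xFF:
        raise ValueError("payload too long for a one-byte length field")
    body = bytes([packet_type, len(payload)]) + payload
    return body + bytes([checksum(body)])


def decode_packet(frame: bytes) -> tuple:
    """Validate a frame and return (packet_type, payload)."""
    if len(frame) < 3:
        raise ValueError("frame too short")
    packet_type, length = frame[0], frame[1]
    if len(frame) != length + 3:
        raise ValueError("length field does not match frame size")
    if checksum(frame[:-1]) != frame[-1]:
        raise ValueError("bad checksum")
    return packet_type, frame[2:-1]


frame = encode_packet(0x01, b"\x10\x20\x30")
print(frame.hex(" "))
print(decode_packet(frame))
```

A one-byte length field caps the payload at 255 bytes, so every frame carries 3 bytes of overhead; a CRC would catch more error patterns than a plain sum if the link turns out to be noisy.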

The silicon is only qualified up to 133MHz, but I've yet to see anyone have problems pushing them well beyond that. The DVI project linked above runs the chip at 252MHz. It is also driving the i/o pins at that frequency, if that's your thing. These little things are surprisingly powerful for their price.

(Aside: I don't think I've used so many {list}s in a post before.)

jbb · « **Reply #9 on:** April 01, 2022, 07:13:16 pm »

I looked on the ST website and it seems FMC (parallel), Ethernet, USB and QSPI are available.

A big question is, what sort of traffic do you expect to the FPGA? Forwarding predictable-sized chunks back and forth? Streaming data mostly in one direction? Lots of little back-and-forth transfers?

For predictable chunks or streaming, the serial types (eg QSPI) might be a strong contender.

For many little transfers, a parallel bus through the FMC interface might be appropriate. That would let you make a register interface in the FPGA and treat it like a custom STM32 peripheral. However, a parallel bus can be annoying to lay out - lots of traces for address and data! - and requires some special interfacing on the FPGA side.

Don’t forget to include some interrupts from the FPGA back to the STM32…

