Author Topic: ARM with fast parallel GPIO  (Read 10287 times)

Offline kamtarTopic starter

  • Regular Contributor
  • *
  • Posts: 62
ARM with fast parallel GPIO
« on: January 16, 2021, 11:49:35 pm »
Hello,

I'm looking for a Cortex-M MCU that would be well suited to feeding a fast DAC through a parallel interface.

1. I don't want any DSP or FPGA, just a regular ARM MCU.
2. I don't have a strict minimum speed in mind, just as fast as it can be. A parallel interface that could run close to 50-100 MHz would be nice.

I'm in the process of reading up on various MCUs and going over my options, but if somebody has used an ARM for something similar I would be glad to hear about it.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14297
  • Country: fr
Re: ARM with fast parallel GPIO
« Reply #1 on: January 17, 2021, 12:06:27 am »
You should be able to do that with the FMC peripheral of STM32 MCUs, for instance. For 100 MHz, I would suggest an STM32F7 or H7.
Now the issue there is that speed is not the only requirement. If you're driving a parallel DAC, data has to get out at a fixed frequency with low jitter. I can't guarantee the FMC peripheral of the above MCUs can get you that.
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: ARM with fast parallel GPIO
« Reply #2 on: January 17, 2021, 12:11:17 am »
It is going to be a challenge. I'm looking for an ARM device with a fast parallel interface and no special protocol myself, but so far without a whole lot of luck.

Most of the time you get a static memory controller, but it inserts address latch cycles, so it is not ideal for just transferring raw data. In some cases there is an ability to disable the address cycle, but the interface speed is still not that fast. One such example is the Nuvoton M480 series.

Often the best way to read or write a raw data stream is a camera or display interface. But again, in many cases the controllers are too smart and expect proper hsync/vsync pulses.

For getting the data into the device I found the SAM E70 to be the best. It has a parallel capture controller, which boils down to an 8-bit bus, an external clock, and a couple of enable pins.

Interfacing with FPGAs is such a common task that I don't understand why chip vendors do not include a dedicated peripheral for that, which would be also reusable as a general purpose parallel streaming interface.
Alex
 

Offline NiHaoMike

  • Super Contributor
  • ***
  • Posts: 8972
  • Country: us
  • "Don't turn it on - Take it apart!"
    • Facebook Page
Re: ARM with fast parallel GPIO
« Reply #3 on: January 17, 2021, 02:03:42 pm »
You should be able to do that with the FMC peripheral of STM32's MCUs, for instance. For 100MHz, I would suggest a STM32F7 or H7.
Now the issue there is that the speed is not the only requirement. If you're driving a parallel DAC, data has to get out at a fixed frequency with low jitter. I can't guarantee the FMC peripheral of above MCUs can get you that.
Run it in slave mode with an external oscillator supplying the clock. That said, such a high clock rate is asking a bit much from a microcontroller; a cheap FPGA or CPLD would probably be a better solution.
Cryptocurrency has taught me to love math and at the same time be baffled by it.

Cryptocurrency lesson 0: Altcoins and Bitcoin are not the same thing.
 

Offline kamtarTopic starter

  • Regular Contributor
  • *
  • Posts: 62
Re: ARM with fast parallel GPIO
« Reply #4 on: January 17, 2021, 04:56:28 pm »
Thanks for the input, I will take a look at the SAM E70s.
To be more precise, I don't plan to drive a DAC directly; it's more of a DDS IC. This is just for a prototype that I will use to figure out everything I can do with it using the modulation registers and so on.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6171
  • Country: fi
    • My home page and email address
Re: ARM with fast parallel GPIO
« Reply #5 on: January 17, 2021, 07:30:46 pm »
It is going to be a challenge. I'm looking for some ARM device with a fast parallel interface with no special protocol, but not a whole lot of luck.
Did you look at SAM D5x/E5x?

I looked at common ARM microcontrollers available to a hobbyist like me that could provide a parallel 18-bit data port to displays (+ 5 or so control signals) via DMA, and basically the only one I found with > 16-bit wide GPIO banks was SAM D5x/E5x.  On these, the GPIO A bank has 26 consecutive pins (A00 to A25), the B bank has 18 (B00 to B17) on 64-pin TQFP/VQFN and 26 (B00 to B25) on 100-pin TQFP/VQFN, and so on.  Using suitable choices of pins, you can do 32-bit DMA to/from an entire pin bank.

However, I am no EE, and have no idea about D5x/E5x hidden gotchas; all I wanted/looked at was having a sufficiently wide GPIO bank I could DMA data from in parallel, on a common enough ARM microcontroller.  Any insight on parallel GPIO on these?
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: ARM with fast parallel GPIO
« Reply #6 on: January 17, 2021, 07:35:30 pm »
Did you look at SAM D5x/E5x?
Every single day at work. I work for Microchip :)

In my case I was not interested in anything without High Speed USB, since my projects currently involve transferring large amounts of data to/from PC.

But also, SAM D5x/E5x will not be fast: it takes a minimum of 6 clock cycles to toggle a pin. At 120 MHz, the absolute best toggling rate (just toggling, no actual logic) is 20 MHz. If you want to set the data, it will be way, way slower.
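
The arithmetic behind that figure can be sketched in a few lines. The function name is made up for illustration; the 6-cycle cost and the 120 MHz clock are the numbers quoted above, and the model assumes nothing else competes for those cycles.

```python
def max_toggle_rate_hz(core_clock_hz: float, cycles_per_toggle: int) -> float:
    """Upper bound on pin toggles per second when each toggle costs a fixed cycle count."""
    return core_clock_hz / cycles_per_toggle


# Numbers from the post: 120 MHz core clock, at least 6 cycles per toggle.
rate = max_toggle_rate_hz(120e6, 6)
print(f"{rate / 1e6:.0f} MHz")  # 20 MHz
```

Any extra work per sample (loading the data, masking, looping) adds cycles to the divisor, which is why setting real data comes out much slower.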

DMA-ing a parallel interface is not easy, as there is no real trigger from the GPIO controller.
Alex
 
The following users thanked this post: jancumps, Nominal Animal, I wanted a rude username

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6171
  • Country: fi
    • My home page and email address
Re: ARM with fast parallel GPIO
« Reply #7 on: January 17, 2021, 10:34:29 pm »
In my case I was not interested in anything without High Speed USB, since my projects currently involve transferring large amounts of data to/from PC.
I use a Teensy 4.0 and 4.1 for that (NXP i.MX RT1064, Cortex-M7).  For a hobbyist like me, Teensyduino is a pretty easy environment to play with.

Pity i.MX RT1064 is only available in MAPBGA-196 with 0.65mm or 0.8mm pitch; I definitely don't have the skills to make my own board with those.  Otherwise, it'd be a pretty darn powerful microcontroller with lots of RAM for many use cases, like that Arduino-programmable display controller (for games or human-machine interfaces) that I experimented with on the SAMD51J20A.
 

Offline kamtarTopic starter

  • Regular Contributor
  • *
  • Posts: 62
Re: ARM with fast parallel GPIO
« Reply #8 on: January 18, 2021, 12:19:29 am »
Pity i.MX RT1064 is only available in MAPBGA-196 with 0.65mm or 0.8mm pitch; I definitely don't have the skills to make my own board with those.  Otherwise, it'd be a pretty darn powerful microcontroller

I didn't read up on the RT1064 and I bet it's much better than the RT1010 and RT1020 in LQFP that I have experience with, but still, those MCUs are made to a price and they aren't as good as they seem on paper: lots of limitations and compromises in their peripherals.
« Last Edit: January 18, 2021, 12:21:17 am by kamtar »
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6171
  • Country: fi
    • My home page and email address
Re: ARM with fast parallel GPIO
« Reply #9 on: January 18, 2021, 08:25:57 pm »
Pity i.MX RT1064 is only available in MAPBGA-196 with 0.65mm or 0.8mm pitch; I definitely don't have the skills to make my own board with those.  Otherwise, it'd be a pretty darn powerful microcontroller
I didn't read up on RT1064 and I bet it's much better than the RT1010 and RT1020 in LQFP I have experience with but still, those MCUs are made to a price and they aren't that good as they seem on paper, lots of limitations and compromises in their peripherals.
Teensy 4.0 only provides 50 I/O pins and Teensy 4.1 55, and a limited subset of the peripherals anyway.  But the ones it does provide are pretty much amazing considering the price point and ease of use (4.0 being < USD $20).  Not to mention it runs at 600 MHz, and has 512kB+512kB of RAM.  (Teensy 4.1 has pads for additional PSRAM, though.)  I haven't found the practical upper limit for USB HS bandwidth yet, I only know it is over 200 Mbit/s (25 MB/s) because even a simple Arduino/Teensyduino sketch using USB CDC ACM in one direction achieves that.  I like them.

Sure, there are limitations (and I hear the development of Teensy 4.x took a lot of time and effort, especially the early initialization part), but they're nothing compared to the gains.  Personally, I'm not even interested in most of the peripherals; I just want DMA, GPIO, SPI, I2C, USB HS, and lots of RAM; and preferably contiguous banks of GPIO pins so I could DMA out data in parallel to a small display controller, with easy to use DMA triggers.  The stuff I do isn't complicated.

I haven't seen anything comparable.  The STM32H7 series looks interesting, but requires a separate ULPI transceiver for USB HS.  That does not mean they are not available; that's just what this one hobbyist has seen :-//
 

Offline kamtarTopic starter

  • Regular Contributor
  • *
  • Posts: 62
Re: ARM with fast parallel GPIO
« Reply #10 on: January 18, 2021, 08:35:52 pm »
I used the RT1010 on my last board (a USB UAC2 DAC) and yeah, it's the only option you have if you really want those 500 MHz for cheap, but it's too limiting for my prototyping (mainly that I can't use it as a clock divider), so I'm eyeing that SAM E70 now.
 

Offline DC1MC

  • Super Contributor
  • ***
  • Posts: 1882
  • Country: de
Re: ARM with fast parallel GPIO
« Reply #11 on: January 18, 2021, 08:40:22 pm »
What about the Cypress FX3? It is an ARM A9 and has a nice 32-bit programmable parallel interface at 100 MHz  :-//

 Cheers,
 DC1MC
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: ARM with fast parallel GPIO
« Reply #12 on: January 18, 2021, 08:55:38 pm »
BGA-only. I'll avoid BGAs as much as I can.

Also, it is not A9, it is ARM9, namely ARM926EJ, a pretty old core. It is still better than 8051 stuff, of course.

They are also pretty pricey. It is fine if you actually need SS, but if all you need is a decent HS, then it gets a bit more questionable.
« Last Edit: January 18, 2021, 08:58:14 pm by ataradov »
Alex
 

Offline NiHaoMike

  • Super Contributor
  • ***
  • Posts: 8972
  • Country: us
  • "Don't turn it on - Take it apart!"
    • Facebook Page
Re: ARM with fast parallel GPIO
« Reply #13 on: January 20, 2021, 02:21:10 am »
In my case I was not interested in anything without High Speed USB, since my projects currently involve transferring large amounts of data to/from PC.
For the DAC, just get a cheap fl2k VGA adapter for 3 channels at up to about 150MS/s.
https://osmocom.org/projects/osmo-fl2k/wiki.
Cryptocurrency has taught me to love math and at the same time be baffled by it.

Cryptocurrency lesson 0: Altcoins and Bitcoin are not the same thing.
 

Offline Bassman59

  • Super Contributor
  • ***
  • Posts: 2501
  • Country: us
  • Yes, I do this for a living
Re: ARM with fast parallel GPIO
« Reply #14 on: February 04, 2021, 12:10:14 am »
Interfacing with FPGAs is such a common task that I don't understand why chip vendors do not include a dedicated peripheral for that, which would be also reusable as a general purpose parallel streaming interface.

THANK YOU ... yes, this exactly. I want to see a synchronous parallel bus master with bidirectional data, address output and byte-lane enable outs.  I want the interface to provide a clock to the FPGA, so the FPGA doesn't have to deal with synchronization. The interface should have a place in the micro's memory map so you talk to it just by accessing the memory space. That means it can be a target for DMA operations if necessary.

There are many "almost there" interfaces. But they seem mostly designed for memory. They support asynchronous SRAMs, so they don't provide the clock. You only get a clock out when the peripheral is configured as an SDRAM controller.

I don't know why such a general purpose synchronous parallel bus interface is not provided. Hell, even expose the APB or AHB or whatever -- just output the damn clock!
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 3996
  • Country: nz
Re: ARM with fast parallel GPIO
« Reply #15 on: February 04, 2021, 12:38:22 am »
Quote
Teensy 4.0 only provides 50 I/O pins and Teensy 4.1 55, and a limited subset of the peripherals anyway.  But the ones it does provide, are pretty much amazing considering the price point and ease of use (4.0 being < USD $20).  Not to mention it runs at 600 MHz, and has 512kB+512kB of RAM.

Yeah, they're great.

Not only 600 MHz, they seem to run just fine at 960 MHz, though a heatsink would be a good idea at that speed. And it's got a really good dual-issue core, so the Teensy 4.0 at 960 MHz actually matches a U54 (i.e. Rocket chip, also used in FE310 and K210) at 1.45 GHz on my primes benchmark.

For $20 it's a beast.

Adding one or two 8 MB PSRAM chips to Teensy 4.1 for $1.20 each also seems a pretty good deal. If I get a 4.1 I'll have to try that.
 
The following users thanked this post: SiliconWizard

Offline aheid

  • Regular Contributor
  • *
  • Posts: 245
  • Country: no
Re: ARM with fast parallel GPIO
« Reply #16 on: March 08, 2021, 03:43:51 am »
Isn't this what the RPi Pico (RP2040) PIO stuff was made for?
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: ARM with fast parallel GPIO
« Reply #17 on: March 08, 2021, 03:50:35 am »
Isn't this what the RPi Pico (RP2040) PIO stuff was made for?
Yes, but it is attached to a subpar rest of the system. We need that principle to propagate to other MCUs.
Alex
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6171
  • Country: fi
    • My home page and email address
Re: ARM with fast parallel GPIO
« Reply #18 on: March 08, 2021, 04:19:36 am »
Isn't this what the RPi Pico (RP2040) PIO stuff was made for?
Yes, but it is attached to a subpar rest of the system. We need that principle to propagate to other MCUs.
Agreed; with a minimal ALU, please; at least addition, so we can do PDM.  And more than 32 instructions across several units.
 

Online Berni

  • Super Contributor
  • ***
  • Posts: 4922
  • Country: si
Re: ARM with fast parallel GPIO
« Reply #19 on: March 08, 2021, 08:12:11 am »
Use an MCU that has an RGB-bus LCD controller inside it. You can likely abuse the timing settings to make it output one big blob of data from RAM. The bus runs on a fixed clock divided down from the main clock, so it should give the DAC consistent timing.

But even just regular DMA into a GPIO port should work. For example, the STM32H7 family has the GPIO peripheral connected to a 200 MHz AHB bus. The bus that the DMA and RAM sit on is also 200 MHz, so the maximum throughput is likely 100 MHz, since half the time the DMA needs to read from RAM and half the time it needs to write to GPIO. This could possibly be pushed up to 150 MHz if the DMA is smart enough to read samples as 32-bit and then write them as 16-bit to save some RAM read cycles. But it likely won't run any faster than that due to bus bandwidth limitations.
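
As a rough sanity check on those numbers, here is an idealized cycle-counting sketch. It assumes every bus transfer (read or write) costs exactly one 200 MHz bus cycle and ignores arbitration, bus-matrix latency and FIFO behavior, so real hardware will not match it exactly; the function and variable names are made up for illustration.

```python
def dma_sample_rate_hz(bus_hz: float, reads_per_group: int,
                       writes_per_group: int, samples_per_group: int) -> float:
    """Samples per second if every bus transfer takes exactly one bus cycle."""
    cycles_per_group = reads_per_group + writes_per_group
    return bus_hz * samples_per_group / cycles_per_group


BUS_HZ = 200e6  # bus clock quoted in the post

# One 16-bit read from RAM followed by one 16-bit write to GPIO per sample.
one_to_one = dma_sample_rate_hz(BUS_HZ, 1, 1, 1)

# One 32-bit read from RAM followed by two 16-bit writes to GPIO (two samples).
packed = dma_sample_rate_hz(BUS_HZ, 1, 2, 2)

print(f"{one_to_one / 1e6:.1f} MHz, {packed / 1e6:.1f} MHz")
```

Under this simple model the packed case works out to about 133 MHz, somewhat below the 150 MHz guess, so the higher figure would need reads and writes to overlap on separate buses.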

Past that you are going to need an FPGA.
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: ARM with fast parallel GPIO
« Reply #20 on: March 08, 2021, 08:21:13 am »
Any solutions with GPIO+DMA are not workable in practice. Let's say I have an 8-byte array I need to send to an FPGA. I need 8 data lines + 1 clock line. So now you have to reformat your array as 16-bit words, times 2 more for the clock toggle. So 4x the data size. Plus you need to convince the DMA to only affect 9 bits of the GPIO port without touching the others. And with an external clock it is not possible at all.
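
To make the 4x figure concrete, here is a small host-side sketch of the buffer reformatting. The bit assignment (data on port bits 0-7, clock on bit 8) and the function name are assumptions for illustration only.

```python
from array import array

CLOCK_BIT = 1 << 8  # assumption: data on port bits 0-7, clock on bit 8


def expand_for_gpio_dma(payload: bytes) -> array:
    """Turn each payload byte into two 16-bit port words: clock low, then clock high."""
    words = array("H")  # unsigned 16-bit words, as a 16-bit-wide DMA would write them
    for byte in payload:
        words.append(byte)              # data valid, clock low
        words.append(byte | CLOCK_BIT)  # same data, clock high (rising edge latches it)
    return words


payload = bytes(range(8))
buf = expand_for_gpio_dma(payload)
print(len(payload), "bytes ->", len(buf) * buf.itemsize, "bytes")
```

Note that this sketch writes whole 16-bit words, so it also overwrites the other 7 bits of the port, which is exactly the masking problem mentioned above.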

And for receive it is impossible with internal or external clock.

And the issue with using display controllers or camera interfaces for this is that they generate or expect line and frame blanking and synchronization signals and often want the frame data to be aligned with those signals. All this while the pixel clock is still generated for a dummy frame. So receiving this mess in an FPGA is not easy.
« Last Edit: March 08, 2021, 08:23:13 am by ataradov »
Alex
 

Online Berni

  • Super Contributor
  • ***
  • Posts: 4922
  • Country: si
Re: ARM with fast parallel GPIO
« Reply #21 on: March 08, 2021, 01:53:39 pm »
You can do tricks to get around that. Some MCUs can be set up to provide a divided-down clock on a pin; use this as the clock and then use one of the other mechanisms to start the DMA on its edge (wait loop, interrupt, timer event or something). Then once the transfer is done, reconfigure the clock pin back to GPIO to stop the clock. It might also be possible to use the SPI peripheral to generate the clock while also triggering the DMA transfer. But yes, all of these are pretty hacky solutions that involve things the MCU was never designed to do. Also, when doing this DMA transfer you are likely limited in what the CPU can do in the meantime, since a lot of RAM access might stall the DMA, so it might not be able to do any useful work while the transfer is happening.

If this was a solution for production I'd definitely just use an FPGA that can do such a thing easily, or at the very least a simple, cheap, dumb CPLD that just orchestrates data transfer between the device and an SRAM chip. But once you do have an FPGA you can transfer data in any weird way you like, even if it is RGB LCD frames, though I'd recommend using the external memory bus functionality of MCUs for that.
 

Offline Doctorandus_P

  • Super Contributor
  • ***
  • Posts: 3321
  • Country: nl
Re: ARM with fast parallel GPIO
« Reply #22 on: May 15, 2021, 12:19:09 am »
It's not ARM, but the Cypress CY7C68013A has a pretty specialized interface to stream data between USB and I/O. That is the reason it is very popular in Logic Analyzers and USB scopes. The chip itself is a boring 8051 compatible.

It's not super fast for today's world, but Cypress also has an "FX3" variant and that one's a lot quicker. I do not know if "FX3" still has an 8051 core. Maybe Cypress even put that peripheral also in other chips.

Just another Idea:
What do USB HDDs use these days? Maybe you can repurpose that hardware. Going from SATA to PATA is not much more than a shift register.
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: ARM with fast parallel GPIO
« Reply #23 on: May 15, 2021, 12:56:52 am »
FX3 has ARM926. But it is also only available in BGA packages and generally more annoying to use.
Alex
 

Offline TimCambridge

  • Regular Contributor
  • *
  • Posts: 97
  • Country: gb
Re: ARM with fast parallel GPIO
« Reply #24 on: May 15, 2021, 03:03:01 pm »
Octal SPI?
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6171
  • Country: fi
    • My home page and email address
Re: ARM with fast parallel GPIO
« Reply #25 on: May 15, 2021, 03:47:56 pm »
If anyone is interested, I just posted in another thread here a description of how to use the SET and CLEAR GPIO registers available on many ARMs with a couple of lookup tables, to implement semi-efficient "buses" with completely arbitrary pin mappings; just in case there are devs unfamiliar with this technique.

These write-only registers have the property that clear (0) bits mean "no change" to the output pin state, and set bits (1) either set or clear the GPIO pin output state corresponding to that bit and bank.  There is usually also a third register, TOGGLE, that inverts the output pin state; this can be very useful for read/write strobes and the clock pin.  The downside is that it does require a dozen or so instructions per "bus" access, some RAM (or Flash, if the pinout is fixed at compile time) lookup accesses, and that even when the pins are in the same GPIO bank, rising edges are always before falling edges (or vice versa).

The code shown there is not optimized for any specific architecture, and can be optimized further.  (In particular, instead of doing a second lookup for the clear part, one could just invert all the bus bits in the looked-up bit masks instead.  Much faster.)
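
For readers who have not seen the technique, here is a minimal host-side model of it. It is not the code from the other thread: the pin mapping, the `GpioBank` class and the function names are all made up for illustration, and the bank is simulated in software instead of touching real registers. It uses the single-lookup variant mentioned above, deriving the clear mask by inverting the bus bits of the looked-up set mask.

```python
# Hypothetical mapping: bus bit i drives GPIO pin BUS_PINS[i] of one 32-bit bank.
BUS_PINS = [3, 17, 4, 22, 9, 0, 30, 12]
BUS_MASK = sum(1 << pin for pin in BUS_PINS)


def build_set_table(pins):
    """For each bus value, precompute the bank mask of pins that must be high."""
    table = []
    for value in range(1 << len(pins)):
        mask = 0
        for bit, pin in enumerate(pins):
            if (value >> bit) & 1:
                mask |= 1 << pin
        table.append(mask)
    return table


SET_TABLE = build_set_table(BUS_PINS)


class GpioBank:
    """Software stand-in for a bank with write-only SET and CLEAR registers."""

    def __init__(self, out=0):
        self.out = out

    def write_set(self, mask):
        # 1 bits drive pins high, 0 bits mean "no change".
        self.out |= mask

    def write_clear(self, mask):
        # 1 bits drive pins low, 0 bits mean "no change".
        self.out &= ~mask & 0xFFFFFFFF


def bus_write(bank, value):
    """Put an 8-bit value on the scattered bus pins without touching other pins."""
    set_mask = SET_TABLE[value]
    bank.write_set(set_mask)                # rising edges happen first
    bank.write_clear(BUS_MASK & ~set_mask)  # then falling edges, no second lookup


bank = GpioBank(out=1 << 31)  # pin 31 is unrelated to the bus and must stay high
bus_write(bank, 0xA5)
print(f"0x{bank.out:08X}")
```

The two separate register writes are also where the "rising edges before falling edges" ordering comes from: between them, the bus briefly shows the OR of the old and new values.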
 

Offline technix

  • Super Contributor
  • ***
  • Posts: 3507
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
Re: ARM with fast parallel GPIO
« Reply #26 on: May 16, 2021, 10:52:46 am »
I know you don't want an FPGA, but it seems to me that a Zynq 7010, SmartFusion or a Hercules M7 would work best for your use. All of these are FPGA + ARM combo chips: the Zynq 7010 has a dual-core Cortex-A9, while SmartFusion and Hercules M7 have a Cortex-M3.

For the package, SmartFusion has QFP144 and QFP208. Hercules M7 has QFP144 and QFN88.

What you need here would be a minimal initialization on the FPGA with that parallel interface and necessary memories hooked directly into the ARM core.
« Last Edit: May 16, 2021, 10:54:57 am by technix »
 

Offline technix

  • Super Contributor
  • ***
  • Posts: 3507
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
Re: ARM with fast parallel GPIO
« Reply #27 on: May 16, 2021, 10:58:05 am »
Interfacing with FPGAs is such a common task that I don't understand why chip vendors do not include a dedicated peripheral for that, which would be also reusable as a general purpose parallel streaming interface.
AFAIK most FPGAs use either a standard (Q)SPI interface or a standard SRAM interface for MCU interfacing, from booting to application communications. Then there are all those FPGA + MCU hybrids like Microchip SmartFusion, Gowin Little Bee and Hercules M7, as well as FPGA + MPU hybrids like Zynq and Cyclone SoC.
 

Offline dietert1

  • Super Contributor
  • ***
  • Posts: 2018
  • Country: br
    • CADT Homepage
Re: ARM with fast parallel GPIO
« Reply #28 on: May 16, 2021, 12:46:02 pm »
I remember using a Kinetis Arm Cortex with a feature they called "Flexbus", maybe in 2014 or so. I could configure it for different data and address widths and it would behave like a microcomputer bus, mapping a segment of the MCU's data address space. I used it to map the RAM of an LCD controller for fast access.

Regards, Dieter
 

Offline Bassman59

  • Super Contributor
  • ***
  • Posts: 2501
  • Country: us
  • Yes, I do this for a living
Re: ARM with fast parallel GPIO
« Reply #29 on: May 20, 2021, 05:20:30 am »
Interfacing with FPGAs is such a common task that I don't understand why chip vendors do not include a dedicated peripheral for that, which would be also reusable as a general purpose parallel streaming interface.
AFAIK most FPGA's uses either standard (Q)SPI interface or standard SRAM interface for MCU interfacing - from booting to application communications. Then there are all those FPGA + MCU hybrids like Microchip SmartFusion, Gowin Little Bee and Hercules M7, as well as FPGA + MPU hybrids like Zynq and Cyclone SoC.

The fun thing about FPGAs is that you can use any sort of thing you want to interface to the MCU. That is, there isn't any standard. I've used I2C, I've used UART, I've used the SiLabs EMIF.

But I still want a synchronous parallel interface that supplies the clock. Why do the MCU vendors give us a clocked SDRAM interface but only an async SRAM interface? Having the MCU supply a continuous clock with the data, address and R/W strobe solves a bunch of problems.
 

Offline technix

  • Super Contributor
  • ***
  • Posts: 3507
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
Re: ARM with fast parallel GPIO
« Reply #30 on: May 20, 2021, 04:01:36 pm »
The fun thing about FPGAs is that you can use any sort of thing you want to interface to the MCU. That is, there isn't any standard. I've used I2C, I've used UART, I've used the SiLabs EMIF.
I should have been more clear about this. What I mean is ports that allow the MCU to both configure and communicate with the FPGA.

But I still want a synchronous parallel interface that supplies the clock. Why do the MCU vendors give us an SDRAM interface but an async SRAM interface? Having the MCU supply a continuous clock with the data, address and R/W strobe solves a bunch of problems.
I'd rather have an independent clock for the FPGA from the MCU, and use the WAIT flag for flow control.
 

Online Berni

  • Super Contributor
  • ***
  • Posts: 4922
  • Country: si
Re: ARM with fast parallel GPIO
« Reply #31 on: May 21, 2021, 05:26:17 am »
I should have been more clear about this. What I mean is ports that allow the MCU to both configure and communicate with the FPGA.
FPGAs can do self-reconfiguration, so it's possible to have an FPGA reconfigure part of itself through any custom interface, though this does get rather complicated.

Another way to use a custom bus to load a configuration into an FPGA is to implement a bridge from your custom interface to whatever external configuration flash it uses. You write the new configuration into flash and then reboot into it. To prevent it from bricking itself, it's also often possible to have 2 sets of images and choose the one to boot depending on some pins.

In any case, pretty much every FPGA that boots from external flash supports booting from SPI flash. Not only does pretty much every MCU under the sun have an SPI port, the modern ones typically have 2 to 6 of them. Sometimes you also get I2C or an 8/16-bit parallel bus. Also, you don't always have to emulate a flash chip on this bus, since a lot of FPGAs also have a "slave mode" on this interface, so you get to feed the FPGA its bitstream at whatever speed you like.

But I still want a synchronous parallel interface that supplies the clock. Why do the MCU vendors give us an SDRAM interface but an async SRAM interface? Having the MCU supply a continuous clock with the data, address and R/W strobe solves a bunch of problems.
I'd rather have an independent clock for the FPGA from the MCU, and use the WAIT flag for flow control.

Once you have spent some time designing FPGA code you will see just how annoying clock crossings are. It's usually most convenient to make the FPGA march along at whatever clock the other device is running at, because then you can always sample the input signals on a clock edge and be sure you got good clean data, not corrupted by metastability.

The biggest problem with having no clock shows up when you start doing asynchronous burst operations on a memory bus. These kinds of transfers tend to be required to squeeze the maximum throughput out of an MCU's memory controller. But during these transfers you typically have all the control lines stay in the active state while only the address pins count up. This makes it very difficult for an FPGA sampling the bus at its own clock speed to determine when it should put the next piece of data on the bus, especially since sampling right on a transition might get you a garbage address. So you see it counting 0x0E 0x0E 0x1E 0x10 0x10 0x10 0x10 0x11 0x11
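
A toy model shows where such garbage addresses come from. It assumes each address line switches at its own slightly different delay (the delay values are invented for illustration) and ignores metastability itself; it only shows that a sample taken mid-transition mixes old and new bits.

```python
def sample_bus(old: int, new: int, bit_delays_ns: list, t_ns: float) -> int:
    """Value seen t_ns after the bus starts changing from old to new.

    Each bit switches at its own delay, so a sample taken before all bits
    have settled is a mix of old and new bits.
    """
    seen = 0
    for bit, delay in enumerate(bit_delays_ns):
        source = new if t_ns >= delay else old
        seen |= source & (1 << bit)
    return seen


# Assumed skew per address bit (index = bit number): bit 4 settles first.
DELAYS_NS = [1.5, 1.2, 1.4, 1.3, 0.4]

# Address counts from 0x0F to 0x10; sample at a few instants during the change.
for t in (0.0, 1.0, 1.35, 2.0):
    print(f"t={t} ns: 0x{sample_bus(0x0F, 0x10, DELAYS_NS, t):02X}")
```

With these made-up delays, a sample at 1.0 ns sees 0x1F and one at 1.35 ns sees 0x15, neither of which was ever a valid address. A clock supplied by the MCU avoids this because the FPGA only samples once everything has settled.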

Still, QSPI is an excellent high-throughput interface bus to an MCU. It still won't go as fast as a 16-bit parallel bus, but it does tend to support fairly high clock speeds, so it's not that far behind.
 

Offline Bassman59

  • Super Contributor
  • ***
  • Posts: 2501
  • Country: us
  • Yes, I do this for a living
Re: ARM with fast parallel GPIO
« Reply #32 on: May 24, 2021, 09:07:01 pm »
But I still want a synchronous parallel interface that supplies the clock. Why do the MCU vendors give us an SDRAM interface but an async SRAM interface? Having the MCU supply a continuous clock with the data, address and R/W strobe solves a bunch of problems.
I'd rather have an independent clock for the FPGA from the MCU, and use the WAIT flag for flow control.

Once you have spent some time designing FPGA code you will see just how annoying clock crossings are. It's usually most convenient to make the FPGA march along at whatever clock the other device is going at because then you can always sample input signal on a clock edge and be sure you got good clean non metastable corrupted data.

Berni's correct here. This is exactly why I want the micro to supply a clock to the FPGA that is synchronous with the data and address and strobes. It's the basic source-synchronous bus. It makes the design a lot easier.

I notice that the TI TM4C1294 "Tiva" and its cousin the MSP432E have exactly what I ask for in an external bus interface.
 

Offline technix

  • Super Contributor
  • ***
  • Posts: 3507
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
Re: ARM with fast parallel GPIO
« Reply #33 on: May 26, 2021, 01:05:49 pm »
Synchronous parallel... can the FPGA just emulate SDRAM? Also, is it somewhat synchronous if the asynchronous-style bus and the independent-ish clock have the same source? (STM32F103ZE + FPGA style: the STM32F103ZE has an asynchronous parallel external memory interface with wait signals, it also has an MCO pin that outputs some internal clock signal, and both are clocked from the same PLL.)

If you do get to use Zynq, SmartFusion or Hercules M7, those FPGA + MCU/MPU combo chips just give you bare AXI/AHB interfaces, which are also synchronous.
 

Offline Bassman59

  • Super Contributor
  • ***
  • Posts: 2501
  • Country: us
  • Yes, I do this for a living
Re: ARM with fast parallel GPIO
« Reply #34 on: May 26, 2021, 07:01:37 pm »
Synchronous parallel... can the FPGA just emulate SDRAM?

Sure, but why overcomplicate the matter?

 

Offline technix

  • Super Contributor
  • ***
  • Posts: 3507
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
Re: ARM with fast parallel GPIO
« Reply #35 on: May 27, 2021, 03:15:12 am »
Sure, but why overcomplicate the matter?
It is much more common to find SDRAM support on processors, both MCUs and MPUs. It is up there with QSPI and direct internal clock output.
 

Offline jonroger

  • Regular Contributor
  • *
  • Posts: 72
  • Country: us
Re: ARM with fast parallel GPIO
« Reply #36 on: June 14, 2021, 02:37:38 pm »
Here is a case where 50 Msps parallel input was achieved on a Teensy 4.1. Simple software polling is good for about 15-30 Msps.

https://forum.pjrc.com/threads/66201-Teensy-4-1-How-to-start-using-FlexIO?p=279459
I am available for custom hardware/firmware development.
 
The following users thanked this post: Nominal Animal

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6171
  • Country: fi
    • My home page and email address
Re: ARM with fast parallel GPIO
« Reply #37 on: June 15, 2021, 07:49:42 am »
The cheaper Teensy 4.0 ($20) has 5 FlexIO1 pins (2, 3, 4, 5, 33), 9 FlexIO2 pins (6, 7, 8, 9, 10, 11, 12, 13, 32), and 14 FlexIO3 pins (7, 8, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 26, 27); of these, pins 7 and 8 are available in both FlexIO2 and FlexIO3.

FlexIO1 has one consecutive group of pins, 4-8 (pins 2, 3, 4, 33, 5).
FlexIO2 has three consecutive groups of pins: 0-3 (pins 10, 12, 11, 13), 10-12 (pins 6, 9, 32), 16-17 (pins 8, 7).
FlexIO3 has three consecutive groups of pins: 0-3 (pins 19, 18, 14, 15), 6-11 (pins 17, 16, 22, 23, 20, 21), and 14-17 (pins 26, 27, 8, 7).

What I hadn't realized before is that just because the other pins aren't exposed does not mean they cannot be used.  For example, FlexIO3 4-5 (AD_B1_04, AD_B1_05) and 12-13 (AD_B1_12, AD_B1_13) are not used on the Teensy (connection schematics here).  They do exist on the BGA package, but are neither exposed nor used for other purposes, the way AD_B0_04..AD_B0_11 are used for the bootloader chip.  So, if one does a 16-bit shift to FlexIO3 0-15, one gets bits 0..3, 6..11, and 14..15 on pins 19, 18, 14, 15, 17, 16, 22, 23, 20, 21, 26, 27; i.e., twelve of the 16 bits as outputs.  For receive/input, the 12 input bits are just spread over 16 data bits with two two-bit holes in the middle.
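A minimal Python sketch of that bit spreading, just to make the layout concrete. It only models the arithmetic of scattering a 12-bit sample over the twelve exposed FlexIO3 bit positions listed above; it does not touch any hardware, and the names EXPOSED_BITS, spread12 and gather12 are made up for this illustration.

```python
# FlexIO3 bit positions that reach Teensy 4.0 header pins, per the groups above:
# bits 0-3, 6-11 and 14-15. Bits 4-5 and 12-13 are the two "holes".
EXPOSED_BITS = [0, 1, 2, 3, 6, 7, 8, 9, 10, 11, 14, 15]


def spread12(value):
    """Scatter a 12-bit value over the exposed positions of a 16-bit word."""
    if not 0 <= value < (1 << 12):
        raise ValueError("value must fit in 12 bits")
    word = 0
    for i, bit in enumerate(EXPOSED_BITS):
        if value & (1 << i):
            word |= 1 << bit
    return word


def gather12(word):
    """Inverse of spread12: collect the exposed bits of a 16-bit word."""
    value = 0
    for i, bit in enumerate(EXPOSED_BITS):
        if word & (1 << bit):
            value |= 1 << i
    return value


print(hex(spread12(0xFFF)))  # all twelve exposed bits set
```

On a real MCU one would probably precompute this as a lookup table (or do it with a few shifts and masks) rather than loop per bit, but that is a design choice outside what this sketch shows.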

Mmm, more experiments to do! :-/O
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14297
  • Country: fr
Re: ARM with fast parallel GPIO
« Reply #38 on: June 16, 2021, 07:58:18 pm »
Sure, but why overcomplicate the matter?
It is much more common to find SDRAM support on processors, both MCUs and MPUs. It is up there with QSPI and direct internal clock output.

I see the point; SDRAM support is common, and the IOs dedicated to SDRAM can toggle much faster than typical GPIOs (they usually have to accommodate 133MHz or 166MHz SDRAM). Implementing a communication bus at those rates with common GPIOs is usually not possible on most MCUs.
 

Online Berni

  • Super Contributor
  • ***
  • Posts: 4922
  • Country: si
Re: ARM with fast parallel GPIO
« Reply #39 on: June 17, 2021, 09:20:22 am »
The SDRAM pins usually don't really have any special IO drivers on MCUs, they just set the IO for the highest drive strength.

You often see support for both SRAM and SDRAM on the same data and address pins. However, they usually only support async SRAM, so you don't get any clock pin. That's understandable, since most SRAM out there is async; it's only the more specialized high-performance stuff that is synchronous SRAM, and that is something you typically wouldn't use as memory for an MCU.

The reason why you wouldn't want to use the SDRAM bus is that it is much more complicated than SRAM. On SRAM the address is simply placed on the address pins and the data is returned on the data pins; that is it. SDRAM, on the other hand, operates in a way that is convenient for DRAM: instead of an address it sends commands for selecting rows and columns. These commands take a certain number of clock cycles to execute (RAS/CAS timing), and read/write commands also execute with a certain number of cycles of delay, so you can string read commands together into the pipeline and have them execute staggered, with all of their data coming out later on. This means that acting like an SDRAM chip is a lot more work in an FPGA, while making it act like an SRAM chip takes about 10 lines of Verilog/VHDL.
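As a rough illustration of that command overhead, here is a small Python sketch that counts the clock cycles one SDRAM read burst occupies. The tRCD and CAS latency values are illustrative assumptions (2 cycles each), precharge and refresh are ignored, and the function name is made up for this example.

```python
def sdram_read_cycles(burst_len, t_rcd=2, cas_latency=2, row_open=False):
    """Clock cycles from the first command to the last data word of one read burst.

    ACTIVATE is issued at cycle 0 (skipped if the row is already open),
    READ follows after t_rcd cycles, and data starts cas_latency cycles later.
    """
    activate = 0 if row_open else t_rcd
    return activate + cas_latency + burst_len


for burst in (1, 8):
    total = sdram_read_cycles(burst)
    print(f"burst of {burst}: {total} cycles, {burst / total:.0%} of them carry data")
```

Under these assumed timings a single-word read spends most of its cycles on setup, while a longer burst amortizes it, which is why the overhead matters most for small random accesses.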

Even the more serious, larger SoCs sometimes retain an SRAM bus, but it's mostly for backwards compatibility with async bus peripherals. The actual memory tends to be some flavor of DDR, and this typically runs on dedicated pins that often cannot be used as GPIO at all.
 
The following users thanked this post: Bassman59

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14297
  • Country: fr
Re: ARM with fast parallel GPIO
« Reply #40 on: June 17, 2021, 05:22:34 pm »
The SDRAM pins usually don't really have any special IO drivers on MCUs, they just set the IO for the highest drive strength.

Possibly so, although I do not know that for sure on all MCUs.

But that's beside the point - what matters is that SDRAM controllers can usually get you faster data throughput than any other peripheral on most MCUs. As I mentioned, what mid-range MCU, using what peripheral, can get you a parallel bus @166MHz? I do not know of many examples. That was the point.

Of course if on some MCU, you have an FMC-like interface that can work at the same frequencies, it's much easier to use that. But I just haven't seen many MCUs like that.
« Last Edit: June 17, 2021, 05:24:56 pm by SiliconWizard »
 

Online Berni

  • Super Contributor
  • ***
  • Posts: 4922
  • Country: si
Re: ARM with fast parallel GPIO
« Reply #41 on: June 17, 2021, 07:09:37 pm »
True the SDRAM interface is typically the highest clocked. But it does have significant amount of overhead and latency where it has to set up a transaction before it happens so there are some slight tradeoffs still.

The kind of interface that is really well suited is synchronous SRAM with burst transfer support. This is used by some parallel-interface raw flash chips. It looks similar to SRAM, except that it has a clock, and a burst transfers one word of data on each clock cycle. It also typically supports an optional address latch mode, where the first word of the transfer contains the address, saving you a lot of address lines.

Still, the sort of speeds possible with these are so high that an MCU might have difficulty doing something useful with the data, since at that point the CPU only has 1 or 2 instruction cycles per word for doing any actual processing on it. For this reason a high-clocked QSPI tends to be as fast as you would typically need to go. Most of the benefit of using an external memory bus is that the FPGA can be memory mapped to the CPU like any other peripheral. This potentially saves a significant amount of overhead inside peripheral drivers, which otherwise need to be told to actually send a command that does the thing you want to do; if it's memory mapped, the CPU can work with it just as if the FPGA were built into the same chip. No need to DMA buffers over into RAM to be worked on; you can just access the buffer as if it were RAM itself.
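To put a rough number on that cycle budget, the sketch below divides an assumed CPU clock by an assumed bus word rate. The 400 MHz and 166 MHz figures are illustrative assumptions, not measurements of any particular part, and the function name is made up for this example.

```python
def cpu_cycles_per_word(cpu_clock_hz, bus_word_rate_hz):
    """CPU clock cycles available per transferred word, ignoring stalls and DMA."""
    return cpu_clock_hz / bus_word_rate_hz


# Assumed numbers: a 400 MHz core fed by a 166 MHz word stream.
budget = cpu_cycles_per_word(400e6, 166e6)
print(f"{budget:.1f} cycles per word")
```

With these assumed clocks the budget works out to roughly 2.4 cycles per word, in line with the "1 or 2 instruction cycles" estimate above.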
 

Online tom66

  • Super Contributor
  • ***
  • Posts: 6678
  • Country: gb
  • Electronics Hobbyist & FPGA/Embedded Systems EE
Re: ARM with fast parallel GPIO
« Reply #42 on: June 30, 2021, 09:00:26 pm »
For 4-bits, Quad SPI can be used.  However, for more bits, it gets tricky.

A solution I have seen before used an inexpensive CPLD, which took SPI data and derived a 1/8th clock from the byte data to do serial-to-parallel conversion. In principle, if you keep the SPI buffer fed, the data will be continuous. The trick then comes in doing it at high data rates - SPI might max out around 25Mbit/s, so the max 8-bit data rate is ~3.1MHz. You could try a quad-SPI to 8-bit converter, which would get you up to ~12.5MHz. Still quite below your 50-100MHz goal.
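The arithmetic behind those figures, as a small Python sketch. It assumes the 25 MHz SPI clock mentioned above and one bit per data line per clock; the function name is made up for this example.

```python
def parallel_word_rate_hz(spi_clock_hz, data_lines, word_bits=8):
    """Parallel words per second out of a serial-to-parallel converter fed by SPI."""
    bits_per_second = spi_clock_hz * data_lines
    return bits_per_second / word_bits


single = parallel_word_rate_hz(25e6, data_lines=1)  # plain SPI
quad = parallel_word_rate_hz(25e6, data_lines=4)    # quad SPI
print(f"SPI: {single / 1e6:.3f} MHz, QSPI: {quad / 1e6:.3f} MHz")
```

Run the other way, the same formula says an 8-bit stream at 50 MHz would need a 100 MHz quad-SPI clock, before any protocol overhead.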

There exist some ARM+FPGA devices, though probably overkill for this. A Microsemi SmartFusion SoC is a lightweight option, but the toolchain is bloody terrible, and the device isn't exactly cheap. Xilinx Zynq is almost certainly overkill, but this kind of work is its bread and butter (software/FPGA fusion).

A CPLD or micro FPGA with an integrated PLL might be able to take jittery 8-bit parallel data and clean it up somewhat - feed in the data with some handshaking signals and you could get a decent buffer rate out.  But you'd probably need to buffer hundreds of samples to clean the jitter up nice enough and the FPGA design would not necessarily be trivial if you haven't done that kind of thing before.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14297
  • Country: fr
Re: ARM with fast parallel GPIO
« Reply #43 on: June 30, 2021, 10:45:36 pm »
True the SDRAM interface is typically the highest clocked. But it does have significant amount of overhead and latency where it has to set up a transaction before it happens so there are some slight tradeoffs still.

Yes, it's far from ideal. I was just somehow getting technix's point.

I find easy-to-use, general-purpose, high-speed interfaces lacking on MCUs in general.
Your best bets for high-speed/low pin count are Ethernet (but it's rarely gigabit Ethernet on MCUs, which would still be "only" about 100 MBytes/s, and 100M Ethernet is 1/10th that...) and USB. Both are relatively annoying to implement on the other side, unless the device you want to communicate with already supports them.

It could be nice to have some high-speed SERDES on common MCUs, say Cortex-M7, that could be used for general-purpose stuff, without too much overhead and without too complex a protocol to handle. And preferably one that is not covered by a nasty patent with fees to pay to use it. Anyway...

Otherwise, of course, one approach is to use an FPGA. Either connect it to an MCU - as said by tom66, comm between the two can be asynchronous, and the FPGA can take care of making it synchronous - or do it all on the FPGA with a soft core.
 

Online Berni

  • Super Contributor
  • ***
  • Posts: 4922
  • Country: si
Re: ARM with fast parallel GPIO
« Reply #44 on: July 01, 2021, 05:54:39 am »
I've seen cases where people used the SPI peripheral as a "SERDES" on an 8-bit AVR in order to get fast enough IO to bitbang color composite video out of it. Though on these small 8-bit MCUs the SPI runs at the same clock speed as the CPU. On a more modern ARM chip the SPI typically runs significantly slower, on its own peripheral bus, so it's not nearly as useful a trick. Still, QSPI is not to be underestimated in terms of speed.

In my projects I have always used an SRAM bus to get high-bandwidth communication between an MCU and an FPGA, and it works great (I especially love being able to debug FPGA peripherals straight from the MCU IDE, since it's all just memory). Or, if speed was not as critical, an SPI bus that encodes bus read/write commands.
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 19280
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: ARM with fast parallel GPIO
« Reply #45 on: July 01, 2021, 07:42:04 am »
Hello,

I'm looking for some Cortex-M MCU which would be ideal to feed fast DAC through a parallel interface.

1. I don't want any DSP or FPGA, just a regular ARM MCU.
2. I don't have any strict minimal speeds in mind just as fast as it can be.. getting some parallel interface that could run close to 50-100Mhz would be nice.

I'm in a process of reading up on various MCUs and going over my options but if there is somebody who has used some ARM for something similar I would be glad to hear it.

Hardware and bus interfaces are usually the easy bit. The more difficult bit is guaranteeing by design hard realtime timing in software, especially if the processor has caches and uses interrupts. Naturally doing anything other than feeding the i/o significantly complicates matters, but that is not insuperable.

There is only one family of processors that I am aware of that directly addresses and solves those issues: the XMOS xCORE processors. Buy those at DigiKey. FFI, see my other posts on the subject.
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Online DiTBho

  • Super Contributor
  • ***
  • Posts: 3793
  • Country: gb
Re: ARM with fast parallel GPIO
« Reply #46 on: July 01, 2021, 11:31:40 am »
[..] timing [..]

I am having a lot of trouble with u-boot on the RAM controller of a PowerPC SoM.
I have here several PC100 and PC133 RAM sticks. Some do work, some do not work.
I am manually patching the code here and there, and timing is very problematic.

I guess, it's not as easy as people think.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline Bassman59

  • Super Contributor
  • ***
  • Posts: 2501
  • Country: us
  • Yes, I do this for a living
Re: ARM with fast parallel GPIO
« Reply #47 on: July 02, 2021, 04:49:54 am »
Hardware and bus interfaces are usually the easy bit. The more difficult bit is guaranteeing by design hard realtime timing in software, especially if the processor has caches and uses interrupts. Naturally doing anything other than feeding the i/o significantly complicates matters, but that is not insuperable.

There is only one family of processors that I am aware of that directly addresses and solves those issues: the XMOS xCORE processors. Buy those at DigiKey. FFI, see my other posts on the subject.

... but what if my application is not hard real-time? I just want a fast synchronous parallel bus. The bus fits somewhere in the micro's memory map and provides address (of however many bits are interesting), data (same), read/write indication and maybe a chip select just to save pins (we don't need a full 32 bit address). By synchronous I mean that the micro provides a clock synchronous to the bus, and that clock always runs, so my FPGA doesn't need a separate oscillator and there are no synchronization issues. The micro does read and write accesses to that address space and it just works. We've been doing this forever, back when we were using microprocessors and not microcontrollers!

This bus exists: synchronous SRAMs use them. Hell, even old-school parallel PCI is exactly this, albeit with the overhead of BARs and such. But PCI cores are still provided in most current FPGA families so ...

Anyway: TI TM4C Tiva parts and MSP432 parts have a synchronous parallel EBI.

Anyway: if you're using an XMOS part, do you need the FPGA? That's application-dependent, of course.
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 19280
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: ARM with fast parallel GPIO
« Reply #48 on: July 02, 2021, 08:20:45 am »
... but what if my application is not hard real-time? I just want a fast synchronous parallel bus. The bus fits somewhere in the micro's memory map and provides address (of however many bits are interesting), data (same), read/write indication and maybe a chip select just to save pins (we don't need a full 32 bit address). By synchronous I mean that the micro provides a clock synchronous to the bus, and that clock always runs, so my FPGA doesn't need a separate oscillator and there are no synchronization issues. The micro does read and write accesses to that address space and it just works. We've been doing this forever, back when we were using microprocessors and not microcontrollers!

This bus exists: synchronous SRAMs use them. Hell, even old-school parallel PCI is exactly this, albeit with the overhead of BARs and such. But PCI cores are still provided in most current FPGA families so ...

Anyway: TI TM4C Tiva parts and MSP432 parts have a synchronous parallel EBI.

Anyway: if you're using an XMOS part, do you need the FPGA? That's application-dependent, of course.

If you don't need hard realtime then life is much easier and there are many more solutions :)

XMOS xCORE processors fit into the niche between standard MCUs and FPGAs, offering (within limits!) the advantages of FPGAs with the advantages of standard software development tools. Whether your application's requirements fit xCORE/MCU/FPGA/discrete/etc technology is always an interesting topic. It is a shame that too many people only have a hammer in their toolbox :)
 

