Author Topic: ARM with fast parallel GPIO  (Read 4154 times)

Offline kamtar

  • Regular Contributor
  • *
  • Posts: 62
ARM with fast parallel GPIO
« on: January 16, 2021, 11:49:35 pm »
Hello,

I'm looking for a Cortex-M MCU that would be ideal for feeding a fast DAC through a parallel interface.

1. I don't want any DSP or FPGA, just a regular ARM MCU.
2. I don't have a strict minimum speed in mind, just as fast as it can be. A parallel interface that could run at close to 50-100 MHz would be nice.

I'm in the process of reading up on various MCUs and going over my options, but if somebody has used an ARM MCU for something similar, I would be glad to hear about it.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 7208
  • Country: fr
Re: ARM with fast parallel GPIO
« Reply #1 on: January 17, 2021, 12:06:27 am »
You should be able to do that with the FMC peripheral of STM32 MCUs, for instance. For 100 MHz, I would suggest an STM32F7 or H7.
Now the issue there is that speed is not the only requirement. If you're driving a parallel DAC, data has to get out at a fixed frequency with low jitter. I can't guarantee that the FMC peripheral of the above MCUs can get you that.
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 7982
  • Country: us
    • Personal site
Re: ARM with fast parallel GPIO
« Reply #2 on: January 17, 2021, 12:11:17 am »
It is going to be a challenge. I've been looking for an ARM device with a fast parallel interface with no special protocol, but haven't had a whole lot of luck.

Most of the time you get a static memory controller, but it inserts address latch cycles, so it is not ideal for just transferring raw data. In some cases there is an ability to disable the address cycle, but the interface speed is still not that fast. One such example is the Nuvoton M480 series.

Often the best way to read/write a raw data stream is through camera/display interfaces. But again, in many cases the controllers are too smart and expect proper hsync/vsync pulses.

For getting data into the device, I found the SAM E70 to be the best. It has a parallel capture controller, which boils down to an 8-bit bus, an external clock, and a couple of enable pins.

Interfacing with FPGAs is such a common task that I don't understand why chip vendors do not include a dedicated peripheral for that, which would be also reusable as a general purpose parallel streaming interface.
Alex
 

Offline NiHaoMike

  • Super Contributor
  • ***
  • Posts: 7416
  • Country: us
  • "Don't turn it on - Take it apart!"
    • Facebook Page
Re: ARM with fast parallel GPIO
« Reply #3 on: January 17, 2021, 02:03:42 pm »
Quote from: SiliconWizard
You should be able to do that with the FMC peripheral of STM32 MCUs, for instance. For 100 MHz, I would suggest an STM32F7 or H7.
Now the issue there is that speed is not the only requirement. If you're driving a parallel DAC, data has to get out at a fixed frequency with low jitter. I can't guarantee that the FMC peripheral of the above MCUs can get you that.

Run it in slave mode with an external oscillator supplying the clock. That said, such a high clock rate is asking a bit much from a microcontroller; a cheap FPGA or CPLD would probably be a better solution.
Cryptocurrency has taught me to love math and at the same time be baffled by it.

Cryptocurrency lesson 0: Altcoins and Bitcoin are not the same thing.
 

Offline kamtar

  • Regular Contributor
  • *
  • Posts: 62
Re: ARM with fast parallel GPIO
« Reply #4 on: January 17, 2021, 04:56:28 pm »
Thanks for the input, I will take a look at the SAM E70s.
To be more precise, I don't plan on driving a DAC directly; it's more of a DDS IC. This is just for a prototype that I will use to figure out everything I can do with it using the modulation registers and so on.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 2827
  • Country: fi
    • My home page and email address
Re: ARM with fast parallel GPIO
« Reply #5 on: January 17, 2021, 07:30:46 pm »
Quote from: ataradov
It is going to be a challenge. I've been looking for an ARM device with a fast parallel interface with no special protocol, but haven't had a whole lot of luck.

Did you look at SAM D5x/E5x?

I looked at common ARM microcontrollers available to a hobbyist like me that could provide parallel 18-bit data port to displays (+ 5 or so control signals) via DMA, and basically the only one I found with > 16-bit wide GPIO banks was SAM D5x/E5x.  On these, the GPIO A bank has 26 consecutive pins (A00 to A25), B bank 18 (B00 to B17) on 64-pin TQFP/VQFN and 26 (B00 to B25) on 100-pin TQFP/VQFN, and so on.  Using suitable choices of pins, you can do 32-bit DMA to/from an entire pin bank.

However, I am no EE, and have no idea about D5x/E5x hidden gotchas; all I wanted/looked at was having sufficiently wide GPIO bank I could DMA data from in parallel, on a common enough ARM microcontroller.  Any insight on parallel GPIO on these?
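
To make the "one 32-bit write per pixel" idea concrete, here is a minimal sketch of how an 18-bit data value and 5 control bits could be packed into a single 32-bit port word before handing the buffer to DMA. The bit layout (data on bank bits 0..17, control on bits 18..22) and the names `pack_port_word` and `build_dma_buffer` are assumptions made up for illustration; they do not come from any datasheet, and the sketch says nothing about how the DMA would be triggered.

```python
# Hypothetical pin assignment: data on bank bits 0..17, control on bits 18..22.
DATA_BITS = 18
CTRL_BITS = 5
DATA_MASK = (1 << DATA_BITS) - 1
CTRL_MASK = (1 << CTRL_BITS) - 1


def pack_port_word(data, ctrl):
    """Combine an 18-bit data value and 5 control bits into one port word."""
    if data & ~DATA_MASK or ctrl & ~CTRL_MASK:
        raise ValueError("value does not fit its field")
    return (ctrl << DATA_BITS) | data


def build_dma_buffer(samples, ctrl):
    """Pre-compute one 32-bit word per sample, all with the same control bits."""
    return [pack_port_word(sample, ctrl) for sample in samples]


words = build_dma_buffer([0x00000, 0x3FFFF, 0x15555], ctrl=0b00001)
for word in words:
    print(f"0x{word:08X}")
```

Every word is written to the whole bank, so any other pins on the same bank would have to be unused or tolerate being overwritten.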
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 7982
  • Country: us
    • Personal site
Re: ARM with fast parallel GPIO
« Reply #6 on: January 17, 2021, 07:35:30 pm »
Quote from: Nominal Animal
Did you look at SAM D5x/E5x?

Every single day at work. I work for Microchip :)

In my case I was not interested in anything without High Speed USB, since my projects currently involve transferring large amounts of data to/from PC.

Also, SAM D5x/E5x will not be fast: it takes a minimum of 6 clock cycles to toggle a pin. At 120 MHz, the absolute best toggling rate (just toggling, no actual logic) is 20 MHz. If you want to set the data, it will be way, way slower.

DMA-ing a parallel interface is not easy, as there is no real trigger from the GPIO controller.
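
The toggle-rate figure above is simple arithmetic, sketched here so it can be rerun for other clocks. The function names are made up for illustration; the only inputs are the 120 MHz clock and the 6 cycles per toggle quoted above. The square-wave line relies on the fact that one full output period needs two toggles.

```python
def max_toggle_rate_hz(core_clock_hz, cycles_per_toggle):
    """Best-case number of pin toggles per second."""
    return core_clock_hz / cycles_per_toggle


def square_wave_hz(toggle_rate_hz):
    """A full period needs two toggles (one rising, one falling edge)."""
    return toggle_rate_hz / 2


rate = max_toggle_rate_hz(120_000_000, 6)
print(f"toggle rate: {rate / 1e6:.0f} MHz")
print(f"square wave: {square_wave_hz(rate) / 1e6:.0f} MHz")
```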
Alex
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 2827
  • Country: fi
    • My home page and email address
Re: ARM with fast parallel GPIO
« Reply #7 on: January 17, 2021, 10:34:29 pm »
Quote from: ataradov
In my case I was not interested in anything without High Speed USB, since my projects currently involve transferring large amounts of data to/from PC.

I use a Teensy 4.0 and 4.1 for that (NXP i.MX RT1064, Cortex-M7).  For a hobbyist like me, Teensyduino is a pretty easy environment to play with.

Pity i.MX RT1064 is only available in MAPBGA-196 with 0.65mm or 0.8mm pitch; I definitely don't have the skills to make my own board with those.  Otherwise, it'd be a pretty darn powerful microcontroller with lots of RAM for many use cases, like that Arduino-programmable display controller (for games or human-machine interfaces) that I experimented with on the SAMD51J20A.
 

Offline kamtar

  • Regular Contributor
  • *
  • Posts: 62
Re: ARM with fast parallel GPIO
« Reply #8 on: January 18, 2021, 12:19:29 am »
Quote from: Nominal Animal
Pity i.MX RT1064 is only available in MAPBGA-196 with 0.65mm or 0.8mm pitch; I definitely don't have the skills to make my own board with those.  Otherwise, it'd be a pretty darn powerful microcontroller

I haven't read up on the RT1064, and I bet it's much better than the RT1010 and RT1020 in LQFP that I have experience with, but still, those MCUs are built to a price and aren't as good as they seem on paper: lots of limitations and compromises in their peripherals.
« Last Edit: January 18, 2021, 12:21:17 am by kamtar »
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 2827
  • Country: fi
    • My home page and email address
Re: ARM with fast parallel GPIO
« Reply #9 on: January 18, 2021, 08:25:57 pm »
Quote from: Nominal Animal
Pity i.MX RT1064 is only available in MAPBGA-196 with 0.65mm or 0.8mm pitch; I definitely don't have the skills to make my own board with those.  Otherwise, it'd be a pretty darn powerful microcontroller
Quote from: kamtar
I haven't read up on the RT1064, and I bet it's much better than the RT1010 and RT1020 in LQFP that I have experience with, but still, those MCUs are built to a price and aren't as good as they seem on paper: lots of limitations and compromises in their peripherals.

Teensy 4.0 only provides 50 I/O pins and Teensy 4.1 55, and a limited subset of the peripherals anyway.  But the ones it does provide, are pretty much amazing considering the price point and ease of use (4.0 being < USD $20).  Not to mention it runs at 600 MHz, and has 512kB+512kB of RAM.  (Teensy 4.1 has pads for additional PSRAM, though.)  I haven't found the practical upper limit for USB HS bandwidth yet, I only know it is over 200 Mbits/s (25 MiB/s) because even a simple Arduino/Teensyduino using USB CDC ACM in one direction achieves that.  I like them.

Sure, there are limitations (and I hear the development of Teensy 4.x took a lot of time and effort, especially the early initialization part), but they're nothing compared to the gains.  Personally, I'm not even interested in most of the peripherals; I just want DMA, GPIO, SPI, I2C, USB HS, and lots of RAM; and preferably contiguous banks of GPIO pins so I could DMA out data in parallel to a small display controller, with easy to use DMA triggers.  The stuff I do isn't complicated.

I haven't seen anything comparable.  The STM32H7 series looks interesting, but requires a separate ULPI transceiver for USB HS.  That does not mean they are not available; that's just what this one hobbyist has seen :-//
 

Offline kamtar

  • Regular Contributor
  • *
  • Posts: 62
Re: ARM with fast parallel GPIO
« Reply #10 on: January 18, 2021, 08:35:52 pm »
I used the RT1010 on my last board (a USB UAC2 DAC), and yeah, it's the only option you have if you really want those 500 MHz for cheap, but it's too limiting for my prototyping (mainly because I can't use it as a clock divider), so I'm eyeing the SAM E70 now.
 

Offline DC1MC

  • Super Contributor
  • ***
  • Posts: 1343
  • Country: de
Re: ARM with fast parallel GPIO
« Reply #11 on: January 18, 2021, 08:40:22 pm »
What about the Cypress FX3? It is an ARM A9 and has a nice 32-bit programmable parallel interface at 100 MHz  :-//

 Cheers,
 DC1MC
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 7982
  • Country: us
    • Personal site
Re: ARM with fast parallel GPIO
« Reply #12 on: January 18, 2021, 08:55:38 pm »
BGA-only. I'll avoid BGAs as much as I can.

Also, it is not A9, it is ARM9, namely ARM926EJ, a pretty old core. It is still better than 8051 stuff, of course.

They are also pretty pricey. It is fine if you actually need SuperSpeed (SS) USB, but if all you need is decent High Speed (HS), then it gets a bit more questionable.
« Last Edit: January 18, 2021, 08:58:14 pm by ataradov »
Alex
 

Offline NiHaoMike

  • Super Contributor
  • ***
  • Posts: 7416
  • Country: us
  • "Don't turn it on - Take it apart!"
    • Facebook Page
Re: ARM with fast parallel GPIO
« Reply #13 on: January 20, 2021, 02:21:10 am »
Quote from: ataradov
In my case I was not interested in anything without High Speed USB, since my projects currently involve transferring large amounts of data to/from PC.

For the DAC, just get a cheap fl2k VGA adapter for 3 channels at up to about 150MS/s.
https://osmocom.org/projects/osmo-fl2k/wiki.
 

Offline Bassman59

  • Super Contributor
  • ***
  • Posts: 2045
  • Country: us
  • Yes, I do this for a living
Re: ARM with fast parallel GPIO
« Reply #14 on: February 04, 2021, 12:10:14 am »
Quote from: ataradov
Interfacing with FPGAs is such a common task that I don't understand why chip vendors do not include a dedicated peripheral for that, which would be also reusable as a general purpose parallel streaming interface.

THANK YOU ... yes, this exactly. I want to see a synchronous parallel bus master with bidirectional data, address output and byte-lane enable outs.  I want the interface to provide a clock to the FPGA, so the FPGA doesn't have to deal with synchronization. The interface should have a place in the micro's memory map so you talk to it just by accessing the memory space. That means it can be a target for DMA operations if necessary.

There are many "almost there" interfaces. But they seem mostly designed for memory. They support asynchronous SRAMs, so they don't provide the clock. You only get a clock out when the peripheral is configured as an SDRAM controller.

I don't know why such a general-purpose synchronous parallel bus interface is not provided. Hell, even expose the APB or AHB or whatever -- just output the damn clock!
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 2320
  • Country: nz
  • Formerly SiFive, Samsung R&D
Re: ARM with fast parallel GPIO
« Reply #15 on: February 04, 2021, 12:38:22 am »
Quote from: Nominal Animal
Teensy 4.0 only provides 50 I/O pins and Teensy 4.1 55, and a limited subset of the peripherals anyway.  But the ones it does provide, are pretty much amazing considering the price point and ease of use (4.0 being < USD $20).  Not to mention it runs at 600 MHz, and has 512kB+512kB of RAM.

Yeah, they're great.

Not only 600 MHz, they seem to run just fine at 960 MHz, though a heatsink would be a good idea at that speed. And it's got a really good dual-issue core, so the Teensy 4.0 at 960 MHz actually matches a U54 (i.e. Rocket chip, also used in FE310 and K210) at 1.45 GHz on my primes benchmark.

For $20 it's a beast.

Adding one or two 8 MB PSRAM chips to Teensy 4.1 for $1.20 each also seems a pretty good deal. If I get a 4.1 I'll have to try that.
 

Offline aheid

  • Regular Contributor
  • *
  • Posts: 236
  • Country: no
Re: ARM with fast parallel GPIO
« Reply #16 on: March 08, 2021, 03:43:51 am »
Isn't this what the RPi Pico (RP2040) PIO stuff was made for?
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 7982
  • Country: us
    • Personal site
Re: ARM with fast parallel GPIO
« Reply #17 on: March 08, 2021, 03:50:35 am »
Quote from: aheid
Isn't this what the RPi Pico (RP2040) PIO stuff was made for?

Yes, but it is attached to a subpar rest of the system. We need that principle to propagate to other MCUs.
Alex
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 2827
  • Country: fi
    • My home page and email address
Re: ARM with fast parallel GPIO
« Reply #18 on: March 08, 2021, 04:19:36 am »
Quote from: aheid
Isn't this what the RPi Pico (RP2040) PIO stuff was made for?
Quote from: ataradov
Yes, but it is attached to a subpar rest of the system. We need that principle to propagate to other MCUs.

Agreed; with a minimal ALU, please; at least addition, so we can do PDM.  And more than 32 instructions across several units.
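
To illustrate why plain addition is enough for PDM, here is a minimal sketch of a first-order pulse-density modulator: a fixed-width accumulator that adds the desired level every tick and outputs its carry bit. This is ordinary Python, not PIO code, and the function name `pdm_bits` and its parameters are made up for illustration; how a PIO-like unit would actually expose an adder is an assumption.

```python
def pdm_bits(level, width, count):
    """Return `count` PDM output bits for an unsigned `width`-bit level.

    Each tick adds `level` to a `width`-bit accumulator; the carry out of
    that addition is the output bit, so the density of ones tracks `level`.
    """
    modulus = 1 << width
    if not 0 <= level < modulus:
        raise ValueError("level does not fit in the accumulator width")
    acc = 0
    bits = []
    for _ in range(count):
        acc += level
        bits.append(acc >> width)  # 1 only when the addition overflowed
        acc &= modulus - 1         # keep the low `width` bits
    return bits


print(pdm_bits(64, 8, 8))          # a quarter-scale level on an 8-bit accumulator
print(sum(pdm_bits(200, 8, 256)))  # number of ones over one full 256-tick cycle
```

Over a full cycle of 2^width ticks the number of ones equals the level, which is the whole point: no multiplier or comparator table is needed, just an adder and its carry.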
 

Offline Berni

  • Super Contributor
  • ***
  • Posts: 3715
  • Country: si
Re: ARM with fast parallel GPIO
« Reply #19 on: March 08, 2021, 08:12:11 am »
Use an MCU that has an RGB bus LCD controller inside it. You can likely abuse the timing settings to make it output one big blob of data from RAM. The bus runs on a fixed clock divided down from the main clock, so it should give the DAC consistent timing.

But even just regular DMA into a GPIO port should work. For example, the STM32H7 family has the GPIO peripheral connected to a 200 MHz AHB bus. The bus that the DMA and RAM sit on is also 200 MHz, so the maximum throughput is likely 100 MHz, since half the time the DMA needs to read from RAM and half the time it needs to write to GPIO. This could possibly be pushed up to 150 MHz if the DMA is smart enough to read samples as 32-bit and then write them as 16-bit to save some RAM read cycles. But it likely won't run any faster than that due to bus bandwidth limitations.

Past that you are going to need an FPGA.
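
The bus-bandwidth estimate above can be put into a small back-of-the-envelope model. It assumes, purely for illustration, that every RAM read and every GPIO write costs exactly one cycle on a single shared 200 MHz bus, with no arbitration or wait states; real parts may behave quite differently. The function name `sample_rate_hz` is made up.

```python
from fractions import Fraction


def sample_rate_hz(bus_hz, samples_per_read):
    """Samples per second when one read feeds `samples_per_read` GPIO writes.

    Assumes one bus cycle per RAM read and one per GPIO write, all on the
    same bus, which is a deliberate simplification.
    """
    cycles_per_batch = 1 + samples_per_read
    return Fraction(bus_hz * samples_per_read, cycles_per_batch)


for samples in (1, 2):
    rate = sample_rate_hz(200_000_000, samples)
    print(f"{samples} sample(s) per read: {float(rate) / 1e6:.1f} MHz")
```

With one sample per read this reproduces the 100 MHz figure. With two 16-bit samples per 32-bit read the same model gives about 133 MHz, a bit below the 150 MHz guess, so treat both numbers as rough upper bounds rather than measurements.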
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 7982
  • Country: us
    • Personal site
Re: ARM with fast parallel GPIO
« Reply #20 on: March 08, 2021, 08:21:13 am »
Any solutions with GPIO+DMA are not workable in practice. Let's say I have an 8-byte array I need to send to an FPGA. I need 8 data lines + 1 clock line. So now you have to reformat your array as 16-bit words, times 2 more for the clock toggle. So 4x the data size. Plus you need to convince the DMA to only affect 9 bits of the GPIO port without touching the others. And with an external clock it is not possible at all.

And for receive it is impossible with either an internal or an external clock.

And the issue with using display controllers or camera interfaces for this is that they generate or expect line and frame blanking and synchronization signals, and often want the frame data to be aligned with those signals. All this while the pixel clock is still generated for a dummy frame. So receiving this mess in an FPGA is not easy.
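
To spell out the 4x blow-up for the GPIO+DMA case, here is a minimal sketch of the buffer reformatting. The layout (data on port bits 0..7, clock on bit 8) and the name `expand_for_gpio_dma` are assumptions made up for illustration; each byte becomes two 16-bit port words, one with the clock low and one with it high.

```python
CLOCK_BIT = 1 << 8  # hypothetical layout: data on port bits 0..7, clock on bit 8


def expand_for_gpio_dma(payload):
    """Turn each byte into two 16-bit port words: clock low, then clock high."""
    words = []
    for byte in payload:
        words.append(byte)              # data set up, clock low
        words.append(byte | CLOCK_BIT)  # same data, clock high (rising edge)
    return words


payload = bytes(range(8))
words = expand_for_gpio_dma(payload)
expanded_bytes = len(words) * 2  # each port word is 16 bits wide
print(len(payload), "bytes in,", expanded_bytes, "bytes out")
```

The sketch still writes all 16 bits of the port on every transfer, so it does not solve the problem of leaving the other 7 pins untouched.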
« Last Edit: March 08, 2021, 08:23:13 am by ataradov »
Alex
 

Offline Berni

  • Super Contributor
  • ***
  • Posts: 3715
  • Country: si
Re: ARM with fast parallel GPIO
« Reply #21 on: March 08, 2021, 01:53:39 pm »
You can do tricks to get around that. Some MCUs can be set up to provide a divided-down clock on a pin; use this as the clock and then use one of the other mechanisms to start the DMA on its edge (wait loop, interrupt, timer event or something). Then, once the transfer is done, reconfigure the clock pin back to GPIO to stop the clock. It might also be possible to use the SPI peripheral to generate the clock while also triggering the DMA transfer. But yes, all of these are pretty hacky solutions that involve things the MCU was never designed to do. Also, when doing this DMA transfer you are likely limited in what the CPU can do in the meantime, since a lot of RAM access might stall the DMA, so it might not be able to do any useful work while the transfer is happening.

If this was a solution for production, I'd definitely just use an FPGA that can do such a thing easily, or at the very least a simple, cheap, dumb CPLD that just orchestrates data transfer between the device and an SRAM chip. But once you do have an FPGA you can transfer data in any weird way you like, even if it is RGB LCD frames, though I'd recommend using the external memory bus functionality of MCUs for that.
 

Online Doctorandus_P

  • Super Contributor
  • ***
  • Posts: 1691
  • Country: nl
Re: ARM with fast parallel GPIO
« Reply #22 on: May 15, 2021, 12:19:09 am »
It's not ARM, but the Cypress CY7C68013A has a pretty specialized interface to stream data between USB and I/O. That is the reason it is very popular in logic analyzers and USB scopes. The chip itself is a boring 8051 compatible.

It's not super fast by today's standards, but Cypress also has an "FX3" variant and that one's a lot quicker. I do not know if the "FX3" still has an 8051 core. Maybe Cypress even put that peripheral in other chips as well.

Just another idea:
What do USB HDDs use these days? Maybe you can repurpose that hardware. Going from SATA to PATA is not much more than a shift register.
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 7982
  • Country: us
    • Personal site
Re: ARM with fast parallel GPIO
« Reply #23 on: May 15, 2021, 12:56:52 am »
FX3 has ARM926. But it is also only available in BGA packages and generally more annoying to use.
Alex
 

Offline TimCambridge

  • Regular Contributor
  • *
  • Posts: 87
  • Country: gb
Re: ARM with fast parallel GPIO
« Reply #24 on: May 15, 2021, 03:03:01 pm »
Octal SPI?
 

