Topic: Daisy chaining NOR flash modules to create a high bandwidth ROM

OM222O (topic starter)

  • Frequent Contributor
  • Posts: 768
  • Country: gb
For a project (an FPGA image-processing accelerator) I need to build a high-bandwidth read-only memory. I settled on Quad SPI NOR flash modules (used in XIP mode), but I have some concerns. First, I'm not sure whether I need buffer ICs for the clock and CS lines: for prototyping I used 4 modules and driving them from a single pin was fine, but the end goal is something like 64 daisy-chained. If I do need buffers, can you recommend some high-speed chips? The modules can run at 133 MHz, which is much faster than anything I've worked with in the past.

Another question: do I need to impedance match the data lines? If yes, does the propagation delay through the clock buffers cause problems?

How about termination resistors?

I have validated the system at 1 MHz on a breadboard and want to move on to PCB design, but it'd be great to get some tips on high-speed routing first. If you can point me to any sources I can learn from, that'd be great too!

Also, before you suggest "the right tool for the job" (for example an FPGA with built-in HBM2 or similar): this is a student project and those solutions are out of the question. These SPI modules are fairly cheap and I/O pins on the FPGA are free. A 256-bit bus (64 modules × 4 data lines) at 133 MHz gives around 4.25 GB/s, which should be fast enough for real-time image processing. If not, FPGA boards with around 500 I/O pins are still affordable (around $100).
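The 4.25 GB/s figure comes from treating the 64 QSPI modules as one wide bus. A minimal Python sketch of that arithmetic, assuming single data rate, 4 data lines per module, all modules sharing CLK/CS, and no command/address/dummy-cycle overhead (a real XIP read spends some cycles on those at the start of each burst, so this is a peak figure only):

```python
def qspi_array_bandwidth_gbytes(n_modules: int, clock_hz: float,
                                lines_per_module: int = 4) -> float:
    """Peak SDR read bandwidth in GB/s (1 GB = 1e9 bytes) of n parallel QSPI modules.

    Assumes every clock cycle carries data (opcode/address/dummy cycles ignored)
    and that all modules are clocked together from a shared CLK/CS.
    """
    bus_width_bits = n_modules * lines_per_module   # e.g. 64 * 4 = 256 bits
    bits_per_second = bus_width_bits * clock_hz     # one transfer per clock (SDR)
    return bits_per_second / 8 / 1e9


if __name__ == "__main__":
    # 64 modules at 133 MHz: 256 bits * 133e6 / 8 = 4.256 GB/s
    print(f"{qspi_array_bandwidth_gbytes(64, 133e6):.3f} GB/s")
```

The same function shows why the 4-module prototype is far from the target: it peaks at roughly 0.27 GB/s.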
« Last Edit: April 13, 2022, 01:43:23 am by OM222O »
 

Someone

  • Super Contributor
  • Posts: 4981
  • Country: au
 

free_electron

  • Super Contributor
  • Posts: 8550
  • Country: us
Re: Daisy chaining NOR flash modules to create a high bandwidth ROM
« Reply #2 on: April 13, 2022, 03:24:26 am »
Parallel ROM? There is flash memory, or good old EPROM, with 16-bit data buses. Slap 4 of those next to each other and you get a 64-bit-wide datapath. That's 8 bytes per clock cycle. A 12 ns ROM system would give you under 2 ns per byte. That's more than 500 MByte per second!
How hard do you want to go?

Other option: external SDRAM, and copy the ROM into RAM during boot-up.
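The parallel-ROM numbers can be checked the same way. A small Python sketch, assuming four 16-bit parts read in lockstep, one complete 64-bit word per 12 ns access, and no page-mode/burst reads or extra wait states:

```python
def parallel_rom_throughput(n_chips: int, bits_per_chip: int,
                            access_time_ns: float) -> tuple[float, float]:
    """Return (ns per byte, MB/s) for n_chips parallel ROMs read in lockstep.

    Assumes one full word from every chip per access time and no wait
    states between accesses (1 MB = 1e6 bytes).
    """
    bytes_per_access = n_chips * bits_per_chip // 8          # 4 * 16 / 8 = 8 bytes
    ns_per_byte = access_time_ns / bytes_per_access          # 12 / 8 = 1.5 ns
    mbytes_per_s = bytes_per_access / access_time_ns * 1e3   # bytes/ns -> MB/s
    return ns_per_byte, mbytes_per_s


ns_per_byte, rate = parallel_rom_throughput(4, 16, 12.0)
print(f"{ns_per_byte:.2f} ns/byte, {rate:.0f} MB/s")  # 1.50 ns/byte, 667 MB/s
```

So "sub 2 ns per byte" is 1.5 ns per byte, about 667 MB/s, which is comfortably above the 500 MB/s quoted.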
Professional Electron Wrangler.
Any comments, or points of view expressed, are my own and not endorsed, induced or compensated by my employer(s).
 

OM222O (topic starter)

  • Frequent Contributor
  • Posts: 768
  • Country: gb
Re: Daisy chaining NOR flash modules to create a high bandwidth ROM
« Reply #3 on: April 13, 2022, 03:46:16 am »
As a rough estimate, I need to load a 4,706,878-element array into FIFOs for processing, with each element being 32 bits. That's around 150 megabits for one grayscale frame. RGB is 3x that, and the target is 60 FPS, so it adds up quickly. This is for a moderately sized operation (images are 926x926), so if we later decide to go to a higher resolution, for example 1080x1080, you can see I do need something in the range of a few GB/s. Anyhow, I don't want this topic to drift into oblivion like the last thread, so please, let's just stick to the main question: what should I consider when routing 133 MHz SPI signals?
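The requirement can be written out explicitly. A Python sketch using the numbers from the post (4,706,878 32-bit elements per channel, 3 channels, 60 FPS). The 1080x1080 line is hypothetical: it assumes the element count grows in proportion to the pixel count, which the post does not state.

```python
ELEMENTS_PER_CHANNEL = 4_706_878   # from the post, for a 926x926 image
BITS_PER_ELEMENT = 32
CHANNELS = 3                        # RGB
FPS = 60


def required_gbytes_per_s(elements: int, bits: int = BITS_PER_ELEMENT,
                          channels: int = CHANNELS, fps: int = FPS) -> float:
    """Sustained read rate (GB/s, 1 GB = 1e9 bytes) needed to stream every frame."""
    return elements * bits * channels * fps / 8 / 1e9


frame_mbits = ELEMENTS_PER_CHANNEL * BITS_PER_ELEMENT / 1e6
print(f"one grayscale frame: {frame_mbits:.1f} Mbit")            # 150.6 Mbit
print(f"926x926 RGB @ 60 FPS: "
      f"{required_gbytes_per_s(ELEMENTS_PER_CHANNEL):.2f} GB/s")   # 3.39 GB/s

# Hypothetical scaling: assumes elements are proportional to pixel count.
scaled = round(ELEMENTS_PER_CHANNEL * (1080 * 1080) / (926 * 926))
print(f"1080x1080 RGB @ 60 FPS (assumed scaling): "
      f"{required_gbytes_per_s(scaled):.2f} GB/s")
```

Under that assumption the 1080x1080 case lands around 4.6 GB/s, above the ~4.25 GB/s peak of a 256-bit bus at 133 MHz, before any command or address overhead is counted.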
 

Someone

  • Super Contributor
  • Posts: 4981
  • Country: au
Re: Daisy chaining NOR flash modules to create a high bandwidth ROM
« Reply #4 on: April 13, 2022, 04:35:57 am »
As before, you've pointed yourself in the wrong direction, done little to no research before posting the question, and are now already so far in that you won't change course.
https://electronics.stackexchange.com/questions/443109/nor-flash-max-frequency-of-operation
PCB routing and impedance are not the major problems; they can be fairly loose, as it's mostly (short) point-to-point signalling. Fan-out needs to be assessed with a signal integrity method. Solved all that? Then you will get completely lost in the complexity of the logic and its constraints to operate at 100 MHz+.
 

OM222O (topic starter)

  • Frequent Contributor
  • Posts: 768
  • Country: gb
Re: Daisy chaining NOR flash modules to create a high bandwidth ROM
« Reply #5 on: April 13, 2022, 05:27:21 am »
I'm not sure what you mean by "complexity of the logic and their constraints to operate 100MHz+." The "logic" for each core is simply a single DSP slice performing an integer MAC, and Vivado seems happy running the slice at those speeds. Besides, the logic doesn't have to run at the same frequency as the memory: the data is transferred to FIFOs and processed in parallel, so if the logic needs to run slower, I can simply add more FIFOs and "cores" to process it. There are plenty of DSP slices and CLBs for that, so it won't be an issue. If you're curious about the final architecture, here is a diagram:


The data width and transfer rate to the FIFOs will be different, but that doesn't matter either: as long as the ROM has enough throughput to read the size, index, and coefficients, everything else works.
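Decoupling the ROM from the processing through FIFOs implies a simple sizing rule: add cores until their combined MAC rate matches the element rate coming out of the FIFOs. A hypothetical Python sketch, assuming each core consumes one 32-bit element per clock (one DSP-slice MAC per cycle); the core clock frequencies are illustrative and not from the post:

```python
import math


def cores_needed(element_rate_per_s: float, core_clock_hz: float,
                 elements_per_core_per_cycle: int = 1) -> int:
    """Minimum number of parallel MAC cores to keep up with the incoming element rate."""
    per_core_rate = core_clock_hz * elements_per_core_per_cycle
    return math.ceil(element_rate_per_s / per_core_rate)


# Element rate from the thread: 4,706,878 elements x 3 channels x 60 FPS
element_rate = 4_706_878 * 3 * 60          # 847,238,040 elements/s
for f_core in (100e6, 200e6, 400e6):       # illustrative core clocks
    print(f"{f_core / 1e6:.0f} MHz -> {cores_needed(element_rate, f_core)} cores")
```

This only covers compute throughput; the ROM still has to deliver the full element rate, which is the bandwidth question the thread is about.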

Regarding the max flash clock speed, I'm more concerned with the PCB design for the time being. If that works but the controller is too slow, I can always switch to DTR mode, halve the clock frequency and read on both clock edges, which requires minimal changes to the SPI controller I've written.
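Why DTR at half the clock keeps the same throughput: each data line transfers on both edges. A Python sketch of peak per-module throughput, assuming 4 data lines and ignoring command/address/dummy cycles. Whether a given flash part supports DTR quad reads, and at what clock, has to come from its datasheet.

```python
def qspi_module_mbytes_per_s(clock_hz: float, dtr: bool = False,
                             data_lines: int = 4) -> float:
    """Peak per-module read rate in MB/s (1 MB = 1e6 bytes)."""
    transfers_per_cycle = 2 if dtr else 1   # DTR transfers on rising and falling edges
    return data_lines * transfers_per_cycle * clock_hz / 8 / 1e6


sdr = qspi_module_mbytes_per_s(133e6)               # 66.5 MB/s
dtr = qspi_module_mbytes_per_s(66.5e6, dtr=True)    # 66.5 MB/s at half the clock
print(sdr, dtr)
```

The clock line's frequency halves, but each transfer still lasts about 7.5 ns in both cases, so the data-line timing window does not get any wider.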

I've spent the past year and a half validating the designs and running simulations, and everything works in theory, so I'm at the final stage of delivering the product. I would appreciate it if you could point me to some sources on high-speed PCB layout, impedance matching, and proper line termination.
 

Berni

  • Super Contributor
  • Posts: 5028
  • Country: si
Re: Daisy chaining NOR flash modules to create a high bandwidth ROM
« Reply #6 on: April 13, 2022, 05:58:53 am »
You will definitely need some clock buffer chips to distribute the clock and CS signals to that many places. At 133 MHz you might also want to think about roughly length-matching traces, since with this many chips the far-end chip might end up with a rather long trace. You also want to wire it to FPGA pins that let the clock be generated directly by the PLL and phase-shifted, to help you squeeze the best possible timing out of it.
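A rough way to see why buffering becomes necessary is a lumped RC estimate of the shared clock net's 10-90% rise time, t_r ≈ 2.2·R·C. The Python sketch below uses placeholder values (40 Ω driver output impedance, 8 pF per flash input, 2 pF of trace per load); substitute the real numbers from the FPGA and flash datasheets. A lumped model also ignores transmission-line effects, so treat it as an order-of-magnitude check only.

```python
def rc_rise_time_ns(n_loads: int, r_driver_ohm: float = 40.0,
                    c_pin_pf: float = 8.0, c_trace_pf: float = 2.0) -> float:
    """10-90% rise time (ns) of a lumped RC: t_r = 2.2 * R * C.

    All default component values are assumed placeholders, not datasheet figures.
    """
    c_total_f = n_loads * (c_pin_pf + c_trace_pf) * 1e-12
    return 2.2 * r_driver_ohm * c_total_f * 1e9


clock_period_ns = 1e9 / 133e6               # ~7.52 ns at 133 MHz
for n in (4, 16, 64):
    tr = rc_rise_time_ns(n)
    print(f"{n:2d} loads: t_r ~ {tr:5.1f} ns ({tr / clock_period_ns:.2f} x period)")
```

With these assumed values, 4 loads rise in roughly half a 133 MHz period while 64 loads take several periods. That fits 4 chips working from one pin at 1 MHz, and the advice to buffer the clock and CS for 64.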

Running FPGA designs at >100 MHz also takes some care, as it starts to require pipelined design and attention to LUT implementation. So the HDL code has to be designed to run fast from the beginning, using speed-optimization tricks. Knowing how to use timing analysis helps a lot here to identify clock-speed bottlenecks.

That being said, QSPI flash is an odd choice when there are flash chips on the market with 8-bit or 16-bit buses that run even faster than 133 MHz. You can also buy ready-to-go FPGA dev boards that include DDR3 memory connected to a hard IP controller. For 32-bit DDR3 running at 400 MHz (800 MT/s) you get 3.2 GB/s.
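The DDR3 figure is a peak number: two transfers per clock times the bus width. A Python sketch, ignoring refresh, row/bank switching and controller overhead, all of which lower sustained throughput in practice:

```python
def ddr_bandwidth_gbytes(bus_width_bits: int, clock_hz: float) -> float:
    """Peak DDR bandwidth in GB/s: two transfers per clock (1 GB = 1e9 bytes)."""
    transfers_per_s = 2 * clock_hz                 # 400 MHz clock -> 800 MT/s
    return bus_width_bits / 8 * transfers_per_s / 1e9


print(ddr_bandwidth_gbytes(32, 400e6))   # 3.2
```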
 
The following users thanked this post: OM222O

Someone

  • Super Contributor
  • Posts: 4981
  • Country: au
Re: Daisy chaining NOR flash modules to create a high bandwidth ROM
« Reply #7 on: April 13, 2022, 08:13:22 am »
Quote from: Berni
"At 133MHz you might also want to think about roughly length matching traces since with this many chips the far end chip might end up with a rather long trace."
I see this repeated quite often, yet it's almost always wrong. Consider some situations where length matching is required:
https://resources.altium.com/p/how-do-pcb-trace-length-matching-vs-frequency
1.3mm of differential trace skew for matching 6ps edges.
https://www.nxp.com/docs/en/application-note/AN2582.pdf
1.2mm of data pair skew for 20ps alignment (333MHz DDR).

Scale that back to the bit period of QSPI and the trace length matching is hard to get wrong! 1/10 of a bit interval at 133 MHz is 750 ps; you could loop around a SOIC-8 and still be OK. But even that level of matching is unnecessary with typical QSPI controllers, as they aren't relying on static timing.
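The 750 ps figure can be turned into a length by dividing by the trace propagation delay. A Python sketch, assuming about 6.5 ps/mm as a rough placeholder for FR-4 (inner-layer stripline is a bit slower, outer-layer microstrip a bit faster); use your stackup's actual value:

```python
def skew_budget_ps(clock_hz: float, fraction: float = 0.1,
                   transfers_per_cycle: int = 1) -> float:
    """A fraction of one bit interval, in picoseconds."""
    bit_interval_s = 1.0 / (clock_hz * transfers_per_cycle)
    return fraction * bit_interval_s * 1e12


def budget_to_length_mm(budget_ps: float, delay_ps_per_mm: float = 6.5) -> float:
    """Trace-length mismatch that would use up the whole skew budget (assumed delay)."""
    return budget_ps / delay_ps_per_mm


budget = skew_budget_ps(133e6)                       # ~752 ps
print(f"{budget:.0f} ps -> {budget_to_length_mm(budget):.0f} mm of mismatch")
```

With this assumed delay, roughly 11 cm of mismatch fits in a tenth of a bit interval, far more than looping a trace around an SOIC-8 adds.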
 

Berni

  • Super Contributor
  • Posts: 5028
  • Country: si
Re: Daisy chaining NOR flash modules to create a high bandwidth ROM
« Reply #8 on: April 13, 2022, 09:39:38 am »
Quote from: Someone
Quote from: Berni
"At 133MHz you might also want to think about roughly length matching traces since with this many chips the far end chip might end up with a rather long trace."
"Scale that back to the bit period of QSPI and the trace length matching is hard to get wrong! 1/10 of a bit interval at 133MHz is 750ps, you could loop around a SOIC-8 and still be ok. But even that level of matching is unnecessary with typical QSPI controllers as they aren't relying on static timing."

That is why I used the words "roughly length matching".

No need for Altium's fancy automatic length matching to squeeze every last fraction of a millimeter out of it. The FPGA pins themselves won't be matched that closely unless you use the appropriate DQS groups.

What I meant is: don't stack the flash chips in rows and then just run the clock and data lines up the rows, so that the first flash chip is right next to the clock source while the last one has its clock line running halfway around the entire board. No need to actually measure the lengths; just lay out the board so they all end up in the same ballpark. A few centimeters up or down doesn't really matter.
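The "same ballpark" approach can be checked after placement without formal length matching: take the routed clock length to each flash, convert the spread to time, and compare it to the timing budget. A hypothetical Python sketch, assuming a ~6.5 ps/mm placeholder delay and the 750 ps budget from earlier in the thread; the trace lengths are invented for illustration, not taken from any real board.

```python
def clock_skew_ps(trace_lengths_mm: list[float],
                  delay_ps_per_mm: float = 6.5) -> float:
    """Skew between the shortest and longest clock trace, in picoseconds."""
    return (max(trace_lengths_mm) - min(trace_lengths_mm)) * delay_ps_per_mm


def within_budget(trace_lengths_mm: list[float], budget_ps: float = 750.0,
                  delay_ps_per_mm: float = 6.5) -> bool:
    """True if the clock skew across all loads fits inside the budget."""
    return clock_skew_ps(trace_lengths_mm, delay_ps_per_mm) <= budget_ps


# Invented example: routed clock trace lengths to eight flash chips (mm)
lengths = [42.0, 55.5, 61.0, 48.3, 70.2, 66.8, 39.9, 58.4]
print(f"skew ~ {clock_skew_ps(lengths):.0f} ps, ok = {within_budget(lengths)}")
```

Here a 3 cm spread costs about 200 ps, well inside the budget. Only a layout like the "clock running halfway around the board" case, with tens of centimeters of difference, starts to eat into it.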
 
The following users thanked this post: Someone

free_electron

  • Super Contributor
  • Posts: 8550
  • Country: us
Re: Daisy chaining NOR flash modules to create a high bandwidth ROM
« Reply #9 on: April 13, 2022, 02:37:36 pm »
Quote from: Berni
"The FPGA pins themselves won't be matched that closely unless you use appropriate DQS groups."

That's why you need to feed the pin lengths into the board layout, so the bond wire length can be accounted for. All the high-speed device manufacturers can give you that data per pin.
 

