Topic: FPGA, Synchronous RAM & Triple 8-bit videoDAC for 1080p

Offline ElEctric_EyE (Topic starter)

  • Newbie
  • Posts: 2
  • Country: us
FPGA, Synchronous RAM & Triple 8-bit videoDAC for 1080p
« on: September 08, 2014, 11:47:07 pm »
I found this site through an ixquick search for scanline buffer + fifo + dualport ram, which led me to this post. More specifically, in that thread your member PK mentioned this:
...As far as I know, there are 5 ways to deal with the writes, but two of them really stand out (here's my design, if you'd like to reference the chip and the VHDL I used: http://hackaday.io/project/1943/log/6464-yes-but-why)

  • Switching Buffers - Use two RAM chips, and when your write is done, flip the read/write RAM.  This is how most video cards do it today.
  • Interleaved write - I'm using this one in my project, and like you I'm clocked at 2x the pixel clock.  The only reason I'm able to do it is because the WE pulse only has to be 8ns for the RAM I picked, ISSI's IS61LV5128AL
  • Dual Port RAM - Special hardware
  • Zero Bus Turnaround RAM (sort of like interleaving, but you can read/write on different edges) - Special hardware
  • Wait for the V/H blanking intervals (Doubtful on a CPLD unless data comes in very slowly)
...
Currently, at the heart of my project lies a 144-pin Xilinx XC6SLX9-144, a (ZBT/NoBL) GSI GS8320Z18AGT-400 2Mx18 synchronous RAM (in flow-thru mode) and an ADV7125 triple 8-bit videoDAC. It is a 16-bit system with an embedded 16-bit 6502-like processor, with 5-6-5 RGB video output to a 1080p monitor's VGA input through a VGA connector on a board I designed and soldered. It currently displays 1920x1080, non-interlaced. The pixel clock and H/V sync generator run at 150MHz, while the CPU, block RAMs and hardware accelerator can only run at half speed, 75MHz. I have developed a simple 2D accelerator that does Bresenham lines, rectangular fill, pixel plot, read pixel color, copy & paste rectangle, and character plot.
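For a rough sense of the memory budget implied by those numbers, here is a back-of-the-envelope sketch. The 1920x1080 active area, 150MHz pixel clock and 16 bits per pixel come from the description above; the 2200x1125 total raster is an assumption (the usual 1080p timing) and the helper name is made up for illustration.

```python
# Rough frame-buffer bandwidth budget for the setup described above.
# Assumption (not from the post): a total raster of 2200 clocks x 1125 lines,
# the common 1080p timing. Active area, clock and pixel format are from the post.

H_ACTIVE, V_ACTIVE = 1920, 1080
H_TOTAL, V_TOTAL = 2200, 1125      # assumed blanking-inclusive totals
PIXEL_CLOCK_HZ = 150e6             # pixel clock quoted in the post
BYTES_PER_PIXEL = 2                # 5-6-5 RGB, one 16-bit word per pixel


def display_share(mem_words_per_sec):
    """Average fraction of memory word slots consumed by scan-out."""
    active_fraction = (H_ACTIVE * V_ACTIVE) / (H_TOTAL * V_TOTAL)
    words_needed = PIXEL_CLOCK_HZ * active_fraction  # one word per visible pixel
    return words_needed / mem_words_per_sec


# Peak rate while a visible line is being scanned out.
peak_mb_s = PIXEL_CLOCK_HZ * BYTES_PER_PIXEL / 1e6

# Memory clocked at the pixel rate, one 16-bit word per clock.
share_at_pixel_clock = display_share(150e6)

print(f"peak scan-out rate: {peak_mb_s:.0f} MB/s")
print(f"average display share at 150 MHz: {share_at_pixel_clock:.1%}")
```

Under these assumptions, a memory clocked at the pixel rate spends roughly 84% of its word slots on scan-out on average, and every slot during the visible part of a line. That leaves only the blanking periods for writes unless the memory runs faster than the pixel clock or the display side is buffered.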

I built the project in order to learn Verilog, and it has now been going on for about 3 years. While I have learned a lot, I am not a master yet and do not foresee myself becoming one soon. My trade is auto tech by day. It pays the bills, but my love is/has been electronics since 8th grade back in the late 1970's.

The board is simple, it is 3.5"x2.8". More info about it can be found in the Programmable Logic section of http://forum.6502.org . Search Parallel Video Board.

I quoted your member PK because he mentioned 5 ways to get a glitch-free writeable display without waiting for a non-display period, but I think there may be another method involving a scan-line buffer utilizing an FPGA dual-port RAM, and a FIFO? But I cannot yet picture the state machine in my head. Any help? TIA


 

Offline DanielS

  • Frequent Contributor
  • **
  • Posts: 798
Re: FPGA, Synchronous RAM & Triple 8-bit videoDAC for 1080p
« Reply #1 on: September 09, 2014, 12:48:18 am »
I quoted your member PK because he mentioned 5 ways to get a glitch-free writeable display without waiting for a non-display period, but I think there may be another method involving a scan-line buffer utilizing an FPGA dual-port RAM, and a FIFO? But I cannot yet picture the state machine in my head. Any help? TIA
It depends on exactly what you want to achieve here.

If all you want to do is be able to access the frame buffer memory without having to worry about disrupting the output stream to your display as you are drawing to memory, you can certainly craft a FIFO/DMA engine that keeps the display output buffer as full as possible so your microcontroller can interrupt the SRAM-to-display stream for drawing at just about any time. This works if you want to do double/triple-buffering on a shared memory pool as well.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: FPGA, Synchronous RAM & Triple 8-bit videoDAC for 1080p
« Reply #2 on: September 09, 2014, 02:37:20 am »
I've got a project that does something like what you are after. http://hamsterworks.co.nz/mediawiki/index.php/File:Dualhead_mcb_frame_buffer.zip

It uses the Spartan 6 Memory Controller Block (MCB), but your design using SRAM will have much the same functional blocks:

   Application logic (here is where your 6502 can go) - at the CPU freq
      /\
      ||
      \/
   One port on the Memory controller (which has built in FIFOs), bridging the CPU freq to the memory frequency
      /\
      ||
      \/
   Hardened Memory Controller, running at memory frequency <==>  DDR RAM - 16x @ 200MHz = 800MB/s peak
      ||
      ||
      \/
    Another port on the Memory controller (read only), which has a built in 64-word data FIFO, bridging the memory frequency to the pixel clock
      ||
      ||
      \/
    VGA Generator, running at pixel clock ==> Analog VGA out
      ||
      ||
      \/
   DVI-D Interface, running at pixel clock, 2x pixel clock and 5x pixel clock.
      ||
      ||
      \/
   DVI-D Output, running at 10x pixel clock (using DDR)

The priority on the MCB interface is set such that the read-only port has priority over the CPU interface, so that it never gets starved of bytes.

To implement this with SRAM the architecture will be similar, and your SRAM memory controller will have a strong preference for keeping the FIFO to the DAC full over processing transactions from the CPU - even to the point of stalling the CPU interface for a long period of time.
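As a way to picture the state machine, here is a minimal cycle-level sketch in Python rather than HDL. Everything in it is an assumption made for illustration: the class and state names, the 64-word FIFO, the 16-word burst, and a memory clock running at twice the rate the display drains words. It ignores SRAM read latency and the clock-domain crossing.

```python
from collections import deque

IDLE, DISPLAY_BURST, CPU_ACCESS = "IDLE", "DISPLAY_BURST", "CPU_ACCESS"


class FetchArbiter:
    """Hypothetical controller: display bursts always win over the CPU."""

    def __init__(self, fifo_depth=64, burst_len=16):
        self.fifo_depth = fifo_depth
        self.burst_len = burst_len
        self.fifo = deque()
        self.state = IDLE
        self.burst_left = 0
        self.next_addr = 0      # frame-buffer read pointer
        self.cpu_cycles = 0     # memory cycles granted to the CPU

    def step(self, cpu_request):
        """Advance one memory clock and return the state that owned it."""
        if self.state == DISPLAY_BURST and self.burst_left == 0:
            self.state = IDLE                      # burst finished
        if self.state != DISPLAY_BURST:
            room = self.fifo_depth - len(self.fifo)
            if room >= self.burst_len:             # display has priority
                self.state = DISPLAY_BURST
                self.burst_left = self.burst_len
            elif cpu_request:
                self.state = CPU_ACCESS
            else:
                self.state = IDLE
        if self.state == DISPLAY_BURST:
            self.fifo.append(self.next_addr)       # stand-in for SRAM read data
            self.next_addr += 1
            self.burst_left -= 1
        elif self.state == CPU_ACCESS:
            self.cpu_cycles += 1
        return self.state

    def pop_pixel(self):
        """Display side: take one word, or None if the FIFO has run dry."""
        return self.fifo.popleft() if self.fifo else None


def run(cycles=10_000, prefill=64):
    arb = FetchArbiter()
    underruns = 0
    expected = 0
    for cycle in range(cycles):
        arb.step(cpu_request=True)                 # CPU always wants the bus
        # After the prefill period, drain one word every 2nd memory clock.
        if cycle >= prefill and cycle % 2 == 0:
            word = arb.pop_pixel()
            if word is None:
                underruns += 1
            else:
                assert word == expected            # pixels arrive in order
                expected += 1
    return underruns, arb.cpu_cycles


underruns, cpu_cycles = run()
print("underruns:", underruns, "cpu cycles:", cpu_cycles)
```

In this model the CPU is stalled only while a display burst is in flight and gets every other cycle, so a CPU that requests the bus on every clock still never causes an underrun.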

Another technique I've used in the past is to interleave video and processor transactions, but it isn't very efficient for a few reasons (e.g. bus turnaround), so I don't think it will work for you at your desired resolution. It only really worked well for me with a monochrome 1-bit-per-pixel display, as it needed a read only once every 8 clock cycles. Such a design can really only be used with SRAM, as DRAM refresh cycles and row activations get in the way and upset the timing, unless you can schedule them all for the blanking intervals.

However, a SRAM based frame buffer is a bit of a technological dead end - SRAM is very expensive per bit. You really need a design that uses a FIFO to decouple the output pixel pipeline from the memory controller to the DAC.

For your next design, this paper has some interesting ideas on exploiting the banked nature of SDRAM to give you full memory bandwidth most of the time: http://ics.kaist.ac.kr/intjpapers/High-Performance%20and%20Low-Power%20Memory%20Interface%20Architecture.pdf


Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline DanielS

  • Frequent Contributor
  • **
  • Posts: 798
Re: FPGA, Synchronous RAM & Triple 8-bit videoDAC for 1080p
« Reply #3 on: September 09, 2014, 03:43:29 am »
The priority on the MCB interface is set such that the read-only port has priority over the CPU interface, so that it never gets starved of bytes.
If you use a block-RAM based FIFO instead of a 64-byte LUT-based one for the read-only buffer (assuming you can spare one), that gives you a 1k-pixel read-ahead. I seriously doubt the microcontroller will manage to tie up the memory controller long enough at a time to starve the display buffer even without priority... and if the microcontroller port is really busy sometimes, you can defer the priority status until the display buffer is at least 3/4 empty to make things more efficient. This way, as long as the microcontroller's bursts are not too long, the output buffer never drops below that threshold and the microcontroller never needs to wait for it.
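A small sketch of that deferred-priority rule, with the slack time it implies. The function names are hypothetical, and the 150MHz drain rate is taken from the original post on the assumption of one FIFO word per pixel.

```python
PIXEL_CLOCK_HZ = 150e6   # pixel clock from the original post


def display_has_priority(fifo_level, fifo_depth):
    """True once the read-ahead FIFO is at least 3/4 empty."""
    return fifo_level * 4 <= fifo_depth


def slack_ns(words, pixel_clock_hz=PIXEL_CLOCK_HZ):
    """How long `words` buffered pixels last at one word per pixel clock."""
    return words / pixel_clock_hz * 1e9


depth = 1024  # 1k-pixel read-ahead in a block RAM

print(display_has_priority(900, depth))   # mostly full: CPU keeps the bus
print(display_has_priority(256, depth))   # 3/4 empty: display takes over
print(f"full FIFO covers {slack_ns(depth):.0f} ns of scan-out")
print(f"at the threshold {slack_ns(depth // 4):.0f} ns remain")
```

With these numbers a full FIFO covers about 6.8us of scan-out, and about 1.7us is left when the threshold trips. The memory controller has to start refilling, and refill faster than the display drains, within that window.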
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: FPGA, Synchronous RAM & Triple 8-bit videoDAC for 1080p
« Reply #4 on: September 09, 2014, 04:16:32 am »
I seriously doubt the microcontroller will manage to tie up the memory controller long enough at a time to starve the display buffer even without priority... and if the microcontroller port is really busy sometimes,

It is a no-brainer, you have to do it.

If you make the display the highest priority and it doesn't use it all, it doesn't make a difference. Your FIFO remains comfortably full most of the time and it doesn't impact the CPU at all.

If you make them equal priority and bandwidth is oversubscribed then your display will be broken during busy times - the exact problem the OP is trying to avoid.
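The difference between the two policies can be seen in a toy model. This is only a sketch under stated assumptions: the memory moves one word per cycle, the display drains 3 words every 4 cycles, the CPU requests the bus on every cycle (so bandwidth is oversubscribed), and "equal priority" is modelled as strict alternation of turns. None of these figures come from the thread.

```python
def simulate(display_first, cycles=20_000, depth=64, drain_num=3, drain_den=4):
    """Return (display underruns, words granted to the CPU)."""
    level = depth            # FIFO starts full
    underruns = 0
    cpu_words = 0
    turn_display = True      # alternating turn flag for the equal-priority case
    for cycle in range(cycles):
        want_display = level < depth
        if display_first:
            grant_display = want_display
        else:
            grant_display = want_display and turn_display
            turn_display = not turn_display
        if grant_display:
            level += 1       # memory cycle used to refill the FIFO
        else:
            cpu_words += 1   # CPU is always requesting, so it takes the slot
        # Display side: drain_num words out of every drain_den cycles.
        if cycle % drain_den < drain_num:
            if level > 0:
                level -= 1
            else:
                underruns += 1
    return underruns, cpu_words


for name, policy in (("display first", True), ("equal turns", False)):
    underruns, cpu_words = simulate(policy)
    print(f"{name}: underruns={underruns}, cpu words={cpu_words}")
```

With display-first arbitration the FIFO never empties and the CPU still gets the quarter of the cycles the display does not need. With alternating turns the display gets at most half the cycles, the FIFO drains, and underruns follow.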

If you make the read-for-display burst size big enough that there is minimal overhead (e.g. 16 or 32 words at a time), your FIFO only needs to be 32 or 64 words deep. Wasting a block RAM is just a shame, especially on the small LX9. The OP has also got to work out how he syncs the start of frame over the FIFO, and being able to add one or two bits to the width of the FIFO to send through the start of frame flag (or HSYNC/VSYNC bits) is a great advantage.
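One way to picture the extra flag bit is a 17-bit FIFO word: 16 bits of 5-6-5 pixel plus a start-of-frame bit. The bit position and the function names below are assumptions made for illustration, not taken from any of the designs in this thread.

```python
PIXEL_MASK = 0xFFFF      # 16-bit 5-6-5 pixel in bits 15..0
SOF_BIT = 1 << 16        # assumed position of the start-of-frame flag


def pack_word(pixel, start_of_frame=False):
    """Writer side: combine a pixel and the flag into one FIFO word."""
    if not 0 <= pixel <= PIXEL_MASK:
        raise ValueError("pixel must fit in 16 bits")
    return pixel | (SOF_BIT if start_of_frame else 0)


def unpack_word(word):
    """Reader side: split a FIFO word into (pixel, start_of_frame)."""
    return word & PIXEL_MASK, bool(word & SOF_BIT)


def resync(words):
    """Reader side: drop words until one carries the start-of-frame flag."""
    for index, word in enumerate(words):
        if word & SOF_BIT:
            return words[index:]
    return []


# A stale tail of the previous frame followed by the start of a new one.
stream = [pack_word(0x1234), pack_word(0x5678),
          pack_word(0xF800, start_of_frame=True), pack_word(0x07E0)]
aligned = resync(stream)
print([unpack_word(w) for w in aligned])
```

The reader can then hold off, or discard words, until the flagged word lines up with its own first visible pixel, so the two sides realign every frame even if they ever slip.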
 

Offline ElEctric_EyE (Topic starter)

  • Newbie
  • Posts: 2
  • Country: us
Re: FPGA, Synchronous RAM & Triple 8-bit videoDAC for 1080p
« Reply #5 on: September 09, 2014, 11:45:00 am »
...However, a SRAM based frame buffer is a bit of a technological dead end - SRAM is very expensive per bit.

The IC was expensive at ~$60, but for static RAM it's a "fast" part capable of 250MHz and has Zero Bus Turnaround. I tried running it at 300MHz in order to try the interleave idea, but the SRAM doesn't operate correctly, no surprise there.

You really need a design that uses a FIFO to decouple the output pixel pipeline from the memory controller to the DAC...
This is what I am trying to grasp. :scared: Thanks for sharing the links. I'm checking them out.
 

Offline DanielS

  • Frequent Contributor
  • **
  • Posts: 798
Re: FPGA, Synchronous RAM & Triple 8-bit videoDAC for 1080p
« Reply #6 on: September 09, 2014, 01:43:08 pm »
If you make the read-for-display burst size big enough that there is minimal overhead (e.g. 16 or 32 words at a time), your FIFO only needs to be 32 or 64 words deep. Wasting a block RAM is just a shame, especially on the small LX9. The OP has also got to work out how he syncs the start of frame over the FIFO, and being able to add one or two bits to the width of the FIFO to send through the start of frame flag (or HSYNC/VSYNC bits) is a great advantage.
The way I sync my display output is to simply reset the stream during vertical blanking and start refilling the buffer from there, so the FIFO is full by the time the output needs its first pixel.

As for "wasting" a BRAM, that depends on whether OP already needs it for something else. If you code the dual-port SRAM HDL properly (took me a while to get this right), XST will automatically pick between a LUT and a BRAM implementation depending on the array's size and available resources. So if you have leftover BRAMs, you can size your buffer for that, and if you later need to repurpose that BRAM for something else, you shrink the buffer to your preferred or minimum size, depending on what CLB/LUT resources you can spare. If you use generics, the change can be as simple as changing one or two values in the generic map.
 

