Topic: FPGA "soft" serializer


dcarr (topic starter)

  • Regular Contributor
  • Posts: 117
FPGA "soft" serializer
« on: November 14, 2019, 12:25:09 am »
Reading through the recent VGA/HDMI thread had me thinking about how to implement a simple HDMI interface on an FPGA.  To do that, we need to take each low-speed 8-bit R/G/B channel (TMDS-encoded into a 10-bit word) and serialize it, emitting a 1-bit signal at 10x the rate.  Some FPGAs provide native hardware for doing this, but I'm curious whether there are ways to do it with standard FPGA logic.

Let's say that our high-speed (single-bit) clock is 250MHz and our low-speed (10-bit word) clock is 25MHz.  Let's go further and say that the low-speed clock is derived from the high-speed clock and is therefore synchronous.

Here's some Verilog that seems like what we want:

reg [9:0] serializer_reg;

always @(posedge slow_clk)
begin
   serializer_reg <= pixel_data;  // pixel_data is 10 bits coming from somewhere else
end

localparam MAX = 9;               // index of the last bit in the 10-bit word
reg [3:0] counter = 0;
reg output_bit;                   // high-speed output

always @(posedge fast_clk)
begin
   if (counter == MAX)
      counter <= 0;               // wrap around
   else
      counter <= counter + 1;     // move on to the next bit

   output_bit <= serializer_reg[counter];  // emit the serialized data
end

In practice, I can imagine all sorts of issues here.  We're crossing a clock domain, and even though the clocks are related, it seems like there could be timing issues.  Is this a valid approach, or how is a situation like this normally handled?

Thanks!
 

langwadt

  • Super Contributor
  • Posts: 4655
  • Country: dk
Re: FPGA "soft" serializer
« Reply #1 on: November 14, 2019, 12:29:13 am »
run everything on the fast clock and treat slow clk as an enable
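A minimal Verilog sketch of the clock-enable approach, assuming a single 250 MHz `fast_clk` and a 10-bit word `word_in` that is held stable in the fast domain around each load. The module and signal names (`soft_ser_ce`, `load_en`, `word_in`) are hypothetical. A mod-10 counter stands in for the 25 MHz clock: every register runs on `fast_clk`, and `load_en` marks the one cycle in ten on which a new word is taken.

```verilog
// Hypothetical single-clock soft serializer: the 25 MHz "slow clock" is
// replaced by a one-cycle-in-ten enable generated in the 250 MHz domain.
module soft_ser_ce (
    input  wire       fast_clk,   // 10x bit clock
    input  wire       rst,        // synchronous reset
    input  wire [9:0] word_in,    // 10-bit word, taken when load_en is high
    output reg        load_en,    // high for one fast_clk cycle out of ten
    output reg        out_bit     // serial output, bit 0 of each word first
);
    reg [3:0] count;
    reg [9:0] shift;

    always @(posedge fast_clk) begin
        if (rst) begin
            count   <= 4'd0;
            load_en <= 1'b0;
            shift   <= 10'd0;
            out_bit <= 1'b0;
        end else begin
            // Mod-10 counter; load_en is registered, so it goes high
            // on the cycle after count reaches 9.
            count   <= (count == 4'd9) ? 4'd0 : count + 4'd1;
            load_en <= (count == 4'd9);

            // Either load a fresh word or shift the current one right.
            if (load_en)
                shift <= word_in;
            else
                shift <= {1'b0, shift[9:1]};

            out_bit <= shift[0];
        end
    end
endmodule
```

Because `load_en` is an ordinary registered signal, the tools only see single-clock paths; the logic that produces `word_in` can be given a 10-cycle multicycle constraint if it needs the slack.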
 

dcarr (topic starter)

  • Regular Contributor
  • Posts: 117
Re: FPGA "soft" serializer
« Reply #2 on: November 14, 2019, 12:43:05 am »
Quote from: langwadt on November 14, 2019, 12:29:13 am
run everything on the fast clock and treat slow clk as an enable

This is definitely the most straightforward approach, but it puts timing pressure on the slow-path logic that isn't strictly necessary.
I wonder if there are other ways...
 

langwadt

  • Super Contributor
  • Posts: 4655
  • Country: dk
Re: FPGA "soft" serializer
« Reply #3 on: November 14, 2019, 12:53:50 am »
Quote from: dcarr on November 14, 2019, 12:43:05 am
This is definitely the most straightforward approach, but it puts timing pressure on the slow-path logic that isn't strictly necessary.
I wonder if there are other ways...

If you sample the pixel data and slow clk with the fast clk, and pick the data at the right time, I don't see why it would affect the slow path.
 

hamster_nz

  • Super Contributor
  • Posts: 2812
  • Country: nz
Re: FPGA "soft" serializer
« Reply #4 on: November 14, 2019, 01:29:46 am »
Using DDR to halve the fast clock domain's rate is a happy midpoint, and far simpler than using the SERDES hardware

Working with 25MHz and 125MHz is quite achievable.
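A sketch of the same idea feeding a DDR output, assuming a 125 MHz `ddr_clk` (5x the 25 MHz pixel clock) from the same PLL, and a vendor DDR output register (for example Altera's ALTDDIO_OUT or a Xilinx ODDR) that sends `d_rise` during the first half of the clock and `d_fall` during the second. The module and port names are hypothetical and the vendor primitive itself is not shown. Each load period is five clocks, and two bits leave per clock.

```verilog
// Hypothetical 2-bits-per-clock serializer for a DDR output register.
module soft_ser_ddr2 (
    input  wire       ddr_clk,   // 5x pixel clock
    input  wire       rst,       // synchronous reset
    input  wire [9:0] word_in,   // taken when load_en is high
    output reg        load_en,   // high one cycle in five
    output reg        d_rise,    // to the DDR register's rising-edge data input
    output reg        d_fall     // to the DDR register's falling-edge data input
);
    reg [2:0] count;
    reg [9:0] shift;

    always @(posedge ddr_clk) begin
        if (rst) begin
            count  <= 3'd0;
            load_en <= 1'b0;
            shift  <= 10'd0;
            d_rise <= 1'b0;
            d_fall <= 1'b0;
        end else begin
            count   <= (count == 3'd4) ? 3'd0 : count + 3'd1;
            load_en <= (count == 3'd4);

            if (load_en)
                shift <= word_in;
            else
                shift <= {2'b00, shift[9:2]};   // two bits per clock

            d_rise <= shift[0];   // earlier bit, first half of the clock
            d_fall <= shift[1];   // later bit, second half of the clock
        end
    end
endmodule
```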
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

hamster_nz

  • Super Contributor
  • Posts: 2812
  • Country: nz
Re: FPGA "soft" serializer
« Reply #5 on: November 14, 2019, 02:10:56 am »
Thinking about this a little more, here's the entire design for the "soft serialiser" running in the fast clock domain, for one channel.

As long as your slow and fast clocks are derived from the same source (e.g. PLL/MMCM/whatever), you should not have timing issues at all. And should you want to use a DDR register, you just need to shift two bits at a time.

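Code:
-- out_bit and data_in_slow_domain are assumed to be ports of the enclosing entity
signal phase               : std_logic_vector(9 downto 0) := "0000000001";
signal shift_reg           : std_logic_vector(9 downto 0) := (others => '0');
signal data_in_fast_domain : std_logic_vector(9 downto 0) := (others => '0');

process(fast_clock)
  begin
    if rising_edge(fast_clock) then
      -- Output new bit - Don't forget to constrain out_bit to an IO block!
      -- Note: this sends bit 9 first; DVI/TMDS sends bit 0 first, so
      -- bit-reverse the word (or shift right) for DVI.
      out_bit <= shift_reg(9);

      -- keep track of phase (one-hot ring: phase(0) is '1' once every 10 clocks)
      phase <= phase(8 downto 0) & phase(9);

      -- either shift, or load a new value.
      if phase(0) = '1' then
         shift_reg <= data_in_fast_domain;
      else
         shift_reg <= shift_reg(8 downto 0) & '0';
      end if;

      -- Very simple synchronous clock domain crossing
      -- most likely not really needed at all, but will ease timing
      data_in_fast_domain <= data_in_slow_domain;
    end if;
  end process;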

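It's better to use a shift register rather than a MUX, especially if your fast domain starts to push the FPGA's limits: a shift register only needs a 2:1 choice (load or shift) in front of each flip-flop, while a 10:1 bit-select MUX adds logic levels.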
« Last Edit: November 14, 2019, 02:17:01 am by hamster_nz »
 

Scrts

  • Frequent Contributor
  • Posts: 798
  • Country: lt
Re: FPGA "soft" serializer
« Reply #6 on: November 18, 2019, 02:06:32 pm »
This has been commonly done for years now. I remember using a DVB-ASI serializer softcore on a Cyclone II. A DDR output will also help, as already mentioned here.
 

Ditiris

  • Contributor
  • Posts: 10
  • Country: us
Re: FPGA "soft" serializer
« Reply #7 on: November 26, 2019, 02:02:50 am »
Others have already answered, but there is no timing issue. If you derive the fast clock and slow clock from the same source the tools will automatically infer the correct constraint. You can get rid of the "extra timing pressure" where you don't need it by creating a multicycle constraint when you sample the parallel data with the faster clock domain, but for 250MHz, there should be plenty of margin in a modern FPGA.

You're doing manually what an output serializer block would otherwise do. The difference is that the I/O blocks have physically optimized flip-flops for the fast serial-side clocks and for the transitions to/from the parallel domain.
 

aandrew

  • Frequent Contributor
  • Posts: 277
  • Country: ca
Re: FPGA "soft" serializer
« Reply #8 on: November 26, 2019, 04:47:06 am »
This is a newbie-ish question, but I've always struggled with anything but the most basic timing constraints. I'm getting better, but it's a struggle.

I understand the concept behind a multi-cycle constraint (the data does not need to be valid at the destination FF until some number of cycles after it leaves the source FF), but specifying it is the trick.
 

NorthGuy

  • Super Contributor
  • Posts: 3238
  • Country: ca
Re: FPGA "soft" serializer
« Reply #9 on: November 26, 2019, 03:53:52 pm »
The hardware SERDES has clock restrictions too, and the faster you go, the harder it is to keep the fast and slow clocks in sync. Xilinx has special clocking schemes just for this purpose.

The mechanism itself is very simple - you latch the shift register at the slow clock edge, then shift it out. The MSB of the shift register is your output.

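Code:
-- Assumed declarations: slow_clock and fast_clock come from the same PLL,
-- slow_reg is the 8-bit word registered in the slow domain.
signal last_slow_clock     : std_logic := '0';
signal slow_reg, shift_reg : std_logic_vector(7 downto 0);

process(fast_clock)
begin
  if rising_edge(fast_clock) then
    last_slow_clock <= slow_clock;
    if (last_slow_clock = '0') and (slow_clock = '1') then
      shift_reg <= slow_reg;                      -- rising edge of slow_clock: load
    else
      shift_reg <= shift_reg(6 downto 0) & '0';   -- otherwise shift left
    end if;
  end if;
end process;
-- The serial output is shift_reg(7), the MSB.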
« Last Edit: November 26, 2019, 04:08:50 pm by NorthGuy »
 

BrianHG

  • Super Contributor
  • Posts: 8031
  • Country: ca
Re: FPGA "soft" serializer
« Reply #10 on: November 27, 2019, 06:22:42 am »
Ok, so the rules are simple: a soft serializer for 'DVI' output.  That means no differential drivers, even though DVI needs differential signalling, so each pair is driven as a true and an inverted single-ended output.  At least a 25MHz pixel clock, for VGA output.  Using a -C8 (slowest speed grade) IC.  8 outputs are required: 2 for the pixel clock, 2 for red, 2 for green, 2 for blue.  To get the best timing, we need to put these outputs on 1 IO bank, preferably the higher-speed one; if your FPGA has IO banks suited for DDR IOs, use them.

I set up this code in Quartus with a Cyclone III -C8 FPGA to see what happens.

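Code:
module sw_serialize (
  input  wire       reset,         // unused in this test
  input  wire       sclk,          // serial (bit) clock, 10x pclk
  input  wire       pclk,          // pixel clock
  input  wire [9:0] r, g, b,       // 10-bit words per channel

  output reg  [7:0] serial_dvi );  // 4 pseudo-differential pairs

wire [9:0] c;
assign c = 10'b0000011111;         // DVI clock pattern: 5 ones, 5 zeros

reg [9:0] ser_out [0:7];           // one shift register per output pin
reg [9:0] c_reg, r_reg, g_reg, b_reg;
reg       last_pclk, pclk_trigger;

integer i;

// Capture the parallel words in the pixel-clock domain.
always @ (posedge pclk) begin
  r_reg <= r;
  g_reg <= g;
  b_reg <= b;
  c_reg <= c;
end

always @ (posedge sclk) begin
  // Detect the rising edge of pclk by sampling it with sclk.
  last_pclk    <= pclk;
  pclk_trigger <= ~last_pclk && pclk;

  if ( pclk_trigger ) begin
    // Load each pair as a true and an inverted copy of the word.
    ser_out[0] <=  c_reg;
    ser_out[1] <= ~c_reg;
    ser_out[2] <=  r_reg;
    ser_out[3] <= ~r_reg;
    ser_out[4] <=  g_reg;
    ser_out[5] <= ~g_reg;
    ser_out[6] <=  b_reg;
    ser_out[7] <= ~b_reg;
  end else begin
    // Shift right: bit 0 goes out first.
    for ( i=0 ; i<8 ; i=i+1 ) ser_out[i][8:0] <= ser_out[i][9:1];
  end

  // Registered outputs, one flip-flop per pin.
  for ( i=0 ; i<8 ; i=i+1 ) serial_dvi[i] <= ser_out[i][0];
end
endmodule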
The top-level diagram looks like this:
[Image: top-level block diagram]

It compiled with an FMAX of 402MHz.  Whoa, hey, I think this is great.  Let's do a functional simulation:
[Image: functional simulation waveform]

Ok, the reference clock on 's_dvi[0] & s_dvi[1]' works, and the red and green on 's_dvi[2..5]' also hold true to their alternating 1010... input patterns, as does the deliberately inverted clock pattern on blue, 's_dvi[6..7]'.

Ok, let's do a timing simulation:
[Image: timing simulation waveform, 1 ns grid]

Whoa, what the crap...  My vertical grid is 1ns, and everything is crapped out of time by up to 2ns.  Now, I realize that I did not specify any IO timing constraints, but I just want to see what's possible here.

Next step, some secret magic sauce.  In Quartus's 'Assignment Editor', I add an assignment called 'Fast Output Registers' to the 's_dvi[]' output group.  This assignment tells the Quartus Fitter that the flip-flop driving each IO pin must be one of the 2 or 4 flip-flops located right at that IO pin, so the final output register is packed into the IO cell instead of somewhere in the logic array.  (Those 2-4 flip-flops at the IO pin are also how the DDR IO system works when it is used: 2 of them drive the pin alternately, with 'fast-output-enable' switched by a 180-degree phased clock, and the other 2 flip-flops are for DDR input.)

Now, after running a timing simulation, I get this:
[Image: timing simulation with Fast Output Registers enabled]

Wow, a crap load better.  In fact, for VGA 640x480, this is well timed enough to drive DVI directly.  Here is a close-up with a 100ps time grid:
[Image: close-up of output edges, 100 ps grid]

Our pin-to-pin error is now within 50ps, i.e. +/-25ps timing error, and this is on a -C8 Cyclone III.  Let me just re-check the FMAX.
Still 402MHz; however, there are some hold-time violations on the paths from 'pclk' into last_pclk and pclk_trigger, which are clocked by 'sclk'.  To fix this, I had to delay the PLL's pclk output by 1ns.

This is actually really good, as commercial broadcast HDMI 480p with digital audio requires 270MHz (a 27MHz pixel clock x 10), and we clear it with no problem, without any DDR trick.  The same FPGA in -C6 gives me an FMAX of 437MHz.  Though the outputs seem to come out 700ps sooner, the +/-25ps error between outputs is still the same.

Now, I know everyone will ask where to get an HDMI core with digital audio...  Well, here is an open Verilog core that does exactly that:
https://www.eevblog.com/forum/fpga/fpga-vga-controller-for-8-bit-computer/msg2783700/#msg2783700

Well, should I try for higher resolutions, like 720p?  We would need a 742.5MHz serial output, or 371.25MHz using the DDR trick.  The -C8 just might pull it off; however, I would need to read the datasheet to make sure.

(Note: I did not use serdes IOs, or LVDS, or differential as you can see with my Verilog code.  This was 100% 2.5v IO output setting in Quartus)

Additional LOL: I just added the maximum current strength assignment, and the timing error between pins has improved by around another 50%.

I've attached a copy of the Quartus project.  It just needs re-compile to build everything.
« Last Edit: November 27, 2019, 08:35:35 am by BrianHG »
 

BrianHG

  • Super Contributor
  • Posts: 8031
  • Country: ca
Re: FPGA "soft" serializer
« Reply #11 on: November 27, 2019, 08:44:18 pm »
Ok, Ok, Ok,.... If I didn't do it, I would  |O .

The DDR trick, using a Cyclone III, slowest C8 version, attempting 720p DVI, which would be 742.5 megabits per second.
Step 1, the change to the code and block diagram:
[Image: modified block diagram for the DDR version]
Code:
module sw_serialize (
  input  wire       reset,
  input  wire       sclk,
  input  wire       pclk,
  input  wire [9:0] r, g, b,

  output reg  [7:0] serial_dvi_h,
  output reg  [7:0] serial_dvi_l );

wire [9:0] c;
assign c = 10'b0000011111;

reg [9:0] ser_out [0:7];
reg [9:0] c_reg, r_reg, g_reg, b_reg;
reg       last_pclk, pclk_trigger;

integer i;

always @ (posedge pclk) begin
  r_reg <= r;
  g_reg <= g;
  b_reg <= b;
  c_reg <= c;
end

always @ (posedge sclk) begin
  last_pclk    <= pclk;
  pclk_trigger <= ~last_pclk && pclk;

  if ( pclk_trigger ) begin
    ser_out[0] <=  c_reg;
    ser_out[1] <= ~c_reg;
    ser_out[2] <=  r_reg;
    ser_out[3] <= ~r_reg;
    ser_out[4] <=  g_reg;
    ser_out[5] <= ~g_reg;
    ser_out[6] <=  b_reg;
    ser_out[7] <= ~b_reg;
  end else begin
    // Two bits leave per sclk, so shift by 2.
    for ( i=0 ; i<8 ; i=i+1 ) ser_out[i][7:0] <= ser_out[i][9:2];
  end

  for ( i=0 ; i<8 ; i=i+1 ) serial_dvi_h[i] <= ser_out[i][1];
  for ( i=0 ; i<8 ; i=i+1 ) serial_dvi_l[i] <= ser_out[i][0];
end
endmodule

The results, running it at the old 25MHz:
[Image: DDR version simulated at a 25 MHz pixel clock, bits out of order]

Huh?  Some of the bits are selected out of place.  Oh, dumb me, I have to swap the 1 and 0 around in the code:
   for ( i=0 ; i<8 ; i=i+1 ) serial_dvi_h[i] <= ser_out[i][0];
   for ( i=0 ; i<8 ; i=i+1 ) serial_dvi_l[i] <= ser_out[i][1];

Ok:
[Image: DDR version with corrected bit order]

That looks good; now up the pixel clock to 74.25MHz:
[Image: DDR version at a 74.25 MHz pixel clock]

Now, the more astute among you may have noticed a little more timing noise than in the SDR version in my previous post, so here is a close-up:
[Image: close-up of DDR output edges, 100 ps grid]

As you can see with the 100ps grid, the error is twice as bad as in the SDR version.  This is to be expected: a single DFF, clocked by a single clock and permanently driving the IO, produces less timing noise than 2 DFFs clocked at half rate whose shared output is switched back and forth between them every half cycle; that switching introduces some timing error.
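To see where that extra edge error comes from, here is a simulation-only behavioral model of a DDR output cell. It is an assumed illustration, not any vendor's actual implementation: two flip-flops capture `d_h` and `d_l` on the rising edge, and the pin is multiplexed between them by the clock level. Every half cycle the pin's source switches from one flip-flop to the other, so any skew between the two paths, or any duty-cycle asymmetry in the clock, shows up directly as edge-to-edge timing error.

```verilog
// Behavioral DDR output cell, for simulation only (hypothetical model).
module ddr_out_model (
    input  wire clk,
    input  wire d_h,    // driven onto the pin while clk is high
    input  wire d_l,    // driven onto the pin while clk is low
    output wire q
);
    reg q_h, q_l;

    // Both halves are captured on the rising edge...
    always @(posedge clk) begin
        q_h <= d_h;
        q_l <= d_l;
    end

    // ...and the clock level selects which one reaches the pin.
    assign q = clk ? q_h : q_l;
endmodule
```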

I wonder if a +/-50ps error introduced into a 720p DVI transmission is still within spec/usable by monitors?

I wonder what would happen if I tried clocking the SDR version at 742.5MHz:
[Image: SDR version at a 742.5 MHz serial clock, corrupted data]
As you can see, the data is corrupt, even on the fastest C6 version of the Cyclone III; however, I wonder what would happen if I made a hybrid DDR/SDR version to gain that extra fine clock edge...

Here is my idea:
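[Image: hybrid DDR/SDR design]
and:
[Image: hybrid DDR/SDR design, continued]
It works, error free!!! Though the timing report has a few complaints.  Even so:
[Image: hybrid version timing simulation, 100 ps grid]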
Zoomed in with a 100ps grid, you can see the edge alignment is almost twice as refined as the authentic DDR method, and we are still using a -C8 CycloneIII.

I'll attach the normal DDR version of the Quartus project which has no timing violations.

Note that even with that +/-50ps timing error, there is still a full 1.2ns-wide data window.  The edges may look better if we used dedicated differential outputs or strategically chose the best group of IOs; however, the cleanest output is what gets the longest possible cable working into the worst compliant HDMI TV input.  This would most likely have little trouble working with a good PCB and a 6 foot / 1 meter cable.  Do not expect miracles coming out of an FPGA output unless you use an HDMI re-clocker cable-driver amp IC, which is designed to create its own new clocked edges and to drive a 5-10 meter cable.
« Last Edit: November 27, 2019, 09:04:31 pm by BrianHG »
 

asmi

  • Super Contributor
  • Posts: 2782
  • Country: ca
Re: FPGA "soft" serializer
« Reply #12 on: November 27, 2019, 10:07:22 pm »
Quote from: BrianHG on November 27, 2019, 08:44:18 pm
Do not expect miracles coming out of an FPGA output unless you use an HDMI re-clocker cable-driver amp IC, which is designed to create its own new clocked edges and to drive a 5-10 meter cable.

This is the difference between a bad FPGA and a good one :) 7 series can output HDMI on any differential pair (which cover 48 out of 50 IO pins per bank), it's a fully HDMI-compliant drive, and it requires exactly zero fabric resources, only a single MMCM to generate the 5x serial clock from the pixel clock. I had zero issues driving an HDMI signal over a 5 meter cable. And that is on top of all the other advantages Xilinx offers (faster fabric, lots of FREE IPs like a DDR3 controller), so I wonder why any hobbyist would bother using other parts, except maybe toy FPGAs like iCE40, which have some advantages like low pin count and (relatively) low power consumption.

BrianHG

  • Super Contributor
  • Posts: 8031
  • Country: ca
Re: FPGA "soft" serializer
« Reply #13 on: November 27, 2019, 10:14:43 pm »
Quote from: asmi on November 27, 2019, 10:07:22 pm
This is the difference between a bad FPGA and a good one :) 7 series can output HDMI on any differential pair (which cover 48 out of 50 IO pins per bank), it's a fully HDMI-compliant drive, and it requires exactly zero fabric resources, only a single MMCM to generate the 5x serial clock from the pixel clock. I had zero issues driving an HDMI signal over a 5 meter cable. And that is on top of all the other advantages Xilinx offers (faster fabric, lots of FREE IPs like a DDR3 controller), so I wonder why any hobbyist would bother using other parts, except maybe toy FPGAs like iCE40, which have some advantages like low pin count and (relatively) low power consumption.

Having the right chip for the job is great.
This thread was about a 'soft serializer'.  I'm using a 10-15 year old chip.  This design would work on a $2.84 MAX 10 FPGA and output a much cleaner signal, as its modern DDR IOs have 3x the performance of the fastest Cyclone II/III.
« Last Edit: November 27, 2019, 10:16:55 pm by BrianHG »
 

asmi

  • Super Contributor
  • Posts: 2782
  • Country: ca
Re: FPGA "soft" serializer
« Reply #14 on: November 28, 2019, 04:41:18 am »
Quote from: BrianHG on November 27, 2019, 10:14:43 pm
This thread was about a 'soft serializer'.  I'm using a 10-15 year old chip.

Well, the 7 series was introduced in 2010, so it's almost 10 years old too ;)

Quote from: BrianHG on November 27, 2019, 10:14:43 pm
This design would work on a $2.84 MAX 10 FPGA and output a much cleaner signal, as its modern DDR IOs have 3x the performance of the fastest Cyclone II/III.

Except that signal won't contain anything useful, because you're going to run out of gates to implement any logic which would actually generate something worth seeing ;)

BrianHG

  • Super Contributor
  • Posts: 8031
  • Country: ca
Re: FPGA "soft" serializer
« Reply #15 on: November 28, 2019, 08:51:20 am »
Gate count isn't the problem; it's IO and RAM.  I only need 2k logic cells for a full display controller with stereo 16-channel sound.  Take something like a Lattice 'LCMXO3L-4300E-5UWG81CTR1K' (63 IOs), which has 4.3k logic cells at $3.54, plus a 90-cent 64Mb DDR2 SDRAM, and I can give you a complete heads-up display/controller, or a cheap 8-bit/16-bit true-color 720p gaming system with sound at more than double the level of an Amiga home computer, using the remaining gates for a soft-core CPU.

And I got even the cheap Amiga's 16-color hi-res and 32/4096-color lo-res modes to display a lot of useful things.  720p with true 65536 colors, or 16M colors compressed as YUV 4:2:2, would have been a dream.

(Awww crap, I spent soooo much money on my Amiga hardware in the late 80's to early 90's.....)
« Last Edit: November 28, 2019, 03:00:02 pm by BrianHG »
 

asmi

  • Super Contributor
  • Posts: 2782
  • Country: ca
Re: FPGA "soft" serializer
« Reply #16 on: November 28, 2019, 03:13:08 pm »
Quote from: BrianHG on November 28, 2019, 08:51:20 am
Gate count isn't the problem; it's IO and RAM.  I only need 2k logic cells for a full display controller with stereo 16-channel sound.  Take something like a Lattice 'LCMXO3L-4300E-5UWG81CTR1K' (63 IOs), which has 4.3k logic cells at $3.54, plus a 90-cent 64Mb DDR SDRAM, and I can give you a complete heads-up display/controller, or a cheap 8-bit/16-bit true-color 720p gaming system with sound at more than double the level of an Amiga home computer, using the remaining gates for a soft-core CPU.

I would love to see the source code of a DDR controller with HDMI-out DMA which takes less than 2k cells. Do you happen to have it handy? 720p/16bpp@60Hz requires 1280*720*16*60 = 884.7 Mbit/s (843.75 Mibit/s) of memory bandwidth, and you need at least double that (so you can write the next frame as you read out the current one), so this would consume over 60% of the "ideal" bandwidth of DDR x8 @ 166 MHz, so the memory controller has to be quite efficient with bursts and take advantage of multiple banks. Very impressive stuff!
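The bandwidth figures above work out as follows, taking the post's own assumptions (16 bpp, 60 Hz, double the display traffic for read-plus-write, and an 8-bit-wide DDR at 166 MHz):

$$
\begin{aligned}
1280 \times 720 \times 16\ \text{bit} \times 60\ \text{Hz} &= 884{,}736{,}000\ \text{bit/s} \approx 884.7\ \text{Mbit/s}\ (= 843.75\ \text{Mibit/s})\\
2 \times 884.7\ \text{Mbit/s} &\approx 1.77\ \text{Gbit/s}\\
8\ \text{bit} \times 2 \times 166\ \text{MHz} &\approx 2.66\ \text{Gbit/s}\\
1.77 / 2.66 &\approx 0.67
\end{aligned}
$$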

As for price, I'm not a volume manufacturer, so things like "90 cent parts" don't impress me at all. I'd rather use a $30 FPGA and a $10 DDR3L which do all that I need with the least amount of hassle, than save a few morning coffees' worth of parts at the expense of massive headaches to get it all to work. For example, instead of implementing my own DMA bus master to read a framebuffer from external DDR3, I use a bunch of Xilinx's free IPs which do all of that. They are most certainly overkill for this job because they have a ton of features I don't use, but I can afford such waste because I get a working design much faster, so I can focus on actual project-specific logic and programming instead of spending time trying to get basic infrastructure to work.

BrianHG

  • Super Contributor
  • Posts: 8031
  • Country: ca
Re: FPGA "soft" serializer
« Reply #17 on: November 28, 2019, 05:54:02 pm »
Quote from: asmi on November 28, 2019, 03:13:08 pm
I would love to see the source code of a DDR controller with HDMI-out DMA which takes less than 2k cells. Do you happen to have it handy? 720p/16bpp@60Hz requires 1280*720*16*60 = 884.7 Mbit/s (843.75 Mibit/s) of memory bandwidth, and you need at least double that (so you can write the next frame as you read out the current one), so this would consume over 60% of the "ideal" bandwidth of DDR x8 @ 166 MHz, so the memory controller has to be quite efficient with bursts and take advantage of multiple banks. Very impressive stuff!

Ok: RAM controller plus a 4-port read/write DMA controller, with automatic continuous page bursts, set up for 16-bit DDR RAM with a 32-bit internal bus.  See the resource sections in green (I wrote it myself):
[Image: resource usage of the RAM controller and DMA ports]

Total = 1300 logic cells, 8192 bits of internal memory.
2-page cache for display rendering (the most efficient burst my controller can do): 256*32*2 = 16384 bits

LCMXO3L-4300E-5UWG81CTR1K (which has a built-in configuration ROM): we would still have 3020 free logic cells and 69632 bits of free RAM ($3.78 each for 25 at Digi-Key).

We would use x16 memory at 200MHz, i.e. 400 MT/s of 16-bit words.  Driving the display at 16 bits per pixel will eat 75M of the remaining 300M, assuming we lose 100M to overhead.  That leaves more than 200M transfers of 16-bit data per second free.  Remember, the Amiga had a 7MHz CPU and ran its 16-bit display RAM at 14MHz...
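The transfer budget in the paragraph above, using its own assumptions (x16 DDR at 200 MHz, about 100 M transfers/s lost to overhead, and the 74.25 MHz 720p pixel clock rounded to 75 M as a conservative bound on display reads):

$$
\begin{aligned}
\text{peak} &= 200\ \text{MHz} \times 2 = 400\ \text{MT/s (16-bit words)}\\
\text{usable} &\approx 400 - 100 = 300\ \text{MT/s}\\
\text{display} &\le 75\ \text{MT/s} \quad (\text{active pixels only: } 1280 \times 720 \times 60 \approx 55.3\ \text{MT/s})\\
\text{free} &\approx 300 - 75 = 225\ \text{MT/s}
\end{aligned}
$$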

A soft DVI transmitter with encoding logic would take another 100 logic cells, while we would need 250 for a display controller, 500 logic cells plus 8192 bits of RAM for a simple but functional audio controller, and 500 logic cells plus another 8192 bits of RAM for a simple accelerated graphics drawing and blitting command sub-processor.

We are now down to 1670 free logic cells and still have 53248 bits of free RAM.  That leaves room for a simple 200 or 400MHz soft-core CPU, as my RAM controller would still have a free R/W DMA port.
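The running totals in the two paragraphs above, using the post's own per-block estimates:

$$
\begin{aligned}
\text{logic cells added} &= 100 + 250 + 500 + 500 = 1350\\
\text{logic cells free} &= 3020 - 1350 = 1670\\
\text{RAM free} &= 69632 - 8192 - 8192 = 53248\ \text{bits}
\end{aligned}
$$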

Now, I'm not sure what people would want out of all this, and such simple functions can be replaced by an all-in-one CPU with a graphics HDMI output port, but that wasn't the purpose of this thread.


Note that my RAM controller has 1 weakness: when writing and reading bytes, its smallest burst size is 2.  It was written around the turn of the century, way back beginning with Cyclone I FPGAs, and with better tools and today's knowledge I know there is huge room for improvement.  Its design was built around video frame buffers, which is why it was designed to read up to a full page as needed, and the memory addresses were interleaved across the address rows, while the smart refresh counter/sequencer in the 'sdram_arbiter' module would know whether any refresh cycles were needed anywhere.
 

asmi

  • Super Contributor
  • Posts: 2782
  • Country: ca
Re: FPGA "soft" serializer
« Reply #18 on: November 28, 2019, 06:21:30 pm »
Quote from: BrianHG on November 28, 2019, 05:54:02 pm
Ok: RAM controller plus a 4-port read/write DMA controller, with automatic continuous page bursts, set up for 16-bit DDR RAM with a 32-bit internal bus.  See the resource sections in green (I wrote it myself):
[Image: resource usage of the RAM controller and DMA ports]
Total = 1300 logic cells, 8192 bits of internal memory.
2-page cache for display rendering (the most efficient burst my controller can do): 256*32*2 = 16384 bits

Do you mind sharing the source code? You can send it to me via PM if you don't want to share it publicly. I feel like I will learn a lot from it!

Quote from: BrianHG on November 28, 2019, 05:54:02 pm
LCMXO3L-4300E-5UWG81CTR1K (which has a built-in configuration ROM): we would still have 3020 free logic cells and 69632 bits of free RAM ($3.78 each for 25 at Digi-Key).

Except that this chip requires a $1000+ HDI PCB ;D I can buy one heck of an FPGA for that money ;D

Quote from: BrianHG on November 28, 2019, 05:54:02 pm
We are now down to 1670 free logic cells and still have 53248 bits of free RAM.  That leaves room for a simple 200 or 400MHz soft-core CPU, as my RAM controller would still have a free R/W DMA port.

That's another thing I would love to see the source code for. I have yet to see a 200 MHz, let alone 400 MHz, pipelined softcore, even on the faster Artix-7 fabric.

Quote from: BrianHG on November 28, 2019, 05:54:02 pm
Note that my RAM controller has 1 weakness: when writing and reading bytes, its smallest burst size is 2.  It was written around the turn of the century, way back beginning with Cyclone I FPGAs, and with better tools and today's knowledge I know there is huge room for improvement.  Its design was built around video frame buffers, which is why it was designed to read up to a full page as needed, and the memory addresses were interleaved across the address rows, while the smart refresh counter/sequencer in the 'sdram_arbiter' module would know whether any refresh cycles were needed anywhere.

That's fine; specialized hardware is what FPGAs are all about.

BrianHG

  • Super Contributor
  • Posts: 8031
  • Country: ca
Re: FPGA "soft" serializer
« Reply #19 on: November 28, 2019, 08:15:23 pm »
For cheap 4-layer $25 PCBs, the finest-pitch part I would attempt is this one:

LFE5U-12F-6BG256C: 0.8mm-pitch 256-ball BGA, 12k logic cells, 589kbit RAM, $5.82 each for 25 at Arrow.

As for my RAM controller, ok, to give you an idea of what you are in for...
This is the top of the sequencer module connected to the IO pins:
[Image: top level of the sequencer module]

This is the inside:
[Image: inside the sequencer module]

Now, as you can see, this connects to a 128-bit bus made of two 64-bit SO-DIMM modules, which means that inside the FPGA, an old C8 Cyclone III, I have a 256-bit bus.  The code inside the sequencer:

Code: (see attached code)

The sequencer uses a memory block that holds a sequence which counts through and sets the RAM control lines in 4 different pages.  Now, I can give out the .mif, but editing anything is a binary job which would drive one nuts.  Also, I have 3 clocks coming from the PLL: the system clock CLK; DDR_CLK, which is 11 degrees out of phase with the system clock; and DQS_CLK, which is 33 degrees out of phase with the system clock.
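The idea of a memory block that holds the command sequence can be sketched as a tiny microcoded ROM. Everything here is an illustrative assumption, not the contents of the post's .mif: each ROM word holds the SDRAM command pins {CS#, RAS#, CAS#, WE#} plus a "last" flag, and the stored sequence is a single activate/read/precharge with an assumed two-clock tRCD. The module and signal names are hypothetical. Changing the sequence means changing the table, not the logic, which is also why hand-editing such a table gets tedious.

```verilog
// Hypothetical ROM-driven SDRAM command sequencer.
module rom_sequencer (
    input  wire       clk,
    input  wire       rst,
    input  wire       start,      // begin the stored sequence
    output reg  [3:0] cmd,        // {cs_n, ras_n, cas_n, we_n}
    output reg        busy
);
    // Standard SDRAM command encodings on {CS#, RAS#, CAS#, WE#}.
    localparam NOP = 4'b0111, ACT = 4'b0011, RD = 4'b0101, PRE = 4'b0010;

    reg [4:0] rom [0:7];          // {last, cmd[3:0]}
    reg [2:0] pc;

    initial begin
        rom[0] = {1'b0, ACT};     // open the row
        rom[1] = {1'b0, NOP};     // wait tRCD (assumed 2 clocks)
        rom[2] = {1'b0, RD};      // column read
        rom[3] = {1'b0, NOP};
        rom[4] = {1'b0, NOP};
        rom[5] = {1'b1, PRE};     // close the row; end of sequence
        rom[6] = {1'b1, NOP};
        rom[7] = {1'b1, NOP};
    end

    always @(posedge clk) begin
        if (rst) begin
            pc   <= 3'd0;
            busy <= 1'b0;
            cmd  <= NOP;
        end else if (!busy) begin
            cmd <= NOP;
            if (start) begin
                busy <= 1'b1;
                pc   <= 3'd0;
            end
        end else begin
            cmd <= rom[pc][3:0];          // play out the stored command
            pc  <= pc + 3'd1;
            if (rom[pc][4]) busy <= 1'b0; // "last" flag ends the sequence
        end
    end
endmodule
```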

Now, if you read above about the Quartus 'Fast Output Registers' assignment: there are also fast input and fast output-enable register assignments, and in all my designs these are set on all IOs.  And to keep IO pin and logic routing from limiting FMAX, all my IOs have an additional clocked D-flip-flop stage.  It's like my designs are cocooned inside a square of simple D-flip-flops all around the outer edge, with those flip-flop clocks tied exclusively to the PLL, giving every IO the best possible timing, within about +/-50ps of each other across the entire FPGA.  (Yes, with JFET-amplified probes and a GND spring clip on my 1GHz-bandwidth scope, every IO pin on my memory bus, or anywhere else, had clean edges and was timed clean as a whistle.)

This still isn't all: there is the sdram_arbiter, which controls 8 read/write ports and the refresh counter, and steers the write-memory mux logic and the read-memory destination, which is piped through the SDRAM sequencer in the extended 16-bit address above the first 4GB address range:
[Image: sdram_arbiter module]

And here is what it looks like to 'MANUALLY' program a round-robin arbiter priority encoder for selecting which read/write DMA memory channel goes next:

Code: (see attached code)
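For comparison, a generic rotating-priority (round-robin) arbiter can also be written compactly. This is a minimal sketch under assumed names (`rr_arbiter`, `req`, `grant`), not the attached code: one request line per channel, a registered one-hot grant, and a search that starts just after the previous winner, so every requesting channel is served in turn.

```verilog
// Hypothetical round-robin arbiter, N >= 2.
module rr_arbiter #(
    parameter N = 8
) (
    input  wire         clk,
    input  wire         rst,
    input  wire [N-1:0] req,     // one request line per DMA channel
    output reg  [N-1:0] grant    // registered, one-hot (or all zero)
);
    reg [$clog2(N)-1:0] last;    // index of the most recent winner
    reg [$clog2(N)-1:0] idx;     // temporary scan index
    reg                 found;   // temporary: a winner was picked this cycle
    integer k;

    always @(posedge clk) begin
        if (rst) begin
            grant <= {N{1'b0}};
            last  <= N - 1;      // so channel 0 has the highest priority first
        end else begin
            grant <= {N{1'b0}};
            found  = 1'b0;
            // Scan last+1, last+2, ... with wrap-around; take the first request.
            for (k = 1; k <= N; k = k + 1) begin
                idx = (last + k) % N;
                if (!found && req[idx]) begin
                    found      = 1'b1;
                    grant[idx] <= 1'b1;
                    last       <= idx;
                end
            end
        end
    end
endmodule
```

A real DMA arbiter would also hold the grant until the winning channel's burst completes; this sketch re-arbitrates every clock for simplicity.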

Now, if you want to investigate writing a DDR/DDR2/DDR3 RAM controller in Verilog, this time with a better design, maybe start a forum thread, and expect to do it in 200 logic cells plus the number of bits of the connected RAM IC.  However, there are 2 items I would still need to lean on outside the Verilog core: the PLL core configuration and the pin-driving 'DDR-IO' function.  I would stay away from special advanced features, like trying to gain the half-clock-cycle read feature to reach that upper speed step, and I would ignore DQS on reads, as I would fix the CAS read latency setting.

Such a simple controller should easily achieve a 200MHz clock, meaning 400 MT/s, or on modern FPGAs, you're looking at double.
« Last Edit: November 28, 2019, 09:54:28 pm by BrianHG »
 

BrianHG

  • Super Contributor
  • Posts: 8031
  • Country: ca
Re: FPGA "soft" serializer
« Reply #20 on: November 28, 2019, 08:58:40 pm »
The trick with reads and writes: pipe your address through your RAM controller, and alongside that address, add a dummy pass-through address channel, nothing but a chain of clocked D-flip-flops, so that when you send a read, the value piped through that chain comes out the other end in phase with the valid read data.  You use that separate channel's data to drive the latches in the destination logic.  Once you have something like that built into your RAM sequencing controller, a bunch of reads posted from a multiplex of read-address inputs will each land in the correct destination as the read data comes in, without pausing the controller while your mux sorts out the next read and waits for the current one to finish before posting it.  This allows you to post non-stop sequential reads.
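The "dummy address channel" can be sketched as a tag pipeline running alongside the RAM's read latency. This is a minimal sketch assuming a fixed latency `LAT` from issuing a read to its data arriving (consistent with fixing the CAS latency and ignoring DQS on reads); the module and port names are hypothetical. Each read is issued together with the ID of the requesting channel, and that ID comes out of the pipe in the same cycle as the read data, so back-to-back reads for different channels never wait for each other.

```verilog
// Hypothetical read-tag pipeline: a D-flip-flop chain whose output lines up
// with read data that arrives a fixed LAT clocks after the read is issued.
module read_tag_pipe #(
    parameter LAT = 6,           // assumed fixed read latency in clocks (>= 2)
    parameter TW  = 3            // tag width: up to 2**TW read channels
) (
    input  wire          clk,
    input  wire          rst,
    input  wire          rd_issue,    // a read command is sent to the RAM this cycle
    input  wire [TW-1:0] rd_tag,      // which channel asked for it
    output wire          data_valid,  // that read's data is on the bus now
    output wire [TW-1:0] data_tag     // destination channel for the arriving data
);
    reg [LAT-1:0]    v_pipe;          // one valid bit per stage
    reg [TW*LAT-1:0] t_pipe;          // TW tag bits per stage, newest in the low bits

    always @(posedge clk) begin
        if (rst)
            v_pipe <= {LAT{1'b0}};
        else
            v_pipe <= {v_pipe[LAT-2:0], rd_issue};
        t_pipe <= {t_pipe[TW*(LAT-1)-1:0], rd_tag};
    end

    assign data_valid = v_pipe[LAT-1];
    assign data_tag   = t_pipe[TW*LAT-1 -: TW];
endmodule
```

`LAT` has to match the real command-to-data pipeline exactly, including any IO register stages on the read path.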

It worked like a charm: the clips you see from my project ran 5 read channels and 2 write channels, all in parallel, with that 256-bit internal bus driving 6x 1080p studio-grade 4:4:4 30-bit color video streams in parallel.

« Last Edit: November 28, 2019, 09:14:06 pm by BrianHG »
 

