Topic: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.


Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #275 on: September 11, 2022, 10:32:00 pm »
Not really understanding this error...

ERROR (CV0013) : Pin(ddr_dq[0]) of 'gowin_DQ_bus[0].gowin_dq_tbuf_inst'(TBUF) does not connect to port

Given the code:
Code: [Select]
             wire gowin_dq_in;
             wire gowin_dq_out;
             wire gowin_dq_oe;
             wire gowin_ibuf_in;


            ODDR gowin_dq_oddr_inst 
                (
                .Q0(gowin_dq_out),                  // 2x SDR -> DDR
                .Q1(gowin_dq_oe),                   // in-phase output enable
                .D0(PIN_WDATA_PIPE_h[0][x]),        // Input data [SDR]
                .D1(PIN_WDATA_PIPE_l[0][x]),        // Input data [SDR]
                .TX(PIN_OE_WDQ_wide[x]),            // Input 'output enable' 
                .CLK(DDR_CLK_WDQ)                   // write clock
                );

            TBUF gowin_dq_tbuf_inst
                (
                .O(DDR3_DQ[x]),                      // TBUF -> pad
                .I(gowin_dq_out),                    // ODDR -> TBUF
                .OEN(~gowin_dq_oe)                   // input when 1'b1
                );

             IBUF gowin_dq_extra_ibuf
                (
                .I(DDR3_DQ[x]),
                .O(gowin_ibuf_in)
                );

            IDDR gowin_dq_iddr_inst 
                (
                .Q0(RDQ_l[x]),                      // SDR to app #0
                .Q1(RDQ_h[x]),                      // SDR to app #1
                .D(gowin_ibuf_in),                  // DDR input signal
                .CLK(DDR_CLK_RDQ)                   // read clock
                );

- I have stared at this for the last 5 minutes, and it certainly seems to be linked up correctly.
 

Offline BrianHG (Topic starter)

  • Super Contributor
  • ***
  • Posts: 7852
  • Country: ca
« Reply #276 on: September 11, 2022, 10:48:47 pm »
That would make me scratch my head too.
What happens if you bypass the IBUF?

I looked at the Gowin data sheet you linked to last week.
Is this all you get?
I even read the DDR memory interface modules.  With so little description, never mind figuring out how they work, how are you supposed to work out the wiring between them?

They have the natural clock and a DQS clock.  Where is this supposed to be wired from/to?  What kind of buffer?  How do you generate the DQS output clock, my way or some other way?  What does the addressing do and why do you have or use them?  At least, Altera's 'mem_io_phy' explains this in a 20 page document going over every feature and how & where you can wire the IO and what the waveforms do...
 

Offline SpacedCowboy

« Reply #277 on: September 11, 2022, 10:56:38 pm »
Quote from: BrianHG
> That would make me scratch my head too.
> What happens if you bypass the IBUF.

I tried that :) You get the same error; in fact, commenting out the input path altogether still gives the same error. Something is wrong with the output connectivity, but I'm not sure I can see *how*, let alone where.

I got briefly excited when I commented out the TBUF and made it do:

Code: [Select]
   assign DDR3_DQ[x] = (gowin_dq_oe) ? gowin_dq_out : 1'bz;

... which seemed to work, in as much as I got errors about the next section (DQS) instead of this one, but a few re-runs later it was back to complaining about the DQ path. Somehow the seed-value made it evaluate DQS before DQ for a few goes, I think. Yes, I know that messes with the timing as well, but we're at the point of "kitchen sink" time here...
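
For anyone following along, here is a minimal behavioural sketch of what that assign is meant to do, in plain Verilog with no Gowin primitives. The module and signal names are made up for illustration, and the second driver in the test bench is only a stand-in for the DDR3 chip:

```verilog
`timescale 1ns/1ps
// Hypothetical names throughout; generic Verilog, no vendor primitives.
module tristate_sketch (
    inout  wire pad,      // stands in for DDR3_DQ[x]
    input  wire dq_out,   // stands in for gowin_dq_out
    input  wire dq_oe,    // 1 = drive the pad, 0 = release it
    output wire dq_in     // what the input path would see
);
    assign pad   = dq_oe ? dq_out : 1'bz;
    assign dq_in = pad;
endmodule

module tristate_sketch_tb;
    reg  dq_out, dq_oe, ext_drive, ext_val;
    wire pad, dq_in;

    // Models the memory chip driving the same wire during reads.
    assign pad = ext_drive ? ext_val : 1'bz;

    tristate_sketch dut (.pad(pad), .dq_out(dq_out), .dq_oe(dq_oe), .dq_in(dq_in));

    initial begin
        // Write: FPGA side drives, memory side released.
        dq_oe = 1; dq_out = 1; ext_drive = 0; ext_val = 0;
        #1 $display("write: pad=%b", pad);
        // Turn-around: nobody drives the wire.
        dq_oe = 0;
        #1 $display("idle : pad=%b", pad);
        // Read: memory side drives, FPGA side listens.
        ext_drive = 1;
        #1 $display("read : pad=%b dq_in=%b", pad, dq_in);
        $finish;
    end
endmodule
```

In a simulator the "idle" line should show the pad floating (z), which is all the TBUF is supposed to provide. It says nothing about how Gowin maps the assign onto its IO cells, or about timing.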

Quote from: BrianHG
> I looked at the Gowin data sheet you linked to last week.
> Is this all you get?
> I even read the DDR memory interface modules.  With such lack of description, never mind figuring out how they work, but, how are you supposed to work out the wiring between them?
>
> They have the natural clock and a DQS clock.  Where is this supposed to be wired from/to?  What kind of buffer?  How do you generate the DQS output clock, my way or some other way?  What does the addressing do and why do you have or use them?  At least, Altera's 'mem_io_phy' explains this in a 20 page document going over every feature and how & where you can wire the IO and what the waveforms do...

Yeah. It's not ... exhaustively ... documented.

At this point I think I'll ping the FAE and see if there are any answers forthcoming. I have a feeling they're going to say "use our provided IP if you want to talk to a DDR3 chip" but we'll see.
« Last Edit: September 11, 2022, 10:58:32 pm by SpacedCowboy »
 

Offline BrianHG (Topic starter)

« Reply #278 on: September 12, 2022, 01:34:11 am »
Here is an absolutely stupid idea:

Code: [Select]
assign dqs_clk[x] = (OE_DQS[x]) ? DDR_CLK : DDR_CLK_RDQ ;
Now, use ' dqs_clk[ x ] ' as a single clock for both the IDDR and ODDR together.

I have an extra cycle clearance in the OE as the DDR3 has a large minimum bus-turn-around cycle time and we have room for 1/2 additional clock turn-on cycle.

So long as the IO buffer's source clock selection hardware is tied to logic whether you manually hard selected it or not, this might have 0 impact on design performance.  If Gowin manually manipulates the PLL to use specific on-chip routing for the DDR, then this will not work.
« Last Edit: September 12, 2022, 01:38:38 am by BrianHG »
 

Offline SpacedCowboy

« Reply #279 on: September 12, 2022, 02:54:08 am »
Quote from: BrianHG
> Here is an absolutely stupid idea:
>
> Code: [Select]
> assign dqs_clk[x] = (OE_DQS[x]) ? DDR_CLK : DDR_CLK_RDQ ;
> Now, use ' dqs_clk[ x ] ' as a single clock for both the IDDR and ODDR together.
>
> I have an extra cycle clearance in the OE as the DDR3 has a large minimum bus-turn-around cycle time and we have room for 1/2 additional clock turn-on cycle.
>
> So long as the IO buffer's source clock selection hardware is tied to logic whether you manually hard selected it or not, this might have 0 impact on design performance.  If Gowin manually manipulates the PLL to use specific on-chip routing for the DDR, then this will not work.

So that worked for synthesis (!) I haven't checked simulation yet, but using ...

Code: [Select]
    for (x=0; x<DDR3_WIDTH_DQS; x = x + 1)
        begin : Gowin_DQ_Strobes

             wire gowin_dqs_out;                 // Internal: ODDR->IOBUF
             wire gowin_dqs_in;                  // Internal: IOBUF->IDDR
             wire gowin_dqs_tx;                  // Internal: OE on input to ODDR

             assign dqs_clk[x] = (OE_DQS[x]) ? DDR_CLK : DDR_CLK_RDQ;

             ODDR gowin_dqs_oddr_inst 
                (
                .Q0(gowin_dqs_out),             // ODDR -> LVDS
                .Q1(gowin_dqs_tx),              // 1'b0 => output
                .D0(1'b0),                      // Input data [SDR]
                .D1(1'b1),                      // Input data [SDR]
                .TX(~OE_DQS[x]),                // Input 'output enable' 0=output
                .CLK(dqs_clk[x])                 // DDR clock
                );
 
             IDDR gowin_dqs_iddr_inst 
                (
                .Q0(RDQS_pl[x]),                // SDR to app #0
                .Q1(RDQS_ph[x]),                // SDR to app #1
                .D(gowin_dqs_in),               // DDR input signal
                .CLK(dqs_clk[x]) // read clock
                );

            TLVDS_IOBUF gowin_dqs_lvds_iobuf_inst
                (
                .O(gowin_dqs_in),               // LVDS -> IDDR
                .IO(DDR3_DQS_p[x]),             // +ve LVDS pad
                .IOB(DDR3_DQS_n[x]),            // -ve LVDS pad
                .I(gowin_dqs_out),              // ODDR -> LVDS
                .OEN(gowin_dqs_tx)              // input when 1'b1
                );
           
             assign RDQS_nl[x] = ~RDQS_pl[x];
             assign RDQS_nh[x] = ~RDQS_ph[x];
        end

I get 1 odd warning I hadn't expected:

Code: [Select]
WARN  (NL0001) : Sweep user defined dangling instance "DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/Gowin_DQ_Strobes[0].gowin_dqs_iddr_inst"
but it might be an artifact of me temporarily commenting out the DQ section. Synthesis completes without errors though.

Can we do the same for DQ, in terms of the slack available in the system?
 

Offline BrianHG (Topic starter)

« Reply #280 on: September 12, 2022, 03:06:24 am »
Quote from: SpacedCowboy
> Can we do the same for DQ, in terms of the slack available in the system ?

Yes, go right ahead....
DQ and DQS slack is identical.  (Note that I already turn on DQS an additional half-clock early, we may just need to do so for the DQ.)
We just might need to add 1 or 2 settings to my code, but for now, test away.
« Last Edit: September 12, 2022, 03:15:19 am by BrianHG »
 

Offline SpacedCowboy

« Reply #281 on: September 12, 2022, 04:40:04 pm »
Ok, so doing the 'dynamically switch clock' for both DQ and DQS compiles fine - somehow I'm still getting

Code: [Select]
WARN  (NL0002) : The module "BrianHG_DDR3_GEN_tCK" instantiated to "BHG_DDR3_GEN_tCK" is swept in optimizing
... but I haven't yet managed to place the RS232 stuff at the top-level, so there's probably at least some reset weirdness going on. With just the BrianHG_DDR3_PLL and BrianHG_DDR3_PHY_SEQ_v16 instantiated, it all compiles for synthesis, and the simulation passes. Today is a busy meetings-day (Mondays, ugh!) so this is as far as it's likely to get until this evening...
 

Offline BrianHG (Topic starter)

« Reply #282 on: September 12, 2022, 06:57:27 pm »
Quote from: SpacedCowboy
> Ok, so doing the 'dynamically switch clock' for both DQ and DQS compiles fine - somehow I'm still getting
>
> Code: [Select]
> WARN  (NL0002) : The module "BrianHG_DDR3_GEN_tCK" instantiated to "BHG_DDR3_GEN_tCK" is swept in optimizing

Hmmm, 'BrianHG_DDR3_GEN_tCK' takes in the DDR3 parameters and clock rate parameters.
It spits out a pile of constants which define how long each type of required delay should be, as a number of clock cycles, and how to set the MRS control registers.

You can say the DDR3 cannot function without it.

I know that a good 50% of it I never needed to use, but, the rest is absolutely important.
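
As a rough illustration of the idea (this is not the actual BrianHG_DDR3_GEN_tCK code), a delay given in picoseconds can be rounded up to a whole number of clock cycles like this. The function name and the 13750 ps example delay are assumptions made up for this sketch; only the 405 MHz figure comes from this thread:

```verilog
// Simulation-only sketch; names and the example delay are hypothetical.
module tck_cycles_sketch;
    localparam integer DDR3_CK_MHZ = 405;                    // from the thread
    localparam integer tCK_PS      = 1000000 / DDR3_CK_MHZ;  // clock period in ps, rounded down

    // Smallest number of clock cycles that covers delay_ps.
    function integer ps_to_cycles(input integer delay_ps);
        ps_to_cycles = (delay_ps + tCK_PS - 1) / tCK_PS;     // integer ceiling
    endfunction

    // Illustrative delay only; real values come from the DDR3 datasheet.
    localparam integer EXAMPLE_DELAY_PS = 13750;

    initial begin
        $display("tCK = %0d ps", tCK_PS);
        $display("%0d ps -> %0d cycles", EXAMPLE_DELAY_PS, ps_to_cycles(EXAMPLE_DELAY_PS));
        $finish;
    end
endmodule
```

Rounding the period down and the cycle count up errs on the safe side: the computed delay is never shorter than the requested one.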
 

Offline SpacedCowboy

« Reply #283 on: September 13, 2022, 02:56:40 pm »
Well, I got "everything" into the project last night, and I'm still seeing a couple of confusing 'sweep' warnings where modules are being removed...

Code: [Select]
GowinSynthesis start
Running parser ...
Analyzing Verilog file 'C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_CMD_SEQUENCER_v16.sv'
Analyzing Verilog file 'C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_GEN_tCK.sv'
Analyzing Verilog file 'C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_IO_PORT_ALTERA.sv'
Analyzing Verilog file 'C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_PHY_SEQ_v16.sv'
Analyzing Verilog file 'C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_PLL.sv'
Analyzing Verilog file 'C:\Users\simon\Documents\verilog\ddr3-gowin\src\ddr3_io_port_gowin.sv'
Analyzing Verilog file 'C:\Users\simon\Documents\verilog\ddr3-gowin\src\gowin_ddr_clocking.sv'
Analyzing Verilog file 'C:\Users\simon\Documents\verilog\ddr3-gowin\src\rs232_DEBUGGER.v'
Analyzing Verilog file 'C:\Users\simon\Documents\verilog\ddr3-gowin\src\sync_rs232_uart.v'
Analyzing Verilog file 'C:\Users\simon\Documents\verilog\ddr3-gowin\src\top.sv'
Compiling module 'top'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\top.sv":3)
Compiling module 'BrianHG_DDR3_PLL(FPGA_VENDOR="Gowin",FPGA_FAMILY="GW2A-18",CLK_KHZ_IN=27000,CLK_IN_MULT=15,CLK_IN_DIV=1,INTERFACE_SPEED="Quarter")'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_PLL.sv":24)
Compiling module 'gowin_ddr_clocking(CLK_KHZ_IN=27000,CLK_IN_MULT=15,CLK_IN_DIV=1)'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\gowin_ddr_clocking.sv":25)
Compiling module 'BrianHG_DDR3_PHY_SEQ_v16(FPGA_VENDOR="Gowin",FPGA_FAMILY="GW2A-18",BHG_OPTIMIZE_SPEED=1'b1,BHG_EXTRA_SPEED=1'b1,CLK_KHZ_IN=27000,CLK_IN_MULT=15,CLK_IN_DIV=1,INTERFACE_SPEED="Quarter",DDR3_CK_MHZ=405,DDR3_SPEED_GRADE="-125",DDR3_SIZE_GB=1,DDR3_NUM_CK=1,DDR3_WIDTH_ADDR=13,DDR3_WIDTH_DM=2,DDR3_WIDTH_DQS=2,DDR3_MAX_REF_QUEUE=5'b01000,IDLE_TIME_uSx10=8'b00000010,SKIP_PUP_TIMER=1'b0,PORT_VECTOR_SIZE=5,PORT_ADDR_SIZE=27,USE_TOGGLE_CONTROLS=1'b1)'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_PHY_SEQ_v16.sv":53)
Compiling module 'BrianHG_DDR3_GEN_tCK(DDR3_CK_MHZ=405,DDR3_SPEED_GRADE="-125")'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_GEN_tCK.sv":54)
Compiling module 'DDR3_IO_PORT_GOWIN(FPGA_VENDOR="Gowin",BHG_EXTRA_SPEED=1'b1,CLK_KHZ_IN=27000,CLK_IN_MULT=15,CLK_IN_DIV=1,DDR3_NUM_CK=1,DDR3_WIDTH_ADDR=13,DDR3_WIDTH_DM=2,DDR3_WIDTH_DQS=2,DDR3_RWDQ_BITS=128,CMD_ADD_DLY=1'b1)'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\ddr3_io_port_gowin.sv":24)
Extracting RAM for identifier 'PIN_OE_WDQ'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\ddr3_io_port_gowin.sv":184)
Compiling module 'BrianHG_DDR3_CMD_SEQUENCER_v16(USE_TOGGLE_ENA=1'b1,USE_TOGGLE_OUT=1'b0,DDR3_WIDTH_ROW=13,DDR3_RWDQ_BITS=128,PORT_VECTOR_SIZE=5,BHG_EXTRA_SPEED=1'b1)'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_CMD_SEQUENCER_v16.sv":39)
Extracting RAM for identifier 'bank_row_mem'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_CMD_SEQUENCER_v16.sv":117)
Extracting RAM for identifier 'vector_pipe_mem'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_CMD_SEQUENCER_v16.sv":180)
Compiling module 'DDR3_CMD_ENCODE_BYTE(addr_size=5)'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\top.sv":571)
WARN  (EX3670) : Actual bit length 8 differs from formal bit length 128 for port 'data_in'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\top.sv":393)
WARN  (EX3670) : Actual bit length 1 differs from formal bit length 16 for port 'mask_in'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\top.sv":394)
Compiling module 'DDR3_CMD_DECODE_BYTE(addr_size=5)'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\top.sv":625)
WARN  (EX3670) : Actual bit length 8 differs from formal bit length 128 for port 'data_out'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\top.sv":413)
WARN  (EX3073) : Port 'rx_sample_pulse' remains unconnected for this instance("C:\Users\simon\Documents\verilog\ddr3-gowin\src\rs232_DEBUGGER.v":221)
Compiling module 'rs232_debugger(CLK_IN_HZ=101250000,ADDR_SIZE=24,READ_REQ_1CLK=1)'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\rs232_DEBUGGER.v":30)
Compiling module 'sync_rs232_uart(CLK_IN_HZ=101250000)'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\sync_rs232_uart.v":15)
NOTE  (EX0101) : Current top module is "top"
WARN  (EX0211) : The output port "phase_done" of module "BrianHG_DDR3_PLL(FPGA_VENDOR="Gowin",FPGA_FAMILY="GW2A-18",CLK_KHZ_IN=27000,CLK_IN_MULT=15,CLK_IN_DIV=1,INTERFACE_SPEED="Quarter")" has no driver("C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_PLL.sv":69)
WARN  (NL0001) : Sweep user defined dangling instance "DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/Gowin_DQ_Strobes[1].gowin_dqs_iddr_inst"("C:\Users\simon\Documents\verilog\ddr3-gowin\src\ddr3_io_port_gowin.sv":452)
[5%] Running netlist conversion ...
Running device independent optimization ...
[10%] Optimizing Phase 0 completed
[15%] Optimizing Phase 1 completed
[25%] Optimizing Phase 2 completed
Running inference ...
[30%] Inferring Phase 0 completed
[40%] Inferring Phase 1 completed
[50%] Inferring Phase 2 completed
[55%] Inferring Phase 3 completed
Running technical mapping ...
[60%] Tech-Mapping Phase 0 completed
[65%] Tech-Mapping Phase 1 completed
[75%] Tech-Mapping Phase 2 completed
[80%] Tech-Mapping Phase 3 completed
[90%] Tech-Mapping Phase 4 completed
WARN  (NL0002) : The module "BrianHG_DDR3_GEN_tCK" instantiated to "BHG_DDR3_GEN_tCK" is swept in optimizing("C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_PHY_SEQ_v16.sv":336)
[95%] Generate netlist file "C:\Users\simon\Documents\verilog\ddr3-gowin\impl\gwsynthesis\ddr3-gowin.vg" completed
[100%] Generate report file "C:\Users\simon\Documents\verilog\ddr3-gowin\impl\gwsynthesis\ddr3-gowin_syn.rpt.html" completed
GowinSynthesis finish


Given that it simulates correctly, I expect there's a signal I missed somewhere that's the snowball that starts the avalanche of removal. Now begins the process of going through the changes and trying to figure out where I went wrong...

There's a *small* voice in the back of my mind wondering if the NL0002-type warning (after tech-mapping) is because it realized it could reduce the module down to constants at compile-time, but that's probably wishful thinking. I have no idea yet why the IDDR of only DQS[1] ought to be removed...
 
 

Offline BrianHG (Topic starter)

« Reply #284 on: September 13, 2022, 06:19:00 pm »
Quote from: SpacedCowboy
> There's a *small* voice in the back of my mind wondering if the NL002-type warning (after tech-mapping) is because it realized it could reduce the module down to constants at compile-time, but that's probably wishful thinking. I have no idea yet why the IDDR of only DQS[1] ought to be removed...
Do not worry about the DQS[1] as I only use one of the source DQSs to verify read phase.  Everything else is tuned off of the DQ port.

Yes, the module "BrianHG_DDR3_GEN_tCK" is the only real surprise for me.


Did you get an FMAX reading?
How about a logic cell usage count?
« Last Edit: September 13, 2022, 06:21:40 pm by BrianHG »
 

Offline SpacedCowboy

« Reply #285 on: September 13, 2022, 07:21:44 pm »
I'll spend some more time this evening trying to track down what I've screwed up - it wasn't anything too obvious (at least to my eyes) because I spent a fair amount of time last night looking as well without finding it. What's being passed into BrianHG_DDR3_PHY_SEQ_v16 seems ... reasonable.

I'm cautious about stats until I know it's all being synthesized, but the current resource usage is below. The fMax is a bit disappointing, I think - also below, but bear in mind this is just push-button synthesis.

It's complaining about a lack of timing paths for some things, and it certainly isn't properly constrained yet. Of all the black magic within the realm of FPGAs, clock constraints are the most opaque to me; in fact, the only timing constraints being applied are those inferred from the clock rates in the PLL instantiations.

I stuck with a 'multiple of the base clock' as you recommended, and the nearest to 400MHz was 405 from the 27MHz base clock, but it didn't get even close to that as you can see.
 

Offline BrianHG (Topic starter)

« Reply #286 on: September 13, 2022, 08:09:25 pm »
Try these PLL settings:

Code: [Select]
// ****************  System clock generation and operation.
parameter int        CLK_KHZ_IN              = 27000,            // PLL source input clock frequency in KHz.
parameter int        CLK_IN_MULT             = 24,               // Multiply factor to generate the DDR MTPS speed divided by 2.
parameter int        CLK_IN_DIV              = 2,                // Divide factor.  When CLK_KHZ_IN is 25000,50000,75000,100000,125000,150000, use 2,4,6,8,10,12.
parameter int        DDR_TRICK_MTPS_CAP      = 0,                // 0=off, Set a false PLL DDR data rate for the compiler to allow FPGA overclocking.  ***DO NOT USE.

parameter string     INTERFACE_SPEED         = "Quarter",        // Either "Full", "Half", or "Quarter" speed for the user interface clock.
                                                                 // This will affect the controller's interface CMD_CLK output port frequency.

And then try 'CLK_IN_MULT  = 22'.
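
As a quick sanity check of what those settings ask for, here is a small simulation-only sketch. It assumes the DDR3 clock is CLK_KHZ_IN * CLK_IN_MULT / CLK_IN_DIV and that the "Half" and "Quarter" interface clocks are that figure divided by 2 and 4; the module, function and task names are made up for illustration:

```verilog
// Simulation-only sketch; module, function and task names are hypothetical.
module pll_freq_sketch;
    localparam integer CLK_KHZ_IN = 27000;  // board clock used in this thread

    // DDR3 CK frequency in kHz for a given multiply/divide pair (assumed formula).
    function integer ck_khz(input integer mult, input integer div);
        ck_khz = CLK_KHZ_IN * mult / div;
    endfunction

    // Print the full, half and quarter rate clocks for one setting.
    task show(input integer mult, input integer div);
        begin
            $display("MULT=%0d DIV=%0d : CK=%0d kHz, half=%0d kHz, quarter=%0d kHz",
                     mult, div, ck_khz(mult, div),
                     ck_khz(mult, div) / 2, ck_khz(mult, div) / 4);
        end
    endtask

    initial begin
        show(15, 1);  // the setting from the earlier synthesis log
        show(24, 2);
        show(22, 2);
        $finish;
    end
endmodule
```

With the 27 MHz input this works out to 405 MHz for the 15/1 setting, 324 MHz for 24/2 and 297 MHz for 22/2.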

Your register count seems a little low; did you include the 'RS232 debugger'?  However, this may just be because you are using a smaller DDR3 RAM chip.

Exactly which chip are you using?
For example, an Altera Cyclone/Max with a -8 suffix can only do ~300MHz.  An Altera Cyclone/Max with a -6 suffix can do ~400MHz.  This doesn't mean I can't cheat and overclock or use a -8 as a -6 with a good heatsink, but these 300/400MHz are the official plain vanilla setup.

Yes, compiler effort to achieve a good FMAX comes with the generation of a proper .sdc file.
If Gowin supports the Synopsys Design Constraints file syntax, then it should match mine with the exception of the primary PLL clock names and the optimum output delay timing values I have chosen.

(Note that your design has inferred some logic registers as SSRAM16.  If this has been done in the IO port section, forcing Gowin to use logic via an appropriate attribute may remove these speed bottlenecks, as logic should be much faster than memory blocks.)
« Last Edit: September 13, 2022, 08:17:12 pm by BrianHG »
 

Offline SpacedCowboy

« Reply #287 on: September 13, 2022, 09:54:40 pm »
So, attached are the mult=24 and mult=22 results for fMax. It did better, but still didn't manage to get the 50% clock, in both cases.

I did include the rs232 debugger - my current code is basically a "gowin-ified clone" of your BrianHG_DDR3_DECA_PHY_SEQ_only_v16 folder.

The chip is a GW2A-LV18PG256C8/I7 so not quite their fastest (6=slowest, 9=fastest). I'm really not sure how to interpret the figures for the internal measurements in 'speed-grades' below (the column is entitled 'speed grade' but has no distinction between grade!), but there do seem to be some distinctions for external switching...

The SDC file format is "supported" but virtually none of your directives are actually legal syntax. The only legal syntax in the .sdc file is:
  • create_clock
  • create_generated_clock
  • set_clock_latency
  • set_clock_uncertainty
  • set_clock_groups
  • set_input_delay
  • set_output_delay
  • set_max_delay/ set_min_delay
  • set_false_path
  • set_multicycle_path
  • report_timing
  • report_high_fanout_nets
  • report_route_congestion
  • report_min_pulse_width
  • report_max_frequency
  • report_exceptions

There are no variables, no derive_*... I have a clock named 'DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/dq_clk[0]' so I tried converting

Code: [Select]
set_input_delay  -clock [get_clocks {*DDR3_PLL5*clk[2]}] -max -add_delay $tSU [get_ports {DDR3_DQ*[*]}]

... to ...
Code: [Select]
set_input_delay  -clock [get_clocks {*dq_clk[0]}] -max -add_delay 0.5 [get_ports {DDR3_DQ*[*]}]

but it rejects it with:
ERROR  (TA2003) : "ddr3-gowin.sdc":13 | Can't set timing constraint to object

As for SSRAM16s, according to the synthesis log, there are 3 places where it happens:
  • Extracting RAM for identifier 'PIN_OE_WDQ'("ddr3_io_port_gowin.sv":184)
  • Extracting RAM for identifier 'bank_row_mem'("BrianHG_DDR3_CMD_SEQUENCER_v16.sv":117)
  • Extracting RAM for identifier 'vector_pipe_mem'("BrianHG_DDR3_CMD_SEQUENCER_v16.sv":180)

I don't see any obvious way in the GowinSynthesis User Guide to tell it *not* to infer a RAM - just ways to help it infer one. I tried marking the PIN_OE_WDQ logic as /*synthesis syn_keep=1 */, hoping that might be a hint that I don't want it changed, but it made no difference.

Got a meeting to attend, so can't take any more time right now, I'll have a look this evening...
« Last Edit: September 13, 2022, 09:58:16 pm by SpacedCowboy »
 

Offline SpacedCowboy

« Reply #288 on: September 13, 2022, 09:55:45 pm »
Gah. Forgot the attachments, and you can't 'edit' them in :)
 

Offline BrianHG (Topic starter)

« Reply #289 on: September 13, 2022, 10:16:33 pm »
Quote from: SpacedCowboy
> Code: [Select]
> set_input_delay  -clock [get_clocks {*DDR3_PLL5*clk[2]}] -max -add_delay $tSU [get_ports {DDR3_DQ*[*]}]
>
> ... to ...
> Code: [Select]
> set_input_delay  -clock [get_clocks {*dq_clk[0]}] -max -add_delay 0.5 [get_ports {DDR3_DQ*[*]}]
>
> but it rejects it with:
> ERROR  (TA2003) : "ddr3-gowin.sdc":13 | Can't set timing constraint to object


The '-clock [get_clocks {*dq_clk[0]}]' needs to be a source clock name, not a net name.

What you want is something along the line:
Code: [Select]
set_input_delay  -clock [get_clocks {*ddr3_pll1/CLKOUTP*}] -max -add_delay $tSU [get_ports {DDR3_DQ*[*]}]

If Gowin won't allow the wild card for the source clock, then you will need to spell it all out.  Even in Quartus, this is a hassle as well.  Derive clocks is an Altera thing, you may need to see how Gowin labels their PLL clocks.  However, variables should be supported.  Otherwise, defining a group of IO to the same figure would be an absolute hassle when you need to change a specification globally.  Doesn't Gowin provide example DDR .sdc files to look at?

Can I see the slack report for the failed 1/2 freq ddr3_pll1/CLKOUTD clock as well?  We are close, and if you are using a -7 Gowin, 324MHz may be close to the max; maybe 351MHz is achievable.  You can try compiling for a -9 to see what happens.

Another help may be to set a bidirectional 'falsepath' between '*ddr3_pll1/CLKOUTP*' and '*ddr3_pll2/CLKOUTP*'.  Remember we have a clock switch between these 2 on the DQ DDRIO, however, they actually never talk to each other.  The falsepath should help timing as the compiler will no longer connect the 2 clocks when optimizing the design, but for now, the '*ddr3_pll1/CLKOUTD*' is your bottleneck.
« Last Edit: September 13, 2022, 10:21:22 pm by BrianHG »
 

Offline SpacedCowboy

« Reply #290 on: September 14, 2022, 01:00:26 am »
Ok, so I have some progress here. I sort of muddled my way through converting over your constraints file...

Code: [Select]
#**************************************************************
# Input clock from the board
#**************************************************************
create_clock -name clk -period 37.037 -waveform {0 18.518} [get_ports {clk}]

#**************************************************************
# Create Generated Clocks
#**************************************************************
create_generated_clock -name clk_ddr3 -source [get_ports {clk}] -master_clock clk -divide_by 2 -multiply_by 22 -duty_cycle 50 [get_pins {BHG_DDR3_PLL/gowin_ddr_clocks/ddr3_pll1/CLKOUT}]
create_generated_clock -name clk_ddr3_rd -source [get_ports {clk}] -master_clock clk -divide_by 2 -multiply_by 22 -duty_cycle 50 -phase 90 [get_pins {BHG_DDR3_PLL/gowin_ddr_clocks/ddr3_pll1/CLKOUTP}]
create_generated_clock -name clk_ddr3_50 -source [get_ports {clk}] -master_clock clk -divide_by 4 -multiply_by 22 -duty_cycle 50 [get_pins {BHG_DDR3_PLL/gowin_ddr_clocks/ddr3_pll1/CLKOUTD}]

create_generated_clock -name clk_ddr3_wr -source [get_ports {clk}] -master_clock clk -divide_by 2 -multiply_by 22 -duty_cycle 50 -phase 270 [get_pins {BHG_DDR3_PLL/gowin_ddr_clocks/ddr3_pll2/CLKOUTP}]
#create_generated_clock -name clk_ddr3_25 -source [get_ports {clk}] -master_clock clk -divide_by 8 -multiply_by 22 -duty_cycle 50 [get_pins {BHG_DDR3_PLL/gowin_ddr_clocks/ddr3_pll2/CLKOUTD}]

#**************************************************************
# Set Input Delay
#**************************************************************

# tSU = 0.5
set_input_delay -clock clk_ddr3_rd -max -add_delay             0.500 [get_ports {ddr_dq*[*]}]
set_input_delay -clock clk_ddr3_rd -max -add_delay -clock_fall 0.500 [get_ports {ddr_dq*[*]}]

set_input_delay -clock clk_ddr3_50 -max 0.5  [get_ports {uart_rxd}]

# tH  = 2.0
set_input_delay -clock clk_ddr3_rd -min -add_delay             2.000 [get_ports {ddr_dq*[*]}]
set_input_delay -clock clk_ddr3_rd -min -add_delay -clock_fall 2.000 [get_ports {ddr_dq*[*]}]

set_input_delay -clock clk_ddr3_50 -min 2.000  [get_ports {uart_rxd}]


#**************************************************************
# Set Output Delay
#**************************************************************

# tCO = -7.5 (?)
set_output_delay -clock clk_ddr3 -max -add_delay             -7.5 [get_ports {ddr*}]
set_output_delay -clock clk_ddr3 -max -add_delay -clock_fall -7.5 [get_ports {ddr*}]

set_output_delay -clock clk_ddr3_50 -max -7.5 [get_ports {led[*]}]
set_output_delay -clock clk_ddr3_50 -max -7.5 [get_ports {uart_txd}]

# tCOm = -3.8 (?)
set_output_delay -clock clk_ddr3 -min -add_delay             -3.8 [get_ports {ddr*}]
set_output_delay -clock clk_ddr3 -min -add_delay -clock_fall -3.8 [get_ports {ddr*}]

set_output_delay -clock clk_ddr3_50 -min -3.8 [get_ports {led[*]}]
set_output_delay -clock clk_ddr3_50 -min -3.8 [get_ports {uart_txd}]


#**************************************************************
# Set False Path
#**************************************************************
set_false_path -from [get_clocks {clk_ddr3_rd}] -to  [get_clocks {clk_ddr3_wr}]
set_false_path -from [get_clocks {clk_ddr3_wr}] -to  [get_clocks {clk_ddr3_rd}]

#**************************************************************
# Report more timing errors (default is 25)
#**************************************************************
report_timing -setup -max_paths 100 -max_common_paths 1

Most of these numbers are simply transcribed from your .sdc. I took a guess at 90 degrees for the read-clock phase; the test bench seems to take 3 or 4 passes in the calibration phase before it locks.  The clock names seem to have to be those defined by 'create_clock' or 'create_generated_clock'; just putting in the expression you use (abbreviated or not) to create the clock doesn't work, it seems to want an actual clock name.

With this, and after setting the "try a bit harder" option in the project configuration, I'm reliably seeing fMax as below, so it's hitting the target (just :)) and it looks like I could possibly squeeze 300 MHz out of it, but I am getting setup and hold violations still, as below. Not sure if those are because my numbers are bogus or what...

FWIW, if I claim to have a C9/I8 part, then it still won't push much higher - I tried with a clock multiplier of 26 in both top.sv and the .sdc (for a 350/175/87.5 clock setup) and the 50% clock only managed 157 MHz (not 175). That's not much over the C8/I7 part's 150-and-change.

I just figured out how to get the timing report based on the clock, so I've attached that as well :)
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7852
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #291 on: September 14, 2022, 01:16:37 am »
Read clock must be 0 degrees, otherwise massive optimization can never be done properly.  You will be stuck with many hold or setup violations.

My code had a 4 clock window between the RD clock and the main DDR clock.  It had the cycle timing designed to center the read latch in the middle +/-1 clock bypassing any metastability issues.

However, you must specify the correct write clock phase.

Also, don't forget setting the false path between the read and write clock domains.
Also, setting false paths between your 27MHz clock in and the clk_25 and clk_50 domains will help improve FMAX.
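
A minimal sketch of what those extra false paths might look like in the .sdc, written in both directions in the same way as the read/write pair. The clock names 'clk' (the 27MHz input) and 'clk_ddr3_25' are assumptions for illustration; only 'clk_ddr3_50' appears in the constraints posted above, so substitute whatever names your own create_clock / create_generated_clock lines define.

```tcl
# Assumed names: 'clk' = 27MHz input clock, 'clk_ddr3_25' = quarter-rate domain.
# 'clk_ddr3_50' is taken from the set_output_delay lines posted above.
set_false_path -from [get_clocks {clk}]         -to [get_clocks {clk_ddr3_25}]
set_false_path -from [get_clocks {clk_ddr3_25}] -to [get_clocks {clk}]
set_false_path -from [get_clocks {clk}]         -to [get_clocks {clk_ddr3_50}]
set_false_path -from [get_clocks {clk_ddr3_50}] -to [get_clocks {clk}]
```

A false path only removes the crossing from timing analysis, so this is safe only where the signals crossing between those domains are already synchronized or are static.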

« Last Edit: September 14, 2022, 01:19:16 am by BrianHG »
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #292 on: September 14, 2022, 03:09:09 am »
Read clock must be 0 degrees, otherwise massive optimization can never be done properly.  You will be stuck with many hold or setup violations.

My code had a 4 clock window between the RD clock and the main DDR clock.  It had the cycle timing designed to center the read latch in the middle +/-1 clock bypassing any metastability issues.

However, you must specify the correct write clock phase.

Gotcha. Changed.

Also, don't forget setting the false path between the read and write clock domains.
Also, setting false paths between your 27MHz clock in and the clk_25 and clk_50 domains will help improve FMAX.

Yep, I'd done the first of those - it was towards the bottom of the settings above though, so not easily visible :) Added both directions for clk<->ddr3_clk_25 and clk<->ddr3_clk_50 as well. Together with the phase=0 on the read clock, that was enough to boost the timings to the attached.

The vast majority of the setup violations are on the read-clock, and I think what you're saying is that the clock-centering will mean that those violations aren't really going to matter. What about the write ones ?
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7852
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #293 on: September 14, 2022, 03:20:59 am »
The report you are showing me lists violations between the write clock and the IO buffer.
This is not likely due to the 270deg phase, but to all the set_output_delays in the .sdc file.

I have a specific tweak in those values which allows Altera's Quartus the best grace when fitting the design based on the IO cells of their FPGAs.

Test step 1, comment out all the set_output_delays and see what happens.

Step 2, determine the best tsu and hold values for Gowin's DDR outputs.  Since in my design, everything about the DDR3 is source generated from the FPGA, our goal is the most comfortable tsu and hold which will lie on the buffer's natural timing.  Since I have a multi-level pipe to each DDR buffer, we should be able to set a nice tight figure allowing 0 slack violations, which also allows for a better FMAX in the report as well.
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #294 on: September 14, 2022, 03:42:35 am »
So what happens after commenting out all the set_output_delay lines appears to be "absolutely nothing". The timings table for the clk_ddr3_wr clock is identical from top to bottom with or without the set_output_delay. I cleared out the previous-run data first, just to make sure it wasn't stale.

Changing the phase of the clock (just to test it) did make some small difference - the worst path slack went from -2.496 to -2.6...

[edit: however, all the hold violations disappeared when I took out the set_output_delay statements]
« Last Edit: September 14, 2022, 03:44:42 am by SpacedCowboy »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7852
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #295 on: September 14, 2022, 03:49:27 am »

[edit: however, all the hold violations disappeared when I took out the set_output_delay statements]

Ok, are you saying the design compiled with 0 violations?

If so, then we just need to tune the tsu settings.
Try lowering the tsu on every set_output_delay by 2ns, then recompile and check the slack on the violations.
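
A rough sketch of that experiment, assuming "tsu" here maps to the -max (setup-side) value of set_output_delay, with -min being the hold side. The -max figures below are hypothetical placeholders, not the values from the actual .sdc; the -min lines are copied from the constraints posted earlier.

```tcl
# Hypothetical example: suppose the original setup-side line was
#   set_output_delay -clock clk_ddr3 -max -add_delay 2.0 [get_ports {ddr*}]
# Lowering it by 2ns gives:
set_output_delay -clock clk_ddr3 -max -add_delay             0.0 [get_ports {ddr*}]
set_output_delay -clock clk_ddr3 -max -add_delay -clock_fall 0.0 [get_ports {ddr*}]

# Hold side left unchanged for this test (values from the posted .sdc).
set_output_delay -clock clk_ddr3 -min -add_delay             -3.8 [get_ports {ddr*}]
set_output_delay -clock clk_ddr3 -min -add_delay -clock_fall -3.8 [get_ports {ddr*}]
```

Changing only the -max side in one pass keeps the comparison clean: any movement in the setup slack can then be attributed to that one change.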
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #296 on: September 14, 2022, 03:55:52 am »
Not quite [grin]

I still have all the setup violations - and they're the same figures as above in the image posted. I just have no hold violations any more, now there's no "set_output_delay" in the .sdc file.

Here's the 1st-path info:

Code: [Select]
Report Command:report_timing -setup -from_clock [get_clocks {clk_ddr3_wr}]

Path1

Path Summary:

Slack -2.496
Data Arrival Time 83.077
Data Required Time 80.582
From DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/PIN_WDATA_PIPE_l[0]_6_s0
To DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/gowin_DQ_bus[14].gowin_dq_oddr_inst
Launch Clk clk_ddr3_wr:[R]
Latch Clk DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/dq_clk[0]:[R]
Data Arrival Path:

AT DELAY TYPE RF FANOUT LOC NODE
79.966 79.966 active clock edge time
79.966 0.000 clk_ddr3_wr
79.966 0.000 tCL RR 86 PLL_R[0] BHG_DDR3_PLL/gowin_ddr_clocks/ddr3_pll2/CLKOUTP
81.240 1.274 tNET RR 1 R36C30[0][A] DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/PIN_WDATA_PIPE_l[0]_6_s0/CLK
81.472 0.232 tC2Q RF 2 R36C30[0][A] DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/PIN_WDATA_PIPE_l[0]_6_s0/Q
83.077 1.605 tNET FF 1 IOB44[A] DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/gowin_DQ_bus[14].gowin_dq_oddr_inst/D1
Data Required Path:

AT DELAY TYPE RF FANOUT LOC NODE
80.000 80.000 active clock edge time
80.000 0.000 DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/dq_clk[0]
80.000 0.000 tCL RR 32 R40C27[2][A] DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/dq_clk_0_s0/F
80.782 0.782 tNET RR 1 IOB44[A] DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/gowin_DQ_bus[14].gowin_dq_oddr_inst/CLK
80.747 -0.035 tUnc DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/gowin_DQ_bus[14].gowin_dq_oddr_inst
80.582 -0.165 tSu 1 IOB44[A] DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/gowin_DQ_bus[14].gowin_dq_oddr_inst
Path Statistics:

Clock Skew -0.492
Setup Relationship 0.034
Logic Level 1
Arrival Clock Path Delay cell: 0.000, 0.000%; route: 1.274, 100.000%
Arrival Data Path Delay cell: 0.000, 0.000%; route: 1.605, 87.373%; tC2Q: 0.232, 12.627%
Required Clock Path Delay cell: 0.000, 0.000%; route: 0.782, 100.000%
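
As a cross-check on how the report arrives at its slack, the figures above can be added up by hand. This is arithmetic only, meant for a plain tclsh rather than the .sdc, and every number is copied from the Path1 rows above.

```tcl
# Data arrival: launch edge + clock net + clock-to-Q + data net
set arrival  [expr {79.966 + 1.274 + 0.232 + 1.605}]
# Data required: latch edge + clock net - uncertainty - setup time
set required [expr {80.000 + 0.782 - 0.035 - 0.165}]

puts [format "arrival  = %.3f ns" $arrival]
puts [format "required = %.3f ns" $required]
puts [format "slack    = %.3f ns" [expr {$required - $arrival}]]
puts [format "setup relationship = %.3f ns" [expr {80.000 - 79.966}]]
puts [format "clock skew         = %.3f ns" [expr {0.782 - 1.274}]]
```

The sums reproduce the reported arrival (83.077) and required (80.582) times, and the slack comes out within a picosecond of the reported -2.496, presumably a rounding difference in the displayed terms. Note that the launch clock (clk_ddr3_wr) and the latch clock (dq_clk[0]) are listed as different clocks, and the setup relationship between their edges is only 0.034 ns, so the 1.605 ns data route alone cannot fit in that window.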
« Last Edit: September 14, 2022, 03:58:48 am by SpacedCowboy »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7852
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #297 on: September 14, 2022, 04:04:01 am »
Can you test a 90degree WDQ clock?
You need to change your PLL (I forget if it is auto...), the parameter 'DDR3_WDQ_PHASE' at the top of your top hierarchy, and your .sdc file.
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7852
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #298 on: September 14, 2022, 04:17:30 am »
Also find out the output buffer's delay element range and test a write data clock of 0 deg.  If 0 deg really helps and the programmable delay element can be made large enough to shift the write data by 90 deg, then we might have to go that route.
« Last Edit: September 14, 2022, 04:19:59 am by BrianHG »
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #299 on: September 14, 2022, 04:33:41 am »
Yup, will do.
 

