EEVblog Electronics Community Forum

Electronics => FPGA => Topic started by: promach on May 26, 2021, 03:57:45 am

Title: DDR3 initialization sequence issue
Post by: promach on May 26, 2021, 03:57:45 am
Could anyone point out if there are issues with this DDR3 initialization sequence (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L883-L1092) ?

Note: The DDR3 RAM memory controller is not working yet so far.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 26, 2021, 05:03:35 am
Simulate your code driving Micron's DDR3 Verilog model.  It will report every success and violation for every DDR3 command you send.

A successful powerup should look something like this:

Code: [Select]
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.file_io_open: at time                    0 WARNING: no +model_data option specified, using /tmp.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.open_bank_file.0.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.open_bank_file.1.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.open_bank_file.2.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.open_bank_file.3.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.open_bank_file.4.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.open_bank_file.5.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.open_bank_file.6.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.open_bank_file.7.
# ** Error (suppressible): (vsim-8630) Infinity results from division operation.
#    Time: 0 ps  Iteration: 0  Process: /BrianHG_DDR3_PHY_SEQ_tb/sdramddr3_0/#ASSIGN#542 File: ddr3.v Line: 542
# ** Error (suppressible): (vsim-8630) Infinity results from division operation.
#    Time: 0 ps  Iteration: 0  Process: /BrianHG_DDR3_PHY_SEQ_tb/sdramddr3_0/#ASSIGN#543 File: ddr3.v Line: 543
# ** Error (suppressible): (vsim-8630) Infinity results from division operation.
#    Time: 0 ps  Iteration: 0  Process: /BrianHG_DDR3_PHY_SEQ_tb/sdramddr3_0/#ASSIGN#544 File: ddr3.v Line: 544
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001913000.0 ps INFO: Load Mode 2
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001913000.0 ps INFO: Load Mode 2 Partial Array Self Refresh = Bank 0-7
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001913000.0 ps INFO: Load Mode 2 CAS Write Latency =           6
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001913000.0 ps INFO: Load Mode 2 Auto Self Refresh = Disabled
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001913000.0 ps INFO: Load Mode 2 Self Refresh Temperature = Normal
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001913000.0 ps INFO: Load Mode 2 Dynamic ODT = Disabled
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001921000.0 ps INFO: Load Mode 3
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001921000.0 ps INFO: Load Mode 3 MultiPurpose Register Select = Pre-defined pattern
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001921000.0 ps INFO: Load Mode 3 MultiPurpose Register Enable = Disabled
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001929000.0 ps INFO: Load Mode 1
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001929000.0 ps INFO: Load Mode 1 DLL Enable = Enabled
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001929000.0 ps INFO: Load Mode 1 Output Drive Strength =          40 Ohm
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001929000.0 ps INFO: Load Mode 1 ODT Rtt =          40 Ohm
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001929000.0 ps INFO: Load Mode 1 Additive Latency = 0
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001929000.0 ps INFO: Load Mode 1 Write Levelization = Disabled
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001929000.0 ps INFO: Load Mode 1 TDQS Enable = Disabled
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001929000.0 ps INFO: Load Mode 1 Qoff = Enabled
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001937000.0 ps INFO: Load Mode 0
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001937000.0 ps INFO: Load Mode 0 Burst Length =  8
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001937000.0 ps INFO: Load Mode 0 Burst Order = Sequential
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001937000.0 ps INFO: Load Mode 0 CAS Latency =           7
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001937000.0 ps INFO: Load Mode 0 DLL Reset = Reset DLL
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001937000.0 ps INFO: Load Mode 0 Write Recovery =           8
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001937000.0 ps INFO: Load Mode 0 Power Down Mode = DLL on
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001961000.0 ps INFO: ZQ        long = 1
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001961000.0 ps INFO: Initialization Sequence is complete
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1002993000.0 ps INFO: Load Mode 3
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1002993000.0 ps INFO: Load Mode 3 MultiPurpose Register Select = Pre-defined pattern
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1002993000.0 ps INFO: Load Mode 3 MultiPurpose Register Enable = Enabled
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1003017000.0 ps INFO: Read      bank 0 col 000, auto precharge 0
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003030000.0 ps READ @ DQS MultiPurpose Register 0, col = 0,  data = 0
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003031000.0 ps READ @ DQS MultiPurpose Register 0, col = 1,  data = 1
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003032000.0 ps READ @ DQS MultiPurpose Register 0, col = 2,  data = 0
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003033000.0 ps READ @ DQS MultiPurpose Register 0, col = 3,  data = 1
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003034000.0 ps READ @ DQS MultiPurpose Register 0, col = 4,  data = 0
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003035000.0 ps READ @ DQS MultiPurpose Register 0, col = 5,  data = 1
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003036000.0 ps READ @ DQS MultiPurpose Register 0, col = 6,  data = 0
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003037000.0 ps READ @ DQS MultiPurpose Register 0, col = 7,  data = 1
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1003141000.0 ps INFO: Load Mode 3
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1003141000.0 ps INFO: Load Mode 3 MultiPurpose Register Select = Pre-defined pattern
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1003141000.0 ps INFO: Load Mode 3 MultiPurpose Register Enable = Disabled
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1003253000.0 ps INFO: Activate  bank 7 row 0004
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1003267000.0 ps INFO: Write     bank 7 col 000, auto precharge 0
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1003275000.0 ps INFO: Write     bank 7 col 010, auto precharge 0
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 1003275000.0 ps INFO: Sync On Die Termination Rtt_NOM =         40 Ohm
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003280000.0 ps INFO: WRITE @ DQS= bank = 7 row = 0004 col = 00000000 data = eeff
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003281000.0 ps INFO: WRITE @ DQS= bank = 7 row = 0004 col = 00000001 data = ccdd
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003282000.0 ps INFO: WRITE @ DQS= bank = 7 row = 0004 col = 00000002 data = aabb
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003283000.0 ps INFO: WRITE @ DQS= bank = 7 row = 0004 col = 00000003 data = 8899
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003284000.0 ps INFO: WRITE @ DQS= bank = 7 row = 0004 col = 00000004 data = 6677
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003285000.0 ps INFO: WRITE @ DQS= bank = 7 row = 0004 col = 00000005 data = 4455
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003286000.0 ps INFO: WRITE @ DQS= bank = 7 row = 0004 col = 00000006 data = 2233
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003287000.0 ps INFO: WRITE @ DQS= bank = 7 row = 0004 col = 00000007 data = 0011
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003288000.0 ps INFO: WRITE @ DQS= bank = 7 row = 0004 col = 00000010 data = eeff
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003289000.0 ps INFO: WRITE @ DQS= bank = 7 row = 0004 col = 00000011 data = ccdd
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003290000.0 ps INFO: WRITE @ DQS= bank = 7 row = 0004 col = 00000012 data = aabb
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003291000.0 ps INFO: WRITE @ DQS= bank = 7 row = 0004 col = 00000013 data = 8899
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003292000.0 ps INFO: WRITE @ DQS= bank = 7 row = 0004 col = 00000014 data = 6677
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003293000.0 ps INFO: WRITE @ DQS= bank = 7 row = 0004 col = 00000015 data = 4455
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003294000.0 ps INFO: WRITE @ DQS= bank = 7 row = 0004 col = 00000016 data = 2233
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003295000.0 ps INFO: WRITE @ DQS= bank = 7 row = 0004 col = 00000017 data = 0011
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 1003297000.0 ps INFO: Sync On Die Termination Rtt_NOM =          0 Ohm
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1003303000.0 ps INFO: Read      bank 7 col 000, auto precharge 0
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1003311000.0 ps INFO: Read      bank 7 col 010, auto precharge 0
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 1003316000.0 ps INFO: READ @ DQS= bank = 7 row = 0004 col = 00000000 data = eeff
.....


The warnings and divide-by-zero errors at the top appear only because I have not supplied any memory preset data files, i.e. the model powers up with the RAM initialized to 'hxxxx instead of preloaded data.

You need to get to line:
Code: [Select]
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 1001961000.0 ps INFO: Initialization Sequence is complete
Without any errors.  Also, all reads, writes, activates, precharges and refreshes should never produce an error; otherwise your DDR3 code has timing bugs, or you have entered the wrong tCK figures from the data sheet.  The writes actually place data in the simulated DDR3 RAM model, and the reads should return the data you have written.

Title: Re: DDR3 initialization sequence issue
Post by: promach on May 26, 2021, 08:55:44 am
However, my current Verilog code does not yet support DLL-on mode, which the Micron simulation model requires.

Is there an alternative method, or do I have to use the Micron simulation model (https://www.micron.com/products/dram/ddr3-sdram/part-catalog/mt41j128m16jt-125)?

Besides, the Micron simulation model is for HSpice software, which I do not have access to.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 26, 2021, 09:19:38 am
For the sim, just turn on the DLL.  Other than requiring an in-spec clock (minimum 303MHz), the rest of the functionality is identical.
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 26, 2021, 09:29:54 am
I do not have access to HSpice software
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 26, 2021, 10:17:45 am
HSpice?  I'm using ModelSim.  ModelSim comes free with every FPGA vendor's EDA tools.  The model should also work directly within Xilinx's tools, and with Active-HDL.

See attachments...

DDR3 Init...
(https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/?action=dlattach;attach=1222291)

DDR3 2x consecutive Write BL8...
(https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/?action=dlattach;attach=1222293)

DDR3 3x consecutive Read BL8 to a 1x Write BL8...
(https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/?action=dlattach;attach=1222295)
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 26, 2021, 10:29:40 am
Do you mean that the DDR3 memory controller code does not need to adhere to the minimum clock spec whenever DLL mode is turned OFF ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 26, 2021, 10:54:24 am
When the DLL is on and your clock is below the 303MHz minimum, the DDR3 model will spit out a tCK(avg) violation on every clock cycle during the simulation:

Code: [Select]
BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 3961667.0 ps INFO: Load Mode 0
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 3961667.0 ps INFO: Load Mode 0 Burst Length =  8
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 3961667.0 ps INFO: Load Mode 0 Burst Order = Sequential
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 3961667.0 ps INFO: Load Mode 0 CAS Latency =           5
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 3961667.0 ps INFO: Load Mode 0 DLL Reset = Reset DLL
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 3961667.0 ps INFO: Load Mode 0 Write Recovery =           5
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 3961667.0 ps INFO: Load Mode 0 Power Down Mode = DLL on
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 4001667.0 ps INFO: ZQ        long = 1
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 4001667.0 ps INFO: Initialization Sequence is complete
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5668333.0 ps ERROR: tCK(avg) maximum violation by 33.332031 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5671667.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5675000.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5678333.0 ps ERROR: tCK(avg) maximum violation by 33.332031 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5681667.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5685000.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5688333.0 ps ERROR: tCK(avg) maximum violation by 33.332031 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5691667.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5695000.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5698333.0 ps ERROR: tCK(avg) maximum violation by 33.332031 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5701667.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5705000.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5708333.0 ps ERROR: tCK(avg) maximum violation by 33.332031 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5711667.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5715000.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5718333.0 ps ERROR: tCK(avg) maximum violation by 33.332031 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 5721667.0 ps INFO: Load Mode 3
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 5721667.0 ps INFO: Load Mode 3 MultiPurpose Register Select = Pre-defined pattern
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 5721667.0 ps INFO: Load Mode 3 MultiPurpose Register Enable = Enabled
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5721667.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5725000.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5728333.0 ps ERROR: tCK(avg) maximum violation by 33.332031 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5731667.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5735000.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5738333.0 ps ERROR: tCK(avg) maximum violation by 33.332031 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5741667.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5745000.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5748333.0 ps ERROR: tCK(avg) maximum violation by 33.332031 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5751667.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5755000.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5758333.0 ps ERROR: tCK(avg) maximum violation by 33.332031 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task: at time 5761667.0 ps INFO: Read      bank 0 col 000, auto precharge 0
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5761667.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5765000.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5768333.0 ps ERROR: tCK(avg) maximum violation by 33.332031 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5771667.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5775000.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 5776667.0 ps READ @ DQS MultiPurpose Register 0, col = 0,  data = 0
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 5778333.0 ps READ @ DQS MultiPurpose Register 0, col = 1,  data = 1
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5778333.0 ps ERROR: tCK(avg) maximum violation by 33.332031 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 5780000.0 ps READ @ DQS MultiPurpose Register 0, col = 2,  data = 0
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 5781667.0 ps READ @ DQS MultiPurpose Register 0, col = 3,  data = 1
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5781667.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 5783333.0 ps READ @ DQS MultiPurpose Register 0, col = 4,  data = 0
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 5785000.0 ps READ @ DQS MultiPurpose Register 0, col = 5,  data = 1
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5785000.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 5786667.0 ps READ @ DQS MultiPurpose Register 0, col = 6,  data = 0
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.data_task: at time 5788333.0 ps READ @ DQS MultiPurpose Register 0, col = 7,  data = 1
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5788333.0 ps ERROR: tCK(avg) maximum violation by 33.332031 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5791667.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5795000.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.
......


I set my clock to 300MHz in this example.  The model checks every timing spec in the data sheet and reports every mistake you make.


The one exception to the rule: if you cut your power-up reset time short, you get these warnings instead of errors:

Code: [Select]
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.reset at time 2540000.0 ps WARNING: 200 us is required before RST_N goes inactive.
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.cmd_task at time 3641000.0 ps WARNING: 500 us is required after RST_N goes inactive before CKE goes active.


This lets you skip the power-up timers for a quick simulation; otherwise, simulating 700us takes around 1 minute every time you restart your simulation.  In other words, the model allows this so that you can do a quick 1 second recompile/sim during development and debugging.
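
As a rough illustration of that trade-off (not taken from my testbench or from Micron's files), the two power-up delays can be made switchable. The module name, the QUICK_SIM parameter and the shortened values are all assumptions; only the 200 us and 500 us figures come from the warnings above. In a real design these delays would live inside the controller's own power-up counters rather than in the testbench.

```verilog
`timescale 1 ps / 1 ps

// Hypothetical fragment: drives RST_N and CKE with either the full
// power-up delays or shortened ones for quick debug runs.
module powerup_delay_sketch;

    // Assumption: 1 = shortened delays (the model only warns), 0 = full delays.
    parameter QUICK_SIM = 1;

    // Full delays from the model's warning text, in ps (timescale is 1 ps).
    localparam time T_RST_FULL_PS = 64'd200_000_000; // 200 us before RST_N goes inactive
    localparam time T_CKE_FULL_PS = 64'd500_000_000; // 500 us from RST_N inactive to CKE active

    // Arbitrary short values used when QUICK_SIM is set.
    localparam time T_RST_PS = QUICK_SIM ? 64'd2_000_000 : T_RST_FULL_PS;
    localparam time T_CKE_PS = QUICK_SIM ? 64'd1_000_000 : T_CKE_FULL_PS;

    reg rst_n = 1'b0;
    reg cke   = 1'b0;

    initial begin
        #(T_RST_PS) rst_n = 1'b1;  // release reset
        #(T_CKE_PS) cke   = 1'b1;  // then raise CKE
        $display("RST_N released after %0d ps, CKE raised %0d ps later", T_RST_PS, T_CKE_PS);
        $finish;
    end

endmodule
```

Setting QUICK_SIM to 0 for an occasional full-length run checks that the real timers are still correct.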
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 26, 2021, 11:06:15 am
Note that even with the violation, the DDR3 model still works; you just have to read through all those damn violations.  Because the model code is plain ASCII Verilog, you may be able to change the tCK max figure in it to get rid of those errors if you want to use slower clocks.

Look for where the min and max tCK(avg) are stored in the definitions file, and change the max there so the sim runs without errors at a slower speed.  That file holds look-up tables with every min and max timing figure from the data sheets, for every speed grade of every DDR3 available.  This way you are not modifying the bulk of the model code.

These files:
1024Mb_ddr3_parameters.vh
2048Mb_ddr3_parameters.vh
4096Mb_ddr3_parameters.vh
8192Mb_ddr3_parameters.vh

Choose the one file that matches the memory size you are simulating, and change this line:

    parameter TCK_MAX          =    3300; // tCK        ps    Maximum Clock Cycle Time

Make it 4000, then you can run the ram down at 250MHz without all those printed violations.
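
To see where these numbers come from, here is a minimal sketch of the arithmetic. It is not part of the Micron model; the module, function and task names are made up for illustration, and the comparison is a simplification of whatever check the model really performs. Only the 3300 ps and 4000 ps TCK_MAX values and the 300MHz and 250MHz clocks come from the posts above.

```verilog
`timescale 1 ps / 1 ps

// Hypothetical stand-alone module: prints how far a clock period exceeds TCK_MAX.
module tck_max_sketch;

    // Convert a clock frequency in MHz to a period in picoseconds.
    function real period_ps(input real f_mhz);
        period_ps = 1.0e6 / f_mhz;
    endfunction

    // Report whether a given clock period is longer than the allowed maximum.
    task report(input real f_mhz, input real tck_max_ps);
        real excess;
        begin
            excess = period_ps(f_mhz) - tck_max_ps;
            if (excess > 0.0)
                $display("%0.1f MHz: tCK = %0.2f ps exceeds TCK_MAX = %0.0f ps by %0.2f ps",
                         f_mhz, period_ps(f_mhz), tck_max_ps, excess);
            else
                $display("%0.1f MHz: tCK = %0.2f ps is within TCK_MAX = %0.0f ps",
                         f_mhz, period_ps(f_mhz), tck_max_ps);
        end
    endtask

    initial begin
        report(300.0, 3300.0); // stock TCK_MAX with the 300MHz clock from the log
        report(250.0, 4000.0); // edited TCK_MAX with a 250MHz clock
        $finish;
    end

endmodule
```

A 300MHz clock has a period of about 3333.33 ps, which is about 33.33 ps over 3300 ps and matches the violation size in the log, while 3300 ps itself corresponds to roughly 303MHz.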
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 26, 2021, 11:26:40 am
Could the 2GB DDR3 work with a 12.5MHz 'ck' signal in a pure Verilog simulation?

Besides, can I turn DLL mode OFF for the entire DDR3 model simulation ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 26, 2021, 11:40:48 am
The slowest clock I got working fine is 200MHz.  However, limitations elsewhere in my own code may be preventing the sim from fully functioning: when I tried 50MHz, there was no error from the DDR3 Model, but, it could not pass the read-calibration at power-up and got stuck trying to tune the PLL read phase.

Title: Re: DDR3 initialization sequence issue
Post by: promach on May 26, 2021, 11:43:14 am
Can I turn DLL mode OFF for the entire DDR3 model simulation?

What exactly do you mean by "there was no error from the DDR3 Model, but, it could not pass the read-calibration at power-up and got stuck trying to tune the PLL read phase."?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 26, 2021, 11:48:13 am
I mean that I did not see the hundreds of:
Code: [Select]
# BrianHG_DDR3_PHY_SEQ_tb.sdramddr3_0.main: at time 5671667.0 ps ERROR: tCK(avg) maximum violation by 33.333984 ps.

when I fed a 200MHz clock or a 50MHz clock.

DLL off does nothing unless you are entering low power sleep mode where you are allowed slow clocks to maintain a refresh.  The DDR3 IO is dead in this circumstance.

If what you are trying to do is operate the DDR3 with the DLL in disable mode, that is something different.  I cannot get the DDR3 Model to initialize with the DLL in disable mode.  You will need to experiment on your own here.

Power-down DLL Off/On and DLL Disable mode are 2 different things.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 26, 2021, 12:04:54 pm
Power-down DLL Off/On and DLL Enable mode are 2 different things.
Ooops...

It's in MRS1...
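
For reference, a small sketch of the two separate controls. The bit positions are as I read the JEDEC DDR3 mode register tables, so treat them as an assumption and check them against your device's data sheet. The module name, constant names and the 14-bit address width are hypothetical, and every other mode register field is left at zero here, which is not a usable configuration on its own.

```verilog
`timescale 1 ps / 1 ps

// Hypothetical constants separating "DLL enable" (MR1) from
// "precharge power-down DLL on/off" (MR0).
module ddr3_dll_bits_sketch;

    localparam [2:0]   BA_MR0              = 3'b000; // bank address selecting MR0
    localparam [2:0]   BA_MR1              = 3'b001; // bank address selecting MR1
    localparam integer MR1_DLL_DISABLE_BIT = 0;      // MR1 A0: 0 = DLL enabled, 1 = DLL disabled
    localparam integer MR0_DLL_RESET_BIT   = 8;      // MR0 A8: 1 = reset DLL
    localparam integer MR0_PPD_BIT         = 12;     // MR0 A12: power-down, 0 = DLL off, 1 = DLL on

    reg [13:0] mr0;
    reg [13:0] mr1;

    initial begin
        // Same DLL settings as the successful power-up log: DLL enabled,
        // DLL reset issued, power-down mode "DLL on".
        mr1 = 14'd0;
        mr1[MR1_DLL_DISABLE_BIT] = 1'b0;
        mr0 = 14'd0;
        mr0[MR0_DLL_RESET_BIT] = 1'b1;
        mr0[MR0_PPD_BIT]       = 1'b1;
        $display("BA = %b, MR1 = %b", BA_MR1, mr1);
        $display("BA = %b, MR0 = %b", BA_MR0, mr0);
        $finish;
    end

endmodule
```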
Title: Re: DDR3 initialization sequence issue
Post by: asmi on May 26, 2021, 01:54:34 pm
Running DDR3 with DLL off is a bad idea. Here is a telling quote from Micron's datasheet:
Quote
The DRAM is not tested to check—nor does Micron warrant compliance with—normal mode timings or functionality when the DLL is disabled. An attempt has been made to have the DRAM operate in the normal mode where reasonably possible when the DLL has been disabled; however, by industry standard, a few known exceptions are defined:
• ODT is not allowed to be used
• The output data is no longer edge-aligned to the clock
• CL and CWL can only be six clocks
When the DLL is disabled, timing and functionality can vary from the normal operation specifications when the DLL is enabled (see DLL Disable Mode (page 123)). Disabling the DLL also implies the need to change the clock frequency (see Input Clock Frequency Change (page 127)).
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 26, 2021, 01:58:42 pm
@asmi :

Turning on the DLL means I need to adhere to the minimum clock frequency spec (which is above 300MHz), which will require a PLL.
Please correct me if I am wrong.

What if I turn on the DLL, but use only a 12.5MHz frequency for the 'ck' signal?



@BrianHG :

I saw ddr3.v which I suppose is the Micron simulation model.
But do I really need tb.v?  I suppose my https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v already does the same thing?

And what is the purpose of subtest.vh ?

Code: [Select]
[phung@archlinux DDR3 SDRAM Verilog Model]$ ls -al
total 492
drwxr-xr-x 2 phung users   4096 May 26 21:39 .
drwxrwxr-x 9 phung  1000   4096 May 26 21:39 ..
-rw-r--r-- 1 phung users  54986 Aug 21  2015 1024Mb_ddr3_parameters.vh
-rw-r--r-- 1 phung users  54042 Aug 21  2015 2048Mb_ddr3_parameters.vh
-rw-r--r-- 1 phung users  54042 Aug 21  2015 4096Mb_ddr3_parameters.vh
-rw-r--r-- 1 phung users  38899 Aug 21  2015 8192Mb_ddr3_parameters.vh
-rw-r--r-- 1 phung users 165396 Sep 10  2015 ddr3.v
-rw-r--r-- 1 phung users  17369 Jun 20  2015 ddr3_dimm.v
-rw-r--r-- 1 phung users   4213 Jun 20  2015 ddr3_mcp.v
-rw-r--r-- 1 phung users  34845 Jun 20  2015 ddr3_module.v
-rw-r--r-- 1 phung users   9421 Jun 20  2015 readme.txt
-rw-rw-r-- 1 phung users  14463 Jun 20  2015 subtest.vh
-rw-r--r-- 1 phung users  20002 Jun 20  2015 tb.v
[phung@archlinux DDR3 SDRAM Verilog Model]$
Title: Re: DDR3 initialization sequence issue
Post by: asmi on May 26, 2021, 02:44:38 pm
Quote
@asmi :

Turning on the DLL means I need to adhere to the minimum clock frequency spec (which is above 300MHz), which will require a PLL.
Please correct me if I am wrong.

What if I turn on the DLL, but use only a 12.5MHz frequency for the 'ck' signal?

Of course. But the entire point of using DDR3 is bandwidth, so running it so grossly underclocked makes no sense. There are simpler memories you can use if you don't need high bandwidth, and they will have smaller access latencies too: "classic" SDRAM, LPSDR or LPDDR1, or even HyperRAM (which is much easier to use because it hides all the DRAM complexity inside, and appears almost as a regular synchronous SRAM as far as the controller is concerned). None of these memories requires a complex controller like DDR3 does, and they are much easier to design for. Micron provides sim models for all of these DRAMs, and Cypress provides a model for HyperRAM too.
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 26, 2021, 02:50:04 pm
12.5MHz is only for the early development stage.
Using a slower clock is easier for debugging work.

So, what will actually happen if I turn on the DLL, but use only a 12.5MHz frequency for the 'ck' signal?
Title: Re: DDR3 initialization sequence issue
Post by: asmi on May 26, 2021, 03:08:23 pm
Quote
12.5MHz is only for the early development stage.
Using a slower clock is easier for debugging work.

How is that any easier? You are supposed to be debugging in the sim, so frequency does not matter.

Quote
So, what will actually happen if I turn on the DLL, but use only a 12.5MHz frequency for the 'ck' signal?

I have no idea. But I do know that running devices outside their specifications is always a bad idea, because even if it works with one device, there is no guarantee that other devices will work as well.
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 26, 2021, 03:41:08 pm
OK, now I am using the Micron simulation model.

But why do I need tb.v when my own DDR controller is doing the same thing?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 26, 2021, 07:07:09 pm
Quote
OK, now I am using the Micron simulation model.

But why do I need tb.v when my own DDR controller is doing the same thing?

You make your own promach_ddr3_testbench.sv.

In that testbench file, you place your project's top DDR3 controller and wire its DDR3 IO pins to Micron's ddr3.v.  The idea is that your testbench should not be attached to your main FPGA project, so you can do compile development and tests outside the FPGA project; then, when you do a normal compile for the FPGA, you should expect your logic to function the same way on real hardware.


Micron's tb.v is only there as an example setup.  You do not need it.  For 1 RAM chip, you only need ddr3.v and the matching xxx_ddr3_parameters.vh file.  If your controller hooks up to DDR3 DIMM modules, then you need ddr3_dimm.v as the top memory model inside your testbench file, with the include file changed to match.  The same goes for multi-chip hookups or normal DDR3 modules.

I.e., for 1 DDR3 chip, at the top of your testbench code:
Code: [Select]
//************************************************************************************************************************************************************
//************************************************************************************************************************************************************
//************************************************************************************************************************************************************
//*** DDR3 Verilog model from Micron Required for this test-bench.
//*** The required DDR3 SDRAM Verilog Model V1.74 available at:
//*** https://media-www.micron.com/-/media/client/global/documents/products/sim-model/dram/ddr3/ddr3-sdram-verilog-model.zip?rev=925a8a05204e4b5c9c1364302de60126
//*** From the 'DDR3 SDRAM Verilog Model.zip', only these 2 files are required in the main simulation test-bench source folder:
//*** ddr3.v
//*** 4096Mb_ddr3_parameters.vh
//************************************************************************************************************************************************************
// Tell Micron's DDR3 Verilog model which ram chip we expect to have connected to the test bench.
//************************************************************************************************************************************************************
`define den4096Mb
`define sg093
`define x16
//************************************************************************************************************************************************************
//************************************************************************************************************************************************************
//************************************************************************************************************************************************************
`timescale 1 ps/ 1 ps // 1 picosecond steps, 1 picosecond precision.

After your testbench IOs are configured and you have instantiated your own DDR3 controller, add:
Code: [Select]
//************************************************************************************************************************************************************
//*** DDR3 Verilog model from Micron Required for this test-bench.
//************************************************************************************************************************************************************
`include "ddr3.v"
//************************************************************************************************************************************************************
//*** DDR3 Verilog model from Micron Required for this test-bench.
//*** The required DDR3 SDRAM Verilog Model V1.74 available at:
//*** https://media-www.micron.com/-/media/client/global/documents/products/sim-model/dram/ddr3/ddr3-sdram-verilog-model.zip?rev=925a8a05204e4b5c9c1364302de60126
//*** From the 'DDR3 SDRAM Verilog Model.zip', only these 2 files are required in the main simulation test-bench source folder:
//*** ddr3.v
//*** 4096Mb_ddr3_parameters.vh
//************************************************************************************************************************************************************
    // component instantiation
    ddr3 sdramddr3_0 (
        .rst_n      ( DDR3_RESET_n ),
        .ck         ( DDR3_CK_p[0] ),
        .ck_n       ( DDR3_CK_n[0] ),
        .cke        ( DDR3_CKE     ),
        .cs_n       ( DDR3_CS_n    ),
        .ras_n      ( DDR3_RAS_n   ),
        .cas_n      ( DDR3_CAS_n   ),
        .we_n       ( DDR3_WE_n    ),
        .dm_tdqs    ( DDR3_DM      ),
        .ba         ( DDR3_BA      ),
        .addr       ( DDR3_A       ),
        .dq         ( DDR3_DQ      ),
        .dqs        ( DDR3_DQS_p   ),
        .dqs_n      ( DDR3_DQS_n   ),
        .tdqs_n     (              ),
        .odt        ( DDR3_ODT     )
    );
//************************************************************************************************************************************************************
//************************************************************************************************************************************************************
//************************************************************************************************************************************************************

The remainder of your testbench code should generate your clock, the initial reset pulse, and whatever other signals your DDR3 controller requires to operate.
It is then the job of your DDR3 controller to drive Micron's ddr3.v correctly.  In the simulation console, Micron's ddr3.v prints every transaction your DDR3 controller makes as it happens.

This is the bare minimum.  In my testbench DDR3_PHY_SEQ_tb.sv, I wait for my top module DDR3_PHY_SEQ.sv to take Micron's ddr3.v through its power-up sequence.  After that, the testbench lets me send normal addresses and read or write commands with data to DDR3_PHY_SEQ.sv, which in turn drives Micron's ddr3.v as if it were a real external RAM chip on the IO pins.
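
As a rough illustration of the "clock and first reset pulse" part only, here is a minimal sketch.  The module name, signal names, the 50 MHz clock and the reset polarity are all assumptions made for the example; they are not taken from either project in this thread.
Code: [Select]
`timescale 1 ps / 1 ps   // all delays below are in picoseconds

module tb_minimal;   // hypothetical testbench name

    localparam integer HALF_PERIOD_PS = 10000;   // assumed 50 MHz reference clock

    reg clk   = 1'b0;
    reg rst_n = 1'b0;    // assumed active-low system reset into the controller

    // Free-running reference clock for the controller.
    always #(HALF_PERIOD_PS) clk = ~clk;

    initial begin
        #(100000);       // hold the system reset for 100 ns
        rst_n = 1'b1;
        // The DDR3 controller and Micron's ddr3 model would be instantiated in this
        // module; from here on the controller has to run the power-up sequence itself.
        #(1000000);      // run another 1 us, then stop so the waveform can be inspected
        $stop;
    end

endmodule

With a real controller attached, the final delay before $stop would need to cover the controller's whole power-up timer.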
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 27, 2021, 04:45:21 am
Why do I get the following timescale-related error for https://github.com/promach/DDR/blob/main/test_ddr3_memory_controller.v#L28 ?

(https://i.imgur.com/onAssYX.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 27, 2021, 05:31:23 am
This is my setup script 'setup_phy.do':

Code: [Select]
transcript on
if {[file exists work]} {
vdel -lib work -all
}
vlib work
vmap work work
vlog -sv -work work {altera_gpio_lite.sv}
vlog -sv -work work {BrianHG_DDR3_GEN_tCK.sv}
vlog -sv -work work {BrianHG_DDR3_PLL.sv}
vlog -sv -work work {BrianHG_DDR3_PLL_tb.sv}
vlog -sv -work work {BrianHG_DDR3_IO_PORT_ALTERA.sv}
vlog -sv -work work {BrianHG_DDR3_PHY_SEQ.sv}
vlog -sv -work work {BrianHG_DDR3_PHY_SEQ_tb.sv}

#Enable this line for MAX 10 parts.
vsim -t 1ps -L altera_ver -L lpm_ver -L sgate_ver -L altera_mf_ver -L altera_lnsim_ver -L fiftyfivenm_ver -L work -voptargs="+acc"  BrianHG_DDR3_PHY_SEQ_tb

restart -force -nowave
# This line shows only the variable name instead of the full path including the module it is in
config wave -signalnamewidth 1

add  wave /BrianHG_DDR3_PHY_SEQ_tb/*

do run_phy.do

Part 2, the script 'run_phy.do':
Code: [Select]
vlog -sv -work work {altera_gpio_lite.sv}
vlog -sv -work work {BrianHG_DDR3_GEN_tCK.sv}
vlog -sv -work work {BrianHG_DDR3_PLL.sv}
vlog -sv -work work {BrianHG_DDR3_PLL_tb.sv}
vlog -sv -work work {BrianHG_DDR3_IO_PORT_ALTERA.sv}
vlog -sv -work work {BrianHG_DDR3_PHY_SEQ.sv}
vlog -sv -work work {BrianHG_DDR3_PHY_SEQ_tb.sv}

restart -force
run -all

wave cursor active
wave refresh
wave zoom range 6370ns 6420ns
view signals

In the console, once I am in the right directory, I just type:
do setup_phy.do
to set up and run the sim.
I also type:
do run_phy.do
for a quick re-compile of any code changes and a sim re-run.  If too many big changes are made, I occasionally need to run 'do setup_phy.do' again.


Now, the vsim line:
vsim -t 1ps -L altera_ver -L lpm_ver -L sgate_ver -L altera_mf_ver -L altera_lnsim_ver -L fiftyfivenm_ver -L work -voptargs="+acc"  BrianHG_DDR3_PHY_SEQ_tb

Most of the '-L xxx' options are there so ModelSim loads Altera's hardware IP libraries, which you should not need.  All that is really needed is '-L work -voptargs="+acc"' plus the final testbench module 'BrianHG_DDR3_PHY_SEQ_tb', while '-t 1ps' sets an overarching default time resolution for ModelSim.

Micron's ddr3.v requires `timescale 1 ps/ 1 ps (1 picosecond steps, 1 picosecond precision) to operate accurately.

It should only be required in the testbench module.  ModelSim will see the modules your TB instantiates and will hunt through the files you compiled with 'vlog' to find them and include them in the compile.
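
If you want to confirm which time unit and precision a module actually ended up with, a small sketch like the following can help.  The module name is made up for the example; $printtimescale and $timeformat are standard Verilog system tasks.
Code: [Select]
`timescale 1 ps / 1 ps

module tb_timescale_check;   // hypothetical name
    initial begin
        $printtimescale;                    // reports this module's time unit and precision
        $timeformat(-12, 0, " ps", 14);     // make %t print in picoseconds
        #1250;                              // 1250 time units = 1250 ps in this module
        $display("now = %t", $realtime);
        $finish;
    end
endmodule

The same $printtimescale call can be dropped into your own testbench's initial block while chasing a timescale error.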

Note: the '-sv' option in my vlog lines tells ModelSim to force SystemVerilog for those source files.  You may omit that option.


If you are looking for a small, generic, non-vendor-specific ModelSim setup I made a while back using do scripts and code, go to this thread: https://www.eevblog.com/forum/fpga/systemverilog-example-testbench-which-saves-a-bmp-picture-and-executes-a-script/msg3436876/#msg3436876
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 27, 2021, 05:49:00 am
Hmmm, funny, I never used the project feature before.  Check my screenshot...
Though, it compiles and simulates my source code fine.

The 'library' tab has my 'work' folder with all the source files listed in there (from the vlog commands in my setup script), except for Micron's model, as it is pulled in automatically by the `include in the testbench source file.  The library also has all of Altera's sources, unless I use a ModelSim from another FPGA vendor.
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 27, 2021, 06:57:56 am
I ran "compile all" again, and the timescale issue is gone.  I had probably updated the code and run the ModelSim simulation without recompiling the files first.

However, when I issue the command "run 1ns", the ModelSim simulator seems to run forever and no simulation waveform appears?

(https://i.imgur.com/dNBu8Ut.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 27, 2021, 07:21:04 am
Your simulation doesn't seem to be doing anything at all.
In your testbench, are you generating any signals, like a basic clock?
The sim should be counting up time in the status bar until you click stop, or until your testbench code executes a $stop or $finish.

This is what the key portion of my TB looks like:
Code: [Select]
localparam      period   = 500000000/CLK_KHZ_IN ;

logic [2:0]  phase_cs;
logic        phase_step,phase_updn;
logic        phase_done;

// *********************************************************************************************
// This module generates the master reference clocks for the entire memory system.
// *********************************************************************************************
BrianHG_DDR3_PLL  #(.FPGA_VENDOR    (FPGA_VENDOR),    .INTERFACE_SPEED (INTERFACE_SPEED),  .DDR_TRICK_MTPS_CAP       (DDR_TRICK_MTPS_CAP),
                    .CLK_KHZ_IN     (CLK_KHZ_IN),     .CLK_IN_MULT     (CLK_IN_MULT),      .CLK_IN_DIV               (CLK_IN_DIV),
                    .DDR3_WDQ_PHASE (DDR3_WDQ_PHASE), .DDR3_RDQ_PHASE  (DDR3_RDQ_PHASE)
) DUT_DDR3_PLL (    .RST_IN         (RST_IN),         .RST_OUT         (RESET),            .CLK_IN    (CLK_IN),      .DDR3_CK    (DDR3_CLK),
                    .DDR3_CLK_WDQ   (DDR3_CLK_WDQ),   .DDR3_CLK_RDQ    (DDR3_CLK_RDQ),     .CMD_CLK   (CMD_CLK),     .PLL_LOCKED (PLL_LOCKED),

                    .phase_cs       ( phase_cs   ),   .phase_step      ( phase_step ),     .phase_updn  ( phase_updn ),
                    .phase_sclk     ( CLK_IN ),       .phase_done      ( phase_done ) );


// ******************************************************************************************************
// This module receives the commands from the multi-port ram controller and sequences the DDR3 IO pins.
// ******************************************************************************************************
BrianHG_DDR3_PHY_SEQ    #(.FPGA_VENDOR         (FPGA_VENDOR),         .FPGA_FAMILY         (FPGA_FAMILY),        .INTERFACE_SPEED    (INTERFACE_SPEED),
                          .CLK_KHZ_IN          (CLK_KHZ_IN),          .CLK_IN_MULT         (CLK_IN_MULT),        .CLK_IN_DIV         (CLK_IN_DIV),
                         
                          .DDR3_CK_MHZ         (DDR3_CK_MHZ),         .DDR3_SPEED_GRADE    (DDR3_SPEED_GRADE),   .DDR3_SIZE_GB       (DDR3_SIZE_GB),
                          .DDR3_WIDTH_DQ       (DDR3_WIDTH_DQ),       .DDR3_NUM_CHIPS      (DDR3_NUM_CHIPS),     .DDR3_NUM_CK        (DDR3_NUM_CK),
                          .DDR3_WIDTH_ADDR     (DDR3_WIDTH_ADDR),     .DDR3_WIDTH_BANK     (DDR3_WIDTH_BANK),    .DDR3_WIDTH_CAS     (DDR3_WIDTH_CAS),
                          .DDR3_WIDTH_DM       (DDR3_WIDTH_DM),       .DDR3_WIDTH_DQS      (DDR3_WIDTH_DQS),     .DDR3_ODT_RTT       (DDR3_ODT_RTT),
                          .DDR3_RZQ            (DDR3_RZQ),            .DDR3_TEMP           (DDR3_TEMP),          .DDR3_WDQ_PHASE     (DDR3_WDQ_PHASE),
                          .DDR3_RDQ_PHASE      (DDR3_RDQ_PHASE),      .DDR3_MAX_REF_QUEUE  (DDR3_MAX_REF_QUEUE), .IDLE_TIME_uSx10    (IDLE_TIME_uSx10),
                          .POWR_UP_TIMER_uS    (POWR_UP_TIMER_uS),    .BANK_ROW_ORDER      (BANK_ROW_ORDER),

                          .PORT_VECTOR_SIZE    (PORT_VECTOR_SIZE),    .PORT_ADDR_SIZE      (PORT_ADDR_SIZE)

) DUT_PHY_SEQ (           // *** DDR3_PHY_SEQ Clocks & Reset ***
                          .RST_IN              (RESET),              .DDR_CLK       (DDR3_CLK),   .DDR_CLK_WDQ (DDR3_CLK_WDQ), .DDR_CLK_RDQ (DDR3_CLK_RDQ),

                          // *** DDR3 Ram Chip IO Pins ***           
                          .DDR3_RESET_n        (DDR3_RESET_n),       .DDR3_CK_p     (DDR3_CK_p),  .DDR3_CKE    (DDR3_CKE),     .DDR3_CS_n   (DDR3_CS_n),
                          .DDR3_RAS_n          (DDR3_RAS_n),         .DDR3_CAS_n    (DDR3_CAS_n), .DDR3_WE_n   (DDR3_WE_n),    .DDR3_ODT    (DDR3_ODT),
                          .DDR3_A              (DDR3_A),             .DDR3_BA       (DDR3_BA),    .DDR3_DM     (DDR3_DM),      .DDR3_DQ     (DDR3_DQ),
                          .DDR3_DQS_p          (DDR3_DQS_p),         .DDR3_DQS_n    (DDR3_DQS_n), .DDR3_CK_n   (DDR3_CK_n),

                          // *** Command port input ***             
                          .SEQ_CMD_CLK         (CMD_CLK),            .SEQ_CMD_ENA_t (SEQ_CMD_ENA_t),      .SEQ_WRITE_ENA      (SEQ_WRITE_ENA),
                          .SEQ_ADDR            (SEQ_ADDR),           .SEQ_WDATA     (SEQ_WDATA),          .SEQ_WMASK          (SEQ_WMASK),
                          .SEQ_RDATA_VECT_IN   (SEQ_RDATA_VECT_IN),                                       .SEQ_refresh_hold   (SEQ_refresh_hold),

                          // *** Command port results ***                                                 
                          .SEQ_BUSY_t          (SEQ_BUSY_t),         .SEQ_RDATA_RDY_t (SEQ_RDATA_RDY_t),  .SEQ_RDATA          (SEQ_RDATA),
                          .SEQ_RDATA_VECT_OUT  (SEQ_RDATA_VECT_OUT),                                      .SEQ_refresh_queue  (SEQ_refresh_queue),

                          // *** Diagnostic flags ***                                                 
                          .SEQ_CAL_PASS        (SEQ_CAL_PASS),       .DDR3_READY    (DDR3_READY) );
// ***********************************************************************************************


//************************************************************************************************************************************************************
//*** DDR3 Verilog model from Micron Required for this test-bench.
//************************************************************************************************************************************************************
`include "ddr3.v"
//************************************************************************************************************************************************************
//*** DDR3 Verilog model from Micron Required for this test-bench.
//*** The required DDR3 SDRAM Verilog Model V1.74 available at:
//*** https://media-www.micron.com/-/media/client/global/documents/products/sim-model/dram/ddr3/ddr3-sdram-verilog-model.zip?rev=925a8a05204e4b5c9c1364302de60126
//*** From the 'DDR3 SDRAM Verilog Model.zip', only these 2 files are required in the main simulation test-bench source folder:
//*** ddr3.v
//*** 4096Mb_ddr3_parameters.vh
//************************************************************************************************************************************************************
    // component instantiation
    ddr3 sdramddr3_0 (
        .rst_n      ( DDR3_RESET_n ),
        .ck         ( DDR3_CK_p[0] ),
        .ck_n       ( DDR3_CK_n[0] ),
        .cke        ( DDR3_CKE     ),
        .cs_n       ( DDR3_CS_n    ),
        .ras_n      ( DDR3_RAS_n   ),
        .cas_n      ( DDR3_CAS_n   ),
        .we_n       ( DDR3_WE_n    ),
        .dm_tdqs    ( DDR3_DM      ),
        .ba         ( DDR3_BA      ),
        .addr       ( DDR3_A       ),
        .dq         ( DDR3_DQ      ),
        .dqs        ( DDR3_DQS_p   ),
        .dqs_n      ( DDR3_DQS_n   ),
        .tdqs_n     (              ),
        .odt        ( DDR3_ODT     )
    );
//************************************************************************************************************************************************************
//************************************************************************************************************************************************************
//************************************************************************************************************************************************************

logic       [7:0] WDT_COUNTER;                                                       // Inactivity watchdog: counts down from WDT_RESET_TIME; the simulation stops at 0.
logic             WAIT_IDLE        = 0;                                              // When high, insert an idle wait before every command.
localparam int    WDT_RESET_TIME   = 255;                                            // Set the WDT timeout clock cycles.
localparam int    SYS_IDLE_TIME    = WDT_RESET_TIME-64;                              // Idle threshold: 64 counts below WDT_RESET_TIME (191).
localparam real   DDR3_CK_MHZ_REAL = CLK_KHZ_IN * CLK_IN_MULT / CLK_IN_DIV / 1000 ;  // Generate the DDR3 CK clock frequency.
localparam real   DDR3_CK_PERIOD   = 1000 / DDR3_CK_MHZ_REAL ;                       // Generate the DDR3 CK period in nanoseconds.

initial begin
WDT_COUNTER       = WDT_RESET_TIME  ; // Set the initial inactivity timer to maximum so that the code later on won't immediately stop the simulation.
SEQ_CMD_ENA_t     = 0 ;
SEQ_WRITE_ENA     = 0 ;
SEQ_ADDR          = 0 ;
SEQ_WDATA         = 0 ;
SEQ_WMASK         = 0 ;
SEQ_RDATA_VECT_IN = 0 ;
SEQ_refresh_hold  = 0 ;

phase_cs   = 3'b000 ;
phase_step = 1'b0 ;
phase_updn = 1'b0 ;

RST_IN = 1'b1 ; // Reset input
CLK_IN = 1'b0 ;
#(50000);
RST_IN = 1'b0 ; // Release reset at 50ns.

while (!PLL_LOCKED) @(negedge CMD_CLK);
execute_ascii_file(TB_COMMAND_SCRIPT_FILE);

end

always #period                  CLK_IN = !CLK_IN;                                             // create source clock oscillator
always @(posedge CLK_IN)   WDT_COUNTER = (SEQ_BUSY_t!=SEQ_CMD_ENA_t) ? WDT_RESET_TIME : (WDT_COUNTER-1'b1) ;   // Setup a simulation inactivity watchdog countdown timer.
always @(posedge CLK_IN) if (WDT_COUNTER==0) begin
                                             Script_CMD  = "*** WDT_STOP ***" ;
                                             $stop;                                           // Automatically stop the simulation if the inactivity timer reaches 0.
                                             end


As you can see, I define a clock period and have my PLL, DDR3_PHY_SEQ sequencer & Micron's ddr3.v wired together.

Then, in the initial block, I set the power-up defaults.

Then I wait for the PLL to lock.

Then I execute my command script (you may ignore this).

Then, at 'always #period CLK_IN = !CLK_IN;', I generate the source clock.

In the next 2 'always' blocks, I count clock cycles, and if there is no activity on the DDR3 sequencer command port for enough clock cycles, I tell the simulator to $stop.  At this point, the simulation waveform is viewable.
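
For readers who want to try the inactivity-watchdog idea without the PLL and sequencer, here is a stripped-down, stand-alone sketch.  The 'busy' signal is a made-up stand-in for "the controller is doing something", and the module name and clock frequency are assumptions as well.
Code: [Select]
`timescale 1 ps / 1 ps

module tb_watchdog;   // hypothetical name

    localparam integer HALF_PERIOD_PS = 10000;   // assumed 50 MHz clock
    localparam integer WDT_RESET_TIME = 255;     // same reload value as in the code above

    reg       clk  = 1'b1;
    reg       busy = 1'b0;                       // stand-in for controller activity
    reg [7:0] wdt  = WDT_RESET_TIME;

    always #(HALF_PERIOD_PS) clk = ~clk;

    // Reload the counter while there is activity, otherwise count down.
    always @(posedge clk) wdt <= busy ? WDT_RESET_TIME : wdt - 1'b1;

    // Stop the simulation once the counter has run out.
    always @(posedge clk) if (wdt == 0) begin
        $display("watchdog expired at %0d ps", $time);
        $stop;
    end

    initial begin
        busy = 1'b1;
        #(2010000);      // pretend there is activity for about 2 us
        busy = 1'b0;     // activity ends; the watchdog now counts down to 0
    end

endmodule

The point of the pattern is that 'run -all' returns on its own once nothing has happened for 255 clocks, so no fixed run time has to be guessed.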


Title: Re: DDR3 initialization sequence issue
Post by: promach on May 27, 2021, 08:00:24 am
I restarted ModelSim and re-ran the simulation, then I got the following error about tck_avg.

I searched the contents of all the Micron simulation model files, but it seems that tck_avg is not assigned any initial value at all, which might mean that it defaults to 0?  Please correct me if I am wrong.

Code: [Select]
[phung@archlinux DDR3 SDRAM Verilog Model]$ grep -n tck_avg *
ddr3.v:167:    real    tck_avg;
ddr3.v:543:    assign TZQCS   = max( 64, ceil( 80000/tck_avg));
ddr3.v:544:    assign TZQINIT =  max(512, ceil(640000/tck_avg));
ddr3.v:545:    assign TZQOPER =  max(256, ceil(320000/tck_avg));
ddr3.v:1350:       mode_reg[0][8] <= #($rtoi(tck_avg)) 1'b0;
ddr3.v:1742:                    if (dq_in_valid && dll_locked && ($time - tm_dqs_neg[i] < $rtoi(TDSS*tck_avg)))
ddr3.v:1752:                        if ((tm_tdqss < tck_avg/2.0) && (tm_tdqss > TDQSS*tck_avg))
ddr3.v:1927:                    if (dqsck_max > dqsck[i] + TQH*tck_avg + TDQSQ) begin
ddr3.v:1928:                        dqsck_max = dqsck[i] + TQH*tck_avg + TDQSQ;
ddr3.v:1931:                    if (dqsck_min < dqsck[i] - TQH*tck_avg - TDQSQ) begin
ddr3.v:1932:                        dqsck_min = dqsck[i] - TQH*tck_avg - TDQSQ;
ddr3.v:1940:                    if (dqsq_min < dqsck[i] - TQH*tck_avg) begin
ddr3.v:1941:                        dqsq_min = dqsck[i] - TQH*tck_avg;
ddr3.v:1950:                    dqs_out_en_dly[i] <= #(tck_avg/2) dqs_out_en;
ddr3.v:1951:                    dqs_out_dly[i]    <= #(tck_avg/2 + dqsck[i]) dqs_out;
ddr3.v:1954:                            dq_out_en_dly[i*`DQ_PER_DQS + j] <= #(tck_avg/2) dq_out_en;
ddr3.v:1956:                                dq_out_dly   [i*`DQ_PER_DQS + j] <= #(tck_avg/2 + dqsq_min) dq_out[i*`DQ_PER_DQS + j];
ddr3.v:1958:                                dq_out_dly   [i*`DQ_PER_DQS + j] <= #(tck_avg/2 + $dist_uniform(seed, dqsq_min, dqsq_max)) dq_out[i*`DQ_PER_DQS + j];
ddr3.v:2022:                    tjit_per_rtime = $time - tm_ck_pos - tck_avg;
ddr3.v:2024:                    tjit_per_rtime = $time - tm_ck_neg - tck_avg;
ddr3.v:2067:                        if (ceil(write_recovery*tck_avg) < TWR)
ddr3.v:2068:                            $display ("%m: at time %t ERROR: Write Recovery = %d is illegal @tCK(avg) = %f", $time, write_recovery, tck_avg);
ddr3.v:2072:                                5 : if (tck_avg < 2500.0)                          $display ("%m: at time %t ERROR: CWL = %d is illegal @tCK(avg) = %f", $time, cas_write_latency, tck_avg);
ddr3.v:2073:                                6 : if ((tck_avg < 1875.0) || (tck_avg >= 2500.0)) $display ("%m: at time %t ERROR: CWL = %d is illegal @tCK(avg) = %f", $time, cas_write_latency, tck_avg);
ddr3.v:2074:                                7 : if ((tck_avg < 1500.0) || (tck_avg >= 1875.0)) $display ("%m: at time %t ERROR: CWL = %d is illegal @tCK(avg) = %f", $time, cas_write_latency, tck_avg);
ddr3.v:2075:                                8 : if ((tck_avg < 1250.0) || (tck_avg >= 1500.0)) $display ("%m: at time %t ERROR: CWL = %d is illegal @tCK(avg) = %f", $time, cas_write_latency, tck_avg);
ddr3.v:2076:                                9 : if ((tck_avg < 1071.0) || (tck_avg >= 1250.0)) $display ("%m: at time %t ERROR: CWL = %d is illegal @tCK(avg) = %f", $time, cas_write_latency, tck_avg);
ddr3.v:2077:                                10: if ((tck_avg < 937.5) || (tck_avg >= 1071.0)) $display ("%m: at time %t ERROR: CWL = %d is illegal @tCK(avg) = %f", $time, cas_write_latency, tck_avg);
ddr3.v:2078:                                default :                                          $display ("%m: at time %t ERROR: CWL = %d is illegal @tCK(avg) = %f", $time, cas_write_latency, tck_avg);
ddr3.v:2189:                    tck_avg = tck_avg - tck_sample[ck_cntr%PERTCKAVG]/$itor(PERTCKAVG);
ddr3.v:2190:                    tck_avg = tck_avg + tck_i/$itor(PERTCKAVG);
ddr3.v:2192:                    tjit_per_rtime = tck_i - tck_avg;
ddr3.v:2198:                            terr_nper_rtime = terr_nper_rtime + tck_sample[i] - tck_avg;
ddr3.v:2221:                        if (TCK_MIN - tck_avg >= 1.0)
ddr3.v:2222:                            $display ("%m: at time %t ERROR: tCK(avg) minimum violation by %f ps.", $time, TCK_MIN - tck_avg);
ddr3.v:2223:                        if (tck_avg - TCK_MAX >= 1.0)
ddr3.v:2224:                            $display ("%m: at time %t ERROR: tCK(avg) maximum violation by %f ps.", $time, tck_avg - TCK_MAX);
ddr3.v:2227:                        if (tm_ck_neg - $time < TCL_ABS_MIN*tck_avg)
ddr3.v:2228:                            $display ("%m: at time %t ERROR: tCL(abs) minimum violation on CLK by %t", $time, TCL_ABS_MIN*tck_avg - tm_ck_neg + $time);
ddr3.v:2229:                        if (tcl_avg < TCL_AVG_MIN*tck_avg)
ddr3.v:2230:                            $display ("%m: at time %t ERROR: tCL(avg) minimum violation on CLK by %t", $time, TCL_AVG_MIN*tck_avg - tcl_avg);
ddr3.v:2231:                        if (tcl_avg > TCL_AVG_MAX*tck_avg)
ddr3.v:2232:                            $display ("%m: at time %t ERROR: tCL(avg) maximum violation on CLK by %t", $time, tcl_avg - TCL_AVG_MAX*tck_avg);
ddr3.v:2240:                    duty_cycle = $rtoi(tch_avg*100/tck_avg);
ddr3.v:2254:                        if ($time - tm_ck_pos < TCH_ABS_MIN*tck_avg)
ddr3.v:2255:                            $display ("%m: at time %t ERROR: tCH(abs) minimum violation on CLK by %t", $time, TCH_ABS_MIN*tck_avg - $time + tm_ck_pos);
ddr3.v:2256:                        if (tch_avg < TCH_AVG_MIN*tck_avg)
ddr3.v:2257:                            $display ("%m: at time %t ERROR: tCH(avg) minimum violation on CLK by %t", $time, TCH_AVG_MIN*tck_avg - tch_avg);
ddr3.v:2258:                        if (tch_avg > TCH_AVG_MAX*tck_avg)
ddr3.v:2259:                            $display ("%m: at time %t ERROR: tCH(avg) maximum violation on CLK by %t", $time, tch_avg - TCH_AVG_MAX*tck_avg);
ddr3.v:2331:                        odt_state_dly <= #(TAOF*tck_avg) odt_state;
ddr3.v:2348:                dyn_odt_state_dly <= #(TADC*tck_avg) dyn_odt_state;
ddr3.v:2745:                        if ($time - tm_dqs_pos[i] < $rtoi(TWPRE*tck_avg))
ddr3.v:2748:                        if ($time - tm_dqs_neg[i] < $rtoi(TWPST*tck_avg))
ddr3.v:2751:                        if ($time - tm_dqs_neg[i] < $rtoi(TDQSL*tck_avg))
ddr3.v:2858:            if ($time - tm_dqs_pos[i] < $rtoi(TDQSH*tck_avg))
ddr3.v:2859:                $display ("%m: at time %t ERROR: tDQSH violation on DQS bit %d by %t", $time, i, tm_dqs_pos[i] + TDQSH*tck_avg - $time);
ddr3.v:2864:                    if ($time - tm_dqs_pos[i] < $rtoi(TDQSH*tck_avg))
ddr3.v:2866:                    if ($time - tm_ck_pos < $rtoi(TDSH*tck_avg))
[phung@archlinux DDR3 SDRAM Verilog Model]$


(https://i.imgur.com/ndnjdjs.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 27, 2021, 08:06:56 am
Don't worry about those 3 errors; I get the same 3 divide-by-0 errors.  They do not stop the simulation from working...

It appears that the initial startup works.

If you run the simulation for only 1ps, the result will be too short to see anything in the waveform.
Try running it for 1000ns (1us), or even 10us...

Note that if you have a power-up timer set in the multi-ms range, you need to run the simulation that long to get to that point.
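
To illustrate why the divide-by-0 only shows up at time 0: a Verilog 'real' that has not been assigned yet starts at 0.0, and a running average like the one at ddr3.v lines 2189-2190 in the grep output above only fills in once clock periods have been measured.  The sketch below is not Micron's code; the window size N, the fixed 2500 ps period and the line that stores the sample are assumptions for the example.
Code: [Select]
`timescale 1 ps / 1 ps

module tb_tck_avg;   // hypothetical name

    localparam integer N = 4;      // illustrative window; stands in for the model's PERTCKAVG
    real    tck_avg;               // not assigned at time 0, so it starts at 0.0
    real    tck_sample [0:N-1];    // last N measured periods, also starting at 0.0
    real    tck_i;
    integer ck_cntr;
    integer k;

    initial begin
        $display("t=0: tck_avg = %f", tck_avg);
        // An expression such as 80000/tck_avg evaluated at this point divides by zero.
        ck_cntr = 0;
        for (k = 0; k < 8; k = k + 1) begin
            tck_i   = 2500.0;                                         // pretend every measured period is 2500 ps
            tck_avg = tck_avg - tck_sample[ck_cntr % N] / $itor(N);   // drop the oldest sample
            tck_avg = tck_avg + tck_i / $itor(N);                     // add the newest sample
            tck_sample[ck_cntr % N] = tck_i;                          // assumed bookkeeping step
            ck_cntr = ck_cntr + 1;
            $display("after %0d clocks: tck_avg = %f", ck_cntr, tck_avg);
        end
        $finish;
    end

endmodule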
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 27, 2021, 08:20:19 am
I have added $stop; and used "run 1us", but the simulation still seems to run forever.

(https://i.imgur.com/qSBaLFJ.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 27, 2021, 08:32:27 am
Beginner mistake I luckily caught, see photo...

Also, your period figures should be in picoseconds, not nanoseconds,
i.e. multiply by 1000...
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 27, 2021, 08:50:02 am
Thanks for reminding me to use "always #PERIOD clk = ~clk;"
It solves the simulation-running-forever issue, but the simulation output waveform is empty (no data).  Why?

Besides, what exactly do you mean by "period figures should be in picoseconds, not nanoseconds"?

I am already using:    `timescale 1ns / 10ps  // time-unit = 1 ns, precision = 10 ps

(https://i.imgur.com/qJRecrb.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 27, 2021, 09:34:28 am
1ps/1ps is the timescale Micron's ddr3.v specifies, so I wouldn't play with that.
I used 1ps/1ps and the sim is accurate.
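
On the picoseconds-versus-nanoseconds point: a '#' delay is counted in the time unit of the module it sits in, so the same number means different things under different `timescale directives.  A small sketch, with made-up module names:
Code: [Select]
`timescale 1 ns / 10 ps
module delay_in_ns;   // hypothetical name
    initial begin
        #10;          // 10 time units = 10 ns in this module
        $display("delay_in_ns woke up at %t", $realtime);
    end
endmodule

`timescale 1 ps / 1 ps
module delay_in_ps;   // hypothetical name
    initial begin
        $timeformat(-12, 0, " ps", 12);   // print every %t in picoseconds
        #10;          // 10 time units = 10 ps in this module
        $display("delay_in_ps woke up at %t", $realtime);
    end
endmodule

Loading both modules as top levels in one simulation shows the two wake-up times a factor of 1000 apart, which is why period constants written for 1ns/10ps need multiplying by 1000 when a testbench moves to 1ps/1ps.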

As for the missing data: step 1, go to your test_DDR3_controller on the 'sim' tab on the left.
In the objects window, all the wires in that module should show up.
Right-click on your 'clk' and 'resetn' and choose add-wave.

Restart and run the sim, and those 2 should show in your waveform.

I assume the 'no data' entries may just be un-wired ports, or some changes you recently made have altered the base names of the ports.  They may just need to be cleared from the waveform and re-added the same way you just did it with the 'clk' and 'resetn' signals.
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 27, 2021, 11:22:47 am
Why did the reset timing fail the 200us timing requirement (https://github.com/promach/DDR/blob/main/test_ddr3_memory_controller.v#L190) ?

(https://i.imgur.com/eeXt3xg.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 27, 2021, 03:55:13 pm
I have solved the 200us reset timing violation from my previous post.

Now I am getting tIS violations inside STATE_INIT_MRS_2 (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L985) .  Do you have any idea how to get around this particular timing violation?


Code: [Select]
# test_ddr3_memory_controller.mem.main: at time 924036300.0 ps ERROR:   tIS violation on CKE by 350.0 ps
# test_ddr3_memory_controller.mem.cmd_task: at time 924036300.0 ps ERROR: NOP or Deselect is required when CKE goes active.
# test_ddr3_memory_controller.mem.main: at time 924247500.0 ps ERROR:   tIS violation on CS_N    by 350.0 ps
# test_ddr3_memory_controller.mem.main: at time 924247500.0 ps ERROR:   tIS violation on RAS_N   by 350.0 ps
# test_ddr3_memory_controller.mem.main: at time 924247500.0 ps ERROR:   tIS violation on CAS_N   by 350.0 ps
# test_ddr3_memory_controller.mem.main: at time 924247500.0 ps ERROR:   tIS violation on ADDR  0 by 350.0 ps
# test_ddr3_memory_controller.mem.main: at time 924247500.0 ps ERROR:   tIS violation on ADDR  1 by 350.0 ps
# test_ddr3_memory_controller.mem.main: at time 924247500.0 ps ERROR:   tIS violation on ADDR  2 by 350.0 ps
# test_ddr3_memory_controller.mem.main: at time 924247500.0 ps ERROR:   tIS violation on ADDR  3 by 350.0 ps
# test_ddr3_memory_controller.mem.main: at time 924247500.0 ps ERROR:   tIS violation on ADDR  4 by 350.0 ps
# test_ddr3_memory_controller.mem.main: at time 924247500.0 ps ERROR:   tIS violation on ADDR  5 by 350.0 ps
# test_ddr3_memory_controller.mem.main: at time 924247500.0 ps ERROR:   tIS violation on ADDR  6 by 350.0 ps
# test_ddr3_memory_controller.mem.main: at time 924247500.0 ps ERROR:   tIS violation on ADDR  7 by 350.0 ps
# test_ddr3_memory_controller.mem.main: at time 924247500.0 ps ERROR:   tIS violation on ADDR  8 by 350.0 ps
# test_ddr3_memory_controller.mem.main: at time 924247500.0 ps ERROR:   tIS violation on ADDR  9 by 350.0 ps
# test_ddr3_memory_controller.mem.main: at time 924247500.0 ps ERROR:   tIS violation on ADDR 10 by 350.0 ps
# test_ddr3_memory_controller.mem.main: at time 924247500.0 ps ERROR:   tIS violation on ADDR 11 by 350.0 ps
# test_ddr3_memory_controller.mem.main: at time 924247500.0 ps ERROR:   tIS violation on ADDR 12 by 350.0 ps
# test_ddr3_memory_controller.mem.main: at time 924247500.0 ps ERROR:   tIS violation on ADDR 13 by 350.0 ps
# test_ddr3_memory_controller.mem.main: at time 924247500.0 ps ERROR:   tIS violation on ADDR 14 by 350.0 ps
# test_ddr3_memory_controller.mem.main: at time 924247500.0 ps ERROR:   tIS violation on ADDR 15 by 350.0 ps


Note: I have attached a zip file containing the Modelsim timing violation log file reported by Micron simulation model as well as the vcd waveform file exported out from Modelsim.


(https://i.imgur.com/ysYTA45.png)

(https://i.imgur.com/QlkArJ1.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 27, 2021, 07:23:41 pm
Did you search the Micron DDR3 datasheet to see what the tIS timing should be?
Did you zoom into your output waveform to see if the timing is met?

I got you to the point where the DDR3.v model seems to be working.  It is now your job to make your DDR3 ram controller function without any violations.

Look at my initialize sequence screenshot on the previous forum page.  It is zoomed in enough to show you what the waveform should look like with all the DDR3 pins in the middle except for CLKn.  And if you compare it to the visual diagrams in the DDR3 datasheet, and look at the time scale in my screenshot, you will see a compliant match.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 27, 2021, 11:52:49 pm
Quote
Why did the reset timing fail the 200us timing requirement (https://github.com/promach/DDR/blob/main/test_ddr3_memory_controller.v#L190) ?

Your DDR3 controller should be generating the DDR3's RESETn signal, not your testbench source file.  Your testbench should just make a short initial ~10-100ns system reset pulse; it's your FPGA controller code which should time and clean up the DDR3's reset.  The exception is if you can guarantee that whatever hardware feeds you a reset signal at power-up will always hold it for at least 200us after the FPGA has booted, and that you will never need a software-controlled reset of the DDR3 memory.
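
As a sketch of what "the controller times the DDR3 reset" can look like: the module below holds RESET# low for at least 200us after the short system reset, then keeps CKE low for a further 500us, which are the power-up waits given in the DDR3 initialization sequence.  The module name, port names and the 100 MHz clock are assumptions for the example, and it is not taken from either controller in this thread.
Code: [Select]
module ddr3_reset_seq #(                        // hypothetical module and port names
    parameter integer CLK_PERIOD_PS = 10000     // assumed 100 MHz controller clock
) (
    input  wire clk,
    input  wire rst,            // short system reset from the testbench or board, active high
    output reg  ddr3_reset_n,   // to the DDR3 RESET# pin
    output reg  ddr3_cke,       // to the DDR3 CKE pin
    output reg  cke_done        // high once CKE has been raised
);
    localparam integer T_RESET_CYC = 200_000_000 / CLK_PERIOD_PS + 1;   // at least 200 us
    localparam integer T_CKE_CYC   = 500_000_000 / CLK_PERIOD_PS + 1;   // at least 500 us

    localparam S_RESET = 2'd0, S_WAIT_CKE = 2'd1, S_DONE = 2'd2;

    reg [31:0] cnt;
    reg [1:0]  state;

    always @(posedge clk) begin
        if (rst) begin
            state        <= S_RESET;
            cnt          <= 0;
            ddr3_reset_n <= 1'b0;
            ddr3_cke     <= 1'b0;
            cke_done     <= 1'b0;
        end else case (state)
            S_RESET:    if (cnt == T_RESET_CYC) begin   // RESET# has been low long enough
                            ddr3_reset_n <= 1'b1;
                            cnt          <= 0;
                            state        <= S_WAIT_CKE;
                        end else cnt <= cnt + 1;
            S_WAIT_CKE: if (cnt == T_CKE_CYC) begin     // 500 us after RESET# went high
                            ddr3_cke <= 1'b1;
                            state    <= S_DONE;
                        end else cnt <= cnt + 1;
            default:    cke_done <= 1'b1;
        endcase
    end
endmodule

The rest of the controller still has to present NOP or Deselect on the command pins when CKE rises, and wait tXPR before the first MRS command; neither is modelled here.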
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 28, 2021, 03:06:49 am
Quote
Your DDR3 controller should be generating the DDR3's RESETn signal.  Not your testbench source file.  Your test bench should just make a short initial ~10-100ns system reset pulse,

In my code, there is an input resetn signal and an output reset_n, which goes to the DDR3.
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 28, 2021, 06:55:48 am
As for the tIS timing violation, it happened inside the MR2 state in my code, which does not really make sense compared to the following figure 48.

The tIS requirement is about tXPR before the MR2 state.

(https://i.imgur.com/JClPQ6G.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 28, 2021, 07:39:53 am
Do multiple searches in the data sheet.  Look at table 45 in the data sheet: tIS is the setup time for a command, IE the time the command on the address[]/ba[]/ras/cas/cke.. must be set prior to the rising edge of the positive clock CK going into the DDR3 ram chip. (READ the next ~4 pages, including the entire page where table 45 is located.  It tells you what your commands should look like to the DDR3 ram chip in relation to the CK & CK# clock inputs.  It is all about the tIS and tIH; it's not just the first command, but everything you send to the DDR3.)

Now, go back to your simulation.
Zoom in, way in, so that you only see around 2-4 nanoseconds from left to right.
Then double click in the console on one of the tIS violation errors so that the waveform view will automatically re-center to that time position so you may inspect the error.

Look at your rising clock edge and when you send the commands.  Are those IOs set to their valid levels before your DDR3 positive CK pin goes from low to high, by at least the tIS picoseconds listed in table 45 of the datasheet?

Now, take a look at my screenshot of my power-up sequence and look at when my DDR3_CK pin goes high.  Look at my address[]/ba[]/ras/cas/cke...  Are all those IOs properly setup and steady before my DDR3_CK pin transitions from low to high by the picosecond specification listed in table 45?  Also, what about the tIH, the hold-time required after the clock transitions from low to high?

I've attached a closeup of my first MR2 so you can see the timing difference between my command & clock transition compared to yours.  (Note, my clock in the sim is actually 500MHz, but this has no bearing on your tIS problem.  Your timing should be far easier to achieve as your clock is slower...)
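
To make the measurement concrete, here is a rough testbench-side sketch of what a setup/hold check does. The Micron model already performs these checks itself; this only illustrates what is being measured. The module name, port widths, and the `T_IS_PS` / `T_IH_PS` values are placeholders I made up and must be replaced with the table 45 figures for the actual speed grade.

```verilog
// Hypothetical checker: T_IS_PS and T_IH_PS are placeholders, not datasheet values.
`timescale 1ps/1ps
module cmd_setup_hold_check #(
    parameter T_IS_PS = 200,   // placeholder setup time in ps
    parameter T_IH_PS = 200    // placeholder hold time in ps
) (
    input wire        ck,
    input wire        cke, cs_n, ras_n, cas_n, we_n,
    input wire [2:0]  ba,
    input wire [15:0] addr
);
    time last_cmd_change;
    time last_ck_rise;

    initial begin
        last_cmd_change = 0;
        last_ck_rise    = 0;
    end

    // Record when any command/address pin last moved, and check the hold time
    // against the most recent rising CK edge.
    always @(cke or cs_n or ras_n or cas_n or we_n or ba or addr) begin
        last_cmd_change = $time;
        if (last_ck_rise != 0 && ($time - last_ck_rise) < T_IH_PS)
            $display("%0t: hold violation, pins changed %0d ps after CK rise",
                     $time, $time - last_ck_rise);
    end

    // On each rising CK edge, check how long the pins have been stable.
    always @(posedge ck) begin
        last_ck_rise = $time;
        if (($time - last_cmd_change) < T_IS_PS)
            $display("%0t: setup violation, pins changed %0d ps before CK rise",
                     $time, $time - last_cmd_change);
    end
endmodule
```

The first clock edge after time 0 may report a spurious setup message, since the pins have not had a chance to settle yet.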
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 28, 2021, 09:24:18 am
I have zoomed into the range of 200ps around the timestep point where the tIS violation is reported. See below:

I suppose the DDR command signals must remain stable (and not XXXX unknown) before ck posedge transition.
Let me try modifying the code and retry.

(https://i.imgur.com/j8rUCHS.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 28, 2021, 09:40:19 am
(READ the next ~4 pages including the entire page on where table 45 is located.  It tells you how your commands should look like to the DDR3 ram chip in relation to the CK & CK# clock inputs.  It is all about the tIS and tIH, it's not just the first command, but everything you send to the DDR3.)
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 29, 2021, 01:09:02 pm
I have solved both the 200us initial reset and the tIS setup timing violations.
Now, I am having "illegal CAS latency = 4" error.

I tried changing CAS latency field value from 5 to 6 on MR0 (Mode Register 0) (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1151), however it still resulted in CAS latency = 4 as reported by Modelsim.

(https://i.imgur.com/LAuIsNh.png)

(https://i.imgur.com/FDfxqgv.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 29, 2021, 08:17:00 pm
Please organize your sim so that your DDR3 IOs are organized the same as in my illustration labeled (A).
Then expand your address bus like seen in my illustration labeled (B).
Then select the 'Error' and zoom in with enough room to see just a bit to the left and right, as in my illustration labeled (C).

Then provide a proper screenshot of the waveform.

EXTRA: It may also help visually if you set your waveform grid preferences to match your DDR3 clock cycle.
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 30, 2021, 02:23:15 am
@BrianHg Please see below:

(https://i.imgur.com/A9GJHxP.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 30, 2021, 02:50:53 am
@promach Please see below:

(https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/?action=dlattach;attach=1223392)
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 30, 2021, 02:57:04 am
I also understand your concern, however load mode 0 (Mode register 0) should not happen just after the start of tXPR
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 30, 2021, 03:02:51 am
Yes, that too.
It is your code generating the Load Mode command right there...
Take a look at the command lines...  It's a load mode BA=0 right there in front of your eyes.
It was generated by your ram controller...
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 30, 2021, 04:03:10 am
Make an on-off parameter option to shrink your power-up reset time to 1us.  Then, when you recompile and simulate in Modelsim, you no longer wait for those 700us to simulate, and it only takes a second to render your waveform results instead of ~45 seconds for every shot.

Only turn on the full 700us power-up timer after you get everything else right.

Micron's DDR3 model will allow you to get away with this with only 2 warnings right at the beginning of your sim.
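
One way such an on-off option could look, as a sketch only: `FAST_SIM`, `CLK_PERIOD_NS`, and the module name are hypothetical, and the 200us + 500us split is my reading of the 700us total mentioned above.

```verilog
// Hypothetical sketch: FAST_SIM shortens the power-up waits for simulation only.
module powerup_timing #(
    parameter FAST_SIM      = 0,     // 1 = ~1 us waits, 0 = full datasheet waits
    parameter CLK_PERIOD_NS = 10     // assumed controller clock period
);
    // Full waits (200 us RESET# low, then 500 us before CKE high) or ~1 us each.
    localparam integer RESET_WAIT_NS = FAST_SIM ? 1000 : 200000;
    localparam integer CKE_WAIT_NS   = FAST_SIM ? 1000 : 500000;

    // Convert to whole clock cycles, rounding up.
    localparam integer RESET_CYCLES = (RESET_WAIT_NS + CLK_PERIOD_NS - 1) / CLK_PERIOD_NS;
    localparam integer CKE_CYCLES   = (CKE_WAIT_NS   + CLK_PERIOD_NS - 1) / CLK_PERIOD_NS;

    initial
        $display("FAST_SIM=%0d: RESET# low for %0d cycles, CKE wait %0d cycles",
                 FAST_SIM, RESET_CYCLES, CKE_CYCLES);
endmodule
```

The controller's counters would then compare against `RESET_CYCLES` and `CKE_CYCLES`, and the testbench would override `FAST_SIM` to 1 until everything else is clean.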
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 30, 2021, 06:44:00 am
Based on the calculation below, it seems that the Modelsim simulation waveform fulfills tXPR timing requirement ?

120000ps ÷ 3300ps/cycle = 36.363636 'ck' cycles
132111ps ÷ 3300ps/cycle = 40.033636 'ck' cycles

(https://i.imgur.com/70YG0lY.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 30, 2021, 07:39:16 am
tXPR timing violation is solved. I was using timing data for 1GB memory specification when I am simulating for 2GB memory.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 30, 2021, 08:04:25 am
That 132ns is only valid for the 1GB DDR3.  Have you set Micron's model to 1GB, or a different ram size?  I've been using 4GB, so in my sim my controller chooses 270ns for tXPR.

IE: Line-
`define den4096Mb
or
`define den1024Mb
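
A sketch of tying tXPR to the same density define, so the controller and the model cannot disagree. The tRFC figures below are my assumption of typical DDR3 values (they do reproduce the 120ns and 270ns mentioned in this thread for the 1Gb and 4Gb cases), and `den2048Mb` is assumed to follow the same naming as the two defines above; check both against the datasheet and the model's own parameter file.

```verilog
// Hypothetical sketch: the tRFC values are assumptions and must be verified
// against the datasheet for the actual part.
`define den2048Mb   // pick the same density define as used for the Micron model

module txpr_cycles #(
    parameter TCK_PS = 3300    // clock period in ps, as in the calculation earlier in the thread
);
`ifdef den1024Mb
    localparam integer TRFC_PS = 110000;
`elsif den2048Mb
    localparam integer TRFC_PS = 160000;
`elsif den4096Mb
    localparam integer TRFC_PS = 260000;
`else
    localparam integer TRFC_PS = 350000;
`endif

    // tXPR = max(5 tCK, tRFC + 10 ns)
    localparam integer TXPR_PS =
        ((TRFC_PS + 10000) > (5 * TCK_PS)) ? (TRFC_PS + 10000) : (5 * TCK_PS);

    // Round up to whole clock cycles.
    localparam integer TXPR_CYCLES = (TXPR_PS + TCK_PS - 1) / TCK_PS;

    initial
        $display("tXPR = %0d ps = %0d CK cycles", TXPR_PS, TXPR_CYCLES);
endmodule
```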

The other thing is your CSn output stays low.  That is a regular NOP operation.  Since you are transitioning from a CKE low to CKE high, you may need to use the 'Self refresh exit' style 'Device DESELECTED' prior to CKE going high.  This means the CSn may need to be high before CKE goes high.

Check table 70, command truth table.

Also see tCKSRX on power-up Figure 48 and Figure 94.  It needs to be 10ns or 5tCK minimum prior to CKE going from low to high.  Your CSn and the rest of the commands are already low up until the point where CKE goes from low to high, instead of being high 5tCK clocks early as required by tCKSRX.

Figure 94 has a better view of tCKSRX.

Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 30, 2021, 08:05:55 am
Quote
tXPR timing violation is solved. I was using timing data for 1GB memory specification when I am simulating for 2GB memory.

Ok, well, I'll leave what I wrote above anyway, as it is good practice to get everything right in case other brands of DDR3 chips are more stringent than Micron's DDR3 model, which let this one slip through; it may be allowed only during the first power-up, but not during the middle of operation.

Title: Re: DDR3 initialization sequence issue
Post by: promach on May 30, 2021, 09:45:49 am
As for tCKSRX, I do not really understand why I need to issue SRE and SRX self-refresh commands during the initialization sequence ?

By the way, why am I having the following tMRD violation at cursor 2 location instead of at either cursor 1 or 3 location ?

(https://i.imgur.com/JQ7AHJh.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 30, 2021, 04:10:05 pm
Quote
As for tCKSRX , I do not really understand why need to issue SRE and SRX self-refresh commands during initialization sequence ?

The figure is in the power-up sequence as well as enter/exit refresh as well as reset and 'power-down' entry and exit.  So, I choose to personally obey the setup of 5 tCK and never worry about any issues.
Quote
By the way, why am I having the following tMRD violation at cursor 2 location instead of at either cursor 1 or 3 location ?
Look at figure 49.  This is what you have to make the MR#s look like where you must follow tMRD. Not what you have.  Just like the power-up sequence has cycles missing between each MR# commands replaced with the wiggly line, that wiggly line by the tCKSRX in the power-up sequence figure also illustrates some missing pattern which is visible on the power-up / power-down, DLL enable/Disable figures which should be followed.
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 30, 2021, 04:38:31 pm
Quote
The figure is in the power-up sequence as well as enter/exit refresh as well as reset and 'power-down' entry and exit.  So, I choose to personally obey the setup of 5 tCK and never worry about any issues.

1) Is power-up sequence == initialization sequence (Figure 48) ?
2) enter/exit refresh == Figure 94 ?    'power-down' entry and exit == Figure 95 ?    reset == Figure 106 ?


As for tMRD violation, it is solved by issuing NOP commands in between two MRS commands.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 30, 2021, 10:11:34 pm
Ok, Ok, Ok,,, I can't believe I'm doing this.

Question: In the datasheet, you have a '#' after the CS#, RAS#, CAS#, WE# and no '#' after the CKE.  What does that '#' mean?
Title: Re: DDR3 initialization sequence issue
Post by: promach on May 31, 2021, 02:24:05 am
As for Figure 48, tCKSRX is already fulfilled.
As for Figure 94, I have not implemented SRE and SRX commands yet, so not yet applicable to my coding.
As for Figure 95, I do not see tCKSRX timing though ?
As for Figure 106, it is almost similar to Figure 48 which means tCKSRX is also fulfilled.

As for # , it means active when asserted at logic low.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on May 31, 2021, 03:06:52 am
Ok, if the # = active low, then to disable those commands, the default power-up state should be high, right?

The main problem you have had since you got the DDR3 model working is that, when powering up and doing nothing, you kept these active-low controls low.  Why not default them high and only pulse them low for the 1 clock when you need to issue a command?  By that logic, you would have saved a day or two of work and your output would meet all the datasheet's requirements.
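
A minimal sketch of that default-high approach. The module and signal names are hypothetical and not taken from the controller in this thread.

```verilog
// Hypothetical sketch: command pins idle high and go low for one clock only.
module cmd_driver (
    input  wire       clk,
    input  wire       resetn,
    input  wire       issue_cmd,        // high for one clk when a command should go out
    input  wire [2:0] cmd_ras_cas_we,   // {RAS#, CAS#, WE#} levels for that command
    output reg        cs_n,
    output reg        ras_n,
    output reg        cas_n,
    output reg        we_n
);
    always @(posedge clk) begin
        // Default on every cycle: device deselected, all active-low strobes idle high.
        cs_n  <= 1'b1;
        ras_n <= 1'b1;
        cas_n <= 1'b1;
        we_n  <= 1'b1;

        // Override the defaults for exactly one clock when a command is issued.
        if (resetn && issue_cmd) begin
            cs_n                 <= 1'b0;
            {ras_n, cas_n, we_n} <= cmd_ras_cas_we;
        end
    end
endmodule
```

For example, a mode register set would pass `3'b000` with `issue_cmd` high for one clock; on the next clock the pins return to the deselected state without any extra code in the state machine.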
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 01, 2021, 06:25:03 am
Why tRP violation ?
I confirmed with datasheet for DDR3-1600-125, it is just 13.75ns which already is fulfilled by the waveform, note the time between 2 cursors

(https://i.imgur.com/klb4Q4P.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 01, 2021, 07:17:45 am
tRP is the time between a PRECHARGE and an ACTIVATE.  Where is your PRECHARGE after the write?  Or, if you are using AUTO-PRECHARGE, then you need to work out where the AUTO-PRECHARGE took place and call that the beginning time for the tRP.  Remember, you are ACTIVATING a new row in BANK 0 where you came from a previous row in BANK 0.  You need to PRECHARGE the old bank first.  Without the precharge, you may activate a different BANK where you did, but if you are going this route, make sure your ram controller can keep track of what each bank is doing and when, as you will occasionally need different precharges at different places.

Note that in my code, I use 100% manual precharge with smart recognition of what banks are doing what, to allow random access across banks for flexibility.  If you want raw throughput speed, but no bank management, you may pre-engineer your sequence for the optimal command placement to run non-stop through all the banks and close one behind the other, knowing that your reads will always be in a straight line, instead of the optimal read/write scatter capability in my controller, which was designed to adapt to X&Y graphics burst and copy/fill commands.  (IE, the raster width address spacing Y coordinates are each on the next adjacent bank.)

Take a look at my attached image.  (See figure 90)

I have an:
ACTIVATEh'0000,B#1        0ns
WRITE BL8                     +14ns
WRITE BL8                     +8ns  (2 consecutive WRITE BL8s)
time to last written byte  +20ns
PRECHARGE B#1            +16ns   this is tWR.  (If you are using AUTO-PRECHARGE, this is where it will be; though you're not sending the command, it is happening internally in the DDR3.)
ACTIVATEh'0400,B#1      +14ns  this is tRP, time between AUTO-PRECHARGE to ACTIVATE.


This is completely different than a read.  The PRECHARGE is permitted even before the read data is returned. (See figure 72)  If you go the complex route like my controller, you will need to analyze the next instruction coming in before even the current one is going out, to know if the precharge is needed or if there is an access in the same row or a different bank, to achieve the best possible command sequence minimizing wait states.


Your sim looks like you used the read command timing with AUTO-PRECHARGE to ACTIVATE for a write command.  Also, your write command appears to have way too many cycles.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 01, 2021, 07:30:27 am
Look at your Model output, it says the DDR3 did an auto-precharge and when.
Subtract that time from your activate command time and you get 13152ps, or 13.1ns.  That is too short.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 01, 2021, 07:53:14 am
Quote
Subtract that time from your activate command time and you get 13152ps, or 13.1ns.  That is too short.

How exactly did you subtract ?  Could you point to locations inside my waveform ?

By the way, I am using PREA command which means the DDR3 memory will precharge ALL banks
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 01, 2021, 08:31:08 am
It's in your simulation transcript, see: (I'm sure you can do the math...)

(https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/?action=dlattach;attach=1223979)

Double click on each message and your cursor will go there in the waveform.


I did not see a PREA command in your waveform.  If you send a PREA, you can set it to 1 bank or all banks.


Only precharge the banks you want to precharge.  PREA-ALL is special just for when you want to issue a refresh command next as all banks need to be precharged first before a refresh.  The only other situation you may use a precharge-all is if you do not plan on having different banks active, closing and switching 1 while leaving the other ones open.  A precharge-all releases all banks.

Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 01, 2021, 08:47:00 am
A complete ACT-WRITE_BL8-PRE-ACT.
I'm using manual PRE on my write command, so, my ram controller has to place the PRE command there as seen in the snapshot.

Ok, off to sleep...

Warning, do not use AUTO-PREA with the read & write commands if you intend to manually send the PREA command like I do.  If you are using the AUTO-PREA, you will still need to calculate its hidden position to be able to tell when you can send an ACT.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 01, 2021, 03:49:35 pm
Quote
If you are using the AUTO-PREA, you will still need to calculate its hidden position to be able to tell when you can send an ACT.

Could you give some hint on calculating the hidden position ?

I have already checked tWL , tWPRE , tBURST timing parameters, what did I miss ?

(https://i.imgur.com/UVJICYl.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 01, 2021, 08:08:56 pm
Time to PRE after a WRITE_BL8 -> AL+CWL+4+tWR   (See figure 90)

Time from that PRE to ACT -> tRP

Remember to round 'each' time up to the nearest CK clock.

4 is for a BL8, it would be a 2 when running a BC4. (See figure 91)
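
The formula above can be sketched as compile-time constants. Every numeric default here is a placeholder for illustration (only the 13.75ns tRP figure appears in this thread), and the module name is hypothetical; real values must come from the datasheet and the mode-register settings actually in use.

```verilog
// Hypothetical sketch: WRITE -> (auto) PRECHARGE -> ACTIVATE spacing in CK cycles.
module write_to_act_cycles #(
    parameter TCK_PS = 2500,    // assumed clock period in ps
    parameter AL     = 0,       // additive latency, in clocks
    parameter CWL    = 5,       // CAS write latency, in clocks
    parameter BL8    = 1,       // 1 = burst of 8 (4 clocks), 0 = burst chop 4 (2 clocks)
    parameter TWR_PS = 15000,   // write recovery time (placeholder)
    parameter TRP_PS = 13750    // precharge period
);
    // Round each timing parameter up to whole CK cycles separately.
    localparam integer TWR_CK   = (TWR_PS + TCK_PS - 1) / TCK_PS;
    localparam integer TRP_CK   = (TRP_PS + TCK_PS - 1) / TCK_PS;
    localparam integer BURST_CK = BL8 ? 4 : 2;

    // WRITE command to the (possibly hidden) PRECHARGE, then on to ACTIVATE.
    localparam integer WRITE_TO_PRE_CK = AL + CWL + BURST_CK + TWR_CK;
    localparam integer WRITE_TO_ACT_CK = WRITE_TO_PRE_CK + TRP_CK;

    initial
        $display("WRITE->PRE = %0d CK, WRITE->ACT = %0d CK",
                 WRITE_TO_PRE_CK, WRITE_TO_ACT_CK);
endmodule
```

Rounding each term on its own matters: adding the picosecond values first and rounding once can come out one clock short.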
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 02, 2021, 02:02:24 am
Thanks for your advice in previous post.

The following waveform is a bit strange ...
By the way, why would ModelSim log data=xxxx when there are actual dq bits ?

(https://i.imgur.com/aKMzHbc.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 02, 2021, 02:28:58 am
Quote
Thanks for your advice in previous post.
The following waveform is a bit strange ...
By the way, why would ModelSim log data=xxxx when there are actual dq bits ?

Have you assigned the DQ[x:x] output to equal anything?
Also, what is your MASK[x:x] set to?  It needs to be low to write data into the byte location, otherwise the data will not be written.

High mask = mask out data, IE, no data written for that byte position...
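
A small behavioural sketch of that masking rule, which may explain an all-x readback. This is not the Micron model; the module, the task, and the 16-bit width are made up for illustration.

```verilog
// Hypothetical behavioural sketch of one 16-bit memory word with two mask bits.
module dm_write_example;
    reg [15:0] mem_word;
    reg [15:0] dq;
    reg [1:0]  dm;   // dm[0] masks dq[7:0], dm[1] masks dq[15:8]

    // A byte is stored only when its mask bit is low.
    task write_word;
        begin
            if (!dm[0]) mem_word[7:0]  = dq[7:0];
            if (!dm[1]) mem_word[15:8] = dq[15:8];
        end
    endtask

    initial begin
        mem_word = 16'hxxxx;            // never-written location
        dq       = 16'hA55A;

        dm = 2'b11; write_word;         // both bytes masked: nothing is stored
        $display("dm=11 -> %h", mem_word);

        dm = 2'b10; write_word;         // only the lower byte is stored
        $display("dm=10 -> %h", mem_word);

        dm = 2'b00; write_word;         // both bytes are stored
        $display("dm=00 -> %h", mem_word);
    end
endmodule
```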
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 02, 2021, 06:08:37 am
Thanks for the advice.

Before I proceed further to determine the location for falling transition of dm mask signals, may I know why am I having alternating {3, 0} dq bits  (https://github.com/promach/DDR/blob/main/test_ddr3_memory_controller.v#L301)?

(https://i.imgur.com/1hQWPue.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 02, 2021, 08:01:32 am
The DQM is in perfect synchronous timing as you write data.  Dead perfect, parallel.  It should be on the same DDR transmitter clocks.  The only difference is that you may keep those outputs enabled if you like.

When you have more than 8 bits on a DDR3, you get 2x DQM, 1 for the lower 8 bits and 1 for the upper 8 bits.

The idea is to give the ram controller a method of writing only 8 bits at a time, bytes, so that every time your CPU wants to write a byte instead of a full 16 or 32 or 64 bit word, it does not need to read the ram, modify the contents in the cpu, then write the new word back to the ram.

I think GDDR ram just has bit-for-bit data mask capabilities.  With maybe a second read port.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 02, 2021, 09:20:04 am
I was asking about the alternating {3, 0} dq bits which I also see in your modelsim waveform as well.

Let me check my memory controller verilog coding to see what is generating the repeating {3, 0} data pattern

Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 02, 2021, 09:41:16 am
In my sim, the continuous alternating {3,0}, IE 2 bits high, 2 bits low, is not the DQM, it is the DQS, the strobe which I need to send out so that the ram knows when to latch the data I send on the DQ&DQM.  That strobe is in parallel with the CK clock and also has a DQS# negative differential signal not shown in my screenshot.

The DQM I have beginning disabled at 3, then going low to 0 for 8 words, then back to 3 to disable again shows the center of the valid data capture region.  Zooming in, you will see the perfect 90 degree phase offset my DQ output has compared to the DQS output.  Just like the DDR3 best-case scenario recommendation in the data sheet.


Title: Re: DDR3 initialization sequence issue
Post by: promach on June 02, 2021, 12:40:06 pm
Something is so peculiar with the alternating {3, 0} dq bits

The waveform should not have value of 0 for dq signal.

https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L827-L828

(https://i.imgur.com/QSmKAn0.png)
Title: Re: DDR3 initialization sequence issue
Post by: asmi on June 02, 2021, 01:27:54 pm
I commend BrianHG's patience, and I think you are abusing it. Why don't you actually go in and read the damn datasheet, instead of asking us to do that for you over and over again? All questions you asked so far in this thread can be easily answered by reading the documentation. For example, compare what a write burst is supposed to look like (from the datasheet) to what it looks like in your case, and see if you can spot any difference.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 02, 2021, 01:53:55 pm
@asmi  in the previous post, I am trying to find out why when ldq_w and udq_w are carrying value of 2, 4 or 6, dq carry a value of 0 ?
Title: Re: DDR3 initialization sequence issue
Post by: asmi on June 02, 2021, 02:03:22 pm
Quote
@asmi  in the previous post, I am trying to find out why when ldq_w and udq_w are carrying value of 2, 4 or 6, dq carry a value of 0 ?

Why don't you find out first why your DQS starts immediately with the command, as opposed to when it should (after write latency - 1 cycle)?
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 02, 2021, 02:22:04 pm
@asmi I understand your concern about DQS preamble bit, however look carefully in the timing diagram by Micron, it is marked as DONT_CARE.

But let me try your suggestion to maybe get around the tRP violation for WRAP command.

However, this has nothing to do with why DQ carries a value of 0 when ldq_w is not carrying a value of 0
Title: Re: DDR3 initialization sequence issue
Post by: asmi on June 02, 2021, 03:34:09 pm
Code: [Select]
`ifdef USE_x16
wire [(DQ_BITWIDTH >> 1)-1:0] ldq_w = data_to_ram;  // input data stream of 'data_to_ram' is NOT serialized
wire [(DQ_BITWIDTH >> 1)-1:0] udq_w = data_to_ram;  // input data stream of 'data_to_ram' is NOT serialized
assign dq_w = {udq_w, ldq_w};
`else
assign dq_w = data_to_ram;  // input data stream of 'data_to_ram' is NOT serialized
`endif
Your ldq_w and udq_w are both referring to the first 8 bits of the data_to_ram vector.

Also why your dqs'es are so wide?
wire [DQS_BITWIDTH-1:0] dqs = {udqs, ldqs};
wire [DQS_BITWIDTH-1:0] dqs_n = {udqs_n, ldqs_n};

They are supposed to be 1 bit per byte, or 2 in case of x16 DDR3 memory you seem to be using.

Your code is extremely hard to read due to a ton of conditional directives, so it's hard to tell if a piece of code is actually used or not.
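
If the intent is for each half of `dq_w` to carry its own byte, a sketch of the split could look like the following. It assumes `data_to_ram` is `DQ_BITWIDTH` bits wide, which I have not verified against the repository, and the wrapper module is hypothetical.

```verilog
// Hypothetical sketch: give each byte lane its own slice of data_to_ram.
// Assumes data_to_ram is DQ_BITWIDTH bits wide.
module dq_split #(
    parameter DQ_BITWIDTH = 16
) (
    input  wire [DQ_BITWIDTH-1:0] data_to_ram,
    output wire [DQ_BITWIDTH-1:0] dq_w
);
    // Lower byte lane takes the low half, upper byte lane takes the high half.
    wire [(DQ_BITWIDTH >> 1)-1:0] ldq_w = data_to_ram[(DQ_BITWIDTH >> 1)-1:0];
    wire [(DQ_BITWIDTH >> 1)-1:0] udq_w = data_to_ram[DQ_BITWIDTH-1:(DQ_BITWIDTH >> 1)];

    assign dq_w = {udq_w, ldq_w};
endmodule
```

Without the explicit part-selects, assigning the full vector to a half-width wire silently keeps only the low bits, which is how both lanes end up with the same data.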
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 02, 2021, 03:39:52 pm
@asmi  DQS_BITWIDTH and DQ_BITWIDTH are two different parameters.  Look carefully
Title: Re: DDR3 initialization sequence issue
Post by: asmi on June 02, 2021, 04:01:35 pm
Quote
@asmi  DQS_BITWIDTH and DQ_BITWIDTH are two different parameters.  Look carefully

My first comment about dq still stands. Both ldq and udq are assigned the same values, I don't know if it's intentional, but something tells me it's not.
Also please show us the waveform that is coming into the actual memory, as opposed to what comes out of your controller. These two are not necessarily the same.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 02, 2021, 04:47:20 pm
Just so we are clear, the absolute bottom section of my sim, the nodes under 'DDR3_PHY DataPath' is not what I'm transmitting to the actual DQS/DQ IO pins just above.  Those nodes are Altera's DDR input buffer reading the IO pins values above and separating it into the low and high 2x wide, 1/2 speed path.  The values I send into Altera's DDR output pin driver are hidden/not visible in my sim screenshot.  However, it is actually what you see coming out, flipped upside-down, 2 CKs earlier with the 90 degree shift.

Now as for the 'Don't Care' in the datasheet.  Ok, Ok, Ok...  Another lesson in I can't believe I'm saying this:

Looking at the datasheet and seeing 'Don't Care' is not permission to generate sloppy loose code and ugly waveforms.  It is there for a reason.  The DDR3 allows you to share / interleave multiple (not just 2, but even 4 or more) DDR3 rams on the exact same bus.  This is why you have a CS#, and it should only go low for the 1 clock you issue a command, as multiple CS# coming out of your FPGA would each go to a different DDR3 chip to allow individual commanding of each chip sharing every single other wire/IO.  The 'Don't Care' positions are the free area where you can control the other ram chips, as you plan their commands so that their data bus accesses each fit in that spare space.  This means your code should be sharp and fit everything in its place, where it belongs, exactly as in the datasheet; do not bleed into the 'Don't Care' zones.

Ok, it looks as if you have enough understanding of how to use the Sim & DDR3 model, as well as how to debug the violations it gives you.  I have some finishing cleanup touches on work I'm doing here as I want to publish my public domain code within a few days.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 04, 2021, 11:57:05 am
From the modelsim console log, it seems that the data loopback(readback) works.

However, how to simulate "inout" dqs (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L845-L847) signals correctly inside modelsim ?
Note: I tried changing 1'b0 to 1'bz  , but it does not help.

(https://i.imgur.com/Iu930Af.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 04, 2021, 09:52:49 pm
For tristate, you need to assign a 'z' for every wire in the bus.
1'bz will only make 1 wire, bit io_pins[0:0], in the bus tristate, while the rest, io_pins[bus_size-1:1], will be undefined, IE 'x'.  I believe IO conflicts on a shared tristate bus are also shown as an 'X'.  Read the modelsim manual about the waveform legend and what each symbol means.

Properly done non-conflicting tristate in modelsim should show a 'BLUE' trace in the middle and when you place the cursor on it, it should read 'zzzzzzzzzzz'...
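
A minimal sketch of a full-width tristate driver using the replication operator. The module and signal names are hypothetical.

```verilog
// Hypothetical sketch: every bit of the bus is released, not just bit 0.
module tristate_bus #(
    parameter WIDTH = 2
) (
    input  wire             drive_en,   // 1 = this side drives the bus
    input  wire [WIDTH-1:0] data_out,   // value to drive while drive_en is high
    output wire [WIDTH-1:0] data_in,    // value seen on the pins
    inout  wire [WIDTH-1:0] io_pins
);
    // Replicate 'z' across all WIDTH bits; a bare 1'bz is only one bit wide.
    assign io_pins = drive_en ? data_out : {WIDTH{1'bz}};

    // Read back from the pins themselves, so data driven by the other side is visible.
    assign data_in = io_pins;
endmodule
```

My understanding is that a sized `1'bz` widened to a larger bus is padded with zeros, so the upper bits would be actively driven low and would fight whatever the memory model drives. Either way, they are not tristated.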

Title: Re: DDR3 initialization sequence issue
Post by: promach on June 05, 2021, 02:06:55 am
I tried your suggestion and there are no more XXXX for dqs, but why Micron DDR3 simulation model does not assert the dqs signals during RDAP command ?

https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L847

(https://i.imgur.com/9xvEaeo.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 05, 2021, 02:39:38 am
Code: [Select]
assign dqs = {udqs, ldqs};
assign dqs_n = {udqs_n, ldqs_n};
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 05, 2021, 02:58:58 am
Yes, but in your sim, you need to show the 'PINS' in between your DDR3 controller and Micron's ram Model.

Mine seems to work, see:
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 05, 2021, 03:16:49 am
Here you go:

(https://i.imgur.com/1eXhiRv.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 05, 2021, 03:39:46 am
What do the pin definitions look like in your test-bench top?

Are they an 'inout'?

This is my TB:
Code: [Select]
// ********** Results from DDR3_PHY_SEQ.
inout  logic                       DDR3_RESET_n;  // DDR3 RESET# input pin.
inout  logic [DDR3_NUM_CK-1:0]     DDR3_CK_p;     // DDR3_CK ****************** YOU MUST SET THIS IO TO A DIFFERENTIAL LVDS or LVDS_E_3R
inout  logic [DDR3_NUM_CK-1:0]     DDR3_CK_n;     // DDR3_CK ****************** YOU MUST SET THIS IO TO A DIFFERENTIAL LVDS or LVDS_E_3R
                                                  // ************************** port to generate the negative DDR3_CK# output.
                                                  // ************************** Generate an additional DDR_CK_p pair for every DDR3 ram chip.

inout  logic                       DDR3_CKE;      // DDR3 CKE

inout  logic                       DDR3_CS_n;     // DDR3 CS#
inout  logic                       DDR3_RAS_n;    // DDR3 RAS#
inout  logic                       DDR3_CAS_n;    // DDR3 CAS#
inout  logic                       DDR3_WE_n;     // DDR3 WE#
inout  logic                       DDR3_ODT;      // DDR3 ODT

inout  logic [DDR3_WIDTH_ADDR-1:0] DDR3_A;        // DDR3 multiplexed address input bus
inout  logic [DDR3_WIDTH_BANK-1:0] DDR3_BA;       // DDR3 Bank select
inout  logic [DDR3_WIDTH_DM-1  :0] DDR3_DM;       // DDR3 Write data mask. DDR3_DM[0] drives write DQ[7:0], DDR3_DM[1] drives write DQ[15:8]...
                                                  // ***on x8 devices, the TDQS is not used and should be connected to GND or an IO set to GND.

inout  logic [DDR3_WIDTH_DQ-1:0]   DDR3_DQ;       // DDR3 DQ data IO bus.
inout  logic [DDR3_WIDTH_DQS-1:0]  DDR3_DQS_p;    // DDR3 DQS ********* IOs. DQS[0] drives DQ[7:0], DQS[1] drives DQ[15:8], DQS[2] drives DQ[23:16]...
inout  logic [DDR3_WIDTH_DQS-1:0]  DDR3_DQS_n;    // DDR3 DQS ********* IOs. DQS[0] drives DQ[7:0], DQS[1] drives DQ[15:8], DQS[2] drives DQ[23:16]...
                                                  // ****************** YOU MUST SET THIS IO TO A DIFFERENTIAL LVDS or LVDS_E_3R
                                                  // ****************** port to generate the negative DDR3_DQS# IO.

Title: Re: DDR3 initialization sequence issue
Post by: promach on June 05, 2021, 03:48:44 am
See https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L184-L187

Code: [Select]
inout ldqs, // lower byte data strobe
inout ldqs_n,
inout udqs, // upper byte data strobe
inout udqs_n
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 05, 2021, 04:01:27 am
That's your controller.  What about your testbench source and its wiring to Micron's ddr3.v?
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 05, 2021, 04:04:11 am
I had also done the same for the testbench, see https://github.com/promach/DDR/blob/main/test_ddr3_memory_controller.v#L125-L128 , where I had already used inout for the dqs signals

So, something else is wrong, maybe with the mode registers settings ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 05, 2021, 04:16:11 am
I don't know.  It should work.  It seems to be outputting data.  I usually see both coming up together with a preamble on the DQS.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 05, 2021, 05:59:39 am
Ok, the following waveform shows that Micron DDR3 simulation model outputs the dqs signals correctly.

So, this line of coding https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L845-L847 for udqs is still wrong

Code: [Select]
assign udqs = ((main_state == STATE_WRITE) || (main_state == STATE_WRITE_AP) ||
   (main_state == STATE_WRITE_DATA)) ?
udqs_w : {(DQS_BITWIDTH >> 1){1'bz}};  // dqs value of 1'bz is for input

(https://i.imgur.com/2WBibLZ.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 05, 2021, 06:38:37 am
Query in the sim the value of DQS_BITWIDTH .
Why are you dividing it by 2?
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 05, 2021, 06:45:32 am
you mean   (DQS_BITWIDTH >> 1)    ?

Because of https://github.com/promach/DDR/blob/main/test_ddr3_memory_controller.v#L191

dqs consists of udqs and ldqs
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 05, 2021, 07:56:39 am
I tried changing to the following, but still does not change anything.

Code: [Select]
wire [DQS_BITWIDTH-1:0] dqs
= ((main_state == STATE_WRITE) || (main_state == STATE_WRITE_AP) ||
       (main_state == STATE_WRITE_DATA)) ?
{udqs, ldqs} : {DQS_BITWIDTH{1'bz}};  // dqs value of 1'bz is for input

wire [DQS_BITWIDTH-1:0] dqs_n
= ((main_state == STATE_WRITE) || (main_state == STATE_WRITE_AP) ||
       (main_state == STATE_WRITE_DATA)) ?
{udqs_n, ldqs_n} : {DQS_BITWIDTH{1'bz}};  // dqs value of 1'bz is for input
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 05, 2021, 09:00:21 am
But you need to read the data from the wires DQS & DQS_n, not ldqs/udqs.  Those are buried before the INOUT port.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 05, 2021, 09:05:28 am
So, how should I modify https://github.com/promach/DDR/blob/main/test_ddr3_memory_controller.v#L191 ?

Code: [Select]
wire [DQS_BITWIDTH-1:0] dqs = {udqs, ldqs};
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 05, 2021, 09:54:12 am
Issue resolved, see https://github.com/promach/DDR/commit/5724b1018d1fb6274c5afe1c3ce2487a9d20e60b
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 07, 2021, 09:06:41 am
I have finished simulating the Micron DDR3 controller,

the DDR3 schematics and verilog code are located at https://github.com/promach/DDR

However, I have concerns about implementing it on the Spartan-6_XC6SLX16_FTG256 FPGA.

Besides ODDR2 primitive (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=224) , what other primitives do I need in this case ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 08, 2021, 12:33:40 pm
Besides ODDR2 primitive (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=224) , what other primitives do I need in this case ?

If your design can run at the DDR3 clock speed then ODDR, IDDR and IDELAY are all you need (not counting clock generation).

If not, you'll need OSERDES and ISERDES for IO.

Title: Re: DDR3 initialization sequence issue
Post by: promach on June 09, 2021, 02:26:37 am
For the IODELAY2 primitive (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=132), how do I generate a 90 degree phase shift on the incoming DQ data bits using the integer range 0 to 255 of the IDELAY_VALUE attribute ?

(https://i.imgur.com/rmDY3rY.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 09, 2021, 03:09:04 am
It is complicated. Xilinx has an app note which explains how to use SERDES and calibrate delays in Spartan-6: xapp1064.

Title: Re: DDR3 initialization sequence issue
Post by: promach on June 09, 2021, 06:23:05 am
In xapp1064 (https://www.xilinx.com/support/documentation/application_notes/xapp1064.pdf#page=4) ,

1) Why is there a master and a slave ?

2) Should I do DDR Data Reception using methods in Figure 5 or Figure 6 ?


(https://i.imgur.com/bwKR6kt.png)

(https://i.imgur.com/VTz519X.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 09, 2021, 01:51:58 pm
ug381 explains these. Masters are associated with P pins. Slaves are associated with N pins. For differential signals they work together.

I don't think you can use DQS strobes to run PLL.

DQS may be able to work as a clock, but you would need to transfer the results to a continuous clock domain somehow.

Spartan-6 has MCBs - special hardware blocks for DDR memory. I don't know what they do and how they work. But because they exist, doing it on your own is not the mainstream approach. Therefore you need to be creative.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 09, 2021, 03:07:54 pm
It seems that https://www.xilinx.com/support/documentation/white_papers/wp249.pdf#page=5 (https://www.xilinx.com/support/documentation/white_papers/wp249.pdf#page=5) explains better.  Could you comment about the DPA state machine to handle dynamic skew/jitter ?

As for IODELAY2 primitive, I have some confusion on the delay calculation equation (https://www.xilinx.com/support/answers/35783.html).

I did some calculation using TTAP8 and n=7 (https://www.xilinx.com/support/documentation/data_sheets/ds162.pdf#page=47) , but could not get the result of 13.57nS

Besides, how shall I make use of the bitslip function (https://www.xilinx.com/support/documentation/application_notes/xapp1208-bitslip-logic.pdf) in the ISERDES primitive ?



Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 09, 2021, 04:00:44 pm
It seems that https://www.xilinx.com/support/documentation/white_papers/wp249.pdf#page=5 (https://www.xilinx.com/support/documentation/white_papers/wp249.pdf#page=5) explains better.  Could you comment about the DPA state machine to handle dynamic skew/jitter ?

DQ lines are synchronized to DQS by design. They will take different paths within the FPGA, so you need to find an appropriate delay. Once you find the appropriate delay, they're very unlikely to desynchronize because they move in the same direction and therefore temperature and voltage variations will affect them both roughly the same.

The relationship between DQS and your internal clock will depend on the round-trip delay, so it will vary. I noticed roughly 50-100 ps variations with SODIMM as the chip gets warmer. But the synchronization between them doesn't need to be perfect.

IMHO, if you find good delay values once, you can use them indefinitely. So, you can do the calibration once during startup, or you can do it separately and remember the suitable delay values somewhere.
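
One common way to "find a good delay value once" is to sweep the delay taps, note which settings read a known pattern back correctly, and then park in the middle of the widest passing window. The Python sketch below only models that selection step; the pass/fail list is made-up data, and how you actually test each tap on hardware (for example with an MPR read) is left out.

```python
def pick_center_tap(passes):
    """Return the tap index at the centre of the longest run of passing taps.

    `passes` holds one boolean per delay tap: True means a known test
    pattern was read back correctly at that tap. Returns None if no tap passed.
    """
    best_start, best_len = None, 0
    run_start = None
    # A trailing False acts as a sentinel that closes a run ending at the last tap.
    for tap, ok in enumerate(list(passes) + [False]):
        if ok and run_start is None:
            run_start = tap
        elif not ok and run_start is not None:
            run_len = tap - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2


# Hypothetical sweep result: taps 5..13 read correctly, everything else fails.
sweep = [5 <= tap <= 13 for tap in range(32)]
centre_tap = pick_center_tap(sweep)
```

The centre of the window gives the most margin on both sides, which is what lets the value survive small temperature and voltage drift.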

As for IODELAY2 primitive, I have some confusion on the delay calculation equation (https://www.xilinx.com/support/answers/35783.html).

I did some calculation using TTAP8 and n=7 (https://www.xilinx.com/support/documentation/data_sheets/ds162.pdf#page=47) , but could not get the result of 13.57nS

Looks correct to me:

424 ps / 8 * 256 = 13568 ps = 13.57 ns
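
The same arithmetic as a small Python sketch, taking the 424 figure as the delay of 8 taps in picoseconds. The quarter-period helper is an extra illustration for the earlier 90 degree question; the 400 MHz clock is only an example value, and real tap delays vary with process, voltage and temperature, so treat the result as a nominal starting point rather than a calibrated setting.

```python
TTAP8_PS = 424      # delay of 8 taps in ps, as used in the calculation above
TAPS_TOTAL = 256    # the calculation multiplies by 256 tap positions


def delay_ps(taps, ttap8_ps=TTAP8_PS):
    """Nominal delay in ps for a given number of taps."""
    return ttap8_ps / 8 * taps


def taps_for_quarter_period(clock_mhz, ttap8_ps=TTAP8_PS):
    """Nominal tap count closest to a 90 degree shift at the given clock."""
    period_ps = 1e6 / clock_mhz
    return round(period_ps / 4 / (ttap8_ps / 8))


max_delay_ns = delay_ps(TAPS_TOTAL) / 1000      # 13.568 ns
quarter_taps_400mhz = taps_for_quarter_period(400)
```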

Besides, how shall I make use of the bitslip function (https://www.xilinx.com/support/documentation/application_notes/xapp1208-bitslip-logic.pdf) in the ISERDES primitive ?

Bitslip is used when the data you receive is off by one bit (or several bits). I don't think you need it for DDR3 because you know precisely when the transmission starts, so you simply bring the ISERDES out of reset at the exact moment.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 09, 2021, 09:05:37 pm
Bitslip is used when the data you receive is off by one bit (or several bits). I don't think you need it for DDR3 because you know precisely when the transmission starts, so you simply bring the ISERDES out of reset at the exact moment.

My DDR3 controller read-logic operates on a separate adjustable PLL phase, using the DQS coming from the DDR3s to re-align and determine the starting byte position based on the preamble and continuous 0101 pattern.  I have an allowable adjustable window region of +/- 1 CK.  The reason I need that range is the 'precise-start' is not guaranteed based on the length of tracks between DDR3 and FPGA, plus the power-up tuning of my reference RDQ_CK sampling PLL.

The story is different if you are using the DQS input as your actual capture clock.

Pros and cons: Using DQS as your read clock may more easily allow higher speed transfers above 400MHz, but your FPGA will require dedicated DQS circuitry tied to dedicated DQ lanes.

Sampling the DQS in parallel with DQ, the method I am using, relegates me to slower speeds, or to having 1-4 DDR3 RAM chips on the PCB, or 1 SODIMM module, but I can now run DDR3 on Cyclone III/IV, which only have a single unbalanced DQS IO, since I can access the DDR3 DQS lines as regular DQ lines in software-simulated differential.  The newer Cyclone V, 10 and MAX 10 have differential DQS lines designed for DDR3 & 4.  My solution works on these FPGAs as well, with no change in code...
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 10, 2021, 01:27:43 am
Quote
I have an allowable adjustable window region of +/- 1 CK.  The reason I need that range is the 'precise-start' is not guaranteed based on the length of tracks between DDR3 and FPGA, plus the power-up tuning of my reference RDQ_CK sampling PLL.

The story is different if you are using the DQS input as your actual capture clock.

What do you mean by RDQ_CK sampling PLL ?  Are you not using DQS as your actual capture clock ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 10, 2021, 02:59:50 am
Quote
I have an allowable adjustable window region of +/- 1 CK.  The reason I need that range is the 'precise-start' is not guaranteed based on the length of tracks between DDR3 and FPGA, plus the power-up tuning of my reference RDQ_CK sampling PLL.

The story is different if you are using the DQS input as your actual capture clock.

What do you mean by RDQ_CK sampling PLL ?  Are you not using DQS as your actual capture clock ?
No.  Since the read DQS comes in parallel with the DQ and my outgoing CK clock, I have 1 tunable PLL output tuned to the optimum read phase using the MPR System Read Calibration (see figure 59) during power-up.  With this setup, your chosen FPGA does not require dedicated DQS circuitry, just any DDR IO for the DQS pins, while dedicated DQS pins remain compatible.  (Even DQ groups may be ignored; however, I still recommend wiring them properly.)  The con is that I need 3 PLL outputs to run my system: 1 for CK and logic, 1 at 90 degree phase for generating the write DQ & DQM outputs, and 1 tunable PLL output for the read sampling.  So long as your FPGA has simple DDR or SERDES IOs that can handle at least 600 Mbps, my controller will work on hardware and properly simulate with any FPGA vendor's DDR IOBUF IP without resorting to special simulation bypass code.

The other con is that the lengths of your CK, DQS, DQ & DQM need to be matched.  This means 1 or 2 DDR3 ICs.  You can get away with 4 wired in 1 row (so long as the CK is routed to the middle of the RAM chips), or use a laptop SODIMM module with 4 on top and 4 underneath with the CK routed from the center, single or dual rank.  My controller can also output multiple CK pairs if you want support for more DDR3, or to place 2 DDR3 on one side of the FPGA and another 2 on the other side.

Without write leveling, I cannot guarantee operation with single or multiple PC memory modules with 8 or 16 RAM chips.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 10, 2021, 03:43:45 am
Since you have done MPR calibration during initial power-up, may I know why would you need "1 tunable PLL output for the read sampling" ?

May I also know how do you phase shift your incoming DQ data bits such that it is sampled at its middle by incoming DQS strobe ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 10, 2021, 04:11:05 am
Since you have done MPR calibration during initial power-up, may I know why would you need "1 tunable PLL output for the read sampling" ?

May I also know how do you phase shift your incoming DQ data bits such that it is sampled at its middle by incoming DQS strobe ?
I am not using the DQS strobe as a clock for sampling the DQ.  I'm using the DQS inputs as a 'data_enable' DDR input where the 'preamble' is used as a sync/reset read buffer position.  Remember, when reading data, the DQS is in perfect sync with the read DQ.

The tunable PLL output clock goes to the 'input clock' for the DQ & DQS DDR input buffers and subsequent read data FIFO's input clock.

The PLL has the following optional inputs:

Phase_select,                (Selects which one of the many outputs of a single PLL which you may wish to adjust the phase)
Phase_step_enable,       (Steps the selected PLL output's phase by 1/16th to 1/64th of the PLL's reference clock output, IE 64 steps will shift the selected output by a perfect 360 degrees.)
Phase_direction,          ( Step left or right.)

I'm only using 1 PLL, it's just that I have 3 outputs enabled and I am adjusting the phase of clk #2, while I set the parameters so that clk #1 is at 90 degrees and clk #0 is my reference 0 degree and the system clock.  (So far, the power-up default phase 0 has always been chosen.  I doubt it would move until you have the memory a few inches away from the FPGA, or you are going through a connector to a memory module.  Then, I only expect the phase to move by 1-2 steps to the right.)
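
To put numbers on those steps, here is a small Python sketch. It assumes 64 steps per full cycle of the adjusted clock (the finest case mentioned above; some PLLs use fewer), and the 400 MHz figure is only an example clock.

```python
def phase_shift_degrees(steps, steps_per_cycle=64):
    """Phase of the tunable output after `steps` steps (negative = other direction)."""
    return (steps * 360.0 / steps_per_cycle) % 360.0


def phase_shift_ps(steps, clock_mhz, steps_per_cycle=64):
    """The same shift expressed as time, for a clock of `clock_mhz`."""
    period_ps = 1e6 / clock_mhz
    return steps * period_ps / steps_per_cycle


quarter_turn = phase_shift_degrees(16)     # 16 of 64 steps
full_turn = phase_shift_degrees(64)        # wraps back to 0
two_steps_ps = phase_shift_ps(2, 400)      # "1-2 steps to the right" at 400 MHz
```

At these assumed values, a 1-2 step correction is well under 100 ps, which is in the same range as the board-level delay differences being tuned out.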

All Altera, Xilinx, & Lattice PLLs have the same feature to step-adjust, in real time, each of the PLL's multiple outputs individually with just 3 control signals.

My DDR3 controller (coming soon) is fully vetted and fully functional on real hardware.  It's currently running on Arrow's 37$ DECA board seen here: https://www.eevblog.com/forum/fpga/arrow-deca-max-10-board-for-$37/msg3453256/#msg3453256 (https://www.eevblog.com/forum/fpga/arrow-deca-max-10-board-for-$37/msg3453256/#msg3453256)

For playing, it is well worth the 37$ as it has so much on it including a 150$ MAX 10 FPGA with 512MB DDR3 ram, and a shit load of peripherals like Ethernet and HDMI, with demo code for running each.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 10, 2021, 06:58:40 am
Quote
The tunable PLL output clock goes to the 'input clock' for the DQ & DQS DDR input buffers and subsequent read data FIFO's input clock.

The PLL has the following optional inputs:

Phase_select,                (Selects which one of the many outputs of a single PLL which you may wish to adjust the phase)
Phase_step_enable,       (Steps the selected PLL output's phase by 1/16th to 1/64th of the PLL's reference clock output, IE 64 steps will shift the selected output by a perfect 360 degrees.)
Phase_direction,          ( Step left or right.)

The issue is that Xilinx ISE clock wizard coregen for PLL (https://www.xilinx.com/support/documentation/ip_documentation/clk_wiz/v3_5/clk_wiz_gsg521.pdf) does not actually have the capability to do the above quoted phase stepping.  Please correct me if wrong.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 10, 2021, 07:15:17 am
Ever hear of Google?

Anyways, see here:
https://www.google.com/search?client=firefox-b-e&q=Xilinx+ISE+PLL+clock+phase+stepping (https://www.google.com/search?client=firefox-b-e&q=Xilinx+ISE+PLL+clock+phase+stepping)

Follow the links...
Get to the document XAPP888....
Go to table 18 as an example.
The clock wizard probably can generate for you a default set of settings and you just address the controls you want to dynamically change.

The difference between Xilinx and Altera is for Xilinx, you provide an integer number for the divide from the main PLL oscillator frequency and an integer for the phase and duty cycle instead of a step & direction.  (Though, Altera also has full PLL reconfiguration which boils down to these same controls...)  Lattice also re-configures just like Xilinx, just different address locations for the settings and different integers.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 10, 2021, 07:20:38 am
XAPP888 has not supported Xilinx ISE since 2014.

I had also confirmed that MMCM IP is not available inside ISE coregen.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 10, 2021, 07:29:28 am
What about the following inputs for the PLL core:

Code: [Select]
PSCLK    Input  Dynamic Phase Shift Clock: Clock for use in dynamic phase shifting.
PSEN     Input  Dynamic Phase Shift Enable: Starts a dynamic phase shift transaction.
PSINCDEC Input  Dynamic Phase Shift increment/decrement: When '1', increments the phase shift of the output clock; when '0', decrements the phase shift.
PSDONE   Output Dynamic Phase Shift Done: Completes a dynamic phase shift transaction.

(LOL, Dead identical to Altera other than the 'exact' name of each IO port...)

I don't think Xilinx would remove a fundamental feature of any PLL technology.
Read the latest user guides on their Spartan-6 / 7-series PLL functions.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 10, 2021, 07:34:25 am
Did you try enabling 'Dynamic Phase Shift' in the clock generation tool? ? ?
It's on the FIRST PAGE OF THE SETUP WIZARD!

The IOs I mentioned are right there in your documentation...

Time to read...
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 10, 2021, 07:38:04 am
I enabled Dynamic Phase Shift Ports (https://www.xilinx.com/support/documentation/ip_documentation/clk_wiz/v3_5/clk_wiz_gsg521.pdf#page=10) inside ISE clocking wizard coregen, however I have the following issues about unsupported frequencies marked as XXX in the table:

(https://i.imgur.com/Rjr3bZw.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 10, 2021, 07:41:52 am
305 doesn't divide evenly into your source clock, that is unless you are feeding the PLL a 305MHz crystal.
Try 400MHz, or 350MHz, or 300MHz, or 500MHz.  (All easily compatible with a 50MHz source clock)
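
A quick way to sanity-check a target frequency is to look for an integer multiply/divide pair that hits it exactly. The Python sketch below does only that. The limits of 32 are placeholder assumptions, not values taken from this thread; check the datasheet for the real ranges (and the VCO limits) of the block you use. Passing this check also does not guarantee the wizard accepts the setting, since it can apply other restrictions.

```python
from fractions import Fraction


def find_mult_div(source_mhz, target_mhz, max_mult=32, max_div=32):
    """Return (multiply, divide) with target = source * multiply / divide, or None.

    max_mult and max_div are illustrative limits only.
    """
    ratio = Fraction(target_mhz) / Fraction(source_mhz)
    # Fraction keeps the ratio in lowest terms, so this is the smallest pair;
    # if it does not fit, no larger multiple of it will either.
    if ratio.numerator <= max_mult and ratio.denominator <= max_div:
        return ratio.numerator, ratio.denominator
    return None


for target in (305, 300, 350, 400, 500):
    print(target, "MHz from 50 MHz:", find_mult_div(50, target))
```

With these assumed limits, 305 MHz needs 61/10 and falls outside the range, while the round targets reduce to a small integer multiplier.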

You need to read on the limitations of the PLL.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 10, 2021, 07:59:22 am
100MHz, 200MHz, 300MHz, 400MHz, 500MHz are all not supported when Dynamic Phase Shift is enabled.

Note: I am using 50MHz source clock
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 10, 2021, 08:00:30 am
You will need to ask a Xilinx user why.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 11, 2021, 02:06:25 am
Someone told me that dynamic phase shift and clock multiplication (DFS) are mutually exclusive DCM options.

(https://i.imgur.com/qL8ZgR7.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 11, 2021, 08:43:51 am
@NorthGuy

Quote
Bitslip is used when the data you receive is off by one bit (or several bits). I don't think you need it for DDR3 because you know precisely when the transmission starts, so you simply bring the ISERDES out of reset at the exact moment.

Do I really need a PLL if I already have ISERDES ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 11, 2021, 03:01:21 pm
@NorthGuy

Quote
Bitslip is used when the data you receive is off by one bit (or several bits). I don't think you need it for DDR3 because you know precisely when the transmission starts, so you simply bring the ISERDES out of reset at the exact moment.

Do I really need a PLL if I already have ISERDES ?

ISERDES will need a clock. You can either bring an external clock from DQS and route it through BUFIO, or generate it with PLL/DCM and route it through internal clock buffers.

You will need to adjust the clock phase somehow.

For reading, you can use DQS to sample DQs (the canonical way), or you can use a clock generated by DCM (the BrianHG's way). Either way, you need to produce a phase shift.

The phase shift may be accomplished by IODELAY or by DCM.

For DQS, you can apply IODELAY to the DQS input. Obviously, you cannot use DCM to phase shift DQS.

For BrianHG's method, you certainly can use the DCM phase shift mechanism. Routing the clock through IODELAY is another possibility. I don't know if you can route the internal clock through IODELAY with Spartan-6. I know that you can do this with 7-series.

You do not need to calibrate dynamically. You can calibrate once, then hard-code the phase shift/delay into your design. This will work for one board only - moving to a different board will require re-calibration. But you can have a separate design for calibration only. Such design will produce the numbers which you plug into your main design.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 11, 2021, 03:14:04 pm
For pages 5 and 17 of https://www.xilinx.com/support/documentation/application_notes/xapp1064.pdf , I have a few questions.

1) In https://github.com/mithro/soft-utmi/blob/master/hdl/third_party/XAPP1064-serdes-macros/Verilog_Source/Macros/serdes_1_to_n_data_ddr_s8_diff.v#L216-L256 and Figure 6, may I know how this phase detection state machine works, and which portion of the rest of the code belongs to the calibration state machine ?

2) For figure 18, what do "USE_DOUBLER=TRUE"  and  "I_INVERT=TRUE"  mean ?  What is the purpose of the dotted line labelled as "Serdes Strobe" ?

3) For Figure 6, how does the signal "User BITSLIP" (https://www.xilinx.com/support/documentation/application_notes/xapp1208-bitslip-logic.pdf) work for data reception ?  In the upper block, why is "Master IDELAY" connected to two-inputs "BUFIO2_2CLK"  ?

4) How do I turn on DDR mode for ISERDES primitive (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=136) ?  and how is this different from IDDR primitive (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=123) ?

(https://i.imgur.com/0GQoacK.png)

(https://i.imgur.com/ruJWNxA.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 11, 2021, 04:54:09 pm
1) In https://github.com/mithro/soft-utmi/blob/master/hdl/third_party/XAPP1064-serdes-macros/Verilog_Source/Macros/serdes_1_to_n_data_ddr_s8_diff.v#L216-L256 and Figure 6, may I know how this phase detection state machine works, and which portion of the rest of the code belongs to the calibration state machine ?

I don't know the details. You probably can figure this out from the source code. Also there's a huge chapter "Phase Detector Calibration Mechanisms" in ug381.

2) For figure 18, what do "USE_DOUBLER=TRUE"  and  "I_INVERT=TRUE"  mean ?  What is the purpose of the dotted line labelled as "Serdes Strobe" ?

USE_DOUBLER and I_INVERT are attributes of the BUFIO2 primitive. You can read about BUFIO2 in ug382.

SERDESSTROBE is an output from the clock buffer which produces a 1-clock-wide strobe for ISERDES, allowing ISERDES to distinguish where the boundary between serial words is. It feeds the IOCE input of the SERDES.

3) For Figure 6, how does the signal "User BITSLIP" (https://www.xilinx.com/support/documentation/application_notes/xapp1208-bitslip-logic.pdf) work for data reception ?  In the upper block, why is "Master IDELAY" connected to two-inputs "BUFIO2_2CLK"  ?

You can produce a pulse on BITSLIP. This will shift outputs of ISERDES by 1 bit. So, you can keep doing this until you get the perfect alignment.

BUFIO2_2CLK is the same as BUFIO2, only differential - two inputs are non-inverted and inverted clocks.

4) How do I turn on DDR mode for ISERDES primitive (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=136) ?  and how is this different from IDDR primitive (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=123) ?

Set the DATA_RATE attribute of the ISERDES primitive to "DDR". You will also need to supply 2 clocks to it, CLK0 and CLK1 (for rising and falling edges respectively).

IDDR is a simpler primitive - it doesn't have CLKDIV, IOCE etc.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 12, 2021, 01:11:40 am
@NorthGuy Thanks for your technical explanation.

I need some time to digest your reply.

By the way, I have found the calibration state machine coding at https://github.com/mithro/soft-utmi/blob/master/hdl/third_party/XAPP1064-serdes-macros/Verilog_Source/Macros/serdes_1_to_n_data_ddr_s8_diff.v#L141-L204

I might need to consult @mithro about the state machines mechanism if I could not fully understand the xilinx documents
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 13, 2021, 04:14:05 am
Quote
BUFIO2_2CLK is the same as BUFIO2, only differential - two inputs are non-inverted and inverted clocks.

@NorthGuy

For Figure 6, can I replace the BUFIO2_2CLK primitive with just a BUFIO2 primitive ?  If not, why ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 13, 2021, 02:54:15 pm
Quote
BUFIO2_2CLK is the same as BUFIO2, only differential - two inputs are non-inverted and inverted clocks.

@NorthGuy

For Figure 6, can I replace the BUFIO2_2CLK primitive with just a BUFIO2 primitive ?  If not, why ?

I don't know. There must be a mechanism which keeps both the delays the same. I suspect BUFIO2_2CLK is necessary for this purpose. Once you understand how the calibration works, you will know if it is possible to replace it with BUFIO2.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 16, 2021, 04:54:24 am
DDR's DQ signals are already in parallel, so what is the purpose of using IOSERDES primitives for serial-to-parallel (https://github.com/mithro/soft-utmi/blob/master/hdl/third_party/XAPP1064-serdes-macros/Verilog_Source/Macros/serdes_1_to_n_data_ddr_s8_diff.v) and parallel-to-serial (https://github.com/mithro/soft-utmi/blob/master/hdl/third_party/XAPP1064-serdes-macros/Verilog_Source/Macros/serdes_n_to_1_ddr_s8_diff.v) bit conversion ?
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 16, 2021, 07:39:05 am
Someone told me that I could send a write of 8w bits to the memory controller, and the memory controller issues 8 writes of w bits to the memory, where w is the data width of your memory interface.

This literally means SERDES_RATIO=8   if I interpret the mechanism correctly. 
Please correct me if wrong.
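
As I understand the "8w bits in, 8 writes of w bits out" idea, it can be modelled in Python like this. The beat order (least significant slice first) and the 16-bit width are my own assumptions for illustration, not something taken from the controller code.

```python
def split_into_beats(word, width, beats=8):
    """Split a (beats * width)-bit word into `beats` slices of `width` bits.

    Assumption: the least significant slice goes out first.
    """
    mask = (1 << width) - 1
    return [(word >> (i * width)) & mask for i in range(beats)]


# One 128-bit user word for a 16-bit wide memory interface (w = 16).
user_word = 0x0123456789ABCDEF0011223344556677
beats = split_into_beats(user_word, width=16)
print([hex(b) for b in beats])
```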

By the way, any idea how this https://github.com/promach/DDR/blob/main/phase_detector.v does phase detection work ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 16, 2021, 02:12:23 pm
Someone told me that I could send a write of 8w bits to the memory controller, and the memory controller issues 8 writes of w bits to the memory, where w is the data width of your memory interface.

This literally means SERDES_RATIO=8   if I interpret the mechanism correctly. 
Please correct me if wrong.

The transmission is in bursts. A full burst is 8 bits per DQ line. You can also do a half-burst (4 bits per DQ line), but there's no performance gain when you do this.

This means that serdes ratio of 4 is the most convenient. 8 is the best if you don't do half-bursts. Although, nothing precludes you from  using any other ratio.

The ratio dictates the speed of the clock which you use to communicate with SERDES. For example, if your DDR3 clock is 600 MHz, then 4:1 SERDES will dictate 300 MHz clock. 8:1 SERDES will dictate 150 MHz clock.
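
The clock relationship above, written out as a small Python sketch. The only assumption is the usual DDR one: each pin carries two bits per DDR3 clock cycle.

```python
def fabric_clock_mhz(ddr_clock_mhz, serdes_ratio):
    """Clock needed on the parallel (fabric) side of an n:1 SERDES."""
    bit_rate_mbps = 2 * ddr_clock_mhz       # DDR: two bits per clock per pin
    return bit_rate_mbps / serdes_ratio


clk_4to1 = fabric_clock_mhz(600, 4)   # 300 MHz
clk_8to1 = fabric_clock_mhz(600, 8)   # 150 MHz
```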
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 17, 2021, 01:57:01 pm
What is the actual purpose (https://www.xilinx.com/support/documentation/application_notes/xapp1064.pdf#page=17) of the serdesstrobeb signal which is the B serdes strobe from BUFIO2 (https://github.com/promach/DDR/blob/main/clock_generator_ddr_s8_diff.v#L67) ?

(https://i.imgur.com/KRxVbPj.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 17, 2021, 04:18:28 pm
What is the actual purpose (https://www.xilinx.com/support/documentation/application_notes/xapp1064.pdf#page=17) of the serdesstrobeb signal which is the B serdes strobe from BUFIO2 (https://github.com/promach/DDR/blob/main/clock_generator_ddr_s8_diff.v#L67) ?

SERDESSTROBE is a CE signal to be used as IOCE pin in the ISERDES. ug381 should have a schematics of ISERDES which shows how it's used.

When you produce clock for ISERDES with BUFIO2 (or other clock buffers), they produce SERDESSTROBE for you which you connect along with the clock to the ISERDES.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 18, 2021, 01:44:04 am
Let me rephrase my question:

Do I need serdesstrobeb when serdesstrobea is available ?
In other words, is serdesstrobeb == serdesstrobea ?
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 18, 2021, 01:57:20 pm
1) Does anyone know how this https://github.com/promach/DDR/blob/main/phase_detector.v works internally ?

2) What does it mean by MAX in Figure 9 of XAPP1064 appnote (https://www.xilinx.com/support/documentation/application_notes/xapp1064.pdf#page=9) ?

3) Could anyone explain what it means by Early Data Sampling and Late Data Sampling  (https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=88)?

4) As for why pdcounter is 5 bits wide, someone told me that the verilog code only supports 32 (which is equivalent to 2^5) steps, but that is only 1/8 of the total possible delay steps (256 delay taps) ?

(https://i.imgur.com/m5Bzjey.png)

(https://i.imgur.com/UuQziSQ.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 18, 2021, 03:25:12 pm
Let me rephrase my question:

Do I need serdesstrobeb when serdesstrobea is available ?
In other words, is serdesstrobeb == serdesstrobea ?

The code in your picture seems to produce two clocks - A and B. Each has an associated SERDESSTROBE - A and B, so they're not the same.
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 18, 2021, 04:27:17 pm
1) Does anyone know how this https://github.com/promach/DDR/blob/main/phase_detector.v works internally ?

I haven't looked at the code. I can only comment on the general principle.

Master and slave data are delayed with an offset and sampled with the same clock. When there's an edge in data the VALID signal produces a pulse and INCDEC is set to '1' or '0' depending on the master and slave samples being different or the same. The code is supposed to aggregate these readings. If there is a skew (that is INCDEC is either always '1' or '0') most of the time, your code is supposed to correct the delay.

The diagram you posted is supposed to be read from top to bottom - the top two lines produce the same reading for both master and slave, thus INCDEC is set to '0'. The delay is then decremented, which shifts data relative to the clock - look at the second set of lines. But the master and slave readings are still the same, so INCDEC is still '0'. So, they decrement the delay again - look at the third set of lines. Still the same, so they do another decrement - look at the fourth (last) set of lines. This time, master and slave readings are different, so INCDEC is now '1'. Therefore they increment the delay by '1', going back to the third set of lines. This continues indefinitely, keeping the data aligned to the clock.

The pdcount simply adds some stability. Instead of doing delay increments/decrements immediately, they accumulate INCDEC values in the pdcount register. If the INCDEC values are mixed (meaning alignment is close to optimum), pdcount varies around "10000". If ones start to prevail, pdcount goes to "11111" and they do the increment. If zeroes start to prevail, pdcount moves to "00000" and they do the decrement. Of course you can use any number of bits - the more bits you use, the more stable and less responsive the calibration becomes.

The drawback is you need data to maintain calibration. If data stops, the calibration stops too. As applied to DDR3, this means that you need to read data periodically, or you lose calibration. That's where calibrated delays in 7-series come handy.
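
The pdcount filtering described above can be modelled in a few lines of Python. This is a behavioural sketch of the principle only, not a translation of the XAPP1064 source. Restarting the counter from the midpoint after each adjustment is my assumption.

```python
def run_phase_filter(incdec_samples, bits=5):
    """Accumulate INCDEC samples and return the delay adjustments issued.

    The counter starts at the midpoint ("10000" for 5 bits). Reaching all
    ones issues +1 (increment the delay); reaching zero issues -1.
    """
    mid = 1 << (bits - 1)
    top = (1 << bits) - 1
    counter = mid
    adjustments = []
    for incdec in incdec_samples:
        counter += 1 if incdec else -1
        if counter == top:
            adjustments.append(+1)
            counter = mid          # assumption: restart from the midpoint
        elif counter == 0:
            adjustments.append(-1)
            counter = mid
    return adjustments


mixed = run_phase_filter([1, 0] * 100)   # well aligned: nothing happens
drift_up = run_phase_filter([1] * 15)    # sustained '1's: one increment
drift_down = run_phase_filter([0] * 16)  # sustained '0's: one decrement
```

An alternating stream never moves the counter far from the midpoint, so the delay is left alone; only a sustained bias produces an adjustment.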
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 19, 2021, 10:24:22 am
@NorthGuy Thanks for the explanation.  Please allow me some time to digest your technical explanation.

By the way, what is wrong with the user_desired_extra_read_or_write_cycles (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L139) signal ?

(https://i.imgur.com/F6ZBjjy.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 19, 2021, 01:43:47 pm
By the way, what is wrong with the user_desired_extra_read_or_write_cycles (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L139) signal ?

I guess you drive IBUFDS with this signal along with other things. IBUFx/OBUFx/IOBUFx blocks are positioned on the input pad. They are, sort of, bridges between the FPGA internals and outside world. A wire which drives IBUF is outside of the FPGA and cannot drive anything inside the FPGA. This may have something to do with UCF files.

There are lots of people on this forum who still use ISE on a daily basis. They may not look at this thread. I suggest you open a separate thread for ISE entanglements.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 20, 2021, 08:20:37 am
Quote
The pdcount simply adds some stability. Instead of doing delay increments/decrements immediately, they accumulate INCDEC values in the pdcount register. If the INCDEC values are mixed (meaning alignment is close to optimum), pdcount varies around "10000". If ones start to prevail, pdcount goes to "11111" and they do the increment. If zeroes start to prevail, pdcount moves to "00000" and they do the decrement. Of course you can use any number of bits - the more bits you use, the more stable and less responsive the calibration becomes.

Wait, I suppose DDR protocol is high-speed protocol ?  Why accumulate instead of calibrate immediately ?

What do you exactly mean by INCDEC values are mixed (meaning alignment is close to optimum) ?

Why exactly the more bits you use, the more stable and less responsive the calibration becomes ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 20, 2021, 01:33:22 pm
Wait, I suppose DDR protocol is high-speed protocol ?  Why accumulate instead of calibrate immediately ?

You don't have to. You can calibrate at every step. But this will produce jitter because you'll change the delays every edge. The data from the DDR3 chip have their own jitter too - you do not want to adjust your settings based on this - you only need to adjust if there's a lasting shift in data position.

Temperature-related changes are very slow compared to DDR3 speed.

What do you exactly mean by INCDEC values are mixed (meaning alignment is close to optimum) ?

Mixed means a mix of '0' and '1'. As opposed to all '0' or all '1', which would indicate that something has changed.

Why exactly the more bits you use, the more stable and less responsive the calibration becomes ?

5 bits mean you're going to calibrate if the difference between the count of '0' and '1' is 15. 6 means you need the difference of 31. 7 means 63. Therefore the more bits, the longer you will wait until you get enough difference between '0' and '1'. Longer wait means more stability and less responsiveness.
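
The accumulate-then-step behaviour described above can be sketched roughly as follows. This is only an illustration, not the XAPP1064 source: the module name pd_filter, the port names and the choice to re-centre the counter after every step are all assumptions.

```systemverilog
// Hypothetical sketch of a pdcount-style filter. All names are assumptions.
module pd_filter #(
    parameter int BITS = 5                  // more bits = steadier but slower to react
) (
    input  logic clk,
    input  logic rst,
    input  logic valid,                     // a new INCDEC vote is available this clock
    input  logic incdec,                    // 1 = vote to increment, 0 = vote to decrement
    output logic ce,                        // one-clock pulse: move the delay by one tap
    output logic inc                        // direction of that step (1 = increment)
);
    localparam logic [BITS-1:0] MID = {1'b1, {(BITS-1){1'b0}}};  // "10000" when BITS = 5
    localparam logic [BITS-1:0] MAX = {BITS{1'b1}};              // "11111" when BITS = 5
    localparam logic [BITS-1:0] MIN = '0;                        // "00000"

    logic [BITS-1:0] pdcount;

    always_ff @(posedge clk) begin
        ce <= 1'b0;                         // default: no tap change
        if (rst) begin
            pdcount <= MID;
            inc     <= 1'b0;
        end else if (pdcount == MAX) begin  // ones prevailed: step up, then re-centre
            ce      <= 1'b1;
            inc     <= 1'b1;
            pdcount <= MID;
        end else if (pdcount == MIN) begin  // zeroes prevailed: step down, then re-centre
            ce      <= 1'b1;
            inc     <= 1'b0;
            pdcount <= MID;
        end else if (valid) begin           // otherwise just accumulate the vote
            pdcount <= incdec ? pdcount + 1'b1 : pdcount - 1'b1;
        end
    end
endmodule
```

With BITS = 5 the counter has to drift 15 counts up or 16 counts down from the middle before a tap change is issued, so a balanced mix of votes never moves the delay. A vote that arrives in the same clock as a step is dropped in this sketch.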
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 22, 2021, 12:17:42 pm
According to https://www.xilinx.com/support/documentation/application_notes/xapp1064.pdf#page=20 (https://www.xilinx.com/support/documentation/application_notes/xapp1064.pdf#page=20) , When the SerDes factor is 5 to 8 or the phase detector function is selected, two ISERDES2s are required and the pin next to the active input (which must be a master or _p pin) is blocked from use as a synchronous input since the necessary logic is already occupied.

There is some underlying hardware limitation which disallows the use of phase detection of IODELAY2 primitive (https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=72) in the case of ISERDES2 primitive for DDR3 memory controller application.

https://github.com/promach/DDR/blob/main/phase_detector.v#L73-L77 (https://github.com/promach/DDR/blob/main/phase_detector.v#L73-L77)

Code: [Select]
[phung@archlinux DDR]$ grep -n IODELAY *.v
phase_detector.v:23://        - State machine changed slightly to enable individual control of INC pins on IODELAY2s
phase_detector.v:67:input    [D-1:0]        busy ;            // BUSY inputs from IODELAY2s
phase_detector.v:73:output            cal_master ;        // Output to cal pins on master IODELAY2s
phase_detector.v:74:output            cal_slave ;        // Output to cal pins on slave IODELAY2s
phase_detector.v:75:output            rst_out ;        // Output to rst pins on master & slave IODELAY2s
phase_detector.v:76:output    [D-1:0]        ce ;              // Outputs to ce pins on IODELAY2s
phase_detector.v:77:output    [D-1:0]        inc ;              // Outputs to inc pins on IODELAY2s
phase_detector.v:140:          if (enable == 1'b1) begin                // Wait for IODELAY to be available
phase_detector.v:156:       4'h2 :     begin                            // Now RST master and slave IODELAYs needed for simulation, not for the silicon
phase_detector.v:168:       4'h4 :     begin                            // Wait for IODELAY to be available
phase_detector.v:193:       4'h8 : begin                            // Wait for all IODELAYs to be available, ie CAL command finished
serdes_1_to_n_clk_ddr_s8_diff.v:70:wire         ddly_m;             // Master output from IODELAY1
serdes_1_to_n_clk_ddr_s8_diff.v:71:wire         ddly_s;             // Slave output from IODELAY1
serdes_1_to_n_clk_ddr_s8_diff.v:95://        IODELAY for the differential inputs.
serdes_1_to_n_clk_ddr_s8_diff.v:97:IODELAY2 #(
serdes_1_to_n_clk_ddr_s8_diff.v:125:IODELAY2 #(
serdes_1_to_n_data_ddr_s8_diff.v:89:wire     [D-1:0]        ddly_m;             // Master output from IODELAY1
serdes_1_to_n_data_ddr_s8_diff.v:90:wire     [D-1:0]        ddly_s;             // Slave output from IODELAY1
serdes_1_to_n_data_ddr_s8_diff.v:145:IODELAY2 #(
serdes_1_to_n_data_ddr_s8_diff.v:172:IODELAY2 #(
[phung@archlinux DDR]$
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 23, 2021, 02:43:59 am
According to https://www.xilinx.com/support/documentation/application_notes/xapp1064.pdf#page=20 (https://www.xilinx.com/support/documentation/application_notes/xapp1064.pdf#page=20) , When the SerDes factor is 5 to 8 or the phase detector function is selected, two ISERDES2s are required and the pin next to the active input (which must be a master or _p pin) is blocked from use as a synchronous input since the necessary logic is already occupied.

It might be impossible to calibrate delays for DDR3. You need to study what they have in detail and figure out what can be done. Xilinx won't have any DDR3 examples because they have DDR3 API which uses dedicated blocks. Most people use Xilinx's IP, so very few have any experience with this.

If you find it impossible, then you need to find a different approach (such as BrianHG's method) or a different chip (such as Xilinx 7-series).
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 23, 2021, 12:51:52 pm
Why phase detector (https://github.com/promach/DDR/blob/main/phase_detector.v) requires TWO ISERDES2 primitives (https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=84) ?

(https://i.imgur.com/oJGOpjr.png)


Quote
If you find it impossible, then you need to find a different approach (such as BrianHG's method) or a different chip (such as Xilinx 7-series).

BrianHG's method (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3585485/#msg3585485) might not work for frequencies higher than 303MHz.   Please correct me if wrong.

Why use 7-series ?  How would it exactly solve my current issue ?

As for Altera MAX10 DECA development board (https://www.arrow.com/en/products/deca/arrow-development-tools), it seems that there is no dynamic phase detection/calibration state machine verilog coding available ?

Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 23, 2021, 01:54:50 pm
BrianHG's method (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3585485/#msg3585485) might not work for frequencies higher than 303MHz.   Please correct me if wrong.

Not necessarily. Basically, his method assumes that the DQS to CK phase relationship is fixed, so if DQS wanders away (for example because of changes in temperature or voltage) it'll stop working. This makes the window narrower, but there's no way to figure out how big the effect is. It may work at higher frequencies as well.

Anyway, Spartan-6 will have a limit of how fast you can go, which may be not too far from 300 MHz. Their hardware memory block is rated at 800 Mb/s - 400 MHz. Their DDR LVDS when clocked with BUFG is also rated at 400 MHz. The datasheet says that you can go to 540 MHz with BUFIO/ISERDES, but only if ratio is 4:1 or above. And this is with -3 speed grade.

Why use 7-series ?  How would it exactly solve my current issue ?

Auto-calibrated delays, memory mode ISERDES, higher speed.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 23, 2021, 04:21:56 pm
Quote
Why phase detector requires TWO ISERDES2 primitives ?

Someone told me the following:

Quote
The "master" is used for the actual data sampling. The delay is adjusted so the sampling is in the middle of the eye.
The "slave" is used for the continuous delay adjustment. The delay is adjusted to be where the data is expected to change.
Every time there is an edge on the data (= the data bit changed), it is checked if the slave sampled the old or the new value (early or late sampling).
When early or late dominates, the pdcounter will eventually reach one of its limits, and both ISERDES2 primitives are ordered to increment or decrement the delay by one tap.

The master can't be used for the timing adjustment, because the need for a delay change wouldn't be detected until the data had already been corrupted.

But I am confused as in what it exactly means by "The master can't be used for the timing adjustment, because the need for a delay change wouldn't be detected until the data had already been corrupted." ?

Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 23, 2021, 09:25:40 pm
But I am confused as in what it exactly means by "The master can't be used for the timing adjustment, because the need for a delay change wouldn't be detected until the data had already been corrupted." ?

You use the data signal to get your data. If the signal goes out of alignment, the data is lost. Therefore, you need to do the adjustments when the signal only begins to go out of alignment.

Therefore, they use a different signal, which is the copy of the original signal shifted by 90 degrees. If the data starts to shift (either way), this second signal (which has a good deal of shift incorporated into it already) will get out of alignment much sooner. This gives you an opportunity to correct the alignment before the data is corrupted.

I don't know whether the mechanism is suitable for DDR3. DQS is a differential signal, so it uses two pins and hence there are two ISERDES blocks. Therefore the lack of ISERDES blocks per se is not a problem in aligning DQS to CK. But of course there may be routing problems, configuration problems etc. The devil is in the details.
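
The early/late decision from the quoted master/slave description can be sketched like this. It is a generic illustration rather than the ISERDES2 internals: the module and port names are made up, and it assumes the slave sample is taken between the previous and the current master sample.

```systemverilog
// Hypothetical early/late detector. Names and sample ordering are assumptions.
module early_late_detect (
    input  logic clk,
    input  logic master_bit,    // sampled mid-eye: this is the real data
    input  logic slave_bit,     // sampled where the data is expected to change,
                                // between the previous and current master samples
    output logic valid,         // 1 when the data bit changed, so 'late' is meaningful
    output logic late           // 1 = slave saw the new value, 0 = slave saw the old value
);
    logic prev_master;

    always_ff @(posedge clk) begin
        prev_master <= master_bit;
        valid       <= (master_bit != prev_master);  // only transitions carry timing information
        late        <= (slave_bit == master_bit);    // new value already present at the edge sample
    end
endmodule
```

When valid is low the data did not change, the slave sample says nothing about alignment, and late should be ignored. Whether "late" maps to an increment or a decrement of the delay depends on how the taps are wired, so that mapping is left out here.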
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 24, 2021, 12:31:17 am
Quote
I am not using the DQS strobe as a clock for sampling the DQ.  I'm using the DQS inputs as a 'data_enable' DDR input where the 'preamble' is used as a sync/reset read buffer position.  Remember, when reading data, the DQS is in perfect sync with the read DQ.

The tunable PLL output clock goes to the 'input clock' for the DQ & DQS DDR input buffers and subsequent read data FIFO's input clock.

@BrianHG Why your phase calibration approach (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3585485/#msg3585485) does not require use of SERDES block ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 24, 2021, 01:55:12 am
Here, my ser/des block:
The serial shift in from the simplest of DDR input buffer:
Code: [Select]
                                            RDQ_CACHE_h[0]    <= RDQ_h;                       // Shift in the input read data
                                            RDQ_CACHE_l[0]    <= RDQ_l;                       // Shift in the input read data
    for (int i=0; i<(3+RDQ_SYNC_CHAIN);i++) RDQ_CACHE_h[i+1]  <= RDQ_CACHE_h[i];              // Shift the input across the cache
    for (int i=0; i<(3+RDQ_SYNC_CHAIN);i++) RDQ_CACHE_l[i+1]  <= RDQ_CACHE_l[i];              // Shift the input across the cache
                                            RDQS_CACHE_h[0]   <= RDQS_ph[0];                  // Shift in the DQS status
                                            RDQS_CACHE_l[0]   <= RDQS_pl[0];                  // Shift in the DQS status
    for (int i=0; i<(3+RDQ_SYNC_CHAIN);i++) RDQS_CACHE_h[i+1] <= RDQS_CACHE_h[i];             // Shift the status across the cache
    for (int i=0; i<(3+RDQ_SYNC_CHAIN);i++) RDQS_CACHE_l[i+1] <= RDQS_CACHE_l[i];             // Shift the status across the cache

Snapping the serial word from the above serial pipe of data:
Code: [Select]
if ( RDATA_store ) begin                                                                // When a RDATA_store is received, copy the RDQ_CACHE fifo into the RDATA output and toggle the RDATA_toggle status flag.
    for (int i=0;i<4;i++)   begin
                            RDATA[ ((i)*2+0)*DQ_WIDTH  +: DQ_WIDTH  ] <= RDQ_CACHE_h[i+RDQ_SYNC_CHAIN] ; // Big Endian BL8 burst
                            RDATA[ ((i)*2+1)*DQ_WIDTH  +: DQ_WIDTH  ] <= RDQ_CACHE_l[i+RDQ_SYNC_CHAIN] ;
                            end
                            RDATA_toggle <= !RDATA_toggle ;
    end

RDATA is the full parallel BL8 word/chunk.
RDATA_store is generated by a 2 bit counter, reset by the pattern coming through the parallel sampling RDQS_CACHE_h/l pipe, and externally enabled by a read command.

    Even an Altera -8 can run this code above 320MHz while a -6 can push the 450MHz limit, IE 900MHz DDR3, even though Altera's DDR3 IO limit is 320/640MHz and for some stupid reason their CK port is limited lower, to 300/600MHz, IE 300MHz CK# output speed.  (I am not using any of the advanced fine time tuning capabilities available to the Max10 on every DDR buffer pin.  Doing so would make my code incompatible with older CycloneIV devices, or would need re-working for Lattice and Xilinx, which should officially support 400MHz and above as their DDR buffers are as fast as Altera's Arria FPGAs.)  RDATA_toggle is the secret to crossing from one clock domain into another, especially if that clock shifts around when tuned compared to your master system DDR clock.

My write just does the opposite.  It snaps in a parallel chunk while continuously shifting out, with mask in parallel.

Why is this so difficult for you to just write like I did above?
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 24, 2021, 02:07:22 am
Quote
My write just does the opposite.  It snaps in a parallel chunk while continuously shifting out, with mask in parallel.

Why is this so difficult for you to just write like I did above?

Let me confirm one thing.  The pure verilog code above serves as serdes block without using any vendor built-in hard primitives ?  And it can run above 303MHz without any timing violations ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 24, 2021, 03:49:02 am
IMHO, the speed depends only on the acquisition. You need to acquire fast DDR signal with a narrow window which gets more and more difficult as the clock speed increases. Once you get the data, there's no problem to dispatch it through FPGA. Therefore, if you want higher speed, you need to concentrate on sampling.

There are a few problems with that. One problem is the clock. Spartan-6 BUFIO can work up to 540 MHz (-3 grade) while BUFG can only go to 400 MHz. Hence, if you want to go above 400 MHz, you must sample with BUFIO.

The other problem is the signal: SI, including ODT calibration, length matching, timing, clock quality etc. This produces a window where you can sample (google "eye pattern" for a visual). You need a window of a certain width to be able to sample reliably. As the speed increases, the window shrinks. If your window is not wide enough, it will limit your speed. If you can get to the limit imposed by the clock, you have achieved the goal - you cannot go any faster. If you cannot get to the limit imposed by the clock, you need to work on widening your window.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 24, 2021, 04:26:54 am
Quote
My write just does the opposite.  It snaps in a parallel chunk while continuously shifting out, with mask in parallel.

Why is this so difficult for you to just write like I did above?

Let me confirm one thing.  The pure verilog code above serves as serdes block without using any vendor built-in hard primitives ?  And it can run above 303MHz without any timing violations ?
Yes, remember, that code I showed you is running on the DDR buffer's clock in.  Not the system clock.
It also can run at the >500MHz limit of the Altera Cyclones.  It is just too simple having a 2 bit counter and 2 bit compare.  The problem arises when connecting the RDATA and RDATA_toggle output to your system clock.  Within this bridge, you may need to setup a multicycle setup of 2 and hold of 1, but that shouldn't be necessary as your compiled setup PLL for that channel should be at 0 degree phase with respect to your system controller clock.  When sampling the RDATA and RDATA_toggle on your system controller clock, delay the RDATA_toggle by 2 clocks to guarantee when you capture the RDATA that all the bits are valid when you capture on the delayed toggle.  Since my system tunes the read clock, this shifts the entire bit of code I sent you and you want to ensure the RDATA when transferred over to the system clock's logic is always true.  (Remember, with BL8s, on your system clock, the RDATA_toggle flips once every 4 clocks.  If you are operating with BC4s, now timing will get more tight and my code would need some adaptation.)
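
A minimal sketch of the system-clock side of that toggle handshake, as I read the description above. The names are hypothetical, and it assumes sys_clk runs at the same frequency as the read clock, RDATA holds steady for at least 4 clocks (BL8 spacing), and the toggle starts at 0.

```systemverilog
// Hypothetical receiver for the RDATA / RDATA_toggle pair. Names are assumptions.
module rdata_toggle_rx #(
    parameter int WIDTH = 128                // assumed: 8 beats x 16-bit DQ
) (
    input  logic             sys_clk,
    input  logic             rst,
    input  logic             rdata_toggle,   // from the read-clock domain, flips once per burst
    input  logic [WIDTH-1:0] rdata,          // from the read-clock domain, stable around each flip
    output logic [WIDTH-1:0] rdata_sys,      // burst captured in the sys_clk domain
    output logic             rdata_valid     // one sys_clk pulse per captured burst
);
    logic [2:0] tgl;   // tgl[1:0] delay the toggle by 2 clocks, tgl[2] holds the previous value

    always_ff @(posedge sys_clk) begin
        if (rst) begin
            tgl         <= '0;
            rdata_valid <= 1'b0;
        end else begin
            tgl         <= {tgl[1:0], rdata_toggle};
            rdata_valid <= (tgl[2] != tgl[1]);          // the delayed toggle changed state
            if (tgl[2] != tgl[1]) rdata_sys <= rdata;   // rdata has been stable for a while by now
        end
    end
endmodule
```

Because the flag is a level that changes state rather than a one-clock pulse, a slower or phase-shifted sys_clk cannot miss it, and the 2 clock delay gives every RDATA bit time to settle before it is captured.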
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 24, 2021, 05:27:00 am
Quote
Yes, remember, that code I showed you is running on the DDR buffer's clock in.  Not the system clock.

This still requires a vendor PLL core to generate a frequency >= 303MHz to act as the DDR (both input and output ?) buffer's clock in.

Quote
The problem arises when connecting the RDATA and RDATA_toggle output to your system clock.  Within this bridge, you may need to setup a multicycle setup of 2 and hold of 1,

What is the purpose of the RDATA_toggle signal ?
Multicycle setup of 2 is because of the 2 preamble bits ?
What about multicycle hold of 1 ?

Quote
If you are operating with BC4s, now timing will get more tight and my code would need some adaptation.)

I do not think the timing will get tighter.  Why do you say such ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 24, 2021, 06:24:31 am
#1: Yes, the PLL needs to generate the DDR_CK frequency.  I don't know about Xilinx, but with Altera, running a DDR buffer at 303MHz means the data comes in/out at 606MHz, but on the internal bus side, the data runs synchronous to 303MHz, but it's 2x wide.

#2:  A multicycle in the .sdc allows you to describe how many clocks of slack are allowed for a transition from source to destination before the data must be valid.  A 'falsepath', by contrast, means you don't care about the timing.  This allows the compiler's fitter to optimize logic placement, letting it focus on the parts of the design which must be fast and relax the paths you choose, IE a global slow reset control, which you can specify may take 2, 3 or more clocks before it reaches its end.

I use the '_toggle' signal when my data out is ready, allowing me to run my system clock at 150.15MHz (half rate mode), where it monitors for a toggle on that line, and when it sees one, it knows read data came in and latches it.  If I used a single 'data_ready' pulse, IE high for 1 single 303MHz clock cycle, the 150m side, or even 75m (Quarter rate) side would never see that portion of a clock pulse.  On the read clock side, serial delaying the '_toggle' by a clock or 2 allows me to set such a multicycle in the .sdc, further removing the stringent timing between the DDR read clock and the rest of the system, since the RDATA would change to its new contents 1 clock before, and hold its contents steady for 3 clocks after the '_toggle' has toggled.

#3: BC4 read 2 clocks, plus a 1 clock break before the next BC4 or BL8 is permitted.  So, a BC4 after a BC4 has a 3 CK cycle.  BL8s are permitted every 4 clocks, IE a 4 clock cycle.  That's 1 clock shorter timing.  If you are not making a smart ram controller, or no consecutive bursting capabilities, your command spacing will never have these adjacent bursts, so you never need to worry about such tight cycles.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 24, 2021, 08:09:25 am
Quote
running a DDR buffer at 303MHz means the data comes in/out at 606MHz, but on the internal bus side, the data runs synchronous to 303MHz, but it's 2x wide.

Does it mean the vendor PLL core needs to generate 606MHz ?
Or generate just 303MHz and apply DDR buffer similar to IDDR2 (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=123) ?


Quote
I use the '_toggle' signal when my data out is ready, allowing me to run my system clock at 150.15MHz (half rate mode), where it monitors for a toggle on that line, and when it sees one, it knows read data came in and latches it.  If I used a single 'data_ready' pulse, IE high for 1 single 303MHz clock cycle, the 150m side, or even 75m (Quarter rate) side would never see that portion of a clock pulse.  On the read clock side, serial delaying the '_toggle' by a clock or 2 allows me to set such a multicycle in the .sdc, further removing the stringent timing between the DDR read clock and the rest of the system, since the RDATA would change to its new contents 1 clock before, and hold its contents steady for 3 clocks after the '_toggle' has toggled.

The code snippet you posted is a bit confusing especially the actual purpose of half-rate mode and quarter-rate mode.  Would you be able to post a full implementation of your own SERDES block on github repository ?

Besides, I also have something on multicycle for sdc constraint file (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.sdc#L21-L24), but your explanation on its purpose is a bit confusing to me.  Would you be able have some simple waveform drawing to illustrate your text explanation ?


Quote
BC4 read 2 clocks, plus a 1 clock break before the next BC4 or BL8 is permitted.  So, a BC4 after a BC4 has a 3 CK cycle.  BL8s are permitted every 4 clocks, IE a 4 clock cycle.  That's 1 clock shorter timing.

I do not see such phenomenon though in the following waveform.  Please correct me if wrong.

(https://i.imgur.com/kDW1w1Y.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 24, 2021, 08:40:11 am
#1, In Altera, the flip-flops at the IO pin use the 303MHz clock, sample on the rise and sample on the fall, with an internal alignment DFF which shifts the falling clocked data to the next stage FF for the normal synchronous rising clock.  I do not care how Altera internally does this shift, all I care is I provide a clock and I get back data sampled on the rising and falling edge on a double wide bus.  Same for the transmit section in the opposite direction.

#2, My code runs directly connected to the output of the DDR buffer and shares its clock_in clock.  It does receive 1 single wire control 'run' signal from my system CK clock which generates all the DDR3 commands.  Otherwise, it just shifts away and every 4 words, it latches that serial pipe into a 2x4 word parallel chunk.  The one rule for the 2 bit position counter is that it resets its position if the DQS coming in isn't in a valid state, self correcting/aligning for the DDR3's beginning of a read command.  The full code will be posted on this forum once I fix my issue with my 16port read/16port write cache system.  I'm almost done as it just seems to miss a few posted read requests out of every few hundred thousand for some reason.  (Too large to find in simulation unless I go to extremes there, see attached image...)

#3, Thanks for the correction.
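
For anyone following along, the DDR input behaviour described in #1 can be modelled like this. It is a behavioural sketch for simulation only, with made-up names. On real silicon the capture flip-flops sit in the I/O cell and are reached through the vendor primitive (IDDR2 on Spartan-6, for example), not inferred from code like this.

```systemverilog
// Hypothetical behavioural model of a DDR input buffer. Names are assumptions.
module ddr_in_model #(
    parameter int WIDTH = 16
) (
    input  logic             clk,
    input  logic [WIDTH-1:0] d,      // pin data, changes twice per clk period
    output logic [WIDTH-1:0] q_h,    // value that was sampled on a rising edge
    output logic [WIDTH-1:0] q_l     // value sampled on the following falling edge
);
    logic [WIDTH-1:0] rise_q, fall_q;

    always_ff @(posedge clk) rise_q <= d;   // first half of the clock period
    always_ff @(negedge clk) fall_q <= d;   // second half of the clock period

    // Re-register both halves on the rising edge so the fabric sees one
    // double-wide word that is synchronous to clk.
    always_ff @(posedge clk) begin
        q_h <= rise_q;
        q_l <= fall_q;
    end
endmodule
```

So a single 303MHz clock is enough: the pin toggles at 606Mb/s, and the fabric side gets q_h and q_l together once per 303MHz clock.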
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 24, 2021, 12:26:40 pm
@BrianHG  Could I say that your phase calibration mechanism (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3585485/#msg3585485) is similar to bitslip (https://www.xilinx.com/support/documentation/application_notes/xapp1208-bitslip-logic.pdf) ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 24, 2021, 01:42:43 pm
No, bit slip is equivalent to a reset of the 2 bit counter 'RDQ_POS' which aligns my parallel output snapshot time:

Here is my bit-slip auto real-time self correcting code which corrects itself once valid read data comes in:
Code: [Select]
    if (!(RDQS_CACHE_h[RDQ_SYNC_CHAIN]==0 && RDQS_CACHE_l[RDQ_SYNC_CHAIN]==1)) begin    // No valid read data DQS signal pattern, so keep the RDATA_store & RDQ_POS in reset state.
             RDATA_store <= 0 ;                                                         // Reset due to preamble
             RDQ_POS     <= 1 ;                                                         // Reset due to preamble
    end else if ( RD_WINDOW ) begin                                                     // Only generate a single RDATA_store copy read data strobe when an unbroken
                                                                                        // 4 count DQS clock pattern is continuously running inside the read window.
 
                                                RDQ_POS     <= RDQ_POS + 1'b1 ;
                                if (RDQ_POS==3) RDATA_store <= 1'b1 ;
                                else            RDATA_store <= 1'b0 ;
    end else begin
             RDATA_store <= 0 ;                                                         // No more active read window, force end the read stat.
             RDQ_POS     <= 1 ;                                                         // Make sure a false broken read must run for 4 good clocks at the beginning of the next read window.
             end

if ( RDATA_store ) begin                                                                // When a RDATA_store is received, copy the RDQ_CACHE fifo into the RDATA output and toggle the RDATA_toggle status flag.
    for (int i=0;i<4;i++)   begin
                            RDATA[ ((i)*2+0)*DQ_WIDTH  +: DQ_WIDTH  ] <= RDQ_CACHE_h[i+RDQ_SYNC_CHAIN] ; // Big Endian BL8 burst
                            RDATA[ ((i)*2+1)*DQ_WIDTH  +: DQ_WIDTH  ] <= RDQ_CACHE_l[i+RDQ_SYNC_CHAIN] ;
                            end
                            RDATA_toggle_int    <= !RDATA_toggle_int ;
 
end
The first line 'if' and the 2 after that is my 'auto-bit-slip' correction system.  It works by validating the DQS input which is sampled in parallel with the data as seen on the previous page of code.

When doing a read calibration at powerup, I make phase adjustments to the read clock phase data and this code will either return the proper pattern, or garbage, or nothing at all if the read phase and 'RD_WINDOW ' alignment is wrong.  The RD_WINDOW  begins a clock early and ends a clock late allowing for the fact it was generated by the system clock and it shifts relative to the phase of the read clock as my code moves it around scanning for all the valid read points and selects the center position of all the qualified good reads.
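
The "select the center position of all the qualified good reads" step could look something like the function below. This is a guess at the idea, not the actual calibration code: the package name, the function name, STEPS = 64 and the pass[] vector (one bit per scanned phase step, 1 = read pattern came back correct) are all assumptions.

```systemverilog
// Hypothetical helper for picking the centre of the passing phase window.
package read_cal_pkg;
    localparam int STEPS = 64;   // assumed number of PLL phase steps scanned

    // Returns the index at the centre of the longest run of consecutive 1s
    // in pass[], or -1 when no step passed. A run that wraps around from the
    // last step to the first is not merged in this sketch.
    function automatic int center_of_window(input logic [STEPS-1:0] pass);
        int best_len   = 0;
        int best_start = -1;
        int run_len    = 0;
        for (int i = 0; i < STEPS; i++) begin
            if (pass[i]) begin
                run_len++;
                if (run_len > best_len) begin
                    best_len   = run_len;
                    best_start = i - run_len + 1;
                end
            end else begin
                run_len = 0;
            end
        end
        return (best_len == 0) ? -1 : best_start + (best_len - 1) / 2;
    endfunction
endpackage
```

For example, if steps 10 to 16 are the only ones that pass (a 7 step window), the function returns 13, which leaves 3 good steps of margin on either side.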

Title: Re: DDR3 initialization sequence issue
Post by: promach on June 24, 2021, 03:09:51 pm
Wait, let's back up a bit.

You mentioned: "I am not using the DQS strobe as a clock for sampling the DQ.  I'm using the DQS inputs as a 'data_enable' DDR input where the 'preamble' is used as a sync/reset read buffer position.  Remember, when reading data, the DQS is in perfect sync with the read DQ." in previous post on June 10 (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3585485/#msg3585485)

I suppose the quoted sentence above resembles bitslip operation ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 24, 2021, 03:47:28 pm
I use the DQ strobe to set the position of the bit slip.  IE: Equivalent to the alignment of the bit slip input which always runs.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 24, 2021, 04:50:25 pm
wait, let me clarify one more thing again.

Your current code implementation only implements bitslip (single bit shift which is equivalent to nano-second shift),  but not phase calibration (pico-second shift).

Please correct me if wrong.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 24, 2021, 04:57:57 pm
Quote
The tunable PLL output clock goes to the 'input clock' for the DQ & DQS DDR input buffers and subsequent read data FIFO's input clock.

The PLL has the following optional inputs:

Phase_select,                (Selects which one of the many outputs of a single PLL which you may wish to adjust the phase)
Phase_step_enable,       (Steps the selected PLL output's phase by 1/16th to 1/64th of the PLL's reference clock output, IE 64 steps will shift the selected output by a perfect 360 degrees.)
Phase_direction,          ( Step left or right.)

Wait, I think you are using vendor PLL core to generate pico-second shift.  Please correct me if wrong.

However, I had issues using dynamic phase shift feature of PLL inside ISE tool (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3585536/#msg3585536) because of some underlying physical hardware limitations (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3586022/#msg3586022).

What could I do in this case ?

Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 24, 2021, 05:41:38 pm
The PLL core tuning allows me ~50ps steps in either direction.
My so called 'bit-slip' is an alignment to the first word coming out of the DDR3 during a read, not a reference sampling timing like the PLL tuning.  If the PLL tuning is in a lemon position, the bit slip can't do anything.

Within the MAX 10's DDR buffers, I can setup the calibration de-skew function which offers 1ps step tuning with a range of +/-25ps.  It is only worth it for problem PCBs where you must correct for bad trace length matching.

My current error free tuning window is 7 x 50ps steps, or a span of 350ps where every data bit in the entire 16 bits is valid and correct.  Going 1 step outside introduces garbage data.  Putting in the effort to get the 1ps adjustments working when I have 7 steps at the 50ps step size is a waste of my time.  In fact, the 1ps step with a range of +/-25ps total is so small, it is not useful in my design.

Title: Re: DDR3 initialization sequence issue
Post by: promach on June 25, 2021, 01:08:38 am
I am thinking of where exactly in the dq signal input path shall the bitslip (https://www.xilinx.com/support/documentation/application_notes/xapp1208-bitslip-logic.pdf#page=4) (nano-second shift) and IODELAY (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=130) (pico-second shift) phase shift be implemented ?

The following is the input path for the dq and dqs signal

RAM -> IOBUF (for inout)  -> IDDR2 (input DDR buffer) -> ISERDES
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 25, 2021, 01:35:33 am
Quote
I am thinking of where exactly in the dq signal input path shall the bitslip (https://www.xilinx.com/support/documentation/application_notes/xapp1208-bitslip-logic.pdf#page=4) (nano-second shift) and IODELAY (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=130) (pico-second shift) phase shift be implemented ?

The following is the input path for the dq and dqs signal

RAM -> IOBUF (for inout)  -> IDDR2 (input DDR buffer) -> ISERDES

Bit-slip is not a nanosecond shift in my design.  It is a 1/2 DDR3 CK alignment.  It moves up or down by half a DDR3 clock period, which is 2 bytes on a 16-bit DDR3 RAM chip.  My PLL tuning is what shifts within each DDR3 CK period.

On a ser-des, you would call this a bit slip because usually serdes have 1 bit input, not 16 in parallel like with a DDR3 chip.  So, slipping a bit means 1 bit to the left or right.
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 25, 2021, 02:45:07 am
Bitslip is a thing specific to ISERDES. It shifts (rotates) the data coming out of ISERDES by 360 degrees (or a multiple of 360 degrees) of the original clock, that is by whole number of serial bits (or whole number of bit pairs for DDR).

Bitslip has nothing to do with phase shift (performed to achieve a desired phase relationship between signals) you're discussing.
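
To make the distinction concrete, here is a small Python model of bitslip as a whole-bit rotation of the deserialized word. The training pattern and both function names are made up for illustration. The granularity is one serial bit, so this mechanism cannot correct sub-bit sampling phase.

```python
def rotate_left(bits, k):
    """Rotate a list of bits left by k positions (whole bits only)."""
    k %= len(bits)
    return bits[k:] + bits[:k]


def slips_needed(captured, training):
    """Count one-bit rotations until captured matches training, else None."""
    for k in range(len(training)):
        if rotate_left(captured, k) == training:
            return k
    return None


training = [1, 0, 1, 1, 0, 0, 0, 1]   # hypothetical known pattern
captured = rotate_left(training, 5)   # deserializer framed the word 5 bits off
print(slips_needed(captured, training))  # 3
```

A word that is sampled at the wrong phase contains wrong bits, not rotated bits, so no number of slips makes it match; that case needs a delay line or a PLL phase adjustment.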
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 25, 2021, 03:12:44 am
Quote
Bitslip is a thing specific to ISERDES. It shifts (rotates) the data coming out of ISERDES by 360 degrees (or a multiple of 360 degrees) of the original clock, that is by whole number of serial bits (or whole number of bit pairs for DDR).

Bitslip has nothing to do with phase shift (performed to achieve a desired phase relationship between signals) you're discussing.

Thanks, I was right about the bit slip.
That mine can do 180 degrees was an added feature; however, being able to detect that the source is 180 degrees out of phase is enough for me, as the PLL tuning can go right around.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 25, 2021, 05:32:29 am
Someone told me the following regarding the difference in purpose between bitslip (1-bit shift which is equivalent to nano-second shift) and IODELAY (pico-second shift)

Quote
so the vaguely general description is that variations in things like PCB trace length can lead to variable amounts of skew between periodic (K clocks) or strobe (dqs) signals, such that the rising edge of these signals may not match up well with the data to be sampled. So, during DDR calibration you're going to first perform fine-tuned adjustment of the IODELAY taps in order to center the clocks/strobes between transition edges of your data. However, because the IODELAYs can only ...well, delay things...it's possible that the clocks/strobes can end up centered but a cycle out of sync. So the bitslip provides a way to delay the incoming signals on a cycle by cycle basis to account for that.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 25, 2021, 05:52:30 am
Now reading bitslip appnote (https://www.xilinx.com/support/documentation/application_notes/xapp1208-bitslip-logic.pdf#page=4) for actual coding implementation leads to some Solution A and Solution B ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 25, 2021, 01:38:27 pm
Quote
Someone told me the following regarding the difference in purpose between bitslip (1-bit shift which is equivalent to nano-second shift) and IODELAY (pico-second shift)

Quote
so the vaguely general description is that variations in things like PCB trace length can lead to variable amounts of skew between periodic (K clocks) or strobe (dqs) signals, such that the rising edge of these signals may not match up well with the data to be sampled. So, during DDR calibration you're going to first perform fine-tuned adjustment of the IODELAY taps in order to center the clocks/strobes between transition edges of your data. However, because the IODELAYs can only ...well, delay things...it's possible that the clocks/strobes can end up centered but a cycle out of sync. So the bitslip provides a way to delay the incoming signals on a cycle by cycle basis to account for that.

That's correct.

DDR3 chip has programmable latencies (that is you must program the cycle delay between the READ command and the data). So, your controller should already have a mechanism to account for variable number of cycles between the command and the data. Therefore, you don't need to use bitslip.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 25, 2021, 03:44:08 pm
There is a catch though.

Remember that I am coding my own serdes, therefore I have to make it very simple for high DDR3 frequency.
Thus, the deserializer will be continuously doing the work without any turn on/off switch.

With this in mind, let's look at Figure 66 below.  Note the two low read-preamble DQS bits.

Let's say for example deserialize at 4:1 those two low bits on dqs happen to be on bit 1 and 2 of those four parallel bits, how to deal with that?

(https://i.imgur.com/Bh4a957.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 25, 2021, 06:24:29 pm
Besides, search for keyword "bitslip" inside https://www.xilinx.com/support/documentation/ip_documentation/ultrascale_memory_ip/v1_4/pg150-ultrascale-memory-ip.pdf (https://www.xilinx.com/support/documentation/ip_documentation/ultrascale_memory_ip/v1_4/pg150-ultrascale-memory-ip.pdf)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 25, 2021, 06:44:14 pm
Quote
With this in mind, let's look at Figure 66 below.  Note the two low read-preamble DQS bits.

These are not two bits. The preamble is at least 900 ps, but may be longer. If your speed is low, say 300 MHz, you may be able to get only the very edge of it and if your timing is off, you may not be able to sample it at all.

The preamble is there just to mark the beginning of the period where DQS input is valid.

Quote
Let's say for example deserialize at 4:1 those two low bits on dqs happen to be on bit 1 and 2 of those four parallel bits, how to deal with that?

You don't deal with that. You know where your deserializer must start sampling data relative to the READ command you issued. You start sampling at this point (typically, by setting the CE pin of your deserializer). Your first data sample will be 90 degrees after the first rising edge of DQS.

For sampling you typically use the said DQS edge shifted by 90 degrees, and you make sure you don't react to edges earlier than the preamble starts. Or, if you use BrianHG's method, you sample both DQ and DQS as close to this moment as possible.
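
As a sketch of "knowing where to start sampling": in DDR3 the read latency is RL = AL + CL clock cycles, and a BL8 burst occupies four CK cycles. The helper below (a hypothetical name) turns that into a capture window. The `round_trip` term is an assumed controller-specific allowance for board and I/O delay in whole cycles, found during calibration.

```python
def capture_window(read_cycle, cl, al=0, burst_length=8, round_trip=0):
    """Return the range of CK cycles during which read data should be captured.

    read_cycle   : CK cycle on which the READ command was issued
    cl, al       : CAS latency and additive latency programmed in the mode registers
    burst_length : 8 for BL8 (two transfers per CK cycle)
    round_trip   : extra whole cycles of board/IO delay (assumed, found by calibration)
    """
    first = read_cycle + al + cl + round_trip
    return range(first, first + burst_length // 2)


# READ issued on cycle 100 with CL = 6 and no additive latency
print(list(capture_window(100, cl=6)))  # [106, 107, 108, 109]
```

A deserializer without a CE pin can still run continuously; the controller then simply ignores every parallel word that falls outside this window.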
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 25, 2021, 10:53:00 pm
I am not using the preamble as an alignment.  I am sampling the DQS pattern in parallel with the data and using the 010101010101... as a data qualification alignment bit.  The preamble is just an invalid state preventing my sampling machine from beginning to load byte 0.  Also, it just so happens that when I write data out, the DDR input sees my generated DQS upside down, so this also prevents my read logic from beginning.

It's right there in my code above from yesterday.  The first 'if' keeps the system in reset unless the incoming DQS always samples as a 0 on the high clock and a 1 on the low clock.  (I know this looks upside-down.  My read clock is tuned for 0 degrees from the write output clock, and this is the internal-external buffer delay in the FPGA's IOs.  Attempting to tune a 0ps setup and hold for in and out makes it impossible for the compiler to make a workable design with high FMAX as the IOs are just so fast.)


Title: Re: DDR3 initialization sequence issue
Post by: promach on June 26, 2021, 02:54:38 am
Quote
You start sampling at this point (typically, by setting the CE pin of your deserializer).

I do not have a CE pin on my own deserializer.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 26, 2021, 10:56:39 am
In https://www.xilinx.com/support/documentation/ip_documentation/ultrascale_memory_ip/v1_4/pg150-ultrascale-memory-ip.pdf#page=362 (https://www.xilinx.com/support/documentation/ip_documentation/ultrascale_memory_ip/v1_4/pg150-ultrascale-memory-ip.pdf#page=362) , why transform to Figure 17-22 from Figure 17-21 ?

Besides, how exactly is the clock-centering activity being performed behind the scenes ?

(https://i.imgur.com/d9zfEfy.png)

(https://i.imgur.com/C5rIJeo.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 27, 2021, 02:36:37 am
Just check Figure 17-23 (https://www.xilinx.com/support/documentation/ip_documentation/ultrascale_memory_ip/v1_4/pg150-ultrascale-memory-ip.pdf#page=363)

(https://i.imgur.com/xyzZ1Rh.png)

By the way, if https://github.com/promach/DDR/blob/main/phase_detector.v (https://github.com/promach/DDR/blob/main/phase_detector.v) is used to help with clock-centering calibration activity in Figure 17-20 (https://www.xilinx.com/support/documentation/ip_documentation/ultrascale_memory_ip/v1_4/pg150-ultrascale-memory-ip.pdf#page=361), I have a concern that it would not work for the data signal D2.

there is ambiguity on whether to move D2 to the left or to the right
The direction (left or right) is dependent on D0 and D1 (at least few other signals have to be compared with D2)

Do you have any comment about calibrating D2 in this case ?

Besides, any idea why only slave is calibrated here (https://github.com/promach/DDR/blob/main/phase_detector.v#L178-L186) ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 27, 2021, 02:16:53 pm
I've never used ultrascale.

DDR3 calibration is a long process which consists of many steps. If you use SODIMM, you need to calibrate every byte lane separately and delays are generally unpredictable. When you calibrate reading you need to write known data to the chip, otherwise there's no way to figure if your reading is Ok or not. But to write data, you need to calibrate writing first. Since you cannot read anything back because the read is not yet calibrated, you need to calibrate writes without reads, for example, by using write leveling. But write leveling will only align your phase within 360 degrees. This makes the process somewhat tricky. The process requires many steps until you fully calibrate anything.

Ultrascale has calibrated programmable input/output delays. Spartan-6 does not. So, you can read the MIG docs to figure out how the calibration process works, but you won't be able to repeat this in Spartan-6.

Xilinx's IPs do the calibration dynamically upon the start.

If you have a single DDR3 chip, you can simplify the calibration, you can assume certain relationships between signals because you know the physical layout and know how the length matching is done, etc. So, your process may be lot simpler than what Xilinx does. Also, you can calibrate once and store the calibration data.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on June 27, 2021, 03:29:49 pm
Quote
When you calibrate reading you need to write known data to the chip, otherwise there's no way to figure if your reading is Ok or not. But to write data, you need to calibrate writing first. Since you cannot read anything back because the read is not yet calibrated, you need to calibrate writes without reads, for example, by using write leveling.

From the portions of Altera's official DDR3 RAM controller which aren't encrypted, they have a test RAM calibration during power-up.  They do use a random number generator to generate write test data and read it back to confirm no bit errors.  To do things properly, your calibration test sequence should implement such a strategy to confirm the data integrity of a random data pattern during power-up.
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 27, 2021, 05:34:44 pm
Quote
From the portions of Altera's official DDR3 ram controller which arent encrypted, they have a test ram calibration during power-up.  They do use a random number generator to generate write test data and read it back to confirm no bit errors.  To do thing properly, your calibration test sequence should implement such a strategy to confirm the data integrity of a random data pattern during power-up.

Xilinx also does PRBS tests at the startup.
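
For readers unfamiliar with PRBS testing, here is a minimal Python sketch of the idea: generate a pseudo-random bit sequence, write it, read it back and count mismatches. It uses the common PRBS7 polynomial x^7 + x^6 + 1 as an assumption; the posts do not say which polynomial or word width the vendor IPs actually use.

```python
from itertools import islice


def prbs7(seed=0x02):
    """Yield PRBS7 bits (polynomial x^7 + x^6 + 1) forever."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def count_bit_errors(written, read_back):
    """Number of positions where the read-back data differs from what was written."""
    return sum(w != r for w, r in zip(written, read_back))


written = list(islice(prbs7(), 127))  # one full period of test data
read_back = list(written)
read_back[40] ^= 1                    # simulate a single corrupted bit
print(count_bit_errors(written, read_back))  # 1
```

In hardware the same shift-register-plus-XOR structure runs once on the write side and once on the read side, so the expected data never has to be stored.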
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 28, 2021, 02:07:59 am
Quote
But to write data, you need to calibrate writing first. Since you cannot read anything back because the read is not yet calibrated, you need to calibrate writes without reads, for example, by using write leveling. But write leveling will only align your phase within 360 degrees. This makes the process somewhat tricky. The process requires many steps until you fully calibrate anything.

As for "write leveling will only align your phase within 360 degrees", write leveling can only help to align the DQS strobe with the CK signal.  So, it is still not enough to perform the clock-centering activity for the parallel write-data DQ bit signals.

Besides, see also https://www.xilinx.com/support/documentation/white_papers/wp249.pdf#page=5 (https://www.xilinx.com/support/documentation/white_papers/wp249.pdf#page=5)
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 28, 2021, 03:13:52 am
Furthermore, someone told me that I shall encode the delay tap of IODELAY2 primitive as gray code (0, 1, 3, 2, 6, 7, 5, 4, C, D, F, E, A, B, 9, 8 ) .  But does it really matter whether it is gray code or not for IODELAY2  ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 28, 2021, 04:54:46 am
Quote
As for "write leveling will only align your phase within 360 degrees." , write leveling could only help to align DQS strobe with CK signal.  So, it is still not enough to perform the clock-centering activity for those parallel write data DQ bits signals.

Write leveling is just one of the (first) steps.

BrianHG assumed that DQS and CK are already aligned, and this worked Ok.

Most controllers simply write DQ with a clock which is nominally 90 degree shifted from the clock they use to write DQS, assuming that DQ and DQS are length matched and have minimum skew.

It is up to you to decide what you're going to calibrate and how.

Quote
Furthermore, someone told me that I shall encode the delay tap of IODELAY2 primitive as gray code (0, 1, 3, 2, 6, 7, 5, 4, C, D, F, E, A, B, 9, 8 ) .  But does it really matter whether it is gray code or not for IODELAY2  ?

I always thought the Xilinx's delays are glitch free.

If you delay the clock (e.g. DQS) and the clock is fast, instead of worrying about glitches, I would rather worry about smoothness of changes and do 0,1,2,3 ... rather than allowing big changes (e.g. from 4 to C).
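
The two properties being weighed here can be checked with a few lines of Python. This is only an illustration of the number sequences, under the assumption that the code value is applied directly as the tap setting; it says nothing about how IODELAY2 behaves internally.

```python
def binary_to_gray(n):
    """Standard reflected Gray code of n."""
    return n ^ (n >> 1)


codes = [binary_to_gray(i) for i in range(16)]
print(" ".join(f"{c:X}" for c in codes))
# 0 1 3 2 6 7 5 4 C D F E A B 9 8

pairs = list(zip(codes, codes[1:]))

# Exactly one bit toggles between neighbouring Gray codes.
bits_changed = [bin(a ^ b).count("1") for a, b in pairs]
print(max(bits_changed))  # 1

# But the numeric value can jump a long way, e.g. from 4 to C.
value_jump = [abs(b - a) for a, b in pairs]
print(max(value_jump))  # 8
```

So Gray coding limits how many control bits change per step, while plain 0,1,2,3... counting limits how far the delay value itself moves per step.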
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 28, 2021, 08:33:52 am
Quote
I always thought the Xilinx's delays are glitch free.

If you delay the clock (e.g. DQS) and the clock is fast, instead of worrying about glitches, I would rather worry about smoothness of changes and do 0,1,2,3 ... rather than allowing big changes (e.g. from 4 to C).

See Glitchless delay line using gray code multiplexer (https://patents.google.com/patent/US6400735)
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 28, 2021, 09:59:17 am
Quote
Most controllers simply write DQ with a clock which is nominally 90 degree shifted from the clock they use to write DQS, assuming that DQ and DQS are length matched and have minimum skew.

It is up to you to decide what you're going to calibrate and how.

@NorthGuy  You mean the following skew situation would not actually happen for write activity ?

(https://i.imgur.com/C5rIJeo.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 28, 2021, 12:12:43 pm
now, I am checking if I could eliminate either cal_master or cal_slave signal since I am only using one single whole deserializer

I suspect the same phase calibration (https://github.com/promach/DDR/blob/main/phase_detector.v) mechanism could be done without using an extra SLAVE ISERDES

(https://i.imgur.com/LxgX2qe.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 28, 2021, 01:36:29 pm
Quote
@NorthGuy  You mean the following skew situation would not actually happen for write activity ?

This picture is from QDR II memory chapter. I never worked with QDR II, nor with ultrascale, and know very little about either. I cannot comment on this.

In DDR3, delay through DQ and DQS of the same byte lane is equalized through the length-matching of the PCB traces, so if you nominally shift DQ 90 degrees from DQS during writes, then everything should get aligned on the DDR3 side as well. There will be some skew of course. But it will be small enough and will never be as big as a whole bit. If your FPGA has an ability to delay output lines, you can try to calibrate it out if you wish. Most controllers won't bother.

Think about it. If you use 600 MHz clock, the distance between bytes is roughly 800 ps. The skew between DQ lines of the same byte lane cannot be that big. I suggest you get a board with DDR3 (preferably with SODIMM to observe disparity between byte lanes, but single/dual chip are Ok too) and measure actual skews and windows.
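
The "roughly 800 ps" figure follows from the DDR bit time being half a clock period. A short Python check of that arithmetic (the function name is illustrative; the quarter-cycle value corresponds to a nominal 90-degree shift):

```python
def ddr_timing_ps(clock_mhz):
    """Clock period, DDR bit time and quarter-cycle offset in picoseconds."""
    period = 1e6 / clock_mhz          # 1 MHz corresponds to 1,000,000 ps
    return {
        "clock_period": period,
        "bit_time": period / 2,       # two transfers per clock in DDR
        "quarter_cycle": period / 4,  # nominal 90-degree shift
    }


t = ddr_timing_ps(600)
print(f"{t['clock_period']:.0f} {t['bit_time']:.0f} {t['quarter_cycle']:.0f}")
# 1667 833 417
```

At the 303 MHz mentioned elsewhere in the thread, the same calculation gives a bit time of about 1650 ps, so the skew budget is roughly twice as generous.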

As an example, look how this guy built his DDR3 controller on Zynq:

https://www.elphel.com/blog/2014/06/ddr3-memory-interface-on-xilinx-zynq-soc-free-software-compatible/ (https://www.elphel.com/blog/2014/06/ddr3-memory-interface-on-xilinx-zynq-soc-free-software-compatible/)

His Zynq was Kintex based, so he had much better hardware than Spartan-6. Therefore, his controller will not work for you. But look at his approach - it is very constructive.
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 28, 2021, 01:50:16 pm
Quote
now, I am checking if I could eliminate either cal_master or cal_slave signal since I am only using one single whole deserializer

I suspect the same phase calibration (https://github.com/promach/DDR/blob/main/phase_detector.v) mechanism could be done without using an extra SLAVE ISERDES

They use master and slave because they re-calibrate dynamically while maintaining communications at the same time. This adjusts the delays to ongoing temperature and voltage variation.

Alternatively, you can do the calibration once (either at startup or with a separate design), remember the delay values and use them.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 29, 2021, 07:10:57 am
Quote
They use master and slave because they re-calibrate dynamically while maintaining communications at the same time. This adjusts the delays to ongoing temperature and voltage variation.

Is it possible to use only a single whole deserializer (instead of splitting into master and slave deserializers) to achieve the similar quoted purpose of dynamic phase calibration ?
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 29, 2021, 10:40:00 am
How to correctly split a single 'dq_w_oserdes' SDR signal into two ('dq_w_d0', 'dq_w_d1') SDR signals (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L791-L797) for ODDR2 primitive (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1112-L1113) ?

Note: dq_w_oserdes signal is a serialized output from serializer.v (https://github.com/promach/DDR/blob/main/serializer.v)

In other words, how to generate the D0 and D1 input signals for ODDR2 primitive (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=224) ?


Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 29, 2021, 12:49:21 pm
Quote
In other words, how to generate the D0 and D1 input signals for ODDR2 primitive (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=224) ?

If you use hardware OSERDES, you don't need to. Just specify DDR mode for your OSERDES.

If you write your own - you'll need two - one for D0 and one for D1.
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 29, 2021, 01:17:57 pm
Quote
IF you write your own - you'll need two - one for D0 and one for D1.

I have an SDR SERDES running at n clock rate (posedge of 303MHz), but the ODDR2 need to receive new bit (D0 or D1, depending on posedge or negedge) every 2n clock ticks (both posedge and negedge of 303MHz).

I do not understand how is having two separate OSERDES would help to solve the clock rate matching issue ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 29, 2021, 02:32:47 pm
Quote
IF you write your own - you'll need two - one for D0 and one for D1.

I have an SDR SERDES running at n clock rate (posedge of 303MHz), but the ODDR2 need to receive new bit (D0 or D1, depending on posedge or negedge) every 2n clock ticks (both posedge and negedge of 303MHz).

I do not understand how is having two separate OSERDES would help to solve the clock rate matching issue ?

For example, look at an 8:1 DDR OSERDES which takes 8 inputs D0,D1,D2,D3,D4,D5,D6,D7 and outputs them serially.

The values supplied by D0,D2,D4,D6 are clocked out on the rising edge
The values supplied by D1,D3,D5,D7 are clocked out on the falling edge

You can then create two 4:1 SDR OSERDES modules.

One of the 2 modules will take D0,D2,D4,D6 inputs and output them serially. You route its output to the D0 pin of the ODDR.

The other will output D1,D3,D5,D7 serially. You route its output to the D1 pin of the ODDR.

But this is only if you write your own OSERDES.

The hardware OSERDES will have built-in DDR mode. Even if you put it in SDR mode, it cannot be routed to ODDR because ODDR and OSERDES are two incarnations of the same OLOGIC block.
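
For the hand-written case, the even/odd split described above can be checked with a small behavioural model in Python. The function names are illustrative and this is a model of the data ordering only, not of clocking or of any Xilinx primitive.

```python
def sdr_serialize(word):
    """Model of an SDR serializer: emit one bit per clock cycle, in order."""
    for bit in word:
        yield bit


def oddr(d0_stream, d1_stream):
    """Model of an output DDR register: D0 on the rising edge, D1 on the falling edge."""
    for d0, d1 in zip(d0_stream, d1_stream):
        yield d0
        yield d1


def ddr_serialize_8to1(word):
    """Two 4:1 SDR serializers feeding the D0 and D1 pins of one ODDR."""
    rising = sdr_serialize(word[0::2])   # D0, D2, D4, D6
    falling = sdr_serialize(word[1::2])  # D1, D3, D5, D7
    return list(oddr(rising, falling))


word = ["D0", "D1", "D2", "D3", "D4", "D5", "D6", "D7"]
print(ddr_serialize_8to1(word) == word)  # True
```

Each 4:1 serializer only has to produce one bit per full clock cycle, which is how the rate mismatch raised in the question goes away.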
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 29, 2021, 04:16:21 pm
@NorthGuy

Thanks for your advice on using two separate OSERDES.  See https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L777-L897 (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L777-L897) for the code modification

Now, I am back to solving the skew calibration issue. I need to study more on Write Leveling (https://media-www.micron.com/-/media/client/global/documents/products/data-sheet/dram/ddr3/2gb_ddr3_sdram.pdf#page=129) as well as MPR_Read_function (https://media-www.micron.com/-/media/client/global/documents/products/data-sheet/dram/ddr3/2gb_ddr3_sdram.pdf#page=149).

By the way, how to move the DQS strobe to the center of write DQ bits ?
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 29, 2021, 04:35:58 pm
Another separate question on bitslip:

In https://www.xilinx.com/support/documentation/user_guides/ug190.pdf#page=366 (https://www.xilinx.com/support/documentation/user_guides/ug190.pdf#page=366) ,  how to derive the table for DDR mode inside Figure 8-10 ?

(https://i.imgur.com/sAOO3Ra.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 29, 2021, 05:53:13 pm
Quote
By the way, how to move the DQS strobe to the center of write DQ bits ?

I assume you're talking about writes. The easiest way is to generate a separate clock for DQ which is shifted by 90 degree from the clock that generates DQS.

Quote
Another separate question on bitslip:

In https://www.xilinx.com/support/documentation/user_guides/ug190.pdf#page=366 (https://www.xilinx.com/support/documentation/user_guides/ug190.pdf#page=366) ,  how to derive the table for DDR mode inside Figure 8-10 ?

As it says, when you apply BITSLIP it rotates 1 bit to the right, the next time it rotates 3 bits to the left and so on. This may be different in Spartan-6.
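
A quick Python model of that alternating pattern, treating each operation as a rotation of an 8-bit word (an assumption based on the description above; the start pattern is arbitrary and, as noted, Spartan-6 may behave differently):

```python
def rotate(bits, k):
    """Rotate a bit string left by k positions (negative k rotates right)."""
    k %= len(bits)
    return bits[k:] + bits[:k]


def ddr_bitslip_sequence(word, count):
    """Apply `count` bitslips: right by 1, then left by 3, alternating."""
    out = [word]
    for i in range(count):
        step = -1 if i % 2 == 0 else 3
        word = rotate(word, step)
        out.append(word)
    return out


seq = ddr_bitslip_sequence("00100111", 8)
for w in seq:
    print(w)
print(seq[8] == seq[0], len(set(seq[:8])))  # True 8
```

Each pair of operations is a net rotation of two bits to the left, so eight operations return to the starting word, and the eight intermediate words are all different. The alternation therefore still reaches every possible alignment of the 8-bit word.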
Title: Re: DDR3 initialization sequence issue
Post by: promach on June 30, 2021, 11:48:47 am
Quote
As it says, when you apply BITSLIP it rotates 1 bit to the right, the next time it rotates 3 bits to the left and so on. This may be different in Spartan-6.

The rationale of using 3 bits rotation is a bit strange though ?


By the way, must I use IODELAY2 primitive in order to shift the incoming READ DQS strobe to the center of the parallel DQ bits ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on June 30, 2021, 02:55:01 pm
Quote
The rationale of using 3 bits rotation is a bit strange though ?

This is probably a consequence of the construction. They use the same block to do lots of things, so it is not always logical.

Quote
By the way, must I use IODELAY2 primitive in order to shift the incoming READ DQS strobe to the center of the parallel DQ bits ?

I think so. Depending on the path the signals take inside the FPGA, they will also have delays of their own, which you need to take into account.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 01, 2021, 09:00:25 am
As for the use of IODELAY2 (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=130) primitive to shift READ DQS strobe to the center of the incoming parallel DQ bits,  which signal should be used for port 'CLK'  ?  Is it the slow system clock, or the PLL-clk (minimum 303MHz) ?

I have a feeling that none of these two clocks should be used for port 'CLK' of IODELAY2 because READ DQS strobe itself is sort of already acting as clock activity.

(https://i.imgur.com/gQ9C5z2.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 01, 2021, 10:43:54 am
besides, should I use DATAOUT , DATAOUT2 or DOUT ?

How shall the logic for CAL and INC inputs signals look like ?

Note: The RX pipeline is   RAM -> IOBUF (for inout)  -> IDDR2 (input DDR buffer) -> ISERDES   
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 01, 2021, 03:41:44 pm
on page 72 of UG381 appnote (https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=72), it says : Calibration takes between 12 and 20 global clock cycles depending on the ratio between the global clock and the I/O clock

If it is taking 12 cycles to just shift the dqs strobe to the center of dq bits, then it seems that IODELAY2 is not a suitable candidate to do this kind of high-speed DDR3 RAM work ?

Please correct me if wrong.
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 01, 2021, 05:07:22 pm
Quote
I have a feeling that none of these two clocks should be used for port 'CLK' of IODELAY2 because READ DQS strobe itself is sort of already acting as clock activity.

CLK port is for the control logic, such as INC pin. You certainly cannot use DQS for this.

Quote
besides, should I use DATAOUT , DATAOUT2 or DOUT ?

How shall the logic for CAL and INC inputs signals look like ?

Note: The RX pipeline is   RAM -> IOBUF (for inout)  -> IDDR2 (input DDR buffer) -> ISERDES   

DATAOUT and DATAOUT2 are the same except for routing. You use DATAOUT for IDDR, but you would use DATAOUT2 if you wanted to route the signal to the fabric.

Quote
on page 72 of UG381 appnote (https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=72), it says : Calibration takes between 12 and 20 global clock cycles depending on the ratio between the global clock and the I/O clock

If it is taking 12 cycles to just shift the dqs strobe to the center of dq bits, then it seems that IODELAY2 is not a suitable candidate to do this kind of high-speed DDR3 RAM work ?

You can do calibration at startup. You read a test pattern from the DDR3 chip and calibrate. Once it's calibrated, you can read the real data.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 02, 2021, 01:46:07 am
As for CAL and INC input signals for IODELAY2 primitive during DQS centering calibration (https://www.xilinx.com/support/documentation/user_guides/ug388.pdf#page=48) process,
how shall the logic for CAL and INC inputs signals look like ?

Strange, my current TX and RX pipeline paths do not fall under any of the categories inside the following table : Possible Clock Structures for Bidirectional I/O (https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=51) ?

RX :     RAM -> IOBUF (for inout) -> IDELAY (DQS Centering) -> IDDR2 (input DDR buffer) -> ISERDES      

TX :     OSERDES -> ODDR2 (output DDR buffer) -> ODELAY (DQS Centering) -> IOBUF (for inout) -> RAM

(https://i.imgur.com/a7Wk1zL.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 02, 2021, 08:00:18 am
Besides, in the case of DQS strobe centering, should I use VARIABLE_FROM_ZERO or VARIABLE_FROM_HALF_MAX (https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=71) for the attribute IDELAY_TYPE (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=132) in IODELAY2 primitive ?
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 02, 2021, 10:02:43 am
@BrianHG

What do you exactly mean by "Even DQ Groups may be ignored, however, I still recommend wiring them properly" in your previous reply (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3585461/#msg3585461) ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 02, 2021, 11:37:55 am
When using the dedicated DQS clock circuitry inside the FPGA, each DQS group has 8 hard-wired DQ pins which are paired with it.  It is the same with the DDR3 RAM chip: there are 8 data bits for each DQS pair.  When wiring and programming your DDR3 controller, you need to match each 8-bit DQ group with its connected DQS, since only those DQs are wired to that one DQS input inside the FPGA.  This is where they get the ability to shift-adjust timing specifically for each group of 8 data bits without having a PLL with another dedicated tuned output for every 8 bits.

In my design, I do not derive the read DQ clock from the DQS inputs with their dedicated wiring to the shared 8 DQ pins.  My DQ read clock is generated by the FPGA PLL instead, so, as long as all your inputs on your FPGA can be clocked by the PLL instead of the DQS inputs, any DDR input may be used with my controller.  The only caveat is that all your DQ and DQS wiring lengths to the DDR3 memory need to be fairly closely matched.  Otherwise, for every 8 bits, I would need a separately phase-tuned PLL output, where in a 64-bit system this would mean 8 PLL outputs for 8 read clocks as well as another 8 PLL outputs for 8 write clocks.


The simplicity of my system using exclusive DDR in and DDR out buffers means easy cross vendor and cross FPGA type compatibility with the same code except for the DDR buffers and the tuning size step depending on pll precision.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 02, 2021, 12:03:45 pm
Quote
My DQ read clock is generated by the FPGA PLL instead

Sure, but this dynamic phase shift approach (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3586022/#msg3586022) is not allowed due to some underlying PLL hardware limitations for Xilinx Spartan-6 FPGA chip.


By the way, why do I face the following tDLLK timing violation after MPR System Read Calibration is enabled (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1585-L1680) ?

(https://i.imgur.com/lGpocn7.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 02, 2021, 01:01:32 pm
Quote
My DQ read clock is generated by the FPGA PLL instead

Sure, but this dynamic phase shift approach (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3586022/#msg3586022) is not allowed due to some underlying PLL hardware limitations for Xilinx Spartan-6 FPGA chip.


I don't think so.
You are probably using a buffer or peripheral which is not compatible with it, or the buffers/peripherals you are using are actually using the feature behind your back to achieve their function, so they won't let you do it manually.

Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 02, 2021, 01:43:39 pm
As for CAL and INC input signals for IODELAY2 primitive during DQS centering calibration (https://www.xilinx.com/support/documentation/user_guides/ug388.pdf#page=48) process,
what should the logic for the CAL and INC input signals look like ?

For example, you can use INC and CE to go through all possible values of the delay and find the best position.

Besides, in the case of DQS strobe centering, should I use VARIABLE_FROM_ZERO or VARIABLE_FROM_HALF_MAX (https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=71) for the attribute IDELAY_TYPE (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=132) in IODELAY2 primitive ?

It depends on path delays. The best way is to determine this empirically  when you have your design.

Strange, my current TX and RX pipeline paths do not fall under any of the categories inside the following table : Possible Clock Structures for Bidirectional I/O (https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=51) ?

RX :     RAM -> IOBUF (for inout) -> IDELAY (DQS Centering) -> IDDR2 (input DDR buffer) -> ISERDES      

TX :     OSERDES -> ODDR2 (output DDR buffer) -> ODELAY (DQS Centering) -> IOBUF (for inout) -> RAM

These are data paths. The table lists the clock source.

BTW: You use either IDDR or ISERDES (similarly either OSERDES or ODDR), but not both.

Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 02, 2021, 01:49:40 pm

By the way, why do I face the following tDLLK timing violation after MPR System Read Calibration is enabled (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1585-L1680) ?

(https://i.imgur.com/lGpocn7.png)

You must fully initialize the DDR3 chip, right past the ZQCL before you can read the MPR.
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 02, 2021, 02:07:04 pm
(https://i.imgur.com/a7Wk1zL.png)

Looking at this table, the input and output clocks must be the same. Therefore you cannot use DQS as a clock.

So, your only choice is the BrianHG's method.

However, I think you can use their calibration mechanism to maintain the DQS alignment to CK dynamically and adjust the DQ delay in sync with DQS.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 02, 2021, 05:47:08 pm
@BrianHG Why does the following error occur for PREA command inside STATE_ACTIVATE (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1975-L1987) ?

(https://i.imgur.com/m7QjshI.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 02, 2021, 06:09:01 pm
@BrianHG Why does the following error occur for PREA command inside STATE_ACTIVATE (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1975-L1987) ?


It is as it says...
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 03, 2021, 01:43:19 am
@BrianHG  However, I have already explicitly issued a PREA command, so I do not understand why that error still occurs ?

Code: [Select]
if(MPR_ENABLE)  // MPR System Read Calibration
begin
    // need to do PRECHARGE after ACTIVATE

    // PRECHARGE command encoding: CS#=0, RAS#=0, CAS#=1, WE#=0
    ck_en <= 1;
    cs_n <= 0;
    ras_n <= 0;
    cas_n <= 1;
    we_n <= 0;
    address[A10] <= 1;  // precharge ALL banks

    main_state <= STATE_PRECHARGE;
    wait_count <= 0;
end
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 03, 2021, 02:16:37 am
@BrianHG It seems that the root cause for the failure to enter MPR_Read_function mode is the vicious dependency cycle between ACTIVATE and PRECHARGE commands (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1977-L1990) as shown in the following two screenshots.

(https://i.imgur.com/LLZAAaL.png)

(https://i.imgur.com/d3bQhWl.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 03, 2021, 10:52:10 am
Quote
Looking at this table, the input and output clocks must be the same. Therefore you cannot use DQS as a clock.

So, your only choice is the BrianHG's method.

However, I think you can use their calibration mechanism to maintain the DQS alignment to CK dynamically and adjust the DQ delay in sync with DQS.

@NorthGuy

Where did you see that the input and output clocks must be the same ?

and how does this translate to need for BrianHG's method ?

(https://i.imgur.com/a7Wk1zL.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 03, 2021, 02:12:07 pm
Where did you see that the input and output clocks must be the same ?

It says "Only possible when the two BUFGs are common for both input and output" or similar

and how does this translate to need for BrianHG's method ?

If the clocks must be the same, you must use the same clock for input as you used for output. So, this cannot be DQS.

BrianHG samples both DQS and DQ with a clock. This clock is different from the output clock - it is shifted by PLL. You cannot do exactly this because your input and output clocks must be the same. But you can do similar. You can delay DQ inputs so that they can be sampled with the output clock.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 03, 2021, 02:56:57 pm
@BrianHG The DDR3 RAM still fails to enter MPR_Read_function mode (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3599738/#msg3599738).

Does the MPR sequence look wrong to you ?
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 03, 2021, 05:33:32 pm
@BrianHG The MPR_Read_function mode is working now in Modelsim simulation.

However, why at time = 702051705 ps , simulation waveform shows 0 , but the modelsim console transcript log shows 1 instead ?

I have thought about DQS centering in this case, but phase-shifting DQ both left and right by 90 degrees still gives a contradictory result.

Remember that timing cursor corresponds to the fourth piece of data being read during MPR_Read_function mode.

(https://i.imgur.com/GI7fPXM.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 04, 2021, 03:16:26 am
Quote
If the clocks must be the same, you must use the same clock for input as you used for output. So, this cannot be DQS.

BrianHG samples both DQS and DQ with a clock. This clock is different from the output clock - it is shifted by PLL. You cannot do exactly this because your input and output clocks must be the same. But you can do similar. You can delay DQ inputs so that they can be sampled with the output clock.

@NorthGuy 

Thanks for pointing this similar clock issue out.
I suppose I could still feed in DQS for the 'CLK' inputs of both IDELAY and ODELAY, but I need to delay-shift the DQ bit signals instead of the DQS signals.

Please correct me if wrong.


By the way, should IODELAY primitive be placed between IODDR and IOBUF ?


Rx : RAM -> IOBUF (for inout) -> IDELAY (DQS Centering) -> IDDR2 (input DDR buffer) -> ISERDES      
Tx : OSERDES -> ODDR2 (output DDR buffer) -> ODELAY (DQS Centering) -> IOBUF (for inout) -> RAM
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 04, 2021, 11:28:13 am
@BrianHG The MPR_Read_function mode is working now in Modelsim simulation.

However, why at time = 702051705 ps , simulation waveform shows 0 , but the modelsim console transcript log shows 1 instead ?

I have thought about DQS centering in this case, but phase-shifting DQ both left and right by 90 degrees still gives a contradictory result.

Remember that timing cursor corresponds to the fourth piece of data being read during MPR_Read_function mode.

(https://i.imgur.com/GI7fPXM.png)
What are your contradictory results?
The DQS and data coming out of the ram look normal.
You cannot affect the DQS timing here as the ram is generating it for you.

FPGA inputs aren't instant.  You will need to read up on how your input buffers work.  What you see on the IO pins of the RAM in Modelsim is the instantaneous output, not what the internal logic in the FPGA sees.  Modelsim is a functional simulator, unless you have done a real FPGA compile of your project, with the target FPGA and every IO pin assigned in the Xilinx tools, and run a gate-level simulation with cell timing information.  There you lose access to your exact net names, and you need to do a full FPGA compile every time, lasting minutes per compile, just to see any results.  Xilinx may not even support this feature within Modelsim, as even newer Altera FPGAs have dropped support for timing gate-level simulations in Modelsim.
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 04, 2021, 02:09:35 pm
I suppose I could still feed in DQS for the 'CLK' inputs of both IDELAY and ODELAY, but I need to delay-shift the DQ bit signals instead of the DQS signals.

You cannot use DQS as a clock for OSERDES. DQS is not a clock, it's a strobe. It is not present when you write. You could for ISERDES, but since you must use the same for both OSERDES and ISERDES, you can't use DQS as a clock.

By the way, should IODELAY primitive be placed between IODDR and IOBUF ?

Rx : RAM -> IOBUF (for inout) -> IDELAY (DQS Centering) -> IDDR2 (input DDR buffer) -> ISERDES      
Tx : OSERDES -> ODDR2 (output DDR buffer) -> ODELAY (DQS Centering) -> IOBUF (for inout) -> RAM

Rx : RAM -> IOBUF (for inout) -> IDELAY -> ISERDES      
Tx : OSERDES -> ODELAY -> IOBUF (for inout) -> RAM

or

Rx : RAM -> IOBUF (for inout) -> IDELAY -> IDDR2 -> Your own serdes(x2)
Tx : Your own serdes(x2) -> ODDR2 -> ODELAY -> IOBUF (for inout) -> RAM

If you try to connect something in a way that cannot be connected, it won't let you.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 04, 2021, 02:35:00 pm
Quote
You cannot use DQS as a clock for OSERDES. DQS is not a clock, it's a strobe. It is not present when you write. You could for ISERDES, but since you must use the same for both OSERDES and ISERDES, you can't use DQS as a clock.

@NorthGuy  I mean  'CLK' input for IODELAY2 (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=130) , not IOSERDES2
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 04, 2021, 02:46:05 pm
@NorthGuy  I mean  'CLK' input for IODELAY2 (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=130) , not IOSERDES2

CLK input is only for control. You use the same clock as for the logic which drives CE, INC pins.

Title: Re: DDR3 initialization sequence issue
Post by: promach on July 04, 2021, 02:49:52 pm
As for DQS centering, is the IODELAY2 primitive delay-shifting the DQS strobe OR the parallel DQ bits ?

Note: It seems that the similar clock restriction might apply here ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 04, 2021, 04:03:34 pm
As for DQS centering, is the IODELAY2 primitive delay-shifting the DQS strobe OR the parallel DQ bits ?

Note: It seems that the similar clock restriction might apply here ?

DQS and DQ are different pins.

When you write, you drive both DQS and DQ. You either use different clocks (shifted by 90 degrees) to drive them. Or you can use the same clock, but shift the outputs to produce 90 degree difference.

Whatever clock you used to drive DQS during writes, you must use to read DQS (at least your table says so). Therefore, you must use IDELAY to shift the DQS input to align it with the clock. You can use DQS for verifying the "10101010" pattern as BrianHG did. Or you can shift it by an extra 90 degrees and use it for calibrating the delays, and then use the calibration values to adjust the DQ delays in sync with DQS.

Similarly, when you read DQ, you must use the same clock as you have used to drive DQ during writes. Of course, the data won't be aligned to the clock, so you need to configure IDELAY to create correct alignment.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 04, 2021, 05:10:21 pm
1. For IODELAY (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=130) , how exactly is CAL different from CE ?

2. On page 131, if I use "WRAPAROUND" for the attribute COUNTER_WRAPAROUND , it looks like it would be similar to bitslip operation ?

3. Besides, on page 132, IDELAY_VALUE ranges from 0 to 255.  How shall I use these values during calibration ?  Should I increment or decrement by just '1' during initial MPR calibration ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 04, 2021, 06:04:29 pm
1. For IODELAY (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=130) , how exactly is CAL different from CE ?

CAL performs calibration. CE increments/decrements the number of steps.

2. On page 131, if I use "WRAPAROUND" for the attribute COUNTER_WRAPAROUND , it looks like it would be similar to bitslip operation ?

Sort of. When the counter wraps, you can do bitslip in opposite direction to compensate.

3. Besides, on page 132, IDELAY_VALUE ranges from 0 to 255.  How shall I use these values during calibration ?  Should I increment or decrement by just '1' during initial MPR calibration ?

If you use dynamic calibration (such as if you try to align input DQS with the CK) then you increment/decrement by 1. When DQS and CK edges are the same, sampling DQS with CK produces roughly the same amount of '0' and '1' when they're aligned. So you either increment by one or decrement by one until you get to this situation. At this point the calibration is complete. But if DQS walks away from CK (say because of temperature change), you need to re-calibrate, which means you need to read periodically.

If you do static calibration, you just try all the values and select the best one for future use. Then you assume that changes in DQS timing are not big enough to derail your calibration.
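
The static approach can be sketched roughly as follows. This is only a behavioural model in Python, not HDL: the list of pass/fail results stands in for "set the tap, read the known pattern, compare", and the name best_tap is made up for this example. It picks the middle of the longest run of passing taps, which is one reasonable reading of "select the best one".

```python
def best_tap(results):
    """Return the middle index of the longest run of passing taps, or None.

    results[i] is True when the test pattern read back correctly at tap i.
    """
    best_start, best_len = None, 0
    run_start = None
    # A trailing False acts as a sentinel so a run ending at the last tap is closed.
    for i, ok in enumerate(list(results) + [False]):
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            run_len = i - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2


# Hypothetical sweep: taps 3..9 pass, everything else fails.
sweep = [False] * 3 + [True] * 7 + [False] * 6
chosen = best_tap(sweep)
```

The chosen tap would then be loaded once and left alone, on the assumption stated above that DQS timing does not drift far enough to leave the passing window.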
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 05, 2021, 01:45:33 am
Quote
CAL performs calibration. CE increments/decrements the number of steps.

@NorthGuy  CE signal does not increment/decrement, that is the job of INC signal

So, what is the difference between CAL and CE ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 05, 2021, 02:51:25 am
Quote
CAL performs calibration. CE increments/decrements the number of steps.

@NorthGuy  CE signal does not increment/decrement, that is the job of INC signal

INC selects whether the count increments or decrements.
CE selects whether the count changes (increments or decrements depending on INC) or doesn't change.
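
A toy model of that behaviour, in case it helps. This is a Python sketch of a tap counter only, not the IODELAY2 primitive itself: the class name is made up, the 0 to 255 range is taken from the IDELAY_VALUE range mentioned earlier, and the wrap flag is only loosely modelled on the WRAPAROUND setting discussed above.

```python
class TapCounter:
    """Toy model of a delay tap counter driven by CE and INC."""

    def __init__(self, value=0, max_value=255, wrap=False):
        self.value = value
        self.max_value = max_value
        self.wrap = wrap

    def clock(self, ce, inc):
        """Advance one control clock and return the tap count afterwards."""
        if not ce:
            return self.value  # CE low: the count holds, INC is ignored
        nxt = self.value + (1 if inc else -1)
        if self.wrap:
            nxt %= self.max_value + 1  # roll over at either end
        else:
            nxt = max(0, min(self.max_value, nxt))  # stop at the limits
        self.value = nxt
        return self.value


taps = TapCounter(value=10)
taps.clock(ce=False, inc=True)   # CE low: stays at 10
taps.clock(ce=True, inc=True)    # CE high, INC high: 11
taps.clock(ce=True, inc=False)   # CE high, INC low: back to 10
```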
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 05, 2021, 02:59:42 am
Quote
CAL Input : Initiate calibration input.
CE Input : Enable increment/decrement.

I am a bit confused about the actual purpose of the CAL and CE inputs.

what is the difference between CAL and CE ?
How are they used together ?  Or should I just ignore CAL and use only CE and INC ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 05, 2021, 03:23:46 am
Or should I just ignore CAL and use only CE and INC ?

CAL is for calibration. It calculates number of taps in the clock period - MAX. I think you need to do it before you start using IODELAY. Read "I/O Delay Calibration and Reset" in ug381.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 05, 2021, 07:45:07 am
Quote
In this example, the delay taps have an average value of 40 ps under current operating conditions. An I/O clock of 250 MHz (4,000 ps period) is applied to the IODELAY2 via CLK0 for SDR mode. When the calibrate (CAL) command is issued, a value of 4,000/40 = 100 is returned internally. If the input delay is programmed to be VARIABLE_FROM_HALF_MAX, then, following a reset (RST) command, the input delay value is set to 50 taps, equivalent to approximately ½ the input clock period. As operating conditions change, the average value of the delay taps will also change, as will the result obtained from a CAL command.

For CAL input port of IODELAY2 primitive (https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=74) , is this CAL calibration not related to DQS centering ?  It seems to me that this CAL input is only for IODELAY2 internal calibration mechanism ?

But the explanation using the VARIABLE_FROM_HALF_MAX example does not seem to imply so.

Besides, the description for CAL input port : Invokes the IODELAY2 calibration sequence. The calibration sequence lasts between eight and 16 GCLK cycles. Drives BUSY Low when complete. implies that CAL input signal has to be asserted together with CE and INC input signals during actual calibration for read DQS strobe centering ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 05, 2021, 12:00:43 pm
For CAL input port of IODELAY2 primitive (https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=74) , is this CAL calibration not related to DQS centering ?  It seems to me that this CAL input is only for IODELAY2 internal calibration mechanism ?

Yes, the CAL calibration is internal to IODELAY. It prepares IODELAY for use. This process is necessary because the number of taps which fit into a clock cycle varies between different FPGAs and also depends on temperature and voltage. This calibration process is different from calibrating delays to align signals in your design, which is done with CE and INC.

In 7-series CAL is not needed - the delays auto-calibrate - you only need to supply a reference clock.
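
To put numbers on it, here is the arithmetic from the ug381 example quoted earlier (4,000 ps clock period, 40 ps average tap) as a small Python sketch. The function name and the 50 ps "hot" tap value are assumptions for illustration only; real tap delays have to come from the data sheet or from the CAL result itself.

```python
def taps_per_period(clock_period_ps, tap_delay_ps):
    """Whole number of delay taps that fit into one clock period."""
    return clock_period_ps // tap_delay_ps


# Figures from the quoted example: 250 MHz clock, 40 ps average tap.
max_taps = taps_per_period(4000, 40)
half_max = max_taps // 2  # starting point for VARIABLE_FROM_HALF_MAX

# Hypothetical change in operating conditions: taps slow down to 50 ps.
max_taps_hot = taps_per_period(4000, 50)
half_max_hot = max_taps_hot // 2
```

The point of re-running CAL is visible in the second case: the same clock period now corresponds to a different number of taps, so "half a period" is a different tap count.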
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 05, 2021, 12:20:17 pm
Quote
Yes, the CAL calibration is internal to IODELAY. It prepares IODELAY for use.

Thanks, I am now checking how the IODELAY2 primitive's CAL input port is being driven by this Xilinx demo example (https://github.com/promach/DDR/blob/main/phase_detector.v#L149-L155).

It seems to me that Xilinx engineer issues internal CAL command TWICE, note the cal_data_sint signal inside the FSM (https://github.com/promach/DDR/blob/main/phase_detector.v#L178-L192) for both state 4'h1  as well as  state 4'h6

Besides, the busy_data_d logic does not seem to obey IODELAY2 primitive requirement : The calibration sequence lasts between eight and 16 GCLK cycles.

This is a bit confusing.  Any idea ?

Title: Re: DDR3 initialization sequence issue
Post by: promach on July 05, 2021, 01:17:37 pm
@NorthGuy  It seems that IODELAY2 primitive also needs some initial hardware warmup time (https://github.com/promach/DDR/blob/main/phase_detector.v#L126-L137) ?

There is some explanation about Phase Detector Calibration Mechanisms (https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=86) , but I do not understand why SLAVE delay is always the MASTER delay minus half MAX ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 05, 2021, 01:21:35 pm
Quote
Yes, the CAL calibration is internal to IODELAY. It prepares IODELAY for use.

Thanks, I am now checking how the IODELAY2 primitive's CAL input port is being driven by this Xilinx demo example (https://github.com/promach/DDR/blob/main/phase_detector.v#L149-L155).

It seems to me that Xilinx engineer issues internal CAL command TWICE, note the cal_data_sint signal inside the FSM (https://github.com/promach/DDR/blob/main/phase_detector.v#L178-L192) for both state 4'h1  as well as  state 4'h6

Besides, the busy_data_d logic does not seem to obey IODELAY2 primitive requirement : The calibration sequence lasts between eight and 16 GCLK cycles.

This is a bit confusing.  Any idea ?

I don't see any discrepancy. They assert CAL for one clock then they wait until the calibration is done by monitoring BUSY.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 05, 2021, 03:11:24 pm
@NorthGuy What is the purpose of the use of mux (https://github.com/promach/DDR/blob/main/phase_detector.v#L215-L220) signal in the phase detector module ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 05, 2021, 03:30:59 pm
@NorthGuy What is the purpose of the use of mux (https://github.com/promach/DDR/blob/main/phase_detector.v#L215-L220) signal in the phase detector module ?

I don't know. I haven't analyzed their code.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 05, 2021, 03:34:30 pm
By the way, I suppose the following early/late data sampling check mechanism (https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=86) could not be used during the initial MPR_Read_function calibration since my code only has a single IDELAY primitive to do DQS centering work ?

(https://i.imgur.com/j3q6eyg.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 05, 2021, 07:02:47 pm
By the way, I suppose the following early/late data sampling check mechanism (https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=86) could not be used during the initial MPR_Read_function calibration since my code only has a single IDELAY primitive to do DQS centering work ?

You have two IODELAY blocks because DQS is differential. I don't know if the routing would permit using the built-in mechanism in your case. If not, you can do it in fabric on your own.

Your DQS IO logic is clocked by a clock. You need to align DQS to this clock. If you sample DQS with the rising edge of the clock, you can get different responses:

1. If you always get '0', the clock rising edge has already happened but the DQS rising edge hasn't. DQS needs to be moved earlier by decreasing the DQS delay.

2. If you always get '1', the clock rising edge happens after the DQS edge. Therefore, the DQS delay must be increased.

3. If you're somewhere in the middle (in the jitter zone) then DQS and the clock are aligned.

Of course, you don't need DQS data, you only need DQ data. Therefore you adjust DQ delays the same as DQS - every time you increase DQS delay, you also increase DQ delay as well. Every time you decrease DQS delay you decrease DQ delay. This way, if DQS shifts, you shift the DQ sampling point to follow DQS.

Regardless of this, DQ delays themselves must be adjusted so that you sample in the middle of the window.
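
The three cases above can be sketched as a small decision loop. This is a Python behavioural model under heavy assumptions, not fabric logic: all times are in taps, jitter is a fixed set of offsets rather than random, only a single rising edge is modelled (a real strobe also has falling edges to tell apart), and every name here is made up for the example.

```python
def sample_dqs(dqs_delay, clock_edge, jitter_offsets=(-1, 0, 1)):
    """Sample DQS at the clock edge once per jitter offset.

    A sample is 1 when the (jittered) DQS rising edge arrived before the clock edge.
    """
    return [1 if dqs_delay + j < clock_edge else 0 for j in jitter_offsets]


def adjust(samples):
    """Return -1 to decrease the delay, +1 to increase it, 0 when aligned."""
    ones = sum(samples)
    if ones == 0:
        return -1  # always '0': DQS edge is late, reduce its delay
    if ones == len(samples):
        return +1  # always '1': DQS edge is early, add delay
    return 0       # mixed results: inside the jitter zone


def align(dqs_delay, dq_delay, clock_edge, max_steps=256):
    """Step the DQS delay until aligned, moving the DQ delay by the same amount."""
    for _ in range(max_steps):
        step = adjust(sample_dqs(dqs_delay, clock_edge))
        if step == 0:
            return dqs_delay, dq_delay
        dqs_delay += step
        dq_delay += step
    return None


result = align(dqs_delay=0, dq_delay=10, clock_edge=40)
```

Starting from either side, the loop stops at the first delay value where the samples are mixed, and the DQ delay has been moved by exactly the same number of taps.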
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 06, 2021, 01:41:14 am
Quote
Your DQS IO logic is clocked by a clock. You need to align DQS to this clock. If you sample DQS with the rising edge of the clock, you can get different responses:

1. If you always get '0', the clock rising edge has already happened but the DQS rising edge hasn't. DQS needs to be moved earlier by decreasing the DQS delay.

2. If you always get '1', the clock rising edge happens after the DQS edge. Therefore, the DQS delay must be increased.

3. If you're somewhere in the middle (in the jitter zone) then DQS and the clock are aligned.

@NorthGuy

In the following waveform, we have incoming read DQS strobe as well as parallel DQ bits, and also 90-degree phase shifted DQ bits with respect to DQS strobe.

I do not quite understand your quoted points #1 and #2. 

No matter how the DQS strobe is delayed within a single bit period, DQS strobe will always sample the values of DQ bits correctly, although it would not be in the center of DQ bits for all such phase shift delay choices.

Therefore, I am confused as in how your quoted points #1 and #2 actually makes sure that DQS strobe is delay-shifted to the CENTER of DQ bits ?

(https://i.imgur.com/9qmYSSd.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 06, 2021, 02:59:41 am
No matter how the DQS strobe is delayed within a single bit period, DQS strobe will always sample the values of DQ bits correctly, although it would not be in the center of DQ bits for all such phase shift delay choices.

Except you cannot use DQS strobe to sample DQ because you must use the same clock for both input and output SERDES (At least that's what  the table you have posted said). Thus you must sample DQ with the output clock. Therefore you need to align DQ so that the receiving clock edges are centered in DQ bits.

Even if you could sample DQ with DQS strobes (that is if you could route DQS to clock ILOGIC flip-flops), you then must transfer the results to a regular clock domain. To achieve this, DQS would have to be roughly aligned with the clock. So, the DQ/DQS groups would have to be delayed.

Title: Re: DDR3 initialization sequence issue
Post by: promach on July 06, 2021, 03:10:43 am
Quote
Except you cannot use DQS strobe to sample DQ because you must use the same clock for both input and output SERDES (At least that's what  the table you have posted said). Thus you must sample DQ with the output clock. Therefore you need to align DQ so that the receiving clock edges are centered in DQ bits.

@NorthGuy

But how exactly are points #1 and #2 (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3601332/#msg3601332) being applied to achieve such purpose ?
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 06, 2021, 03:44:34 am
It seems that points #1 and #2 assume that incoming parallel DQ bits are length-matched with read DQS strobe ?

And I think you were implying to use XOR operation between FPGA PLL-ed's clock and the read DQS strobe ?

Please correct me if wrong.
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 06, 2021, 03:46:15 am
But how exactly are points #1 and #2 (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3601332/#msg3601332) being applied to achieve such purpose ?

While you read the pattern from the DDR3 chip, you shift DQS to be in phase with the clock and maintain it that way. You know how big the shift is, so you know how much you need to shift DQ to move it to the point where the sampling clock will be centred in the DQ bit.

You don't have to do this. You can determine (test or guess) the necessary delays and hard-code them into your design.
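
As a rough sketch of that offset: DDR3 read data leaves the RAM edge-aligned with DQS, so once DQS is in phase with the clock, the centre of a DQ bit is a quarter of a clock period away. The Python below is only arithmetic; the function name is made up, the 100 taps per period comes from the ug381 example quoted earlier, the DQS delay of 30 taps is invented, and adding (rather than subtracting) the quarter period is an assumption.

```python
def dq_delay_taps(dqs_delay_taps, taps_per_period, max_taps=255):
    """DQ delay that puts the sampling clock edge in the middle of a DQ bit."""
    quarter = taps_per_period // 4  # half of one DDR bit time
    dq = dqs_delay_taps + quarter
    if dq > max_taps:
        raise ValueError("DQ delay exceeds the available tap range")
    return dq


dq_taps = dq_delay_taps(dqs_delay_taps=30, taps_per_period=100)
```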
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 06, 2021, 03:50:00 am
It seems that points #1 and #2 assume that incoming parallel DQ bits are length-matched with read DQS strobe ?

They must be length-matched. However, if there's a mismatch, you can add/remove taps to IODELAYs to compensate.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 06, 2021, 04:38:41 am
Quote
They must be length-matched. However, if there's a mismatch, you can add/remove taps to IODELAYs to compensate.

@NorthGuy

What if the following skew situation also happens between those parallel incoming READ DQ bits ?

And in such high working frequency of DDR3 RAM, it might be impossible to calibrate skew phase between incoming DQ bits.  In other words, COMBINATIONAL logic comparison between those DQ bits would not be feasible in this case.

Note: pairwise comparison between DQS and a particular single DQ bit would not work in all corner test cases (https://www.xilinx.com/support/documentation/ip_documentation/ultrascale_memory_ip/v1_4/pg150-ultrascale-memory-ip.pdf#page=362), therefore ^DQ xor operation is needed.  However, my DDR3 RAM is of x16 configuration, which means DQ_BITWIDTH=16.  So, this really exacerbates the setup timing issue even further.

Not to mention that there would also be placement and routing issue for fixed amount of IODELAY2 primitive in a given hardware block inside xilinx spartan-6 chip.

What do you think ?

(https://i.imgur.com/C5rIJeo.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 06, 2021, 07:37:25 am
Ok, you need to read the data sheet's worst case drift on the IO performance and do a little math.

In my example case, I have 16 tuning steps until my clock rotates 360 degrees.

MAX10-6 FPGA...

At 250MHz, 7 of 16 steps give me true error free data. (RAM underclocked)
At 300MHz, 7 of 16 steps give me true error free data.
At 350MHz, 7 of 16 steps give me true error free data. (FPGA CK/DQS/DQ IO buffers overclocked.)
At 400MHz, 6 of 16 steps give me true error free data.
At 450MHz, 5 of 16 steps give me true error free data. (FPGA DDR3 core and write data DQ serdes overclocked.)
At 500MHz, 5 of 16 steps give me true error free data. (FPGA read data DQ serdes overclocked at this point)

With this, at 500MHz / 1GT/s, it is possible to calculate how many picoseconds of valid-data margin I get when tuned in the middle, and how many picoseconds each tuning step gives me.

Also, there is 1 additional tuning point at one end where around half of the 16 bits are correct and the rest are jittering.

It is this '1' tuning transition point which should give you the idea as the timing errors between the 16 bits.

Sorry, I cannot test above 500MHz; the MAX10 completely fails to do anything.  The DECA board I used has a single 800MHz / 1600MT/s 16-bit DDR3 RAM chip.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 06, 2021, 08:16:08 am
Quote
It is this '1' tuning transition point which should give you the idea as the timing errors between the 16 bits.

@BrianHG

What do you exactly mean by this '1' tuning transition point ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 06, 2021, 08:50:05 am
[These values are now correct...]

Example: at 300MHz, there are 7 of 16 tuning steps where all the bits and the DQS pattern are valid, but there is also an 8th tuning step where the DQS still reads valid while half of the read data bits show random noise between the previous byte and the next byte being read.

8 out of 16 means a perfect 180 degrees of tuning where read data is error free.  And since the DQS is a single differential bit being received, it is fairly clear why I am able to get a proper read from the DQS for the entire 180 degrees, 8 tuning steps.

360/16 = 22.5 degrees per tuning step.
@300 MHz, 3333ps clock, each tuning step is 208ps.
@500 MHz, 2000ps clock, each tuning step is 125ps.


So, within 208ps, 1 tuning step, there is junk data.  Tune 1 before and everything reads good.  Tune 1 after and everything reads bad.

@300Mhz, there are 7 steps with all good data.  That's a 1458ps valid data read window.

@500MHz, there are only 5 good tuning steps.  That's a 625ps read window.  The MAX 10 is showing its IO limit here, as well as the limits of trace length matching on the PCB and IO trace cross-talk.  If it had 7 good tuning steps, the read window would be 875ps out of a perfect 1000ps.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 06, 2021, 09:06:21 am
Note that my read clock is parallel for all DQ bits as well as the DQS.  I do not have any individual tuning skew adjustments on any of the IO pins.  Everything is sampled and transmitted in parallel.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 06, 2021, 09:16:03 am
ERROR ABOVE:

Example: at 300MHz, there are 7 of 16 tuning steps where all the bits and the DQS pattern are valid, but there is also an 8th tuning step where the DQS still reads valid while half of the read data bits show random noise between the previous byte and the next byte being read.

8 out of 16 means a perfect 180 degrees of tuning where read data is error free.  And since the DQS is a single differential bit being received, it is fairly clear why I am able to get a proper read from the DQS for the entire 180 degrees, 8 tuning steps.

360/16 = 22.5 degrees per tuning step.
@300 MHz, 3333ps clock, each tuning step is 208ps.
@500 MHz, 2000ps clock, each tuning step is 125ps.


So, within 208ps, 1 tuning step, there is junk data.  Tune 1 before and everything reads good.  Tune 1 after and everything reads bad.

@300MHz, there are 7 steps with all good data.  That's a 1458ps valid data read window.

@500MHz, there are only 5 good tuning steps.  That's a 625ps read window.  The MAX 10 is showing its IO limit here, as well as the effects of trace length matching on the PCB and IO trace cross-talk.  If it had 7 good tuning steps, the read window would be 875ps.

Also, remember, @300MHz (3333.3ps clock), the valid data is held for 1666ps whereas my valid tuning window is 1458ps.  That's 208ps in the junk zone, or 1458ps out of a theoretical perfect 1666ps.



Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 06, 2021, 12:20:42 pm
Quote
What if the following skew situation also happens between those parallel incoming READ DQ bits ?

If your DQ lines are not synchronized, you can adjust the delay on each of them. But they're length matched, so you can assume that the skew between them is small enough.

Quote
And in such high working frequency of DDR3 RAM, it might be impossible to calibrate skew phase between incoming DQ bits.  In other words, COMBINATIONAL logic comparison between those DQ bits would not be feasible in this case.

Sure, combinatorial logic is impossible. You are trying to achieve alignment to within 50-100 ps. Routing to any combinatorial logic, plus the combinatorial delays, will add around 1 ns at best.  This will destroy the timing relationship between the signals to the point where your analysis is totally useless. You must calibrate timing using the same structures that you're going to use for sampling real data - ISERDES or IDDR.

All you can do is tweak delays and clock phases and observe what you sample.

Quote
Note: pairwise comparison between DQS and a particular single DQ bit would not work in all corner test cases (https://www.xilinx.com/support/documentation/ip_documentation/ultrascale_memory_ip/v1_4/pg150-ultrascale-memory-ip.pdf#page=362), therefore ^DQ xor operation is needed ...

Your memory is DDR3, not QDR II. Using materials about QDR II will only confuse things.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 06, 2021, 12:40:36 pm

Quote
Sure combinatorial logic is impossible. You try to achieve alignments to 50-100 ps. Routing to any combinatorial logic and combinatorial delays will be around 1 ns at best.  This will destroy time relationship between signals to the point where your analysis is totally useless. You must calibrate timing using the same structures that you're going to use for sampling real data - ISERDES or IDDR.

Attempting to use combinational logic in such a way can even make things worse, as each time you compile, minute movement of logic due to surrounding gates can throw this strategy completely off.  Even that best-case 1ns figure wouldn't be guaranteed.  Get your reads and writes directly onto the IO pins' dedicated hardware DDR registers without any added logic tied to the pin, and work out a way to analyze the data and tune after that data has been converted to your main system clock.


Title: Re: DDR3 initialization sequence issue
Post by: promach on July 06, 2021, 01:28:57 pm
@NorthGuy On Possible Clock Structures for Bidirectional I/O (https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=51) and also the next page, it seems that only the same clock is needed at the IO interface, not the entire Tx/Rx pipeline.  Please correct me if wrong.

Rx: RAM -> IOBUF (for inout) -> IDELAY (DQS Centering) -> IDDR2 (input DDR buffer) -> ISERDES      
Tx: OSERDES -> ODDR2 (output DDR buffer) -> ODELAY (DQS Centering) -> IOBUF (for inout) -> RAM


Quote
Also, remember,@300MHz 3333.3ps, the valid data is held for 1666ps where my valid tuning window is 1458ps.  That's 208ps in the junk zone, or 1458ps out of a theoretical perfect 1666ps.

@BrianHG I am a bit confused by what you meant by junk zone.  How did you come up with the value of 1458ps?
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 06, 2021, 02:02:25 pm
@NorthGuy Figure 2-2 on page 50 explains it all. The whole Tx/Rx pipeline needs to use the same clock.

By the way, how should I modify the following code related to IDDR and ISERDES (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L814-L815) to satisfy the same-clock restriction?

Code: [Select]
always @(dq_r_q0, dq_r_q1, delayed_dqs_r)
dq_r_iserdes <= (delayed_dqs_r) ?  dq_r_q0: dq_r_q1;

(https://i.imgur.com/xgrbsxO.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 06, 2021, 02:11:08 pm
Quote
Also, remember,@300MHz 3333.3ps, the valid data is held for 1666ps where my valid tuning window is 1458ps.  That's 208ps in the junk zone, or 1458ps out of a theoretical perfect 1666ps.

@BrianHG I am bit confused with what you meant by junk zone.  How did you come up with the value of 1458ps ?

Ok, at 300MHz, my read clock period is 3333ps.
I have 16 tuning steps for 360 degrees, IE one complete rotation, so, 3333ps divided by 16 steps means 208.3ps per step.
Since the data is DDR, this means a 16bit word is coming out of the DDR3 every 3333ps/2 = 1666.5ps.
You can also see that 208.3ps * 1/2 of 16 steps is also 1666.5ps.

Now, within the correct sampling phase, IE, not 180 degrees out of phase, if my DDR3 had dead perfect tuning and the IO pins had better than 208ps precision, I would get 8 tuning steps with error free data, but I am not.  I'm only getting 7 tuning steps with perfect data, and one last tuning step where some of the data bits have noise in the data I read.  This means somewhere in that 208ps, there are transitions right on the edge.

If I were to employ the advanced IO features in Altera's DDR_IO to tune each data DQ wire with the up to +/-50ps delay, with 1ps resolution increments, I would be able to tune out those few data error bits and get a full perfect 8 tuning steps with no read errors.  However, already having 7 good tuning steps and choosing the center of the 7, I have 3 good steps up and 3 good steps down.  This means that the DDR3 would have to drift +3x208ps=624ps, or drift -624ps, from where I calibrated on power-up before my data becomes corrupt.  And reading in the data sheet the extent to which the DQS and DQ outputs of the DDR3 can move over the entire operating range of voltage and temperature, this is unlikely to happen.  At 1GHz, 2000mtps, that tuning window is so tight that you will want occasional re-tuning and perhaps a thermo-couple on your PCB to know when to do so.
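
The step arithmetic in this post can be reproduced with a small simulation-only module. This is just a sketch for checking the numbers: the module name and variable names are made up for illustration, and the 16 steps per clock period, the 300MHz read clock and the 7 good steps are the figures quoted above.

```verilog
// Simulation-only sketch (not synthesizable, not part of any controller).
// Reproduces the tuning-step arithmetic for a 300MHz read clock with
// 16 phase steps per clock period. All names here are hypothetical.
module tuning_margin_calc;
    localparam integer STEPS_PER_CYCLE = 16; // PLL phase steps per 360 degrees
    localparam integer GOOD_STEPS      = 7;  // steps observed with clean data

    real tck_ps;   // clock period in picoseconds
    real step_ps;  // size of one tuning step in picoseconds

    initial begin
        tck_ps  = 1.0e6 / 300.0;             // 300MHz -> about 3333.3ps
        step_ps = tck_ps / STEPS_PER_CYCLE;  // about 208.3ps per step

        $display("one tuning step      : %0.1f ps", step_ps);
        $display("ideal DDR data window: %0.1f ps", (STEPS_PER_CYCLE / 2) * step_ps);
        $display("measured good window : %0.1f ps", GOOD_STEPS * step_ps);
        // Sampling at the centre of 7 good steps leaves 3 steps either side.
        $display("drift margin         : +/-%0.1f ps", ((GOOD_STEPS - 1) / 2) * step_ps);
    end
endmodule
```

Changing the 300.0 to 500.0 and GOOD_STEPS to 5 gives the corresponding figures for the 500MHz case.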
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 06, 2021, 03:19:51 pm
Quote
@BrianHG I am bit confused with what you meant by junk zone.  How did you come up with the value of 1458ps ?

Google "eye diagram". When you sample a signal, there will be a window where the signal is stable and can be sampled, and a jitter zone where the signal is transitioning and sampling yields random results.

Quote
By the way, how to modify the following code related to IDDR and ISERDES (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L814-L815) for same clock restriction ?

I use VHDL. I simply instantiate primitives and connect their pins.
Title: Re: DDR3 initialization sequence issue
Post by: asmi on July 06, 2021, 03:40:46 pm
Depending on the DDR3 chip speed bin, DQS can be anywhere from 400 ps before CK to 400 ps after it; for the most commonly used DDR3L-1600 chips it's ±225 ps. DQ can appear anytime from tLZDQ(min) (450 ps for DDR3L-1600) before DQS to tDQSQ(max) (225 ps) after it, and each transition can happen anywhere within this interval, so you cannot calibrate this out. The hold time for DQ is defined as a minimum of 0.38*tCK from the DQS edge (tQH), so your data valid window is tQH - tDQSQ(max). For DDR3L-1600 running at 400 MHz (tCK = 2500 ps) that is 0.38*2500 ps - 225 ps = 725 ps. For the same device running at 300 MHz it will be 1042 ps. For comparison's sake, that device running at its max frequency of 800 MHz would leave a window of only 250 ps.

Now, all of that assumes you sample using the actual DQS signal as a clock (the way it's designed and intended to work). If you use the main clock, you have to reduce your window by the tDQSCK time (225 ps) and compensate for possible DQS duty cycle variation (it can be as short as 0.4 tCK), so the worst case for 400 MHz can be 725 ps - 225 ps - 250 ps (0.1 tCK for duty cycle shortfall) - 128 ps (cumulative error derating) = 122 ps, and the window will be fully closed for 800 MHz.

I don't know if the DQS to CK offset is stable for a specific chip (so it can be calibrated out). If so, the window will be wider by tDQSCK, but this will only work for a single x8 chip and perfect routing (which is way beyond DDR3's routing guidelines). For anything other than x8, the CK-to-DQS time will almost always be different for different DQ groups, unless - again - you go that extra mile while routing to ensure perfect matching. For off-the-shelf boards this will likely NOT be the case, as DDR3 routing guidelines only call for matching within DQ groups, and there is no requirement for any of these groups to be matched to the ADDR/CTRL/CK group.
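
The window formula in this post (tQH - tDQSQ(max), with tQH = 0.38*tCK) can be checked with a throwaway simulation-only module. This is only a sketch: the module and function names are invented for illustration, and the 225 ps figure is simply the value quoted in the post, not something verified here against a datasheet.

```verilog
// Simulation-only sketch: evaluates window = 0.38*tCK - tDQSQ(max)
// for a few clock frequencies. Names are hypothetical; the timing
// numbers are the ones quoted in the post above.
module dq_window_calc;

    // Returns the DQ data valid window in ps when sampling with DQS.
    function real data_valid_window_ps;
        input real f_mhz;     // memory clock frequency in MHz
        input real tdqsq_ps;  // tDQSQ(max) in ps
        real tck_ps;
        begin
            tck_ps = 1.0e6 / f_mhz;                      // clock period in ps
            data_valid_window_ps = 0.38 * tck_ps - tdqsq_ps;
        end
    endfunction

    initial begin
        $display("300 MHz: %0.0f ps", data_valid_window_ps(300.0, 225.0));
        $display("400 MHz: %0.0f ps", data_valid_window_ps(400.0, 225.0));
        $display("800 MHz: %0.0f ps", data_valid_window_ps(800.0, 225.0));
    end
endmodule
```

The tCK used for each case is derived from the frequency (3333.3 ps, 2500 ps and 1250 ps respectively), which is what makes the 300 MHz and 800 MHz results come out at about 1042 ps and 250 ps.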
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 06, 2021, 05:25:13 pm
@Northguy


Code: [Select]
// IODDR2 primitives are needed because the 'dq' signals are of double-data-rate
// https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=123

// IDDR2: Input Double Data Rate Input Register with Set, Reset and Clock Enable.
// Spartan-6
// Xilinx HDL Libraries Guide, version 14.7

IDDR2 #(
.DDR_ALIGNMENT("NONE"),  // Sets output alignment to "NONE", "C0" or "C1"
.INIT_Q0(1'b0),  // Sets initial state of the Q0 output to 1'b0 or 1'b1
.INIT_Q1(1'b0),  // Sets initial state of the Q1 output to 1'b0 or 1'b1
.SRTYPE("SYNC")  // Specifies "SYNC" or "ASYNC" set/reset
)
IDDR2_dq_r(
.Q0(dq_r_q0[dq_index]),  // 1-bit output captured with C0 clock
.Q1(dq_r_q1[dq_index]),  // 1-bit output captured with C1 clock
.C0(ck),  // 1-bit clock input
.C1(ck_180),  // 1-bit clock input
.CE(1'b1),  // 1-bit clock enable input
.D(delayed_dq_r[dq_index]),    // 1-bit DDR data input
.R(reset),    // 1-bit reset input
.S(1'b0)     // 1-bit set input
);
// End of IDDR2_inst instantiation

Remember that my verilog coding needs to adhere to the similar clock restriction imposed by xilinx (https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=51). 

Which of the following code snippets should I use to transfer both the dq_r_q0 and dq_r_q1 signals to my own deserializer module?

Code: [Select]
always @(dq_r_q0, dq_r_q1, delayed_dqs_r)
dq_r_iserdes <= (delayed_dqs_r) ?  dq_r_q0: dq_r_q1;


Code: [Select]
always @(dq_r_q0, dq_r_q1, delayed_dqs_r)
  dq_r_iserdes <= (ck) ?  dq_r_q0: dq_r_q1;


Code: [Select]
        always @(posedge ck, negedge ck)
dq_r_iserdes <= (ck) ?  dq_r_q0: dq_r_q1;


delayed_dqs_r is nearly identical to the ck clock signal after the initial MPR_Read_function phase calibration.

Note: dq_r_q0, dq_r_q1 signals are outputs from IDDR2 primitive which takes in delayed_dq_r as input
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 06, 2021, 06:11:27 pm
Quote
Remember that my verilog coding needs to adhere to the similar clock restriction imposed by xilinx (https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=51).

This requirement simply says that the C0/C1 inputs of IDDR/ISERDES and the C0/C1 inputs of ODDR/OSERDES are the same (most likely physically connected to the same wire).

Quote
Which of the following code snippet should I use to transfer both dq_r_q0, dq_r_q1 signals to my own deserializer module ?

None of these. The best way, of course, is to use ISERDES instead of IDDR (and OSERDES instead of ODDR). This way all is done in hardware and there's nothing to worry about.

However, if you want to build your own serdeses feeding from IDDR,  you cannot clump dq_r_q0 and dq_r_q1 back into a single signal and feed this signal to your serdes. You will need to build two separate serdeses - one for dq_r_q0, and another one for dq_r_q1.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 06, 2021, 11:45:58 pm
Quote
You will need to build two separate serdeses - one for dq_r_q0, and another one for dq_r_q1.

?  Doesn't the Xilinx DDR input translate the 180 degree clocked input over to the 0 degree clock with its own internal 180 degree shift d-latch at the input pin?

Otherwise this would mean 2 serdes running on 2 different clocks instead of 1 serdes with double width data running on 1 clock.

It would be odd to have an IO serdes function where you always had to un-zipper the data coming in or out across 2 clock domains instead of having dedicated hardware do it for you.
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 07, 2021, 12:36:44 am
Quote
?  Xilinx DDR input doesn't translate the 180 degree clocked input over to the 0 degree clock in with its own internal 180 degree shift d-latch at the input pin?

This would mean 2 serdes running on 2 different clocks instead of 1 serdes with double width data running on 1 clock.

You can configure IDDR to produce both outputs in the same clock domain (either original clock C0 or inverted original clock C1). So, both of the home-made serdeses would be in the same clock domain.

Xilinx also has a built-in hardware ISERDES, which can take a DDR signal and produce the results in a new clock domain (aligned with the original clock but divided). IMHO, it would be better to use this ISERDES instead of using fabric.
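
As a rough illustration of the first option (both IDDR outputs in one clock domain), here is the IDDR2 instantiation posted earlier in the thread with DDR_ALIGNMENT changed from "NONE" to "C0". It is a sketch under assumptions, not tested on hardware: the wrapper module name and the dq_pair output are hypothetical, the other signal names are reused from the earlier post, and it needs the Xilinx unisim library to simulate or synthesize.

```verilog
// Sketch only: one DQ bit captured by a Spartan-6 IDDR2 with both outputs
// aligned to the C0 clock. Wrapper name and dq_pair are hypothetical.
module iddr2_c0_aligned_bit (
    input  wire       ck,            // capture clock, 0 degrees
    input  wire       ck_180,        // capture clock, 180 degrees
    input  wire       reset,
    input  wire       delayed_dq_r,  // one DQ bit after the input delay
    output reg  [1:0] dq_pair        // {bit captured on C1, bit captured on C0}
);
    wire dq_r_q0;  // data sampled with C0
    wire dq_r_q1;  // data sampled with C1, re-timed to C0 by the primitive

    IDDR2 #(
        .DDR_ALIGNMENT("C0"),  // present Q0 and Q1 together on the C0 edge
        .INIT_Q0(1'b0),
        .INIT_Q1(1'b0),
        .SRTYPE("SYNC")
    ) IDDR2_dq_r (
        .Q0(dq_r_q0),
        .Q1(dq_r_q1),
        .C0(ck),
        .C1(ck_180),
        .CE(1'b1),
        .D(delayed_dq_r),
        .R(reset),
        .S(1'b0)
    );

    // Both halves are now in the ck domain, so a single posedge register
    // can hand them to fabric logic as a 2-bit-wide word.
    always @(posedge ck)
        dq_pair <= {dq_r_q1, dq_r_q0};
endmodule
```

With this arrangement any downstream deserializer only ever sees one clock, which avoids muxing Q0 and Q1 back together with a combinational select on the clock or DQS.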
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 07, 2021, 01:29:03 am
Quote
?  Xilinx DDR input doesn't translate the 180 degree clocked input over to the 0 degree clock in with its own internal 180 degree shift d-latch at the input pin?

This would mean 2 serdes running on 2 different clocks instead of 1 serdes with double width data running on 1 clock.

You can configure IDDR to produce both outputs in the same clock domain (either original clock C0 or inverted original clock C1). So, both of the home-made serdeses would be in the same clock domain.

Xilinx also has a built-in hardware ISERDES, which can take a DDR signal and produce the results in a new clock domain (aligned with the original clock but divided). IMHO, it would be better to use this ISERDES instead of using fabric.

Original clock domain is fine as it would be a compatible buffer with Altera's DDRIO and Lattice DDRIO buffers.  My software serdes can run over 500MHz on the slower Altera FPGA and should run faster on Lattice ECP5 & even a lot faster on any Xilinx spartan as it is nothing more than a serial chain of parallel latches with a reset-able 2 bit counter to latch enable the final 4 into 1 bunch.
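
A fabric deserializer of the kind described here (a chain of registers plus a resettable 2-bit counter that enables a latch of the last four words) might look roughly like the following. This is a guess at the structure from the description only, not the actual code being referred to; the module, port and parameter names are all hypothetical.

```verilog
// Sketch of a 4:1 fabric deserializer running entirely on one clock.
// din carries WIDTH bits per clk (for example the Q0/Q1 pair of each DQ pin).
module ddr_deser_x4 #(
    parameter WIDTH = 2
) (
    input  wire               clk,
    input  wire               sync_reset,  // restarts the counter to re-align words
    input  wire [WIDTH-1:0]   din,
    output reg  [4*WIDTH-1:0] dout,        // oldest sample in the low bits
    output reg                dout_valid   // high for one clk when dout updates
);
    reg [WIDTH-1:0] stage0, stage1, stage2;  // serial chain of parallel registers
    reg [1:0]       count;                   // counts samples within a group of 4

    always @(posedge clk) begin
        // The chain shifts every clock regardless of reset.
        stage0 <= din;
        stage1 <= stage0;
        stage2 <= stage1;

        if (sync_reset) begin
            count      <= 2'd0;
            dout_valid <= 1'b0;
        end else begin
            count      <= count + 2'd1;
            dout_valid <= (count == 2'd3);
            // On the 4th sample, capture it together with the previous three.
            if (count == 2'd3)
                dout <= {din, stage0, stage1, stage2};
        end
    end
endmodule
```

Pulsing sync_reset at a different clock offset shifts which four samples are grouped together, which is one way word alignment could be adjusted during calibration.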
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 07, 2021, 03:38:19 am
For now, I am using different clock domains for the two separate deserializers. (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L835-L899)

I will change that once the code compiles cleanly inside the ISE tool.

By the way, why do I have the following error with clk signal inside pll_ddr (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L586) ?

Code: [Select]
Mapping all equations...
ERROR:Xst:2035 - Port <clk> has illegal connections. This port is connected to an input buffer and other components.
Input Buffer:
   Port <I> of node <clkin1_buf> (IBUFG) in unit <pll_ddr>
Other Components:
   Port <C> of node <_i000050> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000051> (FDRS) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000049> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000047_0> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000047_1> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000047_2> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000047_3> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000047_4> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000047_5> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000047_6> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000047_7> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <data_to_ram_24> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <data_to_ram_23> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <data_to_ram_22> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <data_to_ram_21> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <data_to_ram_20> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <data_to_ram_19> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <data_to_ram_18> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <data_to_ram_17> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <data_to_ram_16> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <data_to_ram_5> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <data_to_ram_7> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <data_to_ram_6> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <data_to_ram_2> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <data_to_ram_4> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <data_to_ram_3> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000048> (FDRSE) in unit <test_ddr3_memory_controller>
   Port <C> of node <data_to_ram_40> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <data_to_ram_71> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000074_0> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000074_1> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000074_2> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000074_5> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000074_3> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000074_4> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000074_6> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000074_7> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000074_10> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000074_8> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000074_9> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000074_11> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000074_12> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000074_13> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000074_14> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000074_15> (FDRE) in unit <test_ddr3_memory_controller>
   Port <C> of node <_i000074_16> (FDRE) in unit <test_ddr3_memory_controller>
   Port <CLK> of node <ila_write_enable> (ila_1_bit) in unit <test_ddr3_memory_controller>
   Port <CLK> of node <ila_done> (ila_1_bit) in unit <test_ddr3_memory_controller>
   Port <CLK> of node <ila_ck_n> (ila_1_bit) in unit <test_ddr3_memory_controller>
   Port <CLK> of node <ila_dq_w> (ila_16_bits) in unit <test_ddr3_memory_controller>
   Port <CLK> of node <ila_states_and_commands> (ila_16_bits) in unit <test_ddr3_memory_controller>
   Port <CLK> of node <ila_states_and_wait_count> (ila_64_bits) in unit <test_ddr3_memory_controller>
   Port <C> of node <idelay_is_busy_previously> (FD) in unit <ddr3_control>
   Port <C> of node <iodelay_startup_counter_0> (FD) in unit <ddr3_control>
   Port <C> of node <iodelay_startup_counter_1> (FD) in unit <ddr3_control>
   Port <C> of node <iodelay_startup_counter_2> (FD) in unit <ddr3_control>
   Port <C> of node <iodelay_startup_counter_3> (FD) in unit <ddr3_control>
   Port <C> of node <iodelay_startup_counter_4> (FD) in unit <ddr3_control>
   Port <C> of node <iodelay_startup_counter_5> (FD) in unit <ddr3_control>
   Port <C> of node <iodelay_startup_counter_6> (FD) in unit <ddr3_control>
   Port <C> of node <iodelay_startup_counter_7> (FD) in unit <ddr3_control>
   Port <C> of node <iodelay_startup_counter_8> (FD) in unit <ddr3_control>
   Port <C> of node <iodelay_startup_counter_9> (FD) in unit <ddr3_control>
   Port <C> of node <iodelay_startup_counter_10> (FD) in unit <ddr3_control>
   Port <C> of node <iodelay_startup_counter_11> (FD) in unit <ddr3_control>
   Port <CLK> of node <iodelay_dqs_r> (IODELAY2) in unit <ddr3_control>
   Port <CLK> of node <dq_io[0].iodelay_dq_r> (IODELAY2) in unit <ddr3_control>
   Port <CLK> of node <dq_io[1].iodelay_dq_r> (IODELAY2) in unit <ddr3_control>
   Port <CLK> of node <dq_io[2].iodelay_dq_r> (IODELAY2) in unit <ddr3_control>
   Port <CLK> of node <dq_io[3].iodelay_dq_r> (IODELAY2) in unit <ddr3_control>
   Port <CLK> of node <dq_io[4].iodelay_dq_r> (IODELAY2) in unit <ddr3_control>
   Port <CLK> of node <dq_io[5].iodelay_dq_r> (IODELAY2) in unit <ddr3_control>
   Port <CLK> of node <dq_io[6].iodelay_dq_r> (IODELAY2) in unit <ddr3_control>
   Port <CLK> of node <dq_io[7].iodelay_dq_r> (IODELAY2) in unit <ddr3_control>
   Port <CLK> of node <dq_io[8].iodelay_dq_r> (IODELAY2) in unit <ddr3_control>
   Port <CLK> of node <dq_io[9].iodelay_dq_r> (IODELAY2) in unit <ddr3_control>
   Port <CLK> of node <dq_io[10].iodelay_dq_r> (IODELAY2) in unit <ddr3_control>
   Port <CLK> of node <dq_io[11].iodelay_dq_r> (IODELAY2) in unit <ddr3_control>
   Port <CLK> of node <dq_io[12].iodelay_dq_r> (IODELAY2) in unit <ddr3_control>
   Port <CLK> of node <dq_io[13].iodelay_dq_r> (IODELAY2) in unit <ddr3_control>
   Port <CLK> of node <dq_io[14].iodelay_dq_r> (IODELAY2) in unit <ddr3_control>
   Port <CLK> of node <dq_io[15].iodelay_dq_r> (IODELAY2) in unit <ddr3_control>

Title: Re: DDR3 initialization sequence issue
Post by: promach on July 07, 2021, 05:53:37 am
I solved the above pll_ddr error by setting "No Buffer" for input clock inside PLL clocking wizard.

(https://i.imgur.com/7RenDQI.png)


However, I got some "tri-state" related errors (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1258) during mapping process.

Code: [Select]
NGDBUILD done.

Process "Translate" completed successfully

Started : "Map".
Running map...
Command Line: map -intstyle ise -p xc6slx16-ftg256-3 -w -logic_opt off -ol high -t 1 -xt 0 -register_duplication off -r 4 -global_opt off -mt off -ir off -pr off -lc off -power off -o test_ddr3_memory_controller_map.ncd test_ddr3_memory_controller.ngd test_ddr3_memory_controller.pcf
Using target part "6slx16ftg256-3".
Mapping design into LUTs...
ERROR:LIT:675 - In Spartan-6 devices when the data path on a TBUF is driven by
   an ODDR2, tri state path of the same TBUF must also be driven by an ODDR2.
   TBUF symbol "ddr3_control/dq_io[0].IO_dq/OBUFT" (output signal=dq<0>) has its
   I input pin driven by
   'ddr3_control/physical_group_dq_w<0>/dq_io[0].ODDR2_dq_w/D1' which is a D1
   pin of an ODDR2. However, no ODDR2 was found driving the corresponding T
   input pin 'ddr3_control/dq_io[0].IO_dq/OBUFT/E', which will result in only
   half of the ODDR2's data path being enabled. This will cause the design to
   not work in hardware. Please ensure that TBUF symbol
   "ddr3_control/dq_io[0].IO_dq/OBUFT" (output signal=dq<0>) has its T input pin
   driven by an ODDR2.
ERROR:LIT:675 - In Spartan-6 devices when the data path on a TBUF is driven by
   an ODDR2, tri state path of the same TBUF must also be driven by an ODDR2.
   TBUF symbol "ddr3_control/dq_io[1].IO_dq/OBUFT" (output signal=dq<1>) has its
   I input pin driven by
   'ddr3_control/physical_group_dq_w<1>/dq_io[1].ODDR2_dq_w/D1' which is a D1
   pin of an ODDR2. However, no ODDR2 was found driving the corresponding T
   input pin 'ddr3_control/dq_io[1].IO_dq/OBUFT/E', which will result in only
   half of the ODDR2's data path being enabled. This will cause the design to
   not work in hardware. Please ensure that TBUF symbol
   "ddr3_control/dq_io[1].IO_dq/OBUFT" (output signal=dq<1>) has its T input pin
   driven by an ODDR2.
ERROR:LIT:675 - In Spartan-6 devices when the data path on a TBUF is driven by
   an ODDR2, tri state path of the same TBUF must also be driven by an ODDR2.
   TBUF symbol "ddr3_control/dq_io[2].IO_dq/OBUFT" (output signal=dq<2>) has its
   I input pin driven by
   'ddr3_control/physical_group_dq_w<2>/dq_io[2].ODDR2_dq_w/D1' which is a D1
   pin of an ODDR2. However, no ODDR2 was found driving the corresponding T
   input pin 'ddr3_control/dq_io[2].IO_dq/OBUFT/E', which will result in only
   half of the ODDR2's data path being enabled. This will cause the design to
   not work in hardware. Please ensure that TBUF symbol
   "ddr3_control/dq_io[2].IO_dq/OBUFT" (output signal=dq<2>) has its T input pin
   driven by an ODDR2.
ERROR:LIT:675 - In Spartan-6 devices when the data path on a TBUF is driven by
   an ODDR2, tri state path of the same TBUF must also be driven by an ODDR2.
   TBUF symbol "ddr3_control/dq_io[3].IO_dq/OBUFT" (output signal=dq<3>) has its
   I input pin driven by
   'ddr3_control/physical_group_dq_w<3>/dq_io[3].ODDR2_dq_w/D1' which is a D1
   pin of an ODDR2. However, no ODDR2 was found driving the corresponding T
   input pin 'ddr3_control/dq_io[3].IO_dq/OBUFT/E', which will result in only
   half of the ODDR2's data path being enabled. This will cause the design to
   not work in hardware. Please ensure that TBUF symbol
   "ddr3_control/dq_io[3].IO_dq/OBUFT" (output signal=dq<3>) has its T input pin
   driven by an ODDR2.
ERROR:LIT:675 - In Spartan-6 devices when the data path on a TBUF is driven by
   an ODDR2, tri state path of the same TBUF must also be driven by an ODDR2.
   TBUF symbol "ddr3_control/dq_io[4].IO_dq/OBUFT" (output signal=dq<4>) has its
   I input pin driven by
   'ddr3_control/physical_group_dq_w<4>/dq_io[4].ODDR2_dq_w/D1' which is a D1
   pin of an ODDR2. However, no ODDR2 was found driving the corresponding T
   input pin 'ddr3_control/dq_io[4].IO_dq/OBUFT/E', which will result in only
   half of the ODDR2's data path being enabled. This will cause the design to
   not work in hardware. Please ensure that TBUF symbol
   "ddr3_control/dq_io[4].IO_dq/OBUFT" (output signal=dq<4>) has its T input pin
   driven by an ODDR2.
ERROR:LIT:675 - In Spartan-6 devices when the data path on a TBUF is driven by
   an ODDR2, tri state path of the same TBUF must also be driven by an ODDR2.
   TBUF symbol "ddr3_control/dq_io[5].IO_dq/OBUFT" (output signal=dq<5>) has its
   I input pin driven by
   'ddr3_control/physical_group_dq_w<5>/dq_io[5].ODDR2_dq_w/D1' which is a D1
   pin of an ODDR2. However, no ODDR2 was found driving the corresponding T
   input pin 'ddr3_control/dq_io[5].IO_dq/OBUFT/E', which will result in only
   half of the ODDR2's data path being enabled. This will cause the design to
   not work in hardware. Please ensure that TBUF symbol
   "ddr3_control/dq_io[5].IO_dq/OBUFT" (output signal=dq<5>) has its T input pin
   driven by an ODDR2.
ERROR:LIT:675 - In Spartan-6 devices when the data path on a TBUF is driven by
   an ODDR2, tri state path of the same TBUF must also be driven by an ODDR2.
   TBUF symbol "ddr3_control/dq_io[6].IO_dq/OBUFT" (output signal=dq<6>) has its
   I input pin driven by
   'ddr3_control/physical_group_dq_w<6>/dq_io[6].ODDR2_dq_w/D1' which is a D1
   pin of an ODDR2. However, no ODDR2 was found driving the corresponding T
   input pin 'ddr3_control/dq_io[6].IO_dq/OBUFT/E', which will result in only
   half of the ODDR2's data path being enabled. This will cause the design to
   not work in hardware. Please ensure that TBUF symbol
   "ddr3_control/dq_io[6].IO_dq/OBUFT" (output signal=dq<6>) has its T input pin
   driven by an ODDR2.
ERROR:LIT:675 - In Spartan-6 devices when the data path on a TBUF is driven by
   an ODDR2, tri state path of the same TBUF must also be driven by an ODDR2.
   TBUF symbol "ddr3_control/dq_io[7].IO_dq/OBUFT" (output signal=dq<7>) has its
   I input pin driven by
   'ddr3_control/physical_group_dq_w<7>/dq_io[7].ODDR2_dq_w/D1' which is a D1
   pin of an ODDR2. However, no ODDR2 was found driving the corresponding T
   input pin 'ddr3_control/dq_io[7].IO_dq/OBUFT/E', which will result in only
   half of the ODDR2's data path being enabled. This will cause the design to
   not work in hardware. Please ensure that TBUF symbol
   "ddr3_control/dq_io[7].IO_dq/OBUFT" (output signal=dq<7>) has its T input pin
   driven by an ODDR2.
ERROR:LIT:675 - In Spartan-6 devices when the data path on a TBUF is driven by
   an ODDR2, tri state path of the same TBUF must also be driven by an ODDR2.
   TBUF symbol "ddr3_control/dq_io[8].IO_dq/OBUFT" (output signal=dq<8>) has its
   I input pin driven by
   'ddr3_control/physical_group_dq_w<8>/dq_io[8].ODDR2_dq_w/D1' which is a D1
   pin of an ODDR2. However, no ODDR2 was found driving the corresponding T
   input pin 'ddr3_control/dq_io[8].IO_dq/OBUFT/E', which will result in only
   half of the ODDR2's data path being enabled. This will cause the design to
   not work in hardware. Please ensure that TBUF symbol
   "ddr3_control/dq_io[8].IO_dq/OBUFT" (output signal=dq<8>) has its T input pin
   driven by an ODDR2.
ERROR:LIT:675 - In Spartan-6 devices when the data path on a TBUF is driven by
   an ODDR2, tri state path of the same TBUF must also be driven by an ODDR2.
   TBUF symbol "ddr3_control/dq_io[9].IO_dq/OBUFT" (output signal=dq<9>) has its
   I input pin driven by
   'ddr3_control/physical_group_dq_w<9>/dq_io[9].ODDR2_dq_w/D1' which is a D1
   pin of an ODDR2. However, no ODDR2 was found driving the corresponding T
   input pin 'ddr3_control/dq_io[9].IO_dq/OBUFT/E', which will result in only
   half of the ODDR2's data path being enabled. This will cause the design to
   not work in hardware. Please ensure that TBUF symbol
   "ddr3_control/dq_io[9].IO_dq/OBUFT" (output signal=dq<9>) has its T input pin
   driven by an ODDR2.
ERROR:LIT:675 - In Spartan-6 devices when the data path on a TBUF is driven by
   an ODDR2, tri state path of the same TBUF must also be driven by an ODDR2.
   TBUF symbol "ddr3_control/dq_io[10].IO_dq/OBUFT" (output signal=dq<10>) has
   its I input pin driven by
   'ddr3_control/physical_group_dq_w<10>/dq_io[10].ODDR2_dq_w/D1' which is a D1
   pin of an ODDR2. However, no ODDR2 was found driving the corresponding T
   input pin 'ddr3_control/dq_io[10].IO_dq/OBUFT/E', which will result in only
   half of the ODDR2's data path being enabled. This will cause the design to
   not work in hardware. Please ensure that TBUF symbol
   "ddr3_control/dq_io[10].IO_dq/OBUFT" (output signal=dq<10>) has its T input
   pin driven by an ODDR2.
ERROR:LIT:675 - In Spartan-6 devices when the data path on a TBUF is driven by
   an ODDR2, tri state path of the same TBUF must also be driven by an ODDR2.
   TBUF symbol "ddr3_control/dq_io[11].IO_dq/OBUFT" (output signal=dq<11>) has
   its I input pin driven by
   'ddr3_control/physical_group_dq_w<11>/dq_io[11].ODDR2_dq_w/D1' which is a D1
   pin of an ODDR2. However, no ODDR2 was found driving the corresponding T
   input pin 'ddr3_control/dq_io[11].IO_dq/OBUFT/E', which will result in only
   half of the ODDR2's data path being enabled. This will cause the design to
   not work in hardware. Please ensure that TBUF symbol
   "ddr3_control/dq_io[11].IO_dq/OBUFT" (output signal=dq<11>) has its T input
   pin driven by an ODDR2.
ERROR:LIT:675 - In Spartan-6 devices when the data path on a TBUF is driven by
   an ODDR2, tri state path of the same TBUF must also be driven by an ODDR2.
   TBUF symbol "ddr3_control/dq_io[12].IO_dq/OBUFT" (output signal=dq<12>) has
   its I input pin driven by
   'ddr3_control/physical_group_dq_w<12>/dq_io[12].ODDR2_dq_w/D1' which is a D1
   pin of an ODDR2. However, no ODDR2 was found driving the corresponding T
   input pin 'ddr3_control/dq_io[12].IO_dq/OBUFT/E', which will result in only
   half of the ODDR2's data path being enabled. This will cause the design to
   not work in hardware. Please ensure that TBUF symbol
   "ddr3_control/dq_io[12].IO_dq/OBUFT" (output signal=dq<12>) has its T input
   pin driven by an ODDR2.
ERROR:LIT:675 - In Spartan-6 devices when the data path on a TBUF is driven by
   an ODDR2, tri state path of the same TBUF must also be driven by an ODDR2.
   TBUF symbol "ddr3_control/dq_io[13].IO_dq/OBUFT" (output signal=dq<13>) has
   its I input pin driven by
   'ddr3_control/physical_group_dq_w<13>/dq_io[13].ODDR2_dq_w/D1' which is a D1
   pin of an ODDR2. However, no ODDR2 was found driving the corresponding T
   input pin 'ddr3_control/dq_io[13].IO_dq/OBUFT/E', which will result in only
   half of the ODDR2's data path being enabled. This will cause the design to
   not work in hardware. Please ensure that TBUF symbol
   "ddr3_control/dq_io[13].IO_dq/OBUFT" (output signal=dq<13>) has its T input
   pin driven by an ODDR2.
ERROR:LIT:675 - In Spartan-6 devices when the data path on a TBUF is driven by
   an ODDR2, tri state path of the same TBUF must also be driven by an ODDR2.
   TBUF symbol "ddr3_control/dq_io[14].IO_dq/OBUFT" (output signal=dq<14>) has
   its I input pin driven by
   'ddr3_control/physical_group_dq_w<14>/dq_io[14].ODDR2_dq_w/D1' which is a D1
   pin of an ODDR2. However, no ODDR2 was found driving the corresponding T
   input pin 'ddr3_control/dq_io[14].IO_dq/OBUFT/E', which will result in only
   half of the ODDR2's data path being enabled. This will cause the design to
   not work in hardware. Please ensure that TBUF symbol
   "ddr3_control/dq_io[14].IO_dq/OBUFT" (output signal=dq<14>) has its T input
   pin driven by an ODDR2.
ERROR:LIT:675 - In Spartan-6 devices when the data path on a TBUF is driven by
   an ODDR2, tri state path of the same TBUF must also be driven by an ODDR2.
   TBUF symbol "ddr3_control/dq_io[15].IO_dq/OBUFT" (output signal=dq<15>) has
   its I input pin driven by
   'ddr3_control/physical_group_dq_w<15>/dq_io[15].ODDR2_dq_w/D1' which is a D1
   pin of an ODDR2. However, no ODDR2 was found driving the corresponding T
   input pin 'ddr3_control/dq_io[15].IO_dq/OBUFT/E', which will result in only
   half of the ODDR2's data path being enabled. This will cause the design to
   not work in hardware. Please ensure that TBUF symbol
   "ddr3_control/dq_io[15].IO_dq/OBUFT" (output signal=dq<15>) has its T input
   pin driven by an ODDR2.
Errors found during logical drc.

Design Summary
--------------
Number of errors   :  16
Number of warnings :   0

Process "Map" failed


The root cause is the T input port of the IOBUF (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1211-L1212), which must also be driven by an ODDR2 primitive (https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=61), but I am not sure exactly how to do this.

Code: [Select]
IOBUF IO_dq (
.IO(dq[dq_index]),
.I(dq_w[dq_index]),
.T(((wait_count > TIME_RL) && (main_state == STATE_READ_AP)) ||
  (main_state == STATE_READ_DATA)),
.O(dq_r[dq_index])
);
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 07, 2021, 10:09:02 am
Adding an ODDR2 for the dq_iobuf_enable signal now gives me a different error (http://en.qi-hardware.com/mmlogs/milkymist_2013-09-17.log.html) :

Quote
ERROR:Pack:2531 - The dual data rate register "ddr3_control/ODDR2_dq_iobuf_en"
   failed to join the "OLOGIC2" component as required.  The output signal for
   register symbol ddr3_control/ODDR2_dq_iobuf_en requires general routing to
   fabric, but the register can only be routed to ILOGIC, IODELAY, and IOB.

I tried to solve this new error using (* IOB = "FORCE" *) (https://forums.xilinx.com/t5/Implementation/Instantiation-of-ODDR2/m-p/682061/highlight/true#M14722), but it is NOT helping.

Any advice ?

(https://i.imgur.com/6fhR9PD.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 07, 2021, 02:18:26 pm
Quote
ERROR:Pack:2531 - The dual data rate register "ddr3_control/ODDR2_dq_iobuf_en"
   failed to join the "OLOGIC2" component as required.  The output signal for
   register symbol ddr3_control/ODDR2_dq_iobuf_en requires general routing to
   fabric, but the register can only be routed to ILOGIC, IODELAY, and IOB.

I tried to solve this new error using (* IOB = "FORCE" *) (https://forums.xilinx.com/t5/Implementation/Instantiation-of-ODDR2/m-p/682061/highlight/true#M14722), but it is NOT helping.

Any advice ?

Each DQ line will require its own ODDR for the T pin. If you want to use one ODDR for all of them, its output would have to be routed through the fabric to multiple T pins. This is impossible: an ODDR belongs to a specific IOB.
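
A minimal sketch of that arrangement, assuming a 16-bit DQ bus as in the error log: every bit gets one ODDR2 for the data and a second ODDR2 for the tristate control, all inside the same generate loop as the IOBUF. The module name and the signals clk_0, clk_180, dq_w_rise, dq_w_fall and dq_tristate are hypothetical and not taken from the repository. The ODDR2 and IOBUF port lists follow the instantiations posted in this thread. This has not been run through ISE and needs the Xilinx unisim library to simulate.

```verilog
// Hypothetical wrapper: one data ODDR2 and one tristate ODDR2 per DQ bit.
module dq_pads_sketch #(
    parameter DQ_BITWIDTH = 16
) (
    input  wire                   clk_0,       // write clock (assumed name)
    input  wire                   clk_180,     // write clock shifted by 180 degrees
    input  wire                   reset,
    input  wire [DQ_BITWIDTH-1:0] dq_w_rise,   // data launched on the clk_0 edge
    input  wire [DQ_BITWIDTH-1:0] dq_w_fall,   // data launched on the clk_180 edge
    input  wire                   dq_tristate, // 1 = release the bus, 0 = drive it
    output wire [DQ_BITWIDTH-1:0] dq_r,        // received data, towards the read path
    inout  wire [DQ_BITWIDTH-1:0] dq
);
    genvar i;
    generate
        for (i = 0; i < DQ_BITWIDTH; i = i + 1) begin : dq_io
            wire dq_w_ddr; // connects only to IOBUF.I, never to fabric logic
            wire dq_t_ddr; // connects only to IOBUF.T, never to fabric logic

            // Data path register for this bit.
            ODDR2 #(
                .DDR_ALIGNMENT("NONE"),
                .INIT(1'b0),
                .SRTYPE("SYNC")
            ) ODDR2_dq_w (
                .Q(dq_w_ddr),
                .C0(clk_0),
                .C1(clk_180),
                .CE(1'b1),
                .D0(dq_w_rise[i]),
                .D1(dq_w_fall[i]),
                .R(reset),
                .S(1'b0)
            );

            // Tristate path register for the same bit (same value on both edges).
            ODDR2 #(
                .DDR_ALIGNMENT("NONE"),
                .INIT(1'b0),
                .SRTYPE("SYNC")
            ) ODDR2_dq_t (
                .Q(dq_t_ddr),
                .C0(clk_0),
                .C1(clk_180),
                .CE(1'b1),
                .D0(dq_tristate),
                .D1(dq_tristate),
                .R(reset),
                .S(1'b0)
            );

            IOBUF IO_dq (
                .IO(dq[i]),
                .I(dq_w_ddr),
                .T(dq_t_ddr),
                .O(dq_r[i])
            );
        end
    endgenerate
endmodule
```

The single fabric signal dq_tristate fans out to the D inputs of the tristate registers, which is ordinary fabric routing. Only the Q outputs are restricted to the pad they belong to.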
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 07, 2021, 02:29:17 pm
Quote
Each DQ line will require its own ODDR for T pin. If you want to use one ODDR for all this would require its output to be routed through the fabric to multiple T pins. This is impossible, ODDR belongs to the specific IOB.

@NorthGuy

So in this particular case, how do I get around this issue with ODDR for T pin ?
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 07, 2021, 02:37:20 pm
@NorthGuy I followed your suggestion that each DQ line requires its own ODDR for the T pin, and half of the errors are gone. The remaining errors are related to ODDR2_dq_w (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1280), as shown below:

Code: [Select]
Started : "Map".
Running map...
Command Line: map -intstyle ise -p xc6slx16-ftg256-3 -w -logic_opt off -ol high -t 1 -xt 0 -register_duplication off -r 4 -global_opt off -mt off -ir off -pr off -lc off -power off -o test_ddr3_memory_controller_map.ncd test_ddr3_memory_controller.ngd test_ddr3_memory_controller.pcf
Using target part "6slx16ftg256-3".
Mapping design into LUTs...
Running directed packing...
ERROR:Pack:2531 - The dual data rate register "ddr3_control/dq_io[0].ODDR2_dq_w"
   failed to join the "OLOGIC2" component as required.  The output signal for
   register symbol ddr3_control/dq_io[0].ODDR2_dq_w requires general routing to
   fabric, but the register can only be routed to ILOGIC, IODELAY, and IOB.
ERROR:Pack:2531 - The dual data rate register "ddr3_control/dq_io[1].ODDR2_dq_w"
   failed to join the "OLOGIC2" component as required.  The output signal for
   register symbol ddr3_control/dq_io[1].ODDR2_dq_w requires general routing to
   fabric, but the register can only be routed to ILOGIC, IODELAY, and IOB.
ERROR:Pack:2531 - The dual data rate register "ddr3_control/dq_io[2].ODDR2_dq_w"
   failed to join the "OLOGIC2" component as required.  The output signal for
   register symbol ddr3_control/dq_io[2].ODDR2_dq_w requires general routing to
   fabric, but the register can only be routed to ILOGIC, IODELAY, and IOB.
ERROR:Pack:2531 - The dual data rate register "ddr3_control/dq_io[3].ODDR2_dq_w"
   failed to join the "OLOGIC2" component as required.  The output signal for
   register symbol ddr3_control/dq_io[3].ODDR2_dq_w requires general routing to
   fabric, but the register can only be routed to ILOGIC, IODELAY, and IOB.
ERROR:Pack:2531 - The dual data rate register "ddr3_control/dq_io[4].ODDR2_dq_w"
   failed to join the "OLOGIC2" component as required.  The output signal for
   register symbol ddr3_control/dq_io[4].ODDR2_dq_w requires general routing to
   fabric, but the register can only be routed to ILOGIC, IODELAY, and IOB.
ERROR:Pack:2531 - The dual data rate register "ddr3_control/dq_io[5].ODDR2_dq_w"
   failed to join the "OLOGIC2" component as required.  The output signal for
   register symbol ddr3_control/dq_io[5].ODDR2_dq_w requires general routing to
   fabric, but the register can only be routed to ILOGIC, IODELAY, and IOB.
ERROR:Pack:2531 - The dual data rate register "ddr3_control/dq_io[6].ODDR2_dq_w"
   failed to join the "OLOGIC2" component as required.  The output signal for
   register symbol ddr3_control/dq_io[6].ODDR2_dq_w requires general routing to
   fabric, but the register can only be routed to ILOGIC, IODELAY, and IOB.
ERROR:Pack:2531 - The dual data rate register "ddr3_control/dq_io[7].ODDR2_dq_w"
   failed to join the "OLOGIC2" component as required.  The output signal for
   register symbol ddr3_control/dq_io[7].ODDR2_dq_w requires general routing to
   fabric, but the register can only be routed to ILOGIC, IODELAY, and IOB.
ERROR:Pack:2531 - The dual data rate register "ddr3_control/dq_io[8].ODDR2_dq_w"
   failed to join the "OLOGIC2" component as required.  The output signal for
   register symbol ddr3_control/dq_io[8].ODDR2_dq_w requires general routing to
   fabric, but the register can only be routed to ILOGIC, IODELAY, and IOB.
ERROR:Pack:2531 - The dual data rate register "ddr3_control/dq_io[9].ODDR2_dq_w"
   failed to join the "OLOGIC2" component as required.  The output signal for
   register symbol ddr3_control/dq_io[9].ODDR2_dq_w requires general routing to
   fabric, but the register can only be routed to ILOGIC, IODELAY, and IOB.
ERROR:Pack:2531 - The dual data rate register
   "ddr3_control/dq_io[10].ODDR2_dq_w" failed to join the "OLOGIC2" component as
   required.  The output signal for register symbol
   ddr3_control/dq_io[10].ODDR2_dq_w requires general routing to fabric, but the
   register can only be routed to ILOGIC, IODELAY, and IOB.
ERROR:Pack:2531 - The dual data rate register
   "ddr3_control/dq_io[11].ODDR2_dq_w" failed to join the "OLOGIC2" component as
   required.  The output signal for register symbol
   ddr3_control/dq_io[11].ODDR2_dq_w requires general routing to fabric, but the
   register can only be routed to ILOGIC, IODELAY, and IOB.
ERROR:Pack:2531 - The dual data rate register
   "ddr3_control/dq_io[12].ODDR2_dq_w" failed to join the "OLOGIC2" component as
   required.  The output signal for register symbol
   ddr3_control/dq_io[12].ODDR2_dq_w requires general routing to fabric, but the
   register can only be routed to ILOGIC, IODELAY, and IOB.
ERROR:Pack:2531 - The dual data rate register
   "ddr3_control/dq_io[13].ODDR2_dq_w" failed to join the "OLOGIC2" component as
   required.  The output signal for register symbol
   ddr3_control/dq_io[13].ODDR2_dq_w requires general routing to fabric, but the
   register can only be routed to ILOGIC, IODELAY, and IOB.
ERROR:Pack:2531 - The dual data rate register
   "ddr3_control/dq_io[14].ODDR2_dq_w" failed to join the "OLOGIC2" component as
   required.  The output signal for register symbol
   ddr3_control/dq_io[14].ODDR2_dq_w requires general routing to fabric, but the
   register can only be routed to ILOGIC, IODELAY, and IOB.
ERROR:Pack:2531 - The dual data rate register
   "ddr3_control/dq_io[15].ODDR2_dq_w" failed to join the "OLOGIC2" component as
   required.  The output signal for register symbol
   ddr3_control/dq_io[15].ODDR2_dq_w requires general routing to fabric, but the
   register can only be routed to ILOGIC, IODELAY, and IOB.

Mapping completed.
See MAP report file "test_ddr3_memory_controller_map.mrp" for details.
Problem encountered during the packing phase.

Design Summary
--------------
Number of errors   :  16
Number of warnings :   0

Process "Map" failed
Title: Re: DDR3 initialization sequence issue
Post by: asmi on July 07, 2021, 02:37:47 pm
In 7 series there is a special mode of a SERDES which takes in x4 DQ signal as well as x4 TQ signal, allowing edge-by-edge control of tristate. Not sure if it's a thing in S6.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 07, 2021, 02:52:33 pm
@asmi I am coding my own serdes due to some placement blockage restriction issues (https://forums.xilinx.com/t5/Implementation/Xilinx-ISE-implementation-stage-issues/m-p/1255587/highlight/true#M30717).

@NorthGuy The remaining errors occur because my code bypasses ODELAY: for write operations to the RAM, the controller could simply use the PLL to generate the required phase-shifted clock, instead of using ODELAY, in order to drive the dq_w bits.

Tx: OSERDES -> ODDR2 (output DDR buffer) -> ODELAY (DQS Centering) -> IOBUF (for inout) -> RAM

What do you think in this case ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 07, 2021, 03:03:53 pm
@NorthGuy The reason for the remaining errors are due to the fact that my coding bypasses ODELAY since for write operation to RAM, the controller could just make use of PLL to generate the required phase-shifted clock instead of using ODELAY in order to drive dq_w bits

Tx: OSERDES -> ODDR2 (output DDR buffer) -> ODELAY (DQS Centering) -> IOBUF (for inout) -> RAM

What do you think in this case ?

Logically, both O and T pin should be delayed the same. It is most likely hardwired somehow, so you cannot connect it differently than it is already connected.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 07, 2021, 03:22:20 pm
Quote
Logically, both O and T pin should be delayed the same. It is most likely hardwired somehow, so you cannot connect it differently than it is already connected.

You mean the signal at T port of IOBUF (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1235-L1238) also needs IDELAY ?


By the way, my previous post was asking about dq_w, not about dq_r.
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 07, 2021, 04:12:31 pm
Quote
Logically, both O and T pin should be delayed the same. It is most likely hardwired somehow, so you cannot connect it differently than it is already connected.

You mean the signal at T port of IOBUF (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1235-L1238) also needs IDELAY ?


By the way, my previous post is asking about dq_w , not about dq_r

I am also talking about the writes. There's ODELAY in your path. The IODELAY has the T pin to change direction, as well as TOUT and DOUT to connect to IOBUF. Even though these pins appear to be independent, they're already connected within FPGA. You need to connect them in your design the same way.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 07, 2021, 04:20:53 pm
Quote
You need to connect them in your design the same way.

Okay, but T port of IOBUF is a bit tricky.  Should I connect T port the same way as write pipeline OR read pipeline ?

Quote
There's ODELAY in your path.

No, there is no ODELAY (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3602558/#msg3602558).  That is the reason for the remaining errors (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3602545/#msg3602545)
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 07, 2021, 04:39:45 pm
Let me rephrase my question:

Could the write and read pipelines share the same IODELAY primitives for the DQ bit group (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1302-L1338) ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 07, 2021, 05:02:23 pm
Quote
You need to connect them in your design the same way.

Okay, but T port of IOBUF is a bit tricky.  Should I connect T port the same way as write pipeline OR read pipeline ?

T port should resemble the write path more: you drive it and it is routed to the T pin of IOBUFT (or IOBUFTDS) in the end. I don't know whether you must use the bidirectional IODELAY2 primitive when you have a bidirectional port with an input delay. What I would do is create a small project and try to connect IOBUFT to IDDR/ODDR through IODELAY2 to see what works. My guess is that you need something like

outside<->IBUFT(DS)<->IODELAY2<->(IDDR,ODDR for data, ODDR for T)<->fabric

Once you get it right, you can replicate this in your project.
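
A sketch of what such a small test project might look like for one bidirectional pin, under several assumptions: the module and signal names are made up, the IODELAY2 is left in fixed mode with zero delay so that no calibration clocks are needed, and the IODELAY2 and IDDR2 port and attribute names are written from memory of the Spartan-6 libraries guide and should be checked against it. Whether ISE accepts this exact configuration is what the experiment is meant to find out.

```verilog
// Hypothetical one-pin experiment: IOBUF <-> IODELAY2 <-> (IDDR2, ODDR2 data, ODDR2 T).
module iodelay2_bidir_sketch (
    input  wire clk_0,     // assumed I/O clock
    input  wire clk_180,   // same clock shifted by 180 degrees
    input  wire reset,
    input  wire d_rise,    // write data for the clk_0 edge
    input  wire d_fall,    // write data for the clk_180 edge
    input  wire tristate,  // 1 = release the pad, 0 = drive it
    output wire q_rise,    // read data captured on clk_0
    output wire q_fall,    // read data captured on clk_180
    inout  wire pad
);
    wire o_ddr, t_ddr;  // OLOGIC -> IODELAY2
    wire o_dly, t_dly;  // IODELAY2 -> IOBUF
    wire i_pad, i_dly;  // IOBUF -> IODELAY2 -> ILOGIC

    ODDR2 #(.DDR_ALIGNMENT("NONE"), .INIT(1'b0), .SRTYPE("SYNC")) ODDR2_data (
        .Q(o_ddr), .C0(clk_0), .C1(clk_180), .CE(1'b1),
        .D0(d_rise), .D1(d_fall), .R(reset), .S(1'b0)
    );

    ODDR2 #(.DDR_ALIGNMENT("NONE"), .INIT(1'b0), .SRTYPE("SYNC")) ODDR2_t (
        .Q(t_ddr), .C0(clk_0), .C1(clk_180), .CE(1'b1),
        .D0(tristate), .D1(tristate), .R(reset), .S(1'b0)
    );

    // Data and tristate both pass through the same IODELAY2, so they stay paired.
    IODELAY2 #(
        .DELAY_SRC("IO"),       // bidirectional: delay input and output paths
        .DATA_RATE("DDR"),
        .IDELAY_TYPE("FIXED"),  // assumption: fixed delay, no calibration
        .IDELAY_VALUE(0),
        .ODELAY_VALUE(0)
    ) iodelay_inst (
        .IDATAIN(i_pad),
        .ODATAIN(o_ddr),
        .T(t_ddr),
        .DATAOUT(i_dly),
        .DATAOUT2(),
        .DOUT(o_dly),
        .TOUT(t_dly),
        .BUSY(),
        .CAL(1'b0), .CE(1'b0), .CLK(1'b0), .INC(1'b0),
        .IOCLK0(1'b0), .IOCLK1(1'b0), .RST(1'b0)
    );

    IOBUF IO_pad (
        .IO(pad), .I(o_dly), .T(t_dly), .O(i_pad)
    );

    IDDR2 #(.DDR_ALIGNMENT("NONE"), .INIT_Q0(1'b0), .INIT_Q1(1'b0), .SRTYPE("SYNC")) IDDR2_data (
        .Q0(q_rise), .Q1(q_fall), .C0(clk_0), .C1(clk_180), .CE(1'b1),
        .D(i_dly), .R(reset), .S(1'b0)
    );
endmodule
```

If Map packs this single pin without errors, the same structure can be moved into the generate loop of the real controller.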

Title: Re: DDR3 initialization sequence issue
Post by: promach on July 08, 2021, 09:29:16 am
Quote
outside<->IBUFT(DS)<->IODELAY2<->(IDDR,ODDR for data, ODDR for T)<->fabric

@Northguy

I updated the code for the write pipeline (no longer bypassing ODELAY) (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1336-L1382) to follow your quoted suggestion, but the ISE tool threw the following packing error 2530 (https://forums.xilinx.com/t5/Spartan-Family-FPGAs-Archived/ERROR-PACK-2530/td-p/290061):

Code: [Select]
Started : "Map".
Running map...
Command Line: map -intstyle ise -p xc6slx16-ftg256-3 -w -logic_opt off -ol high -t 1 -xt 0 -register_duplication off -r 4 -global_opt off -mt off -ir off -pr off -lc off -power off -o test_ddr3_memory_controller_map.ncd test_ddr3_memory_controller.ngd test_ddr3_memory_controller.pcf
Using target part "6slx16ftg256-3".
Mapping design into LUTs...
Running directed packing...
ERROR:Pack:2530 - The dual data rate register "ddr3_control/dq_io[1].ODDR2_dq_w"
   failed to join an OLOGIC component as required.
ERROR:Pack:2530 - The dual data rate register
   "ddr3_control/dq_io[10].ODDR2_dq_w" failed to join an OLOGIC component as
   required.
ERROR:Pack:2530 - The dual data rate register "ddr3_control/dq_io[6].ODDR2_dq_w"
   failed to join an OLOGIC component as required.
ERROR:Pack:2530 - The dual data rate register
   "ddr3_control/dq_io[15].ODDR2_dq_w" failed to join an OLOGIC component as
   required.
ERROR:Pack:2530 - The dual data rate register "ddr3_control/dq_io[4].ODDR2_dq_w"
   failed to join an OLOGIC component as required.
ERROR:Pack:2530 - The dual data rate register
   "ddr3_control/dq_io[13].ODDR2_dq_w" failed to join an OLOGIC component as
   required.
ERROR:Pack:2530 - The dual data rate register "ddr3_control/dq_io[9].ODDR2_dq_w"
   failed to join an OLOGIC component as required.
ERROR:Pack:2530 - The dual data rate register "ddr3_control/dq_io[2].ODDR2_dq_w"
   failed to join an OLOGIC component as required.
ERROR:Pack:2530 - The dual data rate register
   "ddr3_control/dq_io[11].ODDR2_dq_w" failed to join an OLOGIC component as
   required.
ERROR:Pack:2530 - The dual data rate register "ddr3_control/dq_io[7].ODDR2_dq_w"
   failed to join an OLOGIC component as required.
ERROR:Pack:2530 - The dual data rate register "ddr3_control/dq_io[0].ODDR2_dq_w"
   failed to join an OLOGIC component as required.
ERROR:Pack:2530 - The dual data rate register "ddr3_control/dq_io[5].ODDR2_dq_w"
   failed to join an OLOGIC component as required.
ERROR:Pack:2530 - The dual data rate register
   "ddr3_control/dq_io[14].ODDR2_dq_w" failed to join an OLOGIC component as
   required.
ERROR:Pack:2530 - The dual data rate register "ddr3_control/dq_io[3].ODDR2_dq_w"
   failed to join an OLOGIC component as required.
ERROR:Pack:2530 - The dual data rate register
   "ddr3_control/dq_io[12].ODDR2_dq_w" failed to join an OLOGIC component as
   required.
ERROR:Pack:2530 - The dual data rate register "ddr3_control/dq_io[8].ODDR2_dq_w"
   failed to join an OLOGIC component as required.

Mapping completed.
See MAP report file "test_ddr3_memory_controller_map.mrp" for details.
Problem encountered during the packing phase.

Design Summary
--------------
Number of errors   :  16
Number of warnings :   0

Title: Re: DDR3 initialization sequence issue
Post by: promach on July 08, 2021, 11:20:07 am
@NorthGuy

ERROR:Pack:2530 can be eliminated by not attaching the ChipScope ILA module to the dq_w signal AND by connecting the output of the ODDR2 primitive directly to the I port of the IOBUF primitive, bypassing ODELAY.

With ODELAY bypassed, and since the output of an ODDR cannot be used by user Verilog logic in the fabric, the only way is to drive the dqs_w strobe with ck_90, which is phase-shifted 90 degrees from ck:

assign dqs_w = ck_out_90;

After I got past ERROR:Pack:2530, the ISE tool threw some other placement errors (https://forums.xilinx.com/t5/Other-FPGA-Architecture/Place-1198-Error-Route-cause-and-possible-solution/td-p/408295), shown below, but they are no longer related to the dq_w write pipeline.

Code: [Select]
Phase 4.2  Initial Placement for Architecture Specific Features

.......
ERROR:Place:1198 - A PLL clock component is not placed at a routable site. The
   PLL component <ddr3_control/pll_ddr/pll_base_inst/PLL_ADV> is placed at site
   <PLL_ADV_X0Y0>. The corresponding clock load component <ck> is placed at site
   <PAD223>. The PLL can use the fast path between the PLL and the clock load if
   they are placed in adjacent horizontal clock regions. You may want to analyze
   why this problem exists and correct it. This placement is UNROUTABLE in PAR
   and therefore, this error condition should be fixed in your design. You may
   use the CLOCK_DEDICATED_ROUTE constraint in the .ucf file to demote this
   message to a WARNING in order to generate an NCD file. This NCD file can then
   be used in FPGA Editor to debug the problem. A list of all the COMP.PINS used
   in this clock placement rule is listed below. These examples can be used
   directly in the .ucf file to demote this ERROR to a WARNING.
   < PIN "ddr3_control/pll_ddr/pll_base_inst/PLL_ADV.CLKOUT0"
   CLOCK_DEDICATED_ROUTE = FALSE; >
   < NET "ck" CLOCK_DEDICATED_ROUTE = FALSE; >

ERROR:Place:1198 - A PLL clock component is not placed at a routable site. The
   PLL component <ddr3_control/pll_ddr/pll_base_inst/PLL_ADV> is placed at site
   <PLL_ADV_X0Y0>. The corresponding clock load component <udqs> is placed at
   site <PAD199>. The PLL can use the fast path between the PLL and the clock
   load if they are placed in adjacent horizontal clock regions. You may want to
   analyze why this problem exists and correct it. This placement is UNROUTABLE
   in PAR and therefore, this error condition should be fixed in your design.
   You may use the CLOCK_DEDICATED_ROUTE constraint in the .ucf file to demote
   this message to a WARNING in order to generate an NCD file. This NCD file can
   then be used in FPGA Editor to debug the problem. A list of all the COMP.PINS
   used in this clock placement rule is listed below. These examples can be used
   directly in the .ucf file to demote this ERROR to a WARNING.
   < PIN "ddr3_control/pll_ddr/pll_base_inst/PLL_ADV.CLKOUT1"
   CLOCK_DEDICATED_ROUTE = FALSE; >
   < NET "udqs" CLOCK_DEDICATED_ROUTE = FALSE; >

ERROR:Place:1198 - A PLL clock component is not placed at a routable site. The
   PLL component <ddr3_control/pll_ddr/pll_base_inst/PLL_ADV> is placed at site
   <PLL_ADV_X0Y0>. The corresponding clock load component <udqs_n> is placed at
   site <PAD200>. The PLL can use the fast path between the PLL and the clock
   load if they are placed in adjacent horizontal clock regions. You may want to
   analyze why this problem exists and correct it. This placement is UNROUTABLE
   in PAR and therefore, this error condition should be fixed in your design.
   You may use the CLOCK_DEDICATED_ROUTE constraint in the .ucf file to demote
   this message to a WARNING in order to generate an NCD file. This NCD file can
   then be used in FPGA Editor to debug the problem. A list of all the COMP.PINS
   used in this clock placement rule is listed below. These examples can be used
   directly in the .ucf file to demote this ERROR to a WARNING.
   < PIN "ddr3_control/pll_ddr/pll_base_inst/PLL_ADV.CLKOUT3"
   CLOCK_DEDICATED_ROUTE = FALSE; >
   < NET "udqs_n" CLOCK_DEDICATED_ROUTE = FALSE; >
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 08, 2021, 01:20:32 pm
I got past the ERROR:Place:1198 above by using this countermeasure (https://forums.xilinx.com/t5/Other-FPGA-Architecture/Place-1198-Error-Route-cause-and-possible-solution/m-p/408489/highlight/true#M34528).

However, I am confused by the following ERROR:Pack:2531, since I am quite certain that my ODDR2 coding (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L643) does not violate the ODDR2 output connection requirement.

Code: [Select]
Running directed packing...
ERROR:Pack:2531 - The dual data rate register "ddr3_control/ODDR2_ck_out_90"
   failed to join the "OLOGIC2" component as required.  The output signal for
   register symbol ddr3_control/ODDR2_ck_out_90 requires general routing to
   fabric, but the register can only be routed to ILOGIC, IODELAY, and IOB.
ERROR:Pack:2531 - The dual data rate register "ddr3_control/ODDR2_ck_out_90"
   failed to join the "OLOGIC2" component as required.  The output signal for
   register symbol ddr3_control/ODDR2_ck_out_90 requires general routing to
   fabric, but the register can only be routed to ILOGIC, IODELAY, and IOB.
ERROR:Pack:2531 - The dual data rate register "ddr3_control/ODDR2_ck_out" failed
   to join the "OLOGIC2" component as required.  The output signal for register
   symbol ddr3_control/ODDR2_ck_out requires general routing to fabric, but the
   register can only be routed to ILOGIC, IODELAY, and IOB.


The technology schematics also shows that ODDR2_ck_out_90 only connects to IO_ldqs
So, what is wrong ?

(https://i.imgur.com/ACP3iyz.png)

(https://i.imgur.com/7H3egEB.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 08, 2021, 02:16:16 pm
However, I am quite confused with the following ERROR:Pack:2531 which I am quite certain that my ODDR2 coding (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L643) did not violate ODDR2 output connection requirement.

You cannot use ODDR outputs in fabric - it can only go to a pad (or ODELAY perhaps) - but you assign it to signals, such as ldqs_w.

DQS should use its own ODDR which is associated with the DQS pad.
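
A rough sketch of what "its own ODDR" could look like for the lower DQS pair, assuming the two pads are driven as separate single-ended IOBUFs as in the posted code. The module name and the clk_90, clk_270 and dqs_tristate signals are hypothetical. Each pad has a dedicated strobe ODDR2 and a dedicated tristate ODDR2, and no Q output is shared or read by fabric logic. Untested, and it needs the Xilinx unisim library to simulate.

```verilog
// Hypothetical DQS output stage: one strobe ODDR2 and one tristate ODDR2 per pad.
module dqs_pads_sketch (
    input  wire clk_90,       // internal clock, 90 degrees after ck (assumed name)
    input  wire clk_270,      // internal clock, 270 degrees after ck
    input  wire reset,
    input  wire dqs_tristate, // 1 = release the strobe pads, 0 = drive them
    output wire ldqs_r,
    output wire ldqs_n_r,
    inout  wire ldqs,
    inout  wire ldqs_n
);
    wire ldqs_w, ldqs_n_w; // pad-only nets
    wire ldqs_t, ldqs_n_t; // pad-only nets

    // ldqs toggles 1 then 0: a forwarded copy of clk_90.
    ODDR2 #(.DDR_ALIGNMENT("NONE"), .INIT(1'b0), .SRTYPE("SYNC")) ODDR2_ldqs_w (
        .Q(ldqs_w), .C0(clk_90), .C1(clk_270), .CE(1'b1),
        .D0(1'b1), .D1(1'b0), .R(reset), .S(1'b0)
    );

    // ldqs_n toggles 0 then 1: the inverted strobe, from its own register.
    ODDR2 #(.DDR_ALIGNMENT("NONE"), .INIT(1'b0), .SRTYPE("SYNC")) ODDR2_ldqs_n_w (
        .Q(ldqs_n_w), .C0(clk_90), .C1(clk_270), .CE(1'b1),
        .D0(1'b0), .D1(1'b1), .R(reset), .S(1'b0)
    );

    ODDR2 #(.DDR_ALIGNMENT("NONE"), .INIT(1'b0), .SRTYPE("SYNC")) ODDR2_ldqs_t (
        .Q(ldqs_t), .C0(clk_90), .C1(clk_270), .CE(1'b1),
        .D0(dqs_tristate), .D1(dqs_tristate), .R(reset), .S(1'b0)
    );

    ODDR2 #(.DDR_ALIGNMENT("NONE"), .INIT(1'b0), .SRTYPE("SYNC")) ODDR2_ldqs_n_t (
        .Q(ldqs_n_t), .C0(clk_90), .C1(clk_270), .CE(1'b1),
        .D0(dqs_tristate), .D1(dqs_tristate), .R(reset), .S(1'b0)
    );

    IOBUF IO_ldqs   (.IO(ldqs),   .I(ldqs_w),   .T(ldqs_t),   .O(ldqs_r));
    IOBUF IO_ldqs_n (.IO(ldqs_n), .I(ldqs_n_w), .T(ldqs_n_t), .O(ldqs_n_r));
endmodule
```

The upper pair (udqs, udqs_n) would repeat the same four registers and two buffers.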
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 08, 2021, 02:19:04 pm
Quote
You cannot use ODDR outputs in fabric - it can only go to a pad (or ODELAY perhaps) - but you assign it to signals, such as ldqs_w.

However, the technology schematics has confirmed that this is not the issue.
Please correct me if wrong.


Quote
DQS should use its own ODDR which is associated with the DQS pad.

I had already done this.

Code: [Select]
IOBUF IO_ldqs (
.IO(ldqs),
.I(ldqs_w),
.T(ldqs_iobuf_enable),
.O(ldqs_r)
);

IOBUF IO_ldqs_n (
.IO(ldqs_n),
.I(ldqs_n_w),
.T(ldqs_n_iobuf_enable),
.O(ldqs_n_r)
);

IOBUF IO_udqs (
.IO(udqs),
.I(udqs_w),
.T(udqs_iobuf_enable),
.O(udqs_r)
);

IOBUF IO_udqs_n (
.IO(udqs_n),
.I(udqs_n_w),
.T(udqs_n_iobuf_enable),
.O(udqs_n_r)
);


// see https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=61

ODDR2 #(
.DDR_ALIGNMENT("NONE"),  // Sets output alignment to "NONE", "C0" or "C1"
.INIT(1'b0),  // Sets initial state of the Q output to 1'b0 or 1'b1
.SRTYPE("SYNC")  // Specifies "SYNC" or "ASYNC" set/reset
)
ODDR2_ldqs_iobuf_en(
.Q(ldqs_iobuf_enable),  // 1-bit DDR output data
.C0(ck_out_90),  // 1-bit clock input
.C1(ck_270),  // 1-bit clock input
.CE(1'b1),  // 1-bit clock enable input
.D0(data_read_is_ongoing),    // 1-bit DDR data input (associated with C0)
.D1(data_read_is_ongoing),    // 1-bit DDR data input (associated with C1)
.R(reset),    // 1-bit reset input
.S(1'b0)     // 1-bit set input
);

ODDR2 #(
.DDR_ALIGNMENT("NONE"),  // Sets output alignment to "NONE", "C0" or "C1"
.INIT(1'b0),  // Sets initial state of the Q output to 1'b0 or 1'b1
.SRTYPE("SYNC")  // Specifies "SYNC" or "ASYNC" set/reset
)
ODDR2_ldqs_n_iobuf_en(
.Q(ldqs_n_iobuf_enable),  // 1-bit DDR output data
.C0(ck_out_90),  // 1-bit clock input
.C1(ck_270),  // 1-bit clock input
.CE(1'b1),  // 1-bit clock enable input
.D0(data_read_is_ongoing),    // 1-bit DDR data input (associated with C0)
.D1(data_read_is_ongoing),    // 1-bit DDR data input (associated with C1)
.R(reset),    // 1-bit reset input
.S(1'b0)     // 1-bit set input
);

ODDR2 #(
.DDR_ALIGNMENT("NONE"),  // Sets output alignment to "NONE", "C0" or "C1"
.INIT(1'b0),  // Sets initial state of the Q output to 1'b0 or 1'b1
.SRTYPE("SYNC")  // Specifies "SYNC" or "ASYNC" set/reset
)
ODDR2_udqs_iobuf_en(
.Q(udqs_iobuf_enable),  // 1-bit DDR output data
.C0(ck_out_90),  // 1-bit clock input
.C1(ck_270),  // 1-bit clock input
.CE(1'b1),  // 1-bit clock enable input
.D0(data_read_is_ongoing),    // 1-bit DDR data input (associated with C0)
.D1(data_read_is_ongoing),    // 1-bit DDR data input (associated with C1)
.R(reset),    // 1-bit reset input
.S(1'b0)     // 1-bit set input
);

ODDR2 #(
.DDR_ALIGNMENT("NONE"),  // Sets output alignment to "NONE", "C0" or "C1"
.INIT(1'b0),  // Sets initial state of the Q output to 1'b0 or 1'b1
.SRTYPE("SYNC")  // Specifies "SYNC" or "ASYNC" set/reset
)
ODDR2_udqs_n_iobuf_en(
.Q(udqs_n_iobuf_enable),  // 1-bit DDR output data
.C0(ck_out_90),  // 1-bit clock input
.C1(ck_270),  // 1-bit clock input
.CE(1'b1),  // 1-bit clock enable input
.D0(data_read_is_ongoing),    // 1-bit DDR data input (associated with C0)
.D1(data_read_is_ongoing),    // 1-bit DDR data input (associated with C1)
.R(reset),    // 1-bit reset input
.S(1'b0)     // 1-bit set input
);


Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 08, 2021, 02:28:53 pm
However, the technology schematics has confirmed that this is not the issue.

Your code ( https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L643 ) says:

Code: [Select]
assign ldqs_w = ck_out_90;
This assigns the output of CK ODDR to ldqs_w.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 08, 2021, 03:19:28 pm
Quote
DQS should use its own ODDR which is associated with the DQS pad.

@NorthGuy

Every single differential DQS signal (ldqs, ldqs_n, udqs, udqs_n) needs to have its own ODDR.

Those ERROR:Pack:2531 are due to sharing of ODDR among these DQS signals.


Now, I am stuck on this ERROR:Pack:1107 (https://forums.xilinx.com/t5/Implementation/ERROR-Pack-1107-Pack-was-unable-to-combine-the-symbols-listed/td-p/742734) for the OBUF primitive (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L601-L612)

Code: [Select]
ERROR:Pack:1107 - Pack was unable to combine the symbols listed below into a
   single IOB component because the site type selected is not compatible.

   Further explanation:
   The pad symbol ddr3_control/ck is connected to a symbol that is outside of
   I/O comp. There is no routing resource between them.

   Symbols involved:
    BUF symbol "ddr3_control/OBUF_ck" (Output Signal = ddr3_control/ck)
    PAD symbol "ddr3_control/ck" (Pad Signal = ddr3_control/ck)


Upon further checking on PCB schematics (https://github.com/promach/DDR/blob/main/PCB_schematics.pdf) of the DDR3 RAM FPGA development board and Spartan-6 FPGA Packaging and Pinouts (https://www.xilinx.com/support/documentation/user_guides/ug385.pdf#page=64), I confirmed that pin E2 is a clock-capable pin.

So, what is wrong here ?

Code: [Select]
grep -n ck *.ucf
8:NET "ck" LOC = E2;
9:NET "ck_n" LOC = E1;
10:NET "ck_en" LOC = F4;
160:NET "ck" IOSTANDARD = LVCMOS25;
161:NET "ck" DRIVE = 12;
162:NET "ck" SLEW = SLOW;
163:NET "ck_n" IOSTANDARD = LVCMOS25;
164:NET "ck_n" DRIVE = 12;
165:NET "ck_n" SLEW = SLOW;
166:NET "ck_en" IOSTANDARD = LVCMOS25;
167:NET "ck_en" DRIVE = 12;
168:NET "ck_en" SLEW = SLOW;


(https://i.imgur.com/GZ1QPLF.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 08, 2021, 03:40:01 pm
Now, I am stucked at this ERROR:Pack:1107 (https://forums.xilinx.com/t5/Implementation/ERROR-Pack-1107-Pack-was-unable-to-combine-the-symbols-listed/td-p/742734) for OBUF primitive (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L601-L612)

The output of OBUF is a pad - the signal travels outside of FPGA. You cannot use it inside FPGA, but you use it in many places, for example:

Code: [Select]
.IOCLK0 (ck), // High speed clock for calibration
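
One common way to keep the two uses apart, sketched here with hypothetical names: the internal clock nets (assumed to come from the PLL through global buffers) are what everything inside the FPGA uses, and the pads ck and ck_n are reached only through their own ODDR2 plus OBUF. The two separate single-ended OBUFs mirror the separate LVCMOS25 constraints on ck and ck_n in the posted .ucf. This is an untested sketch, not the fix from the repository, and it needs the Xilinx unisim library to simulate.

```verilog
// Hypothetical clock forwarding: pads get a copy of the clock, fabric keeps clk_int.
module ck_forward_sketch (
    input  wire clk_int,     // internal clock net; use this one inside the FPGA
    input  wire clk_int_180, // same clock shifted by 180 degrees
    input  wire reset,
    output wire ck,          // pad only, never read back inside the design
    output wire ck_n         // pad only, never read back inside the design
);
    wire ck_ddr;   // connects only to OBUF_ck
    wire ck_n_ddr; // connects only to OBUF_ck_n

    // Forward the clock: 1 on the clk_int edge, 0 on the clk_int_180 edge.
    ODDR2 #(.DDR_ALIGNMENT("NONE"), .INIT(1'b0), .SRTYPE("SYNC")) ODDR2_ck (
        .Q(ck_ddr), .C0(clk_int), .C1(clk_int_180), .CE(1'b1),
        .D0(1'b1), .D1(1'b0), .R(reset), .S(1'b0)
    );

    // Inverted copy for ck_n, from its own register.
    ODDR2 #(.DDR_ALIGNMENT("NONE"), .INIT(1'b0), .SRTYPE("SYNC")) ODDR2_ck_n (
        .Q(ck_n_ddr), .C0(clk_int), .C1(clk_int_180), .CE(1'b1),
        .D0(1'b0), .D1(1'b1), .R(reset), .S(1'b0)
    );

    OBUF OBUF_ck   (.I(ck_ddr),   .O(ck));
    OBUF OBUF_ck_n (.I(ck_n_ddr), .O(ck_n));
endmodule
```

Any primitive that previously took ck as a clock input would take an internal clock net instead. Which internal net is legal for a given clock pin is a separate question that this sketch does not answer.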
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 08, 2021, 03:55:33 pm
Quote
The output of OBUF is a pad - the signal travels outside of FPGA. You cannot use it inside FPGA

@NorthGuy

If I do not use OBUF for the PLL output clock (https://forums.xilinx.com/t5/Other-FPGA-Architecture/Place-1198-Error-Route-cause-and-possible-solution/m-p/408489/highlight/true#M34528), which primitive should I use instead in order to connect the output of ODDR2_ck_out (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L628) to the FPGA fabric ?
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 08, 2021, 04:11:49 pm
@NorthGuy

The bitstream is generated without errors using this github commit code modification (https://github.com/promach/DDR/commit/4bd49bed2e1f60cefebdfc964397385f6ece1ea6).

For now, let me check all the warnings from the Verilog compilation process.
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 08, 2021, 06:38:24 pm
Quote
The bitstream is generated without errors using this github commit code modification (https://github.com/promach/DDR/commit/4bd49bed2e1f60cefebdfc964397385f6ece1ea6).

Looks like you find the answers much faster than I can respond :)
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 09, 2021, 03:14:17 am
@BrianHG

Why does the ModelSim simulation no longer log the Micron model's text output, such as the initialization sequence settings and which operations happen at which time?

Note: I tried simulating for another 700us, but there is still no further output in the transcript console.

Code: [Select]
restart
# ** Note: (vsim-12125) Error and warning message counts have been reset to '0' because of 'restart'.
# ** Warning: (vsim-3015) [PCDPC] - Port size (2) does not match connection size (1) for port 'tdqs_n'. The port definition is at: Z:/ddr3/ddr3.v(115).
#    Time: 0 ps  Iteration: 0  Instance: /test_ddr3_memory_controller/mem File: Z:/ddr3/test_ddr3_memory_controller.v Line: 577
run 700us
# ddr3.file_io_open: at time                    0 WARNING: no +model_data option specified, using /tmp.
# ddr3.open_bank_file: at time 0 INFO: opening /tmp/ddr3.open_bank_file.0.
# ddr3.open_bank_file: at time 0 INFO: opening /tmp/ddr3.open_bank_file.1.
# ddr3.open_bank_file: at time 0 INFO: opening /tmp/ddr3.open_bank_file.2.
# ddr3.open_bank_file: at time 0 INFO: opening /tmp/ddr3.open_bank_file.3.
# ddr3.open_bank_file: at time 0 INFO: opening /tmp/ddr3.open_bank_file.4.
# ddr3.open_bank_file: at time 0 INFO: opening /tmp/ddr3.open_bank_file.5.
# ddr3.open_bank_file: at time 0 INFO: opening /tmp/ddr3.open_bank_file.6.
# ddr3.open_bank_file: at time 0 INFO: opening /tmp/ddr3.open_bank_file.7.
# ** Error (suppressible): (vsim-8630) Infinity results from division operation.
#    Time: 0 ps  Iteration: 0  Process: /ddr3/#ASSIGN#543 File: Z:/ddr3/ddr3.v Line: 543
# ** Error (suppressible): (vsim-8630) Infinity results from division operation.
#    Time: 0 ps  Iteration: 0  Process: /ddr3/#ASSIGN#544 File: Z:/ddr3/ddr3.v Line: 544
# ** Error (suppressible): (vsim-8630) Infinity results from division operation.
#    Time: 0 ps  Iteration: 0  Process: /ddr3/#ASSIGN#545 File: Z:/ddr3/ddr3.v Line: 545
# test_ddr3_memory_controller.mem.file_io_open: at time 0.0 ps WARNING: no +model_data option specified, using /tmp.
# test_ddr3_memory_controller.mem.open_bank_file: at time 0.0 ps INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.0.
# test_ddr3_memory_controller.mem.open_bank_file: at time 0.0 ps INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.1.
# test_ddr3_memory_controller.mem.open_bank_file: at time 0.0 ps INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.2.
# test_ddr3_memory_controller.mem.open_bank_file: at time 0.0 ps INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.3.
# test_ddr3_memory_controller.mem.open_bank_file: at time 0.0 ps INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.4.
# test_ddr3_memory_controller.mem.open_bank_file: at time 0.0 ps INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.5.
# test_ddr3_memory_controller.mem.open_bank_file: at time 0.0 ps INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.6.
# test_ddr3_memory_controller.mem.open_bank_file: at time 0.0 ps INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.7.
# ** Error (suppressible): (vsim-8630) Infinity results from division operation.
#    Time: 0 ps  Iteration: 0  Process: /test_ddr3_memory_controller/mem/#ASSIGN#543 File: Z:/ddr3/ddr3.v Line: 543
# ** Error (suppressible): (vsim-8630) Infinity results from division operation.
#    Time: 0 ps  Iteration: 0  Process: /test_ddr3_memory_controller/mem/#ASSIGN#544 File: Z:/ddr3/ddr3.v Line: 544
# ** Error (suppressible): (vsim-8630) Infinity results from division operation.
#    Time: 0 ps  Iteration: 0  Process: /test_ddr3_memory_controller/mem/#ASSIGN#545 File: Z:/ddr3/ddr3.v Line: 545
run 1us
run 1us
run 1us



(https://i.imgur.com/dhFVWEY.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 09, 2021, 08:03:59 am
@NorthGuy

Why does the ISE tool state that my two deserializer instances (dq_iserdes_0, dq_iserdes_1) (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1003-L1067) are unconnected?

Note: the dq_r_q0 and dq_r_q1 signals come from the outputs of the IDDR2_dq_r (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1476-L1499) instance.

Code: [Select]
WARNING:Xst:1290 - Hierarchical block <dq_iserdes_0> is unconnected in block <ddr3_control>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <dq_iserdes_1> is unconnected in block <ddr3_control>.
   It will be removed from the design.

The warning does not make sense to me.

(https://i.imgur.com/iUnNopg.png)

Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 09, 2021, 08:22:18 am
Quote
Why does the ModelSim simulation no longer log the Micron model's text output, such as the initialization sequence settings and which operations happen at which time?

(https://i.imgur.com/dhFVWEY.png)
LOL, you are showing me a write.  Show me a read.
It looks like your Micron model isn't even wired to any of the IOs.
In your sim, just go to the TB module which holds Micron's model, select the net names at the module and right-click 'add wave' and rerun the sim.  See if those net names are even toggling.
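
The same check can also be done in text form. A rough sketch, not tested against this project: a few lines added to the testbench that count rising edges on the model's clock input and report if it never toggles. The instance name mem comes from the transcript; the port name ck, the counter name and the delay value are assumptions, and the delay is counted in whatever time unit the testbench's timescale directive sets.

Code: [Select]
// Hypothetical debug code, placed inside the test_ddr3_memory_controller testbench.
integer mem_ck_edges;

initial mem_ck_edges = 0;

// Count rising edges seen at the clock input of the memory model.
always @(posedge mem.ck) mem_ck_edges = mem_ck_edges + 1;

initial begin
    #1000;  // wait 1000 time units (assumed to span many clock cycles)
    if (mem_ck_edges == 0)
        $display("%t: mem.ck has not toggled, the model is not being clocked", $time);
    else
        $display("%t: mem.ck toggled %0d times", $time, mem_ck_edges);
end
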
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 09, 2021, 09:03:34 am
Quote
It looks like your Micron model isn't even wired to any of the IOs.

The root cause was an unconnected clock on the simulation model, caused by the addition of the OBUF primitive.

It is now solved with this commit. (https://github.com/promach/DDR/commit/d383951192bedcf21f1c9abc260b088670903763)
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 09, 2021, 02:59:30 pm
I suppose the connectivity warnings (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3603750/#msg3603750) are just due to some tool optimization tricks and could be ignored ?

By the way, how should I set the initial value (after system reset is asserted) of dqs_delay_sampling_margin (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1883) so that the initial MPR_Read_function phase calibration mechanism (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L2713-L2732) works correctly?


Regarding the tdqs signal (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L213-L221), how should it be physically shared with the data mask (DM) ball (http://hands.com/~lkcl/ddr3/4295tn4106_tdqs.pdf)?


(https://i.imgur.com/Dri5rOM.png)

(https://i.imgur.com/Ck3k5Gp.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 10, 2021, 03:24:59 am
Apart from the connectivity warnings for dq_iserdes_0 and dq_iserdes_1, the data_to_ram signal (https://github.com/promach/DDR/blob/main/test_ddr3_memory_controller.v#L351-L352) also has two different warnings.

Note: Please comment out `define MICRON_SIM 1 on line 1 of the Verilog source code.

I suppose the "unconnected" warnings below are due to non-functional genvar data_write_index; (https://github.com/promach/DDR/blob/main/test_ddr3_memory_controller.v#L311) because the "unconnected" warnings start from bit index 32 ?

As for bit indexes from 0 to 31, the ISE tool warns about "constant value of 0".

By the way, it still seems a bit strange to get two different warnings, Xst:1895 for the bits below index 32 and Xst:2677 from index 32 upward, when DQ_BITWIDTH = 16.

Any advice ?

Is there a way to simulate within the ISE tool, since my Verilog code uses some Xilinx-specific primitives and an IP core? I need simulation to check warning 2677.

Note: I think applying "KEEP" synthesis attribute (https://forums.xilinx.com/t5/Spartan-Family-FPGAs-Archived/Xst-1710-my-variable-has-a-constant-value-of-0-and-will-be/m-p/36311/highlight/true#M2781) here would be inappropriate without first finding out the root cause of these 2 different warnings below.


Code: [Select]
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_18> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_17> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_16> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_15> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_14> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_13> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_12> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_11> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_10> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_9> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_8> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_7> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_19> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_20> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_21> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_22> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_23> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_24> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_25> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_26> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_27> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_28> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_29> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_30> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_31> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_0> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_1> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_2> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_3> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_4> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_5> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_6> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.

WARNING:Xst:2677 - Node <data_to_ram_32> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_33> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_34> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_35> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_36> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_37> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_38> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_39> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_40> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_41> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_42> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_43> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_44> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_45> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_46> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_47> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_48> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_49> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_50> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_51> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_52> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_53> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_54> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_55> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_56> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_57> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_58> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_59> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_60> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_61> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_62> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_63> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_64> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_65> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_66> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_67> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_68> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_69> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_70> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_71> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_72> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_73> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_74> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_75> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_76> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_77> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_78> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_79> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_80> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_81> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_82> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_83> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_84> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_85> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_86> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_87> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_88> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_89> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_90> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_91> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_92> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_93> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_94> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_95> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_96> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_97> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_98> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_99> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_100> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_101> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_102> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_103> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_104> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_105> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_106> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_107> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_108> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_109> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_110> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_111> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_112> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_113> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_114> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_115> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_116> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_117> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_118> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_119> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_120> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_121> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_122> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_123> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_124> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_125> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_126> of sequential type is unconnected in block <test_ddr3_memory_controller>.
WARNING:Xst:2677 - Node <data_to_ram_127> of sequential type is unconnected in block <test_ddr3_memory_controller>.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 10, 2021, 05:47:53 am
I got the following routing error for the delayed_dqs_r (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L960) signal. Why?

Code: [Select]
Starting Router

ERROR:Route:471 -
   This design is unrouteable. Router will not continue. To evaluate the problem please use fpga_editor. The nets listed below can not be
   routed:
Unrouteable Net:ddr3_control/dqs_r
Unrouteable Net:ddr3_control/delayed_dqs_r
Total REAL time to Router completion: 2 secs
Total CPU time to Router completion: 2 secs
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 10, 2021, 07:20:42 am
I have singled out the code modification that leads to routing error.

(https://i.imgur.com/KtslCKW.png)

I removed the main_state signal because it belongs to the high-speed clock (ck) domain and should not interfere with logic in the low-speed clock (clk) domain. It would only work in the Micron simulation, which uses a single clock domain (since there is no serdes in the Micron simulation).

However, it is this code modification (https://www.diffchecker.com/RYcNtZvb), the removal of the "main_state" variable, that leads to the routing error: once the variable is removed, this IODELAY primitive (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L945) is no longer unconnected and has to be placed and routed.


Code: [Select]
Starting Router

ERROR:Route:471 -
   This design is unrouteable. Router will not continue. To evaluate the problem please use fpga_editor. The nets listed below can not be
   routed:
Unrouteable Net:ddr3_control/delayed_dqs_r
Routing Conflict 1:
Net:ddr3_control/ck_270 on pin CLK1 on location OLOGIC_X0Y15
Net:ddr3_control/ck_180 on pin IOCLK1 on location IODELAY_X0Y15
    Conflict detected on wire: PINFEED1(-64742,-66654)

Total REAL time to Router completion: 2 secs
Total CPU time to Router completion: 2 secs


How do I use FPGA Editor to investigate this routing issue?

Note: I have also attached a zip file containing the whole ISE project at the end of this post.

(https://i.imgur.com/6k4fBfF.png)

(https://i.imgur.com/UgxdaPQ.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 12, 2021, 03:15:00 am
I think all the IO blocks related to the same pad (IDDR, ODDR, and IODELAY) must be clocked with the same clock. Hence they cannot be ck_180 and ck_270.
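
To illustrate the point above, here is a minimal sketch of a single DQ pad on Spartan-6 in which the ODDR2, IDDR2 and IODELAY2 all share one clock pair. The module name and the io_clk_p / io_clk_n port names are made up for illustration and are not taken from the linked controller. The parameter values are assumptions, and the sketch has not been run through ISE. It needs the Xilinx unisim library to simulate.

Code: [Select]
// Sketch only: one bidirectional DQ pad with every I/O primitive on the same clock pair.
// io_clk_p / io_clk_n are hypothetical names for the chosen clock and its 180-degree partner.
module dq_pad_sketch (
    input  wire io_clk_p,   // the one clock used by all I/O primitives of this pad
    input  wire io_clk_n,   // its inverted (180-degree) counterpart
    input  wire clk,        // fabric clock for the IODELAY2 control pins
    input  wire wr_d0,      // write data, first half of the DDR pair
    input  wire wr_d1,      // write data, second half of the DDR pair
    input  wire tristate,   // 1 = pad is an input (read), 0 = pad is driven (write)
    input  wire cal,        // IODELAY2 calibrate
    input  wire inc,        // IODELAY2 increment/decrement select
    input  wire ce,         // IODELAY2 step enable
    input  wire rst,        // IODELAY2 reset
    output wire rd_q0,      // read data captured on io_clk_p
    output wire rd_q1,      // read data captured on io_clk_n
    inout  wire dq_pad
);
    wire dq_in;             // raw input from the pad
    wire dq_in_delayed;     // input after IODELAY2
    wire dq_out;            // output from ODDR2 to the pad

    IOBUF io_dq (.IO(dq_pad), .I(dq_out), .O(dq_in), .T(tristate));

    // Output path: clocked by io_clk_p / io_clk_n
    ODDR2 #(.DDR_ALIGNMENT("NONE"), .INIT(1'b0), .SRTYPE("SYNC")) oddr_dq (
        .Q(dq_out), .C0(io_clk_p), .C1(io_clk_n), .CE(1'b1),
        .D0(wr_d0), .D1(wr_d1), .R(1'b0), .S(1'b0)
    );

    // Input delay: IOCLK0/IOCLK1 use the SAME pair as the ODDR2 and IDDR2
    IODELAY2 #(
        .DATA_RATE("DDR"),
        .DELAY_SRC("IDATAIN"),
        .IDELAY_TYPE("VARIABLE_FROM_ZERO")
    ) iodelay_dq (
        .IDATAIN(dq_in), .ODATAIN(1'b0), .T(1'b1),
        .DATAOUT(dq_in_delayed),   // DATAOUT feeds ILOGIC (the IDDR2 below)
        .DATAOUT2(), .DOUT(), .TOUT(), .BUSY(),
        .IOCLK0(io_clk_p), .IOCLK1(io_clk_n), .CLK(clk),
        .CAL(cal), .INC(inc), .CE(ce), .RST(rst)
    );

    // Input capture: again the same clock pair
    IDDR2 #(.DDR_ALIGNMENT("NONE"), .INIT_Q0(1'b0), .INIT_Q1(1'b0), .SRTYPE("SYNC")) iddr_dq (
        .Q0(rd_q0), .Q1(rd_q1), .C0(io_clk_p), .C1(io_clk_n), .CE(1'b1),
        .D(dq_in_delayed), .R(1'b0), .S(1'b0)
    );
endmodule

Whichever phase is chosen, the idea is that it is wired once to the module's clock ports, so the three primitives on one pad cannot end up on different phases by accident.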
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 12, 2021, 08:46:38 am
Quote
they cannot be ck_180 and ck_270.

Do you mean ck_180 (iodelay_dqs_r block) (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L955) and ck_270 (udqs_w block) (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L689), as shown in the last picture of the previous post?

If so, how should I modify these two clock inputs?
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 12, 2021, 08:55:26 am
@NorthGuy

Thanks for your advice, which helped eliminate the routing conflict between ck_180 and ck_270.

However, delayed_dqs_r is still unrouteable as shown below.

Note: delayed_dqs_r is the output of the IDELAY primitive and goes to the internal FPGA fabric. Is this allowed?


Code: [Select]
ERROR:Route:471 -
   This design is unrouteable. Router will not continue. To evaluate the problem please use fpga_editor. The nets listed below can not be
   routed:
Unrouteable Net:ddr3_control/delayed_dqs_r
Total REAL time to Router completion: 2 secs
Total CPU time to Router completion: 2 secs


Code: [Select]
[phung@archlinux DDR]$ git diff ddr3_memory_controller.v
diff --git a/ddr3_memory_controller.v b/ddr3_memory_controller.v
index 0ecfa39..0aa5c47 100644
--- a/ddr3_memory_controller.v
+++ b/ddr3_memory_controller.v
@@ -951,8 +951,8 @@ reg MPR_ENABLE, MPR_Read_had_finished;  // for use within MR3 finite state machi
                        .ODATAIN                (1'b0),                 // data from OLOGIC/OSERDES2
                        .DATAOUT                (delayed_dqs_r),                // Output data 1 to ILOGIC/ISERDES2
                        .DATAOUT2               (),                     // Output data 2 to ILOGIC/ISERDES2
-                       .IOCLK0                 (ck),           // High speed clock for calibration
-                       .IOCLK1                 (ck_180),               // High speed clock for calibration
+                       .IOCLK0                 (ck_90),                // High speed clock for calibration
+                       .IOCLK1                 (ck_270),               // High speed clock for calibration
                        .CLK                    (clk),          // Fabric clock (GCLK) for control signals
                        .CAL                    (idelay_cal_dqs_r),     // Calibrate control signal
                        .INC                    (idelay_inc_dqs_r),             // Increment counter
[phung@archlinux DDR]$
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 12, 2021, 12:25:20 pm
Quote
Note: delayed_dqs_r is the output of the IDELAY primitive and goes to the internal FPGA fabric. Is this allowed?

If you want to route it to the fabric, use DATAOUT2 instead of DATAOUT. Both are the same except for where they connect.

However, fabric routing/logic delays are way too big for DDR3 speeds. DQS signal will be useless in the fabric.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 12, 2021, 02:07:50 pm
Quote
However, fabric routing/logic delays are way too big for DDR3 speeds. DQS signal will be useless in the fabric.

@NorthGuy

If the internal FPGA fabric has too much delay, how would the MPR_Read_function calibration code feed delay adjustments (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L2724-L2762) back to IDELAY?


Code: [Select]
/*
Your DQS IO logic is clocked by a clock. You need to align DQS to this clock.
If you sample DQS with the rising edge of the clock, you can get different responses:
1. If you always get '0', the clock rising edge has already happened but the DQS
   rising edge has not. DQS needs to be moved earlier by decreasing the DQS delay.
2. If you always get '1', the clock rising edge happens after the DQS edge.
   Therefore, the DQS delay must be increased.
3. If you're somewhere in the middle (in the jitter zone), then DQS and clock are aligned.
Of course, you don't need DQS data, you only need DQ data. Therefore you adjust the DQ delays
the same as DQS: every time you increase the DQS delay, you increase the DQ delay as well,
and every time you decrease the DQS delay, you decrease the DQ delay. This way, if DQS shifts,
the DQ sampling point shifts to follow DQS.
*/

if(MPR_ENABLE)
begin
    // samples the delayed version of dqs_r for continuous feedback to IDELAY2 primitive
    if(~delayed_dqs_r & ~previous_delayed_dqs_r)
    begin
        idelay_inc_dqs_r <= 0;  // 1st case : decrements delay value
        dqs_delay_sampling_margin <= dqs_delay_sampling_margin - 1;
    end

    else if(delayed_dqs_r & previous_delayed_dqs_r)
    begin
        idelay_inc_dqs_r <= 1;  // 2nd case : increments delay value
        dqs_delay_sampling_margin <= dqs_delay_sampling_margin + 1;
    end

    // see 3rd case
    if(dqs_delay_sampling_margin < JITTER_MARGIN_FOR_DQS_SAMPLING)
        idelay_counter_enable <= 0;  // disables delay feedback process, calibration is done

    else idelay_counter_enable <= 1;  // enables delay feedback process
end



I connected delayed_dqs_r to the DATAOUT2 port as suggested, and the routing error is gone.

However, the following warnings made the P&R stage end prematurely.

Code: [Select]
Starting initial Timing Analysis.  REAL time: 2 secs
Finished initial Timing Analysis.  REAL time: 2 secs

WARNING:Par:288 - The signal ddr3_control/dq_r<0> has no load.  PAR will not attempt to route this signal.
WARNING:Par:288 - The signal ddr3_control/dq_r<4> has no load.  PAR will not attempt to route this signal.
WARNING:Par:288 - The signal ddr3_control/dq_r<5> has no load.  PAR will not attempt to route this signal.
WARNING:Par:288 - The signal ddr3_control/dq_r<6> has no load.  PAR will not attempt to route this signal.
WARNING:Par:288 - The signal ddr3_control/dq_r<7> has no load.  PAR will not attempt to route this signal.
WARNING:Par:288 - The signal ddr3_control/dq_r<8> has no load.  PAR will not attempt to route this signal.
WARNING:Par:288 - The signal ddr3_control/dq_r<9> has no load.  PAR will not attempt to route this signal.
WARNING:Par:288 - The signal ddr3_control/dq_r<10> has no load.  PAR will not attempt to route this signal.
WARNING:Par:288 - The signal ddr3_control/dq_r<11> has no load.  PAR will not attempt to route this signal.
WARNING:Par:288 - The signal ddr3_control/dq_r<12> has no load.  PAR will not attempt to route this signal.
WARNING:Par:288 - The signal ddr3_control/dq_r<13> has no load.  PAR will not attempt to route this signal.
WARNING:Par:288 - The signal ddr3_control/dq_r<14> has no load.  PAR will not attempt to route this signal.
WARNING:Par:288 - The signal ddr3_control/dq_r<15> has no load.  PAR will not attempt to route this signal.
WARNING:Par:288 - The signal ddr3_control/IO_ldqs_n/O has no load.  PAR will not attempt to route this signal.
WARNING:Par:288 - The signal ddr3_control/IO_udqs_n/O has no load.  PAR will not attempt to route this signal.
WARNING:Par:288 - The signal ddr3_control/IO_ldqs/O has no load.  PAR will not attempt to route this signal.
Starting Router


Phase  1  : 2028 unrouted;      REAL time: 2 secs
WARNING:Route:540 -
   Following pins had unsupported clocking structure. Inversion programming detected.
Pin:CLK1.ILOGIC_X0Y20
Pin:CLK1.OLOGIC_X0Y20
Pin:CLK1.ILOGIC_X0Y23
Pin:CLK1.OLOGIC_X0Y23
Pin:CLK1.ILOGIC_X0Y22
Pin:CLK1.OLOGIC_X0Y22
Pin:IOCLK1.IODELAY_X0Y20
Pin:IOCLK1.IODELAY_X0Y22
Pin:IOCLK1.IODELAY_X0Y23

Phase  2  : 1742 unrouted;      REAL time: 5 secs
WARNING:Route:436 - The router has detected an unroutable situation for one or more connections. The router will finish the rest of the
   design and leave them as unrouted. The cause of this behavior is either an issue with the placement or unroutable placement constraints.
   To allow you to use FPGA editor to isolate the problems, the following is a list of (up to 10) such unroutable connections:
Unroutable signal: ddr3_control/ck_90 pin:  ddr3_control/ddr3_control/udqs_w/CLK0
Unroutable signal: ddr3_control/ck_90 pin:  ddr3_control/ddr3_control/iodelay_dqs_r/IOCLK0
Unroutable signal: ddr3_control/ck_180 pin:  ddr3_control/ddr3_control/dq_r_q1<1>/CLK1
Unroutable signal: ddr3_control/ck_180 pin:  ddr3_control/ddr3_control/dq_w<1>/CLK1
Unroutable signal: ddr3_control/ck_180 pin:  ddr3_control/ddr3_control/dq_r_q1<2>/CLK1
Unroutable signal: ddr3_control/ck_180 pin:  ddr3_control/ddr3_control/dq_w<2>/CLK1
Unroutable signal: ddr3_control/ck_180 pin:  ddr3_control/ddr3_control/dq_r_q1<3>/CLK1
Unroutable signal: ddr3_control/ck_180 pin:  ddr3_control/ddr3_control/dq_w<3>/CLK1
Unroutable signal: ddr3_control/ck_180 pin:  ddr3_control/ddr3_control/dq_io[1].iodelay_dq_r/IOCLK1
Unroutable signal: ddr3_control/ck_180 pin:  ddr3_control/ddr3_control/dq_io[3].iodelay_dq_r/IOCLK1
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 12, 2021, 02:48:00 pm
Quote
If the internal FPGA fabric has too much delay, how would the MPR_Read_function calibration code feed delay adjustments (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L2724-L2762) back to IDELAY?

You sample it with ISERDES or IDDR and the output from these go to the fabric where you analyze them. Once it is sampled, the delays don't matter any more.

Quote
However, the following warnings made the P&R stage end prematurely.

The tools delete everything which doesn't produce an effect. Then everything which feeds the deleted parts gets deleted too. If you don't use the signals, this may be half of your design (or even the entire design).
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 12, 2021, 02:57:14 pm
Quote
You sample it with ISERDES or IDDR and the output from these go to the fabric where you analyze them. Once it is sampled, the delays don't matter any more.

I do not understand how sampling with IDDR would make the FPGA fabric delay insignificant.


Besides, how should I handle those routing warnings, given that almost the entire design gets deleted?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 12, 2021, 07:54:59 pm
Quote
You sample it with ISERDES or IDDR and the output from these go to the fabric where you analyze them. Once it is sampled, the delays don't matter any more.

Quote
I do not understand how sampling with IDDR would make the FPGA fabric delay insignificant.

There are IOB flip-flops (which are parts of IDDR). They sample the data on the edge of the clock. You use IODELAY and/or clock phase to align the signal with the clock to within 50-100 ps. The output signals from these flops can travel to the fabric, and the only timing requirement for them is that they get processed and arrive at the receiving fabric flops by the next clock edge.

If you let the combinatorial signal from IODELAY go into the fabric, it will bypass the IDDR flops and will only be registered with fabric flops. It will take a long time for the data to reach these flops. More importantly, this delay is unpredictable.

When you deal with fast signals you need to register them as close to the edge of the FPGA as possible.
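
As a rough sketch of what "analyze it in the fabric after sampling" could look like: the module below takes a DQS value that has already been registered by an I/O flip-flop and decides whether to step the delay up or down. All names (dqs_align_sketch, dqs_sample, delay_ce, delay_inc, STABLE_COUNT) are hypothetical and not taken from the linked controller, and the "N identical samples in a row" rule is an assumption made for illustration.

Code: [Select]
// Sketch: fabric-side decision logic fed by an already-registered DQS sample.
// dqs_sample is assumed to come from an I/O flip-flop (for example IDDR2 Q0),
// so from here on only ordinary clk-to-clk timing applies.
module dqs_align_sketch #(
    parameter STABLE_COUNT = 4   // hypothetical: identical samples needed before a step (2..16)
) (
    input  wire clk,         // fabric clock, same domain as dqs_sample
    input  wire rst,
    input  wire enable,      // high while calibration is running
    input  wire dqs_sample,  // DQS as captured by the I/O flip-flop
    output reg  delay_ce,    // one-cycle pulse: step the delay by one tap
    output reg  delay_inc    // 1 = increase delay, 0 = decrease delay
);
    reg       prev_sample;
    reg [3:0] run_length;    // how many times the sample matched the previous one

    always @(posedge clk) begin
        if (rst) begin
            prev_sample <= 1'b0;
            run_length  <= 4'd0;
            delay_ce    <= 1'b0;
            delay_inc   <= 1'b0;
        end else begin
            delay_ce    <= 1'b0;           // default: no step this cycle
            prev_sample <= dqs_sample;

            if (!enable || (dqs_sample != prev_sample)) begin
                // Samples toggle (jitter zone) or calibration is off: do not step
                run_length <= 4'd0;
            end else if (run_length == STABLE_COUNT - 1) begin
                // Always '1' -> increase delay, always '0' -> decrease delay
                run_length <= 4'd0;
                delay_ce   <= 1'b1;
                delay_inc  <= dqs_sample;
            end else begin
                run_length <= run_length + 4'd1;
            end
        end
    end
endmodule

The same delay_ce / delay_inc pair would then be applied to the DQ delays as well, so the DQ sampling point follows DQS.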
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 13, 2021, 04:23:19 am
Quote
If you let the combinatorial signal from IODELAY go into the fabric, it will bypass the IDDR flops and will only be registered with fabric flops. It will take long time for the data to reach these flops. More important, this delay is unpredictable.

I did not capture the read DQS strobe (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L953) "delayed_dqs_r" signal, since it is only used for DQS centering phase-alignment (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L2726-L2764). I only captured the "IODELAY2-delayed" read DQ bits (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1519-L1565) using the IDDR2 primitive (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1478-L1493).


Now, let me work on the following routing warnings.

(https://i.imgur.com/rsuN2BU.png)

Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 13, 2021, 12:31:30 pm
Quote
Now, let me work on the following routing warnings.

Where does ck_90 originate? It looks like the router cannot use ck_90 as a clock.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 13, 2021, 12:48:57 pm
ck_90 comes from the PLL (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L587)

dq_r_q1 is related to IDDR (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1486)

dq_w is related to ODDR (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1507) and IOBUF (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1440)

udqs_w is related to ODDR (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L687) and IOBUF (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1344)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 13, 2021, 03:48:24 pm
Quote
ck_90 comes from the PLL (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L587)

You need a clock buffer (such as BUFG) on the output of the PLL. I would expect ISE to automatically add BUFG to the clocking path. I have no idea why it doesn't do this.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 13, 2021, 04:27:24 pm
I have already added a BUFG to ck_90, so the problem is not related to BUFG.

(https://i.imgur.com/ga45SGw.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 13, 2021, 04:43:42 pm
Quote
I have already added a BUFG to ck_90, so the problem is not related to BUFG.

BUFG should be able to drive any clock. I don't understand why it cannot be routed to drive your IO blocks.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 13, 2021, 05:06:48 pm
I have dozens of connectivity and constant-value warnings before the routing process.

I suspect my deserializer code (https://github.com/promach/DDR/blob/main/deserializer.v) might have contributed to all these warnings.

Note: I am only showing part of the warnings because of the maximum number of characters allowed in a single post.

Code: [Select]
WARNING:Xst:1710 - FF/Latch <dq_w_d1_15> (without init value) has a constant value of 0 in block <ddr3_control>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <dq_w_d1_14> (without init value) has a constant value of 0 in block <ddr3_control>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <dq_w_d1_13> (without init value) has a constant value of 0 in block <ddr3_control>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <dq_w_d1_12> (without init value) has a constant value of 0 in block <ddr3_control>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <dq_w_d1_7> (without init value) has a constant value of 0 in block <ddr3_control>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <dq_w_d1_6> (without init value) has a constant value of 0 in block <ddr3_control>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <dq_w_d1_5> (without init value) has a constant value of 0 in block <ddr3_control>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <dq_w_d1_4> (without init value) has a constant value of 0 in block <ddr3_control>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <dq_w_d0_15> (without init value) has a constant value of 0 in block <ddr3_control>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <dq_w_d0_14> (without init value) has a constant value of 0 in block <ddr3_control>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <dq_w_d0_13> (without init value) has a constant value of 0 in block <ddr3_control>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <dq_w_d0_7> (without init value) has a constant value of 0 in block <ddr3_control>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <dq_w_d0_6> (without init value) has a constant value of 0 in block <ddr3_control>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <dq_w_d0_5> (without init value) has a constant value of 0 in block <ddr3_control>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <dq_w_d0_4> (without init value) has a constant value of 0 in block <ddr3_control>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <data_out_4> (without init value) has a constant value of 0 in block <dq_oserdes_0>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <data_out_5> (without init value) has a constant value of 0 in block <dq_oserdes_0>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <data_out_6> (without init value) has a constant value of 0 in block <dq_oserdes_0>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <data_out_7> (without init value) has a constant value of 0 in block <dq_oserdes_0>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <data_out_13> (without init value) has a constant value of 0 in block <dq_oserdes_0>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <data_out_14> (without init value) has a constant value of 0 in block <dq_oserdes_0>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <data_out_15> (without init value) has a constant value of 0 in block <dq_oserdes_0>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <data_out_4> (without init value) has a constant value of 0 in block <dq_oserdes_1>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <data_out_5> (without init value) has a constant value of 0 in block <dq_oserdes_1>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <data_out_6> (without init value) has a constant value of 0 in block <dq_oserdes_1>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <data_out_7> (without init value) has a constant value of 0 in block <dq_oserdes_1>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <data_out_12> (without init value) has a constant value of 0 in block <dq_oserdes_1>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <data_out_13> (without init value) has a constant value of 0 in block <dq_oserdes_1>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <data_out_14> (without init value) has a constant value of 0 in block <dq_oserdes_1>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1710 - FF/Latch <data_out_15> (without init value) has a constant value of 0 in block <dq_oserdes_1>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_25> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_26> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_27> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_29> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_30> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_28> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_31> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_8> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_10> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_11> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_9> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_12> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_13> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_15> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:1895 - Due to other FF/Latch trimming, FF/Latch <data_to_ram_14> (without init value) has a constant value of 0 in block <test_ddr3_memory_controller>. This FF/Latch will be trimmed during the optimization process.
WARNING:Xst:2677 - Node <data_out_4> of sequential type is unconnected in block <dq_iserdes_0>.
WARNING:Xst:2677 - Node <data_out_5> of sequential type is unconnected in block <dq_iserdes_0>.
WARNING:Xst:2677 - Node <data_out_6> of sequential type is unconnected in block <dq_iserdes_0>.
WARNING:Xst:2677 - Node <data_out_7> of sequential type is unconnected in block <dq_iserdes_0>.
WARNING:Xst:2677 - Node <data_out_8> of sequential type is unconnected in block <dq_iserdes_0>.
WARNING:Xst:2677 - Node <data_out_9> of sequential type is unconnected in block <dq_iserdes_0>.
WARNING:Xst:2677 - Node <data_out_10> of sequential type is unconnected in block <dq_iserdes_0>.
WARNING:Xst:2677 - Node <data_out_11> of sequential type is unconnected in block <dq_iserdes_0>.
WARNING:Xst:2677 - Node <data_out_12> of sequential type is unconnected in block <dq_iserdes_0>.
WARNING:Xst:2677 - Node <data_out_13> of sequential type is unconnected in block <dq_iserdes_0>.
WARNING:Xst:2677 - Node <data_out_14> of sequential type is unconnected in block <dq_iserdes_0>.
WARNING:Xst:2677 - Node <data_out_15> of sequential type is unconnected in block <dq_iserdes_0>.
WARNING:Xst:2677 - Node <data_out_20> of sequential type is unconnected in block <dq_iserdes_0>.
WARNING:Xst:2677 - Node <data_out_21> of sequential type is unconnected in block <dq_iserdes_0>.
WARNING:Xst:2677 - Node <data_out_22> of sequential type is unconnected in block <dq_iserdes_0>.
WARNING:Xst:2677 - Node <data_out_23> of sequential type is unconnected in block <dq_iserdes_0>.
WARNING:Xst:2677 - Node <data_out_24> of sequential type is unconnected in block <dq_iserdes_0>.
WARNING:Xst:2677 - Node <data_out_25> of sequential type is unconnected in block <dq_iserdes_0>.
WARNING:Xst:2677 - Node <data_out_26> of sequential type is unconnected in block <dq_iserdes_0>.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 14, 2021, 01:57:51 am
As shown below, these signals do not have a constant value of 0. I am quite confused now.

(https://i.imgur.com/p1TIn8s.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 14, 2021, 02:22:11 am
I changed MAP's "-global_opt" setting from OFF to SPEED, and the routing warnings were reduced to only the following:

Note: clk_BUFGP seems to be related to the IODELAY Clock input (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=130)

Code: [Select]
WARNING:Route:436 - The router has detected an unroutable situation for one or more connections. The router will finish the rest of the
   design and leave them as unrouted. The cause of this behavior is either an issue with the placement or unroutable placement constraints.
   To allow you to use FPGA editor to isolate the problems, the following is a list of (up to 10) such unroutable connections:
Unroutable signal: clk_BUFGP pin:  ddr3_control/dq_io[1].iodelay_dq_r/CLK
Unroutable signal: clk_BUFGP pin:  ddr3_control/iodelay_dqs_r/CLK
Unroutable signal: clk_BUFGP pin:  ddr3_control/dq_io[3].iodelay_dq_r/CLK
Unroutable signal: clk_BUFGP pin:  ddr3_control/dq_io[2].iodelay_dq_r/CLK
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 14, 2021, 03:08:43 am
Quote
I changed MAP's "-global_opt" setting from OFF to SPEED, and the routing warnings were reduced to only the following:

Some of the IODELAYs are fine; only these four give errors. I guess there are bank restrictions and clk for some reason cannot reach the bank where these particular IODELAYs are located.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 14, 2021, 04:30:32 am
Okay, there might be some bank restrictions that I am not aware of. Which Xilinx application notes could I check for this?


By the way, someone else told me that BUFGP is described as a combination of IBUFG + BUFG, and that these IODELAY CLK pins have to be driven from the global clock network.
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 14, 2021, 04:40:07 am
Quote
Okay, there might be some bank restrictions that I am not aware of. Which Xilinx application notes could I check for this?


By the way, someone else told me that BUFGP is described as a combination of IBUFG + BUFG, and that these IODELAY CLK pins have to be driven from the global clock network.

ug382 should explain all the restrictions. You can look at these four IODELAYs on the map and see how they're different from the others.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 15, 2021, 09:50:16 am
Looking at the bank restrictions for IODELAY2 clocking (https://www.xilinx.com/support/documentation/user_guides/ug382.pdf#page=22), what is meant by "full bank"?

(https://i.imgur.com/r5QNAdi.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 15, 2021, 01:33:09 pm
Quote
Looking at the bank restrictions for IODELAY2 clocking (https://www.xilinx.com/support/documentation/user_guides/ug382.pdf#page=22), what is meant by "full bank"?

The bank is split in halves. By "full bank", they mean both halves.

In the page you posted they talk about delaying BUFIO2 clock with IODELAY2. This has nothing to do with the CLK pin of IODELAY2. The CLK pin must be clocked with something that can clock the fabric as well.

Have you found what the difference is between the IODELAY2 blocks mentioned in the error message and the others? Do the other IODELAY2 blocks have the CLK pin routed OK?
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 15, 2021, 04:53:14 pm
The problem is that some of the iodelay_dq_r blocks had been optimized away.

Only dq_io[3] has its CLK signal routed as shown in FPGA editor.

(https://i.imgur.com/743xAHD.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 15, 2021, 11:42:58 pm
Quote
The problem is that some of the iodelay_dq_r blocks had been optimized away.

I see.

I cannot think of any reason why these CLK pins of IODELAY cannot be driven by BUFG.

If something is unroutable then either there's no path, or a path exists but is occupied by some other routing which cannot be moved away.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 16, 2021, 01:57:51 am
Sure, a lot of issues actually occur before the routing process starts.

I still have timing issues (fmax is only 169 MHz), and a lot of logic signals get optimized away.

Let me solve those issues first.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 16, 2021, 11:06:25 am
I solved the fmax issue (I still need to use a proper timing constraint file later), but the ISE tool now gives me totally different routing warnings.

By the way, I still need to investigate why some of the iodelay_dq_r blocks are being optimized away.


Code: [Select]
WARNING:Route:436 - The router has detected an unroutable situation for one or more connections. The router will finish the rest of the
   design and leave them as unrouted. The cause of this behavior is either an issue with the placement or unroutable placement constraints.
   To allow you to use FPGA editor to isolate the problems, the following is a list of (up to 10) such unroutable connections:
Unroutable signal: ddr3_control/ck_90 pin:  ddr3_control/udqs_w/CLK0
Unroutable signal: ddr3_control/ck_90 pin:  ddr3_control/dq_r_q1<1>/CLK0
Unroutable signal: ddr3_control/ck_90 pin:  ddr3_control/dq_w<1>/CLK0
Unroutable signal: ddr3_control/ck_90 pin:  ddr3_control/dq_r_q1<2>/CLK0
Unroutable signal: ddr3_control/ck_90 pin:  ddr3_control/dq_w<2>/CLK0
Unroutable signal: ddr3_control/ck_90 pin:  ddr3_control/dq_r_q1<3>/CLK0
Unroutable signal: ddr3_control/ck_90 pin:  ddr3_control/dq_w<3>/CLK0
Unroutable signal: ddr3_control/ck_90 pin:  ddr3_control/dq_io[1].iodelay_dq_r/IOCLK0
Unroutable signal: ddr3_control/ck_90 pin:  ddr3_control/iodelay_dqs_r/IOCLK0
Unroutable signal: ddr3_control/ck_90 pin:  ddr3_control/dq_io[3].iodelay_dq_r/IOCLK0
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 16, 2021, 02:23:53 pm
Quote
I solved the fmax issue (I still need to use a proper timing constraint file later), but the ISE tool now gives me totally different routing warnings.

You already had these warnings a few days back. Then they went away for some reason, possibly because the problematic elements were optimized away. You need to figure out why BUFG cannot clock your components. It's either because you use different clocks in a situation where the same clock must be used, because there's no BUFG on your clock, or possibly because there's some sort of physical limitation.

I usually don't write everything in one module as you do. Rather, I would have a clocking module, an IO module, a command module etc. Then I would test each module separately and only then put them together. This makes things easier. For example, you would have single IOCLK input to your IO module, so you wouldn't have to worry whether all the IO components are clocked the same or not.

Title: Re: DDR3 initialization sequence issue
Post by: promach on July 16, 2021, 04:29:25 pm
I suspect physical limitation, but I am not sure.

Note:  dq_io[2].iodelay_dq_r/IOCLK0 does not have the same ck_90 routing issue

Another strange optimization: why are all the [0]-th blocks missing ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 16, 2021, 05:07:53 pm
I suspect physical limitation, but I am not sure.

Note:  dq_io[2].iodelay_dq_r/IOCLK0 does not have the same ck_90 routing issue

Maybe it just doesn't show the error because there are too many of them; the router only lists up to 10 unroutable connections.

If it cannot route from a BUFG to IOCLK0, it's weird. Create a simple project with clocking components and IO only, route all the unused wires to outside. Make it work.

Title: Re: DDR3 initialization sequence issue
Post by: promach on July 17, 2021, 02:02:05 am
Quote
WARNING:Xst:2957 - There are clock and non-clock loads on clock signal clk_BUFGP. This is not a recommended design practice, that may cause excessive delay, skew or unroutable situations.

I suppose the clk_BUFGP issue is still unresolved, and is just hidden behind dozens of other unrouteable nets.

May I know where exactly Xilinx ISE stores the information on ALL such unrouteable nets ?
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 17, 2021, 11:22:16 am
Quote
ERROR:Route:472 -
   This design is unrouteable.
   To evaluate the problem please use fpga_editor.
Routing Conflict 1:
   Net:ddr3_control/ck_270 on pin IOCLK1 on location IODELAY_X0Y15
   Net:ddr3_control/ck_180 on pin CLK1 on location OLOGIC_X0Y15
    Conflict detected on wire: PINFEED1(-64742,-66654)

I found another logic bug where both the write DQS strobe and the write DQ bits are driven using the same clock, which means there is no 90 degree phase shift (DQS centering).

However, solving this logic bug resulted in the above quoted routing error...
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 18, 2021, 03:27:11 am
@NorthGuy

I just found that for the Xilinx ISE PLL clocking wizard to use the dynamic phase shift approach (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3585485/#msg3585485) that @BrianHG is using for his DECA board, the clocking wizard requires a fresh PLL IP instantiation for each generated clock.

@BrianHG

What do you exactly mean by "I'm only using 1 PLL, it just that I have 3 outputs enabled and I am adjusting the phase of clk #2 while I set the parameter that clk #1 is at 90 degrees and clk #0 is my reference 0 degree and it's the system clock. " ?


Besides, what is wrong with ldqs and udqs signals (https://github.com/promach/DDR/blob/main/test_ddr3_memory_controller.v#L677-L689) ?

(https://i.imgur.com/Hzrh4WX.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 19, 2021, 01:55:36 am
1.  Why, after dynamic phase shift is enabled, does the "requested phase" field only accept values of 90 and 270 (values of 0 and 180 are not accepted) ?

2.  Why does dq go from 6 to 7 to 6, then jump to 8 ?  There is something wrong with the ODDR2 primitive instantiation (https://github.com/promach/DDR/blob/main/test_ddr3_memory_controller.v#L668-L682).


(https://i.imgur.com/xQd05EC.png)


(https://i.imgur.com/cjhAcvg.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 20, 2021, 05:01:16 pm
@BrianHG

What do you exactly mean by "I'm only using 1 PLL, it just that I have 3 outputs enabled and I am adjusting the phase of clk #2 while I set the parameter that clk #1 is at 90 degrees and clk #0 is my reference 0 degree and it's the system clock. " ?


Exactly that.  Note that even if you, for example, request 300MHz out from a PLL, internally the oscillator generating that clock is running at many times that frequency.  The core oscillator is then simply divided with simple binary stage counters or shift registers for each of its multiple outputs.  Basically 1 programmable shifting divider per output.  When you set the multiple possible PLL outputs in your ISE, or send it controls, it is usually only modifying the contents/settings of these dedicated output dividers to generate the multiple phases & divided frequencies.  This is why not every combination of output phase and frequency multiple is allowed, as it all comes from that super high frequency oscillator phase locked to a fraction of the source crystal frequency.

If you are using special IO features in a particular way, things may be even more limiting as behind your back, the compiler may be reserving a number of features/outputs for a particular required use removing the ability to use some features or modes.  For example, when tuning one of my output phases in my DDR3 controller, if I tune any one other than the one I reserved for the purpose, it may just completely crash my FPGA to a point where only a power cycle will recover it.

Title: Re: DDR3 initialization sequence issue
Post by: promach on July 21, 2021, 02:54:53 am
@BrianHG

Thanks for your advice.

May I know if it is feasible to build my own digital PLL using verilog logic coding as described in https://zipcpu.com/dsp/2017/12/14/logic-pll.html for DDR3 RAM controller ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 21, 2021, 03:42:43 pm
@BrianHG

Thanks for your advice.

May I know if it is feasible to build my own digital PLL using verilog logic coding as described in https://zipcpu.com/dsp/2017/12/14/logic-pll.html for DDR3 RAM controller ?
At 300MHz, the Altera core PLL oscillator drives its counter logic at 2.4GHz.  Can you think of any verilog code which can run at 2.4GHz in an FPGA whose core functions usually max out at ~300MHz?  What would be the stability and jitter of logic driven clocks?  What's the jitter penalty of using logic cell generated clocks to drive global clocks onto the FPGA's clock networks?  What happens every time you compile and your clock generation logic gets moved around the FPGA fabric?  You will never replicate the hard-wired dedicated PLL circuitry's stability and phase tuning functions, where the FPGA manufacturer has even gone so far as to dedicate separate analog and digital supply pins for each PLL section with hard wired routes to the FPGA global clock nets.
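
The 2.4GHz / 300MHz figures above can be turned into a rough picture of which output phases a plain divider can produce. This is only a simplified sketch: it assumes each output is an integer division of the core oscillator and that a phase offset is a whole number of oscillator cycles. Real PLLs may offer finer steps by other means, and the function name `divider_phase_steps` is made up for illustration.

```python
from fractions import Fraction


def divider_phase_steps(vco_mhz: int, out_mhz: int) -> list[Fraction]:
    """Phases (in degrees) reachable by offsetting an integer divider
    by whole cycles of the core oscillator (simplified model)."""
    if vco_mhz % out_mhz != 0:
        raise ValueError("output must be an integer division of the oscillator")
    divide = vco_mhz // out_mhz
    # One oscillator cycle of offset moves the output by 360/divide degrees.
    return [Fraction(360 * k, divide) for k in range(divide)]


# Figures quoted in the post: 2.4GHz core oscillator, 300MHz output.
phases = divider_phase_steps(2400, 300)
print([float(p) for p in phases])
```

Under these assumptions the divide ratio is 8, so the output can only sit on multiples of 45 degrees, which illustrates why a wizard may refuse some phase/frequency combinations.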
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 21, 2021, 05:42:17 pm
I faced too many issues at routing and implementation stages due to PLL and the IODELAY for DQ bits.

I suspect that Spartan-6 might not have enough mapping and routing resources to implement a DDR3 RAM controller without carefully maneuvering around those mapping and routing warnings and errors.  I am now curious as to how litedram was ported to Spartan-6.

Not only mapping and routing errors: when I instantiate a new PLL IP for every different clock of different phases, I have ISIM simulation error which aborts the simulation run, because the PLL input jitter already exceeded the "multiplied clock"'s period.

What do you think ?

Should I switch to the DECA board, which might be easier, with fewer issues and a much higher chance of getting my DDR3 RAM controller to run ?
Title: Re: DDR3 initialization sequence issue
Post by: asmi on July 21, 2021, 06:10:27 pm
Don't you think there's got to be a reason Xilinx opted to implement hardware DDR3 MCBs in Spartan-6, as opposed to going with "soft" implementation via fabric? Even in Spartan-7/Artix-7 with fabric about twice as fast as that of a S6, they still have to use some hardware blocks to implement a DDR3 controller which is reliable across entire PVT range.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 21, 2021, 06:52:02 pm
I faced too many issues at routing and implementation stages due to PLL and the IODELAY for DQ bits.

Setting aside the obvious improvement from the IO pin features in later FPGAs, as asmi has pointed out, you have gone about your implementation in a way that is not easy to do in Spartan 6, but it can still be done properly.  You are lacking experience in how the compiler interprets your setup and in what is actually required, as you have gone overboard requesting some clocks you don't even need.  Please re-read the DDR buffer features and try to get your output clocks down from all the ones you set up to only the three you need if you are aiming to copy my approach.  That is 1 main system clock, a write clock, and 1 tune-able read clock.  Otherwise, what you want to do may be achievable with 1 PLL output clock if you are using the tune-able delays in the IO buffer.

In my code, I have a huge pipeline of registers on the write and read data & mask, allowing the compiler to shift the data from 1 PLL clock phase to another to avoid timing errors.  This eats logic cells, whereas a design with an immediate controller without any cache would not use any additional resources.  The compiler also auto upgrades these shift registers to the various types of block memory, so I don't care.  However, if your design reads and writes to and from dedicated block memory, this may waste logic cells, while smarter wiring direct to the DDR IO port buffers and clock management becomes more cumbersome, especially if you want to space commands with an odd DDR_CK spacing and your FPGA core memory may need to be run at half or quarter speed.

My design wastes around 1k logic cells in a 16 bit DDR3 controller at 400MHz for these serial shift registers.  If I wanted to save logic cells to the max, I would have used M9K memory blocks, making my ram controller consume only ~2k logic gates instead of ~3k.  But that 1K penalty means code which can run way faster than Altera's core M9K blocks on their slower FPGAs, as well as code which ports to other FPGA vendors' cores easily.  I know what I built isn't the best thing in the world, but it is built to be flexible across a number of architectures with a few given top performance limitations, yet it should function dead on identically on each platform.  Since I targeted hobbyists usually running smaller, slower FPGAs with free versions of the compiler, I chose this flexible code route which could still achieve the 303MHz required by the DDR3 spec and also did 1 up on Altera, since their software DDR3 controller can't even achieve this, only 300MHz, on their -6 FPGAs, whereas mine can now easily achieve 350MHz or better on their fastest -6 and still work on their slower bottom end MAX10/CycloneV/IV/III FPGAs.
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 21, 2021, 07:42:48 pm
Spartan-6 also has DCM to generate clocks.

If you want to build a DDR3 controller on an FPGA where Xilinx decided to use a dedicated MCB, don't expect this to be easy. You need to know the fine details of their hardware. It'll take lots of reading, investigating, understanding, and inventing clever ways, and there's no guarantee of success.
Title: Re: DDR3 initialization sequence issue
Post by: asmi on July 21, 2021, 07:57:38 pm
If you want to build a DDR3 controller on an FPGA where Xilinx decided to use dedicated MCB, don't expect this to be easy. You need to know very details of their hardware. It'll take lots of reading, investigating, understanding, inventing clever ways, and there's no guarantee for success.
The most time-consuming part is characterization. Even if you get it to work on a single board in a lab, you will need to test it across the entire PVT range, and then on a variety of different designs and configurations. I once designed a board for a customer using LPDDR1 memory with my own controller, but regretted that decision many times over the course of the project, exactly because of the requirement to make sure it works over the entire range of conditions. That was a lesson learnt; since then I only use DDR2 or DDR3 for my projects and a MIG, because I know it will not give me headaches, even if sometimes it will be overkill as far as bandwidth and capacity.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 22, 2021, 02:15:27 am
Quote
Please re-read the DDR buffer features and try to get your output clocks from the all the ones you setup down to only the three you need if you are aiming to copy my approach.  That 1 main system clock, a write clock, and 1 tune-able read clock.  Otherwise, what you want to do may be achievable with 1 PLL output clock if you are using the tune-able delays in the IO buffer.

The Xilinx IODELAY2 primitive (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=130) for read operation requires not just a read clock (IOCLK0), but also the inversion of the read clock (IOCLK1).

Same applies to ODDR2 primitive (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=224) for its C0 and C1 input pins.

What makes things worse is that my current design is using ck_90 to drive write DQ bits without using ODELAY for the write operation
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 22, 2021, 03:11:37 am
What happened to the DDR buffer you showed here? :
https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3601858/#msg3601858 (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3601858/#msg3601858)

It has features you can use...
Hilariously enough, almost the exact same features, cell for cell, as in the MAX10 DDR IO buffer as well as 2 more extra features than the Cyclone IV/III DDR buffers which is already enough function for my DDR3 controller.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 22, 2021, 11:57:54 am
That diagram does not show the IODELAY2 primitive.
Now, I have thought of a design which uses only 2 PLL clocks (ck and ck_180)

Note: ck_180 is needed because ODDR2 primitive (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=224) requires both C0 and C1 clocks in the case of DDR3 RAM controller.

By the way, should I share and time-multiplex the IODELAY2 primitives for read and write operations ?
If IODELAY2 are not shared, I am afraid there will be routing issues ?

Besides, what is the purpose of IOCLK0 and IOCLK1 inside IODELAY2 primitive (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=130) ?  What does it mean by Optionally Invertible ?  Do I need both IOCLK0 and IOCLK1 for IODELAY2 primitive to work properly ?

(https://i.imgur.com/xgrbsxO.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 22, 2021, 01:00:54 pm
Besides, what is the purpose of IOCLK0 and IOCLK1 inside IODELAY2 primitive (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=130) ?  What does it mean by Optionally Invertible ?  Do I need both IOCLK0 and IOCLK1 for IODELAY2 primitive to work properly ?

IOCLK0 and IOCLK1 are needed for tap calibration with CAL. It counts the number of taps in the clock period, hence it needs the clock.

"Optionally Invertible" means that the block can invert the clock in hardware prior to use. That's how it can use a single BUFG instead of two.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 22, 2021, 01:30:26 pm
1 clock for in, 1 clock for out.
(https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/?action=dlattach;attach=1238459)
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 22, 2021, 02:09:31 pm
@Northguy

Do I need BOTH IOCLK0 and IOCLK1 for IODELAY2 primitive to work properly ?
Are IOCLK0 and IOCLK1 180 degrees apart in phase from each other ?


Quote
"Optionally Invertible" means that the block can invert the clock in hardware prior to use. That's how it can use a single BUFG instead of two.

I do not understand how "Optionally invertible" uses a single BUFG instead of two ?
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 22, 2021, 02:13:30 pm
@BrianHG

Quote
1 clock for in, 1 clock for out.

I am planning to use the same clock for both read (in) and write (out) operations.

Why did you cross out the two bullet points ?

What is local inversion ?

Besides, why do I need frequency doubler when IDDR2 and ODDR2 alone could already handle double-data-rate signals ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 22, 2021, 02:36:03 pm
Just showing you how I would configure the buffer using minimal resources.
I also allowed the option to have a tuneable read phase clock.

You are free to go about things any way you choose.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 22, 2021, 03:28:37 pm
Quote
I also allowed the option to have a tuneable read phase clock.

How exactly does the dynamic phase shift mechanism inside the PLL work to perform DQS centering (90 degree phase shift) ?

I asked this question because inside Xilinx PLL clocking wizard, there are only PSDONE, PSINCDEC, PSEN and PSCLK (https://www.xilinx.com/support/documentation/user_guides/ug382.pdf#page=61) signals.

(https://i.imgur.com/LeqlZQo.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 22, 2021, 03:54:20 pm
I'm using the PLL to generate both my fixed 0 and 90 degree clocks, not the DCM.  Depending on resolution, I would either use the PLL third output itself, or use the DCM connected to the PLL 0 degree output to generate my phase tuneable clock for the read data.  In other words, the DCM module only has 1 single 0 degree clock output enabled, which exclusively drives my read clock.  No other outputs enabled, no frequency conversion.  Wired like this, there should be 0 problem using the tuning input controls with the finest possible tuning steps, as the act of tuning moves that 0 degree to where I need it.  I believe since you have 2 DCMs for each PLL, you can then individually tune each one to clock each 8-bit group of the DDR3.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 22, 2021, 04:24:56 pm
Is it possible to use only a single PLL 0 degree output clock (without generating 90 degree clocks) to perform 90 degree phase shift since the PLL 0 degree output clock period is already known beforehand ?

I suppose the dynamic phase shift inside the PLL has some fixed phase increment value (https://www.xilinx.com/support/documentation/user_guides/ug382.pdf#page=66) ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 22, 2021, 05:20:58 pm
Do I need BOTH IOCLK0 and IOCLK1 for IODELAY2 primitive to work properly ?
Is IOCLK0 and IOCLK1 being 180 degree phase difference apart from each other ?

Most likely, these are the same clock lines that drive IDDR and ODDR, so both get connected no matter what you do.

I do not understand how "Optionally invertible" uses a single BUFG instead of two ?

You use "single_BUFG" to drive one clock and "not single_BUFG" to drive the other clock, as opposed to "phase_0_BUFG" driving one clock and "phase_180_BUFG" to drive the other.
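
As a quick sanity check of the "not single_BUFG" idea, the sketch below samples ideal 50% duty-cycle clocks once per degree and compares a locally inverted clock against a separately generated phase-shifted one. It assumes perfect square waves with no skew or duty-cycle distortion; `square_clock` and `invert` are illustrative helper names, not anything from the Xilinx libraries.

```python
def square_clock(phase_deg: int, periods: int = 2) -> list[int]:
    """Ideal 50% duty-cycle clock, one sample per degree, delayed by phase_deg."""
    samples = []
    for t in range(360 * periods):
        angle = (t - phase_deg) % 360
        samples.append(1 if angle < 180 else 0)
    return samples


def invert(clock: list[int]) -> list[int]:
    """Model of a local inverter in front of a clock pin."""
    return [1 - s for s in clock]


clk_0 = square_clock(0)
clk_90 = square_clock(90)

# Inverting the 0 degree clock gives the 180 degree clock,
# and inverting the 90 degree clock gives the 270 degree clock.
print(invert(clk_0) == square_clock(180))   # True
print(invert(clk_90) == square_clock(270))  # True
```

So, for an ideal clock, one routed clock plus a local inversion carries the same information as two separately routed clocks 180 degrees apart.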
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 23, 2021, 01:06:08 am
Quote
You use "single_BUFG" to drive one clock and "not single_BUFG" to drive the other clock, as opposed to "phase_0_BUFG" driving one clock and "phase_180_BUFG" to drive the other.

Does this mean that I could use the same clock (0 degree phase) signal for both IOCLK0 and IOCLK1 of IODELAY2 primitive ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 23, 2021, 02:10:41 pm
Quote
You use "single_BUFG" to drive one clock and "not single_BUFG" to drive the other clock, as opposed to "phase_0_BUFG" driving one clock and "phase_180_BUFG" to drive the other.

Does this mean that I could use the same clock (0 degree phase) signal for both IOCLK0 and IOCLK1 of IODELAY2 primitive ?

If you invert it for IOCLK1 (and if you do the same for IDDR and ODDR as well).
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 24, 2021, 04:57:18 pm
Quote
I'm using the PLL to generate both my fixed 0 and 90 degree clocks, not the DCM.  Depending on resolution, I either would use the PLL third output itself, or use the DCM connected to the PLL 0 degree output to generate my phase tuneable clock for the read data.  In other words, the DCM module only has 1 single 0 degree clock output enabled which exclusively drives my read clock.  No other outputs enables, no frequency conversion.  Wired like this, there should be 0 problem using the tuning input controls with the finest possible tuning steps, as in the act of tuning moves that 0 degree to where I need.  I believe since you have 2 DCMs for each PLL, you can then individually tune each one to clock each 8 bits group the DDR3.

@BrianHG

For read operation, I do not understand why using only a 90 degree clock (without using any delay elements such as Xilinx's IODELAY2 primitive) would be able to clock all 8 bits of the incoming DQ signals ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 24, 2021, 07:02:02 pm
Quote
I'm using the PLL to generate both my fixed 0 and 90 degree clocks, not the DCM.  Depending on resolution, I either would use the PLL third output itself, or use the DCM connected to the PLL 0 degree output to generate my phase tuneable clock for the read data.  In other words, the DCM module only has 1 single 0 degree clock output enabled which exclusively drives my read clock.  No other outputs enables, no frequency conversion.  Wired like this, there should be 0 problem using the tuning input controls with the finest possible tuning steps, as in the act of tuning moves that 0 degree to where I need.  I believe since you have 2 DCMs for each PLL, you can then individually tune each one to clock each 8 bits group the DDR3.

@BrianHG

For read operation, I do not understand why using only a 90 degree clock (without using any delay elements such as Xilinx's IODELAY2 primitive) would be able to clock all 8 bits of the incoming DQ signals ?
90 degree is for the write, not read.
The read is being tuned at initialization based on the read cal.
I haven't looked at or used the IODELAY2 primitive.
 
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 24, 2021, 08:58:10 pm
90 degree is for the write, not read.
The read is being tuned at initialization based on the read cal.
I haven't looked at or used the IODELAY2 primitive.

Yes, there may be an arbitrary phase relationship for reads because the delay is caused by the round-trip

CLK->CLK output buffers->CLK traces->DDR3 chip->DQ/DQS traces->DQ/DQS input buffers=>DQ

and hence is not very predictable (not to mention fly-by connections).

You need to align the signal and the clock somehow. BrianHG uses a phase shift in his clock to do so. Delaying the signal with IODELAY is another method.
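
One way to picture the clock phase shift method is as a sweep: try every phase step, record whether a read of a known pattern came back correct, then park the read clock in the middle of the passing window. The sketch below only models that last selection step. How pass/fail is obtained, the number of steps, and the name `center_of_passing_window` are all assumptions for illustration, not BrianHG's actual calibration code.

```python
def center_of_passing_window(passed: list[bool]) -> int:
    """Return the phase step in the middle of the longest run of passing
    steps, treating the sweep as circular (step N-1 is next to step 0)."""
    n = len(passed)
    if not any(passed):
        raise ValueError("no passing phase step found")
    if all(passed):
        return 0  # every step works, so there is no edge to center on

    # Start scanning just after a failing step so that a passing run
    # which wraps around the end of the sweep is seen in one piece.
    first_fail = passed.index(False)
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for offset in range(1, n + 1):
        step = (first_fail + offset) % n
        if passed[step]:
            if run_len == 0:
                run_start = step
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return (best_start + (best_len - 1) // 2) % n


# Hypothetical 16-step sweep where reads pass at steps 14, 15, 0, 1 and 2.
sweep = [step in (14, 15, 0, 1, 2) for step in range(16)]
print(center_of_passing_window(sweep))  # 0
```

Because DQ and DQS leave the DDR3 chip in phase, the same chosen step serves the whole byte group, provided the traces within the group are reasonably matched.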
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 25, 2021, 03:36:11 am
Quote
You need to align the signal and the clock somehow. BrianHG use phase shift in his clock to do so. Delaying the signal with IODELAY is another method.

PLL dynamic phase shift approach by @BrianHG could only align the incoming DQS strobe, what about all 8 bits of DQ signal ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on July 25, 2021, 02:53:13 pm
Quote
You need to align the signal and the clock somehow. BrianHG use phase shift in his clock to do so. Delaying the signal with IODELAY is another method.

PLL dynamic phase shift approach by @BrianHG could only align the incoming DQS strobe, what about all 8 bits of DQ signal ?

Incoming DQ and DQS are in phase with each other.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 25, 2021, 05:04:45 pm
I know both incoming DQS and DQ are in phase with each other, at least in most of the ordinary situations.

However, what I am asking is, in the case of DQS centering, how exactly does the PLL dynamic phase shift mechanism help to align BOTH DQS and DQ signals without using any delay elements on the DQ signals ?
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 26, 2021, 11:54:20 am
in https://www.xilinx.com/support/documentation/user_guides/ug382.pdf#page=65 (https://www.xilinx.com/support/documentation/user_guides/ug382.pdf#page=65) , there is clock with 90 degrees phase shift already being generated internally.

I am really confused as to how DQS centering for the read operation works with PLL dynamic phase shift.

(https://i.imgur.com/W3f1JfL.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 27, 2021, 11:25:27 am
Quote
the tunable PLL output is used for read sampling
The tunable PLL output clock goes to the 'input clock' for the DQ & DQS DDR input buffers and subsequent read data FIFO's input clock.
Remember, when reading data, the DQS is in perfect sync with the read DQ.

As I understand it, using a tuneable PLL output (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3585485/#msg3585485) (@BrianHG's approach) for read sampling is not as robust as the conventional DQS centering achieved by using the IODELAY2 delay primitive.

Please correct me if wrong.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 27, 2021, 04:51:02 pm
Litedram code author told me that for spartan-6 platform, he uses fixed bitslip/delay which required manual adjustments by the user. (https://github.com/litex-hub/litex-boards/blob/master/litex_boards/targets/saanlima_pipistrello.py#L174-L176)

The reason is that the IODELAY2 primitive on the Spartan-6 platform has internal hardware issues (https://www.xilinx.com/support/answers/38408.html)

I think I could only use PLL dynamic phase shift feature.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 27, 2021, 04:56:10 pm
From what I read so far...

IODELAY should be used to tune out trace length differences on the PCB of each individual pin.  Not adjust read phase position.

Using just a tunable PLL output will only give you 16-32 steps per 360 degrees per clock.
Using a tunable DCM attached to your main PLL output will give you the enhanced 256 steps per 360 degree phase.
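
To put those step counts into time units, here is a small sketch that converts "steps per 360 degrees" into degrees and picoseconds per step. The 16, 32 and 256 step figures are the ones quoted above; the 300MHz clock is only an assumed example frequency, and `phase_step` is an illustrative helper name.

```python
def phase_step(steps_per_period: int, clock_mhz: float) -> tuple[float, float]:
    """Return (degrees per step, picoseconds per step) for a tunable clock."""
    period_ps = 1e6 / clock_mhz  # 1 MHz corresponds to a 1,000,000 ps period
    return 360.0 / steps_per_period, period_ps / steps_per_period


for steps in (16, 32, 256):
    deg, ps = phase_step(steps, 300.0)  # 300 MHz is an assumed example
    print(f"{steps:3d} steps: {deg:.3f} deg, {ps:.1f} ps per step")
```

At the assumed 300MHz, 256 steps works out to roughly 13 ps per step, versus a little over 200 ps per step with only 16 steps.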

There is a good advantage in higher end frequencies and quality if you use the DQS as a clock input signal instead of using a tuned PLL output as a read sample clock.  However, this will rely on use of the IODELAY if the FPGA cannot accept that the DQS coming in toggles at the same time as the data.  Using this feature will lower compatibility between FPGA types, or you might need a far greater amount of code for each type of FPGA.  Older, cheaper, slower FPGAs might not be able to handle this the way modern FPGAs designed to implement DDR3/DDR4 topologies can.

My DDR3 controller has a 50MHz power-up processor handling the power-up tuning.  It is a sequencer where it is possible to add/remove code to support any type of buffer tuning during power-up.  Though, it does cycle the power-up commands at a maximum of ~25 million a second instead of the 1 command per every ~4 DDR3_CK clocks.  This just alleviates fitting & routing load on a variable power-up program having to always run at full tilt DDR_CK frequency so I need not worry about how complex I need to make that part of the firmware to measure calculate and set the IO buffer features if I ever wish to enhance support of higher end features.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 27, 2021, 05:22:11 pm
Quote
this will rely on use of the IODELAY if the FPGA cannot accept that the DQS coming in toggles at the same time as the data.

@BrianHg , I am confused with your quoted sentence. Is this "same time" keyword implying the need for IODELAY in the case of DQS centering ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 27, 2021, 05:35:40 pm
I only know how this worked on older Altera fixed timing designs.

We used to specify the tSU and tH in from the DQS grouped to the DQ pins, with PCB timing trace length adjustments in the .sdc file and allow the compiler to setup the buffers for us during compile.

Altera also offers a DDR3/4 IO phy which auto wires all of the PLL and delay line clocking buffers for any DDR2/3/4 ram in one single neat little function package, with a control port which allows us to dynamically tune the PHY's IO delays from software in real time, or fix them by parameter.  If I were to use it, all I would provide is the RAM type, the # of DQS / DQ groups and # of CK/command pins with clock frequencies and PCB trace lengths.  The usage instructions are nothing like Xilinx's.

I do not know how Xilinx implements their DQS clocked DQ input topology.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 28, 2021, 01:56:52 am
Quote
There is a good advantage of higher end frequencies and quality if you use the DQS as a clock input signal instead of using a tuned PLL output as a read sample clock.

Wait, I thought the tuned PLL output could be phase-shifted to the middle/center of the DQ bits ?  How would using DQS as the clock signal be superior to the tuned PLL output ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 28, 2021, 02:36:45 am
When reading data, the DQS is generated by the DDR3 ram chip, just like the DQ data.  This means that any timing and jitter noise generated by the FPGA clock output, pcb traces, and the ram's CK input, plus its PLL clock, the speed of the IOs and the effect of both IC temperatures, will be there in the DQS and DQ alike.  If you are clocking the data in from the local PLL clock, all these errors may not be there, as the local signal may be super clean by comparison.

Now, if you sample the DQ data using the DQS without a PLL, just using the DQS as a dedicated clock with the FPGA's special routed hard wired path, these accumulated timing errors will better match the DQ data.  However, this can only be accurately done because the FPGA has special IO zones where you must use the dedicated DQS inputs paired with their 8 dedicated hard wired DQ IO input pins to get this enhanced timing error following performance, as only each of the 8 IO matched with their dedicated 2 DQS pins have this special direct clock routing on the FPGA silicon, achieving the tight timing needed to pull off the high end 1-2 GSPS route with a predicted minimum delay.  This cannot work with global clocks through PLLs, as the DQS is not a continuous clock and you don't want to clean up that clock, since you want the timing noise within the DQS to match the timing noise within the DQ data being read to further guarantee a correct read.
Title: Re: DDR3 initialization sequence issue
Post by: promach on July 28, 2021, 12:21:08 pm
As for PLL dynamic phase shift approach (https://www.xilinx.com/support/documentation/user_guides/ug382.pdf#page=65), I have few questions:

1. Could I actually generate a 90 degree phase-shifted clock from CLK_OUT2 using DCM_SP Settings ?
    Could I actually generate a 270 degree phase shifted clock from CLK_OUT4 using DCM_SP Settings ?

2. What about PLL_BASE Settings which seems to have the phase shift capability as well ?

(https://i.imgur.com/qc3Lqx9.png)

(https://i.imgur.com/cJ7hD2N.png)

(https://i.imgur.com/pE0xDiK.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on July 28, 2021, 07:09:08 pm
I do not know about Xilinx specifics, but remember from the earlier discussion that if you make a clk_90 degree output, a 270 degree clock is actually the same as !clk_90.

Also, according to the Xilinx data sheet, if you make a PLL output with clk_0, clk_90, clk_270, you can tie any of these outputs to one or two of that PLL's DCM modules and use that DCM to output 1 tunable 256 step phase from the clock you are receiving from the PLL.  Also, the output from that DCM inverted will be 180 degrees out of phase compared to where you tuned.

Presetting and using multiple outputs from each DCM, or trying to get the DCM to do a frequency conversion will probably loose it's ability to be software tuned in real time.  Since you get 2 DCM per PLL, and the PLL itself is also tunable, only with far fewer steps, your options should be almost boundless.  This is why the main PLL should first be used for the main frequency conversion and system clock generation unless you are doing something special.
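
A minimal sketch of the clk_90 / clk_270 point above, assuming clk_90 has a 50% duty cycle. The module and signal names are hypothetical and not taken from either controller in this thread. Capturing on the falling edge of clk_90 is the same instant as the rising edge of a 270 degree clock, so a separate 270 degree output is not strictly needed:

```verilog
// Hypothetical illustration: capture data at the 270 degree point
// using only a 90 degree clock. Assumes clk_90 has a 50% duty cycle.
module sample_at_270 (
    input  wire clk_90,   // 90 degree phase-shifted clock from the PLL/DCM
    input  wire d,        // data to capture
    output reg  q_270     // d as captured at the 270 degree point
);
    // The falling edge of clk_90 is the rising edge of !clk_90,
    // which sits 180 degrees after clk_90, i.e. at 270 degrees.
    always @(negedge clk_90)
        q_270 <= d;
endmodule
```

Whether a negedge flop or an explicit inverted clock net is the better choice depends on the device's clocking resources, so treat this only as an illustration of the phase relationship.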

Title: Re: DDR3 initialization sequence issue
Post by: promach on July 29, 2021, 02:10:16 am
Quote
use that DCM to output 1 tunable 256 step phase from the clock you are receiving from the PLL.

cool, but how to phase-shift to the middle of DQ bit once the first DQ bit transition edge is detected ?

Besides, I suppose the transition edge detection should not be done using FPGA fabric ?


When I simulate my design with DCM, I have Warning : Input Clock Period Jitter on instance test_ddr3_memory_controller.ddr3_control.pll_ddr.dcm_sp_inst exceeds 1.000 ns. Locked CLKIN Period = 0.822. Current CLKIN Period = 0.822. ?

Why could the PLL's DCM not lock ?

(https://i.imgur.com/lz6cyKU.png)

(https://i.imgur.com/mzUHnee.png)

(https://i.imgur.com/34BVHY4.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 01, 2021, 03:22:20 am
Why does ck_dynamic have a period of 0.822ns when it is specified to run at 333MHz?

(https://i.imgur.com/Rr1jS8Q.png)

(https://i.imgur.com/jgfvxk6.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 06, 2021, 01:53:23 pm
I have solved the locked issue above.

Now, I have this Warning : Please wait for PSDONE (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L730-L747) signal before adjusting the Phase Shift issue.  Why ?

(https://i.imgur.com/0YZGNXU.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on August 06, 2021, 06:54:13 pm
Quote
I have solved the locked issue above.

Now, I have this Warning : Please wait for PSDONE (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L730-L747) signal before adjusting the Phase Shift issue.  Why ?

(https://i.imgur.com/0YZGNXU.png)

Phase shifting is like changing the PLL settings, so you need to wait for the new PLL lock.
Though, the step is so small, the PLL moves smoothly.

This takes a few clock cycles with Altera PLLs as well.
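
A rough sketch of what "wait for PSDONE" can look like in logic, assuming the usual DCM_SP handshake where PSEN is pulsed for one PSCLK cycle and PSDONE pulses high for one PSCLK cycle once the step has completed. The module name and the step_req / step_dir / busy signals are hypothetical; the exact timing should be checked against UG382 rather than taken from this sketch:

```verilog
// Hypothetical phase-step requester for a DCM_SP in VARIABLE mode.
// Issues at most one PSEN pulse at a time and blocks further
// requests until PSDONE has been seen.
module ps_stepper (
    input  wire psclk,     // same clock that drives the DCM's PSCLK pin
    input  wire reset,     // synchronous reset
    input  wire step_req,  // hypothetical: request one phase step
    input  wire step_dir,  // hypothetical: 1 = increment, 0 = decrement
    input  wire psdone,    // PSDONE from the DCM
    output reg  psen,      // to the DCM's PSEN pin
    output reg  psincdec,  // to the DCM's PSINCDEC pin
    output reg  busy       // high while a step is still in flight
);
    always @(posedge psclk) begin
        if (reset) begin
            psen     <= 1'b0;
            psincdec <= 1'b0;
            busy     <= 1'b0;
        end else begin
            psen <= 1'b0;              // default: PSEN is a one-cycle pulse
            if (!busy) begin
                if (step_req) begin
                    psen     <= 1'b1;  // start exactly one phase step
                    psincdec <= step_dir;
                    busy     <= 1'b1;  // ignore new requests from now on
                end
            end else if (psdone) begin
                busy <= 1'b0;          // step finished, next one is allowed
            end
        end
    end
endmodule
```

The point of the busy flag is simply that a second PSEN pulse can never be issued before the previous PSDONE, which is the condition the simulator warning complains about.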
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 07, 2021, 12:08:38 am
Quote
Phase shifting is like changing the PLL settings, so you need to wait for the new PLL lock.
Though, the step is so small, the PLL moves smoothly.

This takes a few clock cycles with Altera PLLs as well.

Xilinx requires only a single clock cycle to wait for the new PLL lock (https://github.com/promach/DDR/commit/405c27bca5fccc088b600804c939d391c9147f4e)

However, ck_dynamic waveform is not really locked to udqs_r even though lock_dynamic is asserted high.  WHY ?

(https://i.imgur.com/M2pn4Ox.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 07, 2021, 01:16:56 am
Strange, why has the Warning : Please wait for PSDONE signal before adjusting the Phase Shift. come back again ?

(https://i.imgur.com/o0kvbGG.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 07, 2021, 02:28:05 am
I managed to solve the PSDONE warning (https://github.com/promach/DDR/commit/2a20acfba377c0a3ad3890d3a91680e4514422db) :)

However, ck_dynamic output is still incorrect with respect to udqs_r

Note: I tried to read this Xilinx support webpage (https://www.xilinx.com/support/answers/52806.html), but I still could not find what I need in order to debug the non-working dynamic phase shift.

(https://i.imgur.com/ezZtJum.png)

Title: Re: DDR3 initialization sequence issue
Post by: promach on August 13, 2021, 03:33:26 pm
@NorthGuy  I tried to read about DCM_SP settings (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=83) and modified the code to use CLKFX instead of CLK0, but ck_dynamic signal is still wrong.  Why ?

Code: [Select]
// file: pll_tuneable.v
//
// (c) Copyright 2008 - 2011 Xilinx, Inc. All rights reserved.
//
// This file contains confidential and proprietary information
// of Xilinx, Inc. and is protected under U.S. and
// international copyright and other intellectual property
// laws.
//
// DISCLAIMER
// This disclaimer is not a license and does not grant any
// rights to the materials distributed herewith. Except as
// otherwise provided in a valid license issued to you by
// Xilinx, and to the maximum extent permitted by applicable
// law: (1) THESE MATERIALS ARE MADE AVAILABLE "AS IS" AND
// WITH ALL FAULTS, AND XILINX HEREBY DISCLAIMS ALL WARRANTIES
// AND CONDITIONS, EXPRESS, IMPLIED, OR STATUTORY, INCLUDING
// BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY, NON-
// INFRINGEMENT, OR FITNESS FOR ANY PARTICULAR PURPOSE; and
// (2) Xilinx shall not be liable (whether in contract or tort,
// including negligence, or under any other theory of
// liability) for any loss or damage of any kind or nature
// related to, arising under or in connection with these
// materials, including for any direct, or any indirect,
// special, incidental, or consequential loss or damage
// (including loss of data, profits, goodwill, or any type of
// loss or damage suffered as a result of any action brought
// by a third party) even if such damage or loss was
// reasonably foreseeable or Xilinx had been advised of the
// possibility of the same.
//
// CRITICAL APPLICATIONS
// Xilinx products are not designed or intended to be fail-
// safe, or for use in any application requiring fail-safe
// performance, such as life-support or safety devices or
// systems, Class III medical devices, nuclear facilities,
// applications related to the deployment of airbags, or any
// other applications that could lead to death, personal
// injury, or severe property or environmental damage
// (individually and collectively, "Critical
// Applications"). Customer assumes the sole risk and
// liability of any use of Xilinx products in Critical
// Applications, subject only to applicable laws and
// regulations governing limitations on product liability.
//
// THIS COPYRIGHT NOTICE AND DISCLAIMER MUST BE RETAINED AS
// PART OF THIS FILE AT ALL TIMES.
//
//----------------------------------------------------------------------------
// User entered comments
//----------------------------------------------------------------------------
// None
//
//----------------------------------------------------------------------------
// "Output    Output      Phase     Duty      Pk-to-Pk        Phase"
// "Clock    Freq (MHz) (degrees) Cycle (%) Jitter (ps)  Error (ps)"
//----------------------------------------------------------------------------
// CLK_OUT1___333.333_____90.000______50.0______260.000____150.000
//
//----------------------------------------------------------------------------
// "Input Clock   Freq (MHz)    Input Jitter (UI)"
//----------------------------------------------------------------------------
// __primary__________50.000____________0.010

`timescale 1ps/1ps

(* CORE_GENERATION_INFO = "pll_tuneable,clk_wiz_v3_6,{component_name=pll_tuneable,use_phase_alignment=true,use_min_o_jitter=false,use_max_i_jitter=false,use_dyn_phase_shift=true,use_inclk_switchover=false,use_dyn_reconfig=false,feedback_source=FDBK_AUTO,primtype_sel=DCM_SP,num_out_clk=1,clkin1_period=3.000,clkin2_period=3.000,use_power_down=false,use_reset=true,use_locked=true,use_inclk_stopped=true,use_status=true,use_freeze=false,use_clk_valid=true,feedback_type=SINGLE,clock_mgr_type=AUTO,manual_override=true}" *)
module pll_tuneable
 (// Clock in ports
  input         clk,
  // Clock out ports
  output        ck_dynamic,
  // Dynamic phase shift ports
  input         psclk,
  input         psen,
  input         psincdec,
  output        psdone,
  // Status and control signals
  input         reset,
  output [2:0]  status,
  output        input_clk_stopped,
  output        locked_dynamic,
  output        clk_valid
 );

  // Input buffering
  //------------------------------------
  BUFG clkin1_buf
   (.O (clkin1),
    .I (clk));


  // Clocking primitive
  //------------------------------------

  // Instantiation of the DCM primitive
  //    * Unused inputs are tied off
  //    * Unused outputs are labeled unused
  wire        locked_int;
  wire [7:0]  status_int;
  wire clkfb;
  wire clk0;
  wire clkfx;

  DCM_SP
  #(.CLKDV_DIVIDE          (2.000),
    .CLKFX_DIVIDE          (2),
    .CLKFX_MULTIPLY        (2),
    .CLKIN_DIVIDE_BY_2     ("FALSE"),
    .CLKIN_PERIOD          (3.000),
    .CLKOUT_PHASE_SHIFT    ("VARIABLE"),
    .CLK_FEEDBACK          ("1X"),
    .DESKEW_ADJUST         ("SOURCE_SYNCHRONOUS"),
    .PHASE_SHIFT           (64),
    .STARTUP_WAIT          ("FALSE"))
  dcm_sp_inst
    // Input clock
   (.CLKIN                 (clkin1),
    .CLKFB                 (clkfb),
    // Output clocks
    .CLK0                  (clk0),
    .CLK90                 (),
    .CLK180                (),
    .CLK270                (),
    .CLK2X                 (),
    .CLK2X180              (),
    .CLKFX                 (clkfx),
    .CLKFX180              (),
    .CLKDV                 (),
    // Ports for dynamic phase shift
    .PSCLK                 (psclk),
    .PSEN                  (psen),
    .PSINCDEC              (psincdec),
    .PSDONE                (psdone),
    // Other control and status signals
    .LOCKED                (locked_int),
    .STATUS                (status_int),
 
    .RST                   (reset),
    // Unused pin- tie low
    .DSSEN                 (1'b0));

    assign locked_dynamic = locked_int;
    assign status = status_int[2:0];
    assign input_clk_stopped = status_int[1];
    assign clk_valid = ( ( locked_int == 1'b 1 ) && ( status_int[2:1] == 2'b 0 ) );

  // Output buffering
  //-----------------------------------
  BUFG clkf_buf
   (.O (clkfb),
    .I (clk0));

  BUFG clkout1_buf
   (.O   (ck_dynamic),
    .I   (clkfx));




endmodule
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 15, 2021, 02:55:11 am
ck_dynamic output frequency issue is solved.

I made a mistake in the CLKIN_PERIOD and M/D ratio inside the clocking wizard configuration.
I have attached the wizard-generated pll_tuneable.v file.
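
As a quick sanity check of the CLKIN_PERIOD and M/D settings, here is a small simulation-only sketch. It assumes nothing more than the relation CLKFX frequency = CLKIN frequency x CLKFX_MULTIPLY / CLKFX_DIVIDE; the module and function names are made up for illustration. With CLKIN_PERIOD = 20.000 and M/D = 14/2 it should report the 350 MHz listed in the file header below, while the earlier file (CLKIN_PERIOD = 3.000, M/D = 2/2) describes a 333.333 MHz input rather than the real 50 MHz one:

```verilog
`timescale 1ns/1ps
// Simulation-only helper (hypothetical): prints the CLKFX frequency
// implied by a CLKIN period in ns and the CLKFX_MULTIPLY/CLKFX_DIVIDE ratio.
module clkfx_check;
    function real clkfx_mhz;
        input real    clkin_period_ns;
        input integer mult;
        input integer div;
        begin
            // 1000 / period_ns gives the input frequency in MHz
            clkfx_mhz = (1000.0 / clkin_period_ns) * mult / div;
        end
    endfunction

    initial begin
        $display("CLKIN_PERIOD=20.000 M/D=14/2 -> %0.3f MHz",
                 clkfx_mhz(20.000, 14, 2));
        $display("CLKIN_PERIOD=3.000  M/D=2/2  -> %0.3f MHz",
                 clkfx_mhz(3.000, 2, 2));
        $finish;
    end
endmodule
```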

From the simulation waveform, it seems that ck_dynamic output is still not 90 degree phase-locked (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L684-L704) (it is now 180 degree) to incoming udqs_r.  Why ?

(https://i.imgur.com/18qerum.png)

Code: [Select]
// file: pll_tuneable.v
//
// (c) Copyright 2008 - 2011 Xilinx, Inc. All rights reserved.
//
// This file contains confidential and proprietary information
// of Xilinx, Inc. and is protected under U.S. and
// international copyright and other intellectual property
// laws.
//
// DISCLAIMER
// This disclaimer is not a license and does not grant any
// rights to the materials distributed herewith. Except as
// otherwise provided in a valid license issued to you by
// Xilinx, and to the maximum extent permitted by applicable
// law: (1) THESE MATERIALS ARE MADE AVAILABLE "AS IS" AND
// WITH ALL FAULTS, AND XILINX HEREBY DISCLAIMS ALL WARRANTIES
// AND CONDITIONS, EXPRESS, IMPLIED, OR STATUTORY, INCLUDING
// BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY, NON-
// INFRINGEMENT, OR FITNESS FOR ANY PARTICULAR PURPOSE; and
// (2) Xilinx shall not be liable (whether in contract or tort,
// including negligence, or under any other theory of
// liability) for any loss or damage of any kind or nature
// related to, arising under or in connection with these
// materials, including for any direct, or any indirect,
// special, incidental, or consequential loss or damage
// (including loss of data, profits, goodwill, or any type of
// loss or damage suffered as a result of any action brought
// by a third party) even if such damage or loss was
// reasonably foreseeable or Xilinx had been advised of the
// possibility of the same.
//
// CRITICAL APPLICATIONS
// Xilinx products are not designed or intended to be fail-
// safe, or for use in any application requiring fail-safe
// performance, such as life-support or safety devices or
// systems, Class III medical devices, nuclear facilities,
// applications related to the deployment of airbags, or any
// other applications that could lead to death, personal
// injury, or severe property or environmental damage
// (individually and collectively, "Critical
// Applications"). Customer assumes the sole risk and
// liability of any use of Xilinx products in Critical
// Applications, subject only to applicable laws and
// regulations governing limitations on product liability.
//
// THIS COPYRIGHT NOTICE AND DISCLAIMER MUST BE RETAINED AS
// PART OF THIS FILE AT ALL TIMES.
//
//----------------------------------------------------------------------------
// User entered comments
//----------------------------------------------------------------------------
// None
//
//----------------------------------------------------------------------------
// "Output    Output      Phase     Duty      Pk-to-Pk        Phase"
// "Clock    Freq (MHz) (degrees) Cycle (%) Jitter (ps)  Error (ps)"
//----------------------------------------------------------------------------
// CLK_OUT1___350.000_____90.000______50.0______257.143____150.000
//
//----------------------------------------------------------------------------
// "Input Clock   Freq (MHz)    Input Jitter (UI)"
//----------------------------------------------------------------------------
// __primary__________50.000____________0.010

`timescale 1ps/1ps

(* CORE_GENERATION_INFO = "pll_tuneable,clk_wiz_v3_6,{component_name=pll_tuneable,use_phase_alignment=true,use_min_o_jitter=false,use_max_i_jitter=false,use_dyn_phase_shift=true,use_inclk_switchover=false,use_dyn_reconfig=false,feedback_source=FDBK_AUTO,primtype_sel=DCM_SP,num_out_clk=1,clkin1_period=20.000,clkin2_period=20.000,use_power_down=false,use_reset=true,use_locked=true,use_inclk_stopped=true,use_status=true,use_freeze=false,use_clk_valid=true,feedback_type=SINGLE,clock_mgr_type=AUTO,manual_override=true}" *)
module pll_tuneable
 (// Clock in ports
  input         clk,
  // Clock out ports
  output        ck_dynamic,
  // Dynamic phase shift ports
  input         psclk,
  input         psen,
  input         psincdec,
  output        psdone,
  // Status and control signals
  input         reset,
  output [2:0]  status,
  output        input_clk_stopped,
  output        locked_dynamic,
  output        clk_valid
 );

  // Input buffering
  //------------------------------------
  BUFG clkin1_buf
   (.O (clkin1),
    .I (clk));


  // Clocking primitive
  //------------------------------------

  // Instantiation of the DCM primitive
  //    * Unused inputs are tied off
  //    * Unused outputs are labeled unused
  wire        locked_int;
  wire [7:0]  status_int;
  wire clkfb;
  wire clk0;
  wire clkfx;

  DCM_SP
  #(.CLKDV_DIVIDE          (2.000),
    .CLKFX_DIVIDE          (2),
    .CLKFX_MULTIPLY        (14),
    .CLKIN_DIVIDE_BY_2     ("FALSE"),
    .CLKIN_PERIOD          (20.000),
    .CLKOUT_PHASE_SHIFT    ("VARIABLE"),
    .CLK_FEEDBACK          ("1X"),
    .DESKEW_ADJUST         ("SOURCE_SYNCHRONOUS"),
    .PHASE_SHIFT           (64),
    .STARTUP_WAIT          ("FALSE"))
  dcm_sp_inst
    // Input clock
   (.CLKIN                 (clkin1),
    .CLKFB                 (clkfb),
    // Output clocks
    .CLK0                  (clk0),
    .CLK90                 (),
    .CLK180                (),
    .CLK270                (),
    .CLK2X                 (),
    .CLK2X180              (),
    .CLKFX                 (clkfx),
    .CLKFX180              (),
    .CLKDV                 (),
    // Ports for dynamic phase shift
    .PSCLK                 (psclk),
    .PSEN                  (psen),
    .PSINCDEC              (psincdec),
    .PSDONE                (psdone),
    // Other control and status signals
    .LOCKED                (locked_int),
    .STATUS                (status_int),
 
    .RST                   (reset),
    // Unused pin- tie low
    .DSSEN                 (1'b0));

    assign locked_dynamic = locked_int;
    assign status = status_int[2:0];
    assign input_clk_stopped = status_int[1];
    assign clk_valid = ( ( locked_int == 1'b 1 ) && ( status_int[2:1] == 2'b 0 ) );

  // Output buffering
  //-----------------------------------
  BUFG clkf_buf
   (.O (clkfb),
    .I (clk0));

  BUFG clkout1_buf
   (.O   (ck_dynamic),
    .I   (clkfx));




endmodule
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 15, 2021, 05:44:50 pm
I have just temporarily solved the 180 degree phase shift issue just above.

Now, I have STA setup timing issues to worry about.
How shall I tackle these STA issues especially violated path #1 (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1514-L1528) ?

(https://i.imgur.com/PxffCV2.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on August 16, 2021, 12:06:38 am
Is that your 'OE' control path for the 'DQS's output enable?

Funny, but in my design, I have the OE DQS control tied to the DDR_CK 0 degree phase, not 90.

Also, I use a SDR OE control even though the data driving the IO pins is DDR for both DQS & DQ.  At least in Altera, it does help with routing not having to deal with the extra paths and 180 degree complement path.

Note that my data DQ 'OE' control port does operate on the DDR_CK_90 phase as that port's write data is also on the DDR_CK_90 clock.

Read up on your DDR buffer.  It also probably has a non-DDR capable OE option, just like the newer MAX10/CycloneV do have a DDR OE, or even a SDR OE which you may set to shift onto the 180 degree falling clock, yet still receive its control on the 0 degree input clock.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 16, 2021, 12:05:48 pm
@BrianHg  Thanks for reminding me to use differential IOBUFDS primitive (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=128).

However, the STA setup timing violation issue that I am facing now needs to be solved using set_false_path or set_clock_groups, since wait_count is only used inside the "clk" domain (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L2013), instead of the "ck_90" domain as stated in the STA timing violation table.

Please correct me if wrong.

Someone told me to use either derive_pll_clocks (https://www.xilinx.com/support/documentation/sw_manuals/ug1192-xilinx-design-for-intel.pdf#page=57) OR set_max_delay of 1.25 clock periods OR set_multicycle_path instead of just a brute set_false_path command, which literally says that the signal could take 1 ns, or it could take 1 ms, and the user doesn't care.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on August 16, 2021, 06:34:09 pm
Quote
Someone told me to use either derive_pll_clocks (https://www.xilinx.com/support/documentation/sw_manuals/ug1192-xilinx-design-for-intel.pdf#page=57) OR set_max_delay of 1.25 clock periods OR set_multicycle_path instead of just a brute set_false_path command, which literally says that the signal could take 1 ns, or it could take 1 ms, and the user doesn't care.

Never use false path unless you do not care about random errors from build to build, as that path delay may fall inside one clock, or be accepted at the next clock, or even 2 more, since the compiler will not care how long a path takes from A to B and it may just get routed any which way on the fabric.  You use false path when you don't have any wiring between those 2 clock domains or don't care at all.

Only use multicycle, for example a setting of 2, if your code has been coded to allow for a 1 clock or 2 clock data delay.  This will be random every build and worse even random between multiple controls and bits if you do it between multiple clock domains.  So, use carefully and code with the error intent if you do.

Max delay should only be used for the IO pins and not internal paths as again, each silicon fabric can perform better than what the compiler targets as worst case scenario while the IOs may have specific rigid timing wiring & transistors to achieve a stable external interfacing.  If you are the master sending the clock and all controls to the DDR3, you may actually have a pretty large delay from internal clock to the IO pins, so long as that delay is globally flat between all transmitting IOs.  This will generate havoc with regard to reading data back into the FPGA.

To solve your internal core to IOBUF timing issues, have you tried making a set or series of latches in a chain to give the compiler the ability to gracefully cross the clock domain on its own terms without having to do any such tricks?  In my controller, my write data path going from CK_0 internally to the IO buffer's CK_90 first goes through a 3-4 DFF chain, covering the entire data DQ, DM and OE busses, before it reaches my DDR_IOBUFF.  Without this chain (which means all that data must also be valid and ready 3-4 clocks early), Quartus would cripple my write data path (it actually cripples my core's CK_0 FMAX, not the actual CK_90 clock) down to ~75MHz, or generate huge negative slacks in the ns range instead of my current >400MHz clearance.


(In my code, 'BrianHG_DDR3_IO_PORT_ALTERA.sv (v0.95 build)', lines 595 to the end have this pipe running at CK_90, while lines 550-595 have the write BL8 data serializer running on my core system clock CK_0.)
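
For readers without that source at hand, a generic sketch of such a DFF chain is shown here. This is not the code from 'BrianHG_DDR3_IO_PORT_ALTERA.sv'; the module name, WIDTH and STAGES parameters are assumptions chosen for illustration, and STAGES must be at least 1:

```verilog
// Hypothetical register pipeline placed in front of an IO buffer.
// Every bit of the bus is delayed by STAGES clocks, which gives the
// place-and-route tool flops it can spread between core logic and IO cell.
module io_pipe #(
    parameter WIDTH  = 16,  // bus width, e.g. DQ + DM + OE bits
    parameter STAGES = 3    // number of DFF stages, must be >= 1
) (
    input  wire             clk,  // e.g. the CK_90 clock feeding the IO
    input  wire [WIDTH-1:0] d,    // data from the core logic
    output wire [WIDTH-1:0] q     // data to the IO buffer, STAGES clocks later
);
    reg [WIDTH-1:0] pipe [0:STAGES-1];
    integer i;

    always @(posedge clk) begin
        pipe[0] <= d;
        for (i = 1; i < STAGES; i = i + 1)
            pipe[i] <= pipe[i-1];
    end

    assign q = pipe[STAGES-1];
endmodule
```

Because the whole bus is delayed equally, the upstream logic has to present its data STAGES clocks earlier than it would without the chain.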
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 17, 2021, 02:52:23 am
Quote
To solve your internal core to IOBUF timing issues, have you tried making a set or series of latches in a chain to give the compiler the ability to gracefully cross the clock domain on its own terms without having to do any such tricks?  In my controller, my write data path going from CK_0 internally to the IO buffer's CK_90 first goes through a 3-4 DFF chain, covering the entire data DQ, DM and OE busses, before it reaches my DDR_IOBUFF.

@BrianHg  you mean the need for CDC crossing ?

However according to https://www.verilogpro.com/clock-domain-crossing-part-1/ (https://www.verilogpro.com/clock-domain-crossing-part-1/) , slow-to-fast clock domain crossing (output to DRAM) is easy, but it is difficult the other way round (input from DRAM).

Please correct me if wrong.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on August 17, 2021, 03:37:51 am
Quote
To solve your internal core to IOBUF timing issues, have you tried making a set or series of latches in a chain to give the compiler the ability to gracefully cross the clock domain on its own terms without having to do any such tricks?  In my controller, my write data path going from CK_0 internally to the IO buffer's CK_90 first goes through a 3-4 DFF chain, covering the entire data DQ, DM and OE busses, before it reaches my DDR_IOBUFF.

@BrianHg  you mean the need for CDC crossing ?

However according to https://www.verilogpro.com/clock-domain-crossing-part-1/ (https://www.verilogpro.com/clock-domain-crossing-part-1/) , slow-to-fast clock domain crossing (output to DRAM) is easy, but it is difficult the other way round (input from DRAM).

Please correct me if wrong.

Remember, we are CDC crossing from the same frequency to the same frequency, just at different phases.  Yes this is easy for the compiler, however, depending on the direction, it will require a layer of sequential DFF in the fabric before the final IO to allow the compiler to auto-accommodate regardless of your setup.  The compiler will do this by register-retiming a number of DFF cells along a carry chain to shift the data from 1 clock to the next phase in increments.  There is a lot more to this happening internally, but this tactic for stretching the data to the final DFF in the IO buffer will work.

When reading data, since the compiler will not know the final end product functioning phase, I specify the initial read clock phase equivalent to the CK_0 phase.  With the knowledge of the potential +/- 1 clock error in the read data as I tune when crossing from the read clock domain to the original CK_0 domain, I have written my own system to do that transfer and correct this +/-1 error along all the data bits and controls regardless of the timing noise in the system, as each signal path inside the FPGA will have a slight timing error and tuning the read clock position will reveal errors along the edges of certain critical phase positions.  You will probably need to look up and read a ton of crap with ingenuity to engineer your own equivalent solution as mine is based on decades of experience.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on August 17, 2021, 03:44:13 am
By the way, your verilog CDC paper does give you a solid clue to how my read data CDC works, but I do not use an acknowledge response and you need 1 more piece to safely cover very wide buses like what is happening with a 16bit or 128bit DDR3 data channel.

Again, some items are not needed as in the DDR3, you are crossing from 1 clock domain to the next where both have the same frequency, but are at an unknown phase relationship between one another.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 17, 2021, 03:46:45 am
Quote
Remember, we are CDC crossing from the same frequency to the same frequency, just at different phases.

@BrianHg 

Not really the same frequency at all. 

wait_count is used only inside "clk" domain which is only 50MHz (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L2013) , but udqs_iobuf_en is used inside "ck_90" domain which is of 350MHz (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1514-L1528)

Note: I am also attaching the timing report  (https://paste.ubuntu.com/p/3MNKDjkvW8/)(test_ddr3_memory_controller.twr) starting from line 1022

(https://i.imgur.com/PxffCV2.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on August 17, 2021, 03:56:07 am
Are you saying clk_IBUFG_BUFG is a 50MHz clock, and obviously the ck_90 is your DDR3 350MHz clk?

Why is your DDR3_control/*** running at only 50MHz and not at the system 350MHz?

Having multiple clock domain frequencies driving a high speed IO is a big fat no-no for clean high speed FMAX.

If it must be done, any controls running on that input clock domain must first be latched to the 350MHz domain, then use that bundle to drive the IOs.
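
One common way to do this for a single-bit, slowly changing control is a two-flop synchronizer clocked by the fast domain. The sketch below is only an assumption about how it could be done, with hypothetical names, and is not taken from either controller. Multi-bit buses need more than this, since their bits can arrive on different fast clock edges:

```verilog
// Hypothetical two-flop synchronizer: brings one control bit from a
// slow domain (e.g. the 50MHz "clk") into a fast domain (e.g. 350MHz).
// Only suitable for single-bit signals that change slowly relative
// to clk_fast.
module ctrl_sync_2ff (
    input  wire clk_fast,   // destination clock
    input  wire ctrl_slow,  // control bit registered in the slow domain
    output wire ctrl_fast   // same control, now safe to use on clk_fast
);
    reg [1:0] sync_ff = 2'b00;

    always @(posedge clk_fast)
        sync_ff <= {sync_ff[0], ctrl_slow};  // first flop may go metastable

    assign ctrl_fast = sync_ff[1];           // second flop output is used
endmodule
```

Only ctrl_fast, never ctrl_slow, would then be allowed to reach the IO registers, so the timing analyser sees a pure 350MHz path at the pins.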
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 17, 2021, 04:04:04 am
Quote
Are you saying clk_IBUFG_BUFG is a 50MHz clock, and obviously the ck_90 is your DDR3 350MHz clk?

Yes.


Quote
Why is your DDR3_control/*** running at only 50MHz and not at the system 350MHz?

Because there is too much of combinational logic happening inside the finite state machine within "clk" domain (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1987-L2989)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on August 17, 2021, 04:20:03 am
Quote
Are you saying clk_IBUFG_BUFG is a 50MHz clock, and obviously the ck_90 is your DDR3 350MHz clk?

Yes.


Quote
Why is your DDR3_control/*** running at only 50MHz and not at the system 350MHz?

Because there is too much of combinational logic happening inside the finite state machine within "clk" domain (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1987-L2989)
Then, how do you get commands and data out at high enough speed for the ram?

Hint: inside 'BrianHG_DDR3_PHY_SEQ.sv' (v0.95) read lines 500 to 512.  (DDR_CLK_50 is the DDR_CK  speed / 2 at 0 degrees, not 50 degrees, not 50MHz.  Not shown, CLK_IN is equivalent to your 50MHz input.)
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 17, 2021, 04:21:52 am
Quote
Then, how do you get commands and data out at high enough speed for the ram?

All manufacturer timings have been converted to the slow 50MHz "clk" domain.  Do not worry, I have verified this concern inside Modelsim micron model simulation and Xilinx ISIM simulator.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on August 17, 2021, 04:46:22 am
Ok, we are talking about something different on top of a different coding philosophy.  I guess you will have to solve this one on your own as my code, except for that power-up sequencer bridge from my 50MHz domain to the DDR_CK/2 domain switch at power-up, completely runs in the DDR_CK & DDR_CK/2 clock domain.

What I can tell you:
1)     Note that running additional clock domains directly off of the same PLL generating the DDR_CK clock, except that you have additional perfect divide by 2 and perfect divide by 4 clock outs will not generate those massive negative slack NS you are getting.

2)     The simulators like Modelsim will always simulate perfect.  ( I could have been finished in a week...)  When you get to silicon and authentic timing reports from the compiler, including after multiple builds with shared logic controlling your ram controller, this will not be the case.  Prepare for a debug and slight code alteration fest even if it appears to meet timing constraints.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 17, 2021, 05:11:07 am
Quote
Note that running additional clock domains directly off of the same PLL generating the DDR_CK clock, except that you have additional perfect divide by 2 and perfect divide by 4 clock outs will not generate those massive negative slack NS you are getting.

Why did you mention about "additional perfect divide by 2 and perfect divide by 4" ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on August 17, 2021, 05:43:32 am
Say for example you already have 3 PLL outputs, CK_0, CK_90 and a CK_READ being the tuneable one. I'm saying also add CK_0_div2 and CK_0_div4, and maybe even CK_0_div8, outputs on that same PLL.  The compiler will only give you the routing timing benefits I mentioned if you place all your control logic on those CK_0/_div2/_div4/_div8 clocks.  Those timing benefits will all be destroyed if any control signals come from your 50MHz CLK_IN or from any other PLL, even if it has been set to the same frequency or it was locked to your first PLL.

You should not need to worry about cross clock domain conversion with logic on these clocks communicating across their domains, and you will not need to add any .sdc entries to get said performance gains.  Only when squeezing the last bit of FMAX out of the FPGA fabric, if your core logic is running on the _div4 clock and sending control to the master CK_0 clock domain, might you want to pass through a latch in the middle, from _div4 to _div2, and then put the _div2 latched data onto CK_0.  However, this would only be required under extreme optimization circumstances, which you should not need.
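
A sketch of that optional intermediate stage, assuming CK_0, CK_0_div2 and CK_0_div4 are phase-aligned outputs of the same PLL so that the tools can time every hop as an ordinary synchronous path. All names here are hypothetical:

```verilog
// Hypothetical staging of a control bus from the _div4 domain to CK_0
// through an intermediate register on the _div2 clock. Assumes all
// three clocks are phase-aligned outputs of one PLL.
module div_clock_staging #(
    parameter WIDTH = 8
) (
    input  wire             ck_0,       // full-rate clock
    input  wire             ck_0_div2,  // same PLL, divided by 2
    input  wire             ck_0_div4,  // same PLL, divided by 4
    input  wire [WIDTH-1:0] ctrl_in,    // control from the core logic
    output reg  [WIDTH-1:0] ctrl_ck0    // control as seen on ck_0
);
    reg [WIDTH-1:0] ctrl_div4;
    reg [WIDTH-1:0] ctrl_div2;

    always @(posedge ck_0_div4) ctrl_div4 <= ctrl_in;    // source register
    always @(posedge ck_0_div2) ctrl_div2 <= ctrl_div4;  // middle stage
    always @(posedge ck_0)      ctrl_ck0  <= ctrl_div2;  // final stage
endmodule
```

In most designs the middle register can simply be left out, which matches the remark that this is only for extreme optimization.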

Operating between these CK_0/_div2/_div4/_div8 clock domains, you can trust that Modelsim's sim will match actual silicon.


Also, Modelsim ignores, doesn't use any .sdc timing information.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 17, 2021, 09:13:32 am
Quote
Operating between these CK_0/_div2/_div4/_div8 clock domains, you can trust that Modelsim's sim will match actual silicon.

Are you suggesting to use PLL output even for slow 50MHz "clk" domain instead of feeding directly from external 50MHz clock crystal ?

If yes, CDC crossing is still needed since the clocks are not in the same frequency although they are from the same PLL.

Besides, I suppose Modelsim could not simulate vendor's PLL IP ?

Please advise.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on August 17, 2021, 09:40:45 am
Quote
Operating between these CK_0/_div2/_div4/_div8 clock domains, you can trust that Modelsim's sim will match actual silicon.

Are you suggesting to use PLL output even for slow 50MHz "clk" domain instead of feeding directly from external 50MHz clock crystal ?

If yes, CDC crossing is still needed since the clocks are not in the same frequency although they are from the same PLL.

Besides, I suppose Modelsim could not simulate vendor's PLL IP ?

Please advise.
#1, this can help, but will still be nowhere nearly as effective as using a /2, /4, and /8 from the DDR_CK frequency and it will not be effective at all if your DDR_CK output clock is a weird fraction with regard to the 50 MHz input.

#2, Yes, Xilinx should have their PLL function included in their Modelsim library, or be able to generate a library function of their PLL.  You can still use Xilinx's own simulator anyway.

Title: Re: DDR3 initialization sequence issue
Post by: promach on August 17, 2021, 09:48:06 am
#1 I am quite confused with your reply : "will still be nowhere nearly as effective as using a /2, /4, and /8 from the DDR_CK frequency". 

Note: My slow 50MHz "clk" domain would be a /7 from the 350MHz DDR_CK frequency.

#2 Wait, which of the following files to be copied to the Modelsim simulation directory ?

Code: [Select]
[phung@archlinux ipcore_dir]$ ls -al *pll*
-rw-r--r-- 1 phung users 1215 Jun  8 22:35 create_pll.tcl
-rw-r--r-- 1 phung users 1218 Jul 19 16:55 create_pll_ck.tcl
-rw-r--r-- 1 phung users 1222 Jul 19 17:21 create_pll_ck_180.tcl
-rw-r--r-- 1 phung users 1222 Jul 19 17:23 create_pll_ck_270.tcl
-rw-r--r-- 1 phung users 1221 Jul 19 17:18 create_pll_ck_90.tcl
-rw-r--r-- 1 phung users 1219 Jul 19 09:15 create_pll_ram.tcl
-rw-r--r-- 1 phung users 1224 Jul 30 21:27 create_pll_tuneable.tcl
-rw-r--r-- 1 phung users 1082 Aug 15 11:01 edit_pll.tcl
-rw-r--r-- 1 phung users 1085 Jul 21 20:36 edit_pll_ck.tcl
-rw-r--r-- 1 phung users 1089 Jul 19 17:29 edit_pll_ck_180.tcl
-rw-r--r-- 1 phung users 1089 Jul 19 17:29 edit_pll_ck_270.tcl
-rw-r--r-- 1 phung users 1088 Jul 19 17:29 edit_pll_ck_90.tcl
-rw-r--r-- 1 phung users 1091 Aug 15 11:10 edit_pll_tuneable.tcl
-rw-r--r-- 1 phung users  721 Aug 15 11:01 pll.asy
-rw-r--r-- 1 phung users 2489 Aug 17 12:50 pll.gise
-rw-r--r-- 1 phung users 2547 Aug 16 01:28 pll.ncf
-rw-r--r-- 1 phung users 1944 Aug 15 11:02 pll.sym
-rwxrwxrwx 1 phung users 2546 Aug 15 11:01 pll.ucf
-rwxrwxrwx 1 phung users 6169 Aug 15 11:01 pll.v
-rwxrwxrwx 1 phung users 3890 Aug 15 11:01 pll.veo
-rw-r--r-- 1 phung users 7959 Aug 15 11:01 pll.xco
-rwxrwxrwx 1 phung users 2981 Aug 15 11:01 pll.xdc
-rw-r--r-- 1 phung users 4867 Aug 15 15:00 pll.xise
-rw-r--r-- 1 phung users  789 Jul 19 16:59 pll_ck.asy
-rw-r--r-- 1 phung users 1258 Aug 17 12:50 pll_ck.gise
-rw-r--r-- 1 phung users 2548 Aug 16 01:28 pll_ck.ncf
-rw-r--r-- 1 phung users 2092 Jul 19 16:59 pll_ck.sym
-rwxrwxrwx 1 phung users 2547 Jul 19 16:59 pll_ck.ucf
-rwxrwxrwx 1 phung users 5679 Jul 19 16:59 pll_ck.v
-rwxrwxrwx 1 phung users 3720 Jul 19 16:59 pll_ck.veo
-rw-r--r-- 1 phung users 7927 Jul 19 16:59 pll_ck.xco
-rwxrwxrwx 1 phung users 2982 Jul 19 16:59 pll_ck.xdc
-rw-r--r-- 1 phung users 4891 Jul 20 00:28 pll_ck.xise
-rw-r--r-- 1 phung users  793 Jul 19 17:22 pll_ck_180.asy
-rw-r--r-- 1 phung users 1270 Aug 17 12:50 pll_ck_180.gise
-rw-r--r-- 1 phung users 2552 Aug 16 01:28 pll_ck_180.ncf
-rw-r--r-- 1 phung users 2100 Jul 19 17:22 pll_ck_180.sym
-rwxrwxrwx 1 phung users 2551 Jul 19 17:22 pll_ck_180.ucf
-rwxrwxrwx 1 phung users 5712 Jul 19 17:22 pll_ck_180.v
-rwxrwxrwx 1 phung users 3732 Jul 19 17:22 pll_ck_180.veo
-rw-r--r-- 1 phung users 7939 Jul 19 17:22 pll_ck_180.xco
-rwxrwxrwx 1 phung users 2986 Jul 19 17:22 pll_ck_180.xdc
-rw-r--r-- 1 phung users 4923 Jul 20 00:28 pll_ck_180.xise
-rw-r--r-- 1 phung users 2028 Jul 19 17:22 pll_ck_180_flist.txt
-rwxrwxrwx 1 phung users 5827 Jul 19 17:22 pll_ck_180_xmdf.tcl
-rw-r--r-- 1 phung users  793 Jul 19 17:25 pll_ck_270.asy
-rw-r--r-- 1 phung users 1270 Aug 17 12:50 pll_ck_270.gise
-rw-r--r-- 1 phung users 2552 Aug 16 01:28 pll_ck_270.ncf
-rw-r--r-- 1 phung users 2100 Jul 19 17:25 pll_ck_270.sym
-rwxrwxrwx 1 phung users 2551 Jul 19 17:25 pll_ck_270.ucf
-rwxrwxrwx 1 phung users 5705 Jul 19 17:25 pll_ck_270.v
-rwxrwxrwx 1 phung users 3732 Jul 19 17:25 pll_ck_270.veo
-rw-r--r-- 1 phung users 7938 Jul 19 17:25 pll_ck_270.xco
-rwxrwxrwx 1 phung users 2986 Jul 19 17:25 pll_ck_270.xdc
-rw-r--r-- 1 phung users 4923 Jul 20 00:28 pll_ck_270.xise
-rw-r--r-- 1 phung users 2028 Jul 19 17:25 pll_ck_270_flist.txt
-rwxrwxrwx 1 phung users 5827 Jul 19 17:25 pll_ck_270_xmdf.tcl
-rw-r--r-- 1 phung users  792 Jul 19 17:20 pll_ck_90.asy
-rw-r--r-- 1 phung users 1267 Aug 17 12:50 pll_ck_90.gise
-rw-r--r-- 1 phung users 2551 Aug 16 01:28 pll_ck_90.ncf
-rw-r--r-- 1 phung users 2098 Jul 19 17:20 pll_ck_90.sym
-rwxrwxrwx 1 phung users 2550 Jul 19 17:20 pll_ck_90.ucf
-rwxrwxrwx 1 phung users 5698 Jul 19 17:20 pll_ck_90.v
-rwxrwxrwx 1 phung users 3729 Jul 19 17:20 pll_ck_90.veo
-rw-r--r-- 1 phung users 7934 Jul 19 17:20 pll_ck_90.xco
-rwxrwxrwx 1 phung users 2985 Jul 19 17:20 pll_ck_90.xdc
-rw-r--r-- 1 phung users 4915 Jul 20 00:28 pll_ck_90.xise
-rw-r--r-- 1 phung users 1969 Jul 19 17:20 pll_ck_90_flist.txt
-rwxrwxrwx 1 phung users 5795 Jul 19 17:20 pll_ck_90_xmdf.tcl
-rw-r--r-- 1 phung users 1792 Jul 19 16:59 pll_ck_flist.txt
-rwxrwxrwx 1 phung users 5699 Jul 19 16:59 pll_ck_xmdf.tcl
-rw-r--r-- 1 phung users 1615 Aug 15 11:02 pll_flist.txt
-rw-r--r-- 1 phung users 1094 Aug 15 11:11 pll_tuneable.asy
-rw-r--r-- 1 phung users 1276 Aug 17 12:50 pll_tuneable.gise
-rw-r--r-- 1 phung users 2556 Aug 16 01:28 pll_tuneable.ncf
-rw-r--r-- 1 phung users 2880 Aug 15 11:11 pll_tuneable.sym
-rwxrwxrwx 1 phung users 2555 Aug 15 11:11 pll_tuneable.ucf
-rwxrwxrwx 1 phung users 6017 Aug 15 11:11 pll_tuneable.v
-rwxrwxrwx 1 phung users 3885 Aug 15 11:11 pll_tuneable.veo
-rw-r--r-- 1 phung users 7949 Aug 15 11:11 pll_tuneable.xco
-rwxrwxrwx 1 phung users 2990 Aug 15 11:11 pll_tuneable.xdc
-rw-r--r-- 1 phung users 4939 Aug 15 11:40 pll_tuneable.xise
-rw-r--r-- 1 phung users 2146 Aug 15 11:11 pll_tuneable_flist.txt
-rwxrwxrwx 1 phung users 5891 Aug 15 11:11 pll_tuneable_xmdf.tcl
-rwxrwxrwx 1 phung users 5603 Aug 15 11:01 pll_xmdf.tcl

pll:
total 40
drwxr-xr-x  6 phung users  4096 Aug 15 11:02 .
drwxr-xr-x 10 phung users 12288 Aug 16 01:28 ..
-rw-rw-rw-  1 phung users  6131 Mar 14 23:41 clk_wiz_v3_6_readme.txt
drwxr-xr-x  2 phung users  4096 Aug 15 11:02 doc
drwxr-xr-x  2 phung users  4096 Aug 15 11:02 example_design
drwxr-xr-x  2 phung users  4096 Aug 15 11:02 implement
drwxr-xr-x  4 phung users  4096 Aug 15 11:02 simulation

pll_ck:
total 40
drwxr-xr-x  6 phung users  4096 Jul 19 16:59 .
drwxr-xr-x 10 phung users 12288 Aug 16 01:28 ..
-rw-rw-rw-  1 phung users  6131 Mar 14 23:41 clk_wiz_v3_6_readme.txt
drwxr-xr-x  2 phung users  4096 Jul 19 16:59 doc
drwxr-xr-x  2 phung users  4096 Jul 19 16:59 example_design
drwxr-xr-x  2 phung users  4096 Jul 19 16:59 implement
drwxr-xr-x  4 phung users  4096 Jul 19 16:59 simulation

pll_ck_180:
total 40
drwxr-xr-x  6 phung users  4096 Jul 19 17:22 .
drwxr-xr-x 10 phung users 12288 Aug 16 01:28 ..
-rw-rw-rw-  1 phung users  6131 Mar 14 23:41 clk_wiz_v3_6_readme.txt
drwxr-xr-x  2 phung users  4096 Jul 19 17:22 doc
drwxr-xr-x  2 phung users  4096 Jul 19 17:22 example_design
drwxr-xr-x  2 phung users  4096 Jul 19 17:22 implement
drwxr-xr-x  4 phung users  4096 Jul 19 17:22 simulation

pll_ck_270:
total 40
drwxr-xr-x  6 phung users  4096 Jul 19 17:25 .
drwxr-xr-x 10 phung users 12288 Aug 16 01:28 ..
-rw-rw-rw-  1 phung users  6131 Mar 14 23:41 clk_wiz_v3_6_readme.txt
drwxr-xr-x  2 phung users  4096 Jul 19 17:25 doc
drwxr-xr-x  2 phung users  4096 Jul 19 17:25 example_design
drwxr-xr-x  2 phung users  4096 Jul 19 17:25 implement
drwxr-xr-x  4 phung users  4096 Jul 19 17:25 simulation

pll_ck_90:
total 40
drwxr-xr-x  6 phung users  4096 Jul 19 17:20 .
drwxr-xr-x 10 phung users 12288 Aug 16 01:28 ..
-rw-rw-rw-  1 phung users  6131 Mar 14 23:41 clk_wiz_v3_6_readme.txt
drwxr-xr-x  2 phung users  4096 Jul 19 17:20 doc
drwxr-xr-x  2 phung users  4096 Jul 19 17:20 example_design
drwxr-xr-x  2 phung users  4096 Jul 19 17:20 implement
drwxr-xr-x  4 phung users  4096 Jul 19 17:20 simulation

pll_tuneable:
total 40
drwxr-xr-x  6 phung users  4096 Aug 15 11:11 .
drwxr-xr-x 10 phung users 12288 Aug 16 01:28 ..
-rw-rw-rw-  1 phung users  6131 Mar 14 23:41 clk_wiz_v3_6_readme.txt
drwxr-xr-x  2 phung users  4096 Aug 15 11:11 doc
drwxr-xr-x  2 phung users  4096 Aug 15 11:11 example_design
drwxr-xr-x  2 phung users  4096 Aug 15 11:11 implement
drwxr-xr-x  4 phung users  4096 Aug 15 11:11 simulation
[phung@archlinux ipcore_dir]$
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on August 17, 2021, 09:59:13 am
#1 I am quite confused with your reply : "will still be nowhere nearly as effective as using a /2, /4, and /8 from the DDR_CK frequency". 

Note: My slow 50MHz "clk" domain would be a /7 from the 350MHz DDR_CK frequency.

#2 Wait, which of the following files to be copied to the Modelsim simulation directory ?


#2, I don't use Xilinx, so I don't know.  However, it's not just a file to copy; I believe you need to 'generate' the lib file in ISE.

#1,  When crossing the clock domain boundary, the compiler does internally synthesize an equivalent multicycle command for each direction for both setup and hold.  The multicycle settings deal with fixed integers: going from DDR_CK to /2 or /4 would be a multicycle of 2 or 4 in one direction and the default 1 in the other. The hold side is a bit more tricky, and I'm not sure about it.  Just begin to play around and see what happens.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 17, 2021, 10:40:14 am
Quote
When crossing the clock domain boundary, the compiler does internally synthesize an equivalent multicycle command for each direction for both setup and hold. 

If there is an equivalent multicycle command used internally inside the tool, why would there still be setup timing violation shown earlier (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3631325/#msg3631325) ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on August 17, 2021, 11:05:47 am
Quote
When crossing the clock domain boundary, the compiler does internally synthesize an equivalent multicycle command for each direction for both setup and hold. 

If there is an equivalent multicycle command used internally inside the tool, why would there still be setup timing violation shown earlier (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3631325/#msg3631325) ?

It is a matter of compiler intelligence; the compiler does not literally generate the multicycle commands.
It is akin to when you declare a huge array in Verilog: some compilers are smart enough to automatically infer RAM blocks for you instead of using logic cells.
Multiple evenly divided clock domains sourced from multiple outputs of a single PLL allow the compiler to infer the allowable multicycle range from clk# to clk#.  Going from a clk input to any of the PLL outputs, on the other hand, incurs unknowns/uncertainties which cannot be accounted for, so the compiler must treat the entire system with the worst possible conditions, or whatever you tell it in the .sdc file.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 17, 2021, 11:35:23 am
Quote
Multiple evenly divided clock domain sourced from a single PLL multiple outputs allows the compiler to infer the allowable multicycle range from clk# to clk#.  While going from a clk input to any of the pll outputs incur unknowns/uncertainties which cannot be accounted for, so the compiler must treat the entire system with the worst possible conditions, or what you tell it in the .sdc file.

Just to clarify, why this quoted statement above ONLY applies to evenly divided clock domain ?
What about oddly divided clock domain ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on August 17, 2021, 12:44:48 pm
Not even or odd, but division by powers of 2, i.e. divide by 2, 4, 8, 16, 32...
Otherwise, on one cycle, you may begin at a phase 0, but on the next, your timing will slice through a mid point or 1 cycle prior to the end point, messing up the reserved multicycle timing.  With powers of 2, your timing slice will always be on phase 0 and phase 360 from one power of 2 domain to the next.

Using an odd number like 7 may generate clean slots, but the compiler will be forced to always use the default multicycle of 1 and treat all the signals crossing the clock domains as requiring the same precision as the full speed DDR_CK clock.  It is still an improvement over going from the 50MHz source to the PLL outputs, but it will not have the same magnitude of effect as dividing by powers of 2.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 18, 2021, 03:09:07 am
Quote
Using an odd number like 7 may generate clean slots, but the compiler will be forced to always use the default multicycle of 1 and treat the all the signals crossing the clock domains will require the same precision as the full speed DDR_CK clock.

Why forced to always use the default multicycle of 1 ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on August 18, 2021, 03:37:07 am
Quote
Using an odd number like 7 may generate clean slots, but the compiler will be forced to always use the default multicycle of 1 and treat the all the signals crossing the clock domains will require the same precision as the full speed DDR_CK clock.

Why forced to always use the default multicycle of 1 ?
Draw out your 2 clocks manually on graph paper: the 1x DDR_CK, and the divide-by-7 clock just below it, to scale, for a good 100 DDR_CK cycles.
Then you tell me why, for the data to always be valid from the /7 to the DDR_CK, you are forced to set the multicycle to the basic minimum 1...
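
If graph paper is not handy, the same exercise can be roughly approximated in simulation. The testbench below is only a sketch: the 1429 ps half period (about 350 MHz), the counter-based model of the divided clocks and all names are assumptions for illustration. It lists, over 100 DDR_CK cycles, which fast-clock edges coincide with a rising edge of a /2 clock and of a /7 clock.

Code: [Select]
`timescale 1ps/1ps
// Hypothetical testbench: print which ck edges line up with the rising
// edges of a /2 and a /7 clock, both modelled with counters on ck.
module div_edges_tb;
  reg       ck      = 1'b0;
  reg       ck_div2 = 1'b0;   // toggles every ck rising edge
  reg [2:0] cnt7    = 3'd0;   // /7 clock assumed to rise when this is 0
  integer   cycle   = 0;

  always #1429 ck = ~ck;      // about 350 MHz (assumed)

  always @(posedge ck) begin
    cycle   <= cycle + 1;
    ck_div2 <= ~ck_div2;
    cnt7    <= (cnt7 == 3'd6) ? 3'd0 : cnt7 + 3'd1;

    // ck_div2 is still 0 here, so it rises on this ck edge
    if (!ck_div2)     $display("cycle %0d: /2 rising edge", cycle);
    // the modelled /7 clock rises on this ck edge
    if (cnt7 == 3'd0) $display("cycle %0d: /7 rising edge", cycle);
  end

  // stop after 100 ck periods
  initial #(1429*2*100) $finish;
endmodule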
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 18, 2021, 04:51:25 am
I think I got your point of skipping one cycle in actual 1/7 clock divider circuit with 50% duty cycle. (https://www.edaplayground.com/x/cY7y)
However, I am confused with your statement : set the multicycle to the basic minimum 1

Please correct me if my clock divider implementation is wrong.

(https://i.imgur.com/7AUW4MD.png)

Code: [Select]
// Adapted from http://www.fpga4fun.com/MusicBox1.html

module clk_div (i_clk, clk_slow);
 
  input i_clk;
  output reg clk_slow = 0;
 
  localparam THRESHOLD = 7;  // divides i_clk by 7 to obtain ck_stb which is the divided clock signal
 
  reg [($clog2(THRESHOLD >> 1)-1):0] counter = 0;
  reg counter_reset = 0;
 
  always @(posedge i_clk)
    counter_reset <= (counter == (THRESHOLD >> 1) - 1'b1);
 
  always @(posedge i_clk)
  begin
    if(counter_reset)
      counter <= 1;   
    else
      counter <= counter + 1;
   
    //$display("$clog2(THRESHOLD) = ", $clog2(THRESHOLD));
  end
 
  always @(posedge i_clk)
    if(counter_reset)
      clk_slow <= ~clk_slow;
 
endmodule
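
A side note on the divider above: traced by hand, counter_reset goes high once every 3 i_clk cycles, so clk_slow appears to toggle every 3 cycles, which gives an output period of 6 input cycles rather than 7. For comparison, here is a minimal sketch of one common way to get a divide-by-7 with roughly 50% duty cycle, using both edges of the input clock. The module and signal names are made up for this example, and in a real FPGA design a PLL output would normally be preferred over a logic-generated clock.

Code: [Select]
// Hypothetical sketch: divide-by-7 with ~50% duty cycle.
// hi_pos is high for 3 of every 7 input cycles; OR-ing it with a copy
// delayed by half an input cycle stretches the high time to 3.5 cycles.
module clk_div7_sketch (
  input  wire i_clk,
  output wire clk_slow
);
  reg [2:0] count  = 3'd0;
  reg       hi_pos = 1'b0;
  reg       hi_neg = 1'b0;

  always @(posedge i_clk) begin
    // modulo-7 counter: 0,1,2,3,4,5,6,0,...
    count  <= (count == 3'd6) ? 3'd0 : count + 3'd1;
    // high for the 3 cycles that follow count values 0, 1 and 2
    hi_pos <= (count < 3'd3);
  end

  // same waveform, delayed by half an i_clk period
  always @(negedge i_clk)
    hi_neg <= hi_pos;

  assign clk_slow = hi_pos | hi_neg;
endmodule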

Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on August 18, 2021, 05:43:15 am
What I am saying is that when the slower clock outputs data to the faster clock domain, all your data and controls need to be ready by the next rising clock edge on the faster clock domain.  That's a multicycle of 1.  Now, if you have it set to 2, the compiler won't put in the effort to make sure that data makes it in time; instead, it will allow those paths to reach the faster clock domain by the faster domain's second rising clock edge.  The problem here is that this is an allowance, meaning some signals may still arrive in time for the first rising edge.

Now, with a 1:2 slower clock, the compiler can design the higher speed 1x clock domain to ignore early signals by just ignoring source control signals until after every second clock.

But, with 1:7, now you may have to engineer this 'ignore' of early-arrived paths yourself and wait until just the end of the last clock before the /7 flips again to prevent unpredictable behavior.

There are multiple ways to achieve this; however, when you operate in the 1:2 domain, the compiler just automatically handles this for you.  At least, in my history with Quartus, it seems to handle it transparently.  No .sdc entries, no multicycle needed, and I get cross domain controls running above 400MHz on their slowest crummy 15 year old FPGAs.
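
One of those ways, as a minimal sketch only: keep a phase counter in the fast domain and capture the slow-domain data on just one fast cycle per 7, the last one before the slow side updates again. This assumes the phase counter stays aligned with the /7 clock (for instance because both come from the same counter) and that slow_data only changes when the phase wraps to 0. The names and WIDTH are hypothetical.

Code: [Select]
// Hypothetical sketch: ignore early-arriving slow-domain data by
// sampling it only on the last fast cycle of each 7-cycle window.
module div7_capture_sketch #(
  parameter WIDTH = 8                  // assumed bus width
) (
  input  wire             ck,          // fast clock
  input  wire [WIDTH-1:0] slow_data,   // assumed to change only when phase wraps to 0
  output reg  [WIDTH-1:0] fast_data    // stable copy in the fast domain
);
  reg [2:0] phase = 3'd0;              // assumed aligned with the /7 clock

  always @(posedge ck) begin
    phase <= (phase == 3'd6) ? 3'd0 : phase + 3'd1;

    // Last cycle of the window: slow_data has had the whole window to
    // settle, so capture it now and ignore it on every other cycle.
    if (phase == 3'd6)
      fast_data <= slow_data;
  end
endmodule

The timing tool still has to be told that these paths get more than one fast cycle; the enable only makes the logic itself ignore the early arrivals.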


Title: Re: DDR3 initialization sequence issue
Post by: promach on August 18, 2021, 01:19:18 pm
Quote
But, with 1:7, now you may have to engineer this 'ignore' early arived paths and wait just until the end of the last clock before the /7 flips again to prevent unpredictable behavior.

Allow me some time to study and implement Multi-cycle path (MCP) formulation with feedback (http://www.verilogpro.com/clock-domain-crossing-design-part-3/)
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 19, 2021, 11:28:59 am
Wait, let me clarify my doubt on the STA issue (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3629460/#msg3629460) with regard to the following relevant code, which is referenced in the report for the path that violates setup timing.

It seems that the CDC concern that we had been discussing is tied around ODDR2 primitive.
C0 and C1 signals are of 350MHz clock domains, but D0 and D1 signals are of 50MHz clock domains.

Would Multi-cycle path (MCP) formulation with feedback (http://www.verilogpro.com/clock-domain-crossing-design-part-3/) really work in this particular scenario ?

Code: [Select]

wire data_read_is_ongoing = ((wait_count > TIME_RL-TIME_TRPRE) &&
                             ((main_state == STATE_READ) || (main_state == STATE_READ_AP))) ||
                            (main_state == STATE_READ_DATA);

ODDR2 #(
    .DDR_ALIGNMENT("NONE"),  // Sets output alignment to "NONE", "C0" or "C1"
    .INIT(1'b0),  // Sets initial state of the Q output to 1'b0 or 1'b1
    .SRTYPE("SYNC")  // Specifies "SYNC" or "ASYNC" set/reset
)
ODDR2_udqs_iobuf_en(
    .Q(udqs_iobuf_enable),  // 1-bit DDR output data
    .C0(ck_90),  // 1-bit clock input
    .C1(ck_270),  // 1-bit clock input
    .CE(1'b1),  // 1-bit clock enable input
    .D0(data_read_is_ongoing),    // 1-bit DDR data input (associated with C0)
    .D1(data_read_is_ongoing),    // 1-bit DDR data input (associated with C1)
    .R(1'b0),    // 1-bit reset input
    .S(1'b0)     // 1-bit set input
);

(https://i.imgur.com/PxffCV2.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 19, 2021, 02:55:09 pm
It seems that the CDC concern that we had been discussing is tied around ODDR2 primitive.
C0 and C1 signals are of 350MHz clock domains, but D0 and D1 signals are of 50MHz clock domains.

Why would you need that?

At any rate, assuming the 50MHz clock and the 350MHz clock are synchronized, there are no multi-cycle path timing issues here: the data produced by the edge of the 50MHz clock, which is synchronized with a rising edge of the 350 MHz clock, must be sampled by the very next edge of the 350 MHz clock. Where do you see a multi-cycle path here?
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 20, 2021, 04:29:12 pm
I tried your suggestion of using a 50MHz PLL output, but the setup timing violation issue is still unresolved.

Maybe I also need a synchronizer CDC technique in this case ?

(https://i.imgur.com/hYCyNcR.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 20, 2021, 06:21:57 pm
... but the setup timing violation issue is still unresolved.

You're passing the signal from clk_pll to clk_270 (I guess both at 350 MHz), so you have (1/350 MHz)*(270/360) = 2.14 ns for this, and you want to fit 3 logic levels into that time. It's not going to work.
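
The arithmetic can be checked with a throwaway simulation. This is only a sketch: the 350 MHz figure and the 270 degree capture phase come from the post above, the 90 degree case is added for comparison, and the module name is made up. It should report a period of about 2.857 ns, a budget of about 2.143 ns for a 270 degree capture clock and about 0.714 ns for a 90 degree one.

Code: [Select]
// Hypothetical testbench: time available between a launch edge at
// phase 0 and a capture edge at a given phase offset of the same period.
module phase_budget_tb;
  real f_mhz;
  real period_ns;

  initial begin
    f_mhz     = 350.0;             // clock frequency taken from the thread
    period_ns = 1000.0 / f_mhz;    // period in ns for a frequency in MHz

    $display("period            = %.3f ns", period_ns);
    $display("budget to 270 deg = %.3f ns", period_ns * (270.0 / 360.0));
    $display("budget to  90 deg = %.3f ns", period_ns * ( 90.0 / 360.0));
    $finish;
  end
endmodule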
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 21, 2021, 09:51:28 pm
@NorthGuy The 3 logic levels come from this data_read_is_ongoing (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1334-L1336) signal

Code: [Select]
wire data_read_is_ongoing = ((wait_count > TIME_RL-TIME_TRPRE) &&
((main_state == STATE_READ) || (main_state == STATE_READ_AP))) ||
  (main_state == STATE_READ_DATA);

So, the setup timing issue around the following ODDR2_ldqs_iobuf_en (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1477-L1496) is not trivial to solve, given that the ldqs_w (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L775-L789) signal is a double-data-rate signal and is driven by ck_90 and ck_270

Any advice ?

(https://i.imgur.com/gzceegf.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 22, 2021, 01:43:31 pm
So, the setup timing issue around the following ODDR2_ldqs_iobuf_en (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1477-L1496) is not trivial to solve given that ldqs_w (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L775-L789) signal is of double-data-rate signal and is driven by ck_90 and ck_270

Any advice ?

You can pipeline.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 22, 2021, 03:20:33 pm
Quote
You can pipeline.

Pipeline as in adding registers ?

If yes, then this is not as trivial as it might be given the double-data-rate nature of ldqs_iobuf_enable signal.
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 22, 2021, 03:54:38 pm
Quote
You can pipeline.

Pipeline as in adding registers ?

If yes, then this is not as trivial as it might be given the double-data-rate nature of data_read_is_ongoing and ldqs_iobuf_enable signals.

You issue a read command several cycles before you need to start driving DQS. Therefore, you have plenty of cycles to use. Just create a flop which produces the "enable" signal one cycle ahead then use this last cycle to pass it to the different clock domain.
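
A minimal sketch of that idea, under assumptions: the decode is registered in whatever clock domain main_state and wait_count already live in (called clk_src here), so the path into the other clock domain starts at a flop with no logic behind it. The state encodings, timing parameters, widths and the EARLY offset are placeholders, not values from the actual controller.

Code: [Select]
// Hypothetical sketch: decode the enable ahead of time and register it,
// so the cross-domain path has zero logic levels.
module dqs_enable_ahead_sketch #(
  parameter [4:0] STATE_READ      = 5'd1,  // placeholder encoding
  parameter [4:0] STATE_READ_AP   = 5'd2,  // placeholder encoding
  parameter [4:0] STATE_READ_DATA = 5'd3,  // placeholder encoding
  parameter       TIME_RL         = 5,     // placeholder value
  parameter       TIME_TRPRE      = 1,     // placeholder value
  parameter       EARLY           = 1      // counts to decode ahead (assumed)
) (
  input  wire       clk_src,       // clock of main_state / wait_count
  input  wire [4:0] main_state,
  input  wire [7:0] wait_count,
  output reg        enable_q       // registered enable, feeds the ODDR2 D inputs
);
  // Same decode as the original expression, but with the count
  // threshold moved EARLY counts sooner to absorb the flop's latency.
  wire enable_d =
      ((wait_count > TIME_RL - TIME_TRPRE - EARLY) &&
       ((main_state == STATE_READ) || (main_state == STATE_READ_AP))) ||
      (main_state == STATE_READ_DATA);

  always @(posedge clk_src)
    enable_q <= enable_d;
endmodule

Whether moving only the count threshold is enough depends on how the state machine sequences its states, so the alignment should be checked in simulation.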
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 22, 2021, 04:23:46 pm
The pipelining for data_read_is_ongoing signal has been done as follows.
However, ODDR2_ldqs_iobuf_en needs to be inside both ck_90 and ck_270 domains.

How shall I properly synchronize data_read_is_ongoing signal from ck clock domain to ck_90 and ck_270 clock domains which have 90 degrees and 270 degrees phase difference respectively ?

Code: [Select]
// wire data_read_is_ongoing = ((wait_count > TIME_RL-TIME_TRPRE) &&
//                              ((main_state == STATE_READ) || (main_state == STATE_READ_AP))) ||
//                             (main_state == STATE_READ_DATA);

// for pipelining in order to solve STA setup timing violation issue
reg data_read_is_ongoing, data_read_is_ongoing_temp_1, data_read_is_ongoing_temp_2, data_read_is_ongoing_temp_3;

// ck is 350MHz, and the logic inside 'data_read_is_ongoing' are of 50MHz clk_pll domain
// ck and clk_pll clock domains do not have phase difference,
// but 'data_read_is_ongoing' signal needs to be used inside ck_90 and ck_270 clock domains
// which have 90 degrees and 270 degrees phase difference respectively
always @(posedge ck)
begin
    data_read_is_ongoing_temp_3 <= (main_state == STATE_READ);
    data_read_is_ongoing_temp_2 <= data_read_is_ongoing_temp_3 || (main_state == STATE_READ_AP);
    data_read_is_ongoing_temp_1 <= data_read_is_ongoing_temp_2 && (wait_count > TIME_RL-TIME_TRPRE);
    data_read_is_ongoing <= data_read_is_ongoing_temp_1 || (main_state == STATE_READ_DATA);
end

Code: [Select]
// see https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=61
// 'data_read_is_ongoing' signal is not of double-data-rate signals,
// but it is connected to T port of IOBUF where its I port is fed in with double-data-rate DQS signals,
// thus the purpose of having the following ODDR2 primitives

ODDR2 #(
    .DDR_ALIGNMENT("NONE"),  // Sets output alignment to "NONE", "C0" or "C1"
    .INIT(1'b0),  // Sets initial state of the Q output to 1'b0 or 1'b1
    .SRTYPE("SYNC")  // Specifies "SYNC" or "ASYNC" set/reset
)
ODDR2_ldqs_iobuf_en(
    .Q(ldqs_iobuf_enable),  // 1-bit DDR output data
    .C0(ck_90),  // 1-bit clock input
    .C1(ck_270),  // 1-bit clock input
    .CE(1'b1),  // 1-bit clock enable input
    .D0(data_read_is_ongoing),    // 1-bit DDR data input (associated with C0)
    .D1(data_read_is_ongoing),    // 1-bit DDR data input (associated with C1)
    .R(1'b0),    // 1-bit reset input
    .S(1'b0)     // 1-bit set input
);
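
One thing to watch in the staged version above: each stage combines an already-delayed term with the current main_state or wait_count, so the final value is not simply a delayed copy of the original expression. A minimal sketch of a balanced alternative follows, where every term is sampled in the same cycle and combined one cycle later. The state encodings, timing values and widths are placeholders, and the single clk input stands in for whichever clock the controller logic actually uses.

Code: [Select]
// Hypothetical sketch: two-stage pipeline whose output equals the
// original combinational expression delayed by exactly 2 clock cycles.
module read_enable_pipeline_sketch #(
  parameter [4:0] STATE_READ      = 5'd1,  // placeholder encoding
  parameter [4:0] STATE_READ_AP   = 5'd2,  // placeholder encoding
  parameter [4:0] STATE_READ_DATA = 5'd3,  // placeholder encoding
  parameter       TIME_RL         = 5,     // placeholder value
  parameter       TIME_TRPRE      = 1      // placeholder value
) (
  input  wire       clk,
  input  wire [4:0] main_state,
  input  wire [7:0] wait_count,
  output reg        data_read_is_ongoing
);
  reg is_read_q;       // main_state was READ or READ_AP
  reg is_read_data_q;  // main_state was READ_DATA
  reg count_ok_q;      // wait_count was past the threshold

  always @(posedge clk) begin
    // Stage 1: all three terms come from the same cycle's inputs
    is_read_q      <= (main_state == STATE_READ) || (main_state == STATE_READ_AP);
    is_read_data_q <= (main_state == STATE_READ_DATA);
    count_ok_q     <= (wait_count > TIME_RL - TIME_TRPRE);

    // Stage 2: combine the registered terms (uses last cycle's values)
    data_read_is_ongoing <= (count_ok_q && is_read_q) || is_read_data_q;
  end
endmodule

The 2 cycles of added latency would have to be compensated elsewhere, for example by starting the decode earlier.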
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 22, 2021, 05:19:32 pm
Other than the data_read_is_ongoing setup timing violation issue just above, any idea why ck_90 is stated to be the source clock for data_from_ram signal (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1124-L1149) ?

Code: [Select]
genvar data_index_iserdes;
generate
for(data_index_iserdes = 0; data_index_iserdes < (DQ_BITWIDTH*SERDES_RATIO);
data_index_iserdes = data_index_iserdes + DQ_BITWIDTH)
begin: data_from_ram_combine_loop

// $rtoi and $floor are used to limit the bit range of 'data_index_iserdes',
// since 'data_out_iserdes_0' and 'data_out_iserdes_1' are each half the size of 'data_from_ram'

always @(*)
begin
if(((data_index_iserdes/DQ_BITWIDTH) % EVEN_RATIO) == 0)
begin
data_from_ram[data_index_iserdes +: DQ_BITWIDTH] <=
data_out_iserdes_0[DQ_BITWIDTH * $rtoi($floor(data_index_iserdes/(DQ_BITWIDTH << 1)))
+: DQ_BITWIDTH];
end

else begin
data_from_ram[data_index_iserdes +: DQ_BITWIDTH] <=
data_out_iserdes_1[DQ_BITWIDTH * $rtoi($floor(data_index_iserdes/(DQ_BITWIDTH << 1)))
+: DQ_BITWIDTH];
end
end
end
endgenerate

(https://i.imgur.com/P5wieEv.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 22, 2021, 06:09:45 pm
How shall I properly synchronize data_read_is_ongoing signal from ck clock domain to ck_90 and ck_270 clock domains which have 90 degrees and 270 degrees phase difference respectively ?

Why would you need this? The ODDR element takes signals in one clock domain and then converts them to DDR.

Other than the data_read_is_ongoing setup timing violation issue just above, any idea why ck_90 is stated to be the source clock for data_from_ram signal (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1124-L1149)?

The source clock for the path is the clock which clocks the source (i000051 in this case). data_from_ram is calculated based on various signals, which possibly are clocked with various clocks, so it may have various paths from different clock domains.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 23, 2021, 03:01:44 am
Quote
You issue a read command several cycles before you need to start driving DQS. Therefore, you have plenty of cycles to use. Just create a flop which produces the "enable" signal one cycle ahead then use this last cycle to pass it to the different clock domain.

@NorthGuy

What do you exactly mean by "use this last cycle to pass it to the different clock domain" ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 23, 2021, 01:36:54 pm
What do you exactly mean by "use this last cycle to pass it to the different clock domain" ?

I meant that you have a flip-flop clocked by 0-phase clock. The output of this flip-flop is connected directly to an input of IO block (such as ODDR) which is clocked by 270-phase clock (without any intermediate LUTs). This will work practically at any clock speed.
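
As a rough sketch of that arrangement (the module name and the enable_next input are hypothetical, not taken from the repository): the last register sits in the 0-phase ck domain, and its output is meant to go straight to the ODDR2 D0/D1 inputs with no combinational logic in between.

Code: [Select]
// Hypothetical sketch: final register stage in front of the IO block.
// 'enable_next' is assumed to be produced one ck cycle early by upstream logic.
module enable_final_stage (
    input  wire ck,             // 0-degree clock
    input  wire reset,
    input  wire enable_next,    // assumed: enable value needed in the NEXT cycle
    output reg  enable_to_oddr  // connect directly to ODDR2 D0 and D1
);
    always @(posedge ck) begin
        if (reset) enable_to_oddr <= 1'b0;
        else       enable_to_oddr <= enable_next;  // plain flop, no LUT after it
    end
endmodule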
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 24, 2021, 04:22:46 am
Quote
You issue a read command several cycles before you need to start driving DQS.

@NorthGuy Should it not be after instead of before since there are additional pipeline registers added for data_read_is_ongoing signal ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 24, 2021, 01:04:54 pm
Quote
You issue a read command several cycles before you need to start driving DQS.

@NorthGuy Should it not be after instead of before since there are additional pipeline registers added for data_read_is_ongoing signal ?

It is "before" on the wire. Sure, if you use pipelines of different length, it may be "after" somewhere deep in your code.

However, the timing problems you're experiencing are on the final path, which ends at the hardware blocks of your IO complex. What you do earlier on that path is probably clocked with your working clock.

I don't understand why you need the 270-clock. On writes, DQS is in phase with CK (if you don't use write leveling). On reads, DQS delay is unpredictable because the round trip to the DDR3 chip is on the path. What is the 270-clock for?
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 24, 2021, 01:12:34 pm
Quote
I don't understand why you need the 270-clock. On writes, DQS is in phase with CK (if you don't use write leveling). On reads, DQS delay is unpredictable because the round trip to the DDR3 chip is on the path. What is the 270-clock for?

ck_270 is used inside the following ODDR2 primitive (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1477-L1496) to provide double-data-rate capability.

Code: [Select]
// see https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=61
// 'data_read_is_ongoing' signal is not of double-data-rate signals,
// but it is connected to T port of IOBUF where its I port is fed in with double-data-rate DQS signals,
// thus the purpose of having the following ODDR2 primitives

ODDR2 #(
.DDR_ALIGNMENT("NONE"),  // Sets output alignment to "NONE", "C0" or "C1"
.INIT(1'b0),  // Sets initial state of the Q output to 1'b0 or 1'b1
.SRTYPE("SYNC")  // Specifies "SYNC" or "ASYNC" set/reset
)
ODDR2_ldqs_iobuf_en(
.Q(ldqs_iobuf_enable),  // 1-bit DDR output data
.C0(ck_90),  // 1-bit clock input
.C1(ck_270),  // 1-bit clock input
.CE(1'b1),  // 1-bit clock enable input
.D0(data_read_is_ongoing),    // 1-bit DDR data input (associated with C0)
.D1(data_read_is_ongoing),    // 1-bit DDR data input (associated with C1)
.R(1'b0),    // 1-bit reset input
.S(1'b0)     // 1-bit set input
);

Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 24, 2021, 01:34:48 pm
ck_270 is used inside the following ODDR2 primitive (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1477-L1496) to provide double-data-rate capability.

But I assume ck_270 is the clock shifted by 270 degrees. Why do you shift the clock by 270 degrees for DQS enables?
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 24, 2021, 01:35:55 pm
this ck_270 is for ldqs_iobuf_enable , not for ldqs_r itself.  Check carefully
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 24, 2021, 01:58:23 pm
this ck_270 is for ldqs_iobuf_enable , not for ldqs_r itself.  Check carefully

I thought that earlier you found out that all the IO-related blocks must be clocked by the same clock (or clock pair).

What the ldqs_iobuf_enable is connected to?

Title: Re: DDR3 initialization sequence issue
Post by: promach on August 24, 2021, 02:22:12 pm
Quote
What the ldqs_iobuf_enable is connected to?

It is connected to direction control port of the tri-state buffer IOBUF

Code: [Select]
IOBUF IO_ldqs (
.IO(ldqs),
.I(ldqs_w),
.T(ldqs_iobuf_enable),
.O(ldqs_r)
);
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 24, 2021, 04:07:23 pm
Quote
What the ldqs_iobuf_enable is connected to?

It is connected to direction control port of the tri-state buffer IOBUF

Code: [Select]
IOBUF IO_ldqs (
.IO(ldqs),
.I(ldqs_w),
.T(ldqs_iobuf_enable),
.O(ldqs_r)
);

You probably want to drive the T pin low only while you write, and leave the buffer tri-stated at all other times. Driving the IOBUF in the idle state only wastes energy. Therefore, you can switch T synchronously with the output data when you do your DDR3 writes.

Shouldn't it be IOBUFDS (the differential bidirectional buffer) instead of IOBUF?
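
To illustrate the T-pin suggestion above, here is a minimal sketch. The module name and the ldqs_t output are made up for illustration; data_write_is_ongoing is the signal name used in the controller. On Xilinx IOBUFs a high T disables the output driver, so T is simply the inverse of "a write is in progress". In the real design this register would still have to feed T through the same kind of OLOGIC register as the data path.

Code: [Select]
// Hypothetical sketch: drive the pad only while a write burst is active.
module dqs_tristate_ctrl (
    input  wire ck,
    input  wire reset,
    input  wire data_write_is_ongoing,  // high while the controller drives DQS
    output reg  ldqs_t                  // goes to the T path of the IOBUF
);
    always @(posedge ck) begin
        if (reset) ldqs_t <= 1'b1;                    // tri-stated after reset
        else       ldqs_t <= ~data_write_is_ongoing;  // T low (driving) only on writes
    end
endmodule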
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 27, 2021, 10:29:57 am
My code modification for data_read_is_ongoing (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3639610/#msg3639610) still resulted in setup timing violation. How to get around ?

Note:  multi-bit wait_count signal is from 50MHz clk_pll clock domain, but data_read_is_ongoing_temp_1 is inside 350MHz ck clock domain.

(https://i.imgur.com/Pdf9HIB.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 27, 2021, 12:50:02 pm
My code modification for data_read_is_ongoing (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3639610/#msg3639610) still resulted in setup timing violation.

You still have 3 logic levels.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 27, 2021, 12:52:43 pm
Where is the 3 logic level in the following code snippet ?

Code: [Select]
// wire data_read_is_ongoing = ((wait_count > TIME_RL-TIME_TRPRE) &&
// ((main_state == STATE_READ) || (main_state == STATE_READ_AP))) ||
//   (main_state == STATE_READ_DATA);

// for pipelining in order to solve STA setup timing violation issue
reg data_read_is_ongoing, data_read_is_ongoing_temp_1, data_read_is_ongoing_temp_2, data_read_is_ongoing_temp_3;

// ck is 350MHz, and the logic inside 'data_read_is_ongoing' are of 50MHz clk_pll domain
// ck and clk_pll clock domains do not have phase difference,
// but 'data_read_is_ongoing' signal needs to be used inside ck_90 and ck_270 clock domains
// which have 90 degrees and 270 degrees phase difference respectively
always @(posedge ck)
begin
data_read_is_ongoing_temp_3 <= (main_state == STATE_READ);
data_read_is_ongoing_temp_2 <= data_read_is_ongoing_temp_3 || (main_state == STATE_READ_AP);
data_read_is_ongoing_temp_1 <= data_read_is_ongoing_temp_2 && (wait_count > TIME_RL-TIME_TRPRE);
data_read_is_ongoing <= data_read_is_ongoing_temp_1 || (main_state == STATE_READ_DATA);
end
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 27, 2021, 02:05:56 pm
Where is the 3 logic level in the following code snippet ?

Here, according to your screenshot:

Code: [Select]
data_read_is_ongoing_temp_1 <= data_read_is_ongoing_temp_2 && (wait_count > TIME_RL-TIME_TRPRE);
"wait_count" is probably long enough to require lots of logic to do the comparison.

When doing something fast, you need to stick to simple operations.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 27, 2021, 03:28:37 pm
Quote
"wait_count" is probably long enough to require lots of logic to do the comparison.

What do you exactly mean by long enough ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 27, 2021, 03:50:16 pm
Quote
"wait_count" is probably long enough to require lots of logic to do the comparison.

What do you exactly mean by long enough ?

I mean the size of the variable. The longer it is, the more logic you need. Say, to compare 32-bit numbers you need more logic than to compare 8-bit numbers.
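
One possible way to keep the fast path simple, sketched under assumptions (the module name, wait_count_reached, read_state_seen and the parameter values are invented for illustration and are not from the repository): do the wide comparison in the slow clk_pll domain, register the one-bit result there, and let the 350 MHz logic look only at that single bit. This assumes ck and clk_pll are phase-aligned, as stated earlier in the thread, and it adds one clk_pll cycle of latency to the flag, which the state machine would have to account for.

Code: [Select]
// Hypothetical sketch: move the wide compare out of the 350 MHz path.
module wait_count_flag #(
    parameter COUNT_WIDTH = 16,  // assumed counter width
    parameter THRESHOLD   = 4    // stands in for TIME_RL-TIME_TRPRE
)(
    input  wire                   clk_pll,          // slow clock (50 MHz in the thread)
    input  wire                   ck,               // fast clock (350 MHz in the thread)
    input  wire [COUNT_WIDTH-1:0] wait_count,
    input  wire                   read_state_seen,  // stands in for data_read_is_ongoing_temp_2
    output reg                    read_window
);
    reg wait_count_reached;

    // Wide comparison, with a full slow-clock period to settle
    always @(posedge clk_pll)
        wait_count_reached <= (wait_count > THRESHOLD);

    // Fast domain only sees a 2-input AND in front of the flop
    always @(posedge ck)
        read_window <= read_state_seen && wait_count_reached;
endmodule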
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 27, 2021, 04:36:34 pm
@NorthGuy thanks for the advice, the setup timing violation surrounding wait_count is now gone.

I still have 9 paths with setup timing violations.
I am solving the ODDR2_dq_iobuf_en setup timing violation (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1604-L1626) now. Any idea about this path ?

(https://i.imgur.com/OCxqiNJ.png)

Code: [Select]
// As for why 'dq_iobuf_enable' signal is implemented using ODDR2 primitive,
// see https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=61

// ODDR2: Output Double Data Rate Output Register with Set, Reset and Clock Enable.
// Spartan-6
// Xilinx HDL Libraries Guide, version 14.7

ODDR2 #(
.DDR_ALIGNMENT("NONE"),  // Sets output alignment to "NONE", "C0" or "C1"
.INIT(1'b0),  // Sets initial state of the Q output to 1'b0 or 1'b1
.SRTYPE("SYNC")  // Specifies "SYNC" or "ASYNC" set/reset
)
ODDR2_dq_iobuf_en(
.Q(dq_iobuf_enable[dq_index]),  // 1-bit DDR output data
.C0(ck),  // 1-bit clock input
.C1(ck_180),  // 1-bit clock input
.CE(1'b1),  // 1-bit clock enable input
.D0(data_read_is_ongoing),    // 1-bit DDR data input (associated with C0)
.D1(data_read_is_ongoing),    // 1-bit DDR data input (associated with C1)
.R(reset),    // 1-bit reset input
.S(1'b0)     // 1-bit set input
);
// End of ODDR2_inst instantiation
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 27, 2021, 04:55:10 pm
I still have 9 paths with setup timing violations.
I am solving the ODDR2_dq_iobuf_en setup timing violation (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1604-L1626) now. Any idea about this path ?

Moving a signal between a clock and the same clock shifted by 180 degrees is nearly impossible: the path only gets half a period, so it's like having a 700 MHz clock.

The ODDR primitive gives you a choice of synchronizing inputs to either C0 or C1 clock (the DDR_ALIGNMENT parameter). You must use it. You currently have it set to "NONE". Instead, synchronize it to the clock which launches the data signal.

Title: Re: DDR3 initialization sequence issue
Post by: promach on August 28, 2021, 12:33:20 am
Quote
Instead, synchronize it to the clock which launches the data signal.

Which exact data signal were you referring in the above quoted sentence ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 28, 2021, 12:37:18 am
Quote
Instead, synchronize it to the clock which launches the data signal.

Which exact data signal were you referring in the above quoted sentence ?

D0 and D1 of the ODDR.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 28, 2021, 12:45:29 am
If I change the value of the DDR_ALIGNMENT attribute of ODDR2_dq_iobuf_en (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1604-L1625) from "NONE" to "C0" , I have the following error with ODDR2_dq_w (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1659-L1673).

Code: [Select]
ERROR:Pack:2531 - The dual data rate register
   "ddr3_control/dq_io[0].ODDR2_dq_iobuf_en" failed to join the "OLOGIC2"
   component as required.  The DDR_ALIGNMENT attribute value on DDR symbol
   "ddr3_control/dq_io[0].ODDR2_dq_w" does not match the value on DDR symbol
   "ddr3_control/dq_io[0].ODDR2_dq_iobuf_en".
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 28, 2021, 04:28:17 am
If I change the value of the DDR_ALIGNMENT attribute of ODDR2_dq_iobuf_en (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1604-L1625) from "NONE" to "C0" , I have the following error with ODDR2_dq_w (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1659-L1673).

Code: [Select]
ERROR:Pack:2531 - The dual data rate register
   "ddr3_control/dq_io[0].ODDR2_dq_iobuf_en" failed to join the "OLOGIC2"
   component as required.  The DDR_ALIGNMENT attribute value on DDR symbol
   "ddr3_control/dq_io[0].ODDR2_dq_w" does not match the value on DDR symbol
   "ddr3_control/dq_io[0].ODDR2_dq_iobuf_en".

You need to change them all.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 28, 2021, 04:36:11 am
Quote
You need to change them all.

However I cannot change for ODDR2_dq_w (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1659-L1673)
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 28, 2021, 09:11:45 am
I tried to get around using the following coding snippet (pay attention to data_read_is_ongoing_180), but still to no avail.

Code: [Select]
// wire data_read_is_ongoing = ((wait_count > TIME_RL-TIME_TRPRE) &&
// ((main_state == STATE_READ) || (main_state == STATE_READ_AP))) ||
//   (main_state == STATE_READ_DATA);

// for pipelining in order to solve STA setup timing violation issue
localparam NUM_OF_READ_PIPELINE_REGISTER_ADDED = 5;
`ifndef TESTBENCH
reg data_read_is_ongoing;
`endif
reg data_read_is_ongoing_temp_1, data_read_is_ongoing_temp_2, data_read_is_ongoing_temp_3, data_read_is_ongoing_temp_4;

// ck is 350MHz, and the logic inside 'data_read_is_ongoing' are of 50MHz clk_pll domain
// ck and clk_pll clock domains do not have phase difference,
// but 'data_read_is_ongoing' signal needs to be used inside ck_90 and ck_270 clock domains
// which have 90 degrees and 270 degrees phase difference respectively
always @(posedge ck)
begin
if(reset)
begin
data_read_is_ongoing_temp_4 <= 0;
data_read_is_ongoing_temp_3 <= 0;
data_read_is_ongoing_temp_2 <= 0;
data_read_is_ongoing_temp_1 <= 0;
data_read_is_ongoing <= 0;
end

else begin
data_read_is_ongoing_temp_4 <= (main_state == STATE_READ);
data_read_is_ongoing_temp_3 <= data_read_is_ongoing_temp_4 || (main_state == STATE_READ_AP);
data_read_is_ongoing_temp_2 <= data_read_is_ongoing_temp_3 &&
// smaller logic comparison for solving setup timing violation
(wait_count[$clog2(TIME_RL-TIME_TRPRE):0] > TIME_RL-TIME_TRPRE);
data_read_is_ongoing_temp_1 <= data_read_is_ongoing_temp_2 ||
// smaller logic comparison for solving setup timing violation
(main_state[$clog2(STATE_READ_DATA):0] == STATE_READ_DATA);
data_read_is_ongoing <= data_read_is_ongoing_temp_1;
end
end

reg data_read_is_ongoing_180;

always @(posedge ck_180)
begin
if(reset) data_read_is_ongoing_180 <= 0;

else data_read_is_ongoing_180 <= data_read_is_ongoing_temp_1;
end

(https://i.imgur.com/f5TtJUA.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 28, 2021, 01:21:30 pm
Quote
You need to change them all.

However I cannot change for ODDR2_dq_w (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1659-L1673)

Sure you can. ODDR has 2 inputs - D0 and D1 - one is for rising edges and the other one is for falling edges. You can synchronize both to either C0 or C1. This way you can do all the work in a single clock domain.

If you don't synchronize, then your D0 and D1 will be in different clock domains - D0 is in C0 and D1 is in C1. One clock is the inversion of the other. That's what your design does if you use "NONE". If you want to pass signals between these domains, forget about 350 MHz.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 28, 2021, 01:28:43 pm
Quote
You can synchronize both to either C0 or C1. This way you can do all the work in a single clock domain.

However, this ODDR2_dq_w (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1659-L1673) is to generate a double-data-rate outgoing DQ signal to Micron RAM.
If only ck clock domain is involved without using ck_180 clock domain, how would I be able to generate double-data-rate signal using ODDR2 primitive ?
Or did I misunderstand the mechanism of ODDR2 primitive (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=224) ?


Quote
If you want to pass signals between these domains, forget about 350 MHz.

I do not understand what you exactly mean by forget about 350MHz ?
Note: ck clock domain is of 350MHz
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 28, 2021, 01:42:07 pm
Quote
You can synchronize both to either C0 or C1. This way you can do all the work in a single clock domain.

However, this ODDR2_dq_w (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1659-L1673) is to generate a double-data-rate outgoing DQ signal to Micron RAM.
If only ck clock domain is involved without using ck_180 clock domain, how would I be able to generate double-data-rate signal using ODDR2 primitive ?
Or did I misunderstand the mechanism of ODDR2 primitive (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=224) ?

The DDR signal is the output of ODDR. D0 and D1 are inputs. There are 3 ways to configure the ODDR:

- both D0 and D1 are sampled by C0
- both D0 and D1 are sampled by C1
- D0 is sampled by C0 and D1 is sampled by C1

Regardless of what you choose, it still outputs DDR on the output - D0 on rising edges of C0, D1 on falling edges of C0 (rising edges of C1).

Quote
If you want to pass signals between these domains, forget about 350 MHz.

I do not understand what you exactly mean by forget about 350MHz ?
Note: ck clock domain is of 350MHz

With two different clock domains for D0 and D1 you won't be able to run your design at 350 MHz and will have to slow down.
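
A minimal sketch of the "both D0 and D1 are sampled by C0" option, assuming the Xilinx ODDR2 primitive from the Spartan-6 libraries (so it needs the unisim library to simulate). The wrapper module and the d_rise/d_fall names are hypothetical. Both data inputs are registered on posedge ck only, while C1 still gets the inverted clock so the output remains DDR.

Code: [Select]
// Hypothetical wrapper: all fabric logic lives in the ck domain.
module dq_out_c0_aligned (
    input  wire ck,       // 0-degree clock
    input  wire ck_180,   // same clock shifted by 180 degrees
    input  wire reset,
    input  wire d_rise,   // bit to send on the rising edge of ck
    input  wire d_fall,   // bit to send on the falling edge of ck
    output wire dq_out
);
    reg d0_q, d1_q;

    // Single clock domain for both ODDR2 data inputs
    always @(posedge ck) begin
        d0_q <= d_rise;
        d1_q <= d_fall;
    end

    ODDR2 #(
        .DDR_ALIGNMENT("C0"),  // D0 and D1 are both captured on C0
        .INIT(1'b0),
        .SRTYPE("SYNC")
    )
    ODDR2_example (
        .Q(dq_out),
        .C0(ck),
        .C1(ck_180),           // still the inverted clock
        .CE(1'b1),
        .D0(d0_q),
        .D1(d1_q),
        .R(reset),
        .S(1'b0)
    );
endmodule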
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 28, 2021, 04:05:41 pm
@NorthGuy I followed your suggestion, but it resulted in the following routing error.

Code: [Select]
ERROR:Route:472 -
   This design is unrouteable.
   To evaluate the problem please use fpga_editor.
Routing Conflict 1:
Net:ddr3_control/ck_180 on pin CLK1 on location ILOGIC_X0Y22
Net:ddr3_control/ck on pin CLK1 on location OLOGIC_X0Y22
    Conflict detected on wire: PINFEED1(-64752,-28286)

Routing Conflict 2:
Net:ddr3_control/ck_180 on pin CLK1 on location ILOGIC_X0Y26
Net:ddr3_control/ck on pin CLK1 on location OLOGIC_X0Y26
    Conflict detected on wire: PINFEED1(-64752,-8014)

Routing Conflict 3:
Net:ddr3_control/ck_180 on pin CLK1 on location ILOGIC_X0Y28
Net:ddr3_control/ck on pin CLK1 on location OLOGIC_X0Y28
    Conflict detected on wire: PINFEED1(-64752,-4814)

Routing Conflict 4:
Net:ddr3_control/ck_180 on pin CLK1 on location ILOGIC_X0Y23
Net:ddr3_control/ck on pin CLK1 on location OLOGIC_X0Y23
    Conflict detected on wire: PINFEED1(-64742,-28254)

Routing Conflict 5:
Net:ddr3_control/ck_180 on pin CLK1 on location ILOGIC_X0Y27
Net:ddr3_control/ck on pin CLK1 on location OLOGIC_X0Y27
    Conflict detected on wire: PINFEED1(-64742,-7982)

Routing Conflict 6:
Net:ddr3_control/ck_180 on pin CLK1 on location ILOGIC_X0Y29
Net:ddr3_control/ck on pin CLK1 on location OLOGIC_X0Y29
    Conflict detected on wire: PINFEED1(-64742,-4782)


Code: [Select]
diff --git a/ddr3_memory_controller.v b/ddr3_memory_controller.v
index b30e493..a3be291 100644
--- a/ddr3_memory_controller.v
+++ b/ddr3_memory_controller.v
@@ -744,14 +744,14 @@ reg MPR_ENABLE, MPR_Read_had_finished;  // for use within MR3 finite state machi
  // see https://forums.xilinx.com/t5/Other-FPGA-Architecture/Place-1198-Error-Route-cause-and-possible-solution/m-p/408489/highlight/true#M34528
 
  ODDR2 #(
- .DDR_ALIGNMENT("NONE"),  // Sets output alignment to "NONE", "C0" or "C1"
+ .DDR_ALIGNMENT("C0"),  // Sets output alignment to "NONE", "C0" or "C1"
  .INIT(1'b0),  // Sets initial state of the Q output to 1'b0 or 1'b1
- .SRTYPE("SYNC")  // Specifies "SYNC" or "ASYNC" set/reset
+ .SRTYPE("ASYNC")  // Specifies "SYNC" or "ASYNC" set/reset
  )
  ODDR2_ck_out(
  .Q(ck_out),  // 1-bit DDR output data
  .C0(ck),  // 1-bit clock input
- .C1(ck_180),  // 1-bit clock input
+ .C1(ck),  // 1-bit clock input
  .CE(1'b1),  // 1-bit clock enable input
  .D0(1'b1),    // 1-bit DDR data input (associated with C0)
  .D1(1'b0),    // 1-bit DDR data input (associated with C1)
@@ -760,14 +760,14 @@ reg MPR_ENABLE, MPR_Read_had_finished;  // for use within MR3 finite state machi
  );
 
  ODDR2 #(
- .DDR_ALIGNMENT("NONE"),  // Sets output alignment to "NONE", "C0" or "C1"
+ .DDR_ALIGNMENT("C0"),  // Sets output alignment to "NONE", "C0" or "C1"
  .INIT(1'b0),  // Sets initial state of the Q output to 1'b0 or 1'b1
- .SRTYPE("SYNC")  // Specifies "SYNC" or "ASYNC" set/reset
+ .SRTYPE("ASYNC")  // Specifies "SYNC" or "ASYNC" set/reset
  )
  ODDR2_ck_180_out(
  .Q(ck_180_out),  // 1-bit DDR output data
  .C0(ck_180),  // 1-bit clock input
- .C1(ck),  // 1-bit clock input
+ .C1(ck_180),  // 1-bit clock input
  .CE(1'b1),  // 1-bit clock enable input
  .D0(1'b1),    // 1-bit DDR data input (associated with C0)
  .D1(1'b0),    // 1-bit DDR data input (associated with C1)
@@ -1093,7 +1093,7 @@ reg MPR_ENABLE, MPR_Read_had_finished;  // for use within MR3 finite state machi
  wire [DQ_BITWIDTH-1:0] dq_w_oserdes_0;  // associated with dqs_w
  wire [DQ_BITWIDTH-1:0] dq_w_oserdes_1;  // associated with dq_n_w
 
- always @(posedge ck_180)     dq_w_d0 <= dq_w_oserdes_0;  // for C0, D0 of ODDR2 primitive
+ always @(posedge ck)     dq_w_d0 <= dq_w_oserdes_0;  // for C0, D0 of ODDR2 primitive
  always @(posedge ck) dq_w_d1 <= dq_w_oserdes_1;  // for C1, D1 of ODDR2 primitive
 
 
@@ -1241,7 +1241,7 @@ reg MPR_ENABLE, MPR_Read_had_finished;  // for use within MR3 finite state machi
  .data_in(data_in_oserdes_0),
 
  // fast clock domain
- .high_speed_clock(ck_270),
+ .high_speed_clock(ck_90),
  .data_out(dq_w_oserdes_0)
  );
 
@@ -1609,14 +1609,14 @@ wire data_write_is_ongoing = ((wait_count > TIME_WL-TIME_TWPRE) &&
  // Xilinx HDL Libraries Guide, version 14.7
 
  ODDR2 #(
- .DDR_ALIGNMENT("NONE"),  // Sets output alignment to "NONE", "C0" or "C1"
+ .DDR_ALIGNMENT("C0"),  // Sets output alignment to "NONE", "C0" or "C1"
  .INIT(1'b0),  // Sets initial state of the Q output to 1'b0 or 1'b1
- .SRTYPE("SYNC")  // Specifies "SYNC" or "ASYNC" set/reset
+ .SRTYPE("ASYNC")  // Specifies "SYNC" or "ASYNC" set/reset
  )
  ODDR2_dq_iobuf_en(
  .Q(dq_iobuf_enable[dq_index]),  // 1-bit DDR output data
  .C0(ck),  // 1-bit clock input
- .C1(ck_180),  // 1-bit clock input
+ .C1(ck),  // 1-bit clock input
  .CE(1'b1),  // 1-bit clock enable input
  .D0(data_read_is_ongoing),    // 1-bit DDR data input (associated with C0)
  .D1(data_read_is_ongoing),    // 1-bit DDR data input (associated with C1)
@@ -1657,14 +1657,14 @@ wire data_write_is_ongoing = ((wait_count > TIME_WL-TIME_TWPRE) &&
  // Xilinx HDL Libraries Guide, version 14.7
 
  ODDR2 #(
- .DDR_ALIGNMENT("NONE"),  // Sets output alignment to "NONE", "C0" or "C1"
+ .DDR_ALIGNMENT("C0"),  // Sets output alignment to "NONE", "C0" or "C1"
  .INIT(1'b0),  // Sets initial state of the Q output to 1'b0 or 1'b1
- .SRTYPE("SYNC")  // Specifies "SYNC" or "ASYNC" set/reset
+ .SRTYPE("ASYNC")  // Specifies "SYNC" or "ASYNC" set/reset
  )
  ODDR2_dq_w(
  .Q(dq_w[dq_index]),  // 1-bit DDR output data
  .C0(ck),  // 1-bit clock input
- .C1(ck_180),  // 1-bit clock input
+ .C1(ck),  // 1-bit clock input
  .CE(1'b1),  // 1-bit clock enable input
  .D0(dq_w_d1[dq_index]),    // 1-bit DDR data input (associated with C0)
  .D1(dq_w_d0[dq_index]),    // 1-bit DDR data input (associated with C1)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 28, 2021, 04:41:50 pm
@NorthGuy I followed your suggestion, but it resulted in the following routing error.

OLOGIC and ILOGIC are parts of the IO object and they must be clocked the same (as the docs say). I think this is because, internally, all their clock pins are fed by the same wire. So, you should feed all the C0,CLK0  with a normal clock. All the C1,CLK1 pins should be fed with an inverted (shifted by 180 degrees) clock.

I don't understand how changing the DDR_ALIGNMENT from "NONE" to "C0" could have changed this though.
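
For reference, a sketch of what "clocked the same" could look like for one DQ pin, assuming the Spartan-6 IOBUF, ODDR2 and IDDR2 primitives (unisim library needed for simulation). The module and its port names are hypothetical, and the IODELAY2 stage from the real design is left out for brevity. Every C0 gets ck and every C1 gets ck_180, on the input side as well as the output side.

Code: [Select]
// Hypothetical single-pin IO cell: ILOGIC and OLOGIC share one clock pair.
module dq_io_cell (
    input  wire ck,          // goes to every C0
    input  wire ck_180,      // goes to every C1
    input  wire reset,
    input  wire wr_d0,       // write data, rising edge
    input  wire wr_d1,       // write data, falling edge
    input  wire tristate,    // 1 = release the pad (reads / idle)
    output wire rd_q0,       // read data captured on C0
    output wire rd_q1,       // read data captured on C1
    inout  wire dq
);
    wire dq_w, dq_r, dq_t;

    IOBUF IO_dq (.IO(dq), .I(dq_w), .T(dq_t), .O(dq_r));

    ODDR2 #(.DDR_ALIGNMENT("C0"), .INIT(1'b0), .SRTYPE("SYNC"))
    ODDR2_data (.Q(dq_w), .C0(ck), .C1(ck_180), .CE(1'b1),
                .D0(wr_d0), .D1(wr_d1), .R(reset), .S(1'b0));

    // T path uses the same clock pair and the same alignment as the data path
    ODDR2 #(.DDR_ALIGNMENT("C0"), .INIT(1'b1), .SRTYPE("SYNC"))
    ODDR2_t (.Q(dq_t), .C0(ck), .C1(ck_180), .CE(1'b1),
             .D0(tristate), .D1(tristate), .R(1'b0), .S(reset));

    IDDR2 #(.DDR_ALIGNMENT("C0"), .INIT_Q0(1'b0), .INIT_Q1(1'b0), .SRTYPE("SYNC"))
    IDDR2_data (.Q0(rd_q0), .Q1(rd_q1), .C0(ck), .C1(ck_180), .CE(1'b1),
                .D(dq_r), .R(reset), .S(1'b0));
endmodule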
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 28, 2021, 04:45:00 pm
Quote
you should feed all the C0,CLK0  with a normal clock. All the C1,CLK1 pins should be fed with an inverted (shifted by 180 degrees) clock.

ODDR2 primitive only have C0 and C1 pins.  Where did you find CLK0 and CLK1 pins ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 28, 2021, 05:35:37 pm
Quote
you should feed all the C0,CLK0  with a normal clock. All the C1,CLK1 pins should be fed with an inverted (shifted by 180 degrees) clock.

ODDR2 primitive only have C0 and C1 pins.  Where did you find CLK0 and CLK1 pins ?

Here:

Code: [Select]
Routing Conflict 1:
Net:ddr3_control/ck_180 on pin CLK1 on location ILOGIC_X0Y22
Net:ddr3_control/ck on pin CLK1 on location OLOGIC_X0Y22
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 28, 2021, 06:24:37 pm
Quote
OLOGIC and ILOGIC are parts of the IO object and they must be clocked the same (as the docs say)

Could you point to the exact page of the Xilinx doc that mentions the above ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 28, 2021, 11:05:01 pm
Quote
OLOGIC and ILOGIC are parts of the IO object and they must be clocked the same (as the docs say)

Could you point to the exact page of the Xilinx doc that mentions the above ?

Somewhere in the middle of the thread you have posted a big table which lists all the compatible clock modes, from which it was clear that ILOGIC and OLOGIC are clocked with the same clock(s). Since then, along the thread, this issue has been discussed several times.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 29, 2021, 01:29:12 am
ok, found it. (https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=51)

So, I presume the routing error actually arises because DDR_ALIGNMENT needs to be modified for both IDDR and ODDR, not just for ODDR, which is what I did.

However, when I change it for both IDDR and ODDR, I get a mapping error instead. The error messages seem to contradict one another.

Code: [Select]
ERROR:Pack:2403 - Due to a hardware restriction when an input dual-data register
   has the same clock signal on its clock pins the clock pins needed to have an
   opposite inversion. Please correct inversions on register
   ddr3_control/dq_io[2].IDDR2_dq_r.
ERROR:Pack:2403 - Due to a hardware restriction when an input dual-data register
   has the same clock signal on its clock pins the clock pins needed to have an
   opposite inversion. Please correct inversions on register
   ddr3_control/dq_io[3].IDDR2_dq_r.
ERROR:Pack:2403 - Due to a hardware restriction when an input dual-data register
   has the same clock signal on its clock pins the clock pins needed to have an
   opposite inversion. Please correct inversions on register
   ddr3_control/dq_io[4].IDDR2_dq_r.
ERROR:Pack:2403 - Due to a hardware restriction when an input dual-data register
   has the same clock signal on its clock pins the clock pins needed to have an
   opposite inversion. Please correct inversions on register
   ddr3_control/dq_io[5].IDDR2_dq_r.
ERROR:Pack:2403 - Due to a hardware restriction when an input dual-data register
   has the same clock signal on its clock pins the clock pins needed to have an
   opposite inversion. Please correct inversions on register
   ddr3_control/dq_io[6].IDDR2_dq_r.
ERROR:Pack:2403 - Due to a hardware restriction when an input dual-data register
   has the same clock signal on its clock pins the clock pins needed to have an
   opposite inversion. Please correct inversions on register
   ddr3_control/dq_io[7].IDDR2_dq_r.
ERROR:Pack:2529 - The dual data rate register "ddr3_control/dq_io[2].IDDR2_dq_r"
   failed to join an ILOGIC component as required.
ERROR:Pack:2529 - The dual data rate register "ddr3_control/dq_io[7].IDDR2_dq_r"
   failed to join an ILOGIC component as required.
ERROR:Pack:2529 - The dual data rate register "ddr3_control/dq_io[5].IDDR2_dq_r"
   failed to join an ILOGIC component as required.
ERROR:Pack:2529 - The dual data rate register "ddr3_control/dq_io[3].IDDR2_dq_r"
   failed to join an ILOGIC component as required.
ERROR:Pack:2529 - The dual data rate register "ddr3_control/dq_io[6].IDDR2_dq_r"
   failed to join an ILOGIC component as required.
ERROR:Pack:2529 - The dual data rate register "ddr3_control/dq_io[4].IDDR2_dq_r"
   failed to join an ILOGIC component as required.

(https://i.imgur.com/a7Wk1zL.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 29, 2021, 02:23:00 am
Code: [Select]
ERROR:Pack:2403 - Due to a hardware restriction when an input dual-data register
   has the same clock signal on its clock pins the clock pins needed to have an
   opposite inversion. Please correct inversions on register
   ddr3_control/dq_io[2].IDDR2_dq_r.

I guess it says that C1 must be an inversion of C0.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 29, 2021, 02:27:08 am
Thanks for the advice and I got past the mapping error, but I am back to the routing error again.

(https://i.imgur.com/h79uPTU.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 29, 2021, 03:02:40 am
I did some tweaking here and there, and I got past the routing error.

Code: [Select]
diff --git a/ddr3_memory_controller.v b/ddr3_memory_controller.v
index b30e493..8be1957 100644
--- a/ddr3_memory_controller.v
+++ b/ddr3_memory_controller.v
@@ -744,14 +744,14 @@ reg MPR_ENABLE, MPR_Read_had_finished;  // for use within MR3 finite state machi
  // see https://forums.xilinx.com/t5/Other-FPGA-Architecture/Place-1198-Error-Route-cause-and-possible-solution/m-p/408489/highlight/true#M34528
 
  ODDR2 #(
- .DDR_ALIGNMENT("NONE"),  // Sets output alignment to "NONE", "C0" or "C1"
+ .DDR_ALIGNMENT("C0"),  // Sets output alignment to "NONE", "C0" or "C1"
  .INIT(1'b0),  // Sets initial state of the Q output to 1'b0 or 1'b1
- .SRTYPE("SYNC")  // Specifies "SYNC" or "ASYNC" set/reset
+ .SRTYPE("ASYNC")  // Specifies "SYNC" or "ASYNC" set/reset
  )
  ODDR2_ck_out(
  .Q(ck_out),  // 1-bit DDR output data
  .C0(ck),  // 1-bit clock input
- .C1(ck_180),  // 1-bit clock input
+ .C1(ck),  // 1-bit clock input
  .CE(1'b1),  // 1-bit clock enable input
  .D0(1'b1),    // 1-bit DDR data input (associated with C0)
  .D1(1'b0),    // 1-bit DDR data input (associated with C1)
@@ -760,14 +760,14 @@ reg MPR_ENABLE, MPR_Read_had_finished;  // for use within MR3 finite state machi
  );
 
  ODDR2 #(
- .DDR_ALIGNMENT("NONE"),  // Sets output alignment to "NONE", "C0" or "C1"
+ .DDR_ALIGNMENT("C0"),  // Sets output alignment to "NONE", "C0" or "C1"
  .INIT(1'b0),  // Sets initial state of the Q output to 1'b0 or 1'b1
- .SRTYPE("SYNC")  // Specifies "SYNC" or "ASYNC" set/reset
+ .SRTYPE("ASYNC")  // Specifies "SYNC" or "ASYNC" set/reset
  )
  ODDR2_ck_180_out(
  .Q(ck_180_out),  // 1-bit DDR output data
  .C0(ck_180),  // 1-bit clock input
- .C1(ck),  // 1-bit clock input
+ .C1(ck_180),  // 1-bit clock input
  .CE(1'b1),  // 1-bit clock enable input
  .D0(1'b1),    // 1-bit DDR data input (associated with C0)
  .D1(1'b0),    // 1-bit DDR data input (associated with C1)
@@ -1049,7 +1049,7 @@ reg MPR_ENABLE, MPR_Read_had_finished;  // for use within MR3 finite state machi
  // will implement dynamic (real-time) phase calibration as project progresses
  wire idelay_cal_dqs_r = &iodelay_startup_counter;  // Wait for IODELAY to be available
 
-
+/*
  IODELAY2 #(
  .DATA_RATE      ("DDR"), // <SDR>, DDR
  .IDELAY_VALUE  (0), // {0 ... 255}
@@ -1078,7 +1078,7 @@ reg MPR_ENABLE, MPR_Read_had_finished;  // for use within MR3 finite state machi
  .RST      (idelay_is_busy_previously & (~idelay_is_busy)), // Reset delay line
  .BUSY      (idelay_is_busy) // output signal indicating sync circuit has finished / calibration has finished
  );
-
+*/
 
  // RAM -> IOBUF (for inout) -> IDELAY (DQS Centering) -> IDDR2 (input DDR buffer) -> ISERDES
  // OSERDES -> ODDR2 (output DDR buffer) -> ODELAY (DQS Centering) -> IOBUF (for inout) -> RAM
@@ -1093,7 +1093,7 @@ reg MPR_ENABLE, MPR_Read_had_finished;  // for use within MR3 finite state machi
  wire [DQ_BITWIDTH-1:0] dq_w_oserdes_0;  // associated with dqs_w
  wire [DQ_BITWIDTH-1:0] dq_w_oserdes_1;  // associated with dq_n_w
 
- always @(posedge ck_180)     dq_w_d0 <= dq_w_oserdes_0;  // for C0, D0 of ODDR2 primitive
+ always @(posedge ck)     dq_w_d0 <= dq_w_oserdes_0;  // for C0, D0 of ODDR2 primitive
  always @(posedge ck) dq_w_d1 <= dq_w_oserdes_1;  // for C1, D1 of ODDR2 primitive
 
 
@@ -1172,7 +1172,7 @@ reg MPR_ENABLE, MPR_Read_had_finished;  // for use within MR3 finite state machi
  dq_iserdes_1
  (
  // fast clock domain
- .high_speed_clock(ck_270),
+ .high_speed_clock(ck_90),
  .data_in(dq_r_q1),
 
  // slow clock domain
@@ -1241,7 +1241,7 @@ reg MPR_ENABLE, MPR_Read_had_finished;  // for use within MR3 finite state machi
  .data_in(data_in_oserdes_0),
 
  // fast clock domain
- .high_speed_clock(ck_270),
+ .high_speed_clock(ck_90),
  .data_out(dq_w_oserdes_0)
  );
 
@@ -1609,9 +1609,9 @@ wire data_write_is_ongoing = ((wait_count > TIME_WL-TIME_TWPRE) &&
  // Xilinx HDL Libraries Guide, version 14.7
 
  ODDR2 #(
- .DDR_ALIGNMENT("NONE"),  // Sets output alignment to "NONE", "C0" or "C1"
+ .DDR_ALIGNMENT("C0"),  // Sets output alignment to "NONE", "C0" or "C1"
  .INIT(1'b0),  // Sets initial state of the Q output to 1'b0 or 1'b1
- .SRTYPE("SYNC")  // Specifies "SYNC" or "ASYNC" set/reset
+ .SRTYPE("ASYNC")  // Specifies "SYNC" or "ASYNC" set/reset
  )
  ODDR2_dq_iobuf_en(
  .Q(dq_iobuf_enable[dq_index]),  // 1-bit DDR output data
@@ -1634,10 +1634,10 @@ wire data_write_is_ongoing = ((wait_count > TIME_WL-TIME_TWPRE) &&
  // Xilinx HDL Libraries Guide, version 14.7
 
  IDDR2 #(
- .DDR_ALIGNMENT("NONE"),  // Sets output alignment to "NONE", "C0" or "C1"
+ .DDR_ALIGNMENT("C0"),  // Sets output alignment to "NONE", "C0" or "C1"
  .INIT_Q0(1'b0),  // Sets initial state of the Q0 output to 1'b0 or 1'b1
  .INIT_Q1(1'b0),  // Sets initial state of the Q1 output to 1'b0 or 1'b1
- .SRTYPE("SYNC")  // Specifies "SYNC" or "ASYNC" set/reset
+ .SRTYPE("ASYNC")  // Specifies "SYNC" or "ASYNC" set/reset
  )
  IDDR2_dq_r(
  .Q0(dq_r_q0[dq_index]),  // 1-bit output captured with C0 clock
@@ -1657,9 +1657,9 @@ wire data_write_is_ongoing = ((wait_count > TIME_WL-TIME_TWPRE) &&
  // Xilinx HDL Libraries Guide, version 14.7
 
  ODDR2 #(
- .DDR_ALIGNMENT("NONE"),  // Sets output alignment to "NONE", "C0" or "C1"
+ .DDR_ALIGNMENT("C0"),  // Sets output alignment to "NONE", "C0" or "C1"
  .INIT(1'b0),  // Sets initial state of the Q output to 1'b0 or 1'b1
- .SRTYPE("SYNC")  // Specifies "SYNC" or "ASYNC" set/reset
+ .SRTYPE("ASYNC")  // Specifies "SYNC" or "ASYNC" set/reset
  )
  ODDR2_dq_w(
  .Q(dq_w[dq_index]),  // 1-bit DDR output data
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 29, 2021, 06:49:18 am
Now, I am solving setup timing violation for dq_r_q0 (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1113-L1169)

I do not see anything wrong with deserializer module (https://github.com/promach/DDR/blob/main/deserializer.v) that could contribute to setup timing issue.

Any advice ?

(https://i.imgur.com/UsOlkEm.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 29, 2021, 01:08:56 pm
Quote
Now, I am solving setup timing violation for dq_r_q0 (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1113-L1169)

I do not see anything wrong with deserializer module (https://github.com/promach/DDR/blob/main/deserializer.v) that could contribute to setup timing issue.

Any advice ?

When you pass a signal from a clock to a different clock which only differs in phase, the phase difference is very important.

It is relatively easy to do if the phase difference is 270 degrees.

It is hard (if at all possible) at high frequencies if the phase difference is 180 degrees.

It is impossible at 350 MHz when the phase difference is 90 degrees. You can aim for the next cycle (that is, a -270 degree difference), but most likely it will be impossible to meet the hold requirements.

The solution is to use IDELAY to shift DQ and align it with a regular clock.

You need to shift DQ anyway because the phase relationship between DQ and CK depends on the round trip delay:

CK->DDR3 chip->DQ->IDDR

and therefore is unpredictable. So, your controller should be ready for any phase difference.
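
To put rough numbers on the phase cases above, here is a simulation-only Verilog sketch (not synthesizable; the module and function names are made up for illustration). It assumes the 350 MHz clock mentioned above and ignores clock-to-out, routing, setup time and clock skew, all of which eat into the window in a real design.

```verilog
`timescale 1ns/1ps
// Illustrative only: module and function names are hypothetical.
module phase_window_sketch;
    localparam real F_MHZ     = 350.0;          // clock frequency from the post above
    localparam real PERIOD_NS = 1000.0 / F_MHZ; // about 2.857 ns

    // Ideal time between the launching edge and the capturing edge when the
    // capture clock lags the launch clock by phase_deg degrees.
    function real window_ns(input real phase_deg);
        window_ns = PERIOD_NS * phase_deg / 360.0;
    endfunction

    initial begin
        $display("period      : %.3f ns", PERIOD_NS);
        $display("90 degrees  : %.3f ns", window_ns(90.0));
        $display("180 degrees : %.3f ns", window_ns(180.0));
        $display("270 degrees : %.3f ns", window_ns(270.0));
        $finish;
    end
endmodule
```

At 350 MHz this works out to roughly 0.71 ns for 90 degrees, 1.43 ns for 180 degrees and 2.14 ns for 270 degrees, before any real delays are subtracted.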
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 29, 2021, 04:13:18 pm
Quote
The solution is to use IDELAY to shift DQ and align it with a regular clock.

I am using @BrianHG method of PLL dynamic phase shift approach (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L692-L712).

With the following modification, I had mapping error.
I checked this Xilinx forum post (https://forums.xilinx.com/t5/PCIe-and-CPM/ERROR-Place-1108-for-a-clock-connected-to-Global-clock-pin/m-p/681132/highlight/true#M7305), but I do not see how site A10 is related to BUFGMUX_X2Y12.

Code: [Select]
diff --git a/ddr3_memory_controller.v b/ddr3_memory_controller.v
index b663904..a9f0e22 100644
--- a/ddr3_memory_controller.v
+++ b/ddr3_memory_controller.v
@@ -684,6 +684,7 @@ reg MPR_ENABLE, MPR_Read_had_finished;  // for use within MR3 finite state machi
 
  localparam PLL_STATUS_BITWIDTH = 3;
 
+ wire ck_dynamic, ck_dynamic_180;
  wire locked_dynamic;
  wire [PLL_STATUS_BITWIDTH-1:0] pll_read_status;
  wire input_clk_stopped;
@@ -695,7 +696,8 @@ reg MPR_ENABLE, MPR_Read_had_finished;  // for use within MR3 finite state machi
  .clk(clk),  // IN 50MHz
 
  // Clock out ports
- .ck_dynamic(ck_dynamic),  // OUT 400MHz, 0 phase shift
+ .ck_dynamic(ck_dynamic),  // OUT 350MHz, 0 phase shift
+ .ck_dynamic_180(ck_dynamic_180),  // OUT 350MHz, 180 phase shift
 
  // Dynamic phase shift ports
  .psclk(udqs_r),  // IN
@@ -1161,7 +1163,7 @@ reg MPR_ENABLE, MPR_Read_had_finished;  // for use within MR3 finite state machi
  dq_iserdes_0
  (
  // fast clock domain
- .high_speed_clock(ck_90),
+ .high_speed_clock(ck_dynamic),
  .data_in(dq_r_q0),
 
  // slow clock domain
@@ -1172,7 +1174,7 @@ reg MPR_ENABLE, MPR_Read_had_finished;  // for use within MR3 finite state machi
  dq_iserdes_1
  (
  // fast clock domain
- .high_speed_clock(ck_90),
+ .high_speed_clock(ck_dynamic),
  .data_in(dq_r_q1),
 
  // slow clock domain
@@ -1647,8 +1649,8 @@ wire data_write_is_ongoing = ((wait_count > TIME_WL-TIME_TWPRE) &&
  IDDR2_dq_r(
  .Q0(dq_r_q0[dq_index]),  // 1-bit output captured with C0 clock
  .Q1(dq_r_q1[dq_index]),  // 1-bit output captured with C1 clock
- .C0(ck),  // 1-bit clock input
- .C1(ck_180),  // 1-bit clock input
+ .C0(ck_dynamic),  // 1-bit clock input
+ .C1(ck_dynamic_180),  // 1-bit clock input
  .CE(1'b1),  // 1-bit clock enable input
  .D(dq_r[dq_index]),    // 1-bit DDR data input
  .R(reset),    // 1-bit reset input

Code: [Select]
ERROR:Place:1108 - A clock IOB / BUFGMUX clock component pair have been found
   that are not placed at an optimal clock IOB / BUFGMUX site pair. The clock
   IOB component <clk> is placed at site <A10>. The corresponding BUFG component
   <ddr3_control/pll_read/clkin1_buf> is placed at site <BUFGMUX_X2Y12>. There
   is only a select set of IOBs that can use the fast path to the Clocker
   buffer, and they are not being used. You may want to analyze why this problem
   exists and correct it. If this sub optimal condition is acceptable for this
   design, you may use the CLOCK_DEDICATED_ROUTE constraint in the .ucf file to
   demote this message to a WARNING and allow your design to continue. However,
   the use of this override is highly discouraged as it may lead to very poor
   timing results. It is recommended that this error condition be corrected in
   the design. A list of all the COMP.PINs used in this clock placement rule is
   listed below. These examples can be used directly in the .ucf file to
   override this clock rule.
   < NET "clk" CLOCK_DEDICATED_ROUTE = FALSE; >

(https://i.imgur.com/AWhfJhB.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 29, 2021, 06:45:56 pm
Quote
I am using @BrianHG method of PLL dynamic phase shift approach (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L692-L712).

Input and output must be clocked by the same clock. On writes, DQS is aligned to CK, so DQS must be clocked by CK. On reads, DQS may have any phase relationship with CK, so it cannot be the same clock as CK. The only way you can use BrianHG's method is if you dynamically re-align the clocks in-between reads and writes (or if you switch the clock dynamically). In this case, you will have a read clock domain and a write clock domain which should be treated as asynchronous - meaning you cannot use STA for transitions between these domains. Instead, you should use some sort of CDC solution.
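
As one possible shape of "some sort of CDC solution" for a single-bit control flag, here is a minimal two-flop synchronizer sketch. The module and signal names are hypothetical. This is only suitable for slow single-bit signals; multi-bit read data would normally need something like an asynchronous FIFO or a handshake instead.

```verilog
// Hypothetical example: brings a single-bit flag from another clock domain
// into the clk_dst domain through two back-to-back registers.
module sync_2ff_sketch (
    input  wire clk_dst,   // destination clock domain
    input  wire d_async,   // flag generated in the other clock domain
    output wire q_sync     // flag, safe to use in the clk_dst domain
);
    reg [1:0] sync_ff = 2'b00;

    // The first stage may go metastable; the second stage gives it a full
    // clk_dst period to settle before the rest of the logic sees it.
    always @(posedge clk_dst)
        sync_ff <= {sync_ff[0], d_async};

    assign q_sync = sync_ff[1];
endmodule
```

The paths into the first register would also need to be excluded from normal timing analysis, since STA cannot say anything meaningful about them.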
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on August 29, 2021, 09:39:26 pm
According to this illustration here: https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3612509/#msg3612509

I think the issue may be with the 'Clock Doubler' module.

Shows you can have a different inclock and outclock, but you probably need to use the 'DDR Two BUFIO2s'.  There may be a restriction on which 2 sets of clocks you are allowed to use at the same time from the PLL.  Those 2 just become your priority when setting up your pll.


Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 30, 2021, 02:37:41 am
Quote
'DDR Two BUFIO2s'.

One BUFIO2 is shifted 180 degrees from the other. Hence you need two BUFIO2s (or two BUFGs, or one BUFG which it'll make into a pair internally). If you want DDR on both input and output, this is "... Only possible when the two BUFGs are common for both input and output", or "...Only possible when the two BUFIO2s are common for both input and output". Thus, the same pair feeds both input and output.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 30, 2021, 03:03:14 am
What do you exactly mean by Only possible when the two BUFIO2s are common for both input and output ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 30, 2021, 04:10:26 am
Quote
What do you exactly mean by Only possible when the two BUFIO2s are common for both input and output ?

This is a citation from the Xilinx table.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 30, 2021, 04:20:49 am
Quote
Shows you can have a different inclock and outclock, but you probably need to use the 'DDR Two BUFIO2s'.

@BrianHG Does this mean for DQ (or DQS) double-data-rate IO signal, I could not use PLL dynamic phase shift approach ?
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 30, 2021, 01:33:42 pm
Since BUFIO2 itself already had USE_DOUBLER=True attribute  (https://www.xilinx.com/support/documentation/user_guides/ug382.pdf#page=48), then why still need "Clock Doubler" module (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3612509/#msg3612509) ?

By the way, I do not quite understand how my current issue is related to the need of using two BUFIO2

(https://i.imgur.com/uS6L3hy.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 30, 2021, 04:48:12 pm
Quote
Since BUFIO2 itself already had USE_DOUBLER=True attribute  (https://www.xilinx.com/support/documentation/user_guides/ug382.pdf#page=48),

If you use DDR, the ISERDES/OSERDES clock will have double frequency compared to SDR. That's all there is to it. Read the docs:

Quote
USE_DOUBLER - Used for ISERDES2/OSERDES2 with DATA_RATE = DDR. When set to TRUE, doubles the DIVCLK and SERDESSTROBE frequencies. FDIVCLK = (2 * FIN) / DIVIDE

DIVIDE - Sets the DIVCLK and SERDESSTROBE divider divide-by values.
  FDIVCLK = FIN / DIVIDE  <USE_DOUBLER = FALSE>
  FDIVCLK = (2 * FIN) / DIVIDE  <USE_DOUBLER = TRUE>

If you use IDDR/ODDR in place of ISERDES/OSERDES, DATA_RATE is always DDR (implied), so you always set USE_DOUBLER in BUFIO, so that the clock going to the fabric has the same frequency as the clock going to IDDR/ODDR

If you use BUFG, the BUFG itself can clock the fabric, so you don't need a special fabric clock as you do in case of BUFIO.
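
The quoted formulas can be checked with a small simulation-only sketch. The function name and the example input frequency and divider value are assumptions for illustration, not recommended settings.

```verilog
`timescale 1ns/1ps
// Illustrative only: evaluates the DIVCLK formulas quoted from the docs.
module bufio2_divclk_sketch;
    // FDIVCLK = FIN / DIVIDE        when USE_DOUBLER = FALSE
    // FDIVCLK = (2 * FIN) / DIVIDE  when USE_DOUBLER = TRUE
    function real fdivclk_mhz(input real fin_mhz,
                              input integer divide,
                              input integer use_doubler);
        fdivclk_mhz = ((use_doubler != 0) ? 2.0 * fin_mhz : fin_mhz) / divide;
    endfunction

    initial begin
        // Hypothetical example values: FIN = 350 MHz, DIVIDE = 2
        $display("USE_DOUBLER = FALSE : %.1f MHz", fdivclk_mhz(350.0, 2, 0));
        $display("USE_DOUBLER = TRUE  : %.1f MHz", fdivclk_mhz(350.0, 2, 1));
        $finish;
    end
endmodule
```

With DIVIDE = 2 in this example, the doubler brings DIVCLK back up to the input frequency (350 MHz) instead of half of it (175 MHz).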
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 30, 2021, 04:55:34 pm
Quote
If you use BUFG, the BUFG itself can clock the fabric, so you don't need a special fabric clock as you do in case of BUFIO.

I am using BUFG (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3653809/#msg3653809) , so I suppose I do not need to worry about BUFIO in this case ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 30, 2021, 05:53:58 pm
Quote
If you use BUFG, the BUFG itself can clock the fabric, so you don't need a special fabric clock as you do in case of BUFIO.

I am using BUFG (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3653809/#msg3653809) , so I suppose I do not need to worry about BUFIO in this case ?

If you don't use BUFIO then you certainly don't need to worry about it. However, BUFIO can run much faster than BUFG.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 31, 2021, 01:44:44 am
Quote
BUFIO can run much faster than BUFG.

What do you exactly mean by faster ?

By the way, how to approach and solve the mapping error for BUFG (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3653809/#msg3653809) ?

Title: Re: DDR3 initialization sequence issue
Post by: promach on August 31, 2021, 02:54:32 am
Quote
Shows you can have a different inclock and outclock, but you probably need to use the 'DDR Two BUFIO2s'.  There may be a restriction on which 2 sets of clocks you are allowed to use at the same time from the PLL.  Those 2 just become your priority when setting up your pll.

I have solved the mapping error for BUFG by removing BUFG from the input clk for the pll_tuneable IP.

However, I have the following routing error which is related to the need of using 'DDR Two BUFIO2s'
@BrianHG could you advise ?

Note: The routing error arises from the C1 pins of the IDDR2_dq_r and ODDR2_dq_w

Code: [Select]
ERROR:Route:472 -
   This design is unrouteable.
   To evaluate the problem please use fpga_editor.
Routing Conflict 1:
Net:ddr3_control/ck_dynamic_180 on pin CLK1 on location ILOGIC_X0Y22
Net:ddr3_control/ck on pin CLK1 on location OLOGIC_X0Y22
    Conflict detected on wire: PINFEED1(-64752,-28286)

Routing Conflict 2:
Net:ddr3_control/ck_dynamic_180 on pin CLK1 on location ILOGIC_X0Y26
Net:ddr3_control/ck on pin CLK1 on location OLOGIC_X0Y26
    Conflict detected on wire: PINFEED1(-64752,-8014)

Routing Conflict 3:
Net:ddr3_control/ck_dynamic_180 on pin CLK1 on location ILOGIC_X0Y28
Net:ddr3_control/ck on pin CLK1 on location OLOGIC_X0Y28
    Conflict detected on wire: PINFEED1(-64752,-4814)

Routing Conflict 4:
Net:ddr3_control/ck_dynamic_180 on pin CLK1 on location ILOGIC_X0Y23
Net:ddr3_control/ck on pin CLK1 on location OLOGIC_X0Y23
    Conflict detected on wire: PINFEED1(-64742,-28254)

Routing Conflict 5:
Net:ddr3_control/ck_dynamic_180 on pin CLK1 on location ILOGIC_X0Y27
Net:ddr3_control/ck on pin CLK1 on location OLOGIC_X0Y27
    Conflict detected on wire: PINFEED1(-64742,-7982)

Routing Conflict 6:
Net:ddr3_control/ck_dynamic_180 on pin CLK1 on location ILOGIC_X0Y29
Net:ddr3_control/ck on pin CLK1 on location OLOGIC_X0Y29
    Conflict detected on wire: PINFEED1(-64742,-4782)

Code: [Select]
// IODDR2 primitives are needed because the 'dq' signals are of double-data-rate
// https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=123

// IDDR2: Input Double Data Rate Input Register with Set, Reset and Clock Enable.
// Spartan-6
// Xilinx HDL Libraries Guide, version 14.7

IDDR2 #(
.DDR_ALIGNMENT("C0"),  // Sets output alignment to "NONE", "C0" or "C1"
.INIT_Q0(1'b0),  // Sets initial state of the Q0 output to 1'b0 or 1'b1
.INIT_Q1(1'b0),  // Sets initial state of the Q1 output to 1'b0 or 1'b1
.SRTYPE("ASYNC")  // Specifies "SYNC" or "ASYNC" set/reset
)
IDDR2_dq_r(
.Q0(dq_r_q0[dq_index]),  // 1-bit output captured with C0 clock
.Q1(dq_r_q1[dq_index]),  // 1-bit output captured with C1 clock
.C0(ck_dynamic),  // 1-bit clock input
.C1(ck_dynamic_180),  // 1-bit clock input
.CE(1'b1),  // 1-bit clock enable input
.D(dq_r[dq_index]),    // 1-bit DDR data input
.R(reset),    // 1-bit reset input
.S(1'b0)     // 1-bit set input
);
// End of IDDR2_inst instantiation


// ODDR2: Output Double Data Rate Output Register with Set, Reset and Clock Enable.
// Spartan-6
// Xilinx HDL Libraries Guide, version 14.7

ODDR2 #(
.DDR_ALIGNMENT("C0"),  // Sets output alignment to "NONE", "C0" or "C1"
.INIT(1'b0),  // Sets initial state of the Q output to 1'b0 or 1'b1
.SRTYPE("ASYNC")  // Specifies "SYNC" or "ASYNC" set/reset
)
ODDR2_dq_w(
.Q(dq_w[dq_index]),  // 1-bit DDR output data
.C0(ck),  // 1-bit clock input
.C1(ck),  // 1-bit clock input
.CE(1'b1),  // 1-bit clock enable input
.D0(dq_w_d1[dq_index]),    // 1-bit DDR data input (associated with C0)
.D1(dq_w_d0[dq_index]),    // 1-bit DDR data input (associated with C1)
.R(reset),    // 1-bit reset input
.S(1'b0)     // 1-bit set input
);
// End of ODDR2_inst instantiation
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on August 31, 2021, 03:13:40 am
Sorry, but I am no expert on Xilinx.

However, with Altera, when you want to use the DDR IO buffer's local inversion/clock doubler, the second C1 clock input is usually either left open/unconnected/none, with a specific parameter set on the buffer so it knows to self-generate the 180 degree clock internally, or you connect the same clock net as the C0 input to C1 with a ' ! ' in front of the net name, and the compiler will know to use the buffer's internal clock inversion circuitry.

Separately from DDR3, having a completely new, separate clock path for C1 is useful for odd custom phase offsets for special purposes.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 31, 2021, 04:28:17 am
Quote
Separately from DDR3, having a completely new, separate clock path for C1 is useful for odd custom phase offsets for special purposes.

Why new separate clock path ONLY for C1 when DDR_ALIGNMENT is set to C0 ?  Which special purposes were you referring to in the above quoted sentence ?


It seems that both IDDR2_dq_r and ODDR2_dq_w need exactly similar clock inputs for C0 and C1 pins (https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=51).  Please correct me if my understanding is wrong.

If this is the case, then how exactly would the incoming DQ bits be sampled properly at the middle of a bit (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L692-L712) ?

(https://i.imgur.com/a7Wk1zL.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on August 31, 2021, 05:05:35 am
Are you using 2 buffers?  An input and an output?
When Xilinx says C0, C1, are they talking about the clock source for the even and odd latches of the DDR pin buffers?

Maybe asking on Xilinx's forum would be best.

I'm used to dealing with a DDR input, or DDR output manually tied together, or an IO DDR which has the 2 together, except instead of a C0, C1, IE four clocks in the first case, I would have a clock in and a clock out.  Or, a clock in0, clock in1, clock out0, clock out1.
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 31, 2021, 06:58:44 am
Quote
Are you using 2 buffers?  An input and an output?
When Xilinx says C0, C1, are they talking about the clock source for the even and odd latches of the DDR pin buffers?

IODDR2 primitive (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=123) is to handle double-data-rate signal, while IOBUF primitive (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/spartan6_hdl.pdf#page=126) is to handle FPGA IO pin related signal
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 31, 2021, 02:46:50 pm
Code: [Select]
Routing Conflict 1:
Net:ddr3_control/ck_dynamic_180 on pin CLK1 on location ILOGIC_X0Y22
Net:ddr3_control/ck on pin CLK1 on location OLOGIC_X0Y22

You're doing this again. The C0 pin of IDDR must be connected to the same clock as C0 pin of the corresponding ODDR. The C1 pin of IDDR must be connected to the same clock as C1 pin of the corresponding ODDR. Same clocks must be used with IDELAY/ODELAY if you use delays. I think that internally they are fed by the same wire, so you cannot separate them.

Also, C1 must be 180 degree shift of C0. C1 can be a wire from the source, or can be "not C0" if you use BUFG.

This is not designed for DDR3 memory. This is designed for interfaces where the single clock is produced by FPGA (through BUFG) or by the other side, such as FT600 (through BUFIO). There's no prescribed way to use this for DDR3, you need to re-purpose these things on your own.

Or switch to 7-series which is much better for DDR3.
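
A minimal sketch of the clocking rule described above, for one DQ bit: the IDDR2 and the ODDR2 on the same pin share one clock on C0 and its inversion on C1. The wrapper module and its port names are hypothetical; the IDDR2/ODDR2 port and parameter names are the ones already used in the code earlier in this thread, and the block needs the Xilinx Spartan-6 primitive library to build. It says nothing about how to get the read phase right.

```verilog
// Hypothetical wrapper for a single DQ bit.
// io_clk is assumed to come from a BUFG, so "~io_clk" can be used for C1.
module dq_bit_shared_clock_sketch (
    input  wire io_clk,
    input  wire reset,
    input  wire dq_from_pad,  // from the input side of the IOBUF
    input  wire wr_d0,        // write data associated with C0
    input  wire wr_d1,        // write data associated with C1
    output wire dq_to_pad,    // to the output side of the IOBUF
    output wire rd_q0,        // read data captured with C0
    output wire rd_q1         // read data captured with C1
);
    IDDR2 #(
        .DDR_ALIGNMENT("C0"),
        .INIT_Q0(1'b0),
        .INIT_Q1(1'b0),
        .SRTYPE("ASYNC")
    ) iddr2_dq (
        .Q0(rd_q0),
        .Q1(rd_q1),
        .C0(io_clk),      // same clock as the ODDR2 C0 below
        .C1(~io_clk),     // inversion of C0, same as the ODDR2 C1 below
        .CE(1'b1),
        .D(dq_from_pad),
        .R(reset),
        .S(1'b0)
    );

    ODDR2 #(
        .DDR_ALIGNMENT("C0"),
        .INIT(1'b0),
        .SRTYPE("ASYNC")
    ) oddr2_dq (
        .Q(dq_to_pad),
        .C0(io_clk),
        .C1(~io_clk),
        .CE(1'b1),
        .D0(wr_d0),
        .D1(wr_d1),
        .R(reset),
        .S(1'b0)
    );
endmodule
```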
Title: Re: DDR3 initialization sequence issue
Post by: promach on August 31, 2021, 03:13:12 pm
Quote
The C0 pin of IDDR must be connected to the same clock as C0 pin of the corresponding ODDR. The C1 pin of IDDR must be connected to the same clock as C1 pin of the corresponding ODDR.

How to re-purpose IDDR and ODDR for PLL dynamic phase shift since incoming read DQS needs to be tuned to 90 degree apart from read DQ bits ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on August 31, 2021, 05:43:57 pm
Quote
The C0 pin of IDDR must be connected to the same clock as C0 pin of the corresponding ODDR. The C1 pin of IDDR must be connected to the same clock as C1 pin of the corresponding ODDR.

How to re-purpose IDDR and ODDR for PLL dynamic phase shift since incoming read DQS needs to be tuned to 90 degree apart from read DQ bits ?

You can either re-adjust PLL phase dynamically, so it's either tuned for inputs or for outputs (but not both at the same time). This is probably too long a process for quick read-to-write and write-to-read transitions.

Or you can use a clock switcher. There must be one in Spartan-6, but I'm not sure. But your reads and writes will still be in different clock domains.

Or you can insert IDELAY into DQS and DQ paths and adjust them so that they're properly aligned to CK.

Or you can switch to 7-series.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 01, 2021, 12:32:29 am
Quote
You can either re-adjust PLL phase dynamically, so it's either tuned for inputs or for outputs (but not both at the same time). This is probably too long a process for quick read-to-write and write-to-read transitions.

I do not understand what you exactly meant by "re-adjust" ?


Quote
Or you can use a clock switcher. There must be one in Spartan-6, but I'm not sure. But your reads and writes will still be in different clock domains.

But clock switcher still need to be connected to IDDR and ODDR, which means the issue will still be there.


Quote
Or you can insert IDELAY into DQS and DQ paths and adjust them so that they're properly aligned to CK.

I had tried using IDELAY, but it caused a lot more routing error than PLL dynamic phase shift approach
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 01, 2021, 02:17:14 am
Quote
You can either re-adjust PLL phase dynamically, so it's either tuned for inputs or for outputs (but not both at the same time). This is probably too long a process for quick read-to-write and write-to-read transitions.

I do not understand what you exactly meant by "re-adjust" ?

I mean changing phase.

Quote
Or you can use a clock switcher. There must be one in Spartan-6, but I'm not sure. But your reads and writes will still be in different clock domains.

But clock switcher still need to be connected to IDDR and ODDR, which means the issue will still be there.

The output of clock switcher would feed BUFG which then feeds IDDR and ODDR. This is not a very good solution though.
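
For what it's worth, a clock switcher along these lines could look roughly like the sketch below. It assumes the Spartan-6 BUFGMUX primitive (ports I0, I1, S, O), which is a global buffer with a built-in 2:1 mux, so no separate BUFG is drawn. The wrapper and signal names are hypothetical, and whether the mux inputs can actually be routed from the PLL outputs in this design is untested.

```verilog
// Hypothetical wrapper: selects between a write-aligned clock and a
// read-aligned clock and drives the result onto one global clock net.
module rw_clock_switch_sketch (
    input  wire ck_write,     // clock aligned for writes (hypothetical name)
    input  wire ck_read,      // clock aligned for reads (hypothetical name)
    input  wire select_read,  // 0: use ck_write, 1: use ck_read
    output wire ck_io         // single clock for both IDDR2 and ODDR2
);
    BUFGMUX #(
        .CLK_SEL_TYPE("SYNC")  // glitch-free switching between the inputs
    ) bufgmux_rw (
        .O (ck_io),
        .I0(ck_write),
        .I1(ck_read),
        .S (select_read)
    );
endmodule
```

ck_io would then drive C0 of both the IDDR2 and the ODDR2. Anything crossing between logic clocked by ck_io and the rest of the controller still has to be treated as a clock-domain crossing.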

Quote
Or you can insert IDELAY into DQS and DQ paths and adjust them so that they're properly aligned to CK.

I had tried using IDELAY, but it caused a lot more routing error than PLL dynamic phase shift approach

You probably connected something wrong.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 01, 2021, 07:20:02 am
Quote
I mean changing phase.

How would this solution exactly help to eliminate the IODDR same clock domain restriction ?


Quote
The output of clock switcher would feed BUFG which then feeds IDDR and ODDR. This is not a very good solution though.

This is a simple, straightforward solution (need to verify first), but I am not sure why you said that it is not a very good solution ?


Quote
You probably connected something wrong.

Not really. I had talked to litedram author and he is using PLL dynamic phase shift for spartan-6.
Not only that, IODELAY primitive on spartan-6 has some internal hardware design issue that would cause data bit  loss.  So, not recommended to use IODELAY in this case.
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 01, 2021, 01:03:43 pm
Quote
I mean changing phase.

How would this solution exactly help to eliminate the IODDR same clock domain restriction ?

read - change phase for write - write - change phase for read - read etc.

This will make things very slow.
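
A rough sketch of that sequence as a state machine, to show where the time goes. Everything here is hypothetical: the module, the request signals, and in particular the phase_shift_start / phase_shift_done handshake, which stands in for whatever the real PLL phase-shift interface looks like.

```verilog
// Hypothetical sequencer: reads and writes are only allowed once the clock
// phase has been moved to the matching alignment.
module rw_phase_sequencer_sketch (
    input  wire clk,
    input  wire reset,
    input  wire want_read,         // controller wants to start reading
    input  wire want_write,        // controller wants to start writing
    input  wire phase_shift_done,  // assumed: PLL finished moving the phase
    output reg  phase_shift_start, // assumed: one-cycle request to move it
    output reg  phase_for_read,    // 1: align for reads, 0: align for writes
    output wire can_read,
    output wire can_write
);
    localparam S_WRITE_READY = 2'd0,
               S_TO_READ     = 2'd1,
               S_READ_READY  = 2'd2,
               S_TO_WRITE    = 2'd3;

    reg [1:0] state;

    assign can_read  = (state == S_READ_READY);
    assign can_write = (state == S_WRITE_READY);

    always @(posedge clk) begin
        phase_shift_start <= 1'b0;  // default: no request this cycle
        if (reset) begin
            state          <= S_WRITE_READY;
            phase_for_read <= 1'b0;
        end else begin
            case (state)
                S_WRITE_READY: if (want_read) begin
                    phase_for_read    <= 1'b1;
                    phase_shift_start <= 1'b1;
                    state             <= S_TO_READ;
                end
                // No access is possible while the phase is being moved.
                S_TO_READ:  if (phase_shift_done) state <= S_READ_READY;
                S_READ_READY: if (want_write) begin
                    phase_for_read    <= 1'b0;
                    phase_shift_start <= 1'b1;
                    state             <= S_TO_WRITE;
                end
                S_TO_WRITE: if (phase_shift_done) state <= S_WRITE_READY;
                default: state <= S_WRITE_READY;
            endcase
        end
    end
endmodule
```

Every read-to-write or write-to-read turnaround sits in S_TO_READ or S_TO_WRITE until the PLL reports it is done.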

Quote
The output of clock switcher would feed BUFG which then feeds IDDR and ODDR. This is not a very good solution though.

This is a simple, straightforward solution (need to verify first), but I am not sure why you said that it is not a very good solution ?

I think it may be difficult to deal with passing data between the switching clock domain and others. Maybe not. You need to evaluate this.

Quote
Not really. I had talked to litedram author and he is using PLL dynamic phase shift for spartan-6.

For DDR3? You can ask him how he dealt with the IDDR/ODDR clocking restrictions. Maybe there are some tricks we don't know about. Then evaluate if you can do the same.

Quote
Not only that, IODELAY primitive on spartan-6 has some internal hardware design issue that would cause data bit  loss.  So, not recommended to use IODELAY in this case.

You mean this: https://www.xilinx.com/support/answers/41083.html

The problem is only at higher data rates. Look up your Spartan in their table. May not apply to you.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 01, 2021, 01:43:38 pm
Quote
read - change phase for write - write - change phase for read - read etc.

This will make things very slow.

Wait, how is changing phase able to get around same clock (similar frequency and phase shift) restriction of IODDR ?


Quote
I think it may be difficult to deal with passing data between the switching clock domain and others. Maybe not. You need to evaluate this.

I am confused about whether to use the BUFGMUX (https://www.xilinx.com/support/documentation/user_guides/ug382.pdf#page=38) primitive or the BUFGMUX_1 (https://www.xilinx.com/support/documentation/user_guides/ug382.pdf#page=40) primitive.


Quote
For DDR3? You can ask him how he dealt with the IDDR/ODDR clocking restrictions. Maybe there are some tricks we don't know about. Then evaluate if you can do the same.

Yes, for DDR3 on spartan-6 (https://github.com/litex-hub/litex-boards/blob/master/litex_boards/targets/saanlima_pipistrello.py#L69-L104).


Quote
You mean this: https://www.xilinx.com/support/answers/41083.html

The problem is only at higher data rates. Look up your Spartan in their table. May not apply to you.

Does that value of the maximum data rate Mb/s apply to x16 (16 bits) ?

By the way, litedram author mentioned that fine delay was done using fixed phase shift of the clocks with the PLL and coarse delays with the bitslip.


Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 01, 2021, 01:53:59 pm
Quote
read - change phase for write - write - change phase for read - read etc.

This will make things very slow.

Wait, how is changing phase able to get around same clock (similar frequency and phase shift) restriction of IODDR?

Same as clock switching, except instead of switching you change phase of a single clock.

Quote
Does that value of the maximum data rate Mb/s apply to x16 (16 bits) ?

These are single pin rates - clock multiplied by x2.

Quote
By the way, litedram author mentioned that fine delay was done using fixed phase shift of the clocks with the PLL and coarse delays with the bitslip.

This means that he somehow bypassed the single clock restriction for IDDR/ODDR, ask him how.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 01, 2021, 04:59:21 pm
I am still waiting for litedram's author reply.

I have linked to the wrong litedram file in previous post.

See https://github.com/enjoy-digital/litedram/blob/master/litedram/phy/s6ddrphy.py instead for DDR3 PHY for litedram on spartan-6
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 01, 2021, 06:04:19 pm
Quote
I am still waiting for litedram's author reply.

I have linked to the wrong litedram file in previous post.

See https://github.com/enjoy-digital/litedram/blob/master/litedram/phy/s6ddrphy.py instead for DDR3 PHY for litedram on spartan-6

It's python :) I guess they use it to generate HDL

Looks like they use different clocks - sdram_full_rd_clk and sdram_full_wr_clk
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 01, 2021, 06:09:28 pm
Also their SERDES blocks are in "SDR" mode - go figure.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 02, 2021, 12:42:42 pm
I am also using serdes in SDR mode (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1237-L1257) for DDR3 controller. There is nothing wrong with this.

Please correct me if wrong.
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 02, 2021, 04:41:04 pm
I am also using serdes in SDR mode (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1237-L1257) for DDR3 controller. There is nothing wrong with this.

Please correct me if wrong.

I thought you use IDDR/ODDR in place of ISERDES/OSERDES.

As far as I can understand their Python, they use OSERDESE2 in "SDR" mode, which produces an SDR signal on the output pin. You need DDR for DDR3. Otherwise you will write only every second bit: the odd bits, which have to be set on the falling edge, will always be the same as the corresponding even bits. If you do this, you will use only half of the memory.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 03, 2021, 02:31:01 am
For DDR3 controller, I am using BOTH Xilinx IODDR (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1634-L1679) , and home-made IOSERDES (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1237-L1257).

Note: It is your suggestion to use TWO home-made serializers and deserializers on SDR mode to work for DDR mode.



Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 03, 2021, 02:58:21 am
For DDR3 controller, I am using BOTH Xilinx IODDR (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1634-L1679) , and home-made IOSERDES (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1237-L1257).

I misunderstood what you said. To me, all-capital letters refer to built-in blocks. That is, you use home-made serdes (serializers), but they use Xilinx's ISERDES/OSERDES.

The difference is that yours are in fabric and you can connect them as you wish, so you connect them to ODDR/IDDR to deal with DDR signals.

But theirs are built-in and can only be connected directly to IOBUF and hence must deal with DDR signals. However, they seem to use "SDR" mode, which will miss all the bits on falling edges. I have no idea how this can possibly work. You can ask them this.

<edit>Actually, SDR may, sort of, work, but it will access only half of the memory (miss every second bit), and will achieve only half of the full bandwidth. Is that what they're doing?
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 03, 2021, 03:06:44 am
Is it possible at all to re-code my own homemade IODDR in order to get around the same clock restriction, hence the routing issue ?
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 03, 2021, 03:44:55 am
I have solved the routing conflict for IODDR2 (and solved the setup timing issue for dq_r_q0 signal) (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3652996/#msg3652996) by using my own homemade IDDR (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1663-L1676),  I will replace all of the existing Xilinx IODDR2 primitive with my own homemade version later.

Now I am solving the last few setup timing violations.

Any idea about the following highlighted path on ODDR2_ldqs_iobuf_en (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1522-L1541) ?

Code: [Select]
// see https://www.xilinx.com/support/documentation/user_guides/ug381.pdf#page=61
// 'data_read_is_ongoing' signal is not of double-data-rate signals,
// but it is connected to T port of IOBUF where its I port is fed in with double-data-rate DQS signals,
// thus the purpose of having the following ODDR2 primitives

ODDR2 #(
.DDR_ALIGNMENT("C0"),  // Sets output alignment to "NONE", "C0" or "C1"
.INIT(1'b0),  // Sets initial state of the Q output to 1'b0 or 1'b1
.SRTYPE("ASYNC")  // Specifies "SYNC" or "ASYNC" set/reset
)
ODDR2_ldqs_iobuf_en(
.Q(ldqs_iobuf_enable),  // 1-bit DDR output data
.C0(ck_90),  // 1-bit clock input
.C1(ck_90),  // 1-bit clock input
.CE(1'b1),  // 1-bit clock enable input
.D0(data_read_is_ongoing),    // 1-bit DDR data input (associated with C0)
.D1(data_read_is_ongoing),    // 1-bit DDR data input (associated with C1)
.R(1'b0),    // 1-bit reset input
.S(1'b0)     // 1-bit set input
);

(https://i.imgur.com/gHqfx1I.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 03, 2021, 01:04:55 pm
I have solved the routing conflict for IODDR2 (and solved the setup timing issue for dq_r_q0 signal) (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3652996/#msg3652996) by using my own homemade IDDR (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1663-L1676),  I will replace all of the existing Xilinx IODDR2 primitive with my own homemade version later.

Yes, custom iddr may work. The only problem is that the receiving flops won't be in IOB, so there will be some extra jitter.

Now I am solving the last few setup timing violation

You cannot pass the signal from ck to ck_90 - that's too fast. You need to solve this problem somehow. For example, you can pass from ck to ck_270 rather easily (e.g. synchronizing the ODDR with C1). Or, you can use an async FIFO to pass all your data from ck to ck_90. Or, you can ask @BrianHG how he did it.
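To put rough numbers on why ck to ck_90 is "too fast", here is a small worked example. It assumes the 350 MHz ck mentioned elsewhere in this thread, and it ignores clock skew, clock-to-out and setup time, all of which only shrink the available window further.

```latex
\begin{align*}
T &= \frac{1}{350\ \text{MHz}} \approx 2.857\ \text{ns} \\
t_{ck \rightarrow ck\_90} &= \frac{T}{4} \approx 0.714\ \text{ns} \\
t_{ck \rightarrow ck\_270} &= \frac{3T}{4} \approx 2.143\ \text{ns}
\end{align*}
```

So a register launched on ck and captured on ck_90 has roughly a quarter of the budget of an ordinary same-clock path, while capturing on ck_270 gives three times as much.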
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 03, 2021, 03:29:48 pm
Quote
You cannot pass the signal from ck to ck_90 - that's too fast. You need to solve this problem somehow. For example, you can pass from ck to ck_270 rather easily (e.g. synchronizing the ODDR with C1). Or, you can use an async FIFO to pass all your data from ck to ck_90. Or, you can ask @BrianHG how he did it.

Would Multi-cycle path (MCP) formulation with feedback acknowledge (http://www.verilogpro.com/clock-domain-crossing-design-part-3/) help ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 03, 2021, 08:07:50 pm
I have solved the routing conflict for IODDR2 (and solved the setup timing issue for dq_r_q0 signal) (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3652996/#msg3652996) by using my own homemade IDDR (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1663-L1676),  I will replace all of the existing Xilinx IODDR2 primitive with my own homemade version later.

Yes, custom iddr may work. The only problem is that the receiving flops won't be in IOB, so there will be some extra jitter.

Now I am solving the last few setup timing violation

You cannot pass the signal from ck to ck_90 - that's too fast. You need to solve this problem somehow. For example, you can pass from ck to ck_270 rather easily (e.g. synchronizing the ODDR with C1). Or, you can use an async FIFO to pass all your data from ck to ck_90. Or, you can ask @BrianHG how he did it.
I used a huge synchronization chain forced into logic cells, away from auto-generated ram blocks, allowing the compiler to re-time the logic cells along that chain to gracefully convert the data from the ck_0 domain into the ck_90 domain. Once the data is in the ck_90 domain, it feeds the DDR primitive's data input to drive the pin's output.

If I do not do so, I end up with either huge negative slack feeding the DDR buffer, or a horrible ck_0 FMAX limit due to hold restrictions, depending on the path the compiler chose to optimize my design.

My sync chain also holds in parallel the DM and the OE for the DQ.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 04, 2021, 01:43:04 am
Quote
I used a huge synchronization chain forced into logic cells away from auto-generated ram blocks allowing the compiler to re-time the logic cells along that chain to gracefully convert the data from from the ck_0 domain into the ck_90 domain, then once the data is in the ck_90 domain, it then feeds the DDR primitive's data input to drive the pin's output.

What do you exactly mean by huge synchronization chain ?


Quote
My sync chain also holds in parallel the DM and the OE for the DQ.

How do you make sure that the vendor tool will actually implement (and route) the sync chain to be in parallel with control signals for the DQ ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 04, 2021, 03:43:50 am
I mean that on the core ck_0, I have the:

{wd_h, wd_l, wd_oe, wm_h, wm_l} ready every clock.

on every ck_90, I have:

Code: [Select]
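// Every stage is clocked by ck_90; the source word is produced in the ck_0 domain.
always @(posedge ck_90) begin
    before_DDR_BUFFERS [4] <= {wd_h, wd_l, wd_oe, wm_h, wm_l};
    before_DDR_BUFFERS [3] <= before_DDR_BUFFERS [4] ;
    before_DDR_BUFFERS [2] <= before_DDR_BUFFERS [3] ;
    before_DDR_BUFFERS [1] <= before_DDR_BUFFERS [2] ;
    before_DDR_BUFFERS [0] <= before_DDR_BUFFERS [1] ;

    DDR_BUFFERS_OUT <= before_DDR_BUFFERS [0] ;  // final stage feeding the DDR IO buffers
end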
This adds a serial chain of 6 clocks inside the ck_90 domain, allowing the fitter to skew the clock timing of each step in that chain so that error-free data can shift its phase from ck_0 to ck_90 before it reaches the IO buffer's input DFF.  With a chain this size, this also means the source data {wd_h, wd_l, wd_oe, wm_h, wm_l} needs to be ready 6 clocks in advance.

Obviously, DDR_BUFFERS_OUT would be the DQ IODDR receiving the {wd_h, wd_l} as data input and the {wd_oe} drives the OE for those DQ buffers while the DM DDROUT receives the {wm_h, wm_l}.

See my source code 'BrianHG_DDR3_IO_PORT_ALTERA.sv' V1.00 beginning at line 585 until the end of the code.  You will also see I have a parameter option WDQ_CLK_270 to use a 270 degree write clock in place of the 90 degree which needs to swap the _h & _l, plus advance the _l by 1 write clock.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 04, 2021, 03:50:16 am
Quartus is smart and tries to move my above logic chain into a ram block.  However, ram blocks cannot shift time at every step along the way, only at the input or the output.  Also, ram blocks have a hard FMAX of 250MHz on the cheap small Cyclones/MAX 10s I made this ram controller to work on.  So I have used a keyword in declaring the logic cells ' before_DDR_BUFFERS [ x ] ' to force it to be placed in normal logic cells instead of inferring it into ram cells.  This might not be an issue with Xilinx and the pipe size might not need to be so huge.
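As a rough illustration of the idea above, here is a parameterized version of the chain with synthesis attributes that ask the tools to keep it in plain flip-flops. The module name, port names and parameters are hypothetical, and the attributes are an assumption about how this could be done (the post does not say which keyword was actually used): `shreg_extract` is the Xilinx attribute and `AUTO_SHIFT_REGISTER_RECOGNITION` is the Quartus setting for disabling shift-register inference.

```verilog
// Sketch only: hypothetical names, not taken from BrianHG_DDR3_IO_PORT_ALTERA.sv.
module wdq_sync_chain #(
    parameter WIDTH = 5,  // assumed: set to the width of {wd_h, wd_l, wd_oe, wm_h, wm_l}
    parameter DEPTH = 6   // number of register stages, raise until timing closes
) (
    input  wire             ck_90,   // phase-shifted write clock
    input  wire [WIDTH-1:0] d_ck0,   // word launched from the ck_0 domain
    output wire [WIDTH-1:0] q_ck90   // word handed to the DDR output primitive
);
    // Assumed attributes: keep the chain out of SRL / RAM-based shift registers
    // so each stage stays an individual flip-flop the fitter can place freely.
    (* shreg_extract = "no" *)
    (* altera_attribute = "-name AUTO_SHIFT_REGISTER_RECOGNITION OFF" *)
    reg [WIDTH-1:0] pipe [0:DEPTH-1];

    integer i;
    always @(posedge ck_90) begin
        pipe[0] <= d_ck0;                 // first capture of the ck_0 data
        for (i = 1; i < DEPTH; i = i + 1)
            pipe[i] <= pipe[i-1];         // remaining stages give the fitter slack
    end

    assign q_ck90 = pipe[DEPTH-1];
endmodule
```

Every extra stage delays the write data by one clock, so the command and address path has to be delayed by the same amount, as discussed further down the thread.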

Title: Re: DDR3 initialization sequence issue
Post by: promach on September 04, 2021, 08:08:42 am
Reading https://www.verilogpro.com/clock-domain-crossing-part-1/ (https://www.verilogpro.com/clock-domain-crossing-part-1/) makes me ask this question:

Is a conventional synchronizer circuit really enough?

(https://i.imgur.com/uDsAQ9w.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 04, 2021, 08:17:11 am
Huh?  The 2 clock domains ck_0 and ck_90/270 are the same frequency, just a different phase.  Not 1.5x, not 0.75x.  The data moves through unaltered.

Besides, I have no trouble crossing between 2:1 and 1:2 clock domains, as long as those clocks are generated by the same PLL and have the same phase.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 04, 2021, 08:53:19 am
What is the technical rationale for the need of SIX FF synchronizers ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 04, 2021, 03:03:53 pm
For 90 degrees: I only needed 3 for a 300MHz DDR3 ram controller on a -6 speed grade Cyclone, and 4 for a slower -8 speed grade Cyclone.  5 was required to get a -6 speed grade to run at 400MHz, but sometimes, when I loaded a lot of other logic into the FPGA, some nets wouldn't make the cut.  I couldn't go any further since I exceeded tWL, so I also had to add a pipe to my entire command, bank and address outputs to keep everything in sync, giving me the additional extra 1 I wanted to achieve 400MHz timing with everything in the black.

For 270 degrees: Subtract 1 from everything above.  Going up to 6 allowed 500MHz to run fine for the first time.  However, at 450MHz, tWL is also a clock longer, so, I did not need to add any other coding changes as my code allows dynamically configuring this pipe size with the parameter 'WDQ_SYNC_CHAIN'.  In fact, my IO port section allows dynamically configuring the register depth of read and write and command pipes all individually with parameters.

Different FPGA, different manufacturers & compilers will require more or less steps to shift the clock with different frequencies.  You just need to test and play.  There is an advantage here to using FPGAs which have delays on the DDR IO buffers, but they do have their drawbacks of compatibility across different types.


IE: I just increased the number until FMAX was steady and fast every compile.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 04, 2021, 04:07:13 pm
I am curious how adding more levels of FF synchronizers to a DQS-related signal would actually lead to a tWL (https://www.systemverilog.io/understanding-ddr4-timing-parameters#write) violation?

(https://i.imgur.com/z0tDRuo.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 04, 2021, 06:08:26 pm
Not DQS, it's the DQ path.  DQS is only on the ck_0 clock exclusively.

Note that everything else in my DDR controller path has been adapted to have the same delay.  (well, with the recognition of the CWL delay of course)  Otherwise the ram wouldn't work.  The data would be in the wrong place compared to where the command was sent and where the ram was expecting it.

DQS only has an output enable, no data.  It is generating a fixed clock.

I generate the DQS' OE at lines 477-482 in my code and it runs on ck_0.

And why are you showing me tDQSS(min)?  That is the worst possible alignment.  You want the DQSnom (nominal); otherwise, some ram chips or modules along the path will be outside the DDR_CK -> DQS alignment.  The 'min' gives you 0 room for error, routing, FPGA IO jitter and pin-to-pin tolerances.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 05, 2021, 02:13:24 am
Quote
sometimes when I loaded up other code a lot of logic in the FPGA, some nets wouldn't make the cut.  I couldn't go any further since I exceeded tWL, so, I also had to add a pipe to my entire command, bank and address outputs to keep everything in sync giving me that additional extra 1 I wanted to achieve a 400MHz timing with everything in the black.

I guess I need to simulate my code with the Micron simulation model inside Modelsim for all the setup timing coding patches I have applied.
This will involve some emulation of Xilinx primitives using home-made versions, just to prove that all the other DDR3 timings are not affected by those coding patches.

Note: Micron simulation model does not work within Xilinx ISIM simulator.

ck_270 (3T/4) has more data capture cycle margin (https://www.eetimes.com/understanding-clock-domain-crossing-issues/) compared to ck_90 (T/4), but what I do not understand is how adding more levels of FF synchronizers (in your case, SIX) is able to increase the data capture cycle margin?

(https://i.imgur.com/GMXhXo7.png)

Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 05, 2021, 02:56:32 am

ck_270 (3T/4) has more data capture cycle margin (https://www.eetimes.com/understanding-clock-domain-crossing-issues/) compared to ck_90 (T/4) , but what I do not understand is how adding more level of FF synchronizers (in your case, SIX) is able to increase the data capture cycle margin ?

The fitter can decisively place the logic latches at any point on the fabric to optimize the maximum required hold timing.  It can also delay the latching clock time or delay the output at each logic DFF step to meet those required setup and hold timings.  1 DFF is enough if the logic is fast enough to meet the 3T/4 timing.  In the case of Altera's maximum 400-500MHz core DFF-to-adjacent-DFF rate, with no logic in between, a few steps are needed to ease the transfer and get it synced to the DDRIOBUF input clock.  If your FPGA can internally run DFF-to-DFF at 800MHz, then to do that 3T/4 at 300MHz you will only need 1 DFF, or none at all, depending on fabric routing.  For a 300MHz controller, I bet you need at least 1 even with Xilinx, whereas the minimum I could originally get to work with Altera Cyclones was 2.

But if I fill my core fabric with a ton of other functional logic, I would be forced either to buy the full Quartus and manually place my ram controller on the fabric, or to add that large sync chain, use the free version of Quartus, and allow it to build the fabric any way it likes; the timing to the IO buffers will then still be met no matter the fabric routing.  The latter means anyone can build my design and still achieve FMAX without fancy manual core placement.

Also, what would happen if you have 4x16bit DDR3 ram chips with a 484pin FPGA?  2 on the left and 2 on the right?  How will the IO timing to the core hold up when it now needs to siphon data from IOs on opposite sides of the FPGA?  I know from experience that my design has the necessary length of pipe to coalesce the data coming from and going to opposite sides of the FPGA and still achieve the desired FMAX.

Remember, just getting a DDR3 controller to work with 1 ram chip all wired to 1 corner of the FPGA, no multiport manager, just wired to a JTAG test port or my RS232 debugger would require almost no such extravagant effort at all.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 08, 2021, 12:53:39 pm
In the following simulation waveform, using 50MHz crystal clock (clk signal) (https://github.com/promach/DDR/blob/main/test_ddr3_memory_controller.v#L341-L344) is not suitable since 350MHz PLL clock (ck signal) will feed from the OSERDES data at 7 times the speed of the 50MHz clock.

Note: I have two separate home-made OSERDES (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1185-L1264) as suggested by @NorthGuy

So, this means the data output to the DDR3 RAM is not following the proper ordering.  Each set of 8-pieces of data ({5, 6, 7, 8, 9, 10, 11, 12}) should only be sent to DDR3 RAM ONCE
Note: 8 pieces of data is due to SERIALIZATION_FACTOR = 8

I tried the following code modification but I am stuck with CDC from fast clock (350MHz ck_dynamic signal (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L693-L714)) to slow clock (87.5MHz clk_serdes signal)
Note: 350MHz divided by (SERDES_RATIO >> 1) equals 87.5MHz

Conventional FF synchronizer only works for CDC from slow clock to fast clock (https://electronics.stackexchange.com/questions/585787/why-1-5x-ratio-limitation-for-synchronizing-slow-signals-into-fast-clock-domain)

(https://i.imgur.com/wHOCIRg.png)

Code: [Select]
[phung@archlinux DDR]$ git status --short
 M ddr3_memory_controller.v
 M test_ddr3_memory_controller.v
[phung@archlinux DDR]$ git diff ddr3_memory_controller.v
diff --git a/ddr3_memory_controller.v b/ddr3_memory_controller.v
index 55f2823..78ec1e2 100644
--- a/ddr3_memory_controller.v
+++ b/ddr3_memory_controller.v
@@ -181,6 +181,8 @@ module ddr3_memory_controller
                output reg data_read_is_ongoing,
        `endif
       
+       output clk_serdes,  // 87.5MHz
+       
        output reg ck_en, // CKE
        output reg cs_n, // chip select signal
        output reg odt, // on-die termination
@@ -632,6 +634,7 @@ reg MPR_ENABLE, MPR_Read_had_finished;  // for use within MR3 finite state machi
 `else
 
        wire clk_pll;
+       wire clk_serdes;
        wire ck, ck_out;
        wire ck_90;
        wire ck_180, ck_180_out;
@@ -647,6 +650,11 @@ reg MPR_ENABLE, MPR_Read_had_finished;  // for use within MR3 finite state machi
                       
                        // Clock out ports
                        .clk_pll(clk_pll),  // OUT 50MHz, 0 phase shift, for solving CDC issues
+                       
+                       // SERDES_RATIO = 8, but 2 separate serdes are used due to double-data-rate restriction
+                       // So, 350MHz divided by (SERDES_RATIO >> 1) equals 87.5MHz
+                       .clk_serdes(clk_serdes),  // OUT 87.5MHz, 0 phase shift, for SERDES use
+                       
                        .ck(ck),  // OUT 350MHz, 0 phase shift
                        .ck_90(ck_90),  // OUT 350MHz, 90 phase shift, for dq phase shifting purpose
                        .ck_180(ck_180),  // OUT 350MHz, 180 phase shift
[phung@archlinux DDR]$
[phung@archlinux DDR]$ git diff test_ddr3_memory_controller.v
diff --git a/test_ddr3_memory_controller.v b/test_ddr3_memory_controller.v
index 4c8acbf..e87225f 100644
--- a/test_ddr3_memory_controller.v
+++ b/test_ddr3_memory_controller.v
@@ -305,6 +305,8 @@ wire ldqs_iobuf_enable;
 wire data_read_is_ongoing;
 `endif
 
+wire clk_serdes;  // 87.5MHz
+
 reg [BANK_ADDRESS_BITWIDTH+ADDRESS_BITWIDTH-1:0] i_user_data_address;  // the DDR memory address for which the user wants to write/read the data
 
 `ifdef HIGH_SPEED
@@ -338,11 +340,7 @@ reg done_writing, done_reading;
                        data_write_index = data_write_index + 1)
                begin: data_write_loop
        `endif
-               `ifdef TESTBENCH                       
-                       always @(posedge clk_sim)
-               `else
-                       always @(posedge clk)
-               `endif
+                       always @(posedge clk_serdes)
                        begin
                                if(reset)
                                begin
@@ -387,7 +385,7 @@ reg done_writing, done_reading;
                                                `endif
                                        `endif
                                       
-                                       test_data <= test_data + 1;
+                                       test_data <= test_data + SERDES_RATIO;
                                        write_enable <= (test_data < (STARTING_VALUE_OF_TEST_DATA+NUM_OF_TEST_DATA-1));  // writes up to 'NUM_OF_TEST_DATA' pieces of data
                                        read_enable <= (test_data >= (STARTING_VALUE_OF_TEST_DATA+NUM_OF_TEST_DATA-1));  // starts the readback operation
                                        done_writing <= (test_data >= (STARTING_VALUE_OF_TEST_DATA+NUM_OF_TEST_DATA-1));  // stops writing since readback operation starts
@@ -575,6 +573,8 @@ ddr3_control
                .data_read_is_ongoing(data_read_is_ongoing),
        `endif
       
+       .clk_serdes(clk_serdes),  // 87.5MHz
+       
        .ck_en(ck_en), // CKE
        .cs_n(cs_n), // chip select signal
        .odt(odt), // on-die termination
[phung@archlinux DDR]$


(https://i.imgur.com/8xaSvqG.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 08, 2021, 01:42:55 pm
In the following simulation waveform, using 50MHz crystal clock (clk signal) (https://github.com/promach/DDR/blob/main/test_ddr3_memory_controller.v#L341-L344) is not suitable since 350MHz PLL clock (ck signal) will feed from the OSERDES data at 7 times the speed of the 50MHz clock.

1:7 is very inconvenient. Create clocks which are easy to work with. Start from the max speed you can get from BUFG (250 to 400 MHz depending on your speed grade). For example -2 grade allows 375 MHz. This is the frequency you can feed to ODDR. Divide it by 4. It'll be 93.75 MHz. Create a 93.75 MHz clock and use it for your main clock throughout FPGA. This way you'll need two 4:1 serializers and ODDR. Or use 187.5 MHz. This way you'll need two 2:1 serializers and ODDR.

If all your clocks come from the same PLL, they're all automatically aligned.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 08, 2021, 05:19:17 pm
But this method still could not avoid the CDC from fast clock (350MHz ck_dynamic signal (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L693-L714)) to slow clock (87.5MHz clk_serdes signal)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 08, 2021, 06:05:13 pm
But this method still could not avoid the CDC from fast clock (350MHz ck_dynamic signal (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L693-L714)) to slow clock (87.5MHz clk_serdes signal)

When clocks are produced by the same PLL, they are aligned (that is, each edge of the slower clock nominally coincides with an edge of the faster clock), so you don't need any special CDC. Your serializers/deserializers will be clocked with the slower clock on the fabric side and with the faster clock on the IO side. That's the only place where these clocks need to meet.
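A minimal sketch of such a fabric serializer, assuming a 4:1 ratio and two clocks from the same PLL with 0 degree phase, so that the slow-to-fast path is an ordinary synchronous path for the timing tools. The module and signal names are hypothetical and this is not taken from anyone's controller in this thread; two instances would feed the D0 and D1 inputs of an ODDR.

```verilog
// Sketch only: 4:1 serializer, slow clock on the fabric side, fast clock on the IO side.
module ser4to1 (
    input  wire       clk_slow,  // clk_fast / 4, same PLL, edges aligned with clk_fast
    input  wire       clk_fast,
    input  wire [3:0] din,       // parallel word, valid on clk_slow
    output wire       dout       // one bit per clk_fast cycle, din[0] first
);
    // Fabric side: register the word and flip a marker on every slow edge.
    reg [3:0] din_r = 4'b0;
    reg       tog   = 1'b0;
    always @(posedge clk_slow) begin
        din_r <= din;
        tog   <= ~tog;
    end

    // IO side: the first fast edge after each slow edge sees the marker change
    // and loads the new word; the next three fast edges shift it out.
    reg       tog_d = 1'b0;
    reg [3:0] sh    = 4'b0;
    always @(posedge clk_fast) begin
        tog_d <= tog;
        if (tog_d != tog)
            sh <= din_r;               // load once per slow-clock period
        else
            sh <= {1'b0, sh[3:1]};     // shift towards bit 0
    end

    assign dout = sh[0];
endmodule
```

No synchronizer flops are used here on purpose: the crossing relies on the PLL alignment described above, so it only holds if both clocks really come from the same PLL and the path is timed by the tools.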
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 08, 2021, 06:10:14 pm
I believe there is some underlying hardware limitation that prohibits PLL (with dynamic phase shift option enabled) from generating frequencies larger than 374.5318MHz as shown in screenshot below.

(https://i.imgur.com/owrgGtw.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 08, 2021, 06:30:48 pm
I believe there is some underlying hardware limitation that prohibits PLL (with dynamic phase shift option enabled) from generating frequencies larger than 374.5318MHz as shown in screenshot below.

Look at the datasheet (ds162). BUFG can only be 375 MHz for -2 grade despite the PLL itself may go faster.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 08, 2021, 06:35:22 pm
I am using xc6slx16-3ftg256

PLL (without dynamic phase shift option enabled) could generate output frequencies above 400MHz.

I do not understand what you meant by Or use 187.5 MHz. This way you'll need two 2:1 serializers and ODDR.
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 08, 2021, 07:06:53 pm
I am using xc6slx16-3ftg256

PLL (without dynamic phase shift option enabled) could generate output frequencies above 400MHz.

Then you can go to 400 MHz. Did you tell ISE that you have -3 grade?

I do not understand what you meant by Or use 187.5 MHz. This way you'll need two 2:1 serializers and ODDR.

If you use 400 MHz fast clock then using 2:1 serializer will require 200 MHz clock on the other side.
If you use 4:1 serializer, it will require 100 MHz slow clock.

Either of these is acceptable for you.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 08, 2021, 07:16:32 pm
wait, it is the turning on of dynamic phase shift option of the PLL that actually caused the maximum frequency limit to drop from 400MHz to 374.5MHz.

And since dynamic phase shift option is needed, so I could not have 8:1 (no 400MHz for me),  this is why I am only having 7:1 (350MHz)

again, I am confused as in why use 4:1 serializer, it will require 100 MHz slow clock ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 08, 2021, 08:27:46 pm
wait, it is the turning on of dynamic phase shift option of the PLL that actually caused the maximum frequency limit to drop from 400MHz to 374.5MHz.

Then you can only do 375 MHz.

And since dynamic phase shift option is needed, so I could not have 8:1 (no 400MHz for me),  this is why I am only having 7:1 (350MHz)

Any frequency can be divided by 8 (or by 4, or by any other number). 375/8 = 46.875 MHz, or 375/4 = 93.75 MHz, or 375/2 = 187.5 MHz. Take your pick and create the corresponding clock. Abandon your oscillator clock for anything except for PLL feed.

again, I am confused as in why use 4:1 serializer, it will require 100 MHz slow clock ?

The X:1 serializer will require 375/X frequency. On the IO side you have 375 Mb/s. On the fabric side you have X wires - the data gets spread over these wires, so each wire carries 375/X Mb/s.

When you connect two of such serializers to an ODDR or IDDR, you get double data rate outside of FPGA - 375*2 = 750 Mb/s. Between IDDR/ODDR and serializers, the signal is carried by two wires (D0 and D1), each carrying 375 Mb/s. On the fabric side you get 2*X wires (each of the two serializers has X wires). Each wire carries 750/(2*X) = 375/X Mb/s = 375/X MHz.

So, if you want two 4:1 serializers (X = 4), your slow clock must be 375/X = 375/4 = 93.75 MHz

If you want two 2:1 serializers (X = 2), your slow clock must be 375/X = 375/2 = 187.5 MHz

If you abandon serializers, and only use ODDR/IDDR (X = 1), your slow clock must be 375/X = 375/1 = 375 MHz, which is the same as ODDR/IDDR clock.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 09, 2021, 02:27:48 am
The PLL maximum output frequency limit is 374.5318MHz , not 375MHz

So, clk_serdes will have frequency of 350MHz / 4 = 87.5MHz for the case of 4:1 serializers

In this case, CDC from fast clock (350MHz ck_dynamic) to slow clock (87.5MHz clk_serdes) still could not be avoided.

If I eliminate the serializers and use only the 350MHz ck, then I get a pack error as follows:

Code: [Select]
ERROR:Pack:1107 - Pack was unable to combine the symbols listed below into a
   single IOB component because the site type selected is not compatible.

   Further explanation:
   The pad symbol ck is connected to a symbol that is outside of I/O comp. There
   is no routing resource between them.

   Symbols involved:
    BUF symbol "ddr3_control/OBUF_ck" (Output Signal = ck)
    PAD symbol "ck" (Pad Signal = ck)

(https://i.imgur.com/8xaSvqG.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 09, 2021, 01:37:06 pm
In this case, CDC from fast clock (350MHz ck_dynamic) to slow clock (87.5MHz clk_serdes) still could not be avoided.

Since you decided to shift clock phase, you probably need a shifted slow clock for your serializers as well. Otherwise, serializers won't work. Then, you need to find a way to pass the data from the shifted slow clock to your main clock.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 09, 2021, 01:38:46 pm
What do you exactly mean by shift clock phase ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 09, 2021, 02:18:10 pm
What do you exactly mean by shift clock phase ?

There are two ways to align data with the clock. Either you shift data (with IODELAY), or you shift the clock (with PLL). You decided to do the second. Now your read clock is shifted relative to the write clock.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 09, 2021, 05:15:59 pm
In this case, CDC from fast clock (350MHz ck_dynamic) to slow clock (87.5MHz clk_serdes) still could not be avoided.

Since you decided to shift clock phase, you probably need a shifted slow clock for your serializers as well. Otherwise, serializers won't work. Then, you need to find a way to pass the data from the shifted slow clock to your main clock.
My serializers operate on the full 400MHz read clock itself, so I don't need to care about this problem as it doesn't exist in my design.  My final 128bit data chunk going to the slower 100Mhz clock has a special function to adapt for any phase shift on the 400MHz read clock, though I do first transfer that 128bit chunk to the 400MHz ck_0 first to aid in metastability with the same phase corrective tech, then pass that ck_0 128bit chunk to my slower 200MHz/100Mhz core clock.  To save on logic cells, if your dual-port, dual clock ram blocks can operate at 400MHz, you may use it here instead.

My write data serializers also operate on the full 400MHz ck_0 clock, and only the smaller 16 bit phase shifted pipe runs on the ck_90 clock.  Once again, if your dual-port, dual-clock ram ports can operate at 400MHz, it could be used in place of the logic based serial shifters and serializers.

Title: Re: DDR3 initialization sequence issue
Post by: promach on September 10, 2021, 04:32:41 am
Quote
My final 128bit data chunk going to the slower 100Mhz clock has a special function to adapt for any phase shift on the 400MHz read clock

@BrianHG In your case, why would a slower 100MHz clock be needed to help with metastability arising from the use of PLL dynamic phase shift on read clock ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 10, 2021, 04:54:37 am
Quote
My final 128bit data chunk going to the slower 100Mhz clock has a special function to adapt for any phase shift on the 400MHz read clock

@BrianHG In your case, why would a slower 100MHz clock be needed to help with metastability arising from the use of PLL dynamic phase shift on read clock ?

My 100/200MHz clock is like your 50MHz clock since my controller operates at 200MHz.

The improved metastability protection comes from first transferring the 128 bit chunk, which is aligned to the randomly phase shifted read clock, to the ck_0 clock, which is still 400MHz.  (Note that my data ready toggle is delayed by 1 clock going to ck_0 to ensure that every bit of the 128bit bus has first reached its destination regardless of FPGA route timing.)  Then, the data now on that guaranteed ck_0 400MHz 128 bit latch can be sent to any other PLL clock frequency, as long as that destination clock also has a fixed 0 degree phase, without the timing violations or errors that the random phase clock at the beginning would otherwise cause.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 10, 2021, 04:06:44 pm
Quote
then, the data now on that guaranteed ck_0 400MHz 128 bit latch can be sent to any other PLL clock frequency as long as that destination clock also has a fixed 0 degree phase without timing violations or errors incurred due to a random phase clock way at the beginning.

In my case, I could not generate "other PLL clock frequency" that has the same fixed 0 degree phase due to some internal hardware limitation of the Xilinx PLL IP core (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3675805/#msg3675805).

Could you advise ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 10, 2021, 04:56:48 pm
Quote
then, the data now on that guaranteed ck_0 400MHz 128 bit latch can be sent to any other PLL clock frequency as long as that destination clock also has a fixed 0 degree phase without timing violations or errors incurred due to a random phase clock way at the beginning.

In my case, I could not generate "other PLL clock frequency" that has the same fixed 0 degree phase due to some internal hardware limitation of the Xilinx PLL IP core (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3675805/#msg3675805).

Could you advise ?

I am not aware of such limitations. The PLL primitive has 6 clock outputs, each of which lets you specify the divider (CLKOUTx_DIVIDE) and the phase (CLKOUTx_PHASE). All you need to do is find an appropriate VCO frequency (between 400 and 1080 MHz for your speed grade) and specify the dividers for your clocks. If the divider for your fast clock is X, then the divider for your slow clock will be (2*X) or (4*X), depending on your needs.

Say, if you want 350 MHz fast clock, create 700 MHz VCO (your 50 MHz crystal multiplied by 14), then divide by 2 to get 350 MHz clock, by 4 to get 175 MHz clock, or by 8 to get 87.5 MHz clock.

Title: Re: DDR3 initialization sequence issue
Post by: promach on September 10, 2021, 05:02:07 pm
Quote
Say, if you want 350 MHz fast clock, create 700 MHz VCO (your 50 MHz crystal multiplied by 14), then divide by 2 to get 350 MHz clock, by 4 to get 175 MHz clock, or by 8 to get 87.5 MHz clock.

why do you even need to generate 700MHz ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 10, 2021, 05:15:41 pm
Quote
Say, if you want 350 MHz fast clock, create 700 MHz VCO (your 50 MHz crystal multiplied by 14), then divide by 2 to get 350 MHz clock, by 4 to get 175 MHz clock, or by 8 to get 87.5 MHz clock.

why do you even need to generate 700MHz ?

VCO must be between 400 and 1080 MHz for your speed grade. If you want to get 350 MHz from the PLL, you need to use either 700 or 1050 MHz VCO.
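
The VCO arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name vco_options is made up, only integer multipliers of the 50 MHz input are tried, and the 400 to 1080 MHz window is simply the one quoted in this post. A real PLL also has an input divider and limits on each counter, so the datasheet remains the authority.

```python
def vco_options(f_out_mhz, f_in_mhz=50.0, vco_min_mhz=400.0, vco_max_mhz=1080.0):
    """List (multiplier, vco_mhz, output_divider) choices for one output clock.

    Hypothetical helper: tries integer multipliers of the input clock and
    keeps the VCO frequencies that are inside the allowed window and are an
    integer multiple of the requested output frequency.
    """
    options = []
    mult = 1
    while f_in_mhz * mult <= vco_max_mhz:
        vco = f_in_mhz * mult
        if vco >= vco_min_mhz:
            div = vco / f_out_mhz
            if div >= 1 and div == int(div):
                options.append((mult, vco, int(div)))
        mult += 1
    return options
```

For a 350 MHz output from a 50 MHz input this yields the two choices named above: multiplier 14 (700 MHz VCO, divide by 2) and multiplier 21 (1050 MHz VCO, divide by 3). With the 700 MHz VCO, dividers 4 and 8 then give the 175 MHz and 87.5 MHz slow clocks.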
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 10, 2021, 05:17:25 pm
Which exact Xilinx document did you see such VCO specification ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 10, 2021, 05:42:51 pm
Which exact Xilinx document did you see such VCO specification ?

The datasheet - ds162.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 11, 2021, 02:23:47 am
@BrianHG

For clocks with different frequencies but zero phase difference (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3678799/#msg3678799), I suppose there will not be any issue with CDC from fast clock domain to slow clock domain (http://www.verilogpro.com/clock-domain-crossing-part-1/) ?


(https://i.imgur.com/qNdx3Bt.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 11, 2021, 03:17:14 am
Use my toggle logic.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 11, 2021, 03:17:57 am
What exactly do you mean by toggle logic ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 11, 2021, 03:25:24 am
On the read clock, when each BL8 128bit has been latched, toggle a DFF.
On the read clock, delay that toggle output DFF by 1 clock.
Remember, the read clock is 400MHz, so during an uninterrupted continuous burst, that toggle DFF will invert every 4 clocks.

On your 400MHz ck_0 clock, latch the 128bit when that 1-delayed-toggle has flipped.
Also, on the ck_0 clock, DFF latch that 1-delayed-toggle as well.

Now, feed all that ck_0 bulk into your slower clock domain and, on the slower clock domain's clock, capture the data when it sees that the ck_0 toggle output has flipped.
Also generate a BL8_read_data_ready when that capture has been done.
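
The steps above can be modelled behaviourally in Python. This is a rough sketch, not the code from BrianHG_DDR3_IO_PORT_ALTERA.sv: all names are invented, time is counted in 400MHz clock edges, the read clock and ck_0 are treated as the same edge (zero skew), and metastability is not modelled at all. It only shows how the toggle, its 1-clock delayed copy, the ck_0 re-latch and the slow-domain capture fit together.

```python
def simulate_toggle_cdc(words, fast_per_word=4, fast_per_slow=4, slow_phase=0):
    """Idealised model of a toggle-based 'data valid' clock-domain crossing.

    words         : wide words latched in the read-clock domain, one per burst
    fast_per_word : fast clocks per word (4 for back-to-back BL8 at 2:1 DDR)
    fast_per_slow : fast clocks per slow clock (4 models 400MHz to 100MHz)
    slow_phase    : which fast edge the slow clock edge coincides with
    """
    tgl = tgl_d = 0        # read-clock toggle and its 1-clock delayed copy
    data_rd = None         # word held in the read-clock domain
    tgl_ck0 = 0            # toggle as re-latched in the ck_0 domain
    data_ck0 = None        # word as re-latched in the ck_0 domain
    seen = 0               # last toggle value observed by the slow domain
    captured = []
    total = len(words) * fast_per_word + 2 * fast_per_slow + 4
    for t in range(total):
        # Slow clock edge: samples the register values from before this edge.
        if t % fast_per_slow == slow_phase % fast_per_slow:
            if tgl_ck0 != seen:
                captured.append(data_ck0)   # this is where data-ready would pulse
                seen = tgl_ck0
        # Fast clock edge: compute all next states from the current states.
        nxt_tgl, nxt_data_rd = tgl, data_rd
        if t % fast_per_word == 0 and t // fast_per_word < len(words):
            nxt_data_rd = words[t // fast_per_word]
            nxt_tgl = tgl ^ 1               # flip once per latched word
        nxt_tgl_d = tgl                     # toggle delayed by one fast clock
        nxt_tgl_ck0, nxt_data_ck0 = tgl_ck0, data_ck0
        if tgl_d != tgl_ck0:                # delayed toggle flipped: re-latch
            nxt_data_ck0 = data_rd
            nxt_tgl_ck0 = tgl_d
        tgl, tgl_d, data_rd = nxt_tgl, nxt_tgl_d, nxt_data_rd
        tgl_ck0, data_ck0 = nxt_tgl_ck0, nxt_data_ck0
    return captured
```

In this model a 1:4 slow clock captures every word of a continuous burst whatever its edge alignment, while setting fast_per_slow=8 lets the toggle flip twice between slow edges, so words go unnoticed.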
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 11, 2021, 03:32:39 am
Read my 'BrianHG_DDR3_IO_PORT_ALTERA.sv' v1.00, lines 535-575.

Please read around my added manually generated 2+1 to 1 fan-out buffers.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 11, 2021, 03:36:24 am
I feel that 'BrianHG_DDR3_IO_PORT_ALTERA.sv' v1.00, lines 535-575 (https://github.com/BrianHGinc/BrianHG-DDR3-Controller/blob/main/BrianHG_DDR3/BrianHG_DDR3_IO_PORT_ALTERA.sv#L535-L575) is conceptually similar to Multi-cycle path (MCP) formulation with feedback (http://www.verilogpro.com/clock-domain-crossing-design-part-3/).

Please correct me if wrong.

What exactly do you mean by "added manual generated 2+1 to 1 fan-out buffers" ?

By the way, why are you not using asynchronous FIFO instead ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 11, 2021, 03:47:51 am
My solution is unidirectional.  All I have done is make the 'toggle' a next data valid signal which is slow enough that it is impossible to miss in the slower clock domain, unless that clock is slower than 1:4, in which case the BL8 will run too fast to keep up with unless you insert NOP commands with the read burst.

Since the read clock won't be at 0 degrees, the 128 read buffer bits may arrive at the next domain +/-1 clock apart depending on routing.  That added 1 clock delay ensures that the ck_0 acknowledge will only register after all of the 128 bits have arrived at their destination.

The fanout buffer aid is for large DDR3 configurations, like 512bit BL8 reads.  I broke the latch enable into 2 banks so that with 1 control in, I get 2 outs, and each of those data latch enables only feeds the clock enable on half the data bits instead of all 512.  1 DFF feeding 512 inputs is a heavy capacitive load on the FPGA fabric and will cut into FMAX.  The +1 is the separate toggle delay feeding my slower 200MHz clock domain, so that no route from the first 2 banks of 256 bit enable signals needs to be shared with feeding the full 512 enable signals in my 200MHz domain.  This helps optimize routing of that separate data valid toggle closer to where it needs to be in the slower clock domain.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 11, 2021, 03:54:39 am
Altera's Async FIFOs max out at 315MHz for a -6 and 238MHz for a -8.  My home made DFF SCFIFO maxes out at ~472MHz on a -6 and ~402MHz on a -8.  My core runs at 400MHz, so I cannot use Altera's FIFOs unless I use a 4:1 fifo on the pins instead of my current 2:1.  IE, my FIFO would have to be 64bit in, 128 bit out instead of the current 32bit in and 128bit out.

My logic is compatible with Altera, Lattice, Xilinx, Gowin...
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 11, 2021, 04:08:32 am
What do you mean by SCFIFO and 4:1 fifo ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 11, 2021, 04:18:16 am
Altera's SCFIFO function is their synchronous clock fifo which uses their core ram blocks instead of logic cells.
If I used 3 of those at 3 strategic points in my design, my DDR3 controller would be well under 1K logic elements, but, I would also use 3 ram blocks.

As for the 4:1: if the IO data is running at 800Mbps, the normal 2:1 DDRIO would require a fifo's input running at 400MHz, 32 bits wide, for a 16bit DDR3.  If my DDRIO was running in 4:1 mode instead, the fifo's input side would need to run at 200MHz with 64 bits for a 16bit DDR3.
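
The same arithmetic as a small Python sketch. The function name is made up for illustration; it only divides the pin data rate by the serializer ratio and multiplies the bus width by it.

```python
def fifo_input_side(data_rate_mbps, dq_width_bits, serdes_ratio):
    """Clock (MHz) and width (bits) on the fabric side of an N:1 DDR IO.

    data_rate_mbps : per-pin data rate, e.g. 800 for DDR3-800
    dq_width_bits  : DDR3 data bus width, e.g. 16
    serdes_ratio   : 2 for a plain DDR IO cell, 4 for a 4:1 serdes
    """
    clock_mhz = data_rate_mbps / serdes_ratio
    width_bits = dq_width_bits * serdes_ratio
    return clock_mhz, width_bits
```

fifo_input_side(800, 16, 2) gives (400.0, 32) and fifo_input_side(800, 16, 4) gives (200.0, 64), matching the two cases described above.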

You could say I can use the Altera FIFO with a speed grade of -6, running the ram at 300MHz with the normal 2:1 DDR input.  But it would not get anywhere near 400MHz, let alone be overclock-able to 500MHz.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 11, 2021, 07:47:13 am
I believe what you meant by 4:1 fifo is actually 4:1 serializer which is a term more commonly used by me and @NorthGuy

Please correct me if wrong.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 11, 2021, 08:27:34 am
The DDRIO I describe is my primary 2:1 and 1:2 serializer.  In your case, you are saying that you are using a 4:1 and 1:4 DDRIO?  I know Lattice has these; they are called 'DDROUT/IN2x'.

My code after the DDRIO 1:2 / 2:1 is my software second serializer which accumulates the BL8 into a single 128bit parallel chunk.  This is the part of my code which I have been referencing.  It is a secondary home-made 1:4 deserializer with a correction ability to re-align to the first read preamble if needed.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 11, 2021, 05:04:34 pm
Yes, I am using 4:1 serializer and 1:4 deserializer.

I have just made a GitHub commit (https://github.com/promach/DDR/commit/c356f0a7110ea2d74e1f3861ab099ff67b2c87c8) to use 350MHz / 4 = 87.5MHz clk_serdes as the "slow" clock domain.

However, the simulation waveform shows that both data_in_oserdes_0 and data_in_oserdes_1, which are the inputs of the 4:1 serializers (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1287-L1311), should be updated 90 degrees ahead of the clk_serdes domain.

Note: these 2 signals are driven by test_data (https://github.com/promach/DDR/blob/main/test_ddr3_memory_controller.v#L404) inside clk_serdes domain

Is there an easier solution than creating a clk_serdes_270 domain which has a 90 degree phase lead over the clk_serdes domain ?

(https://i.imgur.com/pOkrdEx.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 12, 2021, 11:01:36 am
Solved using this code modification commit (https://github.com/promach/DDR/commit/13f2686a8ddb274d9de32ccae2e903c063b1ffee)
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 17, 2021, 01:03:21 pm
For https://github.com/promach/DDR , I have at least solved most of the simulation issues inside Xilinx ISIM simulator.

Now, I could not simulate the actual Micron simulation model inside Xilinx ISIM simulator because it does not accept some systemverilog syntax inside Micron model.

Do you guys have any suggestions other than modifying the Micron model itself ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 17, 2021, 11:37:33 pm
Try setting Xilinx preferred default language syntax to SystemVerilog, or rename Micron's .v to a .sv.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 18, 2021, 02:14:57 am
Xilinx ISE (https://www.xilinx.com/products/design-tools/ise-design-suite.html) does not support systemverilog
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 18, 2021, 04:16:31 am
I think a more feasible workaround is to use Vivado, given the limited systemverilog support of the Xilinx ISE tool.

So, need to migrate ODDR2 (https://forums.xilinx.com/t5/Other-FPGA-Architecture/ODDR2-not-supported-for-Artix-7-equivalent/td-p/277576) and PLL dynamic phase shift (https://www.xilinx.com/support/documentation/ip_documentation/clk_wiz/v6_0/pg065-clk-wiz.pdf#page=12) primitives

What do you guys think ?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 18, 2021, 01:12:24 pm
I think a more feasible workaround is to use Vivado, given the limited systemverilog support of the Xilinx ISE tool.

So, need to migrate ODDR2 (https://forums.xilinx.com/t5/Other-FPGA-Architecture/ODDR2-not-supported-for-Artix-7-equivalent/td-p/277576) and PLL dynamic phase shift (https://www.xilinx.com/support/documentation/ip_documentation/clk_wiz/v6_0/pg065-clk-wiz.pdf#page=12) primitives

What do you guys think ?

Vivado doesn't support Spartan-6.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 18, 2021, 01:13:36 pm
Simulation is different from bitstream generation.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 23, 2021, 01:13:31 pm
Why do I have tIS violation inside the following Vivado simulation ?

(https://i.imgur.com/0WMBcBH.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 23, 2021, 10:06:24 pm
tIS is measured with reference to your DDR_CK output.
Where is your DDR_CK signal in that simulation?

If you are doing a true Gate-Level timing simulation, the output and input timing will not match the infinitely fast, perfect IO of your old RTL simulations within Modelsim, depending on the way you programmed your IOs.

Good luck.

Note that in my sims, I was using Altera's authentic DDR-IO buffers & PLL to generate the outputs to feed Micron's DDR3 model.  I was not using a home-made simple logic DDR buffer or software generated PLL.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 24, 2021, 02:47:02 am
In this case, how would I get around the IO timing issue resulting from the use of the vendor's DDR-IO buffers and PLL primitives ?

Note: I am using clk_serdes from PLL to drive CKE , RAS_N , and CAS_N signals
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 24, 2021, 03:38:26 am
In Quartus, the test bench simulations are a separate entity from a complete FPGA project.
I would normally design my FPGA and do a full compile then run a gate-level timing simulation.

My top_tb.v is a separate entity which only has access to the IO pins of the FPGA project.

The actual IO pins which are wired on my PCB to the DDR3 ram chip are what are wired to Micron's DDR3 model.  This is the only way you can see the true timing at the IO pins in the DDR3 controller.

In my RTL sims, I am doing the same thing for just the DDR3 IOs, however, gate-level timing sims do alter the timing of the read data.  Though, my power-up auto-read-phase pll-tuning function is wide enough to catch and correct for this automatically during initialization.

This method offers good true timing sims with ModelSim with the older FPGAs like Cyclone IV and earlier.  With the latest generation of FPGAs from Intel, this method is deprecated in favour of strict board-timing analysis tools, which no longer support this type of simulation.

I do not know how this is done in Xilinx.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 24, 2021, 03:42:10 am
Remember, for true timing sims, you need to compile an actual FPGA as that timing is affected by things like the FPGA grade, IO voltage standard, and which pins you are actually using and where they are located on the FPGA.

This true timing information just isn't available to Modelsim under a normal logic RTL compile.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 24, 2021, 03:57:41 am
Quote
however, gate-level timing sims do alter the timing of the read data.  Though, my power-up auto-read-phase pll-tuning function is wide enough to catch and correct for this automatically during initialization.

What do you exactly mean by power-up auto-read-phase pll-tuning function ?
Is it some MPR read functions ?
Or some other things ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 24, 2021, 04:05:01 am
Quote
however, gate-level timing sims do alter the timing of the read data.  Though, my power-up auto-read-phase pll-tuning function is wide enough to catch and correct for this automatically during initialization.

What do you exactly mean by power-up auto-read-phase pll-tuning function ?
Is it some MPR read functions ?
Or some other things ?
This is when I perform the MPR System Read Calibration and tune my PLL's read phase during power-up initialization.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 24, 2021, 07:06:06 am
I suppose my current issue is not related to read phase alignment.
Could you advise about clk_serdes (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L660) ?

(https://i.imgur.com/CvF85zj.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 24, 2021, 07:20:29 am
You showed me a tIS violation.
You told me you were clocking Micron's 'ddr3.v' with your clk_serdes.

I'm trying to tell you that you should have a properly generated DDR3_CK output from your design to feed the CK input of Micron's 'ddr3.v'.

tIS is a setup timing relationship/clearance error between the CK and the command inputs.

If your FPGA's CK output is authentically true to clk_serdes as it is with all the command lines, then you have a tIS problem and it will also be a problem when you build an actual FPGA.

In fact, I know you have a tIS setup error, visible within your simulation waveform.

Now, there is a cheat to fix this, but, applying such a cheat is not proper form and it will come back to haunt you when you build an FPGA and you have not properly accommodated for a true CK output from the FPGA.  Either properly generate your true CK output or use the cheat.

Maybe searching for some other vendor's app notes on FPGA DDR3 implementations might help you out.  Lattice has some good info on how they generate their clocks, command, and latch data.  They kind of sit in-between Altera's old method and Xilinx current implementation.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 24, 2021, 07:27:10 am
Quote
tIS is a setup timing relationship/clearance error between the CK and the command inputs.

Command inputs (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L2203-L2224) are driven using 87.5MHz clk_serdes , not 350MHz ck
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 24, 2021, 07:38:11 am
 :palm: Just look at the picture...
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 24, 2021, 08:13:44 am
Ok, I have solved tIS violation with your hint/suggestion just above.

However, what is wrong with the following tMRD violation ?

(https://i.imgur.com/YoVgE4g.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 24, 2021, 08:16:27 am
We have covered this one already.
Go back around 22 pages.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 24, 2021, 08:18:16 am
I think I know what is wrong now.  <-- no consecutive MRD command.

In this case, this means all command inputs need to be driven using 350MHz ck signal
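
A rough Python sketch of why commands clocked at 87.5MHz can trip tMRD. The helper names are invented, and the model assumes the DRAM samples the command pins on every 350MHz CK rising edge with chip select left active, so one slow-clock command is seen four times in a row. DDR3 tMRD is normally specified as 4 CK cycles; treat that figure as an assumption to check against your datasheet.

```python
def seen_by_dram(slow_cmds, ck_per_slow=4):
    """Expand commands held for one slow clock into per-CK samples."""
    return [cmd for cmd in slow_cmds for _ in range(ck_per_slow)]


def min_gap_between(ck_cmds, name):
    """Smallest spacing, in CK cycles, between two samples of `name`.

    Returns None when the command appears fewer than two times.
    """
    idx = [i for i, cmd in enumerate(ck_cmds) if cmd == name]
    gaps = [b - a for a, b in zip(idx, idx[1:])]
    return min(gaps) if gaps else None
```

min_gap_between(seen_by_dram(["MRS", "NOP"]), "MRS") is 1, i.e. back-to-back MRS commands as far as the DRAM is concerned, whereas a CK-rate stream such as ["MRS", "NOP", "NOP", "NOP", "MRS"] has a spacing of 4.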
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 24, 2021, 09:52:13 am
Why does the Vivado simulator abort (with an fseek error) at line 665 of the Micron simulation model (https://github.com/promach/DDR/blob/main/ddr3.v#L664-L665) ?

Code: [Select]
restart
INFO: [Simtcl 6-17] Simulation restarted
run 710 us
test_ddr3_memory_controller.mem.file_io_open: at time                    0 WARNING: no +model_data option specified, using /tmp.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.0.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.1.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.2.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.3.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.4.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.5.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.6.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.7.
test_ddr3_memory_controller.mem.cmd_task: at time 701901528.0 ps INFO: Load Mode 2
test_ddr3_memory_controller.mem.cmd_task: at time 701901528.0 ps INFO: Load Mode 2 Partial Array Self Refresh = Bank 0-7
test_ddr3_memory_controller.mem.cmd_task: at time 701901528.0 ps INFO: Load Mode 2 CAS Write Latency =           5
test_ddr3_memory_controller.mem.cmd_task: at time 701901528.0 ps INFO: Load Mode 2 Auto Self Refresh = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701901528.0 ps INFO: Load Mode 2 Self Refresh Temperature = Normal
test_ddr3_memory_controller.mem.cmd_task: at time 701901528.0 ps INFO: Load Mode 2 Dynamic ODT = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701915814.0 ps INFO: Load Mode 3
test_ddr3_memory_controller.mem.cmd_task: at time 701915814.0 ps INFO: Load Mode 3 MultiPurpose Register Select = Pre-defined pattern
test_ddr3_memory_controller.mem.cmd_task: at time 701915814.0 ps INFO: Load Mode 3 MultiPurpose Register Enable = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701930100.0 ps INFO: Load Mode 1
test_ddr3_memory_controller.mem.cmd_task: at time 701930100.0 ps INFO: Load Mode 1 DLL Enable = Enabled
test_ddr3_memory_controller.mem.cmd_task: at time 701930100.0 ps INFO: Load Mode 1 Output Drive Strength =          34 Ohm
test_ddr3_memory_controller.mem.cmd_task: at time 701930100.0 ps INFO: Load Mode 1 ODT Rtt = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701930100.0 ps INFO: Load Mode 1 Additive Latency = 0
test_ddr3_memory_controller.mem.cmd_task: at time 701930100.0 ps INFO: Load Mode 1 Write Levelization = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701930100.0 ps INFO: Load Mode 1 TDQS Enable = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701930100.0 ps INFO: Load Mode 1 Qoff = Enabled
test_ddr3_memory_controller.mem.cmd_task: at time 701944385.0 ps INFO: Load Mode 0
test_ddr3_memory_controller.mem.cmd_task: at time 701944385.0 ps INFO: Load Mode 0 Burst Length =  8
test_ddr3_memory_controller.mem.cmd_task: at time 701944385.0 ps INFO: Load Mode 0 Burst Order = Sequential
test_ddr3_memory_controller.mem.cmd_task: at time 701944385.0 ps INFO: Load Mode 0 CAS Latency =           5
test_ddr3_memory_controller.mem.cmd_task: at time 701944385.0 ps INFO: Load Mode 0 DLL Reset = Reset DLL
test_ddr3_memory_controller.mem.cmd_task: at time 701944385.0 ps INFO: Load Mode 0 Write Recovery =           5
test_ddr3_memory_controller.mem.cmd_task: at time 701944385.0 ps INFO: Load Mode 0 Power Down Mode = DLL on
test_ddr3_memory_controller.mem.cmd_task: at time 701981528.0 ps INFO: ZQ        long = 1
test_ddr3_memory_controller.mem.cmd_task: at time 701981528.0 ps INFO: Initialization Sequence is complete
test_ddr3_memory_controller.mem.main: at time 703412957.0 ps ERROR: Write Recovery =           5 is illegal @tCK(avg) = 2857.144531
test_ddr3_memory_controller.mem.cmd_task: at time 703452957.0 ps INFO: Precharge All
test_ddr3_memory_controller.mem.cmd_task: at time 703455814.0 ps INFO: Precharge All
test_ddr3_memory_controller.mem.cmd_task: at time 703458671.0 ps INFO: Precharge All
test_ddr3_memory_controller.mem.cmd_task: at time 703461528.0 ps INFO: Precharge All
test_ddr3_memory_controller.mem.cmd_task: at time 703464385.0 ps INFO: Precharge All
test_ddr3_memory_controller.mem.cmd_task: at time 703467242.0 ps INFO: Load Mode 3
test_ddr3_memory_controller.mem.cmd_task: at time 703467242.0 ps INFO: Load Mode 3 MultiPurpose Register Select = Pre-defined pattern
test_ddr3_memory_controller.mem.cmd_task: at time 703467242.0 ps INFO: Load Mode 3 MultiPurpose Register Enable = Enabled
test_ddr3_memory_controller.mem.cmd_task: at time 703504385.0 ps INFO: Read      bank 3 col 000, auto precharge 0
test_ddr3_memory_controller.mem.read_from_file: at time 703517242.0 ps ERROR: fseek to           x failed
$finish called at time : 703517242 ps : File "/home/phung/Downloads/DDR_backup/DDR_Xilinx_Vivado/DDR_Xilinx_Vivado.srcs/sources_1/imports/DDR/ddr3.v" Line 665
run: Time (s): cpu = 00:00:16 ; elapsed = 00:01:39 . Memory (MB): peak = 7895.703 ; gain = 0.000 ; free physical = 1212 ; free virtual = 7871
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 24, 2021, 09:56:59 am
Ok, the culprit seems to come from the following log line, but how do I get around it ?

Code: [Select]
test_ddr3_memory_controller.mem.main: at time 703412957.0 ps ERROR: Write Recovery =           5 is illegal @tCK(avg) = 2857.144531
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 24, 2021, 10:45:46 am
ok, why does the Micron simulation model complain that Write Recovery = 5 is illegal before the fseek error occurs ?
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 24, 2021, 11:29:19 am
When the command inputs are driven using 350MHz ck_270 signal, I have a lot of setup timing violations.

Is there any other way to get around the tMRD violations (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3707746/#msg3707746) without resorting to the use of 350MHz ck_270 signal ?

(https://i.imgur.com/vyg6cJF.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 24, 2021, 10:29:40 pm
Shouldn't the commands be on the ck_0 signal?
What's your command synchronization length?

I have 1 on my CK_0/2 plus 2 on my ck_0 side, then that feeds the DDR Output buffers.

It's beginning to look like Xilinx is not much faster, if at all, than Altera except for the peak throughput of their DDR-IO pin buffers.

Looking at Lattice, it is the same story, but a little worse as their DDR buffers cannot achieve full speed unless you use them in DDRX2 mode, which is actually a QDR mode, IE 4:1 serdes.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 25, 2021, 01:13:42 am
Quote
Shouldn't the commands be on the ck_0 signal?
What's your command synchronization length?

ck_270 is to generate 90 degrees phase LEAD relative to ck, since the command inputs bits need to be sampled at their middlemost bit position.

Those setup timing violations are not related to FF synchronizers length.

The logic inside the ck_270 is just a bit large.



Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 25, 2021, 02:03:58 am
All the command, controls and address are sampled right on the rise of CK_0, not the read position.

Only the DDR3 reading the DQ and DM sample on the CK_90 and CK_270 positions.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 25, 2021, 02:54:22 am
no, check the initialization sequence you sent earlier.

all command inputs are sent to DDR3 RAM with a 90 degree phase difference with respect to ck signal.

And there is no phase difference between DQ and DQS for write operation.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 25, 2021, 02:57:49 am
And almost all the setup timing violations within the ck_270 domain are due to the multi-level comparison hardware for the wait_count signal.

I think I will halve the bitwidth of wait_count, and set up another counter tracking variable to sequentially increment wait_count with multiple stages
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 25, 2021, 03:37:59 am
no, check the initialization sequence you sent earlier.

all command inputs are sent to DDR3 RAM with a 90 degree phase difference with respect to ck signal.

And there is no phase difference between DQ and DQS for write operation.
Show me where and I will prove you wrong.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 25, 2021, 03:38:55 am
See https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3707713/#msg3707713
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 25, 2021, 03:52:43 am
My photo showing Micron's example 'B' has the command's valid point dead center on the rising of the DDR3 CK.
It also has the don't care dead center over the inverted CK#.

tIS and tIH (not shown) are just the bare minimum setup and hold times, not the best case scenario.

Now, to get the maximum tIS and tIH out of the FPGA outputs, you would target the transition of all command and address lines on the CK# transition.  This is not ck_270.

I get the feeling you have never designed a synchronous bus interface before.

Take a look at my old Modelsim snapshots.  They demonstrate this exactly.
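
The launch-edge argument can be put into numbers with a short Python sketch. It assumes ideal, zero-delay outputs and traces, measures the launch phase in degrees after the CK rising edge, and uses an invented function name; real margins will be smaller once FPGA output delay and board skew are included.

```python
def ideal_setup_hold_ps(ck_mhz, launch_phase_deg):
    """Ideal setup/hold around the CK rising edge for a given launch phase.

    The command lines change once per CK cycle, launch_phase_deg after the
    rising edge, and are sampled by the DRAM on the next rising edge.
    Returns (setup_ps, hold_ps).
    """
    period_ps = 1e6 / ck_mhz
    hold_ps = period_ps * (launch_phase_deg % 360) / 360.0
    setup_ps = period_ps - hold_ps
    return setup_ps, hold_ps
```

At 350MHz, launching on the CK# transition (180 degrees) gives about 1428.6 ps of both setup and hold, while launching at 270 degrees gives about 714.3 ps of setup and 2142.9 ps of hold. The 1428ps figure quoted later in the thread is numerically the same as half of a 350MHz period.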
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 25, 2021, 03:56:29 am
Quote
Now, to get the maximum tIS and tIH out of the FPGA outputs, you would target the transition of all command and address lines on the CK# transition.  This is not ck_270.

Let's back up a bit.

ck_270 is 90 degree phase ADVANCE/LEAD , not phase LAG.

and therefore ck_270 will be able to center the command inputs bits to the posedge of ck, and at the same time trivially satisfies tIS requirement.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 25, 2021, 03:58:18 am
 :palm:  Your CK_0 is the DDR3 CK, the DDR3 CK# would be your CK_180.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 25, 2021, 03:59:59 am
yes you are right, and my code actually already adhered to what you just mentioned.

I suppose my coding is just too long for you to inspect more carefully.

Please correct me if I miss anything.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 25, 2021, 04:03:18 am
and therefore ck_270 will be able to center the command inputs bits to the posedge of ck, and at the same time trivially satisfies tIS requirement.

'Satisfies' so long as all your FPGA outputs switch fast enough within the specified period, which also becomes an interface IO timing issue when you want to change system frequencies.

This will also make board trace timing crucial for all the control lines.  It will make them just as sensitive to length matching as DQS and DQ, making routing a nightmare for your controller.  The control lines were supposed to be the easiest to route, having such a large timing clearance that you can go almost double length, or have more ram chips driven in parallel, adding load capacitance, without worries.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 25, 2021, 04:05:10 am
I would say give it a rest and just invert the CK output to your DDR3 and run all your logic internally on CK_0 like the way I have it.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 25, 2021, 04:08:25 am
May I know why I should use ck instead of ck_270?

I do not understand what you meant by the routing nightmare in your reply just above.

Note: ck_obuf is similar to ck

(https://i.imgur.com/Od3N7D9.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 25, 2021, 04:15:20 am
Please zoom and time these 2 points in the vertical red line for me.
I assume you are running at 350MHz.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 25, 2021, 04:47:12 am
I think I should use ck_180 instead of ck_270 to drive the command inputs.

See below the timing differences as requested by you.

Note: Vivado simulator seems to have only 1 cursor

(https://i.imgur.com/97sWsrR.png)

(https://i.imgur.com/YmIgXln.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 25, 2021, 04:50:37 am
You just couldn't do the math and give me a number?

Well, here is what I get on my end:

And my number is 1428ps.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 25, 2021, 04:52:41 am
Sorry, my number is 0.814ns or 814ps

By the way, I think I should use ck_180 instead of ck_270 to drive the command inputs.
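
As a rough cross-check of the numbers in this exchange, the sketch below converts a phase offset into time at the 350MHz clock assumed above. The helper names phase_to_ps and lag_as_lead are made up for illustration and are not part of either controller. Half a period (CK to CK#) comes out near 1428ps, which is consistent with BrianHG's figure, and a quarter period is about 714ps.

```python
def phase_to_ps(phase_deg, freq_mhz):
    """Time offset in picoseconds of a phase shift at the given clock frequency."""
    period_ps = 1e6 / freq_mhz  # period in ps for a frequency given in MHz
    return (phase_deg % 360) / 360.0 * period_ps


def lag_as_lead(lag_deg):
    """Express a phase lag of a periodic clock as the equivalent phase lead."""
    return (360 - lag_deg % 360) % 360


if __name__ == "__main__":
    for phase in (90, 180, 270):
        print(f"{phase:3d} deg -> {phase_to_ps(phase, 350.0):7.1f} ps")
    # A 270 degree lag and a 90 degree lead describe the same clock edge positions.
    print("270 deg lag is the same as a", lag_as_lead(270), "deg lead")
```

The second helper only restates the point made earlier in the thread: for a periodic clock, "270 degrees late" and "90 degrees early" are the same waveform.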
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 25, 2021, 09:37:31 am
Why does the model report: test_ddr3_memory_controller.mem.main: at time 703390100.0 ps ERROR: Write Recovery =           5 is illegal @tCK(avg) = 2857.142578 ?

The relevant WRITE_RECOVERY code can be found inside STATE_INIT_MRS_1 (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L2634)

Note: a 350MHz clock has a period of about 2.857ns (the model reports tCK(avg) = 2857.142578 ps)

(https://i.imgur.com/4kmebIf.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 25, 2021, 09:56:54 am
Write recovery is inside MRS0, not MRS1.
Yes, at 350MHz, a setting of 5 would be an error.
So, there is nothing wrong with the ddr3.v verilog model.  It is telling you the truth.
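
A minimal sketch of why 5 is rejected at this clock, assuming the usual DDR3 minimum tWR of 15ns and the usual set of write recovery values programmable in MR0 (both are assumptions here; check the datasheet of the actual part). The function name min_write_recovery is made up for illustration.

```python
import math

T_WR_NS = 15.0  # assumed minimum write recovery time; confirm in the datasheet
# Write recovery values (in clocks) assumed to be programmable in MR0.
VALID_WR = (5, 6, 7, 8, 10, 12, 14, 16)


def min_write_recovery(tck_ns, twr_ns=T_WR_NS):
    """Smallest programmable WR (in clocks) that covers twr_ns at clock period tck_ns."""
    # The small epsilon keeps exact ratios such as 15 / 3.0 from rounding up.
    cycles = math.ceil(twr_ns / tck_ns - 1e-9)
    return next(wr for wr in VALID_WR if wr >= cycles)


if __name__ == "__main__":
    tck_ns = 1000.0 / 350.0  # 350MHz clock
    print("tWR / tCK =", round(T_WR_NS / tck_ns, 2))
    print("minimum WR setting =", min_write_recovery(tck_ns))
```

At 350MHz the ratio is about 5.25 clocks, so under these assumptions the setting has to be rounded up to 6.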
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 25, 2021, 11:34:38 am
Quote
test_ddr3_memory_controller.mem.read_from_file: at time 703494385.0 ps ERROR: fseek to           x failed
$finish called at time : 703494385 ps : File "/home/phung/Downloads/DDR_backup/DDR_Xilinx_Vivado/DDR_Xilinx_Vivado.srcs/sources_1/imports/DDR/ddr3.v" Line 665

Why is there an fseek error (https://github.com/promach/DDR/blob/main/ddr3.v#L664-L665) for the MPR read function (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L2571-L2597) during the initial calibration process?

(https://i.imgur.com/mtLlSa3.png)

Code: [Select]
restart
INFO: [Simtcl 6-17] Simulation restarted
run 710 us
test_ddr3_memory_controller.mem.file_io_open: at time                    0 WARNING: no +model_data option specified, using /tmp.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.0.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.1.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.2.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.3.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.4.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.5.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.6.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening /tmp/test_ddr3_memory_controller.mem.open_bank_file.7.
test_ddr3_memory_controller.mem.cmd_task: at time 701878671.0 ps INFO: Load Mode 2
test_ddr3_memory_controller.mem.cmd_task: at time 701878671.0 ps INFO: Load Mode 2 Partial Array Self Refresh = Bank 0-7
test_ddr3_memory_controller.mem.cmd_task: at time 701878671.0 ps INFO: Load Mode 2 CAS Write Latency =           5
test_ddr3_memory_controller.mem.cmd_task: at time 701878671.0 ps INFO: Load Mode 2 Auto Self Refresh = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701878671.0 ps INFO: Load Mode 2 Self Refresh Temperature = Normal
test_ddr3_memory_controller.mem.cmd_task: at time 701878671.0 ps INFO: Load Mode 2 Dynamic ODT = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701892957.0 ps INFO: Load Mode 3
test_ddr3_memory_controller.mem.cmd_task: at time 701892957.0 ps INFO: Load Mode 3 MultiPurpose Register Select = Pre-defined pattern
test_ddr3_memory_controller.mem.cmd_task: at time 701892957.0 ps INFO: Load Mode 3 MultiPurpose Register Enable = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 DLL Enable = Enabled
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 Output Drive Strength =          34 Ohm
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 ODT Rtt = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 Additive Latency = 0
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 Write Levelization = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 TDQS Enable = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 Qoff = Enabled
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0 Burst Length =  8
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0 Burst Order = Sequential
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0 CAS Latency =           5
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0 DLL Reset = Reset DLL
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0 Write Recovery =           6
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0 Power Down Mode = DLL on
test_ddr3_memory_controller.mem.cmd_task: at time 701958671.0 ps INFO: ZQ        long = 1
test_ddr3_memory_controller.mem.cmd_task: at time 701958671.0 ps INFO: Initialization Sequence is complete
test_ddr3_memory_controller.mem.cmd_task: at time 703430100.0 ps INFO: Precharge All
test_ddr3_memory_controller.mem.cmd_task: at time 703432957.0 ps INFO: Precharge All
test_ddr3_memory_controller.mem.cmd_task: at time 703435814.0 ps INFO: Precharge All
test_ddr3_memory_controller.mem.cmd_task: at time 703438671.0 ps INFO: Precharge All
test_ddr3_memory_controller.mem.cmd_task: at time 703441528.0 ps INFO: Precharge All
test_ddr3_memory_controller.mem.cmd_task: at time 703444385.0 ps INFO: Load Mode 3
test_ddr3_memory_controller.mem.cmd_task: at time 703444385.0 ps INFO: Load Mode 3 MultiPurpose Register Select = Pre-defined pattern
test_ddr3_memory_controller.mem.cmd_task: at time 703444385.0 ps INFO: Load Mode 3 MultiPurpose Register Enable = Enabled
test_ddr3_memory_controller.mem.cmd_task: at time 703481528.0 ps INFO: Read      bank 3 col 000, auto precharge 0
test_ddr3_memory_controller.mem.read_from_file: at time 703494385.0 ps ERROR: fseek to           x failed
$finish called at time : 703494385 ps : File "/home/phung/Downloads/DDR_backup/DDR_Xilinx_Vivado/DDR_Xilinx_Vivado.srcs/sources_1/imports/DDR/ddr3.v" Line 665
run: Time (s): cpu = 00:00:24 ; elapsed = 00:01:39 . Memory (MB): peak = 7725.891 ; gain = 0.000 ; free physical = 1122 ; free virtual = 8664
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 25, 2021, 11:38:37 am
I don't know.
Could you possibly be reading an uninitialized or blank section of ram?
Maybe Xilinx can't do fseek?
Do the 'bank' files exist?
Does the simulator have system privileges to access the folder where the bank files exist?
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 25, 2021, 11:53:37 am
From googling, I found multiple usage of fseek with Xilinx, so I suppose this is not issue with Xilinx.

And I also have Initialization Sequence is complete from Micron simulation model log output.

What else could be wrong ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 25, 2021, 12:00:30 pm
Do the 'bank' files exist?
Does the simulator have system privileges to access the folder where the bank files exist?
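
One quick way to answer both questions outside the simulator is a small script like the sketch below. The function name describe_bank_files is made up for illustration, and the file pattern is taken from the simulator log earlier in the thread; adjust it to wherever your run actually places the bank files.

```python
import glob
import os


def describe_bank_files(pattern):
    """Return (path, size_in_bytes, readable, writable) for each file matching pattern."""
    report = []
    for path in sorted(glob.glob(pattern)):
        report.append((
            path,
            os.path.getsize(path),
            os.access(path, os.R_OK),  # can the current user read it?
            os.access(path, os.W_OK),  # can the current user write it?
        ))
    return report


if __name__ == "__main__":
    # Pattern assumed from the log output; change it for your own setup.
    pattern = "/tmp/test_ddr3_memory_controller.mem.open_bank_file.*"
    entries = describe_bank_files(pattern)
    if not entries:
        print("no bank files found for", pattern)
    for path, size, readable, writable in entries:
        print(f"{path}: {size} bytes, readable={readable}, writable={writable}")
```

Run it as the same user that launches the simulator, since the access checks apply to the calling user.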
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 25, 2021, 12:09:18 pm
You are right, it is really some file permission issue which I had overlooked.

I have already run sudo chmod -R u+r ./* in the /tmp and /home/phung/Downloads/DDR_backup/DDR_Xilinx_Vivado/ directories, so why does the read permission error dialog box keep popping up?

(https://i.imgur.com/FtZtfO4.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on September 25, 2021, 12:14:41 pm
OS directory/file permissions are out of my league.

If I had the same error in Win7, I would have run Modelsim with Admin Privileges to circumvent the problem.  However, since Modelsim places the tmp folder on C:\tmp\ , there isn't a problem.

You will have to solve this problem on your own.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 25, 2021, 12:19:09 pm
OK, the culprit seems to be that the memory bank files are all empty.  This is also why fopen() is okay but fseek() is not.

Code: [Select]
[phung@archlinux DDR]$ ls -hal /tmp/test_ddr3_memory_controller*
-rw-r--r-- 1 phung users    0 Sep 25 20:04 /tmp/test_ddr3_memory_controller.mem.open_bank_file.0
-rw-r--r-- 1 phung users    0 Sep 25 20:04 /tmp/test_ddr3_memory_controller.mem.open_bank_file.1
-rw-r--r-- 1 phung users    0 Sep 25 20:04 /tmp/test_ddr3_memory_controller.mem.open_bank_file.2
-rw-r--r-- 1 phung users    0 Sep 25 20:04 /tmp/test_ddr3_memory_controller.mem.open_bank_file.3
-rw-r--r-- 1 phung users    0 Sep 25 20:04 /tmp/test_ddr3_memory_controller.mem.open_bank_file.4
-rw-r--r-- 1 phung users    0 Sep 25 20:04 /tmp/test_ddr3_memory_controller.mem.open_bank_file.5
-rw-r--r-- 1 phung users    0 Sep 25 20:04 /tmp/test_ddr3_memory_controller.mem.open_bank_file.6
-rw-r--r-- 1 phung users    0 Sep 25 20:04 /tmp/test_ddr3_memory_controller.mem.open_bank_file.7
-rw-r--r-- 1 phung users 122M Sep 25 20:14 /tmp/test_ddr3_memory_controller_be_3477_1632569013.xilwvdat
[phung@archlinux DDR]$
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 25, 2021, 03:35:20 pm
I tried setting xsim.simulate.xsim.more_options to -testplusarg model_data+./ (https://forums.xilinx.com/t5/Memory-Interfaces-and-NoC/Simulation-not-working-in-DDR-example-design/m-p/1110547/highlight/true#M17075), but the read permission error dialog box still keeps popping up.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 26, 2021, 10:04:29 am
I suspect that the red X on the DQS signals appears because the ODDR2 primitives have not been migrated to the Vivado environment (https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_3/7series_hdl.pdf#page=327). As a result, the incoming READ DQS that corresponds to the first READ preamble bit conflicts with the WRITE DQS (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L1681).

Should I use OPPOSITE_EDGE mode or SAME_EDGE mode for the ODDR primitive in Vivado in this case?

(https://i.imgur.com/YWLx4F6.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 26, 2021, 11:13:47 am
I got around the fseek error (https://github.com/promach/DDR/commit/9cda34aa35917f039747fbdf69261598a280a429); it seems to be related to conflicting DQS from the READ and WRITE operations.

Now I have a tCCD timing violation.  Any ideas?

(https://i.imgur.com/xOwZpTa.png)

Code: [Select]
launch_simulation: Time (s): cpu = 00:00:11 ; elapsed = 00:00:07 . Memory (MB): peak = 7797.934 ; gain = 0.000 ; free physical = 891 ; free virtual = 7223
restart
INFO: [Simtcl 6-17] Simulation restarted
run 710 us
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.0.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.1.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.2.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.3.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.4.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.5.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.6.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.7.
test_ddr3_memory_controller.mem.cmd_task: at time 701878671.0 ps INFO: Load Mode 2
test_ddr3_memory_controller.mem.cmd_task: at time 701878671.0 ps INFO: Load Mode 2 Partial Array Self Refresh = Bank 0-7
test_ddr3_memory_controller.mem.cmd_task: at time 701878671.0 ps INFO: Load Mode 2 CAS Write Latency =           5
test_ddr3_memory_controller.mem.cmd_task: at time 701878671.0 ps INFO: Load Mode 2 Auto Self Refresh = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701878671.0 ps INFO: Load Mode 2 Self Refresh Temperature = Normal
test_ddr3_memory_controller.mem.cmd_task: at time 701878671.0 ps INFO: Load Mode 2 Dynamic ODT = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701892957.0 ps INFO: Load Mode 3
test_ddr3_memory_controller.mem.cmd_task: at time 701892957.0 ps INFO: Load Mode 3 MultiPurpose Register Select = Pre-defined pattern
test_ddr3_memory_controller.mem.cmd_task: at time 701892957.0 ps INFO: Load Mode 3 MultiPurpose Register Enable = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 DLL Enable = Enabled
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 Output Drive Strength =          34 Ohm
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 ODT Rtt = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 Additive Latency = 0
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 Write Levelization = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 TDQS Enable = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 Qoff = Enabled
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0 Burst Length =  8
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0 Burst Order = Sequential
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0 CAS Latency =           5
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0 DLL Reset = Reset DLL
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0 Write Recovery =           6
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0 Power Down Mode = DLL on
test_ddr3_memory_controller.mem.cmd_task: at time 701958671.0 ps INFO: ZQ        long = 1
test_ddr3_memory_controller.mem.cmd_task: at time 701958671.0 ps INFO: Initialization Sequence is complete
test_ddr3_memory_controller.mem.cmd_task: at time 703430100.0 ps INFO: Precharge All
test_ddr3_memory_controller.mem.cmd_task: at time 703432957.0 ps INFO: Precharge All
test_ddr3_memory_controller.mem.cmd_task: at time 703435814.0 ps INFO: Precharge All
test_ddr3_memory_controller.mem.cmd_task: at time 703438671.0 ps INFO: Precharge All
test_ddr3_memory_controller.mem.cmd_task: at time 703441528.0 ps INFO: Precharge All
test_ddr3_memory_controller.mem.cmd_task: at time 703444385.0 ps INFO: Load Mode 3
test_ddr3_memory_controller.mem.cmd_task: at time 703444385.0 ps INFO: Load Mode 3 MultiPurpose Register Select = Pre-defined pattern
test_ddr3_memory_controller.mem.cmd_task: at time 703444385.0 ps INFO: Load Mode 3 MultiPurpose Register Enable = Enabled
test_ddr3_memory_controller.mem.cmd_task: at time 703504385.0 ps INFO: Read      bank 3 col 000, auto precharge 1
test_ddr3_memory_controller.mem.chk_err: at time 703507242.0 ps ERROR:  tCCD violation during Read      to bank 3
test_ddr3_memory_controller.mem.cmd_task: at time 703507242.0 ps ERROR: Read      Failure.  Illegal burst interruption.
$stop called at time : 703507242 ps
run: Time (s): cpu = 00:00:18 ; elapsed = 00:01:37 . Memory (MB): peak = 7797.934 ; gain = 0.000 ; free physical = 430 ; free virtual = 7227
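
A note on the two timestamps in the log above: the Read at 703504385 ps and the tCCD error at 703507242 ps are 2857 ps apart, which is one clock period at 350MHz. The sketch below checks that spacing against a tCCD of 4 clocks, which is the usual DDR3 minimum and is assumed here rather than read from the model. It also assumes the error is raised when the model sees a second Read command, for example because the READ command was held for more than one clock. The helper names are made up for illustration.

```python
T_CCD_CLOCKS = 4  # assumed DDR3 minimum CAS-to-CAS spacing, in clocks


def clocks_between(t_first_ps, t_second_ps, tck_ps):
    """Whole clock periods between two command timestamps given in picoseconds."""
    return round((t_second_ps - t_first_ps) / tck_ps)


def violates_tccd(t_first_ps, t_second_ps, tck_ps, tccd_clocks=T_CCD_CLOCKS):
    """True if the second column command arrives sooner than tccd_clocks after the first."""
    return clocks_between(t_first_ps, t_second_ps, tck_ps) < tccd_clocks


if __name__ == "__main__":
    tck_ps = 2857.142578      # tCK(avg) as reported by the model
    first_read = 703504385.0  # "Read bank 3 col 000" timestamp from the log
    error_time = 703507242.0  # "tCCD violation" timestamp from the log
    print("spacing in clocks:", clocks_between(first_read, error_time, tck_ps))
    print("violates tCCD:", violates_tccd(first_read, error_time, tck_ps))
```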
Title: Re: DDR3 initialization sequence issue
Post by: SMB784 on September 26, 2021, 12:26:57 pm
Quote
OK, the culprit seems to be that the memory bank files are all empty.  This is also why fopen() is okay but fseek() is not.

Code: [Select]
[phung@archlinux DDR]$ ls -hal /tmp/test_ddr3_memory_controller*
-rw-r--r-- 1 phung users    0 Sep 25 20:04 /tmp/test_ddr3_memory_controller.mem.open_bank_file.0
-rw-r--r-- 1 phung users    0 Sep 25 20:04 /tmp/test_ddr3_memory_controller.mem.open_bank_file.1
-rw-r--r-- 1 phung users    0 Sep 25 20:04 /tmp/test_ddr3_memory_controller.mem.open_bank_file.2
-rw-r--r-- 1 phung users    0 Sep 25 20:04 /tmp/test_ddr3_memory_controller.mem.open_bank_file.3
-rw-r--r-- 1 phung users    0 Sep 25 20:04 /tmp/test_ddr3_memory_controller.mem.open_bank_file.4
-rw-r--r-- 1 phung users    0 Sep 25 20:04 /tmp/test_ddr3_memory_controller.mem.open_bank_file.5
-rw-r--r-- 1 phung users    0 Sep 25 20:04 /tmp/test_ddr3_memory_controller.mem.open_bank_file.6
-rw-r--r-- 1 phung users    0 Sep 25 20:04 /tmp/test_ddr3_memory_controller.mem.open_bank_file.7
-rw-r--r-- 1 phung users 122M Sep 25 20:14 /tmp/test_ddr3_memory_controller_be_3477_1632569013.xilwvdat
[phung@archlinux DDR]$

Vivado hates most $f system task commands and won't work with many of them. There is a list of $f commands it works with on page 232 of UG901 (see attached screenshot). It's a very small list, which really limits what you can accomplish. I learned this the hard way when trying to parametrize $readmem tasks for multiple predefined RAM files.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 26, 2021, 03:48:22 pm
I have solved the tCCD timing violation (https://github.com/promach/DDR/commit/538aac2bd4303d609011a357ecf828010e007560), but both the fseek error and conflicting DQS issues come back again.

(https://i.imgur.com/MTBrAmu.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 29, 2021, 11:52:57 am
Why did the ldqs signal become X when only 4.5 cycles of the ck_obuf clock signal had passed?

(https://i.imgur.com/kVFF7p2.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 29, 2021, 01:46:13 pm
Quote
Why did the ldqs signal become X when only 4.5 cycles of the ck_obuf clock signal had passed?

That is correct. When you issue a read command you get a burst of 8 bits - 8 edges of DQS.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 29, 2021, 02:31:00 pm
My CL setting is only 5 (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L500), so why are there 8 edges of DQS?

(https://i.imgur.com/vXAxY43.png)
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 29, 2021, 10:18:28 pm
Quote
My CL setting is only 5 (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L500), so why are there 8 edges of DQS?

CL determines the latency between the read command and the first edge of DQS. Then you get 8 DQS edges (or 4 if you read half-burst) after which DQS gets released by the DDR3 chip and gets pulled to VDD/2.
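
To put numbers on this, here is a minimal sketch that separates the two quantities: CL (plus additive latency, which the log shows as 0) sets when the burst starts, and the burst length sets how long it lasts. The function name read_burst_timing and the dictionary keys are made up for illustration, and board and PHY delays are ignored.

```python
def read_burst_timing(cl, tck_ps, burst_length=8, al=0):
    """Idealised timing of a read burst, measured from the READ command, in ps."""
    read_latency = al + cl                   # clocks from READ to the first data edge
    first_edge_ps = read_latency * tck_ps
    burst_ps = (burst_length / 2) * tck_ps   # DDR moves two transfers per clock
    return {
        "first_edge_ps": first_edge_ps,
        "burst_ps": burst_ps,
        "release_ps": first_edge_ps + burst_ps,  # DQS is released after the burst
    }


if __name__ == "__main__":
    timing = read_burst_timing(cl=5, tck_ps=1e6 / 350.0)
    for name, value in timing.items():
        print(f"{name}: {value:.1f} ps")
```

With CL = 5 and a burst of 8, the data occupies 4 clocks starting 5 clocks after the READ command, so CL and the number of DQS edges are independent settings.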
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 30, 2021, 08:34:04 am
Wait, is the CL setting computed with respect to the posedge of the ck signal?

And why is there an X at the end of my simulation waveform (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3718330/#msg3718330)?
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 30, 2021, 01:43:58 pm
Quote
Wait, is the CL setting computed with respect to the posedge of the ck signal?

The read command is sampled by the DDR3 chip at a positive clock edge. Then, after the specified number of clocks, the chip produces the first DQS edge (which coincides with a positive clock edge). You will receive it later because of the round-trip delay. Your simulation is behavioural, so it probably doesn't model this delay.

Quote
And why is there an X at the end of my simulation waveform (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3718330/#msg3718330)?

Because the transmission is over. Physically, both DQS and DQS# go to VDD/2. From the simulation viewpoint, the signal is undefined.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 30, 2021, 02:11:15 pm
The FIRST piece of read data from MPR_READ_function had not even been transferred yet.

Code: [Select]
start_gui
open_project /home/phung/Downloads/DDR_backup/DDR_Xilinx_Vivado/DDR_Xilinx_Vivado.xpr
open_project /home/phung/Downloads/DDR_backup/DDR_Xilinx_Vivado/DDR_Xilinx_Vivado.xpr
Scanning sources...
Finished scanning sources
INFO: [IP_Flow 19-234] Refreshing IP repositories
INFO: [IP_Flow 19-1704] No user IP repositories specified
INFO: [IP_Flow 19-2313] Loaded Vivado IP repository '/opt/Xilinx/Vivado/2021.1/data/ip'.
INFO: [IP_Flow 19-3899] Cannot get the environment domain name variable for the component vendor name. Setting the vendor name to 'user.org'.
update_compile_order -fileset sources_1
launch_simulation
Command: launch_simulation
INFO: [Vivado 12-12493] Simulation top is 'test_ddr3_memory_controller'
INFO: [Vivado 12-5698] Checking validity of IPs in the design for the 'XSim' simulator...
INFO: [Vivado 12-5682] Launching behavioral simulation in '/home/phung/Downloads/DDR_backup/DDR_Xilinx_Vivado/DDR_Xilinx_Vivado.sim/sim_1/behav/xsim'
INFO: [Vivado 12-4795] Using compiled simulation libraries for IPs
INFO: [IP_Flow 19-3899] Cannot get the environment domain name variable for the component vendor name. Setting the vendor name to 'user.org'.
INFO: [SIM-utils-51] Simulation object is 'sim_1'
INFO: [SIM-utils-72] Using boost library from '/opt/Xilinx/Vivado/2021.1/tps/boost_1_72_0'
INFO: [USF-XSim-7] Finding pre-compiled libraries...
INFO: [USF-XSim-11] File '/opt/Xilinx/Vivado/2021.1/data/xsim/xsim.ini' copied to run dir:'/home/phung/Downloads/DDR_backup/DDR_Xilinx_Vivado/DDR_Xilinx_Vivado.sim/sim_1/behav/xsim'
INFO: [SIM-utils-54] Inspecting design source files for 'test_ddr3_memory_controller' in fileset 'sim_1'...
INFO: [USF-XSim-97] Finding global include files...
INFO: [USF-XSim-98] Fetching design files from 'sim_1'...
INFO: [USF-XSim-2] XSim::Compile design
INFO: [USF-XSim-61] Executing 'COMPILE and ANALYZE' step in '/home/phung/Downloads/DDR_backup/DDR_Xilinx_Vivado/DDR_Xilinx_Vivado.sim/sim_1/behav/xsim'
xvlog --incr --relax -L uvm -prj test_ddr3_memory_controller_vlog.prj
Waiting for jobs to finish...
No pending jobs, compilation finished.
INFO: [USF-XSim-69] 'compile' step finished in '2' seconds
INFO: [USF-XSim-3] XSim::Elaborate design
INFO: [USF-XSim-61] Executing 'ELABORATE' step in '/home/phung/Downloads/DDR_backup/DDR_Xilinx_Vivado/DDR_Xilinx_Vivado.sim/sim_1/behav/xsim'
xelab -wto 05448333b6914b52aac1122a43e7e957 --incr --debug typical --relax --mt 8 -L xil_defaultlib -L uvm -L unisims_ver -L unimacro_ver -L secureip -L xpm --snapshot test_ddr3_memory_controller_behav xil_defaultlib.test_ddr3_memory_controller xil_defaultlib.glbl -log elaborate.log
Vivado Simulator v2021.1
Copyright 1986-1999, 2001-2021 Xilinx, Inc. All Rights Reserved.
Running: /opt/Xilinx/Vivado/2021.1/bin/unwrapped/lnx64.o/xelab -wto 05448333b6914b52aac1122a43e7e957 --incr --debug typical --relax --mt 8 -L xil_defaultlib -L uvm -L unisims_ver -L unimacro_ver -L secureip -L xpm --snapshot test_ddr3_memory_controller_behav xil_defaultlib.test_ddr3_memory_controller xil_defaultlib.glbl -log elaborate.log
Using 8 slave threads.
Starting static elaboration
Pass Through NonSizing Optimizer
WARNING: [VRFC 10-3091] actual bit length 1 differs from formal bit length 2 for port 'tdqs_n' [/home/phung/Downloads/DDR_backup/DDR_Xilinx_Vivado/DDR_Xilinx_Vivado.srcs/sources_1/imports/DDR/test_ddr3_memory_controller.v:662]
Completed static elaboration
INFO: [XSIM 43-4323] No Change in HDL. Linking previously generated obj files to create kernel
INFO: [USF-XSim-69] 'elaborate' step finished in '1' seconds
INFO: [USF-XSim-4] XSim::Simulate design
INFO: [USF-XSim-61] Executing 'SIMULATE' step in '/home/phung/Downloads/DDR_backup/DDR_Xilinx_Vivado/DDR_Xilinx_Vivado.sim/sim_1/behav/xsim'
INFO: [USF-XSim-98] *** Running xsim
   with args "test_ddr3_memory_controller_behav -key {Behavioral:sim_1:Functional:test_ddr3_memory_controller} -tclbatch {test_ddr3_memory_controller.tcl} -view {/home/phung/Downloads/DDR_backup/DDR_Xilinx_Vivado/test_ddr3_memory_controller_behav.wcfg} -log {simulate.log} -testplusarg model_data+./."
INFO: [USF-XSim-8] Loading simulator feature
Time resolution is 1 ps
open_wave_config /home/phung/Downloads/DDR_backup/DDR_Xilinx_Vivado/test_ddr3_memory_controller_behav.wcfg
source test_ddr3_memory_controller.tcl
# set curr_wave [current_wave_config]
# if { [string length $curr_wave] == 0 } {
#   if { [llength [get_objects]] > 0} {
#     add_wave /
#     set_property needs_save false [current_wave_config]
#   } else {
#      send_msg_id Add_Wave-1 WARNING "No top level signals found. Simulator will start without a wave window. If you want to open a wave window go to 'File->New Waveform Configuration' or type 'create_wave_config' in the TCL console."
#   }
# }
# run 1000ns
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.0.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.1.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.2.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.3.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.4.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.5.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.6.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.7.
INFO: [USF-XSim-96] XSim completed. Design snapshot 'test_ddr3_memory_controller_behav' loaded.
INFO: [USF-XSim-97] XSim simulation ran for 1000ns
launch_simulation: Time (s): cpu = 00:00:12 ; elapsed = 00:00:06 . Memory (MB): peak = 7602.480 ; gain = 46.828 ; free physical = 164 ; free virtual = 7649
restart
INFO: [Simtcl 6-17] Simulation restarted
run 710 us
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.0.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.1.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.2.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.3.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.4.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.5.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.6.
test_ddr3_memory_controller.mem.open_bank_file: at time 0 INFO: opening ././test_ddr3_memory_controller.mem.open_bank_file.7.
test_ddr3_memory_controller.mem.cmd_task: at time 701878671.0 ps INFO: Load Mode 2
test_ddr3_memory_controller.mem.cmd_task: at time 701878671.0 ps INFO: Load Mode 2 Partial Array Self Refresh = Bank 0-7
test_ddr3_memory_controller.mem.cmd_task: at time 701878671.0 ps INFO: Load Mode 2 CAS Write Latency =           5
test_ddr3_memory_controller.mem.cmd_task: at time 701878671.0 ps INFO: Load Mode 2 Auto Self Refresh = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701878671.0 ps INFO: Load Mode 2 Self Refresh Temperature = Normal
test_ddr3_memory_controller.mem.cmd_task: at time 701878671.0 ps INFO: Load Mode 2 Dynamic ODT = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701892957.0 ps INFO: Load Mode 3
test_ddr3_memory_controller.mem.cmd_task: at time 701892957.0 ps INFO: Load Mode 3 MultiPurpose Register Select = Pre-defined pattern
test_ddr3_memory_controller.mem.cmd_task: at time 701892957.0 ps INFO: Load Mode 3 MultiPurpose Register Enable = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 DLL Enable = Enabled
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 Output Drive Strength =          34 Ohm
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 ODT Rtt = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 Additive Latency = 0
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 Write Levelization = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 TDQS Enable = Disabled
test_ddr3_memory_controller.mem.cmd_task: at time 701907242.0 ps INFO: Load Mode 1 Qoff = Enabled
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0 Burst Length =  8
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0 Burst Order = Sequential
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0 CAS Latency =           5
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0 DLL Reset = Reset DLL
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0 Write Recovery =           6
test_ddr3_memory_controller.mem.cmd_task: at time 701921528.0 ps INFO: Load Mode 0 Power Down Mode = DLL on
test_ddr3_memory_controller.mem.cmd_task: at time 701958671.0 ps INFO: ZQ        long = 1
test_ddr3_memory_controller.mem.cmd_task: at time 701958671.0 ps INFO: Initialization Sequence is complete
test_ddr3_memory_controller.mem.cmd_task: at time 703430100.0 ps INFO: Precharge All
test_ddr3_memory_controller.mem.cmd_task: at time 703432957.0 ps INFO: Precharge All
test_ddr3_memory_controller.mem.cmd_task: at time 703435814.0 ps INFO: Precharge All
test_ddr3_memory_controller.mem.cmd_task: at time 703438671.0 ps INFO: Precharge All
test_ddr3_memory_controller.mem.cmd_task: at time 703441528.0 ps INFO: Precharge All
test_ddr3_memory_controller.mem.cmd_task: at time 703444385.0 ps INFO: Load Mode 3
test_ddr3_memory_controller.mem.cmd_task: at time 703444385.0 ps INFO: Load Mode 3 MultiPurpose Register Select = Pre-defined pattern
test_ddr3_memory_controller.mem.cmd_task: at time 703444385.0 ps INFO: Load Mode 3 MultiPurpose Register Enable = Enabled
test_ddr3_memory_controller.mem.cmd_task: at time 703501528.0 ps INFO: Read      bank 3 col 000, auto precharge 1
test_ddr3_memory_controller.mem.read_from_file: at time 703514385.0 ps ERROR: fseek to           x failed
$finish called at time : 703514385 ps : File "/home/phung/Downloads/DDR_backup/DDR_Xilinx_Vivado/DDR_Xilinx_Vivado.srcs/sources_1/imports/DDR/ddr3.v" Line 665
run: Time (s): cpu = 00:00:19 ; elapsed = 00:01:44 . Memory (MB): peak = 7611.617 ; gain = 7.004 ; free physical = 158 ; free virtual = 7637
Title: Re: DDR3 initialization sequence issue
Post by: NorthGuy on September 30, 2021, 02:27:34 pm
The FIRST piece of read data from MPR_READ_function had not even been transferred yet.

You see pulses on DQS meaning the DDR3 chip is transmitting something.
Title: Re: DDR3 initialization sequence issue
Post by: promach on September 30, 2021, 02:42:00 pm
I think the root cause for X on DQS is still because of conflicting write DQS and read DQS, hence the fseek error.

Let me check my ODDR primitive for DQS.
Title: Re: DDR3 initialization sequence issue
Post by: SMB784 on September 30, 2021, 03:10:39 pm
Quote
I think the root cause for X on DQS is still because of conflicting write DQS and read DQS, hence the fseek error.

Let me check my ODDR primitive for DQS.

Are you sure $fseek is a valid command in Vivado?  I don't see it listed in UG901, which usually means it doesn't work.  The only ones that are guaranteed to work are the ones listed as supported in UG901.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on October 01, 2021, 01:12:57 am
Quote
Are you sure $fseek is a valid command in Vivado?  I don't see it listed in UG901, which usually means it doesn't work.  The only ones that are guaranteed to work are the ones listed as supported in UG901.

LOL, if the command isn't known, then how would Vivado know that $fseek is a command which seeks to a position?

Code: [Select]
test_ddr3_memory_controller.mem.read_from_file: at time 703514385.0 ps ERROR: fseek to           x failed
Wouldn't it say it is an unsupported or unknown command?
Title: Re: DDR3 initialization sequence issue
Post by: SMB784 on October 01, 2021, 06:06:29 pm
Quote
LOL, if the command isn't known, then how would Vivado know that $fseek is a command which seeks to a position?

Code: [Select]
test_ddr3_memory_controller.mem.read_from_file: at time 703514385.0 ps ERROR: fseek to           x failed
Wouldn't it say it is an unsupported or unknown command?

You would be surprised at how little Vivado will tell you when something is wrong.  I only bring up the issue because I tried various ways to manipulate memory modules, and many of them failed because Vivado did not recognize a certain $f command and gave no error or warning message.

That said, it's possible that $fseek is perfectly OK, but it would be wise to check it in Vivado to make sure it does indeed work, given that it isn't explicitly listed in UG901 as a supported command.
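
A quick way to run that check is a stand-alone testbench that writes a few bytes, seeks, and reads one back. This is only a minimal sketch: the module name and the scratch file name "fseek_check.bin" are made up for illustration, and it uses nothing beyond the standard Verilog-2001 file I/O tasks.

```verilog
// Minimal self-check for $fopen/$fwrite/$fseek/$fgetc in a simulator.
// "fseek_check.bin" is a hypothetical scratch file in the run directory.
`timescale 1ns / 1ps
module fseek_check;
    integer fd;
    integer status;
    integer ch;

    initial begin
        // "w+" opens for read and write, truncating any existing file
        fd = $fopen("fseek_check.bin", "w+");
        if (fd == 0) begin
            $display("ERROR: $fopen failed (check directory write permission)");
            $finish;
        end

        $fwrite(fd, "ABCDEFGH");

        // Seek to byte offset 3 from the start of the file (operation 0)
        status = $fseek(fd, 3, 0);
        if (status != 0)
            $display("ERROR: $fseek returned %0d", status);

        // Offset 3 of "ABCDEFGH" holds the character D
        ch = $fgetc(fd);
        if (ch == "D")
            $display("PASS: read back D at offset 3");
        else
            $display("FAIL: expected D, got character code %0d", ch);

        $fclose(fd);
        $finish;
    end
endmodule
```

If this passes in xsim, the task itself is supported and the "fseek to x" message more likely points at an unknown (x) offset being handed to it.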
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 02, 2021, 04:32:24 am
The fseek error happens alongside the read permission error (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3710194/#msg3710194)

Note: I already tried setting an absolute path for the following simulation property/parameter:

Code: [Select]
set_property -name {xsim.simulate.xsim.more_options} -value {-testplusarg model_data+/home/phung/Downloads/DDR_backup/DDR_Xilinx_Vivado/DDR_Xilinx_Vivado.sim/sim_1/behav/xsim} -objects [get_filesets sim_1]

INFO: [IP_Flow 19-3899] Cannot get the environment domain name variable for the component vendor name. Setting the vendor name to 'user.org'.
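
To confirm that the plusarg actually reaches the simulation, a tiny testbench can echo it back. This is a sketch that assumes the memory model reads its data directory with $value$plusargs; the thread does not confirm how ddr3.v does it, and the module and variable names here are hypothetical.

```verilog
// Echo the +model_data+<dir> plusarg so you can see what the simulator received.
`timescale 1ns / 1ps
module plusarg_check;
    reg [8*256-1:0] model_dir;  // room for a path of up to 256 characters

    initial begin
        // $value$plusargs returns non-zero when a matching plusarg was found
        if ($value$plusargs("model_data+%s", model_dir))
            $display("model_data directory = %s", model_dir);
        else
            $display("no +model_data option specified");
        $finish;
    end
endmodule
```

If the second message appears, the option set through xsim.more_options never reached the simulator, which would point at the property rather than at the memory model.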

Besides, the fseek error does not seem to be related to DQS as shown below.  I previously suspected that conflicting DQS (https://www.eevblog.com/forum/fpga/ddr3-initialization-sequence-issue/msg3718330/#msg3718330) might be the reason for the fseek error.

Could anyone advise ?

(https://i.imgur.com/ArGJNpW.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 02, 2021, 11:39:25 am
I managed to eliminate all conflicting DQS/DQ issues.

As you can see in the second picture below, $fseek() is recognized as a valid command; changing the name of the command leaves it undetected by Vivado's code pre-processing highlighter.

So, all I can say is that Vivado might be doing something strange when it reads file content during the fseek() operation.

Please correct me if I am wrong.

(https://i.imgur.com/eOy1pOW.png)

(https://i.imgur.com/K70x8Iq.png)

(https://i.imgur.com/FtZtfO4.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 08, 2021, 03:34:34 pm
@BrianHG

As for the fseek() issue, I think I will try to migrate the work to test inside Quartus while I try to check what is wrong inside Xilinx Vivado.

I noticed that there are 3 different types of PLL IP core inside Quartus IP Catalog.
May I know which IP core I should use for dynamic phase shift on the incoming READ DQS strobe signal ?

Which special parameters inside the PLL IP core should I pay attention to in this case ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on October 08, 2021, 04:33:51 pm
Older Cyclone, MAX5/MAX10, Stratix I/II/III and Arria I/II/III PLL:
https://github.com/BrianHGinc/BrianHG-DDR3-Controller/blob/75a3d5fe0ef248d7826fdf5fab9c369686c6aa2f/BrianHG_DDR3/BrianHG_DDR3_PLL.sv#L305

Newer Cyclone V, Arria V and Stratix V PLL: (Note that these FPGA PLLs also contain support for multiple PLL chaining and precision DLL blocks which I am not using.)
https://github.com/BrianHGinc/BrianHG-DDR3-Controller/blob/75a3d5fe0ef248d7826fdf5fab9c369686c6aa2f/BrianHG_DDR3/BrianHG_DDR3_PLL.sv#L472
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 08, 2021, 04:49:07 pm
You are using BOTH altpll (https://github.com/BrianHGinc/BrianHG-DDR3-Controller/blob/75a3d5fe0ef248d7826fdf5fab9c369686c6aa2f/BrianHG_DDR3/BrianHG_DDR3_PLL.sv#L305) and altera_pll (https://github.com/BrianHGinc/BrianHG-DDR3-Controller/blob/75a3d5fe0ef248d7826fdf5fab9c369686c6aa2f/BrianHG_DDR3/BrianHG_DDR3_PLL.sv#L472) ?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on October 08, 2021, 05:31:08 pm
Quote
You are using BOTH altpll (https://github.com/BrianHGinc/BrianHG-DDR3-Controller/blob/75a3d5fe0ef248d7826fdf5fab9c369686c6aa2f/BrianHG_DDR3/BrianHG_DDR3_PLL.sv#L305) and altera_pll (https://github.com/BrianHGinc/BrianHG-DDR3-Controller/blob/75a3d5fe0ef248d7826fdf5fab9c369686c6aa2f/BrianHG_DDR3/BrianHG_DDR3_PLL.sv#L472) ?

I'm using 1 type of PLL at a time.  It depends on which FPGA type you choose to build my project for.
Both are in my code and one is auto-selected based on parameter string 'FPGA_FAMILY'.

You may read altera data sheets on both.
Altera also has a megafunction wizard which will auto-setup the PLL for you and generate a .v code example.
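
The general shape of selecting one PLL type from a family string looks roughly like the sketch below. This is not the code from BrianHG_DDR3_PLL.sv: the wrapper module names are hypothetical stand-ins with placeholder behaviour so the example elaborates on its own, and the family strings follow the two groups listed earlier in the thread.

```verilog
// Stand-in wrappers (hypothetical). In a real project these would
// instantiate the vendor altpll / altera_pll primitives.
module pll_wrap_altpll (input wire refclk, output wire outclk);
    assign outclk = refclk;  // placeholder behaviour only
endmodule

module pll_wrap_altera_pll (input wire refclk, output wire outclk);
    assign outclk = refclk;  // placeholder behaviour only
endmodule

module pll_select #(
    parameter FPGA_FAMILY = "MAX 10"
) (
    input  wire refclk,
    output wire outclk
);
    // Exactly one branch is elaborated, so only one PLL type exists
    // in the final design.
    generate
        if (FPGA_FAMILY == "Cyclone V" ||
            FPGA_FAMILY == "Arria V"   ||
            FPGA_FAMILY == "Stratix V") begin : g_new_pll
            pll_wrap_altera_pll u_pll (.refclk(refclk), .outclk(outclk));
        end else begin : g_old_pll
            pll_wrap_altpll u_pll (.refclk(refclk), .outclk(outclk));
        end
    endgenerate
endmodule
```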
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 09, 2021, 03:01:27 am
Quote
I'm using 1 type of PLL at a time.  It depends on which FPGA type you choose to build my project for.
Both are in my code and one is auto-selected based on parameter string 'FPGA_FAMILY'.

You may read altera data sheets on both.

Where exactly did you find the altera_pll IP core (https://github.com/BrianHGinc/BrianHG-DDR3-Controller/search?q=altera_pll) inside the IP catalog ?
And why are you not using ALTPLL_RECONFIG ?

(https://i.imgur.com/8WsA29F.png)


Quote
Altera also has a megafunction wizard which will auto-setup the PLL for you and generate a .v code example.

You mean the following ALTCLKCTRL megafunction wizard (https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/ug_altclock.pdf) ?

(https://i.imgur.com/05pnJHu.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on October 09, 2021, 04:26:45 am
No, that one is for making a PLL which you can in-system reconfigure.
You only need the ALTPLL which is the one I'm using.
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 10, 2021, 02:53:24 am
For the Xilinx Vivado fseek() issue, I think I have found the root cause, but I have no feasible workaround so far.

There is a 0.1 ns difference between ck_n_obuf (which is connected directly to DDR3 RAM (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L763-L771)) and ck_180 (which drives the FPGA DDR3 RAM controller logic (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L2350))

(https://i.imgur.com/bOB14o2.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 15, 2021, 03:00:10 pm
There is no option for dynamic phase shift ?

(https://i.imgur.com/MYkgCl4.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on October 15, 2021, 06:32:05 pm
You already have it enabled, see the inputs in the diagram...

You can phase shift every clock output individually with the phasecounterselect input.

Read the documentation.

The phasecounterselect doesn't exactly match the c# core number output with altpll since there is an address which allows you to adjust the internal global phase as well as the feedback phase if I remember correctly.

The clock phase shift setting you see on every c# core page is what each core output will default to after power-up or a PLL reset.
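
A rough sketch of logic that requests a single phase step through those inputs is shown below. The handshake is an assumption (hold phasestep until phasedone drops, then wait for phasedone to return high), so check it and the phasecounterselect encoding against the ALTPLL user guide for your device. The module name and the start, dir_up, counter_sel and busy ports are made up for illustration.

```verilog
// One-step dynamic phase shift requester (sketch, assumed handshake).
module pll_phase_stepper (
    input  wire       scanclk,
    input  wire       reset,        // synchronous, active high
    input  wire       start,        // one-cycle request for a single step
    input  wire       dir_up,       // 1 = step up, 0 = step down
    input  wire [2:0] counter_sel,  // passed through unchanged to the PLL
    input  wire       phasedone,    // from the PLL
    output reg  [2:0] phasecounterselect,
    output reg        phaseupdown,
    output reg        phasestep,
    output reg        busy
);
    localparam IDLE      = 2'd0;
    localparam STEP      = 2'd1;
    localparam WAIT_DONE = 2'd2;

    reg [1:0] state;

    always @(posedge scanclk) begin
        if (reset) begin
            state              <= IDLE;
            phasecounterselect <= 3'd0;
            phaseupdown        <= 1'b0;
            phasestep          <= 1'b0;
            busy               <= 1'b0;
        end else begin
            case (state)
                IDLE: if (start) begin
                    // Latch the selection and raise the step request
                    phasecounterselect <= counter_sel;
                    phaseupdown        <= dir_up;
                    phasestep          <= 1'b1;
                    busy               <= 1'b1;
                    state              <= STEP;
                end
                STEP: if (!phasedone) begin
                    // PLL acknowledged the request, release phasestep
                    phasestep <= 1'b0;
                    state     <= WAIT_DONE;
                end
                WAIT_DONE: if (phasedone) begin
                    // Step finished, ready for the next request
                    busy  <= 1'b0;
                    state <= IDLE;
                end
                default: state <= IDLE;
            endcase
        end
    end
endmodule
```

Requests that arrive while busy is high are ignored, so the caller should wait for busy to fall before asking for the next step.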
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 16, 2021, 04:44:33 am
Quote
The phasecounterselect doesn't exactly match the c# core number output with altpll since there is an address which allows you to adjust the internal global phase as well as the feedback phase if I remember correctly.

The clock phase shift setting you see on every c# core page is what each core output will default to after power-up or a PLL reset.

According to the alt_pll user guide (https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/ug_altpll.pdf#page=31) , why do the signals phasecounterselect[3..0], phaseupdown, phasestep need to be of tri0 or tri1 type ?

Besides, what exactly do you mean by internal global phase as well as the feedback phase ?

Code: [Select]
// megafunction wizard: %ALTPLL%
// GENERATION: STANDARD
// VERSION: WM1.0
// MODULE: altpll

// ============================================================
// File Name: pll_tuneable.v
// Megafunction Name(s):
// altpll
//
// Simulation Library Files(s):
// altera_mf
// ============================================================
// ************************************************************
// THIS IS A WIZARD-GENERATED FILE. DO NOT EDIT THIS FILE!
//
// 20.1.1 Build 720 11/11/2020 SJ Standard Edition
// ************************************************************


//Copyright (C) 2020  Intel Corporation. All rights reserved.
//Your use of Intel Corporation's design tools, logic functions
//and other software and tools, and any partner logic
//functions, and any output files from any of the foregoing
//(including device programming or simulation files), and any
//associated documentation or information are expressly subject
//to the terms and conditions of the Intel Program License
//Subscription Agreement, the Intel Quartus Prime License Agreement,
//the Intel FPGA IP License Agreement, or other applicable license
//agreement, including, without limitation, that your use is for
//the sole purpose of programming logic devices manufactured by
//Intel and sold by Intel or its authorized distributors.  Please
//refer to the applicable agreement for further details, at
//https://fpgasoftware.intel.com/eula.


// synopsys translate_off
`timescale 1 ps / 1 ps
// synopsys translate_on
module pll_tuneable (
areset,
inclk0,
pfdena,
phasecounterselect,
phasestep,
phaseupdown,
scanclk,
c0,
c1,
locked,
phasedone);

input   areset;
input   inclk0;
input   pfdena;
input [2:0]  phasecounterselect;
input   phasestep;
input   phaseupdown;
input   scanclk;
output   c0;
output   c1;
output   locked;
output   phasedone;
`ifndef ALTERA_RESERVED_QIS
// synopsys translate_off
`endif
tri0   areset;
tri1   pfdena;
tri0 [2:0]  phasecounterselect;
tri0   phasestep;
tri0   phaseupdown;
`ifndef ALTERA_RESERVED_QIS
// synopsys translate_on
`endif

wire [4:0] sub_wire0;
wire  sub_wire3;
wire  sub_wire4;
wire [0:0] sub_wire7 = 1'h0;
wire [1:1] sub_wire2 = sub_wire0[1:1];
wire [0:0] sub_wire1 = sub_wire0[0:0];
wire  c0 = sub_wire1;
wire  c1 = sub_wire2;
wire  locked = sub_wire3;
wire  phasedone = sub_wire4;
wire  sub_wire5 = inclk0;
wire [1:0] sub_wire6 = {sub_wire7, sub_wire5};

altpll altpll_component (
.areset (areset),
.inclk (sub_wire6),
.pfdena (pfdena),
.phasecounterselect (phasecounterselect),
.phasestep (phasestep),
.phaseupdown (phaseupdown),
.scanclk (scanclk),
.clk (sub_wire0),
.locked (sub_wire3),
.phasedone (sub_wire4),
.activeclock (),
.clkbad (),
.clkena ({6{1'b1}}),
.clkloss (),
.clkswitch (1'b0),
.configupdate (1'b0),
.enable0 (),
.enable1 (),
.extclk (),
.extclkena ({4{1'b1}}),
.fbin (1'b1),
.fbmimicbidir (),
.fbout (),
.fref (),
.icdrclk (),
.pllena (1'b1),
.scanaclr (1'b0),
.scanclkena (1'b1),
.scandata (1'b0),
.scandataout (),
.scandone (),
.scanread (1'b0),
.scanwrite (1'b0),
.sclkout0 (),
.sclkout1 (),
.vcooverrange (),
.vcounderrange ());
defparam
altpll_component.bandwidth_type = "AUTO",
altpll_component.clk0_divide_by = 1,
altpll_component.clk0_duty_cycle = 50,
altpll_component.clk0_multiply_by = 8,
altpll_component.clk0_phase_shift = "0",
altpll_component.clk1_divide_by = 1,
altpll_component.clk1_duty_cycle = 50,
altpll_component.clk1_multiply_by = 8,
altpll_component.clk1_phase_shift = "1250",
altpll_component.compensate_clock = "CLK0",
altpll_component.inclk0_input_frequency = 20000,
altpll_component.intended_device_family = "MAX 10",
altpll_component.lpm_hint = "CBX_MODULE_PREFIX=pll_tuneable",
altpll_component.lpm_type = "altpll",
altpll_component.operation_mode = "NORMAL",
altpll_component.pll_type = "AUTO",
altpll_component.port_activeclock = "PORT_UNUSED",
altpll_component.port_areset = "PORT_USED",
altpll_component.port_clkbad0 = "PORT_UNUSED",
altpll_component.port_clkbad1 = "PORT_UNUSED",
altpll_component.port_clkloss = "PORT_UNUSED",
altpll_component.port_clkswitch = "PORT_UNUSED",
altpll_component.port_configupdate = "PORT_UNUSED",
altpll_component.port_fbin = "PORT_UNUSED",
altpll_component.port_inclk0 = "PORT_USED",
altpll_component.port_inclk1 = "PORT_UNUSED",
altpll_component.port_locked = "PORT_USED",
altpll_component.port_pfdena = "PORT_USED",
altpll_component.port_phasecounterselect = "PORT_USED",
altpll_component.port_phasedone = "PORT_USED",
altpll_component.port_phasestep = "PORT_USED",
altpll_component.port_phaseupdown = "PORT_USED",
altpll_component.port_pllena = "PORT_UNUSED",
altpll_component.port_scanaclr = "PORT_UNUSED",
altpll_component.port_scanclk = "PORT_USED",
altpll_component.port_scanclkena = "PORT_UNUSED",
altpll_component.port_scandata = "PORT_UNUSED",
altpll_component.port_scandataout = "PORT_UNUSED",
altpll_component.port_scandone = "PORT_UNUSED",
altpll_component.port_scanread = "PORT_UNUSED",
altpll_component.port_scanwrite = "PORT_UNUSED",
altpll_component.port_clk0 = "PORT_USED",
altpll_component.port_clk1 = "PORT_USED",
altpll_component.port_clk2 = "PORT_UNUSED",
altpll_component.port_clk3 = "PORT_UNUSED",
altpll_component.port_clk4 = "PORT_UNUSED",
altpll_component.port_clk5 = "PORT_UNUSED",
altpll_component.port_clkena0 = "PORT_UNUSED",
altpll_component.port_clkena1 = "PORT_UNUSED",
altpll_component.port_clkena2 = "PORT_UNUSED",
altpll_component.port_clkena3 = "PORT_UNUSED",
altpll_component.port_clkena4 = "PORT_UNUSED",
altpll_component.port_clkena5 = "PORT_UNUSED",
altpll_component.port_extclk0 = "PORT_UNUSED",
altpll_component.port_extclk1 = "PORT_UNUSED",
altpll_component.port_extclk2 = "PORT_UNUSED",
altpll_component.port_extclk3 = "PORT_UNUSED",
altpll_component.self_reset_on_loss_lock = "OFF",
altpll_component.vco_frequency_control = "MANUAL_PHASE",
altpll_component.vco_phase_shift_step = 1,
altpll_component.width_clock = 5,
altpll_component.width_phasecounterselect = 3;


endmodule

// ============================================================
// CNX file retrieval info
// ============================================================
// Retrieval info: PRIVATE: ACTIVECLK_CHECK STRING "0"
// Retrieval info: PRIVATE: BANDWIDTH STRING "1.000"
// Retrieval info: PRIVATE: BANDWIDTH_FEATURE_ENABLED STRING "1"
// Retrieval info: PRIVATE: BANDWIDTH_FREQ_UNIT STRING "MHz"
// Retrieval info: PRIVATE: BANDWIDTH_PRESET STRING "Low"
// Retrieval info: PRIVATE: BANDWIDTH_USE_AUTO STRING "1"
// Retrieval info: PRIVATE: BANDWIDTH_USE_PRESET STRING "0"
// Retrieval info: PRIVATE: CLKBAD_SWITCHOVER_CHECK STRING "0"
// Retrieval info: PRIVATE: CLKLOSS_CHECK STRING "0"
// Retrieval info: PRIVATE: CLKSWITCH_CHECK STRING "0"
// Retrieval info: PRIVATE: CNX_NO_COMPENSATE_RADIO STRING "0"
// Retrieval info: PRIVATE: CREATE_CLKBAD_CHECK STRING "0"
// Retrieval info: PRIVATE: CREATE_INCLK1_CHECK STRING "0"
// Retrieval info: PRIVATE: CUR_DEDICATED_CLK STRING "c0"
// Retrieval info: PRIVATE: CUR_FBIN_CLK STRING "c0"
// Retrieval info: PRIVATE: DEVICE_SPEED_GRADE STRING "Any"
// Retrieval info: PRIVATE: DIV_FACTOR0 NUMERIC "1"
// Retrieval info: PRIVATE: DIV_FACTOR1 NUMERIC "1"
// Retrieval info: PRIVATE: DUTY_CYCLE0 STRING "50.00000000"
// Retrieval info: PRIVATE: DUTY_CYCLE1 STRING "50.00000000"
// Retrieval info: PRIVATE: EFF_OUTPUT_FREQ_VALUE0 STRING "400.000000"
// Retrieval info: PRIVATE: EFF_OUTPUT_FREQ_VALUE1 STRING "400.000000"
// Retrieval info: PRIVATE: EXPLICIT_SWITCHOVER_COUNTER STRING "0"
// Retrieval info: PRIVATE: EXT_FEEDBACK_RADIO STRING "0"
// Retrieval info: PRIVATE: GLOCKED_COUNTER_EDIT_CHANGED STRING "1"
// Retrieval info: PRIVATE: GLOCKED_FEATURE_ENABLED STRING "0"
// Retrieval info: PRIVATE: GLOCKED_MODE_CHECK STRING "0"
// Retrieval info: PRIVATE: GLOCK_COUNTER_EDIT NUMERIC "1048575"
// Retrieval info: PRIVATE: HAS_MANUAL_SWITCHOVER STRING "1"
// Retrieval info: PRIVATE: INCLK0_FREQ_EDIT STRING "50.000"
// Retrieval info: PRIVATE: INCLK0_FREQ_UNIT_COMBO STRING "MHz"
// Retrieval info: PRIVATE: INCLK1_FREQ_EDIT STRING "100.000"
// Retrieval info: PRIVATE: INCLK1_FREQ_EDIT_CHANGED STRING "1"
// Retrieval info: PRIVATE: INCLK1_FREQ_UNIT_CHANGED STRING "1"
// Retrieval info: PRIVATE: INCLK1_FREQ_UNIT_COMBO STRING "MHz"
// Retrieval info: PRIVATE: INTENDED_DEVICE_FAMILY STRING "MAX 10"
// Retrieval info: PRIVATE: INT_FEEDBACK__MODE_RADIO STRING "1"
// Retrieval info: PRIVATE: LOCKED_OUTPUT_CHECK STRING "1"
// Retrieval info: PRIVATE: LONG_SCAN_RADIO STRING "1"
// Retrieval info: PRIVATE: LVDS_MODE_DATA_RATE STRING "Not Available"
// Retrieval info: PRIVATE: LVDS_MODE_DATA_RATE_DIRTY NUMERIC "0"
// Retrieval info: PRIVATE: LVDS_PHASE_SHIFT_UNIT0 STRING "deg"
// Retrieval info: PRIVATE: LVDS_PHASE_SHIFT_UNIT1 STRING "deg"
// Retrieval info: PRIVATE: MANUAL_PHASE_SHIFT_STEP_EDIT STRING "1.00000000"
// Retrieval info: PRIVATE: MANUAL_PHASE_SHIFT_STEP_UNIT STRING "ps"
// Retrieval info: PRIVATE: MIG_DEVICE_SPEED_GRADE STRING "Any"
// Retrieval info: PRIVATE: MIRROR_CLK0 STRING "0"
// Retrieval info: PRIVATE: MIRROR_CLK1 STRING "0"
// Retrieval info: PRIVATE: MULT_FACTOR0 NUMERIC "8"
// Retrieval info: PRIVATE: MULT_FACTOR1 NUMERIC "8"
// Retrieval info: PRIVATE: NORMAL_MODE_RADIO STRING "1"
// Retrieval info: PRIVATE: OUTPUT_FREQ0 STRING "100.00000000"
// Retrieval info: PRIVATE: OUTPUT_FREQ1 STRING "100.00000000"
// Retrieval info: PRIVATE: OUTPUT_FREQ_MODE0 STRING "0"
// Retrieval info: PRIVATE: OUTPUT_FREQ_MODE1 STRING "0"
// Retrieval info: PRIVATE: OUTPUT_FREQ_UNIT0 STRING "MHz"
// Retrieval info: PRIVATE: OUTPUT_FREQ_UNIT1 STRING "MHz"
// Retrieval info: PRIVATE: PHASE_RECONFIG_FEATURE_ENABLED STRING "1"
// Retrieval info: PRIVATE: PHASE_RECONFIG_INPUTS_CHECK STRING "1"
// Retrieval info: PRIVATE: PHASE_SHIFT0 STRING "0.00000000"
// Retrieval info: PRIVATE: PHASE_SHIFT1 STRING "180.00000000"
// Retrieval info: PRIVATE: PHASE_SHIFT_STEP_ENABLED_CHECK STRING "1"
// Retrieval info: PRIVATE: PHASE_SHIFT_UNIT0 STRING "deg"
// Retrieval info: PRIVATE: PHASE_SHIFT_UNIT1 STRING "deg"
// Retrieval info: PRIVATE: PLL_ADVANCED_PARAM_CHECK STRING "0"
// Retrieval info: PRIVATE: PLL_ARESET_CHECK STRING "1"
// Retrieval info: PRIVATE: PLL_AUTOPLL_CHECK NUMERIC "1"
// Retrieval info: PRIVATE: PLL_ENHPLL_CHECK NUMERIC "0"
// Retrieval info: PRIVATE: PLL_FASTPLL_CHECK NUMERIC "0"
// Retrieval info: PRIVATE: PLL_FBMIMIC_CHECK STRING "0"
// Retrieval info: PRIVATE: PLL_LVDS_PLL_CHECK NUMERIC "0"
// Retrieval info: PRIVATE: PLL_PFDENA_CHECK STRING "1"
// Retrieval info: PRIVATE: PLL_TARGET_HARCOPY_CHECK NUMERIC "0"
// Retrieval info: PRIVATE: PRIMARY_CLK_COMBO STRING "inclk0"
// Retrieval info: PRIVATE: RECONFIG_FILE STRING "pll_tuneable.mif"
// Retrieval info: PRIVATE: SACN_INPUTS_CHECK STRING "0"
// Retrieval info: PRIVATE: SCAN_FEATURE_ENABLED STRING "1"
// Retrieval info: PRIVATE: SELF_RESET_LOCK_LOSS STRING "0"
// Retrieval info: PRIVATE: SHORT_SCAN_RADIO STRING "0"
// Retrieval info: PRIVATE: SPREAD_FEATURE_ENABLED STRING "0"
// Retrieval info: PRIVATE: SPREAD_FREQ STRING "50.000"
// Retrieval info: PRIVATE: SPREAD_FREQ_UNIT STRING "KHz"
// Retrieval info: PRIVATE: SPREAD_PERCENT STRING "0.500"
// Retrieval info: PRIVATE: SPREAD_USE STRING "0"
// Retrieval info: PRIVATE: SRC_SYNCH_COMP_RADIO STRING "0"
// Retrieval info: PRIVATE: STICKY_CLK0 STRING "1"
// Retrieval info: PRIVATE: STICKY_CLK1 STRING "1"
// Retrieval info: PRIVATE: STICKY_CLK2 STRING "0"
// Retrieval info: PRIVATE: STICKY_CLK3 STRING "0"
// Retrieval info: PRIVATE: STICKY_CLK4 STRING "0"
// Retrieval info: PRIVATE: SWITCHOVER_COUNT_EDIT NUMERIC "1"
// Retrieval info: PRIVATE: SWITCHOVER_FEATURE_ENABLED STRING "1"
// Retrieval info: PRIVATE: SYNTH_WRAPPER_GEN_POSTFIX STRING "0"
// Retrieval info: PRIVATE: USE_CLK0 STRING "1"
// Retrieval info: PRIVATE: USE_CLK1 STRING "1"
// Retrieval info: PRIVATE: USE_CLKENA0 STRING "0"
// Retrieval info: PRIVATE: USE_CLKENA1 STRING "0"
// Retrieval info: PRIVATE: USE_MIL_SPEED_GRADE NUMERIC "0"
// Retrieval info: PRIVATE: ZERO_DELAY_RADIO STRING "0"
// Retrieval info: LIBRARY: altera_mf altera_mf.altera_mf_components.all
// Retrieval info: CONSTANT: BANDWIDTH_TYPE STRING "AUTO"
// Retrieval info: CONSTANT: CLK0_DIVIDE_BY NUMERIC "1"
// Retrieval info: CONSTANT: CLK0_DUTY_CYCLE NUMERIC "50"
// Retrieval info: CONSTANT: CLK0_MULTIPLY_BY NUMERIC "8"
// Retrieval info: CONSTANT: CLK0_PHASE_SHIFT STRING "0"
// Retrieval info: CONSTANT: CLK1_DIVIDE_BY NUMERIC "1"
// Retrieval info: CONSTANT: CLK1_DUTY_CYCLE NUMERIC "50"
// Retrieval info: CONSTANT: CLK1_MULTIPLY_BY NUMERIC "8"
// Retrieval info: CONSTANT: CLK1_PHASE_SHIFT STRING "1250"
// Retrieval info: CONSTANT: COMPENSATE_CLOCK STRING "CLK0"
// Retrieval info: CONSTANT: INCLK0_INPUT_FREQUENCY NUMERIC "20000"
// Retrieval info: CONSTANT: INTENDED_DEVICE_FAMILY STRING "MAX 10"
// Retrieval info: CONSTANT: LPM_TYPE STRING "altpll"
// Retrieval info: CONSTANT: OPERATION_MODE STRING "NORMAL"
// Retrieval info: CONSTANT: PLL_TYPE STRING "AUTO"
// Retrieval info: CONSTANT: PORT_ACTIVECLOCK STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_ARESET STRING "PORT_USED"
// Retrieval info: CONSTANT: PORT_CLKBAD0 STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_CLKBAD1 STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_CLKLOSS STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_CLKSWITCH STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_CONFIGUPDATE STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_FBIN STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_INCLK0 STRING "PORT_USED"
// Retrieval info: CONSTANT: PORT_INCLK1 STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_LOCKED STRING "PORT_USED"
// Retrieval info: CONSTANT: PORT_PFDENA STRING "PORT_USED"
// Retrieval info: CONSTANT: PORT_PHASECOUNTERSELECT STRING "PORT_USED"
// Retrieval info: CONSTANT: PORT_PHASEDONE STRING "PORT_USED"
// Retrieval info: CONSTANT: PORT_PHASESTEP STRING "PORT_USED"
// Retrieval info: CONSTANT: PORT_PHASEUPDOWN STRING "PORT_USED"
// Retrieval info: CONSTANT: PORT_PLLENA STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_SCANACLR STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_SCANCLK STRING "PORT_USED"
// Retrieval info: CONSTANT: PORT_SCANCLKENA STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_SCANDATA STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_SCANDATAOUT STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_SCANDONE STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_SCANREAD STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_SCANWRITE STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_clk0 STRING "PORT_USED"
// Retrieval info: CONSTANT: PORT_clk1 STRING "PORT_USED"
// Retrieval info: CONSTANT: PORT_clk2 STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_clk3 STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_clk4 STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_clk5 STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_clkena0 STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_clkena1 STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_clkena2 STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_clkena3 STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_clkena4 STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_clkena5 STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_extclk0 STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_extclk1 STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_extclk2 STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: PORT_extclk3 STRING "PORT_UNUSED"
// Retrieval info: CONSTANT: SELF_RESET_ON_LOSS_LOCK STRING "OFF"
// Retrieval info: CONSTANT: VCO_FREQUENCY_CONTROL STRING "MANUAL_PHASE"
// Retrieval info: CONSTANT: VCO_PHASE_SHIFT_STEP NUMERIC "1"
// Retrieval info: CONSTANT: WIDTH_CLOCK NUMERIC "5"
// Retrieval info: CONSTANT: WIDTH_PHASECOUNTERSELECT NUMERIC "3"
// Retrieval info: USED_PORT: @clk 0 0 5 0 OUTPUT_CLK_EXT VCC "@clk[4..0]"
// Retrieval info: USED_PORT: areset 0 0 0 0 INPUT GND "areset"
// Retrieval info: USED_PORT: c0 0 0 0 0 OUTPUT_CLK_EXT VCC "c0"
// Retrieval info: USED_PORT: c1 0 0 0 0 OUTPUT_CLK_EXT VCC "c1"
// Retrieval info: USED_PORT: inclk0 0 0 0 0 INPUT_CLK_EXT GND "inclk0"
// Retrieval info: USED_PORT: locked 0 0 0 0 OUTPUT GND "locked"
// Retrieval info: USED_PORT: pfdena 0 0 0 0 INPUT VCC "pfdena"
// Retrieval info: USED_PORT: phasecounterselect 0 0 3 0 INPUT GND "phasecounterselect[2..0]"
// Retrieval info: USED_PORT: phasedone 0 0 0 0 OUTPUT GND "phasedone"
// Retrieval info: USED_PORT: phasestep 0 0 0 0 INPUT GND "phasestep"
// Retrieval info: USED_PORT: phaseupdown 0 0 0 0 INPUT GND "phaseupdown"
// Retrieval info: USED_PORT: scanclk 0 0 0 0 INPUT_CLK_EXT VCC "scanclk"
// Retrieval info: CONNECT: @areset 0 0 0 0 areset 0 0 0 0
// Retrieval info: CONNECT: @inclk 0 0 1 1 GND 0 0 0 0
// Retrieval info: CONNECT: @inclk 0 0 1 0 inclk0 0 0 0 0
// Retrieval info: CONNECT: @pfdena 0 0 0 0 pfdena 0 0 0 0
// Retrieval info: CONNECT: @phasecounterselect 0 0 3 0 phasecounterselect 0 0 3 0
// Retrieval info: CONNECT: @phasestep 0 0 0 0 phasestep 0 0 0 0
// Retrieval info: CONNECT: @phaseupdown 0 0 0 0 phaseupdown 0 0 0 0
// Retrieval info: CONNECT: @scanclk 0 0 0 0 scanclk 0 0 0 0
// Retrieval info: CONNECT: c0 0 0 0 0 @clk 0 0 1 0
// Retrieval info: CONNECT: c1 0 0 0 0 @clk 0 0 1 1
// Retrieval info: CONNECT: locked 0 0 0 0 @locked 0 0 0 0
// Retrieval info: CONNECT: phasedone 0 0 0 0 @phasedone 0 0 0 0
// Retrieval info: GEN_FILE: TYPE_NORMAL pll_tuneable.v TRUE
// Retrieval info: GEN_FILE: TYPE_NORMAL pll_tuneable.ppf TRUE
// Retrieval info: GEN_FILE: TYPE_NORMAL pll_tuneable.inc TRUE
// Retrieval info: GEN_FILE: TYPE_NORMAL pll_tuneable.cmp TRUE
// Retrieval info: GEN_FILE: TYPE_NORMAL pll_tuneable.bsf TRUE
// Retrieval info: GEN_FILE: TYPE_NORMAL pll_tuneable_inst.v TRUE
// Retrieval info: GEN_FILE: TYPE_NORMAL pll_tuneable_bb.v TRUE
// Retrieval info: LIB_FILE: altera_mf
// Retrieval info: CBX_MODULE_PREFIX: ON
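
On the tri0/tri1 declarations in the generated wrapper above: they sit between the synopsys translate_off / translate_on pragmas, so they appear to matter for simulation only, where they give an unconnected input a defined default (0 for areset and the phase inputs, 1 for pfdena) instead of z. A minimal sketch of that net behaviour, with made-up signal names:

```verilog
// Shows how tri0 / tri1 nets resolve when nothing drives them.
`timescale 1ns / 1ps
module tri_default_demo;
    tri0 step_in;      // pulls to 0 when undriven
    tri1 enable_in;    // pulls to 1 when undriven
    wire plain_wire;   // ordinary wire floats to z when undriven

    reg drive_en;

    // Drive step_in only while drive_en is set, otherwise release it
    assign step_in = drive_en ? 1'b1 : 1'bz;

    initial begin
        drive_en = 1'b0;
        #1 $display("undriven: tri0=%b tri1=%b wire=%b",
                    step_in, enable_in, plain_wire);
        drive_en = 1'b1;
        #1 $display("driven:   tri0=%b", step_in);
        $finish;
    end
endmodule
```

A real driver always wins over the pull, so connecting the port in your own design overrides the default.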
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 16, 2021, 04:03:29 pm
Why is Quartus giving the following error relating to i_user_data_address (https://github.com/promach/DDR/blob/main/test_ddr3_memory_controller.v#L366) and write_enable (https://github.com/promach/DDR/blob/main/test_ddr3_memory_controller.v#L373) signals alone ?

Note: Neither Vivado nor ISE reports any error during synthesis.

Code: [Select]
Error (10028): Can't resolve multiple constant drivers for net "i_user_data_address[16]" at test_ddr3_memory_controller.v(351)
Error (10029): Constant driver at test_ddr3_memory_controller.v(351)
Error (10028): Can't resolve multiple constant drivers for net "i_user_data_address[15]" at test_ddr3_memory_controller.v(351)
Error (10028): Can't resolve multiple constant drivers for net "i_user_data_address[14]" at test_ddr3_memory_controller.v(351)
Error (10028): Can't resolve multiple constant drivers for net "i_user_data_address[13]" at test_ddr3_memory_controller.v(351)
Error (10028): Can't resolve multiple constant drivers for net "i_user_data_address[12]" at test_ddr3_memory_controller.v(351)
Error (10028): Can't resolve multiple constant drivers for net "i_user_data_address[11]" at test_ddr3_memory_controller.v(351)
Error (10028): Can't resolve multiple constant drivers for net "i_user_data_address[10]" at test_ddr3_memory_controller.v(351)
Error (10028): Can't resolve multiple constant drivers for net "i_user_data_address[9]" at test_ddr3_memory_controller.v(351)
Error (10028): Can't resolve multiple constant drivers for net "i_user_data_address[8]" at test_ddr3_memory_controller.v(351)
Error (10028): Can't resolve multiple constant drivers for net "i_user_data_address[7]" at test_ddr3_memory_controller.v(351)
Error (10028): Can't resolve multiple constant drivers for net "i_user_data_address[6]" at test_ddr3_memory_controller.v(351)
Error (10028): Can't resolve multiple constant drivers for net "i_user_data_address[5]" at test_ddr3_memory_controller.v(351)
Error (10028): Can't resolve multiple constant drivers for net "i_user_data_address[4]" at test_ddr3_memory_controller.v(351)
Error (10028): Can't resolve multiple constant drivers for net "i_user_data_address[3]" at test_ddr3_memory_controller.v(351)
Error (10028): Can't resolve multiple constant drivers for net "i_user_data_address[2]" at test_ddr3_memory_controller.v(351)
Error (10028): Can't resolve multiple constant drivers for net "i_user_data_address[1]" at test_ddr3_memory_controller.v(351)
Error (10028): Can't resolve multiple constant drivers for net "i_user_data_address[0]" at test_ddr3_memory_controller.v(351)
Error (10028): Can't resolve multiple constant drivers for net "write_enable" at test_ddr3_memory_controller.v(351)
Error (12153): Can't elaborate top-level user hierarchy
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on October 16, 2021, 06:18:08 pm
It means that in 2 or more different places in your code, you assigned 'i_user_data_address' one value, then another.

Or you made a blocking (=) assignment and, somewhere else in the logic, a non-blocking (<=) assignment to the reg 'i_user_data_address'.

Or you are assigning i_user_data_address <= a value while it is also tied to a sub-module's .xxx(i_user_data_address) port which is set as an output. That module is then driving i_user_data_address while your top module's main code is also assigning it with <=, i.e. 2 drivers on the same reg.

The last case is assigning the reg <= a value in one clock domain while also assigning it <= another value in another clock domain.
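
As a minimal sketch of the single-driver rule (the module and signal names below are hypothetical and are not taken from the project being debugged), every assignment to a reg is kept inside one always block:

```verilog
// Hypothetical module; names are illustrative only.
module single_driver_sketch (
    input  wire        clk,
    input  wire        reset,
    input  wire        load,
    input  wire [13:0] next_address,
    output reg  [13:0] address
);
    // A second driver for 'address' (another always block, a continuous
    // assign, or a sub-module output port) is what triggers the
    // "multiple constant drivers" error. For example, adding this
    // block would create a second driver:
    //
    //   always @(posedge clk) if (load) address <= next_address;
    //
    // Keeping every assignment in one always block leaves a single driver.
    always @(posedge clk) begin
        if (reset)     address <= 14'd0;
        else if (load) address <= next_address;
    end
endmodule
```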
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 17, 2021, 03:55:54 am
All of the 'i_user_data_address' assignment logic is in the same always block.

And it is an input to the submodule.

I have commented out main_state inside the if-statement, which might have been the cause of a clock domain conflict, but that did not help.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on October 17, 2021, 05:38:58 am
Maybe it is 'clk_serdes' on line '351'.
How is 'clk_serdes' generated?
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 17, 2021, 06:39:37 am
'clk_serdes' is generated from ALT_PLL (https://github.com/promach/DDR/blob/main/ddr3_memory_controller.v#L931) core.

Note: The github version is targeted at Xilinx platform, so you might need to turn on the ifdef ALTERA option manually for code debugging purpose.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on October 17, 2021, 12:27:44 pm
Well, the circuit inside the ifdef will be ignored if it is not selected.

The Modelsim which came with Altera knows how to properly simulate Altera IP like its ALTPLL and DDRIO.  If you select Cyclone IV as the FPGA, it can be set up to simulate with actual true IO timing.

Have you tried changing which default Verilog version Quartus uses?
In 'Assignment Settings / Compiler Settings / Verilog HDL Input', you may choose:
Verilog 1995
Verilog 2005
SystemVerilog

If this does work, it will not tell you what in your code failed with, for example, 'Verilog 1995'.  (Well, maybe you will see a 'warning' in the system messages window during compile which might offer a better clue.)  It will then be up to you whether you want to hunt down the issue further.

A warning about Cyclone IV/V and Max10: they use different types of DDR IP.  Quartus may compile the Cyclone DDR IP when building for a Max10 FPGA, but it will not work on silicon and there is no warning.  I did complain on Intel's forum, but not much came of it.
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 17, 2021, 12:31:36 pm
Modelsim will not launch unless Quartus has finished the synthesis process.

I am using SystemVerilog for the file type, if that is what you are asking.

I am really stuck here: the ISE tool cannot simulate the Micron simulation model in SystemVerilog, the Vivado tool cannot get past an fseek() error, and the Quartus tool fails with the multiple-driver synthesis error.

Let me also change the DDR IP as you suggested for Max10 and see if this helps with the synthesis error.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on October 17, 2021, 12:43:18 pm
Yes, Altera Modelsim will launch without Quartus, just click on its icon.  You just need to know the setup commands and how to include project files in the transcript window.  In fact, I now go to Quartus last, as working in Modelsim alone takes only around 1-2 seconds to completely re-compile a build.  The instructions are in my DDR3 build, in the simulation instructions in my setup_****.do and run_****.do script files.  The 'vsim' line has the library includes for the Altera IP, while the 'vlog' lines include my project source files.  If you want full timing simulations, then yes, the design must first be completely compiled for an FPGA in Quartus.
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 17, 2021, 01:11:50 pm
Ok, it seems that the multiple driver synthesis error might be originating from the generate for loop (https://github.com/promach/DDR/blob/main/test_ddr3_memory_controller.v#L344-L348)
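
A sketch of how a generate for loop can produce this error, assuming plain Verilog always blocks (the code at the linked lines is not reproduced here; the module, parameter, and signal names below are hypothetical, apart from reusing the name write_enable from the error log). Each loop iteration elaborates its own always block, so a signal that is not indexed by the genvar ends up with one driver per iteration:

```verilog
// Hypothetical module; illustrates the pattern only.
module generate_driver_sketch #(
    parameter NUM_LANES = 4   // hypothetical lane count
) (
    input  wire                 clk,
    input  wire [NUM_LANES-1:0] lane_in,
    output reg  [NUM_LANES-1:0] lane_q,
    output reg                  write_enable
);
    genvar i;
    generate
        for (i = 0; i < NUM_LANES; i = i + 1) begin : lanes
            // One always block exists per iteration, so each block
            // should drive only the bit selected by its own genvar.
            always @(posedge clk) begin
                lane_q[i] <= lane_in[i];
                // write_enable <= 1'b1;  // would be driven NUM_LANES times
            end
        end
    endgenerate

    // Signals shared by all lanes get exactly one driver outside the loop.
    always @(posedge clk) write_enable <= |lane_in;
endmodule
```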
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 17, 2021, 01:35:05 pm
I have solved the multiple driver synthesis error in Quartus.

May I know where I could find tri-state buffer primitive (https://www.intel.com/content/www/us/en/programmable/quartushelp/13.0/mergedProjects/hdl/prim/prim_file_alt_outbuf_tri.htm) for Altera MAX10 ?

I tried to search for keyword "buf" inside the IP catalog, but nothing came up.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on October 17, 2021, 01:54:19 pm
Functions to look at in attached images.
The Max10 has everything squished into their GPIO.
Run the wizard and let it generate sample verilog code for you.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on October 17, 2021, 01:58:17 pm
As for a simple tristate anywhere, there is always:

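inout  iopin,               // bidirectional port in the module's port list

reg iopin_oe;               // 1 = drive the pin, 0 = release it (high-Z)
reg iopin_outdata;          // value driven onto the pin while iopin_oe is 1
assign iopin = iopin_oe ? iopin_outdata : 1'bz ;

...
readback_iopin <= iopin ;   // inside a clocked always block: sample the pin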
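
For reference, a self-contained version of the fragment above might look like the following sketch. The module name and the clk, drive, and data ports are hypothetical additions; only iopin, iopin_oe, iopin_outdata, and readback_iopin come from the fragment.

```verilog
// Hypothetical wrapper around the tri-state fragment.
module tristate_pin_sketch (
    input  wire clk,
    input  wire drive,          // hypothetical: request to drive the pin
    input  wire data,           // hypothetical: value to drive
    inout  wire iopin,
    output reg  readback_iopin
);
    reg iopin_oe;
    reg iopin_outdata;

    // Drive the pin only while the output enable is set, else release it.
    assign iopin = iopin_oe ? iopin_outdata : 1'bz;

    always @(posedge clk) begin
        iopin_oe       <= drive;
        iopin_outdata  <= data;
        readback_iopin <= iopin;   // sample whatever is on the pin
    end
endmodule
```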

Title: Re: DDR3 initialization sequence issue
Post by: promach on October 17, 2021, 02:54:15 pm
I tried to search for more information about MAX 10 GPIO (https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/max-10/archives/ug-m10-gpio-15.1.pdf), but there is not much info about the tri-state buffer.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on October 17, 2021, 03:31:36 pm
The tristate is an input to the GPIO.
Run the megawizard and read the example Verilog code it generates.
OK, just use the Verilog code example 2 posts up.
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 17, 2021, 03:54:18 pm
The simple Verilog code example is not enough for double-data-rate purposes.

Is anything wrong with the following settings for the DQS tri-state buffer?

(https://i.imgur.com/0Yrvpbs.png)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on October 17, 2021, 04:09:18 pm
Disable open-drain output.
Open-drain means it will not drive a high, only low or open.
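
The difference can be sketched in plain Verilog as below. This is a behavioural illustration with hypothetical module and port names, not the GPIO IP itself:

```verilog
// Hypothetical module contrasting the two output styles.
module open_drain_vs_push_pull_sketch (
    input  wire oe,
    input  wire data,
    inout  wire pin_open_drain,
    inout  wire pin_push_pull
);
    // Open-drain: only ever pulls low; a '1' releases the pin,
    // so a high level relies on an external pull-up.
    assign pin_open_drain = (oe && !data) ? 1'b0 : 1'bz;

    // Push-pull tri-state: drives both levels while oe is high.
    assign pin_push_pull  = oe ? data : 1'bz;
endmodule
```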

The rest is functional.
The DQ and DM should be the same, except for the use-differential setting.

Disabling the open-drain may allow the use of a register to drive the OE at the IO buffer.
You can verify your code with my choice here:
https://github.com/BrianHGinc/BrianHG-DDR3-Controller/blob/75a3d5fe0ef248d7826fdf5fab9c369686c6aa2f/BrianHG_DDR3/BrianHG_DDR3_IO_PORT_ALTERA.sv#L278 (https://github.com/BrianHGinc/BrianHG-DDR3-Controller/blob/75a3d5fe0ef248d7826fdf5fab9c369686c6aa2f/BrianHG_DDR3/BrianHG_DDR3_IO_PORT_ALTERA.sv#L278)

Except for parameter 'INVERT_INPUT_CLOCK' as you may be using a separate PLL phase to drive the clock input depending on your design.
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 17, 2021, 04:30:00 pm
May I know why you set ENABLE_OE_PORT, INVERT_INPUT_CLOCK, USE_ONE_REG_TO_DRIVE_OE, USE_DDIO_REG_TO_DRIVE_OE and USE_ADVANCED_DDR_FEATURES_FOR_INPUT_ONLY to TRUE?
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on October 17, 2021, 04:47:42 pm
https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/max-10/archives/ug-m10-gpio-15.1.pdf (https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/max-10/archives/ug-m10-gpio-15.1.pdf)

Read page 50.
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on October 17, 2021, 04:51:38 pm
The end result of my parameter selection is a DDR buffer which functions identically to the Altera Cyclone altddio_bidir buffer with the following settings:

https://github.com/BrianHGinc/BrianHG-DDR3-Controller/blob/75a3d5fe0ef248d7826fdf5fab9c369686c6aa2f/BrianHG_DDR3/BrianHG_DDR3_IO_PORT_ALTERA.sv#L390 (https://github.com/BrianHGinc/BrianHG-DDR3-Controller/blob/75a3d5fe0ef248d7826fdf5fab9c369686c6aa2f/BrianHG_DDR3/BrianHG_DDR3_IO_PORT_ALTERA.sv#L390)

Now, the 1 piece of code I made will operate identically for Max10 and Cyclone V/IV/III.
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 18, 2021, 04:24:08 pm
I do not understand why you use .din( 2'b10 ) (https://github.com/BrianHGinc/BrianHG-DDR3-Controller/blob/75a3d5fe0ef248d7826fdf5fab9c369686c6aa2f/BrianHG_DDR3/BrianHG_DDR3_IO_PORT_ALTERA.sv#L294)
Title: Re: DDR3 initialization sequence issue
Post by: BrianHG on October 18, 2021, 04:47:27 pm
Since it is a 1-bit DDR buffer, and din is the SDR input which gets converted to the DDR data output, a 0 will be sent when DDR_CLK goes high and a 1 will be sent when DDR_CLK goes low.
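
A simplified behavioural sketch of that idea follows. It is not the vendor primitive (the real DDR output path has its own register pipeline), and the mapping of din[0] to the high phase and din[1] to the low phase is an assumption chosen to match the description above. All names are hypothetical.

```verilog
// Hypothetical behavioural model of a 1-bit SDR-to-DDR output stage.
module ddr_out_sketch (
    input  wire       clk,
    input  wire [1:0] din,   // din[0]: sent while clk is high, din[1]: while low
    output wire       q
);
    reg q_rise;
    reg q_fall;

    always @(posedge clk) q_rise <= din[0];
    always @(negedge clk) q_fall <= din[1];

    // Select the bit belonging to the current clock phase.
    assign q = clk ? q_rise : q_fall;
endmodule
```

With din tied to 2'b10, q is 0 during the high phase and 1 during the low phase, so the pin toggles every half cycle like an inverted copy of clk.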
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 18, 2021, 05:03:29 pm
What about inclock and outclock (https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/max-10/archives/ug-m10-gpio-15.1.pdf#page=52) ?
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 20, 2021, 03:03:38 pm
Why does a simple tri-state buffer require a clock to gate the data signals?  I suppose it is the job of the OE (output enable) signal to do this?
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 24, 2021, 03:46:19 pm
As for the fseek() issue, it seems that row = row_pipeline[0]; (https://github.com/promach/DDR/blob/main/ddr3.v#L1765) contains the XXX value.
But I am not sure what causes this.

Note: According to the simulation waveform, there is no XXX value in any of the DDR command input signals.

(https://i.imgur.com/VujGmbR.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 26, 2021, 01:53:36 pm
I have gotten around the fseek() issue by exporting the Vivado simulation libraries to Modelsim.  It seems to be caused by an internal Vivado issue, not by any of the user application code.

Modelsim waveform using SOFTWARE PLL approach

(https://i.imgur.com/7Pn7mr3.png)


Modelsim waveform using HARDWARE PLL approach

(https://i.imgur.com/v0CiNbZ.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on October 31, 2021, 09:56:30 am
What causes the col address (https://github.com/promach/DDR/blob/main/test_ddr3_memory_controller.v#L482) to follow this order : 5, 6, 7, 4, 1, 2, 3, 0 ?

(https://i.imgur.com/Vmq7n5S.png)
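
One possible explanation, offered as a sketch and not a confirmed diagnosis: this order matches the DDR3 sequential burst order for an 8-beat read whose starting column has low bits 3'b101. In that mode the lower two column bits wrap within a group of four, then the burst moves to the other group of four. The function below models this, assuming the mode register selects the sequential burst type with BL8; the module and function names are hypothetical.

```verilog
// Hypothetical simulation-only model of the BL8 sequential burst order.
module burst_order_sketch;
    // Column offset within the 8-beat burst for a given beat number.
    function [2:0] burst_col;
        input [2:0] start;   // low 3 bits of the starting column address
        input [2:0] beat;    // beat number 0..7
        begin
            // Bit 2 selects the group of four, bits 1:0 wrap modulo 4.
            burst_col = {start[2] ^ beat[2], start[1:0] + beat[1:0]};
        end
    endfunction

    integer n;
    initial begin
        for (n = 0; n < 8; n = n + 1)
            $display("beat %0d -> col %0d", n, burst_col(3'd5, n[2:0]));
    end
endmodule
```

For a starting offset of 5 this prints the columns in the order 5, 6, 7, 4, 1, 2, 3, 0.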
Title: Re: DDR3 initialization sequence issue
Post by: promach on January 26, 2022, 01:13:47 am
For this vcd file (https://drive.google.com/file/d/12wtfqdXvYdk0RkRwew57E8jIoPDFZAWK/view?usp=sharing), why is the DRAM read operation (709746ns) not reading back the DQ values written at the same address (4096) during DRAM write operation (709578ns) ?

read and write latency = five ck cycles

(https://i.imgur.com/DGk91Pb.png)

(https://i.imgur.com/0khjWm6.png)
Title: Re: DDR3 initialization sequence issue
Post by: promach on January 27, 2022, 03:48:36 pm
The issue seems to have been resolved by incrementing the DRAM address by BURST_LENGTH instead of by just 1 (https://media-www.micron.com/-/media/client/global/documents/products/data-sheet/dram/ddr3/2gb_ddr3_sdram.pdf?rev=4bc67ac3a6f34250a2b73cb9db8c5502#page=139)
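
A minimal sketch of that change, assuming an 8-beat burst. The module name, the ADDRESS_BITWIDTH parameter, and the burst_done strobe are hypothetical and are not taken from the repository:

```verilog
// Hypothetical address counter that steps by one whole burst.
module burst_address_counter_sketch #(
    parameter ADDRESS_BITWIDTH = 14,   // hypothetical width
    parameter BURST_LENGTH     = 8     // beats per burst
) (
    input  wire                        clk,
    input  wire                        reset,
    input  wire                        burst_done,   // hypothetical: one burst was issued
    output reg  [ADDRESS_BITWIDTH-1:0] column_address
);
    always @(posedge clk) begin
        if (reset)
            column_address <= {ADDRESS_BITWIDTH{1'b0}};
        else if (burst_done)
            // Each burst covers BURST_LENGTH columns, so step by that
            // amount to keep consecutive bursts from overlapping.
            column_address <= column_address + BURST_LENGTH;
    end
endmodule
```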