Author Topic: DDR3 initialization sequence issue  (Read 63534 times)


Offline promach (Topic starter)

  • Frequent Contributor
  • **
  • Posts: 875
  • Country: us
Re: DDR3 initialization sequence issue
« Reply #350 on: July 22, 2021, 04:24:56 pm »
Is it possible to use only the PLL's 0 degree output clock (without generating a separate 90 degree clock) and still obtain the 90 degree phase shift, given that the period of the 0 degree output clock is already known beforehand?

I suppose the dynamic phase shift inside the PLL has some fixed phase increment value?
« Last Edit: July 22, 2021, 04:36:17 pm by promach »
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3143
  • Country: ca
Re: DDR3 initialization sequence issue
« Reply #351 on: July 22, 2021, 05:20:58 pm »
Quote
Do I need BOTH IOCLK0 and IOCLK1 for the IODELAY2 primitive to work properly?
Are IOCLK0 and IOCLK1 180 degrees out of phase with each other?

Most likely, these are the same clock lines that drive IDDR and ODDR, so both get connected no matter what you do.

Quote
I do not understand how "Optionally invertible" uses a single BUFG instead of two?

You use "single_BUFG" to drive one clock and "not single_BUFG" to drive the other clock, as opposed to "phase_0_BUFG" driving one clock and "phase_180_BUFG" to drive the other.
 

promach (Topic starter)
« Reply #352 on: July 23, 2021, 01:06:08 am »
Quote
You use "single_BUFG" to drive one clock and "not single_BUFG" to drive the other clock, as opposed to "phase_0_BUFG" driving one clock and "phase_180_BUFG" to drive the other.

Does this mean that I could use the same clock (0 degree phase) signal for both IOCLK0 and IOCLK1 of IODELAY2 primitive ?
 

NorthGuy
« Reply #353 on: July 23, 2021, 02:10:41 pm »
Quote
You use "single_BUFG" to drive one clock and "not single_BUFG" to drive the other clock, as opposed to "phase_0_BUFG" driving one clock and "phase_180_BUFG" to drive the other.

Does this mean that I could use the same clock (0 degree phase) signal for both IOCLK0 and IOCLK1 of IODELAY2 primitive ?

Yes, if you invert it for IOCLK1 (and do the same for IDDR and ODDR as well).
 

promach (Topic starter)
« Reply #354 on: July 24, 2021, 04:57:18 pm »
Quote
I'm using the PLL to generate both my fixed 0 and 90 degree clocks, not the DCM.  Depending on resolution, I would either use the PLL's third output itself, or use the DCM connected to the PLL 0 degree output to generate my phase tuneable clock for the read data.  In other words, the DCM module only has a single 0 degree clock output enabled, which exclusively drives my read clock.  No other outputs enabled, no frequency conversion.  Wired like this, there should be no problem using the tuning input controls with the finest possible tuning steps, since the act of tuning moves that 0 degree output to where I need it.  I believe that since you have 2 DCMs for each PLL, you can then individually tune each one to clock each 8-bit group of the DDR3.

@BrianHG

For the read operation, I do not understand how a 90 degree clock alone (without any delay elements such as Xilinx's IODELAY2 primitive) can clock in all 8 bits of the incoming DQ signals?
« Last Edit: July 24, 2021, 05:37:56 pm by promach »
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7727
  • Country: ca
Re: DDR3 initialization sequence issue
« Reply #355 on: July 24, 2021, 07:02:02 pm »
Quote
I'm using the PLL to generate both my fixed 0 and 90 degree clocks, not the DCM.  Depending on resolution, I would either use the PLL's third output itself, or use the DCM connected to the PLL 0 degree output to generate my phase tuneable clock for the read data.  In other words, the DCM module only has a single 0 degree clock output enabled, which exclusively drives my read clock.  No other outputs enabled, no frequency conversion.  Wired like this, there should be no problem using the tuning input controls with the finest possible tuning steps, since the act of tuning moves that 0 degree output to where I need it.  I believe that since you have 2 DCMs for each PLL, you can then individually tune each one to clock each 8-bit group of the DDR3.

@BrianHG

For the read operation, I do not understand how a 90 degree clock alone (without any delay elements such as Xilinx's IODELAY2 primitive) can clock in all 8 bits of the incoming DQ signals?

90 degree is for the write, not the read.
The read is tuned at initialization based on the read calibration.
I haven't looked at or used the IODELAY2 primitive.
 
 

NorthGuy
« Reply #356 on: July 24, 2021, 08:58:10 pm »
Quote
90 degree is for the write, not the read.
The read is tuned at initialization based on the read calibration.
I haven't looked at or used the IODELAY2 primitive.

Yes, there may be an arbitrary phase relationship on reads, because the delay is caused by the round trip

CLK -> CLK output buffers -> CLK traces -> DDR3 chip -> DQ/DQS traces -> DQ/DQS input buffers -> DQ

and hence is not very predictable (not to mention fly-by connections).

You need to align the signal and the clock somehow. BrianHG uses a phase shift in his clock to do so. Delaying the signal with IODELAY is another method.
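
A rough sketch of the clock phase shift method, assuming calibration works by reading a known test pattern at every phase tap and recording pass or fail. The function names, the tap count, and the pass/fail vector are made up for illustration; this is not BrianHG's actual calibration code, and a real controller may choose its sample point differently.

```python
def find_passing_window(results):
    """Return (start, length) of the longest circular run of True in results."""
    n = len(results)
    if n and all(results):
        return 0, n
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    # Walk the list twice so a window that wraps past the last tap is found.
    for i in range(2 * n):
        if results[i % n]:
            if run_len == 0:
                run_start = i % n
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len


def pick_sample_tap(results):
    """Pick the tap in the middle of the widest passing window."""
    start, length = find_passing_window(results)
    if length == 0:
        raise ValueError("no phase tap read the test pattern correctly")
    return (start + (length - 1) // 2) % len(results)


# Hypothetical sweep over 16 phase taps: taps 5..11 read the pattern correctly.
sweep = [5 <= tap <= 11 for tap in range(16)]
print("sample at tap", pick_sample_tap(sweep))
```

Picking the middle of the passing window leaves roughly equal margin on both sides, which is the same idea as centering the sample point in the data eye.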
 

promach (Topic starter)
« Reply #357 on: July 25, 2021, 03:36:11 am »
Quote
You need to align the signal and the clock somehow. BrianHG uses a phase shift in his clock to do so. Delaying the signal with IODELAY is another method.

The PLL dynamic phase shift approach by @BrianHG can only align to the incoming DQS strobe. What about all 8 bits of the DQ signal?
 

NorthGuy
« Reply #358 on: July 25, 2021, 02:53:13 pm »
Quote
You need to align the signal and the clock somehow. BrianHG uses a phase shift in his clock to do so. Delaying the signal with IODELAY is another method.

The PLL dynamic phase shift approach by @BrianHG can only align to the incoming DQS strobe. What about all 8 bits of the DQ signal?

Incoming DQ and DQS are in phase with each other.
 

promach (Topic starter)
« Reply #359 on: July 25, 2021, 05:04:45 pm »
I know that the incoming DQS and DQ are in phase with each other, at least in most ordinary situations.

However, what I am asking is: in the case of DQS centering, how exactly does the PLL dynamic phase shift mechanism help to align BOTH the DQS and DQ signals without using any delay elements on the DQ signals?
« Last Edit: July 25, 2021, 05:06:55 pm by promach »
 

promach (Topic starter)
« Reply #360 on: July 26, 2021, 11:54:20 am »
In https://www.xilinx.com/support/documentation/user_guides/ug382.pdf#page=65 , a clock with a 90 degree phase shift is already generated internally.

I am really confused as to how DQS centering for the read operation works with the PLL dynamic phase shift.

 

promach (Topic starter)
« Reply #361 on: July 27, 2021, 11:25:27 am »
Quote
the tunable PLL output is used for read sampling
The tunable PLL output clock goes to the 'input clock' for the DQ & DQS DDR input buffers and subsequent read data FIFO's input clock.
Remember, when reading data, the DQS is in perfect sync with the read DQ.

Using a tuneable PLL output (@BrianHG's approach) for read sampling is not as robust as the conventional DQS centering achieved with the IODELAY2 delay primitive.

Please correct me if wrong.
« Last Edit: July 27, 2021, 11:27:41 am by promach »
 

promach (Topic starter)
« Reply #362 on: July 27, 2021, 04:51:02 pm »
The LiteDRAM author told me that on the Spartan-6 platform he uses a fixed bitslip/delay, which requires manual adjustment by the user.

The reason is that the IODELAY2 primitive on the Spartan-6 platform has internal hardware issues.

So I think my only option is the PLL dynamic phase shift feature.
« Last Edit: July 27, 2021, 05:03:54 pm by promach »
 

BrianHG
« Reply #363 on: July 27, 2021, 04:56:10 pm »
From what I read so far...

IODELAY should be used to tune out the PCB trace length differences of each individual pin, not to adjust the read phase position.

Using just a tunable PLL output will only give you 16-32 steps per 360 degrees per clock.
Using a tunable DCM attached to your main PLL output will give you the enhanced 256 steps per 360 degree phase.
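
To put those step counts into time units, here is a small back-of-the-envelope calculation. The step counts (16, 32, 256) are the ones quoted above; the 333 MHz clock is an assumed figure for illustration, and `phase_step` is just a helper name, not part of any vendor tool.

```python
def phase_step(steps_per_cycle, clock_hz):
    """Return (degrees, picoseconds) covered by one phase step."""
    degrees = 360.0 / steps_per_cycle
    picoseconds = 1e12 / clock_hz / steps_per_cycle
    return degrees, picoseconds


CLOCK_HZ = 333e6  # assumed clock frequency, change to match your design

for steps in (16, 32, 256):
    deg, ps = phase_step(steps, CLOCK_HZ)
    print(f"{steps:3d} steps: {deg:7.3f} deg/step, {ps:6.1f} ps/step")
```

At the assumed 333 MHz, 256 steps works out to roughly 11.7 ps per step, versus about 188 ps per step with only 16 steps.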

There is a real advantage in achievable frequency and signal quality if you use the DQS as a clock input signal instead of using a tuned PLL output as a read sample clock.  However, this will rely on use of the IODELAY if the FPGA cannot accept that the DQS coming in toggles at the same time as the data.  Using this feature will lower compatibility between FPGA types, or you might need a far greater amount of code for each type of FPGA.  Older, cheaper, slower FPGAs might not handle this as well as the modern FPGAs designed to implement DDR3/DDR4 topologies.

My DDR3 controller has a 50MHz power-up processor handling the power-up tuning.  It is a sequencer to which code can be added or removed to support any type of buffer tuning during power-up.  It does, however, cycle the power-up commands at a maximum of ~25 million a second instead of 1 command per every ~4 DDR3_CK clocks.  This relieves the fitting & routing load of a variable power-up program that would otherwise always have to run at the full DDR_CK frequency, so I need not worry about how complex that part of the firmware becomes when it has to measure, calculate, and set the IO buffer features if I ever wish to add support for higher end features.
« Last Edit: July 27, 2021, 05:13:37 pm by BrianHG »
 

promach (Topic starter)
« Reply #364 on: July 27, 2021, 05:22:11 pm »
Quote
this will rely on use of the IODELAY if the FPGA cannot accept that the DQS coming in toggles at the same time as the data.

@BrianHG, I am confused by the sentence quoted above. Does "same time" imply the need for IODELAY in the case of DQS centering?
 

BrianHG
« Reply #365 on: July 27, 2021, 05:35:40 pm »
I only know how this worked on older Altera fixed timing designs.

We used to specify the tSU and tH from the DQS to its grouped DQ pins, with PCB trace length timing adjustments, in the .sdc file and let the compiler set up the buffers for us during compile.

Altera also offers a DDR3/4 IO PHY which auto-wires all of the PLL and delay line clocking buffers for any DDR2/3/4 RAM in one single neat little function package, with a control port which allows us to dynamically tune the PHY's IO delays from software in real time, or to fix them by parameter.  If I were to use it, all I would provide is the RAM type, the # of DQS / DQ groups and # of CK/command pins, with clock frequencies and PCB trace lengths.  The usage instructions are nothing like Xilinx's.

I do not know how Xilinx implements their DQS clocked DQ input topology.
 

promach (Topic starter)
« Reply #366 on: July 28, 2021, 01:56:52 am »
Quote
There is a real advantage in achievable frequency and signal quality if you use the DQS as a clock input signal instead of using a tuned PLL output as a read sample clock.

Wait, I thought the tuned PLL output could be phase-shifted to the middle/center of the DQ bits?  How would using DQS as the clock signal be superior to a tuned PLL output?
 

BrianHG
« Reply #367 on: July 28, 2021, 02:36:45 am »
When reading data, the DQS is generated by the DDR3 RAM chip, just like the DQ data.  This means that any timing and jitter noise generated by the FPGA clock output, the PCB traces, and the RAM's CK input, plus its PLL clock, the speed of the IOs, and the effect of both ICs' temperatures, will be present in the DQS and DQ alike.  If you clock the data in from the local PLL clock, these errors may not be present in that clock, as the local signal may be super clean by comparison.

Now, if you sample the DQ data using the DQS without a PLL, just using the DQS as a dedicated clock over the FPGA's special hard-wired routing path, these accumulated timing errors will better match those of the DQ data.  However, this can only be done accurately because the FPGA has special IO zones: you must use the dedicated DQS inputs paired with their 8 dedicated hard-wired DQ input pins to get this enhanced timing-error-following performance.  Only those 8 IOs, matched with their 2 dedicated DQS pins, have the special direct clock routing on the FPGA silicon that achieves the tight timing needed to pull off the high end 1-2 GSPS route with a predicted minimum delay.  This cannot work with global clocks through PLLs, as the DQS is not a continuous clock, and you don't want to clean up that clock anyway: you want the timing noise within the DQS to match the timing noise within the DQ data being read, to further guarantee a correct read.
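
The jitter-tracking argument can be illustrated with a deliberately simplified model. It assumes DQ and DQS carry exactly the same edge displacement on every bit, which real hardware only approximates; the bit time and the jitter values are invented numbers, and all names are hypothetical.

```python
BIT_TIME_PS = 1500                      # assumed DQ bit time (hypothetical)
JITTER_PS = [0, 120, -200, 80, -50]     # made-up edge displacement per bit


def margin_ps(sample_time, window_start, bit_time):
    """Distance from the sample point to the nearest edge of the data window."""
    return min(sample_time - window_start, window_start + bit_time - sample_time)


def worst_margin(use_dqs):
    """Smallest margin over all bits for the chosen sampling clock."""
    margins = []
    for i, jitter in enumerate(JITTER_PS):
        window_start = i * BIT_TIME_PS + jitter   # DQ edge, jitter included
        if use_dqs:
            # The DQS edge carries the same jitter as DQ; sample half a bit later.
            sample = window_start + BIT_TIME_PS // 2
        else:
            # Clean local clock: ideal mid-bit position, unaware of the jitter.
            sample = i * BIT_TIME_PS + BIT_TIME_PS // 2
        margins.append(margin_ps(sample, window_start, BIT_TIME_PS))
    return min(margins)


print("DQS-clocked worst margin  :", worst_margin(True), "ps")
print("local-clock worst margin  :", worst_margin(False), "ps")
```

In this toy model the DQS-clocked margin stays at half a bit time, while the clean local clock loses margin equal to the largest jitter excursion.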
 

promach (Topic starter)
« Reply #368 on: July 28, 2021, 12:21:08 pm »
As for the PLL dynamic phase shift approach, I have a few questions:

1. Could I actually generate a 90 degree phase-shifted clock from CLK_OUT2 using DCM_SP Settings ?
    Could I actually generate a 270 degree phase-shifted clock from CLK_OUT4 using DCM_SP Settings ?

2. What about PLL_BASE Settings, which seem to have the phase shift capability as well ?
« Last Edit: July 28, 2021, 12:32:44 pm by promach »
 

BrianHG
« Reply #369 on: July 28, 2021, 07:09:08 pm »
I do not know about Xilinx specifics, but remember from the earlier discussion: if you make a clk_90 degree output, a 270 degree clock is actually the same as !clk_90.
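
A quick way to convince yourself of the inversion identity, assuming an ideal 50% duty cycle square wave. Time and phase are both expressed in degrees of one clock period, and `clk` is just an illustrative helper.

```python
def clk(t_deg, phase_deg):
    """Ideal 50% duty square wave; high for the first half period after its phase offset."""
    return (t_deg - phase_deg) % 360 < 180


def inverted_equals_shifted(phase_deg):
    """True if !clk(phase) matches clk(phase + 180) at every sampled instant."""
    return all(
        (not clk(t, phase_deg)) == clk(t, phase_deg + 180)
        for t in range(360)
    )


print(inverted_equals_shifted(90))   # compares !clk_90 with clk_270
print(inverted_equals_shifted(0))    # compares !clk_0 with clk_180
```

The identity only holds exactly for a 50% duty cycle; any duty cycle distortion on the real clock shows up as a phase error on the inverted copy.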

Also, according to the Xilinx data sheet, if you make a PLL output with clk_0, clk_90, clk_270, you can tie any of these outputs to one or two of that PLL's DCM modules and use that DCM to output 1 tunable 256 step phase from the clock you are receiving from the PLL.  Also, the inverted output of that DCM will be 180 degrees out of phase compared to where you tuned.

Presetting and using multiple outputs from each DCM, or trying to get the DCM to do a frequency conversion, will probably make it lose its ability to be software tuned in real time.  Since you get 2 DCMs per PLL, and the PLL itself is also tunable, only with far fewer steps, your options should be almost boundless.  This is why the main PLL should first be used for the main frequency conversion and system clock generation unless you are doing something special.

 

promach (Topic starter)
« Reply #370 on: July 29, 2021, 02:10:16 am »
Quote
use that DCM to output 1 tunable 256 step phase from the clock you are receiving from the PLL.

Cool, but how do I phase-shift to the middle of the DQ bit once the first DQ bit transition edge is detected?

Besides, I suppose the transition edge detection should not be done using FPGA fabric?


When I simulate my design with the DCM, I get this warning:

Warning : Input Clock Period Jitter on instance test_ddr3_memory_controller.ddr3_control.pll_ddr.dcm_sp_inst exceeds 1.000 ns. Locked CLKIN Period = 0.822. Current CLKIN Period = 0.822.

Why can the DCM not lock?
« Last Edit: July 30, 2021, 04:12:35 am by promach »
 

promach (Topic starter)
« Reply #371 on: August 01, 2021, 03:22:20 am »
Why does ck_dynamic have a period of 0.822 ns when it is supposed to be a 333 MHz clock?
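
For reference, the arithmetic behind the mismatch. This only converts between period and frequency; it does not explain where the 0.822 ns figure comes from in the simulation, and the helper names are illustrative.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def freq_mhz(period_in_ns):
    """Clock frequency in MHz for a period given in nanoseconds."""
    return 1000.0 / period_in_ns


expected_ns = period_ns(333.0)      # what a 333 MHz clock should show
implied_mhz = freq_mhz(0.822)       # what a 0.822 ns period corresponds to

print(f"333 MHz  -> {expected_ns:.3f} ns period")
print(f"0.822 ns -> {implied_mhz:.1f} MHz")
print(f"ratio    -> {expected_ns / 0.822:.2f}x")
```

A 333 MHz clock has a period of about 3.003 ns, so a reported period of 0.822 ns corresponds to a clock of roughly 1.2 GHz, about 3.65 times faster than intended.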



 

promach (Topic starter)
« Reply #372 on: August 06, 2021, 01:53:23 pm »
I have solved the locked issue above.

Now I get this warning: "Warning : Please wait for PSDONE signal before adjusting the Phase Shift".  Why?

« Last Edit: August 06, 2021, 02:26:59 pm by promach »
 

BrianHG
« Reply #373 on: August 06, 2021, 06:54:13 pm »
Quote
I have solved the locked issue above.

Now, I have this Warning : Please wait for PSDONE signal before adjusting the Phase Shift issue.  Why ?

Phase shifting is like changing the PLL settings, so you need to wait for the new PLL lock.
Though, the step is so small, the PLL moves smoothly.

This takes a few clock cycles with Altera PLLs as well.
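
A toy model of the handshake the warning is asking for, assuming the usual DCM_SP variable phase shift ports (PSEN, PSINCDEC, PSDONE, clocked by PSCLK). The class, the 4-cycle latency, and the function names are invented for illustration and do not reflect real DCM timing.

```python
class PhaseShifterModel:
    """Toy stand-in for a variable-phase clock block (not real primitive timing)."""

    def __init__(self, latency_cycles=4):
        self.latency_cycles = latency_cycles  # assumed, must be >= 1
        self.busy_for = 0
        self.phase_taps = 0
        self.violations = 0
        self._pending = 0

    def clock(self, psen=False, psincdec=True):
        """Advance one PSCLK cycle; return True on the cycle PSDONE is high."""
        if psen:
            if self.busy_for > 0:
                self.violations += 1          # new request while still busy
            else:
                self._pending = 1 if psincdec else -1
                self.busy_for = self.latency_cycles
                return False
        if self.busy_for > 0:
            self.busy_for -= 1
            if self.busy_for == 0:
                self.phase_taps += self._pending
                return True
        return False


def shift_by(model, taps):
    """Issue one single-cycle PSEN pulse per tap, waiting for PSDONE each time."""
    direction = taps > 0
    cycles = 0
    for _ in range(abs(taps)):
        done = model.clock(psen=True, psincdec=direction)
        cycles += 1
        while not done:
            done = model.clock()
            cycles += 1
    return cycles


dcm = PhaseShifterModel()
used = shift_by(dcm, 3)
print("taps:", dcm.phase_taps, "cycles:", used, "violations:", dcm.violations)
```

The point of the sketch is the ordering: one PSEN pulse, then no further requests until PSDONE has been seen. Pulsing PSEN again while the previous step is still in progress is what the model counts as a violation.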
 

promach (Topic starter)
« Reply #374 on: August 07, 2021, 12:08:38 am »
Quote
Phase shifting is like changing the PLL settings, so you need to wait for the new PLL lock.
Though, the step is so small, the PLL moves smoothly.

This takes a few clock cycles with Altera PLLs as well.

Xilinx requires waiting only a single clock cycle for the new PLL lock.

However, the ck_dynamic waveform is not really locked to udqs_r even though lock_dynamic is asserted high.  Why?

« Last Edit: August 07, 2021, 12:15:32 am by promach »
 

