What I mean is that, in the core ck_0 domain, I have {wd_h, wd_l, wd_oe, wm_h, wm_l} ready on every clock.
Then, on every ck_90 clock, I have:
always_ff @(posedge ck_90) begin
    before_DDR_BUFFERS[4] <= {wd_h, wd_l, wd_oe, wm_h, wm_l}; // first capture of the ck_0 data by ck_90
    before_DDR_BUFFERS[3] <= before_DDR_BUFFERS[4];
    before_DDR_BUFFERS[2] <= before_DDR_BUFFERS[3];
    before_DDR_BUFFERS[1] <= before_DDR_BUFFERS[2];
    before_DDR_BUFFERS[0] <= before_DDR_BUFFERS[1];
    DDR_BUFFERS_OUT       <= before_DDR_BUFFERS[0];          // 6th register, feeds the IO DDR buffers
end
This adds a serial chain of 6 registers (6 clocks of latency) inside the ck_90 domain, allowing the fitter to skew the clock timing of each step in that chain so the data can shift from its ck_0 phase to the ck_90 phase error-free before it reaches the IO buffer's input DFF. With a chain this long, the source data {wd_h, wd_l, wd_oe, wm_h, wm_l} also needs to be ready 6 clocks in advance.
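For illustration, the same chain can be written as a parameterized module. This is a minimal sketch, not the code from the file: the module name ck90_pipe and the WIDTH/STAGES parameters are hypothetical, with STAGES = 6 standing in for the five before_DDR_BUFFERS registers plus DDR_BUFFERS_OUT.

```systemverilog
// Sketch only: a parameterized form of the ck_90 register chain.
// Module and parameter names are hypothetical, not from the original file.
module ck90_pipe #(
    parameter WIDTH  = 8,  // assumed width of {wd_h, wd_l, wd_oe, wm_h, wm_l}
    parameter STAGES = 6   // before_DDR_BUFFERS[4..0] plus DDR_BUFFERS_OUT
) (
    input  logic             ck_90,
    input  logic [WIDTH-1:0] din,   // word produced in the ck_0 domain
    output logic [WIDTH-1:0] dout   // plays the role of DDR_BUFFERS_OUT
);
    logic [WIDTH-1:0] stage [0:STAGES-1];

    // Every stage is clocked by ck_90; stage[0] is the first ck_90 capture.
    always_ff @(posedge ck_90) begin
        stage[0] <= din;
        for (int i = 1; i < STAGES; i++)
            stage[i] <= stage[i-1];
    end

    // A word present at din before a ck_90 edge reaches dout on the
    // STAGES-th edge, i.e. 6 clocks of latency with the default.
    assign dout = stage[STAGES-1];
endmodule
```

Changing STAGES trades how far in advance the ck_0 data must be ready against how many registers the fitter has to work with between the core and the IO buffers.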
Obviously, DDR_BUFFERS_OUT feeds the IO buffers: the DQ IODDR receives {wd_h, wd_l} as its data inputs, {wd_oe} drives the OE for those DQ buffers, and the DM DDROUT receives {wm_h, wm_l}.
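A rough simulation sketch of how DDR_BUFFERS_OUT could be unpacked onto the pins. ddr_out_model is a hypothetical behavioral stand-in, not the Altera DDIO primitive: it drives the _h half while ck_90 is high and the _l half while it is low. The module ddr3_write_io, the widths DQ_W and DM_W, the single shared wd_oe bit, and the DM enable being tied high are all assumptions.

```systemverilog
// Behavioral stand-in for a DDR output buffer (NOT the Altera primitive):
// datain_h is driven onto the pin while clk is high, datain_l while clk is low.
module ddr_out_model #(parameter W = 1) (
    input  logic         clk,
    input  logic [W-1:0] datain_h,
    input  logic [W-1:0] datain_l,
    input  logic         oe,
    output wire  [W-1:0] pin
);
    logic [W-1:0] q_h, q_l;
    logic         q_oe;
    always_ff @(posedge clk) begin
        q_h  <= datain_h;
        q_l  <= datain_l;
        q_oe <= oe;
    end
    // Tri-state the pin when the registered OE is low.
    assign pin = q_oe ? (clk ? q_h : q_l) : {W{1'bz}};
endmodule

// Hypothetical widths: DQ_W data bits, DM_W mask bits, one shared OE bit.
module ddr3_write_io #(parameter DQ_W = 8, parameter DM_W = 1) (
    input  logic                       ck_90,
    input  logic [2*DQ_W+1+2*DM_W-1:0] DDR_BUFFERS_OUT,
    output wire  [DQ_W-1:0]            DQ,
    output wire  [DM_W-1:0]            DM
);
    wire [DQ_W-1:0] wd_h, wd_l;
    wire            wd_oe;
    wire [DM_W-1:0] wm_h, wm_l;

    // Same field order as the concatenation fed into before_DDR_BUFFERS[4].
    assign {wd_h, wd_l, wd_oe, wm_h, wm_l} = DDR_BUFFERS_OUT;

    // DQ is bidirectional, so wd_oe gates its output enable.
    ddr_out_model #(.W(DQ_W)) dq_out (.clk(ck_90), .datain_h(wd_h), .datain_l(wd_l), .oe(wd_oe), .pin(DQ));
    // DM is output-only on DDR3, so its enable is tied high in this sketch.
    ddr_out_model #(.W(DM_W)) dm_out (.clk(ck_90), .datain_h(wm_h), .datain_l(wm_l), .oe(1'b1), .pin(DM));
endmodule
```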
See my source code 'BrianHG_DDR3_IO_PORT_ALTERA.sv' V1.00, from line 585 to the end of the file. You will also see a parameter option, WDQ_CLK_270, which uses a 270 degree write clock in place of the 90 degree one; this requires swapping _h and _l, plus advancing the _l by 1 write clock.
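One reading of the WDQ_CLK_270 swap, sketched under assumptions: the module wdq_phase_sel, its ports and W are hypothetical, and the actual file may implement this elsewhere in the pipeline. Since the rising edge of ck_270 is the falling edge of ck_90, the high phase of ck_270 lands where ck_90's low phase was, so the DDIO _h input must carry wd_l, and the _l input must carry the next word's wd_h, i.e. it leads by one write clock.

```systemverilog
// Sketch only: selecting DDIO inputs for a 90 or 270 degree write clock.
// wdq_phase_sel, its ports and W are hypothetical; only WDQ_CLK_270 is
// borrowed from the original post.
module wdq_phase_sel #(
    parameter WDQ_CLK_270 = 0,  // 0: ck_90 write clock, 1: ck_270
    parameter W           = 8   // assumed DQ width
) (
    input  logic         wclk,    // ck_90, or ck_270 when WDQ_CLK_270 = 1
    input  logic [W-1:0] wd_h,    // word n, first half (normal order)
    input  logic [W-1:0] wd_l,    // word n, second half
    output logic [W-1:0] ddio_h,  // driven while wclk is high
    output logic [W-1:0] ddio_l   // driven while wclk is low
);
    generate
        if (WDQ_CLK_270) begin : g_270
            logic [W-1:0] wd_l_q;
            // Hold wd_l one extra wclk so ddio_h carries word n-1's second
            // half while ddio_l carries word n's first half: the _l input
            // leads the _h input by one write clock.
            always_ff @(posedge wclk) wd_l_q <= wd_l;
            assign ddio_h = wd_l_q;  // swapped: wd_l goes to the _h input
            assign ddio_l = wd_h;    // swapped: wd_h goes to the _l input
        end else begin : g_90
            assign ddio_h = wd_h;    // normal order
            assign ddio_l = wd_l;
        end
    endgenerate
endmodule
```

Here the one-clock lead is produced by delaying the _h path by a register; in a chain like before_DDR_BUFFERS, the same effect could instead come from tapping the _l input one stage earlier.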