When you use the dedicated DQS clock circuitry inside the FPGA, each DQS group has 8 hard-wired DQ pins paired with it. The DDR3 RAM chip is the same: there are 8 data bits for each DQS pair. When wiring and programming your DDR3 controller, you need to match each 8-bit DQ group with its connected DQS, since only those DQs are wired to that one DQS input inside the FPGA. This is how the FPGA vendors give you the ability to shift and adjust timing for each group of 8 data bits individually, without needing a PLL with a separately tuned output dedicated to every 8 bits.
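To illustrate the matching rule, here is a minimal Python sketch of a pin-plan check. It is only an illustration: the function name `check_byte_lanes`, the signal names, and the group numbers are all hypothetical and not taken from any real FPGA or from my controller code. It assumes byte lane N uses DQS N and DQ bits 8N to 8N+7.

```python
# Hypothetical pin plan: DDR3 signal name -> FPGA DQS group number.
# The group numbers are made up for illustration, not from a real device.
def check_byte_lanes(pin_groups, width_bits):
    """Return the byte lanes whose 8 DQ bits do not all sit in their DQS's group."""
    bad_lanes = []
    for lane in range(width_bits // 8):
        dqs_group = pin_groups[f"DQS{lane}"]
        # Collect the FPGA group of each of the 8 DQ bits in this byte lane.
        dq_groups = {pin_groups[f"DQ{bit}"] for bit in range(lane * 8, lane * 8 + 8)}
        if dq_groups != {dqs_group}:
            bad_lanes.append(lane)
    return bad_lanes

# 16-bit example: lane 0 is fine, lane 1 has DQ12 placed in the wrong group.
pins = {"DQS0": 3, "DQS1": 4}
pins.update({f"DQ{b}": 3 for b in range(0, 8)})
pins.update({f"DQ{b}": 4 for b in range(8, 16)})
pins["DQ12"] = 3
print(check_byte_lanes(pins, 16))  # prints [1]
```

A check like this only matters when the dedicated DQS circuitry is being used to clock the DQ pins.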
In my design, I do not derive the read DQ clock from the DQS inputs and their dedicated wiring to the shared 8 DQ pins. My DQ read clock is generated by the FPGA PLL instead, so as long as all the inputs on your FPGA can be clocked by the PLL rather than by the DQS inputs, any DDR-capable input may be used with my controller. The only caveat is that all your DQ and DQS wiring lengths to the DDR3 memory need to be fairly closely matched. Otherwise, I would need a separately phase-tuned PLL output for every 8 bits; in a 64-bit system, this would mean 8 PLL outputs for 8 read clocks plus another 8 PLL outputs for 8 write clocks.
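The PLL output count above can be worked out with a few lines of Python. This is just the arithmetic from the paragraph, not code from the controller; the function name `pll_outputs_needed` is hypothetical, and the shared case assumes one read clock and one write clock serve the whole bus when the trace lengths are matched.

```python
def pll_outputs_needed(bus_width_bits, per_group_tuning):
    """Count the PLL outputs needed for the read and write clocks of a DDR3 bus."""
    groups = bus_width_bits // 8  # one DQS group per 8 DQ bits
    # Assumption: with matched trace lengths, one clock per direction is shared.
    clocks_per_direction = groups if per_group_tuning else 1
    return {
        "read": clocks_per_direction,
        "write": clocks_per_direction,
        "total": 2 * clocks_per_direction,
    }

# 64-bit bus with a separately tuned clock for every 8 bits.
print(pll_outputs_needed(64, per_group_tuning=True))
# 64-bit bus with closely matched DQ/DQS lengths sharing the clocks.
print(pll_outputs_needed(64, per_group_tuning=False))
```

The first call gives 8 read plus 8 write outputs, which is more than many FPGA PLLs offer, and that is why the matched-length requirement is worth accepting.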
The simplicity of my system, which uses only DDR input and DDR output buffers, means easy cross-vendor and cross-FPGA-family compatibility: the same code works everywhere except for the DDR buffer primitives and the tuning step size, which depends on the PLL's precision.
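As a rough sketch of why the tuning step size changes between FPGAs, the Python below converts a PLL phase step into steps per clock period. The numbers are assumptions picked only to make the arithmetic clean (a 400 MHz clock and a 125 ps phase step); the real step size comes from your FPGA's PLL documentation, and the function name `tuning_steps` is hypothetical.

```python
def tuning_steps(clock_mhz, pll_step_ps):
    """Return (whole steps per clock period, degrees of phase per step)."""
    period_ps = 1_000_000 / clock_mhz            # clock period in picoseconds
    steps_per_period = int(period_ps // pll_step_ps)
    degrees_per_step = 360.0 * pll_step_ps / period_ps
    return steps_per_period, degrees_per_step

# Assumed example values, not taken from any specific FPGA.
steps, degrees = tuning_steps(clock_mhz=400, pll_step_ps=125)
print(steps, degrees)  # prints 20 18.0
```

A PLL with a finer step gives more tuning positions across the same clock period, so the number of steps to sweep during calibration is the part that has to be adjusted per device.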