Even with efficient bank rotation, there is a small penalty of 1-2 CLK for going from a read to a write, even within the same bank, and a much bigger one for going from a write to a read. Your best bet is to make the DDR2 2x-3x as fast as you need with simple, smart bank management, and to set up that smaller internal FPGA FIFO to smooth out those breaks.
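To put rough numbers on why batching your reads and writes matters, here is a minimal Python sketch of bus efficiency when reads and writes alternate in batches. The penalty values are assumptions for illustration only (2 CLK for read-to-write, 10 CLK for write-to-read, 2 CLK per burst as with BL4); pull the real figures from your DDR2 datasheet and controller.

```python
def bus_efficiency(bursts_per_batch, clk_per_burst=2,
                   rd_to_wr_clk=2, wr_to_rd_clk=10):
    """Fraction of clocks carrying data when a batch of reads and a
    batch of writes alternate. All penalty values are assumed."""
    # One read batch plus one write batch per cycle.
    data_clk = 2 * bursts_per_batch * clk_per_burst
    # Each cycle pays one turnaround in each direction.
    turnaround_clk = rd_to_wr_clk + wr_to_rd_clk
    return data_clk / (data_clk + turnaround_clk)


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, round(bus_efficiency(n), 3))
```

With these assumed penalties, alternating single bursts keeps the bus busy only 25% of the time, while batches of 16 get you to roughly 84%. That gap is what the internal FIFO buys you.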
Another solution: if your FIFO always has a huge difference between its in and out pointers, you can choose where the bank bits sit in your address so that you are always reading from one set of banks while writing to another. That way you keep 2 important, wildly different rows activated at the same time.
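One way this could look, as a sketch only: put the bank bits at the top of the FIFO address, so pointers that are far apart land in different banks. The address width, the 4-bank device, and the helper names below are all hypothetical, not taken from any particular controller.

```python
ADDR_BITS = 24   # hypothetical FIFO address width, counted in bursts
BANK_BITS = 2    # assumed 4-bank device; use 3 for an 8-bank part
DEPTH = 1 << ADDR_BITS


def bank_of(addr):
    """Bank number when the bank bits sit at the top of the FIFO address."""
    return (addr % DEPTH) >> (ADDR_BITS - BANK_BITS)


def read_pointer(wr_ptr, fill):
    """Read pointer of a circular FIFO currently holding `fill` bursts."""
    return (wr_ptr - fill) % DEPTH


def same_bank(wr_ptr, fill):
    """True if the read and write pointers currently share a bank."""
    return bank_of(wr_ptr) == bank_of(read_pointer(wr_ptr, fill))
```

With this mapping each bank covers a quarter of the FIFO, so as long as the fill level stays between one quarter and three quarters of the depth, the read and write pointers never share a bank. If your fill level can drop below that, you are back to opening and closing rows in the same bank.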
Refresh will always kick you in the ass, as you must -precharge all banks- first, then refresh, then go back and activate your rows again...
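For a feel of how much that costs, here is a back-of-the-envelope sketch in Python. The timings are assumed, roughly in the range you see on DDR2 datasheets (7.8 us average refresh interval, 127.5 ns refresh cycle as on a 1 Gb part, 15 ns each for precharge and activate); check your own part before trusting the result.

```python
def refresh_overhead(t_refi_ns=7800.0, t_rfc_ns=127.5,
                     t_rp_ns=15.0, t_rcd_ns=15.0):
    """Fraction of time lost to one precharge-all, refresh, re-activate
    sequence per refresh interval. All timing values are assumed."""
    lost_ns = t_rp_ns + t_rfc_ns + t_rcd_ns
    return lost_ns / t_refi_ns


if __name__ == "__main__":
    print(f"{refresh_overhead():.2%}")
```

With these assumed timings the loss is about 2% of the bandwidth. The bigger problem is the stall of roughly 150 ns each time, which is one more gap your FPGA FIFO has to absorb. The sketch ignores any extra time spent re-activating rows in more than one bank.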