I'll definitely consider doing that for future developments.
Anyway, I managed to get something working on a Spartan-6 dev board with LPDDR RAM (512Mb, 16-bit). I wrote a test that first writes a sequential 4MByte block as fast as I can (using the maximum burst length of 64 and a 32-bit port width), then reads the same block back, also as fast as possible, checking each word read. Then it goes back to the block write, and so on. Works fine. Reads and writes never overlap: each phase is write-only or read-only.
The LPDDR chip is rated for 166MHz max; I used 150MHz for this test. I measured the total write time and read time by toggling I/Os. I get approx. 80% of the max. bandwidth (which would be 600MBytes/s: 150MHz x 2 transfers per clock x 2 bytes) for the block write, and approx. 70% for the block read. Not bad, but I don't know whether one can do better, or whether it's normal for reading to be a bit slower than writing. Granted, I'm already happy to get 480MBytes/s write and 430MBytes/s read on my first try.
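For anyone who wants to check the numbers, here is the arithmetic as a minimal Python sketch. The function names are mine, purely for illustration, and it assumes the peak is simply two transfers per clock across the full 16-bit bus, ignoring refresh, activate/precharge and bus turnaround.

```python
def peak_bandwidth_mbytes(clock_mhz: float, bus_width_bits: int) -> float:
    """Theoretical peak in MBytes/s for a DDR-type bus.

    Assumption: two data transfers per clock cycle, bus_width_bits per transfer,
    with no overhead (refresh, row activation, turnaround) accounted for.
    """
    return clock_mhz * 2 * (bus_width_bits / 8)


def efficiency(measured_mbytes: float, clock_mhz: float, bus_width_bits: int) -> float:
    """Fraction of the theoretical peak actually achieved."""
    return measured_mbytes / peak_bandwidth_mbytes(clock_mhz, bus_width_bits)


if __name__ == "__main__":
    clock_mhz = 150        # clock used in the test
    bus_width_bits = 16    # LPDDR data bus width

    peak = peak_bandwidth_mbytes(clock_mhz, bus_width_bits)
    print(f"peak:  {peak:.0f} MBytes/s")
    print(f"write: {efficiency(480, clock_mhz, bus_width_bits):.1%} of peak")
    print(f"read:  {efficiency(430, clock_mhz, bus_width_bits):.1%} of peak")
```

With the measured figures this gives 80% for the write and a bit under 72% for the read, which matches the rough percentages quoted above.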
What's your experience with max. throughput using DDR? Are there more favorable settings, such as limiting the burst length (as unintuitive as it may sound) or using a wider user port (64 or 128-bit)? Does LPDDR have worse performance than, say, classic DDR2 at the same clock speed?
One last thought: the slightly degraded performance could also come from the signal routing, as there is a calibration phase. The board is a Numato Saturn V3. The routing doesn't seem too bad, but it's a small board and it looks pretty crowded.