I like what I see. Here are some points you may find helpful:
1. Consider moving from Cyclone II to Cyclone IV E. It is easier to close timing with the newer devices, and they are supported by current Quartus.
2. I spent last year developing a USB 3.0 IP core. I have used both the Beagle 5000 and the LeCroy M3x and M3i, and both have frustrating, inadequate interfaces.
Total Phase wins in the usability and friendliness department, but it's not possible to see the actual 8b/10b data or exact bus timing. There are also some severe bugs related to packet ordering. When malformed data is sent, the entire thing blows up, and it sometimes crashes every couple of hours.
LeCroy wins on depth and extensiveness: you can see each symbol going across the link, both scrambled and unscrambled, and it's smart enough to detect LFSR desync, etc. However, the packet/link view is a complete trainwreck. I can't look at it for more than 10 seconds before my eyes start bleeding.
Also, something that is absolutely mandatory for any dual simplex link: separate up/downstream data views! It just doesn't work to shove both directions' data onto the same giant list. The relative timing between the two directions is lost, which makes my job of debugging even harder.
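To make the separate-views point concrete, here is a minimal sketch of the layout I mean: one shared time axis, with downstream and upstream packets in their own columns. The `Packet` record, the `side_by_side` helper, and the sample capture are all made up for illustration and are not taken from either analyzer's software.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Packet:
    t_ns: int        # capture timestamp in nanoseconds (hypothetical field)
    direction: str   # "up" (device to host) or "down" (host to device)
    label: str       # short description, e.g. a packet type


def side_by_side(packets: List[Packet]) -> List[Tuple[int, str, str]]:
    """Return (timestamp, downstream, upstream) rows on one shared time axis."""
    rows = []
    for p in sorted(packets, key=lambda pkt: pkt.t_ns):
        if p.direction not in ("up", "down"):
            raise ValueError(f"unknown direction: {p.direction!r}")
        down = p.label if p.direction == "down" else ""
        up = p.label if p.direction == "up" else ""
        rows.append((p.t_ns, down, up))
    return rows


# Made-up capture, deliberately out of order.
capture = [
    Packet(120, "down", "DP"),
    Packet(40, "up", "ACK"),
    Packet(95, "down", "DP"),
]

rows = side_by_side(capture)
for t_ns, down, up in rows:
    print(f"{t_ns:>8} ns | {down:<6} | {up:<6}")
```

Each direction keeps its own column, so gaps and overlaps between the two halves of the link stay visible instead of being flattened into one list.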
3. Ditch discrete SDR RAM and move to discrete DDR2. First off, you will get double the throughput, and it sure is a bunch cheaper. Also, Micron have quit making SDR! They recently did a die shrink on all their other RAMs and sold some old tooling to Alliance. You can still use ALTMEMPHY. Use narrower RAMs and gang them up to present as one bus to the FPGA.
For example, use four x8 RAMs for a 32-bit wide DQ bus. With a single rank like that, you would not even need a chip select.
Alliance have also started making some nice LPDDRs, though I'm not sure if they will work directly with Altera's controller (consider writing your own).
I'm not sure how DDR2 SODIMMs are priced these days, but they are nowhere near as cheap as they were, and they are quickly approaching dinosaur status. Unfortunately, using DDR3 SODIMMs requires write leveling (per-DQS-group delay adjustment), which is only available in later Arria and Stratix. Meanwhile, all 7 series Xilinx devices support it. I like Altera, but that's another option. Lattice ECP3 also supports it.
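Going back to the ganged-RAM suggestion, here is a rough back-of-the-envelope sketch of the arithmetic. The 133 MHz clock is an arbitrary assumption chosen only so that SDR and DDR2 are compared at the same clock, and the function names are made up. The numbers are theoretical peaks that ignore refresh, row activation, and bus turnaround.

```python
def peak_bandwidth_mb_s(clock_mhz, bus_bits, transfers_per_clock):
    """Theoretical peak bandwidth in MB/s (no refresh or turnaround overhead)."""
    return clock_mhz * transfers_per_clock * bus_bits / 8


def chips_for_bus(bus_bits, chip_bits):
    """How many identical RAMs are needed to fill the DQ bus."""
    if bus_bits % chip_bits:
        raise ValueError("bus width must be a multiple of the chip width")
    return bus_bits // chip_bits


def byte_lanes(bus_bits, chip_bits):
    """(chip index, lowest DQ bit, highest DQ bit) for each ganged RAM."""
    count = chips_for_bus(bus_bits, chip_bits)
    return [(i, i * chip_bits, (i + 1) * chip_bits - 1) for i in range(count)]


CLOCK_MHZ = 133   # assumed clock, same for both memories
BUS_BITS = 32     # DQ width presented to the FPGA
CHIP_BITS = 8     # x8 parts

sdr = peak_bandwidth_mb_s(CLOCK_MHZ, BUS_BITS, 1)   # one transfer per clock
ddr2 = peak_bandwidth_mb_s(CLOCK_MHZ, BUS_BITS, 2)  # two transfers per clock

print(f"SDR  peak: {sdr:.0f} MB/s")
print(f"DDR2 peak: {ddr2:.0f} MB/s")
for chip, lo, hi in byte_lanes(BUS_BITS, CHIP_BITS):
    print(f"chip {chip}: DQ[{hi}:{lo}]")
```

All four RAMs share the address, command, and clock lines. Only the data lanes (and each chip's own strobe and mask) are separate, which is why the FPGA sees them as one wide memory.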
Also, maybe consider Gig-E for the uplink to the PC. It will be cheaper than an FX3.
You may also find this interesting: the Beagle 5000 has a Stratix III with dual TUSB1310A PHYs (they didn't use the FPGA SerDes), while the M3x contains a Stratix IV (I believe) and uses its onboard SerDes.
I have interior pictures of these units if you're curious.