I am playing with a design which replicates an old 8-bit microprocessor and 64 kByte of RAM on an FPGA. I'd like this to run as fast as possible, preferably at 100 MHz.
I have currently implemented this on a Spartan-6 (LX9 size, -3 speed grade). Unfortunately, the access times of the on-chip block RAM, and specifically the net (routing) delays, limit the clock rate to approx. 70 MHz. While the CPU core is small, all RAM blocks on the chip are required for the 64 kByte of RAM. That results in painfully long signal paths for both the address and the data bus, at least to the "outer" RAM blocks.
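To show why every RAM block ends up in use, here is a rough back-of-the-envelope sketch. It assumes the LX9 has 32 block RAMs of 18 Kbit each, and that only 16 Kbit per block are usable in a byte-wide configuration (the rest being parity bits). Both figures are quoted from memory of the datasheet and should be double-checked; the function and constant names are just made up for illustration.

```python
import math

# Assumed figures for the Spartan-6 LX9; please verify against the datasheet.
LX9_BLOCK_COUNT = 32            # assumed number of 18 Kbit block RAMs on the LX9
DATA_BITS_PER_BLOCK = 16 * 1024 # assumed usable data bits per block (parity excluded)

def blocks_needed(ram_bytes: int, data_bits_per_block: int = DATA_BITS_PER_BLOCK) -> int:
    """Number of block RAMs needed to hold ram_bytes of byte-wide memory."""
    return math.ceil(ram_bytes * 8 / data_bits_per_block)

needed = blocks_needed(64 * 1024)
print(f"64 kByte needs {needed} of {LX9_BLOCK_COUNT} blocks")
```

Under these assumptions the 64 kByte fill the whole chip, so the placer has no freedom to keep the RAM close to the CPU core.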
Since a CPU cycle requires the data to travel from RAM to CPU, be processed to determine the new address (among other things), and then the address to travel back to the RAM in preparation for the next cycle, the path delays enter the cycle time twice. They easily add up to 5 ns, eating up half of my target cycle time of 10 ns.
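For reference, the timing budget works out roughly as follows. This is just the arithmetic from the numbers above, with the 5 ns taken as the combined routing delay of both directions; the helper names are made up for illustration.

```python
def period_ns(f_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / f_mhz

def remaining_ns(f_mhz: float, routing_ns: float) -> float:
    """Time left per cycle for RAM access and CPU logic once routing is paid for."""
    return period_ns(f_mhz) - routing_ns

# Combined routing delay per cycle (data path plus address path), my estimate.
ROUTING_NS = 5.0

for f in (70.0, 100.0):
    print(f"{f:5.0f} MHz: period {period_ns(f):5.2f} ns, "
          f"left after routing {remaining_ns(f, ROUTING_NS):5.2f} ns")
```

So at 100 MHz, the block RAM access time plus all of the CPU logic would have to fit into the remaining half of the cycle.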
Is there an FPGA family in roughly the same price and performance class as the Spartan-6 which has larger, more "centralized" RAM blocks on-chip? Thanks!
(Any other suggestions on how things could be sped up are appreciated as well, of course! I have thought about caching, but that gets unwieldy very quickly...)