Unless you can find an ASIC that does this out of the box, an FPGA with SDRAM is probably the best choice, especially if latency is a concern. As of today, FPGA+SDRAM combinations are available on Digikey for under $20 (single-unit price). Whether your algorithm fits into such an FPGA is another question, and answering it will take some actual testing. Don't forget the cost of the power supply: FPGAs usually need multiple supply voltages to run properly.