Yeah, as the OP stated, they want that on FPGAs. Any decent FPU will inevitably be heavily pipelined to reach even moderate clock frequencies on FPGAs, and will thus have significant latency. Throughput can still be good, but depending on the FPU structure, you may only get high throughput on a series of similar operations (like a series of multiplies), and not on a series of arbitrary operations such as a multiply, then a divide, then an add, and so on. For the latter, you'll need a more involved structure that takes up significant resources.
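
To put rough numbers on the latency vs. throughput point, here's a minimal Python sketch of a toy model, not any real FPU core. It assumes a single fully pipelined unit that can accept one new operation per cycle and delivers each result `depth` cycles later. The function name `simulate`, the depth of 10 and the 100 operations are all made up for illustration. It also compares independent operations with a chain where each operation needs the previous result, which is an extra case I'm adding to show where the latency actually hurts. It doesn't model the mixed multiply/divide/add case.

```python
def simulate(n_ops: int, depth: int, dependent: bool) -> int:
    """Return the cycle in which the last result leaves a toy pipelined unit.

    Assumptions (illustrative only): one issue slot per cycle, and an
    operation issued in cycle c produces its result in cycle c + depth - 1.
    If `dependent` is True, each operation needs the previous result, so it
    cannot be issued until that result is available.
    """
    cycle = 0
    issued = 0
    last_finish = 0  # cycle in which the most recently issued op completes
    while issued < n_ops:
        cycle += 1
        if dependent and cycle <= last_finish:
            continue  # stall: still waiting for the previous result
        last_finish = cycle + depth - 1
        issued += 1
    return last_finish


if __name__ == "__main__":
    DEPTH = 10   # hypothetical pipeline depth
    N_OPS = 100  # hypothetical number of operations

    streamed = simulate(N_OPS, DEPTH, dependent=False)
    chained = simulate(N_OPS, DEPTH, dependent=True)
    print(f"independent ops: {streamed} cycles")  # 109
    print(f"dependent chain: {chained} cycles")   # 1000
```

With independent operations the pipeline fill cost is paid once (`depth + n_ops - 1` cycles), so throughput approaches one result per cycle. With a dependent chain every operation pays the full latency (`n_ops * depth` cycles).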