FPGAs are now falling way behind everything else in terms of progress. 7 series, UltraScale, and UltraScale+ are all based on the Virtex-6 architecture, which came out back in 2009, with very few changes. Process nodes are also old: 7 series is 28nm, and while UltraScale (20nm) and UltraScale+ (16nm) are newer, they are priced out of most markets. You can get the latest 7nm GPU with HBM for under $1000. A 7nm FPGA + HBM dev board costs as much as a house, and to get one you have to sign a bunch of paperwork (export restrictions).
Last time I did some benchmarks and rough calculations on the performance per watt of large FFTs (for example, the ones Prime95 uses). Comparing my numbers with published GPU figures, I inferred that a cluster of Zynqs can match the Radeon VII (the top-ranked GPU for Prime95 performance per watt) in both power efficiency and cost. If a 28nm FPGA with external DDR3 can match the efficiency of the latest HBM GPU, FPGAs on newer process nodes with HBM would be able to do far better.
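To make that kind of perf-per-watt comparison concrete, here is a minimal Python sketch. It assumes the conventional 5·n·log2(n) operation count for a length-n complex FFT, models a cluster as identical boards with perfect linear scaling (no interconnect or host overhead, which is optimistic), and all names (`Device`, `fft_flops`, `boards_to_match`, etc.) are hypothetical. The numbers in the `__main__` block are placeholders picked for round arithmetic, not the benchmark results or published GPU figures mentioned above.

```python
import math
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    fft_len: int            # complex FFT length used in the benchmark (power of two)
    ffts_per_second: float  # sustained throughput at that length
    watts: float            # power drawn while running the benchmark
    cost_usd: float         # purchase price of one unit


def fft_flops(n: int) -> float:
    """Conventional operation count for one length-n complex FFT: 5 * n * log2(n)."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    # n is a power of two, so log2(n) is exactly n.bit_length() - 1
    return 5.0 * n * (n.bit_length() - 1)


def flops_per_second(d: Device) -> float:
    """Effective FFT throughput in FLOP/s, based on the 5 n log2 n count."""
    return fft_flops(d.fft_len) * d.ffts_per_second


def gflops(d: Device) -> float:
    return flops_per_second(d) / 1e9


def gflops_per_watt(d: Device) -> float:
    return gflops(d) / d.watts


def gflops_per_dollar(d: Device) -> float:
    return gflops(d) / d.cost_usd


def cluster(d: Device, boards: int) -> Device:
    """Treat `boards` identical units as one device, assuming perfect linear
    scaling (interconnect and host overhead ignored, an optimistic simplification)."""
    return Device(f"{boards}x {d.name}", d.fft_len, d.ffts_per_second * boards,
                  d.watts * boards, d.cost_usd * boards)


def boards_to_match(small: Device, big: Device) -> int:
    """Smallest number of `small` units whose combined FLOP/s reaches `big`'s."""
    return math.ceil(flops_per_second(big) / flops_per_second(small))


if __name__ == "__main__":
    # Placeholder figures for illustration only; these are NOT measured results.
    fpga = Device("hypothetical Zynq board", 2**20, 10.0, 5.0, 200.0)
    gpu = Device("hypothetical HBM GPU", 2**20, 200.0, 250.0, 700.0)

    k = boards_to_match(fpga, gpu)
    farm = cluster(fpga, k)
    for d in (farm, gpu):
        print(f"{d.name}: {gflops(d):.1f} GFLOP/s, "
              f"{gflops_per_watt(d):.3f} GFLOP/s/W, "
              f"{gflops_per_dollar(d):.3f} GFLOP/s/$, "
              f"{d.watts:.0f} W, ${d.cost_usd:.0f}")
```

Under linear scaling, adding boards doesn't change GFLOP/s per watt or per dollar; what the cluster changes is absolute throughput, so the interesting outputs are how many boards it takes to match the GPU and the resulting total power and cost. For a like-for-like comparison, plug in measured throughput and wall power for both devices at the same FFT length.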
However, the sorry state of FPGA progress, partly because Altera was acquired by Intel and effectively killed off, leaving Xilinx with no real competition, means we likely won't see FPGAs on proper process nodes becoming accessible any time soon.