I agree with everyone above who said it depends on the data requirements. I am assuming for the moment that the data is transferred in bursts rather than continuously.
So, to keep things simple, I would use a circular daisy chain (a ring) with one input stream and one output stream on each FPGA. Each stream has 4 data lines, a clock and a data-valid signal: 6 wires per stream, 12 pins per FPGA. The data is clocked as DDR, which is fine for 50+ MB/s on the same board (and depending on the FPGA you could probably reach 100 MB/s as well).
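To put rough numbers on that, here is a small Python sketch of the arithmetic. It assumes the data-valid line carries no payload and that both clock edges are used on all 4 data lines, so one byte moves per clock cycle; the function names are just for illustration.

```python
def stream_throughput_mbytes(clock_mhz, data_lines=4, ddr=True):
    """Raw payload rate of one stream in MB/s (ignores any protocol overhead)."""
    bits_per_cycle = data_lines * (2 if ddr else 1)  # DDR: data on both clock edges
    return clock_mhz * bits_per_cycle / 8


def pins_per_fpga(data_lines=4):
    """One input stream plus one output stream, each with data + clock + valid."""
    wires_per_stream = data_lines + 2
    return 2 * wires_per_stream


print(stream_throughput_mbytes(50))   # 50 MHz clock
print(stream_throughput_mbytes(100))  # 100 MHz clock
print(pins_per_fpga())
```

So under these assumptions 50 MB/s needs only a 50 MHz clock, and 100 MB/s a 100 MHz clock, which is why it is plausible on a single board.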
On the FPGA you just implement a BRAM as the in-out FIFO (input stream to output stream), and also tap the data into a second FIFO when it is addressed to your FPGA. The protocol is very simple: the first byte is the FPGA address or an opcode, which tells the FPGA whether the data should be tapped.
When a packet has gone full circle, you simply discard it.
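A behavioural sketch of that forward/tap/discard logic in Python may make it clearer. This is only a model, not HDL. One assumption of mine: to detect the full circle, each packet also carries the sender's address (`src`), which is not something I specified above; a hop counter would work just as well. The names `RingNode` and `send_around_ring` are made up for the example.

```python
from collections import deque


class RingNode:
    """Model of one FPGA on the ring."""

    def __init__(self, address):
        self.address = address
        self.forward_fifo = deque()  # in-out FIFO feeding the output stream
        self.tap_fifo = deque()      # data addressed to this FPGA

    def receive(self, packet):
        dest, src, payload = packet  # dest = first byte; src is an assumed extra field
        if src == self.address:
            return                   # packet went full circle: discard it
        if dest == self.address:
            self.tap_fifo.append(payload)
        self.forward_fifo.append(packet)  # always pass it on to the next node


def send_around_ring(nodes, origin_index, dest, payload):
    """Inject a packet at the origin node and pass it along until it is dropped."""
    packet = (dest, nodes[origin_index].address, payload)
    i, hops = origin_index, 0
    while True:
        i = (i + 1) % len(nodes)
        hops += 1
        nodes[i].receive(packet)
        if not nodes[i].forward_fifo:  # nothing forwarded: the packet was discarded
            return hops
        packet = nodes[i].forward_fifo.popleft()


nodes = [RingNode(addr) for addr in range(4)]
hops = send_around_ring(nodes, origin_index=0, dest=2, payload=b"hello")
print(hops, list(nodes[2].tap_fifo))
```

With four nodes, a packet from node 0 to node 2 is tapped once at node 2 and dropped when it arrives back at node 0.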
When you need to send new data on the line, you first have to wait until the in-out FIFO has drained below a certain threshold, so that you can still buffer a new packet if one arrives while you are transmitting your own data. This ties your FIFO size to the maximum packet size.
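As a rough illustration of that threshold check, here is a Python sketch. It assumes the input and output streams run at the same rate, so while you send your own packet at most one maximum-size packet's worth of bytes can arrive and must fit in the free space. The function name and the sizes are hypothetical.

```python
def can_start_transmit(fifo_fill, fifo_depth, max_packet_bytes):
    """True if the in-out FIFO has room to absorb one incoming packet
    while this node is busy sending its own data."""
    free_space = fifo_depth - fifo_fill
    return free_space >= max_packet_bytes


# Example: a 2048-byte FIFO and 256-byte maximum packets (made-up numbers)
print(can_start_transmit(fifo_fill=0, fifo_depth=2048, max_packet_bytes=256))
print(can_start_transmit(fifo_fill=1900, fifo_depth=2048, max_packet_bytes=256))
```

In other words, the FIFO has to be at least one maximum packet deep, and in practice deeper, otherwise the node would only be allowed to transmit when the FIFO is completely empty.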
I would prefer this because it simplifies the PCB design and doesn't require expensive FPGAs. You could also add a data logger that just listens on the line and routes the data to another medium such as Ethernet or USB.
If you need help with this, please feel free to reach out; we provide electronic engineering services, including FPGA.
(I hope it's OK to post this here; if not, I'll remove it.)
Best of luck
Guy