That timing diagram looks a little odd to me - there are eight data values per FPGA clock cycle, not four.
Not sure what the question is (if any), but you will need to use SERDES to separate out the four samples.
See attached diagram for what you need to construct for each bit (the PLL gets shared by all bits).
You will need to get very friendly with:
- the simulator (to check it works before trying in H/W)
- the clocking resources on your FPGA - you have to write your design so that it maps onto the FPGA's dedicated clocking structures. If you don't, it will not work.
- the documentation for the SERDES block on your FPGA - the blocks are simple to use, but they have a lot of options, and it can be tricky to reset them correctly.
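As a rough picture of what the SERDES does for each bit, here is a small behavioural model in Python. It is only a sketch, not a description of any vendor's primitive: the function name `deserialise`, the width of 4 and the "first bit received goes in the LSB" ordering are all my assumptions for illustration. Check the real bit order and width against your device's documentation and your simulation.

```python
def deserialise(bits, width=4):
    """Behavioural model of a 1:width deserialiser for one data line.

    bits  : iterable of 0/1 values, in the order they arrive on the pin
    width : number of serial bits packed into each parallel word (assumed 4)

    Assumption: the first bit received ends up in bit 0 (LSB) of the word.
    A trailing partial word is discarded, much as real hardware would not
    present a word until it has been completely shifted in.
    """
    words = []
    shift = 0   # models the shift register running on the fast clock
    count = 0   # how many bits have been shifted in so far
    for bit in bits:
        shift |= (bit & 1) << count
        count += 1
        if count == width:
            # models the parallel word being handed to the slow clock domain
            words.append(shift)
            shift = 0
            count = 0
    return words


if __name__ == "__main__":
    serial = [1, 0, 0, 0, 0, 1, 0, 0, 1, 1]
    print(deserialise(serial))
```

A model like this is mostly useful as a reference to compare against simulator output, since getting the word boundary and bit order wrong is the usual failure.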
The other option is to clock the I/O edge of the design twice as fast as the rest of the design, and then use the much simpler DDR registers to capture the data.
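To make the DDR idea concrete, here is a minimal behavioural sketch in Python. It assumes the I/O clock runs at twice the rate of the rest of the design, so two DDR captures (rising and falling edge each) give four samples per slower clock cycle. The names `ddr_capture` and `to_fabric_words` are hypothetical, and the model ignores setup/hold timing and clock domain crossing, which are the hard parts in real hardware.

```python
def ddr_capture(line_values):
    """Model a DDR input register on one data line.

    line_values : one value per half-period of the fast I/O clock
    Returns a list of (rising_edge_sample, falling_edge_sample) pairs,
    one pair per full I/O clock cycle. An unpaired trailing value is dropped.
    """
    pairs = []
    for i in range(0, len(line_values) - 1, 2):
        pairs.append((line_values[i], line_values[i + 1]))
    return pairs


def to_fabric_words(pairs, pairs_per_word=2):
    """Regroup DDR pairs into one word per slower clock cycle.

    Assumption: the I/O clock is pairs_per_word times faster than the rest
    of the design, so 2 pairs -> 4 samples per word. Incomplete words are
    discarded.
    """
    words = []
    for i in range(0, len(pairs) - pairs_per_word + 1, pairs_per_word):
        word = []
        for rising, falling in pairs[i:i + pairs_per_word]:
            word.extend([rising, falling])
        words.append(tuple(word))
    return words


if __name__ == "__main__":
    samples = [10, 11, 12, 13, 14, 15, 16, 17]
    print(to_fabric_words(ddr_capture(samples)))
```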
This might be of use (even though Ultrascale I/O hardware is different):
https://github.com/hamsternz/Artix-7-HDMI-processing/blob/master/src/deserialiser_1_to_10.vhd