Use a clock distribution chip such as an HMC7043 or HMC7044. Each clock output has a programmable delay block that can be used to compensate for PCB trace length differences, and the sync outputs can be used for the I/O update signals. I did this once with AD9517-3s feeding two AD9854s (fclk 160 MHz): I length-matched the I/O update and clock traces and didn't really need any further adjustment from the programmable delays.
edit: fixed the clock dist part
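To get a feel for how much trace mismatch matters at 160 MHz, here is a rough back-of-the-envelope sketch. It is not taken from any datasheet: the effective dielectric constant (4.0, a ballpark for FR-4), the 10 mm mismatch, and the 25 ps delay step are all assumed values for illustration, so check your stackup and the delay resolution of the part you actually use.

```python
import math

# Speed of light in vacuum, expressed in mm per picosecond.
C_MM_PER_PS = 0.299792458


def trace_delay_ps(length_mm, eps_eff=4.0):
    """Propagation delay of a trace in ps.

    eps_eff is the effective dielectric constant seen by the signal.
    4.0 is an assumed ballpark for FR-4, not a measured value.
    """
    return length_mm * math.sqrt(eps_eff) / C_MM_PER_PS


def phase_error_deg(skew_ps, f_hz):
    """Phase error in degrees that a given skew causes at frequency f_hz."""
    return 360.0 * skew_ps * 1e-12 * f_hz


def delay_steps(skew_ps, step_ps):
    """Nearest number of programmable-delay steps to cancel the skew.

    step_ps is hypothetical here; take the real value from the datasheet.
    """
    return round(skew_ps / step_ps)


if __name__ == "__main__":
    mismatch_mm = 10.0   # assumed length difference between two clock traces
    fclk_hz = 160e6      # clock frequency from the post
    step_ps = 25.0       # hypothetical delay step size

    skew = trace_delay_ps(mismatch_mm)
    print(f"skew: {skew:.1f} ps")
    print(f"phase error at fclk: {phase_error_deg(skew, fclk_hz):.2f} deg")
    print(f"delay steps to compensate: {delay_steps(skew, step_ps)}")
```

With these assumed numbers, a 10 mm mismatch works out to roughly 67 ps, or a bit under 4 degrees at 160 MHz, which is consistent with length matching alone being good enough in many cases.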