There are some really good videos on YouTube about that; check them out if you have some spare time. Basically, you define an interface boundary that is both logical AND physical, and then you design a set of interchangeable modules that all present the same logical and physical layout at that boundary.

Xilinx themselves used this trick to work around PCIe boot-time limitations. The problem is that the PCI Express specification requires an endpoint device to become responsive to bus transactions within a fixed time after power-up, and larger FPGAs just could not load the bitstream from storage fast enough, much less actually boot in that time. What they came up with is loading a super-barebones bitstream first, just enough to make the PCIe link functional, and then loading the actual application bitstream either from another source or over PCIe via the driver, kinda like what most modern WiFi/BT chipsets do with their firmware. They call it "Tandem Configuration". Here is a QuickTake video about that: