I should have realized that lacking a 16th address bit means I can only access half of the actual depth (256M of 512M), so there's no benefit to using the DDR3 x8 ICs. I considered using DDR3L, but minimizing the number of voltage rails feels more useful than reducing cost/power. On this board, at least.
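For anyone wanting to check that arithmetic, here is a rough sketch. It assumes the usual 4Gb x8 DDR3 organization (16 row bits, 10 column bits, 3 bank bits); those figures and the function name are my assumptions for illustration, so check them against the datasheet of the actual part.

```python
# Rough sketch: how many device-width locations are reachable
# with a given number of row address bits.
# Assumed organization (typical 4Gb x8 DDR3): 10 column bits, 3 bank bits.

MEG = 2 ** 20


def addressable_locations(row_bits, col_bits=10, bank_bits=3):
    """Return the number of addressable locations for the given bit counts."""
    return 2 ** (row_bits + col_bits + bank_bits)


full = addressable_locations(row_bits=16)     # A0-A15 all routed
reduced = addressable_locations(row_bits=15)  # top row address bit missing

print(f"full:    {full // MEG}M locations")
print(f"reduced: {reduced // MEG}M locations")
```

Each missing row address bit halves the reachable depth, which is why the larger part buys nothing here.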
Make sure your 1.5 V rail can provide enough current - DDR3 can have huge transient currents (on the order of a few amps!), and if your PMIC (or power source) isn't quite up to the task, you could get hard-to-debug issues caused by the 1V5 rail going places. Also, Xilinx recommends termination to Vtt, so you will need a tracking regulator for that. Read Ch. 5 of UG933 very carefully and do as they say. Unless you are willing to experiment (read: have money to burn on possible board respins), that is.
I'm not even at the stage where I can work on the layout yet - still have to get an Ethernet PHY into the schematic. I may livestream the layout towards the end of this week if anyone's interested.
I doubt I can make it to the livestream (as my spare time is always sketchy), but I would love to watch a full-length recording on my own schedule. I don't mind 5+ hr long YT videos at all.
But like I said, I seriously doubt you would be able to make it work on 4 layers. At least I couldn't make even a much simpler Artix-7 + DDR3L x8 design work on 4 layers without violating DDR3 layout requirements.
My first 6-layer boards are now on their way to me - looking forward to getting my hands on them.
