Here's a 6.15x6.5 4-layer board I did a while ago for a discrete Z80 reimplementation. It holds a 4-bit slice of the register file, and every DIP in the picture is a CD4000-family chip.
Also, every pin on the card connector is used, with only the minimum number of GND and VCC repeats needed for signal integrity. The capacitors are 1206 surface-mount parts on the bottom side, because through-hole capacitors waste a lot of real estate on a board this tight. The top layer carries mostly vertical traces and the bottom layer mostly horizontal ones; the two internal layers are the GND and VCC planes.
Since this was a single-cycle-per-instruction Harvard implementation with no pipeline, the register file needed as many read and write ports as the instruction set demanded within one cycle. It runs at about 4 MHz from a 12.5 V supply, which at one instruction per clock means 4 MIPS. There are 18 registers in total, including the alternate register set. The register file is four slices wide, so all the data paths are 16 bits. That was the only way to reach 4 MIPS with '70s discrete CMOS logic. The data and program memories also have multiple ports... I'm glad this is over; I got in way deeper than I expected.
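To make the bit-slice idea concrete, here is a rough behavioural sketch (not the actual design) of how four 4-bit slice cards add up to one 16-bit, multi-port register file. The 18 registers and the four 4-bit slices come from the post; the port count (two reads and one write per cycle), the register numbering 0..17, and the read-before-write rule are my own hypothetical assumptions, since the post doesn't list the real ports. At 4 MHz and one instruction per clock, all of this has to settle within a 250 ns cycle.

```python
from typing import List, Optional, Tuple

# Figures from the post: 18 registers, four 4-bit slices -> 16-bit data path.
SLICE_BITS = 4
NUM_SLICES = 4
NUM_REGS = 18
SLICE_MASK = (1 << SLICE_BITS) - 1


class RegisterSlice:
    """One card: a 4-bit slice of every register."""

    def __init__(self) -> None:
        self.cells: List[int] = [0] * NUM_REGS

    def read(self, reg: int) -> int:
        return self.cells[reg]

    def write(self, reg: int, value: int) -> None:
        # Keep only the low 4 bits; the other slices store the rest.
        self.cells[reg] = value & SLICE_MASK


class RegisterFile:
    """Four slice cards side by side form the 16-bit register file."""

    def __init__(self) -> None:
        self.slices = [RegisterSlice() for _ in range(NUM_SLICES)]

    def read(self, reg: int) -> int:
        # Concatenate the nibbles; slice 0 holds the least significant bits.
        value = 0
        for i, s in enumerate(self.slices):
            value |= s.read(reg) << (i * SLICE_BITS)
        return value

    def cycle(self, read_a: int, read_b: int,
              write_reg: Optional[int] = None,
              write_val: int = 0) -> Tuple[int, int]:
        """One clock: two reads and an optional write in the same cycle.

        HYPOTHETICAL port count (2 read / 1 write); reads return the
        values from before this cycle's write.
        """
        a, b = self.read(read_a), self.read(read_b)
        if write_reg is not None:
            # Each slice card latches its own 4 bits of the write data.
            for i, s in enumerate(self.slices):
                s.write(write_reg, write_val >> (i * SLICE_BITS))
        return a, b
```

For example, `rf.cycle(0, 1, write_reg=2, write_val=0xBEEF)` on a fresh file returns `(0, 0)` and leaves `0xF` in slice 0 and `0xB` in slice 3 for register 2. Adding a port in this model just means another read in `cycle`; on real hardware it means another full set of muxes on every slice card, which is why the boards fill up so quickly.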
To make the ALU fast enough, only two bits fit on a card of the same size. All of its logic is implemented with 4066 switches wired as muxes, plus some 4051s acting as complementary output buffers so the muxes are always driven by strong inputs.
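The post doesn't say exactly how the 4066 switches are wired, so the following is only a sketch of the general technique of building logic from muxes: the operand bits drive the select lines, and the data inputs are tied to the function's truth table, so the mux acts as a lookup table. The 2-bit slice width and the eight slices come from the post. The table names, the helper functions and the plain ripple carry between slices are my own simplifications; the real cards may well handle carries differently.

```python
from typing import List, Tuple

SLICE_WIDTH = 2   # bits per ALU card, per the post
NUM_SLICES = 8    # 8 cards x 2 bits = 16-bit ALU


def mux(data: List[int], select: int) -> int:
    """Model a switch-based multiplexer: route data[select] to the output."""
    return data[select]


# Truth tables tied to the mux data inputs (hypothetical names).
# 2-input tables are indexed by (b << 1) | a.
AND_LUT = [0, 0, 0, 1]
OR_LUT = [0, 1, 1, 1]
XOR_LUT = [0, 1, 1, 0]
# 3-input tables are indexed by (cin << 2) | (b << 1) | a.
SUM_LUT = [0, 1, 1, 0, 1, 0, 0, 1]    # a ^ b ^ cin
CARRY_LUT = [0, 0, 0, 1, 0, 1, 1, 1]  # majority(a, b, cin)


def add_slice(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One 2-bit card: compute a + b + cin using only mux lookups."""
    out, carry = 0, cin
    for bit in range(SLICE_WIDTH):
        sel = (carry << 2) | (((b >> bit) & 1) << 1) | ((a >> bit) & 1)
        out |= mux(SUM_LUT, sel) << bit
        carry = mux(CARRY_LUT, sel)
    return out, carry


def add16(a: int, b: int, cin: int = 0) -> Tuple[int, int]:
    """Chain eight slices; the carry simply ripples (a simplification)."""
    mask = (1 << SLICE_WIDTH) - 1
    result, carry = 0, cin
    for i in range(NUM_SLICES):
        shift = i * SLICE_WIDTH
        s, carry = add_slice((a >> shift) & mask, (b >> shift) & mask, carry)
        result |= s << shift
    return result, carry


def logic16(lut: List[int], a: int, b: int) -> int:
    """Apply a 2-input function bitwise, one mux lookup per bit."""
    out = 0
    for bit in range(SLICE_WIDTH * NUM_SLICES):
        sel = (((b >> bit) & 1) << 1) | ((a >> bit) & 1)
        out |= mux(lut, sel) << bit
    return out
```

With this model, `add16(0xFFFF, 0x0001)` returns `(0, 1)`, with the carry walking through all eight slices. That long ripple path is one reason that squeezing more bits onto each card would hurt speed.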
In total: 4 slice cards for the registers, 8 slices for the ALU, 8 slices for the code memory and instruction decoder, another 4 slices for the data memory, and 6 unique cards for the controller, external bus interface, etc. No time to publish the whole thing yet...
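A quick tally of those numbers shows that the register and ALU slices both come out to the same 16-bit data path. The counts and bits-per-card are from the post. The total assumes the six unique cards cover the controller, the bus interface and everything in the "etc.", which is my reading rather than something stated outright. The code-memory width isn't given, so it's left out.

```python
# Card counts as listed in the post. ASSUMPTION: the six "unique" cards
# cover the controller, bus interface and everything else in the "etc.".
CARDS = {
    "register file": 4,
    "ALU": 8,
    "code memory + instruction decoder": 8,
    "data memory": 4,
    "controller, bus interface, etc.": 6,
}

# Bits per card, only where the post states it.
BITS_PER_CARD = {"register file": 4, "ALU": 2}


def datapath_widths() -> dict:
    """Slices x bits-per-slice for each group whose slice width is known."""
    return {name: CARDS[name] * bits for name, bits in BITS_PER_CARD.items()}


if __name__ == "__main__":
    for name, width in datapath_widths().items():
        print(f"{name}: {CARDS[name]} cards -> {width}-bit data path")
    print("total cards:", sum(CARDS.values()))
```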