IBM has made monster multi-die assemblies for a long time. Density is limited by interconnects: only so many chips fit on a single ceramic substrate, which needs a massive heatsink on one side and a massive multi-pin connector on the other. You could put a few of these side by side, but eventually you need to connect to power supplies, bus interfaces, peripherals, etc., so that space tends to form the core/backbone of the machine, with everything clustering around it, including all the CPUs integrated into it.
By "limited" here, I mean that it scales poorly: if you want a bigger machine, it's going to take more interconnections than CPUs, as the number goes up; it doesn't grow linearly, but as a power law. What power, depends on how tightly integrated you want the cores to be -- if any one should have immediate communication to any other, there's O(N^2) connections (edges in the graph).
Area of silicon chips is limited by process yield and thermal expansion. Even with reasonably well matched expansion rates (special ceramics and alloys), a chip also needs a low enough defect rate that most of its area is usable -- multicore CPUs collect enough defects that manufacturers may disable whole cores that are defective, selling the chip as a lower-cost variant rather than scrapping it. That limit is around an inch on a side, and will likely remain so for a long time. Thus we should expect computing power to grow ultimately by vertical stacking (3D chips; we already have modest-height stacks in multi-die assemblies, often with through-silicon vias (TSVs), often integrating RAM, Flash and CPU together in one SoC), or laterally by integrating multiple cores/machines (which has also been true for a long time -- supercomputers are cluster-based).
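The yield pressure follows the classic Poisson defect model: the fraction of defect-free dies falls off as exp(-D*A), with defect density D and die area A. A toy sketch (the D value is made up for illustration, not any real fab's number):

    import math

    # Classic Poisson yield model: P(zero defects) = exp(-D * A).
    # D is an assumed, illustrative defect density (defects per cm^2).
    D = 0.1

    def good_die_fraction(area_cm2, d=D):
        return math.exp(-d * area_cm2)

    for side_mm in (5, 10, 20, 25):  # 25 mm is roughly the "inch" limit
        area = (side_mm / 10) ** 2   # square die, area in cm^2
        print(f"{side_mm:2d} mm die: {good_die_fraction(area):5.1%} defect-free")

With these numbers, a 5 mm die comes out ~97% defect-free but a 25 mm die only ~54%; once half your large dies have a defect somewhere, fusing off a bad core and selling the part as a cheaper variant clearly beats scrapping it.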
Tim