I would also add that current processors already run at a power density of tens to hundreds of watts per square centimetre, well beyond a cooking hotplate, and the biggest issue is cooling the die and getting the heat out evenly across its surface, while still having a good enough insulator that the leakage current into the thermal solution stays well below the level that would slow the rise and fall times of the signals. Another issue is the speed of light. At 5GHz a signal might travel 6cm in one cycle in a vacuum, but it is only sampled on the rising edge (or the falling edge, depending on the logic in that part of the die), and it is only launched on the other edge at the originating side, so you really have about half a cycle to work with. With signals in silicon travelling a crap load slower than light in a vacuum, you are limited to under 1cm of total trace length with polysilicon, and slightly more with copper interconnect (thus the IBM patents for copper-on-silicon interconnects that garnered them so much revenue over the years) if you want reliable data transfer. You also need all traces matched in length, and all of them will be transmission lines with a defined impedance.
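For anyone who wants to check the arithmetic, here is a rough sketch of the distance budget. The function name `reach_cm` and the 0.3 velocity factor are my own assumptions for illustration, not measured values for any real process; real on-die wires are limited by resistance and capacitance and can be slower still.

```python
C_VACUUM_M_PER_S = 299_792_458  # speed of light in a vacuum


def reach_cm(freq_hz, velocity_factor=1.0, cycle_fraction=1.0):
    """Distance in cm a signal covers in a given fraction of one clock period.

    velocity_factor is the signal speed as a fraction of c (hypothetical input).
    """
    period_s = 1.0 / freq_hz
    distance_m = C_VACUUM_M_PER_S * velocity_factor * period_s * cycle_fraction
    return distance_m * 100


full_cycle = reach_cm(5e9)                       # whole cycle, vacuum
half_cycle = reach_cm(5e9, cycle_fraction=0.5)   # launch on one edge, sample on the other
# Assumed velocity factor of 0.3, purely illustrative
slowed_half = reach_cm(5e9, velocity_factor=0.3, cycle_fraction=0.5)

print(f"full cycle in vacuum: {full_cycle:.2f} cm")
print(f"half cycle in vacuum: {half_cycle:.2f} cm")
print(f"half cycle at 0.3c:   {slowed_half:.2f} cm")
```

With those assumed numbers the half-cycle budget comes out just under 1cm, which is in line with the trace length limit mentioned above.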
Thus you see clock generation and distribution taking up a large share of the die area, and more data lines snaking through than logic, along with stateless and clockless logic used wherever possible to shave a tiny bit of time off, and whole swathes of the die being shut off between instruction cycles to save the power otherwise lost in the really leaky transistors in there.
Pretty much everything is at the limits of current processes, so adding extra cores and getting programmers to write software that uses them as efficiently as possible is the way forward. Thus you see graphics cards with essentially 1000 Z80-class processors on board, each simple and small, but with very fast memory and really fast sharing of data between them, so that you can do a lot of things in parallel when you need to, and the whole is far more capable than any individual core. Kind of like an ant nest, where each individual is small and seems unable to do much, but as a collective they can do massive tasks in a short time, without much coordination other than simple semaphores and limited data transfer.
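The ant nest idea can be sketched in a few lines. This is only an illustration of the coordination pattern, assuming a made-up helper called `parallel_sum`; Python threads will not actually run this faster because of the interpreter lock, and a real GPU would do the same thing in hardware with far more workers.

```python
import threading


def parallel_sum(values, n_workers=8):
    """Sum a list by splitting it among small workers (hypothetical helper)."""
    total = [0]                     # shared result, the only data exchanged
    gate = threading.Semaphore(1)   # simple binary semaphore guarding the total

    def worker(chunk):
        partial = sum(chunk)        # independent work, no coordination needed
        with gate:                  # brief coordinated step to publish the result
            total[0] += partial

    # Ceiling division so every element lands in some chunk
    chunk_size = max(1, -(-len(values) // n_workers))
    threads = [
        threading.Thread(target=worker, args=(values[i:i + chunk_size],))
        for i in range(0, len(values), chunk_size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


print(parallel_sum(list(range(1000))))
```

Each worker does its own small job and only touches the shared total for a moment, which is about all the coordination the ants need.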