Regarding x86: it manages by a heaping mess of taking things apart and putting them back together. AFAIK, all high-performance processors today use micro-operations internally, with the decoder and the front end of the pipeline translating complex opcodes into sequences of simple operations, while tracking hazards (pipeline conflicts and out-of-order dependencies).
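To make the "taking apart" concrete, here's a toy sketch in C (purely illustrative, not any real decoder) of how a read-modify-write instruction like add [mem], reg gets cracked into simple micro-ops:

    /* Toy illustration of micro-op "cracking": one complex CISC-style
       opcode in, a short sequence of RISC-like micro-ops out. */
    #include <stdio.h>

    typedef enum { UOP_LOAD, UOP_ADD, UOP_STORE } Uop;

    /* Hypothetical decode of "add [mem], reg" into three simple steps. */
    static int decode_add_mem_reg(Uop out[]) {
        out[0] = UOP_LOAD;   /* tmp   <- [mem]      */
        out[1] = UOP_ADD;    /* tmp   <- tmp + reg  */
        out[2] = UOP_STORE;  /* [mem] <- tmp        */
        return 3;            /* number of micro-ops emitted */
    }

    int main(void) {
        static const char *names[] = { "LOAD", "ADD", "STORE" };
        Uop uops[8];
        int n = decode_add_mem_reg(uops);
        for (int i = 0; i < n; i++)
            printf("uop %d: %s\n", i, names[uops[i]]);
        return 0;
    }

Each micro-op is then scheduled independently, which is what lets the out-of-order machinery reorder around stalls.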
Deep pipelines, dependency analysis, and good branch prediction all combine to give a total execution capacity over 2 instructions/cycle per core (today's big cores are more like 4-6 wide at issue; 2-3/cycle was more typical of, like, Pentium III days).
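You can actually see the dependency analysis at work from user land with a crude timing loop. A rough sketch (assumes gcc/clang on x86-64; __rdtsc() comes from <x86intrin.h>, and the numbers are only ballpark without core pinning or serializing the TSC reads):

    /* Crude demo: a chain of dependent adds is limited by latency,
       while independent adds can retire several per cycle. */
    #include <stdio.h>
    #include <stdint.h>
    #include <x86intrin.h>

    #define N 100000000

    int main(void) {
        uint64_t a = 1, b = 1, c = 1, d = 1;

        uint64_t t0 = __rdtsc();
        for (long i = 0; i < N; i++) {
            a += i;                          /* each add depends on the last */
            __asm__ volatile("" : "+r"(a));  /* barrier: keep the compiler honest */
        }
        uint64_t t1 = __rdtsc();
        for (long i = 0; i < N; i++) {       /* four independent dependency chains */
            a += i; b += i; c += i; d += i;
            __asm__ volatile("" : "+r"(a), "+r"(b), "+r"(c), "+r"(d));
        }
        uint64_t t2 = __rdtsc();

        printf("dependent:   %.2f cycles/add\n", (double)(t1 - t0) / N);
        printf("independent: %.2f cycles/add\n", (double)(t2 - t1) / (4.0 * N));
        return 0;
    }

On a typical out-of-order core the dependent chain sits near 1 cycle/add, while the four independent chains together land well under that, i.e., more than one instruction retiring per cycle.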
The absolute end-to-end latency of a given instruction might be quite large (20-40 cycles?), and in raw cycle counts not much better than simple, shallow-pipeline RISC machines (e.g., AVR, Cortex-M0), but because all those delays are hidden behind pipelining, out-of-order execution, and the caches (never mind everything the operating system layers on top), you have no way to actually tell, or care, how long a given instruction takes. So the system works.
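Back-of-the-envelope on why the latency doesn't matter, assuming a fully pipelined unit where one result completes per cycle in steady state:

    /* Toy steady-state pipeline model (assumed 20-cycle latency, 1/cycle
       throughput; not a simulator, just the arithmetic). */
    #include <stdio.h>

    int main(void) {
        double latency = 20.0;             /* end-to-end latency, in cycles */
        double n = 1e6;                    /* back-to-back instructions     */
        double total = latency + (n - 1);  /* first result after 'latency', then one per cycle */
        printf("amortized cost: %f cycles/instruction\n", total / n);
        return 0;                          /* ~1.000019: the 20 cycles vanish in the noise */
    }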
Think I heard the claim that x86 decode overhead takes a startling percentage of power: something like the Intel Atom or AMD Geode paying an extra 20% just for the privilege of running x86, compared to low-power native-RISC machines (ARM Cortex-A series, PPC, etc.?). Whether that makes an ultimate difference in an end product is a very high-level systems-engineering question; it's really hard to make any firm statement from the low-level hardware alone, even if such a claim happens to be true.
Tim