| CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field... |
| T3sl4co1l:
I wonder if there is any interest in making highly serialized, simplified cores. Namely: strip out some of the complicated logic (floating point? SIMD? simpler memory addressing models?) to push the clock rate higher. The assumption being higher clock speeds give higher performance in frequently-branching (difficult-to-predict) code. Or, higher order prediction and speculative execution units. *shrug* It may be that there isn't any way to optimize such problems (namely, those that are strongly locked by Amdahl's law), and that the best approach can only be a special-ish-purpose ASIC. (Some crypto algorithms do this intentionally.) The hope would be that something a bit more general (a CPU) would be capable of running other Amdahl-locked problems faster: programs that are heavily serial in logic and light in arithmetic, even if they could be parallelized properly. I dunno. They're probably already doing as well as they can, considering how many of today's programs fit this bill. Tim |
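For reference, the Amdahl's law limit mentioned in the post above can be sketched numerically. This is only a minimal illustration, not something from the thread: the function name `amdahl_speedup` and the example fractions are assumptions picked for the demo.

```python
def amdahl_speedup(parallel_fraction: float, n_units: float) -> float:
    """Overall speedup when only `parallel_fraction` of the run time
    benefits from `n_units` parallel execution units (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)


if __name__ == "__main__":
    # Hypothetical workloads: mostly serial, half and half, mostly parallel.
    for p in (0.10, 0.50, 0.95):
        for n in (2, 8, 64):
            print(f"p={p:.2f}  n={n:3d}  speedup={amdahl_speedup(p, n):.2f}x")
```

For a mostly serial program the speedup barely moves as units are added, which is why the post looks to raw clock rate instead.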
| Miyuki:
--- Quote from: T3sl4co1l on December 15, 2018, 06:21:24 pm ---I wonder if there is any interest in making highly serialized, simplified cores. Namely: strip out some of the complicated logic (floating point? SIMD? simpler memory addressing models?) to push the clock rate higher. The assumption being higher clock speeds give higher performance in frequently-branching (difficult-to-predict) code. Or, higher order prediction and speculative execution units. --- End quote ---
I think the silicon frequency limit has already been reached at the 5-6 GHz where specialized processors run now (and the efficient operating point is about 2-3 GHz). These processors have extremely long pipelines; you cannot split instruction processing into any smaller steps to get a higher clock. Logic speed will not change unless you use some extreme cooling or move to a new material. I wonder how much development we are away from SiC-based processors, or things like GaAs. |
| donmr:
On newer IC processes, the R, L, and C of the interconnect limit the speed more than the transistors do. |
| David Hess:
--- Quote from: T3sl4co1l on December 15, 2018, 06:21:24 pm ---I wonder if there is any interest in making highly serialized, simplified cores. Namely: strip out some of the complicated logic (floating point? SIMD? simpler memory addressing models?) to push the clock rate higher. The assumption being higher clock speeds give higher performance in frequently-branching (difficult-to-predict) code. Or, higher order prediction and speculative execution units. --- End quote ---
This is linked to *why* the highest clock rate processors are also the most complex. The biggest limit is the low-level cache load-to-use latency. In the simplest processor, a load instruction returns a result from cache (or memory) for use in the next clock cycle, so the load-to-use latency is 1. If the pipeline is clocked faster, and some microcontrollers support this, then a wait state is generated because the memory does not return a result quickly enough.
So for higher clock rates, the pipeline is altered to support longer load-to-use latencies from the cache. This is where out-of-order execution becomes necessary to achieve the highest clock rates. Instead of inserting wait states, the pipeline now extracts instruction-level parallelism to hide the latency to memory by executing other instructions while it waits.
I think Intel's Core2 processors are currently up to a 4 cycle load-to-use latency, but at the beginning they were only 3. Intel also made a similar change with the Pentium 4 Prescott, which allowed for a larger but slower cache at higher clock rates, but performance results were mixed. As I recall, one of IBM's later PowerPC processors achieved a 5+ GHz clock rate despite unusually lowering the load-to-use latency to 2 clock cycles, but the compromises needed to achieve this lowered performance in other ways; cache size was limited by this.
In the fastest processors, the register bank is not even fast enough to keep up with the pipeline so a register forwarding network holds results close to the pipeline for immediate use. |
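The wait-state argument above can be put into a toy model. This is a rough sketch under loud assumptions, not a description of any real core: the cache access time is fixed in nanoseconds, every load is immediately followed by a dependent instruction, the base rate is one instruction per cycle, and `hidden_fraction` is a made-up knob standing in for how much of the stall out-of-order execution manages to cover. All names are hypothetical.

```python
import math


def load_to_use_cycles(cache_access_ns: float, clock_ghz: float) -> int:
    """Whole clock cycles needed to cover a fixed cache access time."""
    cycles = cache_access_ns * clock_ghz  # ns * (cycles per ns)
    # Small epsilon guards against float noise pushing an exact value up.
    return max(1, math.ceil(cycles - 1e-9))


def ns_per_instruction(clock_ghz: float, cache_access_ns: float,
                       load_fraction: float,
                       hidden_fraction: float = 0.0) -> float:
    """Average time per instruction for the toy pipeline.

    load_fraction   : share of instructions that are loads (assumed)
    hidden_fraction : share of load stall cycles hidden by other work,
                      0.0 = plain wait states, 1.0 = fully hidden (assumed)
    """
    latency = load_to_use_cycles(cache_access_ns, clock_ghz)
    stall_cycles = (latency - 1) * (1.0 - hidden_fraction)
    cpi = 1.0 + load_fraction * stall_cycles
    return cpi / clock_ghz


if __name__ == "__main__":
    # Hypothetical numbers: 1 ns cache, 25% of instructions are loads.
    for ghz in (1.0, 2.0, 4.0):
        stalled = ns_per_instruction(ghz, 1.0, 0.25)
        hidden = ns_per_instruction(ghz, 1.0, 0.25, hidden_fraction=1.0)
        print(f"{ghz:.0f} GHz: wait states {stalled:.4f} ns/instr, "
              f"latency hidden {hidden:.4f} ns/instr")
```

With plain wait states, each doubling of the clock buys less than a doubling in throughput; only when the extra latency is hidden does the higher clock pay off in full.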
| Berni:
You could probably design a CPU exclusively for clock speed that would be able to run at 10 GHz. The problem is that such a CPU would probably have a simple architecture and a crappy instruction set that supports only simple operations, or takes many clock cycles for any of the more complex instructions. The result is a high-clocked chip that is actually not very powerful at all, because it can't do all that much work per cycle. You only need a few simple instructions to create a Turing-complete computer, but the simpler your instruction set is, the more instructions are needed to complete a given task. The x86 architecture has taken the path of making more and more powerful instructions so it can do more work per clock cycle. If you compare a latest-generation Intel i9 running at 1 GHz with any ARM CPU running at 1 GHz, you will find that the x86 chip is still way faster at completing a given real-world task. |
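The work-per-cycle point can be made concrete with the usual run-time relation: time = instruction count × cycles per instruction / clock rate. The sketch below is illustrative only; the function name and both sets of numbers are invented for the comparison and do not describe any real chip.

```python
def run_time_s(instruction_count: float,
               cycles_per_instruction: float,
               clock_hz: float) -> float:
    """Run time = instructions * cycles per instruction / clock rate."""
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    return instruction_count * cycles_per_instruction / clock_hz


if __name__ == "__main__":
    # Hypothetical simple core: very fast clock, but the same task needs
    # three times as many (simpler) instructions at one cycle each.
    simple = run_time_s(3e9, 1.0, 10e9)
    # Hypothetical complex core: slower clock, fewer instructions,
    # and two instructions retired per cycle on average.
    complex_core = run_time_s(1e9, 0.5, 4e9)
    print(f"simple 10 GHz core : {simple:.3f} s")
    print(f"complex 4 GHz core : {complex_core:.3f} s")
```

Under these made-up figures the slower-clocked core finishes first, which is the trade-off the post describes.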