Topic: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...


Offline AgentTopic starter

  • Contributor
  • Posts: 14
  • Country: us
I’m writing a paper about the stagnation of CPU clock speed increases. In the late 90s and early 2000s we were seeing insane clock speed increases. In late 2004 the Pentium 4 570 was launched at 3.8 GHz. From what I have found, the fastest manufacturer-rated base clock is 4.7 GHz with the AMD FX-9590. I’m ignoring turbo and only looking at single-core performance. I have picked a few CPUs from the end of each decade and started with the Intel 8088.

What I am trying to do is compare theoretical clock speeds with actually achieved clock speeds. As I understand it, cutoff frequency is a function of electron mobility, electric field, and channel length. I realize that equation 1 is for the ideal case and in an actual MOSFET, the effect of parasitic capacitance will substantially reduce the cutoff frequency. For reference the channel length of the 8088 is 3 um and it topped out at 10 MHz.


Questions:
1.   Is 400 cm^2/V-s reasonable for electron mobility?
2.   CPUs use both nMOS and pMOS. Hole mobility is much lower than electron mobility so we don’t care about nMOS, pMOS is the limiter. What is a reasonable hole mobility?
3.   What should I be plugging in for Vgs and Vt? (I think Vt should be around 0.3V)
4.   How do I calculate the electric field in a CPU MOSFET?
5.   If a CPU lists Vccmax as 1.55 V that is the highest voltage in the CPU. How can I figure out what gate, drain, and source voltages are?

Basically I want to go from 1980 to 2004 when clock speed actually scaled and come up with a reasonable approximation of calculated vs. actual clock speed based on transistor physics.


If I ignore Vgs and Vt and just plug in the thermal voltage, 0.0259 V, it actually looks pretty reasonable until the gate length shrinks to 130 nm. Any help would be appreciated. Thanks!
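For anyone following along: equation 1 is not reproduced in the post, so the sketch below assumes the common transit-time form, f_T ≈ mu * E / (2 * pi * L), with the field approximated as a voltage dropped across the channel (E ≈ V / L). The function names and the choice of SI units are illustrative assumptions, not something taken from the post.

```c
/* Transit-time estimate (assumed form, not the post's equation 1):
 * carriers drift at v = mu * E and must cross a channel of length L,
 * so tau = L / v and f_T ~ 1 / (2 * pi * tau). All inputs in SI units. */
double cutoff_frequency_hz(double mobility_m2_per_vs,
                           double e_field_v_per_m,
                           double channel_length_m)
{
    const double pi = 3.14159265358979323846;
    double drift_velocity_m_per_s = mobility_m2_per_vs * e_field_v_per_m;
    return drift_velocity_m_per_s / (2.0 * pi * channel_length_m);
}

/* Same estimate with the channel field approximated as V / L.
 * Note the resulting 1 / L^2 dependence on channel length. */
double cutoff_from_voltage_hz(double mobility_m2_per_vs,
                              double voltage_v,
                              double channel_length_m)
{
    return cutoff_frequency_hz(mobility_m2_per_vs,
                               voltage_v / channel_length_m,
                               channel_length_m);
}
```

Calling `cutoff_from_voltage_hz(0.04, 0.0259, 3e-6)` (400 cm^2/V-s, the thermal voltage, a 3 um channel) comes out near 18 MHz, the same order of magnitude as the 8088's 10 MHz. Velocity saturation and parasitics are ignored here, which is one reason a model like this drifts away from real parts at short gate lengths.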
 

Online T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21675
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #1 on: December 11, 2018, 05:00:58 pm »
Electron mobility is higher than that, but mind it also depends on doping density, and density has been getting higher and higher as feature scales get smaller (they famously claim transistors have about one atom of dopant in them nowadays, which if you consider the volume of such a transistor, is actually an incredibly high doping rate for silicon, IIRC).

There are special effects for short channels that I don't know about, and there's also the contribution from SOI where (usually) an oxygen layer is implanted beneath the transistor, and annealed to grow a solid oxide (SiO2 glass) layer; the reduction in leakage current and capacitance provides a big kick.  And everything else they've done: copper interconnects, high-K gate oxide, etc. etc.

Some of these aren't new -- the RCA 1802 was SOI (epitaxy on sapphire, wasn't it?), and performed well for its day (its strange architecture notwithstanding!), as well as being rad-hard.

May also be worth including some "honorable mentions", like Cray's GaAs design, which belched out so much heat it had to be cooled with Freon (unless I'm mixing up various things here, which may be :P ).  Sure it was fast (GaAs NMOS has very nice mobility), but it was "Class A" because there was no complement (GaAs hole mobility is pitiful, better to use resistors(!)).

Similarly there were quite fast machines in the 60s and 70s, thanks to ECL -- but they were hard to design and build (CNC wire-wrap machines were a thing, and not just for single wires, but for twisted pair as well -- a hard requirement for ECL signal quality!), and very, very hot.  VLSI CMOS quickly surpassed these through the 70s and 80s.  Since you're concentrating on CMOS, maybe this isn't necessary.

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline AgentTopic starter

  • Contributor
  • Posts: 14
  • Country: us
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #2 on: December 11, 2018, 07:41:27 pm »
I'm actually examining materials with higher mobility that can support higher electric fields without breakdown. I'm attempting to find some reasonable trend within the achieved Si clock speeds that can be applied to a new material. e.g. Silicon theoretically can reach X GHz, but in actuality it reaches Y GHz. New material theoretically can reach X GHz, using info from Si as a base it could probably reach Y GHz.

When people attempt to break world records for clock speed with liquid nitrogen, are they using stock motherboards with no electrical modifications? I assume that MOBO manufacturers didn't have this in mind when they designed them. Which makes me wonder: is it the CPU or the motherboard that limits liquid nitrogen overclocking? It seems like when people go for records they have multiple CPUs on hand...so the CPU is the limiter?

What is actually happening with liquid nitrogen cooling? The processor is cooled down, therefore the power can be increased without overheating. Which means the voltage is higher, increasing the electric field, making the electrons travel faster, so the transistor can switch faster. Or does it have something to do with interconnect resistance being decreased, improving the RC time constant? Both?
 

Offline Kilrah

  • Supporter
  • ****
  • Posts: 1852
  • Country: ch
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #3 on: December 11, 2018, 09:14:52 pm »
Quote
When people attempt to break world records for clock speed with liquid nitrogen, are they using stock motherboards with no electrical modifications? I assume that MOBO manufacturers didn't have this in mind when they designed them.

Yup, but there ARE mobos made for overclocking (manufacturers do care about that: it gives them visibility, plus they can sell related products that don't cost much more to make at comfortably inflated margins), and that's what they use. Overclocking contests usually care only about the highest core clock figure, so that's the only thing that gets boosted like crazy; chipset/memory/peripherals stay close to their normal clocks. So all that really gets extra stress on the mobo is the power supply section, and that's seriously beefed up on these.
I don't think they go so far as to optimize the PCB traces specifically for higher clocks or to bin the chipsets.

Quote
What is actually happening with liquid nitrogen cooling? The processor is cooled down, therefore the power can be increased without overheating. Which means the voltage is higher, increasing the electric field, making the electrons travel faster, so the transistor can switch faster. Or does it have something to do with interconnect resistance being decreased, improving the RC time constant? Both?

I'd say mostly the former, with a little bit of the latter. They push the voltage up A LOT.
« Last Edit: December 11, 2018, 09:19:58 pm by Kilrah »
 

Online T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21675
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #4 on: December 11, 2018, 10:43:51 pm »
Quote
When people attempt to break world records for clock speed with liquid nitrogen, are they using stock motherboards with no electrical modifications? I assume that MOBO manufacturers didn't have this in mind when they designed them. Which makes me wonder: is it the CPU or the motherboard that limits liquid nitrogen overclocking? It seems like when people go for records they have multiple CPUs on hand...so the CPU is the limiter?

What is actually happening with liquid nitrogen cooling? The processor is cooled down, therefore the power can be increased without overheating. Which means the voltage is higher, increasing the electric field, making the electrons travel faster, so the transistor can switch faster. Or does it have something to do with interconnect resistance being decreased, improving the RC time constant? Both?

Yes, it's all about the silicon.

Rds(on) (or more directly, mobility) has a positive tempco.  Vgs(th) also has a negative tempco, so higher voltages are needed, to some extent.  Capacitance stays mostly... well, it has a tempco, but not as strong, AFAIK.

So, cool it, crank up the voltage, crank up the clock, crank up the voltage some more, crank a shitload of amps, dump the nitrogen in, and watch it boil away with 5GHz underneath or whatever.

What happens at the board level is pretty irrelevant.  FR-4 is a shitty material no matter how you cut it; board-level interfaces are already designed to accommodate this.  For example: LVDS and SSTL/GTL signalling to deal with noise; transmitters with preshoot to compensate for HF loss; receivers with hysteresis (schmitt trigger), hold-off (ignores bouncing after the received edge, but well before the next expected edge), voltage reference (differential in some way), automatic skew adjustment, clock recovery, etc.

All sorts of things that may be familiar from facility-level comms (1000BASE-T, say), at lower symbol rates (~100s megs), but these are exactly the kinds of lengths necessary to deal with proportionally shorter distances at proportionally higher symbol rates.  Crazy stuff. :)

I don't know that they're having to go crazy with signalling within a single chip (yet?), but the RC transmission line delay there is even worse, so it has to be buffered just right, at least.

Tim
 

Offline David Hess

  • Super Contributor
  • ***
  • Posts: 16614
  • Country: us
  • DavidH
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #5 on: December 12, 2018, 04:39:04 am »
For the last several generations, clock speed has been limited more by power density.  If you compare chip area versus power over several generations, maximum power density has remained roughly constant while chip size has decreased.

Hot spots in dense and fast logic like ALUs are also a major limitation, resulting in behaviors like reducing clock speed when wide vector instructions are used.

This is all reflected in design priorities which changed after the Pentium 4 fiasco to require new features to provide at least a minimum performance gain for a given increase in power consumption.  Anything lower would actually reduce performance because of the added power burden.

This also points out why techniques like eager execution and scout threads were discontinued.  The added work increases the power budget so much that performance would actually be worse.  Predictive execution at least does work which was likely to be needed anyway.
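
As a rough illustration of why power becomes the limit, the usual first-order expression for CMOS switching power can be written down. The symbols are assumptions introduced here, not taken from the post: \alpha is the fraction of the capacitance switched each cycle, C the total switched capacitance, V_{DD} the core supply, f the clock, and A the die area. Leakage is ignored.

```latex
P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f,
\qquad
\frac{P_{\text{dyn}}}{A} \approx \frac{\alpha \, C \, V_{DD}^{2} \, f}{A}
```

Under this sketch, raising f at a fixed voltage raises power linearly, and raising V_{DD} to reach that clock raises it quadratically on top, so a design held to a roughly constant power density has little room left to trade for clock speed.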
 

Offline Berni

  • Super Contributor
  • ***
  • Posts: 4951
  • Country: si
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #6 on: December 12, 2018, 07:02:24 am »
FET transistors can switch incredibly fast. It tends to be everything around it that limits the usable maximum clock speed. I managed to get big power MOSFETs into parasitic oscillations at 600MHz despite the things never being designed to operate above 10MHz.

When a logic gate switches it has to propagate its new state to everything it is connected to. This means it has to drive the capacitive load of the trace and the capacitive load of all the inputs of the other gates it is driving. This slows down the propagation of the new state through the chip. If the new state doesn't reach all the required places before the next clock cycle happens, then the CPU could make a wrong decision and in some cases crash.

As you increase the core voltage you increase the available driving current, provided that the internal resistance of the FETs stays the same. This charges up the parasitic capacitance faster and thus reduces the propagation delay. This is why overclocking often involves overvolting the CPU's core supply. While this does make it faster, it also makes it run hotter. Another approach is to cool the chip down really cold to get better characteristics from the FETs and reduce the propagation delay that way. This goes hand in hand with overvolting, because you can put even more voltage into it when you have this much cooling power to take away all that heat.
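
A minimal sketch of that charging argument, under stated assumptions: the delay is taken as the time to move the charge C * Vdd with the available drive current, and the drive current follows the textbook long-channel square law, I = k * (Vdd - Vt)^2. The function names, the constant k and the square law itself are illustrative assumptions; real short-channel devices gain less from extra voltage than this model suggests.

```c
/* Assumed long-channel square-law drive current: I = k * (Vdd - Vt)^2.
 * k lumps mobility, oxide capacitance and W/L into one constant (A/V^2). */
double drive_current_a(double k_a_per_v2, double vdd_v, double vt_v)
{
    double overdrive_v = vdd_v - vt_v;
    if (overdrive_v <= 0.0) {
        return 0.0;              /* below threshold: no drive in this model */
    }
    return k_a_per_v2 * overdrive_v * overdrive_v;
}

/* First-order gate delay: charge to move (C * Vdd) divided by the
 * drive current. Returns -1.0 if the gate cannot switch at all. */
double gate_delay_s(double c_load_f, double k_a_per_v2,
                    double vdd_v, double vt_v)
{
    double current_a = drive_current_a(k_a_per_v2, vdd_v, vt_v);
    if (current_a <= 0.0) {
        return -1.0;
    }
    return c_load_f * vdd_v / current_a;
}
```

With a 1 fF load, k = 1e-4 A/V^2 and Vt = 0.3 V (all made-up values), `gate_delay_s` returns a smaller delay at 1.5 V than at 1.2 V, which is the overvolting effect described above.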

However, the transistors' speed is not the full story.

The design of the CPU matters a lot. Designing with FPGAs gives you a lot of insight into how the design really affects the max speed. Basically, the more complex a logical operation you want to do, the more transistors are involved in building it physically, and the longer the chain of transistors the signal needs to pass through, the longer it takes to "compute" the answer. This gives rise to a trade-off between how much work you can do per clock cycle and how fast you can clock it. For this very reason a CPU usually needs many clock cycles to execute a single instruction (sometimes over 20 cycles), as each clock cycle just does part of the instruction, and this keeps things faster and simpler. On old 8-bit CPUs you can see how long it takes to do this. As chip designers got more transistors at their disposal they started getting smarter about this and reusing parts of the logic that are not needed in the current instruction to execute the next instruction; some heavily used logic is duplicated, caches and branch prediction are added, etc. And you get to a modern 64-bit CPU that is executing pretty much 1 instruction per cycle (but internally it is actually executing a small step of 20 instructions simultaneously). Once you have this many transistors, heat output matters too and can be a limiting factor.

This is the reason why we don't see GPUs following the same trend of running stock at >4GHz (like CPUs do today). They run their cores for the most part at around 700MHz to 1.5 GHz. The architecture that GPUs use is fundamentally different, even though the silicon manufacturing processes are pretty much identical (same speed FETs). These GPUs are optimized for parallel operation due to the nature of the workload. Large GPUs contain >1000 cores, and each core is mostly a very fast floating point calculator that's not actually all that smart but is tiny and power efficient. So because of their different principle of operation it makes more sense to simply add more cores rather than push them faster. This gets them better performance per watt, and that's what they want. Graphics cards are pretty much power limited, as they already take >200W for the high-end models, and much more than that is not practical to keep cool in a PC.
 

Online T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21675
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #7 on: December 12, 2018, 05:26:14 pm »
Keeping in mind I don't know very much about GPU design --

GPUs are also heavily SIMD (single instruction multiple data), in other words vector based.  That's why the memory bus is so wide, to get the throughput necessary to do lots of relatively simple operations simultaneously (in practice, mostly linear algebra: vector and matrix arithmetic).

It used to be that there was one, or few, instruction decoders, and a shitton of ALUs and data buses.  You could write very high performance, non-branching shaders, or pitiful performance conditional-branching shaders that ran slower than on the CPU itself (which is better suited to branching, thanks to prediction and speculative execution).

(Well.  Going back really far, you had a pipeline controlled by registers, hidden by proprietary drivers, and no general computational resources as such.  I'm talking from, since... early-2000s I think?)

This is still true today, but there are more instruction decoders too.  (AFAIK,) the whole unit together counts as a core (whether it's NVidia's CUDA or whatever).  As long as you have lots of cores, and an "embarrassingly parallel" problem to solve, you can have all of them branch independently without much penalty (caches notwithstanding, because cache rules all).

But you'll still get that much better performance by harnessing the full power of each core, as well as all cores together.  This often means "wasting" arithmetic steps, in exchange for eliminating branches, or even loops (which can at least be unrolled to some extent).

A typical example for DSP, vector and GPU code:
Say you have an if statement:
Code:
if (cond)
    var1 = expr1;
else
    var2 = expr2;
Instead of branching, it can be rewritten as:
Code:
var1 = var1 * (1 - cond) + cond * expr1;
var2 = var2 * cond + (1 - cond) * expr2;

because in C (this is a C example by the way), a logical (cond) evaluates to 0 or 1, and if that expression can be evaluated without a branch (e.g., a compare can be evaluated with a subtraction, sign-extend (high word = 0 or -1), then arithmetic negation), then it can be used at almost no penalty in the subsequent arithmetic.  Which is simply the sum of two terms, each masked by the condition or its complement.  It seems wasteful at first, but disturbing a deep pipeline is far more wasteful (well, for sufficiently simple expressions; at some point, you will end up saving time by branching between complex operations).
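
For reference, here is a small compilable sketch of the two forms discussed above. The function names are made up for illustration, and whether a given compiler actually emits branch-free machine code for them is an assumption that has to be checked in the generated assembly.

```c
#include <stdint.h>

/* Mask form: turn the C truth value into all-zeros or all-ones,
 * then pick one operand with AND/OR instead of a jump. */
int32_t select_branchless(int cond, int32_t if_true, int32_t if_false)
{
    int32_t mask = -(int32_t)(cond != 0);   /* 0 or -1 (all bits set) */
    return (if_true & mask) | (if_false & ~mask);
}

/* Multiply form, matching the rewrite in the post: the sum of two
 * terms, each scaled by the condition or its complement. */
int32_t select_multiply(int cond, int32_t if_true, int32_t if_false)
{
    int32_t c = (cond != 0);                /* normalise to 0 or 1 */
    return if_true * c + if_false * (1 - c);
}
```

Both functions evaluate both operands every time, so they only make sense when the expressions are cheap and free of side effects, as noted above.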

I have no idea if shader compilers know this.  It's a pretty obvious optimization on that kind of platform.  Certainly it can only be done if there are no side effects (you can always write bad code that's impossible to optimize; writing good code that the compiler can work with takes care).  Anyway, I just like that it's a different way of looking at the problem -- normally your instinct is to reduce instructions period, but the priority changes when you're on different platforms.

Tim
 

Offline Berni

  • Super Contributor
  • ***
  • Posts: 4951
  • Country: si
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #8 on: December 12, 2018, 05:54:16 pm »
Yep that's about how it works.

The reason that branches are so annoying is exactly because modern processors execute multiple instructions simultaneously (this is pipelining). It's common for a modern processor to be working on the next 10 instructions when the current instruction completes. If the instruction it just hit results in a branch, those 10 instructions after it must not happen, so it is forced to throw away all the work it did, flush its pipeline and slowly get back up to speed by filling it with instructions from the branch location. Because the code you tend to run on a PC uses a lot of branches, CPU vendors had to get around the problem somehow. As a result modern CPUs have branch prediction and speculative execution. The CPU takes a good guess at where the branch will go (a loop that keeps going round will usually branch backwards, for example) and speculatively executes down that path before the branch is resolved, throwing the work away if the guess was wrong. Unfortunately this gave rise to some nasty security issues (Spectre and Meltdown).

On the other hand GPUs don't tend to use these branching optimizations, so a branch really messes up the processor's internal pacing. But it does save a lot of valuable transistors that can be used to build even more cores to make it faster. Shader scripts can be written in ways that help mostly avoid branches.

That's not to say that CPUs don't benefit from careful manual code optimization. They have limited bandwidth for moving data through caches, so working on small chunks of data at a time can help speed them up by not waiting on memory as much. All of the fancy SIMD math instructions (SSE, AVX, AVX512, etc.) in CPUs are not very well used by compilers, because to work well the data needs to be arranged in memory just right. Massive speedups in math work can be achieved by manually optimizing for this. Things like fast optimized FFT libraries for x86 will use it through inline assembler.

Computers have gotten a lot more complicated in the last 20 years. But also a lot more powerful, even though the CPU clock speeds barely went anywhere.
 

Online T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21675
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #9 on: December 12, 2018, 05:59:33 pm »
Going back to the original topic a bit -- has there been any serious research (i.e., motivated by reduced power consumption, in fine-scale processes) regarding "lossless computation"?  Namely, AC synchronous logic that draws mostly reactive power (junction capacitance).

I wonder if the advantage is completely swamped by the dominant resistance in interconnects and such small transistors.

Tim
 

Offline AgentTopic starter

  • Contributor
  • Posts: 14
  • Country: us
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #10 on: December 15, 2018, 02:21:25 am »
Thanks for the help! Mathematically I couldn't find a good way to model clock speed based off just the physics. So instead I looked at the actual data from processors. Let's say clock speed is purely a function of gate length, electric field, and mobility. If I fix the electric field and mobility, then extending to 2004, my gate length is 60 times smaller. At 3000 nm we got speeds of 10 MHz, so shortening the gate length would multiply the clock speed by 60. (The data clearly doesn't show this. It's WAYYY better than that.) If you look at the graph though, it's actually pretty close in the beginning: a 1500 nm gate gives about twice the speed of 3000 nm. (I guess the huge increase later on came from the switch to copper interconnects.) So if everything scaled exactly with gate length, at 50 nm we should have 600 MHz processors (which is a lot lower than actual processors at 50 nm). With this underestimate, it would then be fair to say that if the new material had 10x the mobility of silicon, it should be able to achieve 10x the speed, 6 GHz.
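
The arithmetic in this post can be written as a tiny helper, as a sketch of the stated assumption only: clock speed scales inversely with gate length and linearly with mobility, with the electric field held fixed. The function name and parameter names are hypothetical.

```c
/* Scale a known (clock, gate length) data point to a new gate length
 * and a new material, assuming clock ~ mobility / gate_length at a
 * fixed electric field. mobility_ratio = new mobility / silicon's. */
double scaled_clock_mhz(double base_clock_mhz,
                        double base_gate_nm,
                        double new_gate_nm,
                        double mobility_ratio)
{
    return base_clock_mhz * (base_gate_nm / new_gate_nm) * mobility_ratio;
}
```

Starting from the 8088 data point (10 MHz at 3000 nm), `scaled_clock_mhz(10.0, 3000.0, 50.0, 1.0)` gives the 600 MHz figure and `scaled_clock_mhz(10.0, 3000.0, 50.0, 10.0)` gives the 6 GHz figure quoted above.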
 

Offline David Hess

  • Super Contributor
  • ***
  • Posts: 16614
  • Country: us
  • DavidH
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #11 on: December 15, 2018, 03:53:11 am »
Quote
Going back to the original topic a bit -- has there been any serious research (i.e., motivated by reduced power consumption, in fine-scale processes) regarding "lossless computation"?  Namely, AC synchronous logic that draws mostly reactive power (junction capacitance).

I wonder if the advantage is completely swamped by the dominant resistance in interconnects and such small transistors.

I know I have read papers on it in the past decade.  The major problem appears to be the same as with asynchronous logic; there is a lack of simulation and development tools and the processes are only characterized for standard synchronous logic.
 

Online Miyuki

  • Frequent Contributor
  • **
  • Posts: 905
  • Country: cz
    • Me on youtube
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #12 on: December 15, 2018, 12:34:21 pm »
BTW, IBM Power CPUs, for example, are built for high clocks at the expense of extreme power consumption.
Even the old Power 6, made on 65 nm, ran at up to 5GHz (released in 2008).
The more modern Power 8 on 22 nm also has a 5GHz version.

But the power demand is extreme, and they are only used when single-core performance is needed.
 

Offline Berni

  • Super Contributor
  • ***
  • Posts: 4951
  • Country: si
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #13 on: December 15, 2018, 01:00:41 pm »
Yeah the transistor size doesn't really have that much to do with max clock speeds.

For example take this CPU:
https://ark.intel.com/products/27510/Intel-Pentium-4-Processor-supporting-HT-Technology-4-00-GHz-2M-Cache-1066-MHz-FSB
It's a Pentium 4 from 2005 on a 90nm process running at 4GHz.

Fast forward to today and we get, for example, this:
https://www.intel.com/content/www/us/en/products/processors/core/i9-processors/i9-9900k.html
This is the world's fastest x86 CPU for single-threaded loads. Its base clock is still only 3.6GHz despite being a 14nm technology (okay, it does boost up to 5GHz, but still). However, it does so on 8 cores while having a TDP lower than the Pentium 4 up there.

The transistor feature size is only one of the parameters that gets optimized in new chip manufacturing processes. There are lots of other factors that get improved along the way, but the transistor size is very important: in the example above, going from 90nm to 14nm lets you fit roughly 6 times as many transistors along each edge of the same piece of silicon (around 40 times as many by area, taking the node names at face value). And this is mostly where the extra computing power has come from in the last 10 years: more transistors doing more work per cycle at the same clock speeds.
 

Offline David Hess

  • Super Contributor
  • ***
  • Posts: 16614
  • Country: us
  • DavidH
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #14 on: December 15, 2018, 02:55:18 pm »
Quote
https://www.intel.com/content/www/us/en/products/processors/core/i9-processors/i9-9900k.html
This is the world's fastest x86 CPU for single-threaded loads. Its base clock is still only 3.6GHz despite being a 14nm technology (okay, it does boost up to 5GHz, but still). However, it does so on 8 cores while having a TDP lower than the Pentium 4 up there.

It is actually worse than that.  Older Core2 processors had a load-to-use latency of 3.  I forget exactly where but one of the changes was to add pipeline stages and logic to support a load-to-use latency of 4 so despite a greater load-to-use latency, the clock speed did not increase by a commensurate amount.  The increase in load-to-use latency made up for decreasing cache performance.  Was any of that from a decrease in transistor performance?

This is why out-of-order execution is required for high clock speeds; it hides memory latency with a longer load-to-use latency.  And predictive execution is required to fill the out-of-order execution pipeline with useful work.

William Holt of Intel discussed in 2016 how Moore's Law applies even when transistor performance decreases.  Moore's Law is about the price per transistor.
 

Online T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21675
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #15 on: December 15, 2018, 06:21:24 pm »
I wonder if there is any interest in making highly serialized, simplified cores.  Namely: strip out some of the complicated logic (floating point? SIMD? simpler memory addressing models?) to push the clock rate higher.  The assumption being higher clock speeds give higher performance in frequently-branching (difficult-to-predict) code.  Or, higher order prediction and speculative execution units.

*shrug*, it may be there isn't any way to optimize such problems (namely, those that are strongly locked by Amdahl's law), and the best approach can only be a special-ish-purpose ASIC.  (Some crypto algorithms do this intentionally.)

The hope would be that something a bit more general (a CPU) would be capable of doing other Amdahl-locked problems faster.  Programs that are heavily serial in logic and light in arithmetic, even if they could be parallelized properly.

Idunno.  They're probably already doing as well as they can, considering how much of today's programs fit this bill.

Tim
 

Online Miyuki

  • Frequent Contributor
  • **
  • Posts: 905
  • Country: cz
    • Me on youtube
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #16 on: December 15, 2018, 08:21:08 pm »
Quote
I wonder if there is any interest in making highly serialized, simplified cores.  Namely: strip out some of the complicated logic (floating point? SIMD? simpler memory addressing models?) to push the clock rate higher.  The assumption being higher clock speeds give higher performance in frequently-branching (difficult-to-predict) code.  Or, higher order prediction and speculative execution units.

I think the silicon frequency limit has already been reached at the 5-6 GHz where specialized processors run now (and the efficient operating point is about 2-3 GHz).
These processors have extremely long pipelines; you cannot split instruction processing into any smaller steps to get a higher clock.
Logic speed does not change unless you use some extreme cooling or go to a new material.
I wonder how much development we are away from SiC-based processors or things like GaAs.
 

Offline donmr

  • Regular Contributor
  • *
  • Posts: 155
  • Country: us
  • W7DMR
Lately it's interconnect limited
« Reply #17 on: December 15, 2018, 09:41:36 pm »
On newer IC processes the R, L, and C of the interconnect limits the speed more than the transistors.
 

Offline David Hess

  • Super Contributor
  • ***
  • Posts: 16614
  • Country: us
  • DavidH
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #18 on: December 16, 2018, 09:46:28 pm »
Quote
I wonder if there is any interest in making highly serialized, simplified cores.  Namely: strip out some of the complicated logic (floating point? SIMD? simpler memory addressing models?) to push the clock rate higher.  The assumption being higher clock speeds give higher performance in frequently-branching (difficult-to-predict) code.  Or, higher order prediction and speculative execution units.

This is linked to *why* the highest clock rate processors are also the most complex.

The biggest limit is the low level cache load-to-use latency.  In the simplest processor, a load instruction returns a result from cache (or memory) for use in the next clock cycle so the load-to-use latency is 1.  If the pipeline is clocked faster, and some microcontrollers support this, then a wait state is generated because the memory does not return a result quickly enough.

So for higher clock rates, the pipeline is altered to support longer load-to-use latencies from the cache.  This is where out-of-order execution is necessary to achieve the highest clock rates.  Instead of inserting wait states, the pipeline now extracts instruction level parallelism to hide the latency to memory by executing other instructions while it waits.

I think Intel's Core2 processors are currently up to a 4 cycle load-to-use latency but at the beginning they were only 3.  Intel also made a similar change with the Pentium 4 Prescott which allowed for a larger but slower cache at higher clock rates but performance results were mixed.  As I recall, one of IBM's later PowerPC processors achieved a 5+ GHz clock rate despite unusually lowering the load-to-use latency to 2 clock cycles but the compromises to achieve this lowered performance in other ways; cache size was limited by this.

In the fastest processors, the register bank is not even fast enough to keep up with the pipeline so a register forwarding network holds results close to the pipeline for immediate use.
 

Offline Berni

  • Super Contributor
  • ***
  • Posts: 4951
  • Country: si
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #19 on: December 17, 2018, 08:35:31 am »
You could probably design a CPU exclusively for clock speed that would be able to run at 10GHz.

The problem is that such a CPU would probably have a simple architecture and have a crappy instruction set that supports only the simple kind of operations or takes many clock cycles for any of the more complex instructions. The result is a high clocked chip that is actually not very powerful at all because it can't do all that much work per cycle.

You only need a few simple instructions to create a Turing-complete computer, but the simpler your instruction set is, the more instructions are needed to complete a given task. The x86 architecture has taken the path of making more and more powerful instructions so it could do more work per clock cycle. If you compare a latest generation Intel i9 running at 1GHz with any ARM CPU running at 1GHz you will find that the x86 chip is still way faster at completing a given real world task.
 

Online T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21675
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #20 on: December 17, 2018, 09:18:05 am »
I've heard it suggested (as if to imply practicality...) that a ZISC could run at perhaps 40GHz.

Basic ops on a ZISC are terrifying illustrations of theoretical CS; a single addition takes hundreds of cycles, IIRC.

Tim
 

Offline viperidae

  • Frequent Contributor
  • **
  • Posts: 306
  • Country: nz
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #21 on: December 17, 2018, 10:27:41 am »
Why would you ignore turbo frequencies?
The silicon is perfectly capable of running at the higher speeds, the only issue is thermal.

 

Offline Kilrah

  • Supporter
  • ****
  • Posts: 1852
  • Country: ch
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #22 on: December 17, 2018, 10:56:40 am »
Yep, hadn't seen that mention in the OP... But if you care about what the process limitations are you should indeed actually be looking ONLY at turbo speeds.

Base clocks have nothing to do with process, they're just an arbitrary limit set to satisfy the specced TDP.
 

Offline Wimberleytech

  • Super Contributor
  • ***
  • Posts: 1133
  • Country: us
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #23 on: December 17, 2018, 12:30:33 pm »
Quote

Rds(on) (or more directly, mobility) has a positive tempco.  Vgs(th) also has a negative tempco, so higher voltages are needed, to some extent.  Capacitance stays mostly... well, it has a tempco, but not as strong, AFAIK.


Mobility has a negative tempco (T^-1.5)
Thus R has a positive tempco
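
To put a rough number on that for liquid nitrogen, here is a worked ratio. It assumes the T^{-3/2} law holds all the way down (lattice scattering only), 300 K as room temperature and 77 K as the boiling point of nitrogen; these values are assumptions added for illustration. Impurity scattering in heavily doped channels and velocity saturation would both reduce the real gain, so this is best read as an upper bound.

```latex
\mu(T) \propto T^{-3/2}
\quad\Longrightarrow\quad
\frac{\mu(77\,\mathrm{K})}{\mu(300\,\mathrm{K})}
  \approx \left(\frac{300}{77}\right)^{3/2} \approx 7.7
```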

I know you know this...just correcting for the record.
« Last Edit: December 17, 2018, 12:34:50 pm by Wimberleytech »
 

Offline Wimberleytech

  • Super Contributor
  • ***
  • Posts: 1133
  • Country: us
Re: CPU Clock Speed based on MOSFET Physics Gate Length, Electric Field...
« Reply #24 on: December 17, 2018, 12:33:45 pm »

Quote

I wonder if the advantage is completely swamped by the dominant resistance in interconnects and such small transistors.


Yes, I believe interconnect (R and C) is a big factor in hitting the speed bump.
 

