Author Topic: Why our MCU's have low frequency! (Read 5252 times)

ali_asadzadeh · « **on:** April 06, 2018, 06:16:12 pm »

Hi,
I was checking the Intel CPU microarchitectures in wiki,
https://en.wikipedia.org/wiki/List_of_Intel_CPU_microarchitectures

I noticed something that caught my attention:

Almost all of the CPUs run at much higher speeds than current MCUs. For example, look at the P5 (Pentium): with its massive 600nm transistors, it achieved 300MHz operation. I have always wondered why CPU architectures with lower transistor counts have lower operating frequencies. Shouldn't they be faster, since fewer transistors means lower clock load and capacitance? The fastest MCU I know of right now is the i.MX RT from NXP, on a 45nm process node at 600MHz. Now look at the 45nm node from Intel!

Do you have any idea? It's almost an order of magnitude lower in frequency.

ataradov · « **Reply #1 on:** April 06, 2018, 06:25:19 pm »

MCUs run from flash, and their performance is limited by the flash speed. Creating specs for the MCU is a balancing act. If you want to go faster, the memory will be a limiting factor, so you need caches and faster buses, which makes the price go up.

Speed is not the only parameter that is being optimized for MCUs.

There is also less demand to push the specs on low-end MCUs. They are designed with a huge margin. I ran a nominally 48 MHz Cortex-M0+ at 96 MHz and it worked fine running a simple test program at room temperature. Intel, on the other hand, is less conservative with its margins: they basically specify as much performance as they can, because that's how they market their devices.

And tightening the margins creates yield issues. Intel solves this by introducing different speed grades for the devices. But it would be a nightmare to manage this on the MCU side given the variety of available devices.

ali_asadzadeh · « **Reply #2 on:** April 06, 2018, 06:35:39 pm »

Thanks, but we have countless flash-less MCUs, like the LPC43XX, some Atmel and Microchip parts, and recent parts like the i.MX RT series.

ataradov · « **Reply #3 on:** April 06, 2018, 06:40:20 pm »

Still, their internal bus architecture is not designed for high speed. Designing it for speed would increase complexity, which increases the price but adds no value in the target market.

helius · « **Reply #4 on:** April 06, 2018, 06:46:03 pm »

You need to look at MIPS per watt per dollar, not just clock frequency.
For example, Core i7-920XM and i.MX RT are both fabricated in 45nm technology, but:
i7-920XM has 48000 Dhrystone MIPS, TDP is 55W, retails for $1054.
i.MX RT1050 has 1284 Dhrystone MIPS, TDP is 280mW, retails for $6 in single quantities (around $2.50 @ 10K units).

So the figure of merit MIPS/W/$ for Intel is 48000/55/1054 = 0.828
for NXP it is 1284/0.28/6 = 764

NXP is 923 times better
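The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the figure of merit as defined in the post; the function name is made up for illustration, and the inputs are the numbers quoted above, not independent measurements.

```python
# Figure of merit from the post above: Dhrystone MIPS per watt per dollar.
# All inputs are the figures quoted in the post.

def mips_per_watt_per_dollar(dmips: float, watts: float, dollars: float) -> float:
    return dmips / watts / dollars

intel = mips_per_watt_per_dollar(48000, 55, 1054)  # Core i7-920XM
nxp = mips_per_watt_per_dollar(1284, 0.28, 6)      # i.MX RT1050, single-unit price

print(f"Intel: {intel:.3f}")
print(f"NXP:   {nxp:.0f}")
print(f"ratio: {nxp / intel:.0f}x")
```

Swapping in the 10K-unit price of about $2.50 for the NXP part would push the ratio higher still.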

metrologist · « **Reply #5 on:** April 06, 2018, 07:37:03 pm »

That's only in dollar and energy cost. Time is money, and you left that out of your equation.

ogden · « **Reply #6 on:** April 06, 2018, 07:42:04 pm »

Quote from: metrologist on April 06, 2018, 07:37:03 pm

Time is money, and you left that out of your equation.

How do you bring time into the equation when comparing apples to oranges anyway?

coppice · « **Reply #7 on:** April 06, 2018, 08:02:50 pm »

Current really fast processors need to use fairly deep pipelines to be able to push their clock rates up to several GHz. To keep the power under control they also need various clock and power gating schemes. These things add a considerable number of transistors. Achieving a high clock rate requires more transistors, not less. Look at the block diagrams for the ARM M series cores, and then contrast them with the block diagrams for the A series cores built for a similar instruction set, and the same silicon processes, but a lot more throughput. The A5 and A7 add transistors for parallelism, as well as achieving a higher clock speed, but you should get an idea of where the additional silicon is used.

Most MCU applications do some kind of I/O every few instructions, directly from the core (i.e. not through DMA and other techniques which separate the I/O from the core). This would cause a several GHz core to stutter so often, it wouldn't end up much faster than one clocking at a few hundred MHz. MCUs just aren't doing the same kinds of things a fast clocked applications processor normally does. To see something of the complementary nature of these things, try looking at the block diagrams for some of the big ARM SoCs. You'll often find several M series cores on the chip, doing custom I/O related tasks, working in parallel with the A series cores that are clocking much faster on the same die. The compute throughput of the M series cores looks pathetic compared to the A series cores, but they can really speed up applications by not bothering the A series cores when I/O occurs.
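The stuttering effect can be illustrated with a toy model. The sketch below assumes that every I/O access costs a fixed wall-clock time set by the peripheral bus, no matter how fast the core is clocked, and that the core otherwise retires one instruction per cycle. The 50 ns latency and the "5 instructions between I/O accesses" figure are hypothetical values picked for illustration, not numbers from any datasheet.

```python
def effective_mips(core_hz, instrs_between_io, io_latency_s):
    """Millions of instructions per second when each block of
    `instrs_between_io` instructions is followed by one I/O access that
    stalls the core for a fixed wall-clock time (assumed model)."""
    compute_s = instrs_between_io / core_hz   # 1 instruction per cycle assumed
    return instrs_between_io / (compute_s + io_latency_s) / 1e6

IO_LATENCY_S = 50e-9   # hypothetical peripheral access time
INSTRS = 5             # hypothetical: I/O every 5 instructions

slow = effective_mips(200e6, INSTRS, IO_LATENCY_S)
fast = effective_mips(3e9, INSTRS, IO_LATENCY_S)
print(f"200 MHz core: {slow:.1f} MIPS")
print(f"3 GHz core:   {fast:.1f} MIPS")
print(f"speed-up:     {fast / slow:.2f}x for a 15x faster clock")
```

Under these assumptions the 15x faster clock buys well under a 2x gain, because the fixed I/O time dominates each block.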

ogden · « **Reply #8 on:** April 06, 2018, 08:32:23 pm »

Quote from: coppice on April 06, 2018, 08:02:50 pm

Current really fast processors need to use fairly deep pipelines to be able to push their clock rates up to several GHz. To keep the power under control they also need various clock and power gating schemes. These things add a considerable number of transistors. Achieving a high clock rate requires more transistors, not less. Look at the block diagrams for the ARM M series cores, and then contrast them with the block diagrams for the A series cores built for a similar instruction set, and the same silicon processes, but a lot more throughput. The A5 and A7 add transistors for parallelism, as well as achieving a higher clock speed, but you should get an idea of where the additional silicon is used.

Right. MCUs are used in either low-complexity or low-power devices. Low complexity obviously does not need a fast CPU. What remains is low power, but the main rule of power saving for a CPU is.. guess what? - Low clock frequency. That's because of another semiconductor rule: you can have either low leakage (low static consumption) or fast switching (high dynamic consumption), but not both at the same time.

A simple 3 GHz clock generator alone will consume more than most 24MHz embedded MCUs doing their usual job.

[edit] Imagine a wireless mouse powered by an AA battery with a Core i7 CPU that consumes more than 55W peak. It means the battery and power supply would have to sustain 33A of current.

Oh, and power consumption is more or less equal to power dissipation. I do not want my mouse to be a 55W heater. Never.

coppice · « **Reply #9 on:** April 06, 2018, 09:02:14 pm »

Quote from: ogden on April 06, 2018, 08:32:23 pm

... main rule of power saving for CPU is.. guess what? - Low clock frequency.

That does apply to a CPU in isolation, but for a whole MCU it's not really true. This is mostly because of the power hungry mixed signal content of the device. For example, if you are taking samples from most MCU ADCs you will achieve the lowest power consumption with a fairly high clock speed. That's because you can come out of sleep, turn on the power hungry mixed signal stuff, get the sampling over with quickly, turn off the mixed signal stuff, and get back to sleep. In this way you have kept the analogue circuitry drawing current for the minimum time. This is one of the founding principles of ultra low power MCU families, like the MSP430. Most of the ULP performance of these devices comes from smart peripheral design, but they need a core that has really snappy wake up, run fast, and go to sleep performance for best ULP results.
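A rough energy model makes the "run fast, then sleep" argument concrete. The sketch assumes the core spends a fixed energy per clock cycle (fixed supply voltage), the analogue circuitry draws constant power for as long as the device is awake, and wake-up overhead is negligible. Every number below is hypothetical and chosen only to show the trend; real ADC conversion times do not always scale with the CPU clock.

```python
def energy_per_wakeup_j(clock_hz, cycles, core_j_per_cycle, analog_w):
    """Energy for one wake, sample, sleep burst (simplified model)."""
    awake_s = cycles / clock_hz
    return cycles * core_j_per_cycle + analog_w * awake_s

# Hypothetical values, not taken from any datasheet.
CYCLES = 1000               # CPU cycles needed to take and store one sample
CORE_J_PER_CYCLE = 100e-12  # core energy per clock cycle
ANALOG_W = 1e-3             # ADC, reference, etc. while powered

slow = energy_per_wakeup_j(1e6, CYCLES, CORE_J_PER_CYCLE, ANALOG_W)
fast = energy_per_wakeup_j(16e6, CYCLES, CORE_J_PER_CYCLE, ANALOG_W)
print(f"1 MHz:  {slow * 1e9:.1f} nJ per sample")
print(f"16 MHz: {fast * 1e9:.1f} nJ per sample")
```

In this model the core's share of the energy is the same at both clock rates; the saving comes entirely from the analogue circuitry being powered for a shorter time.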

hans · « **Reply #10 on:** April 06, 2018, 09:32:53 pm »

Low complexity, cost and power, plus deterministic behaviour, are also key requirements in embedded.

You can see that e.g. the Cortex M7 chips that run 300-400MHz are getting deeper pipelines (like 5), but nowhere near modern Intel chips:

- Modern Intel chips have pipelines on the order of 14 stages. Pentium 4 (NetBurst) was doing something like 30, but they reversed that decision in later architectures.
- Deep pipelines require a whole slew of things to extract optimal performance:

1) Branch predictors. There could be several instructions in between the branch and the actual final decision. Therefore, making accurate predictions is important.
2) Branch predictors come in different styles. One of the simplest is to remember the last branch result and keep reapplying it. But with tight loops and a deep pipeline you will suffer massively from even a few mispredictions. More complex branch predictors use PC-dependent heuristics with multi-level dictionary lookups to get better predictions.
3) Modern processors tuned for performance use speculative execution. This executes instructions after a branch instruction, and then throws them away if it was incorrect. Throwing away results == wasted energy.
4) Programs have lots of dependencies and associated hazards, including false ones.
4a) Data hazards: e.g. don't overwrite a value before all previous instructions have read it. Make sure that if 2 writes happen to the same register, the last value sticks.
4b) Control hazards (e.g. branches, as explained)
4c) Structural hazards: e.g. a pipelined integer divider can be issued only once every 32 cycles, but the program does it faster than that.
5) Modern processors involve out-of-order execution to bypass false dependencies as much as possible, e.g. register renaming or the very famous Tomasulo's algorithm. This is beneficial for performance but requires more bookkeeping (e.g. re-order buffers)
5a) It does allow for multiple instructions to be fired per clock cycle, thus being able to achieve >1 instruction completed per clock cycle, given that hazards are not a problem.
6) Fast processor = fast memory busses = problems. Fast CPU's add multi-level cache structures, you can even see it on microcontrollers like ARM Cortex M7 chips or the PIC32MZ that employ program/data caches. Most modern microcontrollers also employ FLASH accelerators that in some way are also a cache.
7) Despite all these efforts, the amount of parallelism you can exploit in a single-thread program is limited. You see many CPU's introducing 2-way (or more..) "hyperthreading" that interleaves executions of multiple threads on the same CPU, to extract most of the multi-thread performance out of it.
8) Imagine this complex well-oiled machine executing instructions like mad. Then imagine that it also needs to handle exceptions, i.e. it needs to stop the thread it was executing and switch state. Oh, and we also want to do this in a precise manner, i.e. we want to return to the original program once the exception (or interrupt) has finished, with no side effects on any internal state or register in any component of our CPU. To accomplish this precise exception behaviour, it may need to undo or cancel instructions in flight in order to show the exact content of e.g. registers at a particular moment in the program.
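To make points 1) and 2) concrete, here is a small simulation of two textbook predictor styles on the loop-closing branch of a short inner loop: the "remember the last result" predictor described above, and a 2-bit saturating counter. The initial predictor states, the loop shape, and the idea that each mispredict costs roughly one pipeline depth in cycles are all simplifying assumptions for illustration.

```python
def one_bit_mispredicts(outcomes):
    """Predict that the branch repeats its last outcome."""
    last = True  # assumed initial prediction: taken
    misses = 0
    for taken in outcomes:
        if taken != last:
            misses += 1
        last = taken
    return misses

def two_bit_mispredicts(outcomes):
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 taken."""
    counter = 3  # assumed initial state: strongly taken
    misses = 0
    for taken in outcomes:
        if taken != (counter >= 2):
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses

# Inner loop with 4 iterations, executed 100 times:
# the closing branch goes taken, taken, taken, not taken, and repeats.
pattern = ([True] * 3 + [False]) * 100

one_bit = one_bit_mispredicts(pattern)
two_bit = two_bit_mispredicts(pattern)
for depth in (3, 14):  # short MCU-style pipeline vs a deep one (illustrative)
    print(f"depth {depth:2d}: ~{one_bit * depth} vs ~{two_bit * depth} wasted cycles")
```

The last-outcome predictor misses twice per loop exit (once leaving the loop, once re-entering it), while the 2-bit counter misses only once, and the cost of each miss grows with pipeline depth.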

Contrast this with a processor that uses a 2 or 3 stage pipeline like the AVR, PIC24 or low-end ARM Cortex-M chips. A lot of the "problems" mentioned are irrelevant at that point. This also makes the system very deterministic, which is what we often want in embedded applications where latency and jitter are important.

Also, a lot of the above points are solved by throwing more and more transistors at the problem. Although modern silicon technologies have transistors in abundance (wiring and power are often the problem), all those transistors burn power even when not switching, and you have a billion of them, so you see many modern chips employing complex power management strategies. E.g. modern Intel CPUs have 10+ power states per core (not even counting frequency/voltage turbo boosts), plus 10 power states for the package and 5 system states the computer can be in. To run fast, these CPUs have become incredibly complex machines that are nowhere near microcontrollers.

Fsck · « **Reply #11 on:** April 06, 2018, 09:37:32 pm »

What exactly are you doing that requires a fast MCU? Usually you care about response time, where parallelism is usually more helpful. You could take a look at xCORE: 16 cores will let you do quite a lot with insanely snappy response times if you distribute your tasks correctly.
Or a higher-end ARM if you need actual compute power. There are dev boards available for the Kirin 960 and other ARM SoCs which use the A72/A73 etc.; those are pretty fracking powerful compared to MCUs.

ogden · « **Reply #12 on:** April 06, 2018, 10:15:19 pm »

Quote from: coppice on April 06, 2018, 09:02:14 pm

Quote from: ogden on April 06, 2018, 08:32:23 pm
... main rule of power saving for CPU is.. guess what? - Low clock frequency.
That does apply to a CPU in isolation, but for a whole MCU it's not really true. This is mostly because of the power hungry mixed signal content of the device. For example, if you are taking samples from most MCU ADCs you will achieve the lowest power consumption with a fairly high clock speed. That's because you can come out of sleep, turn on the power hungry mixed signal stuff, get the sampling over with quickly, turn off the mixed signal stuff, and get back to sleep.

Come on.

MCU ability to sleep is no argument here. Computers can sleep as well - halt their CPU (cores) and peripherals as well.

coppice · « **Reply #13 on:** April 06, 2018, 11:05:12 pm »

Quote from: ogden on April 06, 2018, 10:15:19 pm

Quote from: coppice on April 06, 2018, 09:02:14 pm
Quote from: ogden on April 06, 2018, 08:32:23 pm
... main rule of power saving for CPU is.. guess what? - Low clock frequency.
That does apply to a CPU in isolation, but for a whole MCU it's not really true. This is mostly because of the power hungry mixed signal content of the device. For example, if you are taking samples from most MCU ADCs you will achieve the lowest power consumption with a fairly high clock speed. That's because you can come out of sleep, turn on the power hungry mixed signal stuff, get the sampling over with quickly, turn off the mixed signal stuff, and get back to sleep.

Come on.

MCU ability to sleep is no argument here. Computers can sleep as well - halt their CPU (cores) and peripherals as well.

You entirely missed the point. Most larger computers take an extremely long time to wake up and to get to sleep. Most modern MCUs are targeting at least low power, with more and more targeting ultra low power. They use oscillator designs with near instant start from sleep. The better ones get from sleep to full operation in a microsecond or two, with another microsecond or two to get back to sleep. This allows keeping an ULP MCU, like an MSP430, in a sleep state on a whole different scale from larger computers or older MCUs. When you pop up from the sleep state you don't generally want the slowest clock possible. You want a fast clock, so you can minimise the time you are out of the sleep state, as this minimises the drain of the mixed signal hardware in the MCU, and any external circuitry you need to wake up.

ogden · « **Reply #14 on:** April 06, 2018, 11:41:18 pm »

Quote from: coppice on April 06, 2018, 11:05:12 pm

You entirely missed the point. Most larger computers take an extremely long time to wake up and to get to sleep.

It is a good question who is missing the point here.

Obviously low power consumption can be achieved using effective sleep modes. That does not need to be pointed out or discussed. It's obvious.

Here we are talking about why MCUs have lower clock frequencies. You are advised to re-read the topic of this thread, if in doubt.

westfw · « **Reply #15 on:** April 07, 2018, 12:12:11 am »

Quote

for example look at P5 (Pentium) with it's massive transistor sizes of 600nm, they achieved 300MHz operation

The 600nm Pentiums ran at 60-90MHz. By the time they got to 266MHz, they were down to 250nm...
The early Pentiums also had 16K of cache, and ran the front-side buses at 25MHz...
They also cost about $500 (NOT adjusted for inflation.)

http://www.cpu-world.com/CPUs/Pentium/index.html
http://processortimeline.info/proc1996.htm

coppice · « **Reply #16 on:** April 07, 2018, 12:19:21 am »

Quote from: westfw on April 07, 2018, 12:12:11 am

Quote
for example look at P5 (Pentium) with it's massive transistor sizes of 600nm, they achieved 300MHz operation
The 600nm Pentiums ran at 60-90MHz. By the time they got to 266MHz, they were down to 250nm...
The early Pentiums also had 16K of cache, and ran the front-side buses at 25MHz...
They also cost about $500 (NOT adjusted for inflation.)

http://www.cpu-world.com/CPUs/Pentium/index.html
http://processortimeline.info/proc1996.htm

I don't think P5 devices ever reached 266MHz. It was the much more complex, longer pipeline, OOO P6 based devices which reached that speed.

ogden · « **Reply #17 on:** April 07, 2018, 01:09:58 am »

Quote from: ogden on April 06, 2018, 11:41:18 pm

Here we are talking about why MCUs have lower clock frequencies. You are advised to re-read the topic of this thread, if in doubt.

Let's compare apples to apples: the "high speed" 120MHz STM32F20xxx Cortex-M3 MCU against the low-power, low-leakage-process 32MHz STM32L1xx Cortex-M3 MCU.

Active (100% run) current, executing from flash, peripherals disabled, external 8MHz clock, nominal VCORE voltage, 25°C:

STM32L1xx: 2.1 mA
STM32F20xxx: 4 mA

Same ARM core, same manufacturer, same frequency, yet the results are surprisingly different.
Further reading: TI article. ST does not write so well

David Hess · « **Reply #18 on:** April 07, 2018, 02:01:01 am »

The clock speed difference comes from the memory cycle time and latency. Cache allows for fast memory cycle times but at the extreme where latency is greater than cycle time, pipelining of the cache (really pipelining of the whole memory access) is necessary so pipelining of the instruction execution is also necessary. The instruction pipeline is very closely linked to the access time and latency of the cache or memory if no cache is used.

This is the major advantage of out-of-order processors which extract more memory parallelism and why in-order processors have lower maximum clock rates than out-of-order processors. It does not matter how fast the instruction execution pipeline is if it has to keep waiting for memory accesses.

It is worth noting that ARM was originally designed starting from the fast page mode DRAM interface to make an instruction pipeline to extract maximum performance.

Another way to look at this is the load to use latency of the instruction pipeline. A longer load to use latency allows longer memory latency for a given performance. These slow ARM microcontrollers all have a load to use of like 1 cycle. Many in-order processors have a load to use of 2 cycles. Current Intel out-of-order processors have a load to use latency of 4 cycles.
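One first-order way to read the load-to-use argument: if a memory or cache access takes a fixed time and the pipeline can hide only a given number of cycles between issuing a load and using its result, the clock period cannot shrink below the access time divided by that number of cycles without stalling. The sketch below encodes just that relation. The 10 ns access time is a hypothetical value, and real designs have many other limits on clock rate.

```python
def max_stall_free_clock_hz(access_time_s, load_to_use_cycles):
    """Highest clock at which a load completes within the load-to-use
    window (assumed first-order model, ignoring everything else)."""
    return load_to_use_cycles / access_time_s

ACCESS_TIME_S = 10e-9  # hypothetical memory access time

for cycles in (1, 2, 4):  # the load-to-use figures mentioned in the post
    f = max_stall_free_clock_hz(ACCESS_TIME_S, cycles)
    print(f"load-to-use {cycles} cycle(s): up to {f / 1e6:.0f} MHz")
```

With the same memory, a pipeline that tolerates 4 cycles of load-to-use latency can be clocked four times faster than one that tolerates 1 cycle before it starts waiting on loads.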

ali_asadzadeh · « **Reply #19 on:** April 07, 2018, 10:38:49 am »

Thanks guys for the hints and your feedback

But I have another question regarding speed!

Compare the Intel parts with simple 74xx or CD4000 series logic. They had F, S, ALS, etc. series (which were intended for high speed), and I'm sure none of these old babies goes above 300MHz; the CD4017, for example, manages less than 20MHz. Even recent ultra-high-speed single-gate devices like the SN74AUC06RGYR barely achieve 1GHz operation. So are the Intel MOSFETs really MOSFETs!? What's your opinion? Intel parts were over 1GHz before 2000.

ogden · « **Reply #20 on:** April 07, 2018, 11:01:41 am »

Quote from: ali_asadzadeh on April 07, 2018, 10:38:49 am

Compare the Intel parts with simple 74xx or CD4000 series logic

You can't make such an apples-to-oranges comparison. Why? Once you understand the speed limitations of an unbalanced 3V CMOS I/O bus, you will know why there are no 3GHz 74xx chips. Before 2000, Intel CPUs were clocked internally at over 1GHz, but did they communicate with the outside world at a 1GHz clock frequency, using unbalanced CMOS signals? Why would a 1GHz CPU need an external 33MHz bus?

andersm · « **Reply #21 on:** April 07, 2018, 11:26:28 am »

Quote from: coppice on April 07, 2018, 12:19:21 am

I don't think P5 devices ever reached 266MHz. It was the much more complex, longer pipeline, OOO P6 based devices which reached that speed.

P55C topped out at 233MHz, the mobile Tillamook part reached 300MHz. Both were several process generations ahead of the original P5 though. (Knights Corner, which was based on P54C, reached a bit over 1.2GHz.)

SiliconWizard · « **Reply #22 on:** April 07, 2018, 04:08:57 pm »

Cost, integration, power draw and KISS reasons.

Note that some vendors are bridging the gap with microcontrollers running at over 200 MHz (240 MHz for the Renesas RX family, 500 MHz multi-core for XMOS devices, which are more or less considered microcontrollers as well, and 400 MHz for the STM32H7). Is that slow?

hans · « **Reply #23 on:** April 07, 2018, 04:18:27 pm »

Quote from: ali_asadzadeh on April 07, 2018, 10:38:49 am

Thanks guys for the hints and your feedback

But I have another question regarding speed! Compare the Intel parts with simple 74xx or CD4000 series logic. They had F, S, ALS, etc. series (which were intended for high speed), and I'm sure none of these old babies goes above 300MHz; the CD4017, for example, manages less than 20MHz. Even recent ultra-high-speed single-gate devices like the SN74AUC06RGYR barely achieve 1GHz operation. So are the Intel MOSFETs really MOSFETs!? What's your opinion? Intel parts were over 1GHz before 2000.

http://www.potatosemi.com/

Just watch out. Don't put more than 2pF of load on it (a few cm of PCB trace at most), otherwise it won't work.

coppice · « **Reply #24 on:** April 07, 2018, 04:26:24 pm »

Quote from: ali_asadzadeh on April 07, 2018, 10:38:49 am

Thanks guys for the hints and your feedback

But I have another question regarding speed! Compare the Intel parts with simple 74xx or CD4000 series logic. They had F, S, ALS, etc. series (which were intended for high speed), and I'm sure none of these old babies goes above 300MHz; the CD4017, for example, manages less than 20MHz. Even recent ultra-high-speed single-gate devices like the SN74AUC06RGYR barely achieve 1GHz operation. So are the Intel MOSFETs really MOSFETs!? What's your opinion? Intel parts were over 1GHz before 2000.

You are comparing on chip speeds with off chip speeds. This is a meaningless comparison. As soon as you leave a die, the loading of the signal path makes it very hard to achieve high speeds, and high speeds require considerable power. Before 2000 Intel didn't have a single device with an off chip signal running at 1GHz. Even today only a few off chip signals run at 1GHz or more, and they use specialised signaling, not simple 74 like logic switching.
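The loading point can be put into rough numbers with the standard dynamic power relation for a capacitive load charged and discharged once per cycle, P = C * V^2 * f. The capacitance and voltage values below are hypothetical, order-of-magnitude guesses for an on-chip node versus a package pin plus a short trace; they are not taken from any datasheet.

```python
def switching_power_w(capacitance_f, swing_v, toggle_hz):
    """Dynamic power to charge and discharge a capacitive load once per
    cycle: P = C * V^2 * f."""
    return capacitance_f * swing_v ** 2 * toggle_hz

# Hypothetical, order-of-magnitude values for illustration only.
on_chip = switching_power_w(5e-15, 1.0, 1e9)    # ~5 fF internal node, 1.0 V
off_chip = switching_power_w(10e-12, 3.3, 1e9)  # ~10 pF pin + trace, 3.3 V

print(f"on-chip node at 1 GHz:  {on_chip * 1e6:.1f} uW")
print(f"off-chip pin at 1 GHz:  {off_chip * 1e3:.1f} mW")
print(f"ratio: about {off_chip / on_chip:.0f}x per signal")
```

Under these assumptions a single full-swing 1 GHz output pin costs on the order of 100 mW, which is why fast off-chip links use low-swing, specialised signalling instead.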

BrianHG · « **Reply #25 on:** April 07, 2018, 04:31:44 pm »

Quote from: ali_asadzadeh on April 07, 2018, 10:38:49 am

Thanks guys for the hints and your feedback

But I have another question regarding speed! Compare the Intel parts with simple 74xx or CD4000 series logic. They had F, S, ALS, etc. series (which were intended for high speed), and I'm sure none of these old babies goes above 300MHz; the CD4017, for example, manages less than 20MHz. Even recent ultra-high-speed single-gate devices like the SN74AUC06RGYR barely achieve 1GHz operation. So are the Intel MOSFETs really MOSFETs!? What's your opinion? Intel parts were over 1GHz before 2000.

Oh really, funny, this AND/NAND/OR/NOR logic gate seems to be plenty fast: HMC843LC4B
http://www.analog.com/media/en/technical-documentation/data-sheets/hmc843.pdf
http://www.analog.com/media/en/technical-documentation/data-sheets/hmc844.pdf (Xor/Xnor gate)
http://www.analog.com/media/en/technical-documentation/data-sheets/hmc841.pdf (D flip-flop)
Note that Analog Devices has a whole HMC84xxxxx line of logic ICs.

Quote from: hans on April 07, 2018, 04:18:27 pm

http://www.potatosemi.com/

Just watch out. Don't put more than 2pF of load on it (a few cm of PCB trace at most), otherwise it won't work.

Those are already super slow slugs/snails, why even mention them? They can't even drive a respectable 1GHz into multiple output gates.

David Hess · « **Reply #26 on:** April 08, 2018, 01:44:03 am »

Quote from: andersm on April 07, 2018, 11:26:28 am

Quote from: coppice on April 07, 2018, 12:19:21 am
I don't think P5 devices ever reached 266MHz. It was the much more complex, longer pipeline, OOO P6 based devices which reached that speed.

P55C topped out at 233MHz, the mobile Tillamook part reached 300MHz. Both were several process generations ahead of the original P5 though. (Knights Corner, which was based on P54C, reached a bit over 1.2GHz.)

I had to refresh my memory. Intel's Socket 7 parts only made it to 233MHz, but AMD's updated Super Socket 7 increased the bus speed to 100 MHz and the CPU speed to 450MHz or 550MHz. I really liked the AMD K6-III.

David Hess · « **Reply #27 on:** April 08, 2018, 02:22:18 am »

Quote from: ali_asadzadeh on April 07, 2018, 10:38:49 am

But I have another question regarding speed! Compare the Intel parts with simple 74xx or CD4000 series logic. They had F, S, ALS, etc. series (which were intended for high speed), and I'm sure none of these old babies goes above 300MHz; the CD4017, for example, manages less than 20MHz. Even recent ultra-high-speed single-gate devices like the SN74AUC06RGYR barely achieve 1GHz operation. So are the Intel MOSFETs really MOSFETs!? What's your opinion? Intel parts were over 1GHz before 2000.

It still goes back to the memory or cache access time and load-to-use latency. The longer the access time and the shorter the load-to-use latency the more work is done per instruction stage and fewer instruction stages are used producing a lower clock rate.

Discrete logic was not dense enough to support the complexity required for a longer load-to-use latency and memory at the time had a long access time so clock speeds were slow. Things did not take off until increasing integration allowed the entire processor and cache memory to be located on the same integrated circuit. Try disabling the cache on a modern processor to see how fast it runs without it.

If you wanted to push the clock rate during the era of TTL, then you used ECL (emitter coupled logic) and ECL memory like Cray and others did but increasing integration from Moore's Law in CMOS won the performance race.

It might be a fun project to implement a simple (!) out-of-order processor for high load-to-use latency in an FPGA to see what clock rate is achievable.

BrianHG · « **Reply #28 on:** April 08, 2018, 03:45:23 am »

Or, if you are a millionaire, build a processor and RAM out of the logic gates from Analog Devices which I listed above and achieve faster than Intel CPU clock speeds. Though, at 630mW per D flip-flop, a processor approaching the smallest PIC MCU would consume a few kilowatts of power, maybe a megawatt... But, holy shit, it would be the damn fastest PIC anyone could ever dream of...

You would need to buy the dies & direct bond for minimum latency and mount the thing on a room sized heatsink in a pool of liquid nitrogen.

hans · « **Reply #29 on:** April 08, 2018, 08:22:45 am »

And find a solution around the 270 clock phase margin @ 40GHz ;-)

How many flip-flops would a 8-bit PIC contain? This page suggests ~600, i.e. that would be 380W in flip-flops alone.
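The flip-flop estimate is a one-line multiplication, shown here as a sketch. Both inputs come from the thread (about 600 flip-flops, 630 mW per D flip-flop); combinational gates, RAM and clock distribution are deliberately left out.

```python
FLIP_FLOPS = 600          # rough count for an 8-bit PIC, as suggested above
WATTS_PER_FLIP_FLOP = 0.63  # 630 mW per D flip-flop, as quoted earlier in the thread

flip_flop_power_w = FLIP_FLOPS * WATTS_PER_FLIP_FLOP
print(f"flip-flops alone: about {flip_flop_power_w:.0f} W")
```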

The reason I mentioned the PotatoSemi parts is not only because of the name, but also to highlight how ridiculous an idea it is.

Sure, those Hittite parts probably serve a purpose in some industry, but they are also $600+ each.

NorthGuy · « **Reply #30 on:** April 08, 2018, 12:47:58 pm »

Quote from: BrianHG on April 08, 2018, 03:45:23 am

Or, if you are a millionaire, build a processor and RAM out of the logic gates from Analog Devices which I listed above and achieve faster than Intel CPU clock speeds.

Setting aside the (unsolvable) problem of putting the ICs in the space without producing extra delays, the signal will need to pass through a lot of gates between two consecutive clock edges. For example, think how many consecutive gates you'd need to create a simple 64-bit adder. Divide 40GHz by that number. That's the maximum clock speed you can achieve. I don't think you can get to Intel's 5GHz, even if you use reasonable pipelining.
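The "divide 40GHz by the number of gates in series" rule of thumb can be tried on two textbook adder structures. The gate-depth formulas below are rough assumptions (about 2 gate levels per bit for a ripple-carry adder; about 2 levels per prefix stage plus 2 for setup and the final sum in a parallel-prefix adder), and the model ignores interconnect delay between packages, fan-out and flip-flop overhead, all of which would lower the result further.

```python
import math

GATE_RATE_HZ = 40e9  # the 40 GHz figure quoted in the thread

def ripple_carry_depth(bits):
    # Assumption: the carry passes through about 2 gate levels per bit.
    return 2 * bits

def prefix_adder_depth(bits):
    # Assumption: log2(bits) prefix stages of 2 gate levels each,
    # plus 1 level to form propagate/generate and 1 for the final sum.
    return 2 * math.ceil(math.log2(bits)) + 2

def max_clock_hz(depth):
    # Rule of thumb from the post: gate rate divided by gates in series.
    return GATE_RATE_HZ / depth

for name, depth in (("ripple-carry", ripple_carry_depth(64)),
                    ("parallel-prefix", prefix_adder_depth(64))):
    print(f"64-bit {name}: depth {depth}, about {max_clock_hz(depth) / 1e9:.2f} GHz")
```

Even the faster structure lands below 5 GHz in this idealised estimate, which is consistent with the doubt expressed above.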

ali_asadzadeh · « **Reply #31 on:** April 08, 2018, 02:33:23 pm »

Thanks guys. So maybe we can conclude that Intel is the best MOSFET maker in the world! Look at how many transistors are in their i9 Extreme Edition or Xeon CPUs, look at their power and price, and divide the CPU price by the number of transistors to see the numbers. So if Intel can make such good MOSFETs at such an affordable price, why don't Intel or other companies use them in other chips as well!?

theoldwizard1 · « **Reply #32 on:** April 08, 2018, 02:36:56 pm »

Quote from: ataradov on April 06, 2018, 06:25:19 pm

MCUs run from flash, and their performance is limited by the flash speed. Creating specs for the MCU is a balancing act. If you want to go faster, the memory will be a limiting factor, so you need caches and faster buses, which makes the price go up.

Although I have been retired from the world of automotive electronics for over 10 years, the above statement is spot on! High-end single-chip embedded controllers have been fighting this issue for many, MANY years. Years ago, I was told by silicon designers that it is very difficult to design a single chip with a CPU (random logic) plus Flash and RAM (both "regular"/repeated logic), because in the "real" world these are manufactured on totally different processes.

The Infineon TriCore family of MCUs is very popular in the automotive world. Even though their latest chips have 4MB of Flash and 256KB of RAM, it is just not enough. External memory devices are relatively "slow" and cause processor stalls.

If you look at a die photo of one of these chips, the CPU, including the Floating Point Processor, takes about 10% of the die!

theoldwizard1 · « **Reply #33 on:** April 08, 2018, 02:41:15 pm »

Quote from: Fsck on April 06, 2018, 09:37:32 pm

What exactly are you doing that requires a fast MCU? Usually you care about response time, where parallelism is usually more helpful. You could take a look at xCORE: 16 cores will let you do quite a lot with insanely snappy response times if you distribute your tasks correctly.

Not when they are competing for the same on-chip resource, like Flash or RAM.

theoldwizard1 · « **Reply #34 on:** April 13, 2018, 10:53:37 pm »

I finally stumbled across the specific document I was looking for! This relates to the Infineon TriCore V1.6 architecture, which is less than 10 years old. It is a heavily pipelined processor and a true Harvard architecture. PMI = Program Memory Interface (instructions). DMI = Data Memory Interface. In both cases there is cache and "scratch pad" memory (PSPR and DSPR) included in the memory interface. Anything in either memory interface can be accessed in 1 clock.

The SRI Cross Bar Interface means that the instruction cache and the data cache can fill at the same time, provided they are not accessing the same memory resource (PM0, PM1 or LMU).

Now, from Infineon TriCore V1.6 Application Note AP32168, page 10, here is the really interesting part

theoldwizard1 · « **Reply #35 on:** April 13, 2018, 10:57:40 pm »

Duplicate

rstofer · « **Reply #36 on:** April 14, 2018, 01:53:32 am »

The Analog Devices Blackfin MCU has been around for a long time and it runs at 600 MHz. It doesn't have an MMU so it runs uClinux. Very fast with a lot of DSP capability and a large assortment of peripherals.

ogden · « **Reply #37 on:** April 14, 2018, 01:54:19 am »

Quote from: theoldwizard1 on April 13, 2018, 10:53:37 pm

I finally stumbled across the specific document I was looking for! This relates to the Infineon TriCore V1.6 architecture, which is less than 10 years old. It is a heavily pipelined processor and a true Harvard architecture. PMI = Program Memory Interface (instructions). DMI = Data Memory Interface. In both cases there is cache and "scratch pad" memory (PSPR and DSPR) included in the memory interface. Anything in either memory interface can be accessed in 1 clock.

Data and instruction bus, that's it? I would say that's outdated tech.

Even a fairly basic ARM microcontroller like the STM32F3xx can do five access operations in 1 clock, with two SRAM memories and one flash.

P.S. EEVblog is broken: it does not let me attach a small 50kB .png file. You will have to find the picture yourself, on page 6 of:

ST appnote AN4296

