Author Topic: When CPU's are made is each one slightly different? (Read 7865 times)

Cerebus · « **Reply #25 on:** September 06, 2018, 12:36:23 pm »

Quote from: Tom45 on September 06, 2018, 03:15:01 am

Quote from: Cerebus on September 05, 2018, 10:48:28 pm

...

As to the claim that it didn't use parity or other error checking, I find that unlikely as it was pretty ubiquitous in mainframe designs of the time, but I'm not going to trawl through the documentation (available on-line and massively detailed compared to modern non-documentation) just to prove a strong suspicion.

The 6600 definitely didn't have memory parity checking.

I worked with the 6600 in the late '60s and early '70s. When the Boeing 747 entered service in 1970, a coworker said he wasn't going to fly on it because Boeing had done the aeronautical design on a 6600 without parity. So he didn't trust the design results, or the plane. Back then, memory reliability wasn't up to current standards.

History shows that Boeing got it right anyway.

I'll happily stipulate that it didn't use stored parity on the main memory, but I'd be horrified if it didn't use parity on the memory bus, I/O buses etc. Then again, the quotes from Cray above indicate a degree of arrogance and later chagrin, so it's quite possible.

For those without physical experience of old machines: it wasn't uncommon for peripherals, and even memory cabinets, to be connected to the main processor by parallel buses implemented in relatively long runs of multicore cable. 10 metres for a cable to a card reader, tape drive or disk cabinet isn't unrealistic. Think lots of inductance, capacitance, crosstalk, exposure to interference and so on. Running those without some form of transmission error detection is/was risky.
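As a rough illustration of the simplest kind of transmission check being described, here is a minimal Python sketch of even parity on one parallel-bus word. The 8-bit width and the names `parity_bit`, `send` and `receive_ok` are hypothetical and not taken from any CDC documentation. One extra parity line catches any odd number of flipped bits, but an even number of flips slips through.

```python
def parity_bit(word: int) -> int:
    """Even parity: 1 if `word` has an odd number of 1 bits, so that
    data plus parity always carries an even count of 1s."""
    return bin(word).count("1") % 2


def send(word: int) -> tuple[int, int]:
    """Transmitter side: drive the data lines plus one extra parity line."""
    return word, parity_bit(word)


def receive_ok(word: int, parity: int) -> bool:
    """Receiver side: recompute parity from the data lines and compare."""
    return parity_bit(word) == parity


data, p = send(0b1011_0010)
print(receive_ok(data, p))                # clean transfer passes
print(receive_ok(data ^ 0b0000_1000, p))  # one bit hit by crosstalk: detected
print(receive_ok(data ^ 0b0001_1000, p))  # two flipped bits: not detected
```

That blind spot for double flips is one reason wider codes such as ECC get used where the cost is justified.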

Cerebus · « **Reply #26 on:** September 06, 2018, 12:50:47 pm »

Quote from: Beamin on September 06, 2018, 09:20:18 am

... So I'm not doubting you, I'm asking if you are applying common sense or if you know this for a fact. Yes, it's going to cost way more to make a TB than a GB card, but the devil is in the details when you go one or two sizes bigger, not orders of magnitude. ...

Where apparent common sense breaks down here is around the issue of feature size and yield. If you want to make chips with higher capacities, you have to use smaller feature sizes on the chips. Smaller feature sizes increase the probability that tiny defects in the silicon wafer will break something on a given chip, reducing your yield. So it makes sense to design your circuit to tolerate having part of it broken: a 2 Gb chip is made up of two 1 Gb sub-chips, or four 512 Mb sub-chips. That way, if one sub-chip doesn't work, you can still sell the whole chip, but as a lower-capacity part. The only things that must work for a chip to be saleable are one of the sub-chips and the controller that interfaces it to the outside world. So instead of making only 2 Gb chips, getting a yield of, say, 20%, and throwing 80% away, you get 20% full-specification 2 Gb chips, 50% 1 Gb chips, 10% 512 Mb chips, and only 20% duds that you have to throw away. (Numbers made up for example and bear no relation to actual yield figures, except by accident.)
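To put the binning argument into numbers, here is a minimal Python sketch of a hypothetical 2 Gb part built from four 512 Mb sub-chips plus one shared controller. It assumes each sub-chip works independently with probability `p_sub`, that any two good sub-chips can be combined into a 1 Gb part, and that a dead controller scraps the die. All of these are illustrative assumptions, not how any real vendor maps its parts.

```python
from math import comb


def bin_fractions(p_sub: float, p_ctrl: float) -> dict[str, float]:
    """Fraction of dies landing in each capacity bin.

    Hypothetical model: four 512 Mb sub-chips that fail independently,
    each working with probability p_sub, plus a controller that works
    with probability p_ctrl and must work for the die to be sold at all.
    """
    bins = {"2 Gb": 0.0, "1 Gb": 0.0, "512 Mb": 0.0, "scrap": 0.0}
    for k in range(5):  # k = number of working sub-chips out of 4
        # Binomial probability that exactly k sub-chips work
        p_k = comb(4, k) * p_sub**k * (1 - p_sub) ** (4 - k)
        if k == 4:
            label = "2 Gb"
        elif k >= 2:
            label = "1 Gb"  # assumes any two good sub-chips make a 1 Gb part
        elif k == 1:
            label = "512 Mb"
        else:
            label = "scrap"
        # A dead controller scraps the die regardless of the sub-chips.
        bins[label] += p_k * p_ctrl
        bins["scrap"] += p_k * (1 - p_ctrl)
    return bins


for name, frac in bin_fractions(p_sub=0.7, p_ctrl=0.95).items():
    print(f"{name}: {frac:.1%}")
```

Without harvesting, only the "2 Gb" fraction (`p_sub**4 * p_ctrl`) would be sellable; everything in the "1 Gb" and "512 Mb" bins is revenue recovered from dies that would otherwise be thrown away.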

David Hess · « **Reply #27 on:** September 06, 2018, 02:04:43 pm »

Quote from: Cerebus on September 05, 2018, 10:48:28 pm

Quote from: David Hess on September 05, 2018, 06:05:53 pm
It is certainly possible to make reliable systems without ECC; just lower the soft error rate enough. But this comes at the expense of speed and power.

One of the big advantages of ECC is in addition to correcting errors, it also allows notification that there was an error and where. Otherwise how would you know short of data corruption detected later?

The CDC 6600 was a 2 MIPS machine that consumed 30 kW. So, yeah, a tad slower than modern toothbrushes* and greedier on the juice too. It had a 100ns minimum memory access time, which was pretty damn fast for its time.

I was referring to a modern design comparable to an existing processor. An Intel processor which currently has ECC and parity protection on its cache could be redesigned to operate the cache at a higher voltage, with more charge stored per bit, to lower the soft error rate enough to do without memory protection. But that would cost considerable power and performance, and cost more due to the increased area.

They do not include ECC and parity unless the application requires them or including them makes the processor better than leaving them out.
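The point quoted above, that ECC reports not just that an error happened but where, can be seen in the textbook Hamming(7,4) code. This is a minimal Python sketch of that classic code only; it is not the wider SECDED codes real processor caches use, whose details aren't given in this thread. The syndrome works out to the 1-based position of a single flipped bit, or 0 if the word is clean.

```python
def hamming74_encode(nibble: int) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword (positions 1..7).

    Parity bits sit at positions 1, 2 and 4; data bits at 3, 5, 6 and 7.
    List index 0 holds position 1.
    """
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]


def hamming74_syndrome(code: list[int]) -> int:
    """XOR together the positions of all set bits: 0 for a valid codeword,
    otherwise the position of the single flipped bit."""
    s = 0
    for pos in range(1, 8):
        if code[pos - 1]:
            s ^= pos
    return s


def hamming74_decode(code: list[int]) -> tuple[int, int]:
    """Correct a single-bit error if present; return (data, error position)."""
    pos = hamming74_syndrome(code)
    fixed = list(code)
    if pos:
        fixed[pos - 1] ^= 1  # flip the bad bit back
    nibble = fixed[2] | (fixed[4] << 1) | (fixed[5] << 2) | (fixed[6] << 3)
    return nibble, pos


word = hamming74_encode(0b1011)
word[4] ^= 1  # simulate a soft error at position 5
data, where = hamming74_decode(word)
print(data == 0b1011, where)  # True 5
```

A double-bit error would be silently "corrected" to the wrong value here, which is why practical memory ECC adds one more overall parity bit to detect (but not correct) double errors.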

The Flash memory quality thing really bugs me. I have tested various USB memory sticks, and even when new they all suffered from poor retention: months to barely a year. Meanwhile, I have floppy disks which never lost their data and are completely usable.

I would not mind so much if I knew who made good Flash memory devices.

Beamin · « **Reply #28 on:** September 10, 2018, 04:45:38 pm »

Quote from: David Hess on September 06, 2018, 02:04:43 pm

Quote from: Cerebus on September 05, 2018, 10:48:28 pm
Quote from: David Hess on September 05, 2018, 06:05:53 pm
It is certainly possible to make reliable systems without ECC; just lower the soft error rate enough. But this comes at the expense of speed and power.

One of the big advantages of ECC is in addition to correcting errors, it also allows notification that there was an error and where. Otherwise how would you know short of data corruption detected later?

The CDC 6600 was a 2 MIPS machine that consumed 30 kW. So, yeah, a tad slower than modern toothbrushes* and greedier on the juice too. It had a 100ns minimum memory access time, which was pretty damn fast for its time.

I was referring to a modern design comparable to an existing processor. An Intel processor which currently has ECC and parity protection on its cache could be redesigned to operate the cache at a higher voltage, with more charge stored per bit, to lower the soft error rate enough to do without memory protection. But that would cost considerable power and performance, and cost more due to the increased area.

They do not include ECC and parity unless the application requires them or including them makes the processor better than leaving them out.

The Flash memory quality thing really bugs me. I have tested various USB memory sticks, and even when new they all suffered from poor retention: months to barely a year. Meanwhile, I have floppy disks which never lost their data and are completely usable.

I would not mind so much if I knew who made good Flash memory devices.

Old floppy disks.

The last I remember, I was using them to store .jpgs, and with time they would start to get green blocks halfway down the image. It was like my porn was self-censoring.

wraper · « **Reply #29 on:** September 10, 2018, 05:38:28 pm »

Spectek chips are not necessarily worse than Micron parts. Micron may simply reject whole wafers or batches of packaged ICs due to poor yields, and those go to Spectek for further testing and recovery of the good parts. But there is much lower confidence in component quality. You can often see the Spectek logo printed over a Micron-branded IC.

Beamin · « **Reply #30 on:** September 13, 2018, 02:31:07 am »

Quote from: blueskull on September 10, 2018, 04:59:29 pm

Quote from: Beamin on September 06, 2018, 09:20:18 am
unscrupulous companies found they could just brand the shit products from China and put higher price tags on them.

You only get those black-chip SSDs from the absolute cheapest brands. Any reasonable brand won't use them.
They usually come with cheap computers or refurbs. If you build your computer from scratch, you will find them hard to get.

Generally, if you walk into MicroCenter, the worst you will ever find are SSDs made of white chips (their house brand, using Spectek chips).
Spectek chips are perfectly fine for consumer use. Spectek is a direct subsidiary of Micron, selling Micron's rejected chips.

That's like the MicroCenter-brand Arduino. They Google-translated half the projects from Chinese to English, then halfway through they said fuck it and left the Chinese in the examples. They also make it look like it's the real thing, made in Italy. Best part is the starter kit box says it has a nixie tube in it, which is why I bought it. At MicroCenter, a nixie tube is actually a cheap, faded seven-segment display, with its common cathode mislabeled as common anode. If you can learn from MicroCenter's examples, you can master the Arduino, or pull your hair out trying.

hamster_nz · « **Reply #31 on:** September 13, 2018, 05:43:18 am »

If you want to see how fast a CPU can run, you don't have to test every part of the CPU, just the paths with the worst timing at various process corners (e.g. hot or cold, or more easily high voltage / low voltage...).

I think it would be easiest to include a little test design somewhere on the die (e.g. a ring oscillator) that you can access easily to quickly test the likely performance. This would allow you to sort the best from the rest without having to package and power up the entire die.

Or maybe intersperse test patterns across the wafer, to see where things are better or worse than average.
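The ring oscillator idea rests on simple arithmetic: an odd ring of N inverters oscillates with period 2·N·t_pd, because an edge must travel around the ring twice per full cycle, so f = 1 / (2·N·t_pd). Here is a minimal Python sketch that turns a measured frequency into an average stage delay and sorts dies into grades. The 101-stage ring, the delay limits and the grade names are all hypothetical.

```python
def stage_delay_ps(freq_mhz: float, stages: int) -> float:
    """Average inverter delay from a ring oscillator's measured frequency.

    An odd ring of `stages` inverters has a period of 2 * stages * t_pd.
    """
    if stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverters")
    period_ps = 1e6 / freq_mhz  # 1 MHz corresponds to a 1e6 ps period
    return period_ps / (2 * stages)


# Hypothetical sort limits: (maximum stage delay in ps, grade label)
GRADES = [(10.0, "fast"), (12.0, "typical"), (15.0, "slow")]


def grade_die(freq_mhz: float, stages: int = 101) -> str:
    """Map one die's ring-oscillator reading onto a hypothetical grade."""
    t_pd = stage_delay_ps(freq_mhz, stages)
    for limit, label in GRADES:
        if t_pd <= limit:
            return label
    return "reject"


for reading in (500.0, 450.0, 400.0, 100.0):
    print(reading, "MHz ->", grade_die(reading))
```

This only tracks average transistor speed at that spot on the die; it says nothing about a defect on one particular logic path.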

Berni · « **Reply #32 on:** September 13, 2018, 06:01:01 am »

The problem is that you actually have to test the operation of the CPU not just its speed.

Out of the billions of transistors in there, it only takes a single dead transistor to render a whole CPU core dead. So they have to run a test that exercises all parts of the CPU. It's also possible that only a few transistors ended up out of spec but still working, which adds timing problems on that signal path: the CPU runs fine at slow clock speeds but miscalculates at high clock speeds.

A rough test of the CPU's functionality can be done pretty quickly. I would guess they do a quick die probe test of the wafers as they come off the production line to make sure all the major parts of the CPU work (perhaps even with dedicated test hardware on the die to help in this process). This would avoid wasting packaging and wire bonding on dead dies. The full test where they "speed bin" them is likely done once the chip is already in an LGA package, and doubles as a final quality control test to make sure they don't ship a dead CPU from the factory.
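The speed-binning step described above can be sketched as a loop that runs the functional test at the fastest candidate clock and steps down until the part passes. In this minimal Python sketch, `Die`, `run_test_suite` and the bin frequencies are hypothetical stand-ins for a real tester; a hard defect fails at every clock, while a marginal path only fails above that die's maximum frequency.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Die:
    """Hypothetical model of a packaged part on the tester."""
    fmax_mhz: float      # clock above which its slowest path misses timing
    has_dead_gate: bool  # a hard defect fails at any clock speed


def run_test_suite(die: Die, clock_mhz: float) -> bool:
    """Stand-in for the functional test: passes only if there is no hard
    defect and every path meets timing at this clock."""
    return not die.has_dead_gate and clock_mhz <= die.fmax_mhz


# Hypothetical speed bins, fastest first (MHz)
BINS_MHZ = [3600, 3200, 2800]


def speed_bin(die: Die) -> Optional[int]:
    """Return the fastest bin the die passes, or None to scrap it."""
    for clock in BINS_MHZ:
        if run_test_suite(die, clock):
            return clock
    return None


for part in (Die(4000, False), Die(3400, False), Die(2500, False), Die(4000, True)):
    print(part, "->", speed_bin(part))
```

A real test flow would also add margin and repeat the test at several voltages and temperatures, along the lines of the corners mentioned earlier.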

Beamin · « **Reply #33 on:** September 13, 2018, 06:31:38 am »

Quote from: Berni on September 13, 2018, 06:01:01 am

The problem is that you actually have to test the operation of the CPU not just its speed.

Out of the billions of transistors in there, it only takes a single dead transistor to render a whole CPU core dead. So they have to run a test that exercises all parts of the CPU. It's also possible that only a few transistors ended up out of spec but still working, which adds timing problems on that signal path: the CPU runs fine at slow clock speeds but miscalculates at high clock speeds.

A rough test of the CPU's functionality can be done pretty quickly. I would guess they do a quick die probe test of the wafers as they come off the production line to make sure all the major parts of the CPU work (perhaps even with dedicated test hardware on the die to help in this process). This would avoid wasting packaging and wire bonding on dead dies. The full test where they "speed bin" them is likely done once the chip is already in an LGA package, and doubles as a final quality control test to make sure they don't ship a dead CPU from the factory.

That's crazy, that a single dead transistor can do that. I have no idea how they get the silicon that pure, besides making a huge crystal and stretching it. How do they dope things, when the boron (or other doping agent) is one part per million to make a hole, and your transistor is only a few hundred atoms?

Think we will ever run out of sand to get silicon? It's like aluminum: the only reason we don't make everything out of it is that it takes massive amounts of energy to reduce it, i.e. remove the oxygen.

Berni · « **Reply #34 on:** September 13, 2018, 06:52:21 am »

Oh, and on topic: GPU failures can be more interesting, because GPUs mostly munch through raw data and don't make many decisions along the way or make much use of pointers. Because of that, miscalculations or data corruption don't tend to crash them catastrophically. Such failures tend to quickly lock up or completely crash a CPU, but a GPU will often just happily keep on munching data and spitting out wrong results.

The effect in the video is likely caused by one of the graphics RAM pins being shorted out or losing connection, so it's leaving streaks in the video output and reading the model and texture data wrong.

You can get a similar effect when overclocking a graphics card too far (usually the memory clock), but it's usually less predictable than this, and it might still eventually crash completely.
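A toy Python illustration of the data-versus-pointer point made above (not a model of how GPUs actually fail): flipping one bit in a data value just produces a wrong number, while flipping a bit in an index used to look something up makes the program fault. The pixel values and bit positions are arbitrary.

```python
def flip_bit(value: int, bit: int) -> int:
    """Simulate a single-bit upset in an integer."""
    return value ^ (1 << bit)


pixels = [10, 20, 30, 40]

# Data corruption: the computation completes, just with a wrong result.
brightened = [p + 5 for p in pixels]
brightened[2] = flip_bit(brightened[2], 4)
print(brightened)  # [15, 25, 51, 45]: one wrong pixel, no crash

# Index (pointer-like) corruption: the same kind of flip is fatal.
index = flip_bit(2, 4)  # 2 becomes 18
try:
    print(pixels[index])
except IndexError as err:
    print("crash:", err)  # crash: list index out of range
```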

wraper · « **Reply #35 on:** September 13, 2018, 08:22:31 am »

Quote from: Berni on September 13, 2018, 06:52:21 am

The effect in the video is likely caused by one of the graphics RAM pins being shorted out or losing connection, so it's leaving streaks in the video output and reading the model and texture data wrong.

Certainly not that. You'd most likely have some line or boxed artifact pattern over the screen in Windows (if it works at all), and at a minimum an immediate GPU driver crash as soon as a 3D application is launched. The artifacts in the video are caused by very minor data corruption.

Cerebus · « **Reply #36 on:** September 25, 2018, 01:41:27 am »

Apropos the earlier discussion about parity in the CDC 6600: I happened to trip across a book on the design of the CDC 6600 online. It did use parity for the Extended Core Storage. Not for the Central Storage, at least there is no mention of it, but for the Extended Core Storage, most definitely. There is also mention of handling parity errors in the peripheral controllers.

Quote from: Design of a Computer The Control Data 6600 J.E.Thornton p 54

The Extended Core Storage is a linear-select 2-D magnetic core unit. This is a very large unit in terms of the number of bits in one bank. Although the read-write cycle time is over three times that of the central storage, the longer word length more than offsets, at least for block transfers. The block diagram of Figure 40 shows the ECS unit as two dimensional, with the word dimension of 488 bits, including 8 parity, and number of words equaling 15,744.

The book is available here for any computer architecture aficionados out there. For those of a logic design bent, it goes down to transistor and gate level explanations of some parts of the machine's design.

Tom45 · « **Reply #37 on:** September 25, 2018, 05:06:11 am »

Quote from: Cerebus on September 25, 2018, 01:41:27 am

The book is available here for any computer architecture aficionados out there. For those of a logic design bent, it goes down to transistor and gate level explanations of some parts of the machine's design.

Thanks for finding and sharing the link to that book. Truly a trip down memory lane for those of us that worked with the 6600 back in the day.

The hand written note from Thornton's wife giving permission to scan and make the book available was interesting. He must have been pleased that people still remembered and appreciated the work he did on the 6600 and the book. He died just 3 years later in 2005.

His Obituary: https://www.legacy.com/obituaries/twincities/obituary.aspx?pid=3033494

tooki · « **Reply #38 on:** September 27, 2018, 04:19:45 pm »

Quote from: Beamin on September 13, 2018, 06:31:38 am

Quote from: Berni on September 13, 2018, 06:01:01 am
The problem is that you actually have to test the operation of the CPU not just its speed.

Out of the billions of transistors in there, it only takes a single dead transistor to render a whole CPU core dead. So they have to run a test that exercises all parts of the CPU. It's also possible that only a few transistors ended up out of spec but still working, which adds timing problems on that signal path: the CPU runs fine at slow clock speeds but miscalculates at high clock speeds.

A rough test of the CPU's functionality can be done pretty quickly. I would guess they do a quick die probe test of the wafers as they come off the production line to make sure all the major parts of the CPU work (perhaps even with dedicated test hardware on the die to help in this process). This would avoid wasting packaging and wire bonding on dead dies. The full test where they "speed bin" them is likely done once the chip is already in an LGA package, and doubles as a final quality control test to make sure they don't ship a dead CPU from the factory.

That's crazy, that a single dead transistor can do that. I have no idea how they get the silicon that pure, besides making a huge crystal and stretching it. How do they dope things, when the boron (or other doping agent) is one part per million to make a hole, and your transistor is only a few hundred atoms?

Think we will ever run out of sand to get silicon? It's like aluminum: the only reason we don't make everything out of it is that it takes massive amounts of energy to reduce it, i.e. remove the oxygen.

Wired just covered this earlier this summer: https://www.wired.com/story/book-excerpt-science-of-ultra-pure-silicon/

mac.6 · « **Reply #39 on:** October 15, 2018, 08:47:04 pm »

Quote from: Berni on September 13, 2018, 06:01:01 am

The problem is that you actually have to test the operation of the CPU not just its speed.

Out of the billions of transistors in there, it only takes a single dead transistor to render a whole CPU core dead. So they have to run a test that exercises all parts of the CPU. It's also possible that only a few transistors ended up out of spec but still working, which adds timing problems on that signal path: the CPU runs fine at slow clock speeds but miscalculates at high clock speeds.

A rough test of the CPU's functionality can be done pretty quickly. I would guess they do a quick die probe test of the wafers as they come off the production line to make sure all the major parts of the CPU work (perhaps even with dedicated test hardware on the die to help in this process). This would avoid wasting packaging and wire bonding on dead dies. The full test where they "speed bin" them is likely done once the chip is already in an LGA package, and doubles as a final quality control test to make sure they don't ship a dead CPU from the factory.

Today Design For Testing (DFT) is a big part of chip design.
You don't test chip functionality (it would be mostly impossible to design such a test); instead, you link every part to form a chain and apply test vectors to check that everything is interconnected as intended.
This is much faster, and you can catch errors buried deep in the core logic.

And sometimes a perfect wafer is built (100% yield), but it's very rare (Intel did their first in 1989, an 8051 wafer).
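A minimal Python sketch of the scan-chain idea: the flip-flops are stitched into a shift register, a test vector is shifted in, the logic between the flops is clocked once to capture its outputs, and the response is shifted out and compared with what a fault-free circuit would give. The three-flop circuit (`logic`), its net names and the stuck-at fault model (standing in for a dead transistor) are hypothetical, and real flows generate compact vector sets with ATPG tools instead of trying every vector as this toy does.

```python
from itertools import product
from typing import Optional

Fault = Optional[tuple[str, int]]  # (net name, stuck value) or None


def logic(bits: tuple[int, ...], fault: Fault = None) -> tuple[int, int, int]:
    """Combinational logic between three scan flops, with an optional
    stuck-at fault forced onto one internal net."""
    a, b, c = bits

    def net(name: str, value: int) -> int:
        # Override the net's value if it is the faulty one.
        return fault[1] if fault and fault[0] == name else value

    n1 = net("n1", a & b)
    n2 = net("n2", b ^ c)
    n3 = net("n3", n1 | n2)
    return (n1, n2, n3)


def scan_test(fault: Fault = None) -> list[tuple[int, ...]]:
    """Shift in every 3-bit vector, capture once, shift out, and return the
    vectors whose response differs from the fault-free circuit."""
    failing = []
    for vector in product((0, 1), repeat=3):
        chain = [0, 0, 0]
        for bit in reversed(vector):  # shift mode: one bit enters per clock
            chain = [bit] + chain[:-1]
        chain = list(logic(tuple(chain), fault))  # capture: one functional clock
        out = []
        for _ in range(3):  # shift mode again: unload the captured response
            out.append(chain[-1])
            chain = [0] + chain[:-1]
        response = tuple(reversed(out))
        if response != logic(vector):  # compare with the fault-free expectation
            failing.append(vector)
    return failing


print(scan_test())             # [] : a good part passes
print(scan_test(("n1", 0)))    # [(1, 1, 0), (1, 1, 1)] expose n1 stuck-at-0
```

Note that the test never checks what the circuit "means"; it only checks that each net can be driven and observed, which is why it scales to very large designs.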

