EEVblog Electronics Community Forum
General => General Technical Chat => Topic started by: SeanB on December 28, 2015, 05:40:24 am
-
When hardware must „just work“ - An inside look at x86 CPU design
https://www.youtube.com/watch?v=eDmv0sDB1Ak
-
I'm at the Q&A part, but I doubt this question will be asked.
He says that all flip-flops can be chained. Are those multiplexers actually left in for production units? That seems like it would be a big performance hit to have a MUX in front of each register plus having a huge chain of traces going literally over the entire chip area. That's like an extra metal layer right there.
-
I do not think they are there for all registers, probably only a limited set that are outputs and inputs of units, so you can, e.g., run the ALU as a unit, or run the instruction decoder and see the microcode it outputs. They will be left in, as taking them out means more bugs may be introduced.
-
They will be left in, as taking them out means more bugs may be introduced.
That would be my feeling as well, but he explicitly called it a full dump. And testing of individual blocks was not a goal. The goal was to get a state that can be sucked into a model for a closer examination. And having only in/out registers would be limiting to the point of being useless.
The ability to feed arbitrary state into the design is kind of cool though.
-
As he also said, you might have to do multiple dumps to get the whole state out, provided you can reproduce the bug reliably (and we all know of those bugs that vanish when you connect test equipment), and that they often disable it using fuse bits for production, so that they can still use the full debug during manufacture, but limit it or disable it entirely for the final production QC check.
-
As he also said, you might have to do multiple dumps to get the whole state out
You may need to try multiple times simply because it is impossible to get to the exact same state on multiple runs, given how much stuff is going on in a modern CPU. And even then, as he said, by the time your system is visibly FUBAR, it is already impossible to trace what caused this state.
And that's why I have my doubts about how useful this entire thing can be, given how expensive it is.
Single stepping the entire x86 CPU must be a lot of fun :)
and that they often disable it using fuse bits for production, so that they can still use the full debug during manufacture, but limit it or disable it entirely for the final production QC check.
I would hope they do. Otherwise it is like an ultimate security hole. 3 pins give you access to absolutely everything.
-
I haven't finished watching yet, so yes, I would guess security is a rather important thing. Lose the hard-wired key and you are royally FUBAR for sure. But then, they can single step really slowly to trace a bug, using the $$$$ machinery and a few hours of running.
Yes, testing needs space, but really the biggest things are clock distribution and caches, along with all the built-in memory.
-
He says that all flip-flops can be chained. Are those multiplexers actually left in for production units? That seems like it would be a big performance hit to have a MUX in front of each register plus having a huge chain of traces going literally over the entire chip area. That's like an extra metal layer right there.
A scan flop is a flop with 2 extra inputs, SI (scan data in) and SE (scan enable). The library of cells used to build the chip will contain both flops and scan flops. The scan flop will be bigger than a flop but the area of a scan flop is brutally optimized by the library designer.
Most (99.9..%) flops will be replaced by scan flops by the chip designer since, to keep the fab honest, it is up to the chip designer to prove that there is a fab defect in a chip. The fab hopes your fault coverage is not that good and you miss a lot of their defects. Off the top of my head, the only flops that are not replaced by scan flops are things like synchronizer flops.
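To make the SI/SE idea concrete, here is a rough behavioural sketch in Python of a few scan flops stitched into a chain. It is only an illustration: the names `ScanFlop` and `shift_chain` and the 4-flop state are made up for this example, and a real chain is of course described in the netlist, not in software.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScanFlop:
    """Behavioural model of one scan flop: a D flop plus SI and SE inputs."""
    q: int = 0

    def clock(self, d: int, si: int, se: int) -> None:
        # With SE high the flop captures SI; otherwise it captures D as usual.
        self.q = si if se else d


def shift_chain(chain: List[ScanFlop], scan_in_bits: List[int]) -> List[int]:
    """Shift one bit per clock with SE=1; return the bits seen at scan-out."""
    scan_out = []
    for bit in scan_in_bits:
        scan_out.append(chain[-1].q)       # last flop drives the scan-out pin
        prev_q = [f.q for f in chain]      # sample all outputs before the edge
        for i, flop in enumerate(chain):
            si = bit if i == 0 else prev_q[i - 1]
            flop.clock(d=0, si=si, se=1)   # D is ignored while SE=1
    return scan_out


chain = [ScanFlop(q) for q in (1, 0, 1, 1)]  # hypothetical 4-flop state
dumped = shift_chain(chain, [0, 0, 0, 0])    # dump the state, shifting zeros in
restored = shift_chain(chain, dumped)        # feed the dump back in
```

After the first call the chain holds all zeros and `dumped` holds the old state (last flop first); shifting `dumped` back in puts the chain back into the state it started in, which is the "dump it, then load it into a model or back into silicon" idea from the talk.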
-
but the area of a scan flop is brutally optimized by the library designer.
And that's something to be expected, of course.
But the FF area itself is only a part of the issue. You can expect FFs to be distributed throughout the entire chip area, so just routing enable and data inputs will take at least one full layer of metal.
-
but the area of a scan flop is brutally optimized by the library designer.
And that's something to be expected, of course.
But the FF area itself is only a part of the issue. You can expect FFs to be distributed throughout the entire chip area, so just routing enable and data inputs will take at least one full layer of metal.
If you look at a die photo you will see that the area of regions where standard cell logic is laid out is a relatively small part of the total die area. The largest part of the die area is made up of SRAMs. SRAMs are not made with flops and their test logic does not consist of scan chains but BIST controllers.
-
But the FF area itself is only a part of the issue. You can expect FFs to be distributed throughout the entire chip area, so just routing enable and data inputs will take at least one full layer of metal.
There are several reasons it's not quite as bad as all that.
First, the timing constraints on the scan chain are very relaxed compared to the main clock. You might clock the chip at 2 GHz but you might only clock the scan chain at up to a few tens of MHz. Knowing this, ugly routing and lots of layer transitions (within reason) are not the end of the world.
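For a feel of what a slow scan clock means in practice, a back-of-envelope sketch. The flop count, the chain count and the 20 MHz scan clock are hypothetical numbers picked for illustration, not figures for any real chip.

```python
def scan_shift_time_s(num_flops: int, num_chains: int, scan_clock_hz: float) -> float:
    """Time to shift every flop out once, with all chains shifted in parallel."""
    longest_chain = -(-num_flops // num_chains)  # ceiling division
    return longest_chain / scan_clock_hz


# Hypothetical: 10 million flops, 20 MHz scan clock.
one_chain = scan_shift_time_s(10_000_000, 1, 20e6)       # a single long chain
hundred_chains = scan_shift_time_s(10_000_000, 100, 20e6)  # split into 100 chains
```

Under these assumptions a full dump is a fraction of a second even with one chain, so there is no pressure to make the scan path fast.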
Secondly, the designer will typically not stitch up the scan chain by hand, but let a scan insertion tool do it. The scan insertion tool is free to stitch the flops in any order (and to break the chain into several), and if run post-placement (and potentially, post preliminary routing), can make excellent choices that make the scan routing as easy as possible _without_ putting undue pressure on the "main" placement and routing.
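A toy illustration of why placement-aware stitching helps. This is a greedy nearest-neighbour ordering, which is a stand-in and not what any particular scan insertion tool actually runs; the flop names and coordinates are invented.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def wire_length(order: List[str], placement: Dict[str, Point]) -> float:
    """Total Manhattan length of the Q -> SI hops for a given chain order."""
    total = 0.0
    for a, b in zip(order, order[1:]):
        (ax, ay), (bx, by) = placement[a], placement[b]
        total += abs(ax - bx) + abs(ay - by)
    return total


def stitch_nearest_neighbour(placement: Dict[str, Point], start: str) -> List[str]:
    """Greedy ordering: always hop to the closest flop not yet in the chain."""
    remaining = set(placement) - {start}
    order = [start]
    while remaining:
        cx, cy = placement[order[-1]]
        nxt = min(
            sorted(remaining),  # sorted so that ties break deterministically
            key=lambda n: abs(placement[n][0] - cx) + abs(placement[n][1] - cy),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical placement: three flops in one corner, two in the opposite one.
placement = {"ff_a": (0, 0), "ff_b": (9, 9), "ff_c": (1, 0),
             "ff_d": (9, 8), "ff_e": (2, 1)}
netlist_order = ["ff_a", "ff_b", "ff_c", "ff_d", "ff_e"]
placed_order = stitch_nearest_neighbour(placement, "ff_a")

naive_length = wire_length(netlist_order, placement)
placed_length = wire_length(placed_order, placement)
```

With these made-up coordinates the netlist order zig-zags across the die several times, while the placement-aware order crosses it once, so the scan wiring is mostly short local hops.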
Finally, someone else already mentioned that an important library part like a scan flop is going to just have the sh&t beaten out of it by experienced circuit and layout folks, who do amazing work. From a timing perspective, scan is still secondary; the main thing is to keep the main path fast. Googling around, I see most pictures of scan latch circuits show a mux, but in reality, latch muxing is typically implemented using pass transistors and/or tristating inverter totem poles. On the data side, changing the 2:1 "keep or D" mux behavior to a 3:1 "keep or SI or D" behavior can be done without adding any additional stages, just a bit more loading on the storage node for the additional pass transistor(s). On the control side, you can arrange the logic so that the scan enable gets a slow path and CLK (and EN, if present) are fast. This is fine because, from the perspective of normal operation, the scan enable signal is more or less a DC constant, so the complexity of its path isn't so important.
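Purely at the logic level (this says nothing about the transistor-level tricks), the 2:1 vs 3:1 behaviour can be sketched like this. The function names are made up, and giving SE priority over EN is my assumption for the sketch.

```python
from itertools import product


def next_q_enable_flop(q: int, d: int, en: int) -> int:
    """2:1 'keep or D' behaviour of a plain enable flop."""
    return d if en else q


def next_q_scan_enable_flop(q: int, d: int, en: int, si: int, se: int) -> int:
    """3:1 'keep or SI or D' behaviour; SE overriding EN is an assumption."""
    if se:
        return si
    return d if en else q


# With SE held low (normal operation), the scan flop behaves exactly like
# the plain enable flop for every combination of q, d, en and si.
normal_mode_matches = all(
    next_q_scan_enable_flop(q, d, en, si, 0) == next_q_enable_flop(q, d, en)
    for q, d, en, si in product((0, 1), repeat=4)
)
```

That exhaustive check is the "SE is more or less a DC constant" point: in normal operation the SI leg is never selected, so only its loading matters, not its speed.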
-- dave j