I find it strange that some engineers seem to celebrate buckets of their conquests; in my whole career I would be able to hold all my destroyed devices in two hands [if I had saved them], and that's only because two of them were modules that needed a hand each.
And I find it strange that some engineers don't. 🙂 Could you explain the underlined: how is that possible? Don't you run stress tests on your prototypes to find their real limits, or do you just calculate them theoretically? Or do you use pricey components with huge parameter margins (like >100%)? Or do you specialize exclusively in low-energy devices?
Well, when I left the one place, we were doing burn-in tests, and over a hundred units or so, never saw anything that couldn't be blamed on something obvious, like a loose or missing bolt, etc.
In particular, I worked on inverter modules of 5kW (up to 400kHz) and 150kW (up to 50kHz) capacity. These were stacked to get induction power supplies up to 50kW, and 600kW or more, respectively. 480V 3ph mostly, with 240V an option for the smaller units.
To your point -- this was lowish quantity, kinda specialized, industrial sort of stuff. Not like motor drives, or uh, solar farms or something; we didn't have to be especially picky. Whereas in the qty 100k's, you're likely to start seeing spooky failures that can possibly be blamed on semiconductor ratings. Also, conditions slightly outside of normal operation, like unusual swells and surges, in patterns outside of what you'd normally test for.
We did pick up a guy from Rockwell with drives experience, though I don't recall picking his brain on semiconductor reliability. (I do remember one thing he had an opinion on, mask screened heatsink paste -- practical and effective. Though I don't remember anymore if he got the service techs to use masks instead of fingers, when replacing modules. Hm, what a strange sentence, out of context.)
It all sounds really "strange" (unusual) to me: like an exception to the rule, but not the rule.
Well... an alternative explanation would be: mediocrity is the rule, not the exception.

I don't know your experience, you haven't mentioned much yet, so I don't want to go and assume. You do sound experienced. Not having made this particular observation yet suggests either an early career -- or an extremely lucky one! (Jealous!)
I've seen more than enough things that are hardly worth an eye-roll anymore. To the best of my knowledge, the 80-20 rule applies in all fields. Like, take the range of experience you see on this forum -- it isn't much different in professional spaces, or academia, or wherever. It's recursive, too: you can take the top 20% of, say, electrical engineers, and they'll be reasonably knowledgeable in their respective subsets; 20% of them will be expertly knowledgeable in those subjects in turn. And 20% again, and so on, until you're in such a rarefied set that you've found the dozen subject matter experts in, well, some very obscure corner of the field, but damnit do they ever know it well!
Conversely, the 80% might not realize which population they're in. That's partly explainable by a modest misunderstanding of how one should characterize that set (most people think they are reasonably knowledgeable about the set of things that, well, they know they know about; and chances are, few will share exactly that set of facts, and I'd admit it's a fairly easy error to make regarding the definition of one's set, assuming something more personal of oneself or less personal about another, y'know?), but a large part of it is simply Dunning-Kruger at work.
And not to be outdone, the 80-20 rule applies almost uniformly across intersections of sets. Take that superlative dozen from before; they might have as much knowledge of, say, sociology, as any rando off the street. You need to search quite far indeed to find someone truly in-depth in many subjects, and truly broad in general knowledge.
For my part, I would ~guess~ I'm maybe two levels deep (three might be pushing it) regarding general EE and science knowledge; but let me also be the first to admit I'm likely in the base 80% on most general-knowledge, business, cultural, etc. subjects. (I do have my own business, but it's merely self employment, and I surely make much less than I would in a normal salaried position... some might suspect I must be an utter moron! And I'm not saying they're necessarily wrong...)
Anyway, that might be a bit of a non sequitur, but if nothing else, anyone reading can gain whatever benefit this lecture entails, should it prove redundant or too off-topic for your taste. At least, that's what I like to tell myself when I get into these long posts... I digress...
And the rule sounds something like this: the more ways you find to kill your prototype, the fewer chances you leave for the end user of the final product to do so. In my opinion, this is the only way to make a device truly reliable and, most importantly, safe. But probably this is just a point of view, one of the approaches. But yes, theory and practice always diverge; even the parameter values from a datasheet and reality almost always diverge for the worse (or there is a note in small print about the completely unrealistic test conditions under which the value of that parameter was obtained for the documentation). As they say, for reference only.
Anyway, it's certainly one method of testing. Sometimes it's the only way. But it's an extremely low-information method. I've read many posts/comments from those beginning with SMPS and such, and until they understand the ratings, dynamics and so on, it's just a lot of feeling around in the dark. And that experience can take years (I know, I've been there!). But they're still expected to produce something. The project clock is ticking. The budget is burning. We need to get something out the door for The Show! And, perhaps as much as a mental self-defense as a cultural phenomenon, those buckets of de-magic-smoked transistors become a peculiar badge of pride.
I far prefer high-information methods. I powered something up and blew it; well, that doesn't work, great, but WHY? So try it at lower voltage, or current, or frequency, see what the dynamics look like, see if something changes at extremes, see if something is getting too hot or too much voltage or current or something. Anything that directly suggests a course of action, rather than throwing more spaghetti.
I will admit that that first 5kW inverter got up to revision E, I think it was. First big project I did out of college. The first two or three were built with the best information I could find -- postings, application notes, a book or two. They fucking SUCKED. The key factor, I learned, is that traditional advice is faulty -- and in particular, it's what is missing from that advice. It is an approximation, and like any other, it incurs certain assumptions. Which, sadly, always end up going missing from rules of so many thumb. So when it fucks up, you're clueless why; there's no information to suggest what's right or wrong, or by how much.
And, in that moment, you realize true knowledge is not a statement of fact, but a relationship between them. A relationship that suggests direction and action. But if all you have are statements, and those statements fail -- what do you do?
The particular rule I disproved is: design your switching loop with "minimal inductance".
The assumptions missing from this statement are:
1. You will be able to achieve a low enough inductance in your layout.
2. "Low enough" is defined by the switching speed of the devices used.
That last point actually says something. It suggests a relation: for a device with some rise/fall time \$t_r\$, the switching loop, which includes stray loop inductance and device capacitance, must have a time constant \$\tau = \pi \sqrt{L C} / 2 < t_r\$.
I suspect this particular statement is very old, perhaps dating to the 70s or 80s, when all we had were bipolar devices, or maybe into the 90s too, when MOSFETs became mainstream. Most applications wouldn't have any way to violate the time constant of the switching loop: a whopping 50nH loop (which might be typical of single-sided layouts like in ye olde ATX power supply) with ~1nF devices is some 11ns (that's the 1/4-cycle time, by the way), while the fastest power BJTs might turn off in 100ns. And why would you push a MOSFET harder? Going above 150kHz attracts more attention from the FCC; why worry about it?
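If it helps to see that as numbers, here's the back-of-envelope in Python (the 50 nH, 1 nF and 100 ns figures are just the ballpark values from above):

```python
from math import pi, sqrt

def quarter_cycle(L, C):
    """Quarter-period of the switching-loop LC resonance: tau = pi*sqrt(L*C)/2."""
    return pi * sqrt(L * C) / 2

# Ballpark "ye olde" single-sided layout: ~50 nH loop, ~1 nF of device capacitance
tau = quarter_cycle(50e-9, 1e-9)
print(f"loop quarter-cycle: {tau * 1e9:.1f} ns")   # ~11 ns

# Fastest power BJT turn-off, roughly 100 ns -- an order of magnitude slower,
# so the loop time constant simply never got violated back then
t_off_bjt = 100e-9
print(f"margin: {t_off_bjt / tau:.0f}x slower than the loop")
```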
But statements get repeated, as they are wont to do -- just another meme, they absolutely exist in professional spaces as well as casual -- again, nothing is REALLY different, we're all humans here -- and people encounter, uh, Learning Experiences, like my above story.
And so, applying this knowledge, I was able to take stock of where I was, and what I could do.
Transistors were something like this: https://cdn.ozdisan.com/ETicaret_Dosya/587044_5214130.PDF
The design was this: half bridge, two transistors in parallel each side, 320VDC in, 70A RMS out. 4-layer board, so the power, switch and ground nodes are all interleaved as closely as possible. I calculated that the total loop inductance was something like 17nH. It literally can't possibly be any lower. The SOT-227 devices, well, they clearly have a loop area, but they're close ~enough~ to the board I guess, and a whack of TO-247s isn't going to do any better, for example. So, truly, I've done my due diligence; inductance is as low as it physically can be, while still using off-the-shelf parts. (And never mind MOSFET modules -- the best I could find at the time was, as best I could tell, something like 30-40nH to the terminals. Preposterously useless!)
The measurement. When one of the transistors turns on, in hard switching, it becomes a short circuit over 10ns or so, and yanks the full supply voltage across its opponent. This suddenly charges its drain capacitance, through the loop inductance. The capacitance is about 2-4nF ballpark (it's nonlinear, so whether you count the fat or thin end of the curve is a matter of concern!). So the loop time constant is 9-13ns let's say. And we're switching fast enough that substantial energy is going to be put into that LC circuit.
How much energy? Well, the measurement was 80% overshoot, with a resonant frequency around 60MHz. Oh shit!
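For anyone following along with a calculator, here's roughly the arithmetic (Python sketch; the 17 nH loop, 2-4 nF capacitance, 320 V bus, and the measured 60 MHz / 80% figures are from above -- the "effective capacitance" step is just my inference from the ring frequency):

```python
from math import pi, sqrt

L = 17e-9          # calculated loop inductance
V_bus = 320.0      # DC bus voltage
for C in (2e-9, 4e-9):
    tau = pi * sqrt(L * C) / 2          # quarter-cycle of the L-C ring
    Z0  = sqrt(L / C)                   # surge impedance of the loop
    print(f"C = {C*1e9:.0f} nF: tau = {tau*1e9:.1f} ns, Z0 = {Z0:.1f} ohm")

# Working backwards from the measured ~60 MHz ring gives the *effective*
# capacitance actually in play -- presumably the thin end of the nonlinear curve:
f_ring = 60e6
C_eff = 1 / ((2 * pi * f_ring) ** 2 * L)
print(f"effective C from 60 MHz ring: {C_eff*1e9:.2f} nF")   # ~0.4 nF

# 80% overshoot on a 320 V bus:
print(f"peak Vds: {V_bus * 1.8:.0f} V")   # ~576 V on a 600 V part -- ouch
```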
Fuck me, I've never made anything that fast and hot before, and I certainly didn't mean to...
(Aha, on-topic content! There was no evidence of parasitic turn-on, despite the magnitude of the rising edge (~25kV/us). Like I said, the feedback capacitance is minuscule above any appreciable Vds. Also good that the gate drive impedance was low. Arguably too low, given how much trouble that risetime is causing, but I couldn't afford a whole lot of switching loss, either...)
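(For a rough feel of why that worked out, here's the usual dv/dt Miller-bump estimate -- the gate resistance is the ballpark value mentioned further down, and the Crss values are pure guesses, so treat the outputs as illustrative only:)

```python
# Rough dv/dt-induced gate bump: Vgs_bump ~ Rg * Crss * dV/dt
# (assumes the driver looks like Rg at these time scales; Crss values are guesses)
Rg   = 10.0          # ohms, ballpark gate resistance
dvdt = 25e3 / 1e-6   # 25 kV/us
for Crss in (2e-12, 5e-12, 20e-12):
    v_bump = Rg * Crss * dvdt
    print(f"Crss = {Crss*1e12:.0f} pF -> ~{v_bump:.1f} V at the gate")
# A few pF keeps the bump well under threshold; tens of pF would be trouble,
# which is why the collapse of the feedback capacitance at high Vds matters.
```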
So the first thing I tried was just brute-force clamping the voltage spike, because that was blocking progress. I can't run at full power at all with these derated transistors; I have to make this thing work with 600V transistors to deliver the nameplate spec.
So I ass-wired some TO-220 diodes across some nearby pads. 600V 8A parts should do it, right? It's like 80A peak, sure, but the average is a pittance, like, <1% duty cycle, that should be fine...
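(The "should be fine" arithmetic, for what it's worth -- the duty-cycle figure is my assumption from the "<1%" above:)

```python
# Average clamp-diode current at ~80 A peaks and (assumed) <1% conduction duty
I_pk, duty = 80.0, 0.01
print(f"I_avg < {I_pk * duty:.1f} A")   # well under an 8 A rating... on paper
```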
Nope, they blew up right after hitting the switch. Huh. Also, that's one SOT-227 transistor to the pile. (Not all four, because desat protection from the start, remember?) Or maybe there were another few before this, I don't remember; I might've tested this at 100V and noticed the overshoot problem before testing at scale. I was SMRT like that, even back then...
Okay well, try 12A diodes... runs a few seconds? They.. don't even get hot? That's weird...
Okay, fucking 30A diodes... alright, now it runs steady state.
I think what was happening here is, under such high peak currents, the diodes were suffering from electromigration, or maybe some weird super-high-injection thing with the semiconductor itself, I don't know. They don't heat up, it's not a thermal thing, at least not a bulk thermal thing. They just die. (Wish I'd had an SEM to inspect the die and see how it failed, but yeh, this was 2010, not like today when every other science YouTuber has one.)
Anyway, with the diodes in, the spikes were clamped to a modest 20% -- that's right, still a whopping ~60V applied to the poor diodes -- and even accounting for lead inductance (about 7nH for TO-220), I'm pretty certain that's largely dropped across the die itself. Forward recovery is absolutely a thing, and you get to see it in full swing at these kinds of time scales.
Skip forward a few board revs. I analyzed the circuit, did some simulations, and figured out a solution.
I made two major changes:
1. The module is now H-bridge. Four transistors per board, but they're complementary. This solves problems with wiring (in a half bridge, the load current returns through the power lines, such a mess!), and reduces load current (it's only 35A now).
2. I added a slot in the copper, which DC+/- looped around -- thus adding about 100nH to the supply loop, at each half of the H-bridge. On the input side there's a row of film caps for bypass, then this series inductance, then the transistors in the middle of the board. At the transistors, there's a peak clamp snubber, a diode into something like 0.1uF || 20R, across the supply. The inner switching loop is still fairly tight, maybe 20 or 30nH between transistors and the snubber -- but the extra 100nH to supply allows some room for the transistors to bounce against each other. In return, the capacitor clamps, and the resistor dissipates, that extra switching energy. The overshoot was something like 20%, so that 600V transistors were feasible at 320V DC bus, and 900V at 650V bus (480VAC input).
I think the figure was around 100W switching loss per pair of transistors, at low load current and full frequency (hard switching at 400kHz). Switching loss decreases under load (as ZVS is achieved), then goes back up (as what used to be load current through one or the other inductor at turn-off now has to be dumped into the snubber). I designed it so that the snubber dissipated 100W in the two extreme conditions, with a minimum in between.
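If you want to sanity-check those figures, this is the sort of back-of-envelope I mean (Python; the capacitance, inductance, and current values are the ballpark numbers from above, and the "turn-off at the load-current peak" assumption is mine, so treat the outputs as order-of-magnitude only):

```python
from math import sqrt

f_sw  = 400e3      # hard-switched at full frequency
V_bus = 320.0
# Low load, hard switching: roughly the node-capacitance energy burned per event
for C_node in (2e-9, 4e-9):
    P_hard = 0.5 * C_node * V_bus**2 * f_sw
    print(f"C_node = {C_node*1e9:.0f} nF: ~{P_hard:.0f} W capacitive switching loss")

# Full load: current interrupted in the (now ~130 nH) loop gets dumped into the snubber
L_loop = 130e-9                 # ~100 nH slot + ~30 nH inner loop, per half-bridge
I_off  = 35.0 * sqrt(2)         # assuming turn-off near the load-current peak
P_snub = 0.5 * L_loop * I_off**2 * f_sw
print(f"~{P_snub:.0f} W dumped into the snubber at full load")
# Both land in the same ballpark as the ~100 W figure quoted above.
```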
So that was a not-wholly-off-topic story, but it illustrates the thought process I had, with a very real example.
To be clear -- sources of information are best matched by frequency, or data rate or density. By that I mean, consider error-correcting methods for example: when a comm channel has a very low bit error rate (BER), it's feasible to run with no error correction at all, and simply retransmit when a fault has been detected. This is a huge PITA, it clogs the channel when it happens, it breaks all sorts of guarantees you might've otherwise had about timing, say -- but your data will get there, intact. TCP/IP, for example. And this is a perfectly reasonable compromise to make when you need to prioritize correctness, don't need perfect timing, and when the BER is low, so these faults happen rarely and the average bitrate is still very good.
How does this apply? Trying to do early development, when you're still trying to figure things out at all, requires a lot of information. Learning it a single bit at a time is no way to work. That's mine, and everyone else's, beginner story. But the converse is perfectly true: when things happen very rarely, they can be a big PITA to figure out, but as long as they're rare enough, it can be worth dedicating the time. (And that's not to say there's less value in the tools that help you figure things out; the classic software ticket dismissal "cannot reproduce" is absolutely justified. If you can't slap a debugger on it, what can you really do? The same is true of our meters and scopes, when tracking down strange, rare failures in power supplies or whatever.)
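To put a number on that trade-off, here's a toy model of the "just retransmit" strategy (pure illustration, not tied to any particular protocol; the packet size is an arbitrary example):

```python
# Toy stop-and-wait model: probability a whole packet survives, and the
# resulting average number of transmissions per packet.
def retransmit_cost(ber: float, packet_bits: int) -> float:
    p_ok = (1.0 - ber) ** packet_bits      # packet survives only if every bit does
    return 1.0 / p_ok                      # expected transmissions (geometric)

for ber in (1e-9, 1e-6, 1e-4):
    print(f"BER {ber:g}: ~{retransmit_cost(ber, 12000):.2f} tx per 1500-byte packet")
# At low BER the overhead is negligible; at high BER the channel drowns in
# retransmissions -- same reason low-information trial-and-error only works
# when failures are rare.
```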
Exactly, nanosecond-scale stuff or very high frequency stuff. That's why I am wary about the insulator of a capacitively coupled MOSFET gate, since at these frequencies even a very small capacitance is effectively a short circuit. In addition, a small capacitance is also formed between the heatsink (which sits on the power ground) and the MOSFET tab (drain). And if that's not enough, these spikes also tend to spread all over nearby conductors through the electromagnetic field. Will there be enough energy to damage the gate insulator?
Actually, that kind of helps. The lead inductance is some nH, as mentioned -- or for short enough time scales, we can just as well look at it as some crude length of transmission line, which a lead through air is going to be ballpark like 100-150 ohms. Anyway, that provides a "squishiness", where the die tends to act as one (it's a huge capacitance, remember!), and the wave energy couples out, basically through all three leads in parallel. And maybe the gate drive isn't very strong, there's a 10 ohm resistor to it or something, so not much goes that way anyway, and Vgs basically just stays put while these waves wash around it.
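(That 100-150 ohm figure comes out of the standard wire-over-ground-plane estimate; a quick sketch with guessed lead geometry -- the exact number depends heavily on diameter and spacing:)

```python
from math import log

def z0_wire_over_plane(h, d):
    """Characteristic impedance of a round wire of diameter d at height h over a
    ground plane (classic Z0 ~ 60*ln(4h/d) approximation, air dielectric)."""
    return 60.0 * log(4.0 * h / d)

# Guessed geometry: ~1 mm diameter lead, a few mm above the board/plane
for h in (2e-3, 3e-3, 5e-3):
    print(f"h = {h*1e3:.0f} mm: Z0 ~ {z0_wire_over_plane(h, 1e-3):.0f} ohm")
```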
I think they are the same lifetime curve, that's the point. Let's take LEDs for lighting as the simplest example and build a curve of their MTBF vs current. As a result, we will get exactly an exponential curve, from just picoseconds to almost infinity (virtually immeasurable decades, without taking into account the degradation of the phosphor). The curvature of this characteristic depends on the specific expression, the particular variables used and their values. It is quite possible that the curve will be hyperbolic; I haven't dug that deep, I don't need it (since I don't manufacture semiconductors). But I can definitely say that a hyperbolic function is a special case of an exponential function--
I want to add a quick note on this -- they're obviously not the same function, having different names for one thing, but also, like, analytically speaking. Practically speaking, they may be close enough not to care, and in what we're talking about here (which neither of us really knows for sure), that's as good as any, so I'm not calling you out here. Just that, when there is statistical evidence and a theoretical basis for it, the difference is absolutely stark: an exponential grows (or falls) without bound, for an increasing argument. A hyperbolic grows infinitely as it approaches a single point. The real fun is if you reverse the equation. Normally, when you try and solve for points beyond a hyperbolic asymptote, you get some bullshit complex number -- there literally does not exist a solution to the equation there.* Whereas the exponential will tell you something obviously useless, like a Planck-scale lifetime, but it still "works", mathematically speaking.
*Unless we want to make some stretches about the quantity being oscillatory out there, or in some hand-waving sort of way, geometrically perpendicular? Which you can sometimes get away with, even. But in general, yeah nah, who knows.
But yeah, for our purposes, we have neither data nor theory, all we know is it's some stupid sharp thing. Alas!
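Since we're this deep in the pedantry hole anyway, here's the distinction in code form (toy curves, not fitted to anything real -- the scales and constants are arbitrary):

```python
import math

def life_exponential(stress, scale=1e6, k=0.5):
    """Toy model: lifetime falls exponentially with stress -- always returns
    *some* finite number, however absurdly small."""
    return scale * math.exp(-k * stress)

def life_hyperbolic(stress, limit=2.0, scale=1e5):
    """Toy model with a hard limit: life blows up to infinity as stress falls
    toward 'limit'; below it there's simply no finite lifetime to solve for."""
    if stress <= limit:
        return math.inf          # e.g. a steel spring below its fatigue limit
    return scale / (stress - limit)

for s in (1.0, 2.1, 5.0, 20.0):
    print(f"stress {s:>4}: exp model {life_exponential(s):.3g} h, "
          f"hyperbolic model {life_hyperbolic(s):.3g} h")
```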
Yeah, I wish that too... But this is business, including failure programmed into the design. Sometimes this is really justified by physics: you can't jump above your head. Sometimes this is a deliberate degradation of parameters: it's not profitable to make components or modules/devices that live forever... So, such questionable information is unlikely ever to be published.
Well, it's not so pessimistic for semiconductors, as far as I know. Some things really do seem to inevitably wear; electrolytics for example. They can be made for long lives, but it's always finite; there's no -- aha, I have an excuse for my pedantry -- asymptote in the other direction.
I'll explain this property with another example. Some things do appear to have such a claim: some alloys, such as steel and titanium, have a fatigue limit, meaning the cycle life of a spring, for example, tends towards infinity for some (nonzero) displacement or less; or, in general, for some bulk loaded up to the corresponding maximum stress/strain. This isn't trivial -- aluminum alloys don't have such a limit, so their cycle lifetime merely increases in inverse proportion to loading (or with whatever power law it has).
Stuff like that, I suppose, makes one wonder about things like the millennium clock or whatever it is -- metals, under certain load conditions, can potentially survive "forever", so it would seem. Good frickin' luck with wear, even under rolling contact, but yeah, at least cyclic loading isn't a deal breaker, huh?
In a similar way, semiconductors seem to have at least a very good lifetime, if not hyperbolically so, near room temperature. One of the bigger troubles is electromigration: the conductors in ICs have to be very fine to do their job, thus current densities are high, and lifetime depends on temperature and on current density (hence conductor dimensions). Go figure, it's not the semiconductor, it's the damn wire on it!
Which also means you could have random samples of chips that fail at various lifetimes, due to variation in width/thickness of interconnects for example.
Electromigration specifically, I forget if it's a hyperbolic cutoff, or a continuous (power law or exponential) relation. So there might be that.
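For what it's worth, the usual textbook model for electromigration (Black's equation) is continuous: a power law in current density times an Arrhenius exponential in temperature, \$\mathrm{MTTF} = A \, J^{-n} \, e^{E_a / (k T)}\$, with the exponent typically quoted around \$n \approx 2\$ and the activation energy somewhere near 0.5-0.7 eV for aluminum interconnect. So no hard cutoff, just a very steep dependence -- though the exact constants vary by process, so treat those values as placeholders.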
I mean, there's a lot we can derive just from the sheer existence of modern technology as we know it -- again a bit of a low-information claim, but appreciate that there are, well and truly, BILLIONS of transistors in probably whatever you're looking at right now, or somewhere nearby. And they all work together at, probably, <1ppb BER, and most of those errors are soft (induced by noise, radiation...), while total failures (e.g. a transistor stuck hard on/off, a broken wire, etc.) are years in between. This argument is something of a null hypothesis, setting merely a lower bound on the real figure. But also, you don't see cellphones bricking themselves every day, and you know -- whatever, dozens, hundreds, thousands of people from whom you'd be likely to hear about it happening.
Nice, you got me! 😃 Well, it's not an exponential, it's a power function f(x)=x^2, and it is non-linear as well. My math terminology was always a little lame, especially in a non-native language, duh… 🤦🏻♂️
Ah -- fair enough. We're talking fairly precise things here (well, on the couple of occasions when we have, heh), so it pays to get it right. I hope my (long-winded) explanations have at least been illuminating..?
Cheers!
Tim