Yes, he meant Vgs(th). The tempco is negative (I think it goes as something like 1/T_abs, but it's usually just expressed as a roughly constant tempco around room temperature).
Rds(on) characterizes the left-hand (voltage saturation) region of the device's drain (output) curves. The flat (constant-current, linear) region has nothing to do with Rds(on): the voltage drop there is >> Id*Rds(on).
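To put rough numbers on that last point, here is a small Python sketch comparing the two regions at one hypothetical operating point (2 A, Rds(on) = 0.05 Ω, the transistor used as a linear pass element between a 24 V rail and a 6 V load). None of these values come from a real datasheet.

```python
# Hypothetical operating point; values are illustrative, not from a datasheet.
RDS_ON = 0.05     # ohms, on-resistance (applies only when fully enhanced)
I_D = 2.0         # A, drain current
V_SUPPLY = 24.0   # V, rail feeding the pass transistor
V_LOAD = 6.0      # V, what the load needs; the transistor drops the rest

# Left-hand (fully on) region: the drop really is Id * Rds(on).
v_on = I_D * RDS_ON
# Flat (linear, constant-current) region: the circuit sets Vds, not Rds(on).
v_lin = V_SUPPLY - V_LOAD

print(f"fully on: Vds = {v_on:.2f} V, P = {v_on * I_D:.2f} W")
print(f"linear:   Vds = {v_lin:.2f} V, P = {v_lin * I_D:.2f} W")
print(f"Vds(linear) / (Id*Rds(on)) = {v_lin / v_on:.0f}x")
```

At the same current, the linear-mode dissipation here (36 W) is about 180 times what Rds(on) alone would suggest (0.2 W).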
Hot-spotting starts like this:
- The die is dissipating heat. The die has finite area, so points toward the center are surrounded on all sides by heat-generating silicon, while points along the edges are partly bordered by material that isn't generating heat. The edges therefore run cooler than the center, simply because less heat is being generated around them. This very broad hot spot (warm spot, really) is inevitable, and normal.
- So the center heats up, so its Vgs(th) drops, so its current density rises, and it warms up a little more. At low power density this exaggerates the hot spot a little, but it still stays fairly broad. Manufacturing variation will also drive some local areas slightly hotter (e.g., points in the silicon with slightly more doping or diffusion or whatever).
- Beyond a certain critical power density, the temperature begins to run away: Vgs(th) keeps dropping, local temperature keeps rising, and, seeded by manufacturing variation, the spot shrinks while its center gets ever hotter. Eventually that small spot hogs all the current of the transistor (the surrounding areas actually cool down during the short moments this takes, because their Vgs(th) is now several volts above that of the spot!) and becomes so hot that two things happen: one, the silicon becomes completely resistive (intrinsic), so all transistor action ceases; two, dopant diffusion sets in, scrambling the junction (permanent damage).
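The positive-feedback loop in the list above can be sketched in Python with a single lumped "cell". This is a minimal model under loudly hypothetical assumptions: square-law channel current, a linear negative Vgs(th) tempco of magnitude A, one thermal resistance RTH, and no mobility reduction with temperature (which real parts do have). Because it is a single lump, it shows the runaway threshold but not the spot shrinking. All parameter values are made up.

```python
# Minimal lumped electrothermal sketch of hot-spot runaway.
# Assumptions (all hypothetical, not from any datasheet):
#   - square-law channel:   I = K * (Vgs - Vth(T))**2
#   - linear Vth tempco:    Vth(T) = VTH0 - A * dT   (A > 0: negative tempco)
#   - one thermal resistance to ambient: dT = RTH * Vds * I
#   - mobility reduction with temperature is ignored
K = 0.5          # A/V^2
VGS = 4.0        # V
VTH0 = 3.0       # V at ambient
A = 0.005        # V/K, magnitude of the (negative) Vth tempco
RTH = 2.0        # K/W
DT_FAIL = 300.0  # K above ambient treated as "destroyed" (arbitrary cutoff)


def drain_current(dT: float) -> float:
    """Channel current at a temperature rise dT above ambient."""
    vov = VGS - (VTH0 - A * dT)
    return K * vov * vov if vov > 0 else 0.0


def settle(vds: float, max_iter: int = 10_000, tol: float = 1e-9):
    """Iterate heat -> lower Vth -> more current -> more heat.

    Returns the settled temperature rise in K, or None if it runs
    away past DT_FAIL.
    """
    dT = 0.0
    for _ in range(max_iter):
        new_dT = RTH * vds * drain_current(dT)
        if new_dT > DT_FAIL:
            return None          # thermal runaway
        if abs(new_dT - dT) < tol:
            return new_dT        # stable operating point
        dT = new_dT
    return None


def critical_vds() -> float:
    """In this model a stable point exists only while 4*A*Vov0*RTH*K*Vds <= 1."""
    vov0 = VGS - VTH0
    return 1.0 / (4.0 * A * vov0 * RTH * K)


for vds in (20.0, 40.0, 60.0):
    dT = settle(vds)
    status = "runaway" if dT is None else f"settles at +{dT:.1f} K"
    print(f"Vds = {vds:4.0f} V: {status}")
print(f"critical Vds for this model: {critical_vds():.0f} V")
```

With these numbers the model settles at 20 V and 40 V but runs away at 60 V, and `critical_vds()` returns 50 V. The 60 V case starts out dissipating only 30 W (60 V × 0.5 A, cold), less than the roughly 38 W the 40 V case settles at, yet it is the one that runs away. In this model the threshold is set by the loop gain, not by dissipation alone.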
If power is limited at this point, you may get a silent failure, where the transistor becomes a two- or three-way short circuit but does not explode. If not, the fault current delivered by the circuit causes the plastic above the die hot spot to decompose and bubble up, and the silicon or bond wires to melt, vaporize, and ionize (arc). After a few milliseconds the pressure is large enough to blow a hole through the plastic case, crack it in half, or blast out a flaming jet, etc. Either way, the magic smoke finds its way out. During this time, you can count on the voltages on all three pins being near each other in some way or another, so... if you weren't expecting full drain voltage to ever appear on the gate terminal... guess which part of your surrounding circuit just exploded?
The difference is not that some devices are intrinsically free of runaway: runaway is a fundamental aspect of both BJTs and MOSFETs. The difference lies in that critical power density. In the old days of lateral MOS, the cell density was so low that it couldn't be reached: drain current just wheezed if you tried to short it out, drain voltage was limited by the process (it was hard to make reliable transistors over, say, 100V or 200V*), and P = V*I spread over that wide (lateral!) area meant... the transistors literally just sucked too much to blow up!
Modern devices push current density as far as possible, at the expense of everything else, and especially in high-voltage devices there is more than enough opportunity to exceed that limit and destroy the transistor while staying well below the nameplate dissipation rating. It used to be said that only BJTs exhibit this 'second breakdown' phenomenon, but that was because only BJTs had the power density to make it a problem; nowadays, MOSFETs are more than powerful enough to be a danger to themselves, and they exhibit second breakdown too.
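A back-of-envelope version of that argument uses hypothetical assumptions: a square-law channel with transconductance parameter K, Vgs(th) falling linearly with temperature at rate α, one lumped thermal resistance R_θ, and mobility's temperature dependence ignored. Under those assumptions the critical dissipation falls with both drain voltage and K. G is the loop gain from a small temperature rise, through the extra current it causes, back to temperature.

```latex
\begin{aligned}
I_D &= K\,(V_{GS}-V_{th})^2, \qquad V_{th} = V_{th0} - \alpha\,\Delta T, \qquad \Delta T = R_\theta\, V_{DS}\, I_D \\
g_m &= \frac{\partial I_D}{\partial V_{GS}} = 2K\,(V_{GS}-V_{th}) = 2\sqrt{K I_D} \\
G &= \underbrace{R_\theta V_{DS}}_{\partial \Delta T/\partial I_D}\;\underbrace{\alpha\, g_m}_{\partial I_D/\partial \Delta T}
   = 2\alpha R_\theta \sqrt{K\,P\,V_{DS}}, \qquad P = V_{DS}\, I_D \\
G \ge 1 \;&\Longrightarrow\; P \ge P_{crit} = \frac{1}{4\,\alpha^2 R_\theta^2\, K\, V_{DS}}
\end{aligned}
```

Read loosely: at a fixed dissipation the loop gain grows as the square root of V_DS, so a wattage that is safe at low voltage can run away at high voltage. A denser, higher-K device (or a small local patch of one, with its own local K and R_θ) also reaches the limit sooner. Real safe-operating-area limits come from the manufacturer's measurements, not from this formula.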
*Say, are there any old-timers here who might remember when sales literature / databooks / magazine ads started proclaiming high-voltage BJTs or MOSFETs? I'm guessing it was during the 60s and 70s. I don't know when higher and higher voltage devices were introduced; it would be interesting to get some history on that.
In short: the phenomenon is exactly why you can't go paralleling transistors willy-nilly in linear operation (and often the same is true of switched operation as well). Internally, a transistor is just made up of millions of tiny transistors all in parallel. They're better matched and better thermally coupled (silicon is a good thermal conductor) than loose, random devices, but that doesn't prevent runaway; it just means it takes more power density to get there. Transistors designed to tolerate this (usually by designing in the same feature that helps you parallel discrete transistors, namely emitter / source degeneration) perform relatively poorly under saturated conditions, but handle the linear range very well.
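The paralleling argument, including the benefit of thermal coupling and source degeneration, can be illustrated with a two-cell Python sketch. This is a toy under hypothetical assumptions: square-law cells, a linear negative Vgs(th) tempco, lumped self and mutual thermal resistances, made-up numbers, and mobility ignored. The 50 mV threshold mismatch stands in for manufacturing variation, and `current_ratio` and `r_mutual` are illustrative names only.

```python
import math

# Two cells (or two discrete MOSFETs) in parallel, sharing Vgs and Vds.
# Every number here is hypothetical and chosen only to show trends.
K = 0.25               # A/V^2 per cell, square-law channel
VGS = 4.0              # V, common gate drive
VTH0 = (3.00, 2.95)    # V at ambient; cell 2 has a 50 mV lower threshold
A = 0.005              # V/K, magnitude of the negative Vth tempco
VDS = 20.0             # V, common drain-source voltage
R_SELF = 4.0           # K/W, a cell's temperature rise per watt it dissipates


def cell_current(vov: float, rs: float) -> float:
    """Solve I = K*(vov - I*rs)**2: square law with a source resistor rs."""
    if vov <= 0:
        return 0.0
    if rs == 0:
        return K * vov * vov
    # Quadratic K*rs^2*I^2 - (2*K*rs*vov + 1)*I + K*vov^2 = 0; the smaller
    # root is the physical one (it keeps vov - I*rs >= 0).
    b = 2 * K * rs * vov + 1
    return (b - math.sqrt(1 + 4 * K * rs * vov)) / (2 * K * rs * rs)


def current_ratio(rs: float, r_mutual: float, iters: int = 500) -> float:
    """Iterate heat -> Vth -> current to a steady state; return I2 / I1.

    r_mutual (K/W) is how much one cell is heated by the other's power:
    larger for cells on one die, near zero for loose discretes. There is
    no runaway guard: these parameters sit well below the threshold.
    """
    dT = [0.0, 0.0]
    i = [0.0, 0.0]
    for _ in range(iters):
        i = [cell_current(VGS - (VTH0[n] - A * dT[n]), rs) for n in range(2)]
        p = [VDS * x for x in i]
        dT = [R_SELF * p[0] + r_mutual * p[1],
              R_SELF * p[1] + r_mutual * p[0]]
    return i[1] / i[0]


loose = current_ratio(rs=0.0, r_mutual=0.0)     # separate, uncoupled parts
coupled = current_ratio(rs=0.0, r_mutual=1.0)   # thermally coupled
degen = current_ratio(rs=0.5, r_mutual=1.0)     # coupled + source degeneration
print(f"I2/I1 loose: {loose:.3f}, coupled: {coupled:.3f}, degenerated: {degen:.3f}")
```

With these parameters the lower-threshold cell always carries more current. The imbalance shrinks when the two cells are thermally coupled, and shrinks further with 0.5 Ω of source degeneration per cell. The price is the extra drop across that resistor when the device is driven fully on.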
Both BJTs and MOSFETs are currently available in useful sizes for linear use: BJTs up to, say, 10-20A and 200-400V, and MOSFETs up to similar currents and 1200V or maybe more (with some derating at the highest voltages, I think).
Tim