Funny, because I have a 700VDC/1600A locomotive DC drive that says otherwise. I used film capacitors for both bypass and reservoir duty on the input, a laminated bus structure to minimize the inductance of the capacitor-switch-diode loop and even without any snubbering/damping of the switches or freewheeling diodes the output waveform is totally free of ringing with a max overshoot during switch turn-off of about 40V at 1000A output. PWM frequency is only 2kHz but turn-on time is in the 250ns range, with turn-off time a much more leisurely ~1us (both times are about as fast as the 1200V NPT IGBTs can manage).
Heroic efforts ("laminated bus structure"), slow switching (>250ns), low operating frequencies (~2kHz), and huge filter inductors (of course, implicit in the motor in this case, so not a waste of space, which is nice!) basically prove my point.

40V overshoot means you had on the order of (40V) / (1000A/0.25us) = 0.01uH (10nH), which would be quite good for modules. About as low as you can get, and indeed, a laminated bus will only add maybe 3nH, so this isn't unreasonable.
The real value is probably more like 20-30nH, because the switching doesn't occur instantly (dI/dt isn't instantly there and gone, but itself ramps up and down), and because capacitances, Miller effect and such slow down dV/dt and add damping. This is absolutely typical of modules, so the example is consistent.
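
As a rough sketch of the estimate above (not a measurement method), the stray inductance follows from L = V / (dI/dt), assuming the current ramps linearly over the switching time. The function name is hypothetical; the numbers are the ones quoted in this thread.

```python
def stray_inductance(v_overshoot, delta_i, t_switch):
    """Estimate loop inductance in henries from L = V / (dI/dt).

    Assumes a linear current ramp of delta_i amps over t_switch seconds,
    which is a simplification: real dI/dt ramps up and down.
    """
    di_dt = delta_i / t_switch
    return v_overshoot / di_dt


# 40 V overshoot, 1000 A, using the 0.25 us figure from the estimate above
l_fast = stray_inductance(40.0, 1000.0, 0.25e-6)
# Same overshoot, but using the ~1 us turn-off time quoted for the IGBTs
l_slow = stray_inductance(40.0, 1000.0, 1.0e-6)

print(f"{l_fast * 1e9:.1f} nH")
print(f"{l_slow * 1e9:.1f} nH")
```

The two results bracket the 20-30nH guess: the answer depends heavily on which switching time is taken as the effective current ramp.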

The junction capacitance will be on the order of 5nF, and with a 20nH loop inductance, you have a 1/2 wave commutation period of 31ns, well under the risetime of the transistors used.
The same setup would be okay for switching at >2 ohms (e.g., 700VDC and 350A) with faster transistors (assuming no added capacitance) and a higher switching frequency, but would probably need to be derated a bit in the process. (We are talking dozens of kW, so a small hit in efficiency is a big deal.)
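
A minimal sketch of where the 31ns and 2 ohm figures come from, assuming the loop behaves as a simple lossless LC circuit (the function name is made up for illustration):

```python
import math


def loop_resonance(l_stray, c_junction):
    """Return (characteristic impedance in ohms, half-period in seconds)
    for an ideal LC loop. Losses and nonlinear capacitance are ignored."""
    z0 = math.sqrt(l_stray / c_junction)
    t_half = math.pi * math.sqrt(l_stray * c_junction)
    return z0, t_half


# 20 nH loop inductance and 5 nF junction capacitance, as assumed above
z0, t_half = loop_resonance(20e-9, 5e-9)
print(f"Z0 = {z0:.2f} ohm, half period = {t_half * 1e9:.1f} ns")
```

With these values the characteristic impedance matches the 2 ohm (700VDC / 350A) operating point, and the half period is several times shorter than the 250ns turn-on time.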
Those modules are many times larger than discrete parts, so it should be no surprise that they can't be very fast. Indeed, even 1200V IGBTs can be made faster: e.g., IXYH82N120C3 claims 93ns t_f. But it would be foolish to release a module that can't physically handle its own performance, let alone whatever the customer might bolt onto it. (Which is to say, a laminated bus structure isn't at all necessary: parallel solid bars will do almost as well, and are cheaper to design and fab in small quantity.)
For motive power applications, it hardly matters. There are many exceptions to my previous post: if you are space constrained, but not inductor constrained, that's one right there. If you're not space constrained, that's another. If you're not efficiency constrained, that's still another (RF amplifiers, usually in the 50-80% eff. range, being the realization of this).
I'm not concerned about things that are unconstrained. That's boring and uninteresting!
I've mainly been using Semikron modules the last few years (which mainly use Infineon dice) but Fuji's modules are pretty good, too. Powerex modules (which use Mitsubishi dice) are pretty slow and tend to have a higher forward drop, relatively speaking, but I totally agree that Microsemi's modules are terrible.
Yeah, Semikron, don't know why that didn't come to mind yesterday. Also Eupec (now Infineon).
It is always possible to minimize stray inductance
My point is that it's not, and using one example (which fails to meet the criteria for a tightly constrained design) isn't disproof at all.

Take a look at GaN FETs, for instance. They have a tiny fraction of the capacitance that Si MOSFETs do, and switch in fractional nanoseconds if you want to! It is physically impossible to place a ~5mm DFN style package, on a PCB of commodity tolerances and stackup, and to get a loop inductance low enough to implement a low impedance (say, <30V, >30A) inverter. The board itself is too thick, and component packages are too long!
I've seen many appnotes, even just for silicon transistors, where they try to do the same old thing, try and try again, and fail miserably. I like to use this one as a negative example:
http://www.ti.com/lit/an/slpa010/slpa010.pdf
See how the "optimized" layout (waveform Fig. 11) accomplishes nothing? Or how damping (Fig. 18) deals with the ringing, but not the overshoot?
The author (who I'm sure was just a summer intern, so...) didn't realize that, if he had instead increased the loop inductance, and clamped its flyback voltage with a diode, then both the peak and the ringing would be controlled, with little impact on efficiency (or even gaining benefit).
Other absurdities abound; I have an LTC3810 demo board that makes <5ns pulses, nearly the full height of the supply. The common mode noise extends pretty much everywhere on the board.
Dev kits might also be designed by interns, but I'd expect more from FAEs; the best ours could offer was "put a ferrite bead on it?".
These are the reasons I am still in business, and quite well at that.

Welp, now you've totally gone off the rails on a crazy train, to paraphrase Ozzy Osbourne. No one minimizes loop inductance because a few volt-microseconds might be lost; one does it to minimize the amount of energy stored in an inductance that for certain portions of the switching cycle will be *unclamped*, and thereby reduce overshoot and/or ringing when the switch (or fwd) turns off (the latter only being an issue in discontinuous current mode operation). Indeed, the sole reason for needing to put a snubber across the switch in the first place is because of stray inductance in the input capacitor-switch-fwd loop!
See, this is the misunderstanding that so many people face:
The loop inductance is an integral part of the switching circuit.
There are two regions you can design to: "minimize" or "optimize".
Ignoring either results in poor efficiency, or outright destruction.
"Minimize" works much the same as an RLC circuit. Consider the waveform of discharging a capacitor.
If it's electrolytic, and the lead length is short, there won't be any ringing (ESR >> sqrt(L/C)). The current rises quickly to V/R (but not instantaneously, because L is nonetheless present), then discharges along an RC exponential decay.
If it's a film or ceramic cap, it's very likely that the component body length and terminals contain enough stray inductance that sqrt(L/C) > ESR, and it will ring down.
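
To put numbers on the two capacitor cases, here is a sketch using the textbook series-RLC damping ratio, zeta = R / (2*sqrt(L/C)). The factor of 2 is the exact criterion; the comparison above drops it as an order-of-magnitude argument. The component values below are hypothetical, chosen only to illustrate each regime.

```python
import math


def damping_ratio(esr, l_stray, c):
    """Damping ratio of a series RLC circuit: zeta = R / (2*sqrt(L/C))."""
    return esr / (2.0 * math.sqrt(l_stray / c))


def describe(zeta):
    """Classify the discharge waveform from its damping ratio."""
    if zeta > 1.0:
        return "overdamped (no ringing)"
    if zeta == 1.0:
        return "critically damped"
    return "underdamped (rings down)"


# Hypothetical electrolytic: 1000 uF, 50 mohm ESR, 20 nH of lead inductance
zeta_elec = damping_ratio(0.050, 20e-9, 1000e-6)
# Hypothetical film cap: 10 uF, 5 mohm ESR, same 20 nH
zeta_film = damping_ratio(0.005, 20e-9, 10e-6)

print(f"electrolytic: zeta = {zeta_elec:.2f}, {describe(zeta_elec)}")
print(f"film:         zeta = {zeta_film:.3f}, {describe(zeta_film)}")
```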
By the same token, if you have Zsw (= Vsupply / Ipk) much larger than sqrt(Lstray/Coss), and a switching time much longer than pi*sqrt(Lstray*Coss), you won't have significant overshoot or ringing; you won't be at maximum efficiency, but you won't have excess noise either. You might not be able to increase efficiency any further, due to available choice of Vce(sat) and t_r, t_f (which was true back in the days of BJT switching circuits, when those old ideas were widely circulated).
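
One way to check where a design sits relative to the "minimize" regime is to compute both margins as ratios. This is only a sketch: the function name is invented here, and "much larger" is left unquantified on purpose, since no threshold is given above.

```python
import math


def minimize_margins(v_supply, i_peak, t_switch, l_stray, c_oss):
    """Return (impedance margin, time margin).

    impedance margin = Zsw / sqrt(L/C), with Zsw = Vsupply / Ipk
    time margin      = t_switch / (pi * sqrt(L*C))
    Both should be well above 1 to stay in the "minimize" regime.
    """
    z_sw = v_supply / i_peak
    z0 = math.sqrt(l_stray / c_oss)
    t_half = math.pi * math.sqrt(l_stray * c_oss)
    return z_sw / z0, t_switch / t_half


# 700 V / 350 A with a 250 ns edge, 20 nH and 5 nF (numbers from this thread)
z_margin, t_margin = minimize_margins(700.0, 350.0, 250e-9, 20e-9, 5e-9)
print(f"impedance margin = {z_margin:.2f}, time margin = {t_margin:.2f}")
```

For these inputs the impedance margin comes out at about 1 while the time margin is close to 8, so the slow edge is doing most of the work.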
However, for an 'optimal' case, you should match these quantities, and since you are storing an important amount of energy in those reactive components each half-cycle, you must ferry that energy back and forth in a responsible manner. As you mention, quasi/resonant snubbers can do this, or you can use diodes to clamp it. Even simply burning the energy into a resistor yields advantage: a dumb RC snubber allows the transistor to turn off more completely at a lower voltage, without dissipating too much energy itself, resulting in slightly less total losses (about a 10% reduction in switching losses). See:
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=4158300&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4158300
If you want cleaner waveforms, dV/dt and dI/dt (RCD and RLD) snubbers can be used. They can also be designed to dump the excess energy into a supplementary supply rail, which can be "stirred" back into the main supply with a secondary converter. Or you can use quasi-resonant types, but these consume excess switching capacity (nearly doubling the peak current or voltage demands), so while they remain simple, they aren't always the best option.
Tim