I've thought about this for a completely different application: low-voltage FETs (sub-100 V), each switching hundreds of amps. Cooling was part of the discussion because better heatsinking allows for fewer devices. Keep in mind that this application would have used many modules and tens of thousands of these devices, with each module supplying about 10 kA at about 50 V for short pulses and perhaps up to 100 modules running in parallel.
One idea involves soldering them directly to a liquid-cooled copper bar. In this case, because it would be pulsed, the liquid cooling would mostly be used to cool the bar down between pulses (very low duty cycle). The bar would also double as the electrical connection, so obviously this won't work where the tab needs to be insulated.
To solder them, the heatsink bar would be heated uniformly to a temperature above the melting point of the solder (leaded in this case). It would be tinned, then all of the devices would be placed at once. The bar would then be cooled, hopefully before the devices are damaged. This would not be a trivial task, and it would require a lot of careful preparation and control over the process. It has never been tried, but it could conceivably work. Smaller heatsinks would be easier to solder to.
The project above is unlikely to come to fruition at this point, but the cooling concept could still work.
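As a rough illustration of the pulsed-cooling idea, here is a minimal Python sketch that treats the copper bar as a lumped thermal mass: the pulse heat goes into the bar, and the liquid only has to remove the average power between pulses. The bar dimensions, pulse loss, pulse length, and repetition period are all hypothetical values chosen for illustration, not figures from the project; the copper density and specific heat are approximate handbook values.

```python
# Lumped thermal-mass estimate for a pulsed, liquid-cooled copper bar.
# All bar and pulse numbers below are hypothetical placeholders.

COPPER_DENSITY = 8960.0        # kg/m^3, approximate
COPPER_SPECIFIC_HEAT = 385.0   # J/(kg*K), approximate


def bar_temperature_rise(heat_j, volume_m3):
    """Temperature rise of the bar if it absorbs heat_j with no cooling."""
    mass_kg = COPPER_DENSITY * volume_m3
    return heat_j / (mass_kg * COPPER_SPECIFIC_HEAT)


def average_power(heat_j, pulse_period_s):
    """Average power the liquid loop must remove between pulses."""
    return heat_j / pulse_period_s


# Hypothetical case: 2 kW of total FET loss for a 10 ms pulse, once per second,
# into a 300 mm x 40 mm x 10 mm bar.
pulse_heat_j = 2000.0 * 0.010
bar_volume_m3 = 0.300 * 0.040 * 0.010

rise_k = bar_temperature_rise(pulse_heat_j, bar_volume_m3)
avg_w = average_power(pulse_heat_j, 1.0)
print(f"bar rise per pulse: {rise_k:.3f} K, average power: {avg_w:.1f} W")
```

This only covers the bulk of the bar. During a short pulse the die and tab heat up much faster than the bar as a whole, so the device's transient thermal impedance from the datasheet still sets the real limit.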
My suggestion? Unless you have very good reason not to do so, just add some more devices. Minimizing thermal resistance is often a very tricky task. There are situations where it is necessary, but it's best to avoid needing to do so if possible.
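To show why adding devices is usually the easier route, here is a small Python sketch of conduction loss in paralleled FETs. It assumes ideal, equal current sharing and a fixed on-resistance, neither of which holds exactly in practice. The 10 kA module current is the figure mentioned above; the 1 mΩ on-resistance and the device counts are hypothetical.

```python
# Conduction loss versus device count for paralleled FETs.
# Assumes equal current sharing and constant R_DS(on); both are simplifications.


def per_device_loss(total_current_a, n_devices, rds_on_ohm):
    """I^2 * R loss in one FET when the current splits evenly."""
    device_current_a = total_current_a / n_devices
    return device_current_a ** 2 * rds_on_ohm


def total_loss(total_current_a, n_devices, rds_on_ohm):
    """Sum of the conduction losses of all paralleled FETs."""
    return n_devices * per_device_loss(total_current_a, n_devices, rds_on_ohm)


MODULE_CURRENT_A = 10_000.0  # per-module pulse current from the post
RDS_ON_OHM = 1.0e-3          # hypothetical on-resistance per device

for n in (25, 50, 100):
    p_dev = per_device_loss(MODULE_CURRENT_A, n, RDS_ON_OHM)
    p_tot = total_loss(MODULE_CURRENT_A, n, RDS_ON_OHM)
    print(f"{n:4d} devices: {p_dev:8.1f} W each, {p_tot:8.1f} W total")
```

Under these assumptions, doubling the device count cuts the loss in each device to a quarter and the total loss to half, which is why a few extra devices often do more than a heroic heatsink.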