Assuming the description is accurate to the events, it seems most likely a surge occurs when the fault clears, whether because the material itself blew out, or the breaker did first. In either case, some transient voltage could result, and it would be propelled by up to the fault current rating of the circuit -- which at 415V could be quite high indeed (100kA?). Presumably it's a lot less than that (either it's not actually giving a full-blown arc flash discharge, or you're a very understated kind of person..!). More likely the arc remains conductive down to some cutoff current (10s A?), and only then the transient goes off?
If the phenomenon works this way, then the inductance in play is mainly the wiring between pad transformer and switching location, and that will likely give a fairly short-duration transient, which a MOV would be capable of handling.
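To put a rough number on "short duration", here is a minimal sketch of the energy stored in the wiring inductance at the moment the arc lets go, E = 1/2 * L * I^2. Both input values are my own placeholders, not measurements: 10 uH for the run of wiring, and 30 A for the cutoff current (picked from the "10s A" guess above). Swap in real figures if you have them.

```python
def inductive_energy_joules(inductance_h, current_a):
    """Energy stored in an inductance carrying a given current: E = 0.5 * L * I^2."""
    return 0.5 * inductance_h * current_a ** 2

# ASSUMPTION: placeholder wiring inductance, not a measured value.
WIRING_INDUCTANCE_H = 10e-6
# ASSUMPTION: arc cutoff current, taken from the "10s A" guess in the post.
CUTOFF_CURRENT_A = 30.0

energy_j = inductive_energy_joules(WIRING_INDUCTANCE_H, CUTOFF_CURRENT_A)
print(f"Stored energy at cutoff: {energy_j * 1e3:.2f} mJ")
```

Whatever comes out should be compared against the single-pulse energy rating on the MOV's datasheet. Note the result scales with the square of the current, so if the arc actually interrupts at something nearer the fault current, the picture changes a lot.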
It might even be enough to add an SPD inline, near the bus bar connections say. That would avoid modifying much circuitry, and it uses approved equipment. (Install according to recommendations and code, of course. Maybe this isn't an appropriate location; maybe it'll be less effective if placed elsewhere, no idea.)
As for the relays, if they're dying from overvoltage, the easiest way to deal with that would indeed be a MOV, but considerable current could flow through that path. A series resistor (of wirewound and pulse-rated type) could drop negligible voltage under normal conditions. E.g. if the coils are 100VA, that's about half an ampere, and 5% drop is 12V, or 24 ohms (say 20 or 22 for standard/common values), and 12V * 0.5A = 6W, so a 10W+ resistor would suffice. Then even a small MOV will clamp transient voltages at the coil (dumping the difference across the resistor, hence the pulse rating). MOV size would then be determined by duration of surge/fault -- absorbed energy. If it's only arcing/inductive kick as speculated, probably even the smallest MOV would do (7mm 275VAC something or other?). If it's actually more of a cross-wiring fault (415V appears across the coil), there could be sustained overvoltage (10s, 100s ms?) which might want a larger MOV (how much larger isn't so obvious, though).
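For anyone wanting to rerun the resistor arithmetic with their own coil data, here is a minimal sketch. The 240 V coil voltage is an assumption inferred from "5% drop is 12V", and the 0.5 A is the rounded figure from the post, not a measured coil current. The function name is just illustrative.

```python
def series_resistor(coil_volts, coil_amps, drop_fraction):
    """Size a series resistor to drop a given fraction of the coil voltage
    at normal coil current. Returns (drop_volts, ohms, watts)."""
    drop_volts = drop_fraction * coil_volts
    ohms = drop_volts / coil_amps
    watts = drop_volts * coil_amps
    return drop_volts, ohms, watts

# ASSUMPTION: 240 V coil, inferred from "5% drop is 12V" in the post.
COIL_VOLTS = 240.0
# Rounded coil current from the post ("about half an ampere" for 100 VA).
COIL_AMPS = 0.5

drop_volts, ohms, watts = series_resistor(COIL_VOLTS, COIL_AMPS, 0.05)
print(f"Target: {drop_volts:.1f} V drop, {ohms:.1f} ohm, {watts:.1f} W")

# Same check for the nearest common value mentioned in the post.
R_STD = 22.0
std_drop = COIL_AMPS * R_STD        # volts dropped at normal coil current
std_watts = COIL_AMPS ** 2 * R_STD  # continuous dissipation in the resistor
print(f"With {R_STD:.0f} ohm: {std_drop:.1f} V drop, {std_watts:.1f} W")
```

This only covers steady-state dissipation. The pulse rating during a clamped transient is a separate datasheet check, since the resistor then sees the difference between the surge voltage and the MOV clamp voltage.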
It seems like, if it's sustained overvoltage, it would have to be seconds, minutes even; and that should be evident on autopsy (windings are burnt/melted). Hrm, well, careful; if the winding breaks down due to overvoltage, you'd have to go digging for the initial spot and determine (after the fact) if it was dielectric breakdown or overheating. Well, overheating should be the whole thing, whereas breakdown would spare (short circuit around) part of the coil. That might be easy enough to see..?
Also consider some small fast-acting fuses ahead of the relay coils. When the MOV clamps down on the voltage, the fuse blows before either the relay or the MOV gets damaged. Right now the MOVs are likely burning up and not doing the job, since they are trying to dissipate far more power than they were designed for. Adequate fusing could cut that down to just fractions of a second and hopefully spare both the relay and the MOV; put a new fuse in and you're up and running again.
A semiconductor fuse plus a beefy industrial surge protector is probably the best bet for protecting the relays. Having to replace a fuse still means having to open a cabinet, though, which might reduce the benefit versus a solution that would let you just reset the breaker and carry on.
Indeed, a semi fuse would be the fastest option; that can clear a fault in less than a line cycle (typically ~1ms at fault currents!). With a typical fuse, one waits a cycle or two for current to pass through zero, and then the fuse opens.
This has the advantage that it's not vulnerable to continuous breakdown, as my resistor+MOV suggestion is; the disadvantage is, if this is a regular event, fusing may be a nuisance.
(Mind, the resistors should be fused too; these aren't exclusive options! Thermal fuses could even be used to protect against MOV failure -- typically the MOV fails short, drawing fault current, in this case limited by the resistor but rapidly overheating it. A thermal fuse on the MOV, the resistor, or both I suppose, would be something to consider.)
Tim