Mind that filters can saturate, depending on the materials used.
CMCs can saturate in common mode, but the differential-mode (leakage) inductance is mostly air-cored, so that helps.
Y caps are typically class 2 ceramic (e.g. Y5P), and those saturate too (their capacitance drops with applied voltage), but only at fairly high voltages (low kV).
If both of these happen at once, you can end up with a significantly higher peak common-mode voltage than you were expecting. That can be a good reason to add spark gaps or GDTs across the CMC, or to ground, as is sometimes seen.
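To get a feel for how early a CMC can give up, here's a minimal sketch that integrates the volt-seconds of a 1.2/50 us surge and finds when they reach the choke's flux capacity, after which it offers little common-mode impedance. Everything here is assumed: the whole open-circuit surge appears across the winding (Y caps and load ignored, which is pessimistic), the choke is linear up to a hard saturation current so its flux capacity is L*I_sat, the values (10 mH, 0.5 A, 2 kV) are hypothetical, and the double-exponential time constants are a common textbook fit rather than anything quoted from the standard.

```python
import math

# Double-exponential fit to a 1.2/50 us surge (assumed textbook constants).
TAU_TAIL_US = 68.2
TAU_FRONT_US = 0.405


def peak_scale() -> float:
    """Factor that makes the double exponential peak at exactly 1.0."""
    a, b = 1 / TAU_TAIL_US, 1 / TAU_FRONT_US
    t_peak = math.log(b / a) / (b - a)
    return 1 / (math.exp(-a * t_peak) - math.exp(-b * t_peak))


def volt_seconds(t_us: float, v_peak: float) -> float:
    """Integral of the surge voltage from 0 to t_us, in volt-seconds."""
    area_us = (TAU_TAIL_US * (1 - math.exp(-t_us / TAU_TAIL_US))
               - TAU_FRONT_US * (1 - math.exp(-t_us / TAU_FRONT_US)))
    return peak_scale() * v_peak * area_us * 1e-6


def time_to_saturate_us(l_cm: float, i_sat: float, v_peak: float) -> float:
    """Time (us) at which the surge's volt-seconds reach L * I_sat.

    Assumes a hypothetical hard-knee core; returns math.inf if the whole
    surge never carries enough volt-seconds to saturate it.
    """
    flux_sat = l_cm * i_sat  # flux linkage at saturation, V*s
    if volt_seconds(1e4, v_peak) < flux_sat:
        return math.inf
    lo, hi = 0.0, 1e4
    for _ in range(100):  # volt_seconds is monotonic, so bisection works
        mid = (lo + hi) / 2
        if volt_seconds(mid, v_peak) < flux_sat:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical choke: 10 mH common-mode, hard saturation at 0.5 A, 2 kV surge.
t_sat = time_to_saturate_us(l_cm=10e-3, i_sat=0.5, v_peak=2000.0)
print(f"core saturates about {t_sat:.1f} us into the surge")
```

With numbers in that ballpark the core runs out of volt-seconds within the first few microseconds, long before the 50 us tail is over, so whatever is left of the surge lands on the Y caps.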
X-class film caps don't saturate, and they self-heal rather than failing short. That actually helps, since the cap keeps acting as a current sink in parallel with the MOV. Yay, at least until repeated self-healing has cleared away half its capacitance, which does happen over time!
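One way to put a number on the X cap's share: before the MOV conducts hard, the cap has to be charged from the mains peak up to roughly the clamp voltage, soaking up C*(Vclamp - Vpk) of charge and C*(Vclamp^2 - Vpk^2)/2 of energy. A minimal sketch, treating the MOV as an ideal hard clamp and using hypothetical values (470 nF X2, 325 V mains peak, 700 V clamp):

```python
def x_cap_absorption(c_farads: float, v_mains_peak: float, v_clamp: float) -> tuple[float, float]:
    """Charge (C) and energy (J) an X cap absorbs going from the mains peak
    up to the MOV clamp voltage, with the MOV treated as an ideal hard clamp."""
    charge = c_farads * (v_clamp - v_mains_peak)
    energy = 0.5 * c_farads * (v_clamp ** 2 - v_mains_peak ** 2)
    return charge, energy


# Hypothetical: a new 470 nF cap vs. one that has self-healed down to half.
for label, c in (("new", 470e-9), ("aged", 235e-9)):
    q, e = x_cap_absorption(c, v_mains_peak=325.0, v_clamp=700.0)
    print(f"{label}: {q * 1e6:.0f} uC, {e * 1e3:.0f} mJ")
```

Both quantities scale linearly with C, so a cap that has self-healed to half its value takes half the charge and half the energy off the MOV.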
There are a number of reasons the IEC 61000-4-5 surge shape was chosen. The mains network itself has a filtering effect, due to stray inductance (a power line's characteristic impedance is typically much higher than the load impedance it delivers power into, so it appears inductive), filter capacitance (typically PFC capacitor banks), lightning arrestors (stacks of MOVs), transformers (a ball of wire, with parasitic capacitance and leakage inductance), and other loads (with SMPSs being the norm, it's not uncommon for a few thousand uF to be clamping a household's mains line under transient conditions). Lightning itself is not terrifically fast, either: it may be a spark discharge, but due to its length, the peak current (which can exceed 100kA) is delivered at a fairly relaxed rate, over microseconds rather than nanoseconds.
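For a sense of how "relaxed" the standard waveform is, here's a minimal sketch that builds the 1.2/50 us open-circuit voltage part of the IEC 61000-4-5 combination wave as a double exponential and measures its front time T1 (1.67 times the 30 % to 90 % rise) and half-value time T2 (from the virtual origin to 50 % on the tail). The time constants are a common textbook fit, assumed here; the standard defines the waveform by T1 and T2, not by these constants.

```python
import math

# Double-exponential fit to the 1.2/50 us voltage surge (assumed textbook constants).
TAU_TAIL_US = 68.2
TAU_FRONT_US = 0.405


def shape(t_us: float) -> float:
    """Unnormalised surge voltage at time t_us (microseconds)."""
    return math.exp(-t_us / TAU_TAIL_US) - math.exp(-t_us / TAU_FRONT_US)


def peak_time_us() -> float:
    """Analytic peak of the double exponential (where its derivative is zero)."""
    a, b = 1 / TAU_TAIL_US, 1 / TAU_FRONT_US
    return math.log(b / a) / (b - a)


def crossing_us(level: float, lo: float, hi: float) -> float:
    """Bisect for where the peak-normalised waveform equals `level` in [lo, hi]."""
    peak = shape(peak_time_us())

    def f(t: float) -> float:
        return shape(t) / peak - level

    lo_negative = f(lo) < 0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (f(mid) < 0) == lo_negative:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def front_and_half_value_times_us() -> tuple[float, float]:
    """Front time T1 and half-value time T2 of the modelled surge, in us."""
    tp = peak_time_us()
    t30 = crossing_us(0.3, 0.0, tp)
    t90 = crossing_us(0.9, 0.0, tp)
    t50 = crossing_us(0.5, tp, 1000.0)
    t1 = 1.67 * (t90 - t30)           # front time from the 30 %-90 % rise
    origin = t30 - 0.5 * (t90 - t30)  # virtual origin: 30 %-90 % line extended to zero
    t2 = t50 - origin                 # half-value time, measured from the virtual origin
    return t1, t2


t1, t2 = front_and_half_value_times_us()
print(f"T1 = {t1:.2f} us, T2 = {t2:.1f} us")
```

With these constants T1 and T2 should land close to 1.2 us and 50 us: even the "fast" test surge takes about a microsecond to rise and tens of microseconds to decay.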
Overall, and especially in differential mode, a typical mains-input rectifier circuit (with its bulk capacitor) has a good filtering effect, which greatly helps the MOV. When the MOV is placed after the CMC, the increased source impedance (mainly the CMC's leakage inductance, LL, and DC resistance, DCR) drops a lot of the surge voltage before it reaches the MOV.
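A minimal sketch of that last point, with every value hypothetical: a 2 kV, 1.2/50 us double-exponential source behind 2 ohms (a crude stand-in for a combination wave generator, which really delivers 8/20 us into a short), a CMC reduced to just its leakage inductance and DCR in series, and a MOV modelled as a power law V = V_1mA * (I / 1 mA)^(1/alpha). It steps the circuit with backward Euler, solving each step by bisection since the MOV makes it nonlinear, and compares peak MOV current with and without the choke.

```python
import math

# Hypothetical surge source: double exponential (textbook 1.2/50 us fit) behind 2 ohms.
TAU_TAIL = 68.2e-6
TAU_FRONT = 0.405e-6
R_SOURCE = 2.0

# Hypothetical MOV power-law model: V = V_1MA * (I / 1 mA)^(1 / ALPHA).
V_1MA = 390.0
ALPHA = 25.0


def surge_voltage(t: float, v_peak: float) -> float:
    """Open-circuit surge voltage at time t (s), scaled to peak at v_peak."""
    a, b = 1 / TAU_TAIL, 1 / TAU_FRONT
    t_peak = math.log(b / a) / (b - a)
    scale = 1 / (math.exp(-a * t_peak) - math.exp(-b * t_peak))
    return v_peak * scale * (math.exp(-a * t) - math.exp(-b * t))


def mov_voltage(i: float) -> float:
    """MOV voltage for current i (A), odd-symmetric power law."""
    return math.copysign(V_1MA * (abs(i) / 1e-3) ** (1 / ALPHA), i)


def peak_mov_current(l_series: float, r_series: float, v_peak: float = 2000.0,
                     t_end: float = 100e-6, dt: float = 20e-9) -> float:
    """Backward-Euler simulation of source -> series L/R -> MOV.

    Each step solves L*(i - i_prev)/dt + i*R + V_mov(i) = v_s(t) for i by
    bisection; the left side increases monotonically with i, so the root is unique.
    """
    r_total = R_SOURCE + r_series
    i_prev, i_max = 0.0, 0.0
    for n in range(1, round(t_end / dt) + 1):
        v_s = surge_voltage(n * dt, v_peak)
        lo, hi = -1e5, 1e5
        for _ in range(60):
            mid = (lo + hi) / 2
            residual = (l_series * (mid - i_prev) / dt + mid * r_total
                        + mov_voltage(mid) - v_s)
            if residual < 0:
                lo = mid
            else:
                hi = mid
        i_prev = (lo + hi) / 2
        i_max = max(i_max, i_prev)
    return i_max


# Hypothetical CMC: 20 uH leakage inductance, 50 mohm DCR, vs. no choke at all.
bare = peak_mov_current(l_series=0.0, r_series=0.0)
behind_cmc = peak_mov_current(l_series=20e-6, r_series=0.05)
print(f"peak MOV current: {bare:.0f} A bare, {behind_cmc:.0f} A behind the CMC")
```

Behind the choke the peak MOV current should come out well below the bare case, because the leakage inductance limits di/dt and drops part of the surge voltage while the current builds. A real generator, choke, and MOV will give different numbers; the trend is the point.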
Tim