You can think about it in two ways, whichever suits you: as energy stored in the parasitic LC network, which then rings, or, in transmission line terms, as reflections.
Overshoot can be as much as 100% in the worst case, and can easily break things. The ringing is also bad for EMI.
It's not so much about impedance matching; more importantly, it's about edge rate control. Your fast edges carry a lot of (multi-)gigahertz content that you don't need for a slow 36 MHz fundamental clock. Filter that high-frequency content away (meaning: smooth the edges so they are less sharp). To do this, add series resistance, which forms an RC low-pass filter together with the receiver's parasitic input capacitance.
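To get a feel for the numbers, here is a rough sketch of that RC filter in Python. The 5 pF receiver capacitance and the list of resistor values are assumptions for illustration only (check the receiver's datasheet), and the model is a plain first-order RC that ignores the trace itself.

```python
import math

def rc_edge(r_ohms, c_farads):
    """Return (10-90% rise time in s, -3 dB corner in Hz) of a first-order RC low-pass."""
    tau = r_ohms * c_farads
    rise = math.log(9) * tau              # 10-90% rise time, about 2.2 * RC
    corner = 1.0 / (2 * math.pi * tau)    # -3 dB corner frequency
    return rise, corner

CLOCK_HZ = 36e6                # clock frequency from the question
C_IN = 5e-12                   # ASSUMED receiver input capacitance (5 pF)
half_period = 0.5 / CLOCK_HZ   # time available for one edge plus the flat part

for r in (33, 100, 200, 470, 1000):   # ASSUMED candidate series resistors
    rise, corner = rc_edge(r, C_IN)
    print(f"R={r:5d} ohm  rise={rise * 1e9:6.2f} ns  "
          f"corner={corner / 1e6:8.1f} MHz  "
          f"rise/half-period={rise / half_period:5.2f}")
```

As long as the rise time stays a small fraction of the half period (about 13.9 ns at 36 MHz), the clock itself is barely affected while the gigahertz content is attenuated.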
Or you can think of it as a transmission line thing. Your source sends the wave. The wave reflects at the high-impedance far end, travels back, and reflects again from the low-impedance source. And again, and again. In such point-to-point links with a single destination, series termination (a resistor at the source) is typically the way to go. With a resistor roughly equal to your trace impedance (strictly, resistor plus driver output impedance), only half the voltage is launched into the line; the reflection at the far end doubles it, so the voltage at the destination is exactly correct right at that moment, and the resistor eats the returning wave instead of letting it reflect again. Now, assuming an 8 mil trace on a 1.6 mm 2-layer board, your trace impedance is about 145 ohms.
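As a rough cross-check of that impedance figure, here is a small Python sketch using the common IPC-2141 closed-form microstrip approximation. The dielectric constant (4.5, typical FR-4), the 35 um copper thickness, and treating the full 1.6 mm as dielectric height are all assumptions; a field solver or your board house's calculator will give a better number.

```python
import math

def microstrip_z0(w_mm, h_mm, t_mm, er):
    """Approximate microstrip impedance in ohms (IPC-2141 closed-form formula).

    Reasonable for roughly 0.1 < w/h < 2.0; treat the result as an estimate only.
    """
    return 87.0 / math.sqrt(er + 1.41) * math.log(5.98 * h_mm / (0.8 * w_mm + t_mm))

W_MM = 8 * 0.0254   # 8 mil trace width, from the text
H_MM = 1.6          # ASSUMED: dielectric height equals the full board thickness
T_MM = 0.035        # ASSUMED: 1 oz (35 um) copper
ER = 4.5            # ASSUMED: typical FR-4 dielectric constant

z0 = microstrip_z0(W_MM, H_MM, T_MM, ER)
print(f"Estimated Z0: {z0:.0f} ohms")
```

With these assumptions the estimate lands within several percent of the 145 ohm figure, which is as close as this kind of formula gets. The exact value matters little here, since the resistor is tuned on the bench anyway.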
Put something in the 100-200 ohm range there and see what happens.
With too much resistance, your clock edges become so slow that the clock either stops working completely or becomes more susceptible to noise and jitter.
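The effect of too little, matched, and too much series resistance can be sketched with a simple bounce (lattice) diagram in Python. This is an idealized model under stated assumptions: a lossless line, an ideal 3.3 V step, a fully reflecting (open) receiver, and a made-up 20 ohm driver output impedance. The values are illustrative, not measured.

```python
def load_voltage_steps(v_step, r_source, z0, arrivals=6, gamma_load=1.0):
    """Voltage at the receiver after each arrival of the wave (ideal bounce diagram).

    r_source is the TOTAL source resistance: driver output impedance plus
    the external series resistor. gamma_load=1.0 models an open-circuit receiver.
    """
    gamma_src = (r_source - z0) / (r_source + z0)   # reflection coefficient at source
    wave = v_step * z0 / (r_source + z0)            # initial wave launched into the line
    v_load = 0.0
    steps = []
    for _ in range(arrivals):
        v_load += wave * (1 + gamma_load)   # incident wave plus its reflection at the load
        steps.append(v_load)
        wave *= gamma_load * gamma_src      # travels back and re-reflects at the source
    return steps

V_STEP = 3.3     # ASSUMED logic swing
Z0 = 145.0       # trace impedance estimate from the text
R_DRIVER = 20.0  # ASSUMED driver output impedance (hypothetical value)

for r_series in (0.0, Z0 - R_DRIVER, 470.0):
    steps = load_voltage_steps(V_STEP, R_DRIVER + r_series, Z0)
    print(f"Rs={r_series:6.1f} ohm:", " ".join(f"{v:5.2f}" for v in steps))
```

With no series resistor the receiver voltage overshoots and oscillates around the final value; with the total source resistance matched to the line it lands on the final value on the first arrival; with too much resistance it creeps up in a staircase, which is the slow edge described above.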
How long is your trace? I'm surprised you are seeing such a large issue if the board is very small. Maybe the MCO output drive is unnecessarily strong; I don't remember whether it can be adjusted (the way the regular IOs can, with the OSPEEDR register), but if not, the external resistor is the only option.