Thank you, megajocke and Kevin, for your replies; you've been really helpful. Sorry I haven't gotten back here sooner.
Ferrite beads were the key to the high frequency oscillation. The high frequency kicked in when I had four FET stages loaded but was absent with fewer in many configurations, so I'm sure Kevin's article is right on the money.
The RC op-amp feedback is definitely more effective than the cap alone, but I found I had to go a bit higher than 1nF.
There are two of us working on this, converging from different directions. The other engineer has something that is stable, but so far he has only been using two output FETs. He's got (smaller) resistors instead of the current source and hasn't changed the pull-down transistor configuration:

I don't really want to ship with only two output FETs. While the power dissipation would be okay, I'd prefer to spread it across four devices as the heatsinks will be more effective. I'd also like to understand and solve the problem so that when it comes to doing the 20A version we don't have to go around again.
I'm working through more of the suggestions in this thread. I've removed the diode following the op-amp and am running the op-amp from just the +12V supply. I've kept the resistor between the op-amp and transistor because it prevents the op-amp from going into current limit in unregulated conditions and helps prevent cascading component failures if, for example, Q2 fails and the bus voltage appears at its base.
I've reduced the drive current to about 25mA. This was because experience with the micro-U heatsinks I could fit tells me they're good for about 2W power dissipation. I've kept the emitter resistor on Q2 but made it smaller so the current to charge the FETs is the same as the current to discharge them.

Unfortunately this is still unstable. Under load, I get 10kHz oscillation. The frequency stays the same but the amplitude grows a little with load.

I think I'm getting close and just need to combine the two configurations. I'd also like to experiment more with the values of Rcomp and Ccomp.
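To keep track of that experimenting, here is a rough Python sketch of where the RC corner lands relative to the 10kHz oscillation. It assumes Rcomp and Ccomp are a simple series RC in the op-amp feedback, so the corner is at 1/(2*pi*Rcomp*Ccomp). The candidate values in the list are hypothetical placeholders, not the parts on the board; only the 1nF starting point and the 10kHz figure come from the measurements above.

```python
import math

def rc_corner_hz(r_ohms, c_farads):
    """Corner frequency of a series RC network: 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

F_OSC_HZ = 10e3  # oscillation frequency seen under load

# Hypothetical (Rcomp, Ccomp) pairs to try; not the actual board values.
candidates = [
    (10e3, 1.0e-9),
    (10e3, 2.2e-9),
    (10e3, 4.7e-9),
    (22e3, 4.7e-9),
]

for r, c in candidates:
    fc = rc_corner_hz(r, c)
    print(f"Rcomp={r / 1e3:.0f}k  Ccomp={c * 1e9:.1f}nF  "
          f"corner={fc / 1e3:.2f}kHz  corner/f_osc={fc / F_OSC_HZ:.2f}")
```

This only shows where the corner sits; it says nothing about the rest of the loop, so it is a bookkeeping aid rather than a stability prediction.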
My main remaining question is: how would I hook a generator and scope up to see the phase and gain results? Do I perturb the reference or the feedback path? Or something else? Is there a way of using a spectrum analyser with tracking gen to sweep through and give me some more insight? (Unfortunately I don't have access to a vector analyser, so I appreciate I could only see the gain that way)
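On the scope side, whatever the right injection point turns out to be, I assume gain and phase at one injected frequency could be pulled out of two captured channels with something like the sketch below. It assumes both channels are exported as equally spaced samples covering a whole number of cycles of the injected tone. The function names and the synthetic test signals are made up for illustration.

```python
import cmath
import math

def phasor(samples, freq_hz, sample_rate_hz):
    """Complex amplitude of the freq_hz component (single-bin DFT).

    Assumes the capture spans a whole number of cycles of freq_hz.
    """
    n = len(samples)
    acc = 0j
    for k, s in enumerate(samples):
        acc += s * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
    return 2.0 * acc / n

def gain_phase(ch_in, ch_out, freq_hz, sample_rate_hz):
    """Return (gain in dB, phase in degrees) of ch_out relative to ch_in."""
    h = phasor(ch_out, freq_hz, sample_rate_hz) / phasor(ch_in, freq_hz, sample_rate_hz)
    return 20.0 * math.log10(abs(h)), math.degrees(cmath.phase(h))

# Synthetic check: output is half the amplitude and lags by 45 degrees.
FS = 100e3   # hypothetical sample rate
F = 1e3      # hypothetical injected frequency
N = 1000     # 10 whole cycles at these settings
t = [k / FS for k in range(N)]
ch_a = [math.sin(2 * math.pi * F * x) for x in t]
ch_b = [0.5 * math.sin(2 * math.pi * F * x - math.pi / 4) for x in t]

gain_db, phase_deg = gain_phase(ch_a, ch_b, F, FS)
print(f"gain={gain_db:.2f} dB  phase={phase_deg:.1f} deg")
```

Stepping the generator through a list of frequencies and repeating this at each one would build up a crude Bode plot point by point.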
