Don't leave current protection as an afterthought -- and don't just expect the circuit to behave itself, because most likely it won't! Provide limits for it, right away, so that everything is bounded to only the range of values it needs to cover. Then, determine the amount of gain required to transform from one bounded range (namely, the op-amp output) to another (the driver/follower).
So, if your op-amp has a 10V range, you need T1-T2 to amplify that to 0-40V, say. T1 can cover the full range, and if it does 0-1mA over that range, R3 should be 10k. (As shown, it's got a gain of 15, which is *way* more than the total gain needed here, so you're already inviting oscillation with that!) Now V and I are bounded, so there's no worry about limiting anything!
T2 is tricky. It has to have voltage gain, because of two things:
1. You can't afford much emitter degeneration, because the drop across the emitter resistor adds directly to the saturation (dropout) voltage. That's why R4 is small.
2. It has to drive T3, and not much else. The divider R5-R6-R7 is high resistance, and bootstrapped by T3; any additional load at Vo will directly affect voltage gain.
So we should probably not design this stage as a voltage gain stage, anyway. It's a current source.
Supposing we want, say, 3A output current, and T3's min hFE is 20, then we need max 150mA from T2 (which, by the way, will have to dissipate up to 6W, and T3 up to 120W, so you'd better use beefy devices with lots of heatsinking!). If we want max 1V dropout from T2, then we have to use R1 = (1.7V) / (1mA) = 1.7k (so T1 actually has <1 voltage gain, but that's fine, it's only doing level shifting), and R4 = (1V) / (0.15A) = 6.7 ohms. (4.7 isn't far off, and 6.8 would be the closest standard value.)
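For anyone who wants to rerun these numbers with their own parts, here's a rough Python sketch of the same arithmetic. The variable names and the nearest_e12 helper are my own invention for illustration, the 40V worst-case drop is taken from the 0-40V range above, and the 0.7V Vbe is the assumption implied by the 1.7V figure.

```python
import math

# Figures from the text; V_BE = 0.7 V is an assumption implied by "1.7V".
I_OUT_MAX = 3.0      # A, desired output current
HFE_T3_MIN = 20      # worst-case current gain of T3
V_DROP_MAX = 40.0    # V, worst-case voltage across T2 / T3
I_T1_MAX = 1e-3      # A, full-scale T1 collector current
V_R4_MAX = 1.0       # V, drop allowed across R4 at full current
V_BE = 0.7           # V, assumed base-emitter drop of T2

E12 = (1.0, 1.2, 1.5, 1.8, 2.2, 2.7, 3.3, 3.9, 4.7, 5.6, 6.8, 8.2)

def nearest_e12(value):
    """Return the E12 resistor value closest to `value` on a log scale."""
    decade = 10.0 ** math.floor(math.log10(value))
    # Include the next decade so values just under 10x round up correctly.
    candidates = [m * decade * k for k in (1.0, 10.0) for m in E12]
    return min(candidates, key=lambda c: abs(math.log(c / value)))

i_t2_max = I_OUT_MAX / HFE_T3_MIN       # base drive T3 needs from T2
p_t2_max = i_t2_max * V_DROP_MAX        # worst-case dissipation in T2
p_t3_max = I_OUT_MAX * V_DROP_MAX       # worst-case dissipation in T3
r1 = (V_R4_MAX + V_BE) / I_T1_MAX       # R1 develops Vbe + V(R4) at full drive
r4 = V_R4_MAX / i_t2_max                # R4 drops 1 V at full T2 current

print(f"T2 current: {i_t2_max * 1e3:.0f} mA")
print(f"T2 dissipation: {p_t2_max:.1f} W, T3 dissipation: {p_t3_max:.0f} W")
print(f"R1 = {r1:.0f} ohms (nearest E12: {nearest_e12(r1):.0f})")
print(f"R4 = {r4:.2f} ohms (nearest E12: {nearest_e12(r4):.1f})")
```

This ignores T2's base current flowing alongside R1, same as the hand calculation does, so treat the results as starting points rather than final values.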
So that looks pretty reasonable. Now for compensation. We need to get the loop voltage gain under 1 by the op-amp's GBW (and probably sooner than that, due to phase shifts). Cx3 is probably the best candidate, because it's where the voltage gain is happening, and by applying NFB here, we can somewhat turn the CCS into a CVS, which is good for Zo, and for keeping gain stable against load variations.
If the loop gain should be 4, and the divider is 1/4, then we should have unity gain around GBW, let's say 3MHz if it's something like an LM358. T1 drives 1mA/10V of transconductance into Cx3 (which we're assuming is dominant at this frequency, so we can ignore R1, and assume T2 is simply following along as an integrator due to Cx3), so we'll have a voltage gain of 4 when X_Cx3 = 4 / (1mA/10V) = 40k, which at 3MHz works out to about 1.3pF. (Which isn't much, so we might increase T1's transconductance. Which should probably be done anyway, because 1mA max assumes T2's min hFE is 150, which is plausible but kind of optimistic.)
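Here's the same estimate as a small Python sketch, in case you want to try other transconductances or crossover frequencies. The function name cx3_for_gain is made up for illustration, and it keeps the same simplification as above: Cx3 is assumed to dominate, so the stage gain is just gm times the capacitor's reactance.

```python
import math

F_CROSS = 3e6      # Hz, assumed op-amp GBW (LM358-ish figure from the text)
STAGE_GAIN = 4.0   # voltage gain wanted from the T1-T2 stage at F_CROSS

def cx3_for_gain(gm, gain=STAGE_GAIN, f=F_CROSS):
    """Capacitance whose reactance at f gives |gain| = gm * X_C."""
    x_c = gain / gm                      # required reactance, ohms
    return 1.0 / (2.0 * math.pi * f * x_c)

gm_r3_10k = 1e-3 / 10.0    # 1 mA over 10 V, i.e. R3 = 10k
gm_r3_1k = 10e-3 / 10.0    # 10 mA over 10 V, i.e. R3 = 1k

print(f"R3 = 10k: Cx3 = {cx3_for_gain(gm_r3_10k) * 1e12:.1f} pF")
print(f"R3 = 1k:  Cx3 = {cx3_for_gain(gm_r3_1k) * 1e12:.1f} pF")

# Minimum T2 hFE implied by a given full-scale T1 current, if T2 must
# deliver 150 mA and all of T1's current went into T2's base.
hfe_t2_needed = 0.15 / 1e-3
print(f"T2 hFE needed at 1 mA drive: {hfe_t2_needed:.0f}")
```

Raising the transconductance by 10x raises the capacitor by 10x too, which gets it out of the stray-capacitance range and into something you can actually buy and place.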
To help with phase margin, some Cx1 can be placed across the op-amp, but with a resistor in series, of value comparable to R6 || R7 (give or take how much gain/phase margin is actually needed; takes some tweaking to decide).
So the full changes should be:
R3 --> 1k (let's say)
R1 --> 220 ohms (optionally, with a diode in series to get a more linear transfer function on T2)
R4 --> 6.8 ohms
Cx3 --> 22pF (for starters?)
Cx1 --> 10k + 100pF?
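As a quick sanity check on those values, here's a Python sketch under the same simplifications as before (0.7V Vbe assumed, T2 base current ignored next to R1, T1 current taken as simply V/R3). The variable names are mine, not anything from the schematic.

```python
import math

V_OPAMP_MAX = 10.0   # V, op-amp output range
R3 = 1e3             # ohms
R1 = 220.0           # ohms
R4 = 6.8             # ohms
I_T2_MAX = 0.15      # A, full drive into T3
V_BE = 0.7           # V, assumed base-emitter drop of T2
R_CX1 = 10e3         # ohms, series resistor for Cx1
C_CX1 = 100e-12      # F

i_t1_max = V_OPAMP_MAX / R3                # full-scale T1 current
v_r1_needed = V_BE + I_T2_MAX * R4         # R1 voltage for full T2 current
i_t1_full_drive = v_r1_needed / R1         # T1 current that achieves it
hfe_t2_needed = I_T2_MAX / i_t1_max        # lower bound on T2's hFE
f_cx1 = 1.0 / (2.0 * math.pi * R_CX1 * C_CX1)   # RC corner of Cx1 network

print(f"T1 full-scale current: {i_t1_max * 1e3:.1f} mA")
print(f"T1 current for full T2 drive: {i_t1_full_drive * 1e3:.2f} mA")
print(f"T2 hFE needed: {hfe_t2_needed:.0f}")
print(f"Cx1 RC corner: {f_cx1 / 1e3:.0f} kHz")
```

Full drive is reached a bit before the op-amp runs out of range, which leaves some margin for the base current that this estimate ignores.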
If you also want to try a Cx5, put a resistor in series with it, so it doesn't dominate over Cx3. This gives you a lead-lag compensation, where the two RC networks can have different time constants, kind of twisting/shearing the Bode plot, allowing for more phase margin, or compensating much more ornery loads (as is common for voltage-mode SMPS, where the two-pole (LC) output filter is fully in the loop).
Tim