Yes, the problem before was where you were putting the feedback cap: across R6. It's not a very good place for it, because you're taking the feedback from the output of the second op-amp in the loop, and that output is also phase-lagged, so the cap won't be very effective. I didn't take the second op-amp into account in my previous reply.
Here's how I would get it stable: put the compensation cap from the output of U4 (not the output of U2 as before) to the - input of U4, so it's in parallel with your D1. (D1 also has some capacitance of its own, about 4 pF max, but this will reduce as its reverse voltage increases.)
This cap (Cf), combined with R2 and R6, rolls off U4's bandwidth. The larger Cf is, the slower your op-amp output rises, but the greater your phase margin (up to a point) and so the better the stability; you trade speed off against phase margin. For this op-amp at a gain of -1, the crossover frequency (where loop gain = 1) will be approximately f = 1 / (2π × Cf × 2 × R2). So reducing R2 and R6 from your original 10k to 1k lets you use a larger Cf for a given bandwidth. Needing a Cf of ~20 pF rather than a couple of pF is a good thing, since your final circuit won't be affected as much by small parasitic capacitances like those from the diode and the PCB tracks.
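To put numbers on the crossover rule of thumb above, here's a quick sketch. It just evaluates f ≈ 1 / (2π × Cf × 2 × R2) as given (assuming R2 = R6, as in the gain -1 stage) and inverts it; the function names are made up for illustration, and this is the same approximation, not an exact model of the loop.

```python
import math

def crossover_hz(r2_ohms: float, cf_farads: float) -> float:
    """Approximate loop crossover for the gain -1 stage, using the
    rule of thumb f = 1 / (2*pi * Cf * 2*R2). Assumes R2 == R6."""
    return 1.0 / (2.0 * math.pi * cf_farads * 2.0 * r2_ohms)

def cf_for_crossover(r2_ohms: float, f_hz: float) -> float:
    """Same rule of thumb solved for Cf at a target crossover frequency."""
    return 1.0 / (2.0 * math.pi * f_hz * 2.0 * r2_ohms)

# 1k resistors with ~20 pF of compensation
f_1k = crossover_hz(1e3, 20e-12)
print(f"1k, 20 pF -> crossover ~ {f_1k / 1e6:.2f} MHz")

# The original 10k resistors need a 10x smaller cap for the same crossover
cf_10k = cf_for_crossover(10e3, f_1k)
print(f"10k, same crossover -> Cf ~ {cf_10k * 1e12:.1f} pF")
```

This shows why dropping to 1k helps: for the same crossover (about 4 MHz here), 10k resistors would call for only about 2 pF, which is the same order as the diode and track parasitics.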
Here's the response to a fast input pulse using the values I've used. Notice there's no overshoot, and the output rise time is quite good at ~150 ns.
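If you want a rough feel for what that rise time means in bandwidth terms, a first-order (single-pole) response has a 10-90 % rise time of ln(9) × τ, which gives the familiar f(-3 dB) ≈ 0.35 / t_r. Treating the closed-loop response as single-pole is an assumption here (the real circuit isn't exactly first-order), so take this as a ballpark only.

```python
import math

def f3db_from_rise_time(t_rise_s: float) -> float:
    """-3 dB bandwidth implied by a 10-90 % rise time, assuming a
    single-pole response: t_r = ln(9) * tau and f = 1 / (2*pi*tau)."""
    tau = t_rise_s / math.log(9.0)
    return 1.0 / (2.0 * math.pi * tau)

bw = f3db_from_rise_time(150e-9)
print(f"150 ns rise time ~ {bw / 1e6:.1f} MHz bandwidth (single-pole estimate)")
```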
Here's the Bode plot (gain and phase shift around the feedback loop) of this circuit. If the phase shift reaches 180° while the loop gain is still 1 or more, that is the point of oscillation; if it's just slightly less than 180°, it will still ring. You can see a phase shift of 151° at the crossover frequency, so we have about 30° of phase margin, which looks to be OK judging by the earlier pulse transient response test.
Here's another Bode plot showing what happens when Cf is put directly across R6, as in your first post. You can see the phase margin is only 2° here.
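The way the phase margin is read off those Bode plots can be sketched numerically: find the frequency where the loop gain falls to 1, then the margin is 180° minus the phase shift there. The loop model below is entirely hypothetical (an integrator plus one or two real poles with made-up frequencies, not fitted to this circuit or to the simulation), and the function names are illustrative. The second case adds one more pole to stand in for extra lag inside the loop, such as a second op-amp, which is the general reason taking the feedback after U2 costs so much margin.

```python
import math

def loop_gain(f_hz, f_int_hz, poles_hz):
    """Hypothetical loop-gain model: an integrator (|T| = f_int/f, 90 deg lag)
    times one real pole per entry in poles_hz.
    Returns (|T|, total phase lag in degrees)."""
    mag = f_int_hz / f_hz
    lag_deg = 90.0
    for fp in poles_hz:
        mag /= math.sqrt(1.0 + (f_hz / fp) ** 2)
        lag_deg += math.degrees(math.atan(f_hz / fp))
    return mag, lag_deg

def phase_margin(f_int_hz, poles_hz, f_lo=1.0, f_hi=1e10):
    """Log-scale bisection for the crossover (|T| = 1); |T| falls
    monotonically with f in this model. Returns (crossover Hz, margin deg)."""
    for _ in range(100):
        f_mid = math.sqrt(f_lo * f_hi)
        mag, _ = loop_gain(f_mid, f_int_hz, poles_hz)
        if mag > 1.0:
            f_lo = f_mid
        else:
            f_hi = f_mid
    fc = math.sqrt(f_lo * f_hi)
    _, lag_deg = loop_gain(fc, f_int_hz, poles_hz)
    return fc, 180.0 - lag_deg

# Made-up numbers for illustration only
fc1, pm1 = phase_margin(4e6, [6e6])        # integrator + one pole
fc2, pm2 = phase_margin(4e6, [6e6, 3e6])   # plus an extra lagging pole in the loop
print(f"one pole:   crossover {fc1 / 1e6:.2f} MHz, margin {pm1:.1f} deg")
print(f"extra pole: crossover {fc2 / 1e6:.2f} MHz, margin {pm2:.1f} deg")
```

With these particular numbers the first case works out to exactly 60° of margin, and the extra pole drags it well below that, which is the same trend as going from the 30° plot to the 2° plot, even though the figures themselves aren't comparable.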