We can work this out readily by hand; no need for simulations.

The op-amp is stable down to a certain noise gain. At unity gain, it has whatever phase margin. For the TL071, something like 60 degrees. (Or if it's not a unity-gain-stable type, then some phase at whatever specified gain.) If we introduce more gain to the loop, the margin goes away, and it at least peaks more, or oscillates outright.
And this is just for instantaneous (unlimited bandwidth) gain. For real gain elements (like a transistor), there is some additional phase shift, which reduces phase margin even further.
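To put rough numbers on how extra loop gain eats phase margin, here is a minimal sketch. It assumes a deliberately crude loop model: the op-amp is an integrator (gain-bandwidth `f_gbw`) plus one extra pole `f2`, placed so the margin is 60 degrees with no added gain, and the added gain is flat with frequency. The 3 MHz gain-bandwidth and the list of gains are illustrative assumptions, not measurements of this circuit.

```python
import math

def phase_margin_deg(extra_gain, f_gbw=3e6, pm_unity_deg=60.0):
    """Phase margin of an integrator-plus-one-pole loop with extra flat gain.

    Assumed loop model: L(jf) = extra_gain * f_gbw / (jf) / (1 + jf/f2),
    where f2 is chosen so that extra_gain = 1 gives pm_unity_deg.
    pm_unity_deg must be below 90 for this model to make sense.
    """
    # Ratio fc/f2 at the unity-gain crossover that yields the target margin.
    x1 = math.tan(math.radians(90.0 - pm_unity_deg))
    # Second pole location that puts the unity-gain crossover at that ratio.
    f2 = f_gbw / (x1 * math.sqrt(1.0 + x1 * x1))
    # With extra gain, |L| = 1 gives x^2 * (1 + x^2) = c, where x = fc/f2.
    c = (extra_gain * f_gbw / f2) ** 2
    x = math.sqrt((-1.0 + math.sqrt(1.0 + 4.0 * c)) / 2.0)
    # The integrator contributes 90 degrees of lag, the second pole atan(x).
    return 90.0 - math.degrees(math.atan(x))

if __name__ == "__main__":
    for k in (1, 2, 5, 9):  # illustrative extra loop gains
        print(f"extra gain {k}: phase margin ~ {phase_margin_deg(k):.1f} deg")
```

Any real gain stage adds its own phase lag on top of this, so treat the result as an optimistic bound.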
The transistor has voltage gain. With a grounded source (more or less), the voltage gain is close to R4 * gm. (It's reduced a bit by the output resistance, but that's generally quite high for FETs, being set by the channel length modulation factor, so we can ignore it.) With a source resistor R8, the maximum transconductance is 1/R8; the device's own 1/gm acts in series with R8, so the total equivalent is 1 / (R8 + 1/gm).
The hybrid-pi model of the BJT uses r_e, a current-dependent resistor in the same position as R8, to represent the BJT's transconductance. We do precisely the same here, except r_e * h_fe is not reflected back to the gate terminal as r_π, instead we only have gate capacitance and strays there. Huh, I forget if they still call it a "hybrid-pi" model when it's a FET, but in any case, yeah, it works the same with these changes.
R8 usually dominates under class A bias conditions, because gm is usually large. At ~0.7A as shown, gm looks to be around, oh, 0.25 S (so 1/gm = 4 ohms), which is comparable (larger, though not by much) to R8's conductance of 0.2 S (5 ohms). The series total is 9 ohms or 0.11 S.
The voltage gain then is around 9, which isn't astonishing, and is actually more than matched by your feedback network.
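The arithmetic above, as a small sketch. The function names are made up for illustration. R4's value isn't restated in this post, so the load in the last line is back-calculated on the assumption that the gain of about 9 comes purely from R4 times the effective transconductance; check it against the actual schematic.

```python
def effective_gm(gm, r_source):
    """Transconductance of a FET with source degeneration: 1 / (R8 + 1/gm)."""
    return 1.0 / (r_source + 1.0 / gm)

def voltage_gain(r_load, gm, r_source):
    """Magnitude of the drain voltage gain, ignoring FET output resistance."""
    return r_load * effective_gm(gm, r_source)

if __name__ == "__main__":
    gm = 0.25   # S, estimated from the post at ~0.7 A
    r8 = 5.0    # ohms, i.e. 0.2 S
    gm_eff = effective_gm(gm, r8)
    print(f"series resistance: {1.0 / gm_eff:.1f} ohms, gm_eff: {gm_eff:.3f} S")
    # Assumption: the load that would give a gain of 9 in this simple model.
    r_load_implied = 9.0 / gm_eff
    print(f"load implied by a gain of 9: {r_load_implied:.0f} ohms")
```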
That leaves the capacitance, which I would guess becomes dominant right around the op-amp's unity-gain crossover, hence the hit to phase margin.
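As a rough check on where the gate capacitance bites, the pole it forms with the driving resistance can be estimated like so. Both component values below are placeholders (a driving resistance of a couple hundred ohms and an input capacitance of about 1.3 nF), not figures taken from this circuit, and Miller multiplication of the drain-gate capacitance is ignored, so the real pole will sit lower.

```python
import math

def rc_pole_hz(r_ohms, c_farads):
    """Corner frequency of a single RC low-pass: 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

def pole_phase_lag_deg(f_hz, f_pole_hz):
    """Phase lag contributed by one real pole at frequency f_hz."""
    return math.degrees(math.atan(f_hz / f_pole_hz))

if __name__ == "__main__":
    r_drive = 200.0    # ohms, assumed op-amp output impedance plus gate resistor
    c_gate = 1.3e-9    # farads, assumed gate input capacitance
    f_pole = rc_pole_hz(r_drive, c_gate)
    print(f"gate pole ~ {f_pole / 1e3:.0f} kHz")
    for f in (100e3, 1e6, 3e6):  # illustrative frequencies
        print(f"lag at {f / 1e6:.1f} MHz ~ {pole_phase_lag_deg(f, f_pole):.0f} deg")
```

A pole that lands below the loop's crossover frequency costs most of its 90 degrees there, which is the hit to phase margin described above.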
You can extend phase margin by strategically dulling the op-amp -- add an R+C from its output to -in, and adjust values until step response is ideal. With a unity-gain-stable op-amp, this should give better performance than using R11.
The ideal values of the R+C will depend on capacitances as well as gain. All of these vary with operating point here, so you can't compensate it very effectively over the full output range -- it will always be slower at low voltages (where capacitances are higher) and high voltages (where drain current, and so gm, is small).
Noise gain: the, well, gain of the op-amp's noise. That is, the gain seen by a voltage in series with an input pin; typically, the reciprocal of the feedback fraction. Note that you've strapped a resistor across the inputs, which increases the noise gain without affecting the signal gain. This does exactly what it sounds like it does -- noise goes up, including offset, which is just DC "noise". You've found this to be necessary, for precisely these reasons -- you need to reduce the amp's performance (unfortunately, this necessarily includes the noise floor too) to deal with the transistor gain in the loop.
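A quick sketch of the difference between signal gain and noise gain, for a generic inverting stage with +in held at a low-impedance reference. The resistor names (`r_in`, `r_f`, `r_across`) and the values are hypothetical and don't correspond to designators in this circuit.

```python
import math

def parallel(a, b):
    """Parallel combination of two resistances; math.inf means 'not fitted'."""
    if math.isinf(a):
        return b
    if math.isinf(b):
        return a
    return a * b / (a + b)

def signal_gain(r_in, r_f):
    """Inverting-stage signal gain; unaffected by a resistor across the inputs."""
    return -r_f / r_in

def noise_gain(r_in, r_f, r_across=math.inf):
    """Noise gain: 1 + Rf / (everything from -in to AC ground, in parallel)."""
    return 1.0 + r_f / parallel(r_in, r_across)

if __name__ == "__main__":
    r_in, r_f = 10e3, 10e3  # hypothetical values
    print("signal gain:", signal_gain(r_in, r_f))
    print("noise gain, no extra resistor:", noise_gain(r_in, r_f))
    print("noise gain, 1k across inputs:", noise_gain(r_in, r_f, 1e3))
```

The loop gain available to the op-amp drops by the same factor the noise gain rises, which is why the extra resistor tames the peaking at the cost of noise and offset.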
General note -- collector- or drain-output amps aren't popular among amateurs, because they're harder to design and aren't really called for by their applications (mostly driving speakers, which are designed for very low source impedances, fractional ohms). Every op-amp with rail-to-rail outputs, though, is such a type!
On that note, about output impedances -- the TL071 will itself have a follower style output, and an output impedance in the hundreds of ohms near fT. Consider increasing R7 to suit -- that way the op-amp has a reasonable fraction of its own output to read, without being completely swamped by gate capacitance.
And to improve gate drive speed, you can add a zero-offset follower if you like, or use a gruntier amp. Using a newer transistor also helps, since MOSFET performance has improved by about 5x since the days of the IRF640 (though this comes at the price of smaller die area, so the power dissipation ratings tend to be lower for the same V,I ratings -- YMMV).
Tim