IMO, as a minimum, I would take one vector at the very beginning of the decay (one vector being 2 consecutive samples, a tail and a head) for the very small metallic components, and another vector a short time after that, since the first vector will be swamped or saturated by large metallic components. Then take another vector in the middle of the decay and one towards the end for the ferrous and magnetic components.
You can also take another vector (or at least a single sample) in a very quiet period for ground noise and magnetic field cancellation, but as Kleinstein has stated, you can probably get away without this last vector.
That makes it a total of 6 to 8 samples per decay pulse.
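To make the sampling plan a bit more concrete, here is a rough Python sketch of it. Everything numeric is made up for illustration: the decay is modelled as a sum of two exponentials, and the window length, sample spacing and vector start times are hypothetical, not measured values. Taking the difference of the two samples in a vector as a slope is also just my assumption about how you might use the pair.

```python
import math

# Hypothetical decay model: list of (amplitude, time constant in microseconds).
COMPONENTS = [(1.0, 5.0), (0.5, 40.0)]

# Hypothetical timing, all in microseconds after the transmit pulse ends.
SAMPLE_SPACING_US = 2.0                     # gap between the 2 samples of a vector
VECTOR_STARTS_US = [1.0, 8.0, 50.0, 90.0]   # start, shortly after, middle, end


def decay(t_us, components):
    """Modelled coil signal at time t_us: a sum of decaying exponentials."""
    return sum(a * math.exp(-t_us / tau) for a, tau in components)


def vector_at(t_us, dt_us, components):
    """One vector = two consecutive samples taken dt_us apart."""
    first = decay(t_us, components)
    second = decay(t_us + dt_us, components)
    return first, second


def sample_decay(starts_us, dt_us, components):
    """Take one vector at each requested start time."""
    return [vector_at(t, dt_us, components) for t in starts_us]


vectors = sample_decay(VECTOR_STARTS_US, SAMPLE_SPACING_US, COMPONENTS)
total_samples = 2 * len(vectors)

# Slope of each vector (signal units per microsecond); negative on a decay.
slopes = [(second - first) / SAMPLE_SPACING_US for first, second in vectors]

for start, (first, second), slope in zip(VECTOR_STARTS_US, vectors, slopes):
    print(f"t={start:5.1f} us  samples=({first:.4f}, {second:.4f})  slope={slope:.5f}")
print("samples per decay:", total_samples)
```

With four vectors this gives 8 samples per decay; the optional quiet-period vector would go on top of that. On real hardware the `decay()` call would be replaced by an ADC read at the scheduled time.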
I think even a Teensy 3.0 can do it in terms of processing power and ADC speed, but do not expect the full 16 bits of useful data unless you use a separate ADC.
If you want a faster ADC (80 MSPS) but only 12 bits, take a look at the LPC4370. With a good analog front end and a variable gain amplifier (VGA), it could also work.
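As a rough feel for the 12-bit versus 16-bit trade-off, the arithmetic below compares the size of one LSB and the front-end gain needed for a 12-bit converter to resolve the same input step as an ideal 16-bit one. The 3.3 V reference is an assumption for illustration, and both converters are treated as ideal (no noise, no lost bits), which real parts will not achieve.

```python
import math

VREF_VOLTS = 3.3  # assumed reference voltage, for illustration only


def lsb_volts(vref, bits):
    """Size of one least-significant bit for an ideal ADC."""
    return vref / (2 ** bits)


def gain_to_match(bits_low, bits_high):
    """Voltage gain needed before the lower-resolution ADC so that one of
    its LSBs corresponds to one LSB of the higher-resolution ADC."""
    return 2 ** (bits_high - bits_low)


lsb16 = lsb_volts(VREF_VOLTS, 16)
lsb12 = lsb_volts(VREF_VOLTS, 12)
gain = gain_to_match(12, 16)
gain_db = 20 * math.log10(gain)

print(f"16-bit LSB: {lsb16 * 1e6:.1f} uV")
print(f"12-bit LSB: {lsb12 * 1e6:.1f} uV")
print(f"gain to match: {gain}x ({gain_db:.1f} dB)")
```

Keep in mind that gain only moves the window: with 16x gain the largest input that fits before clipping is 16x smaller, so the VGA setting would have to change between the early (large) and late (small) parts of the decay.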
The LPC is a bit hard to use as it's a BGA package, so maybe take a look at a board called LPC-Link 2 (can be had for under $30). It's not so easy to implement flash-based boot on this board, as it runs user firmware in RAM, but there might be a way to inject your code into flash?
Teensy would be my first try if you have one lying around. If it's not good enough, add an external ADC or go the LPC route with a VGA.
I'm not sure about the Teensy 3.6; I don't think it can do 16-bit conversions?