0. Are you after THD exactly, or THD+N? Because if I understand correctly, you already have the latter.
More useful, of course, would be THD+N over a defined bandwidth (20 Hz to 20 kHz, for example), or just plain old THD (i.e., the harmonics measured against the fundamental, with noise left out; IMD is left out too, but that doesn't exist in a single-tone test, so that's fine).
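To make the distinction concrete, here is a minimal sketch of the two figures, assuming you already have RMS levels for the fundamental, each harmonic, and the noise in whatever bandwidth you measured. The function names and the example levels are made up for illustration; they are not from any particular analyzer.

```python
import math

def thd(fundamental_rms, harmonics_rms):
    """THD: RMS sum of the harmonics, relative to the fundamental."""
    return math.sqrt(sum(h * h for h in harmonics_rms)) / fundamental_rms

def thd_plus_n(fundamental_rms, harmonics_rms, noise_rms):
    """THD+N: harmonics plus noise (in the measurement bandwidth), relative to the fundamental."""
    residual = math.sqrt(sum(h * h for h in harmonics_rms) + noise_rms ** 2)
    return residual / fundamental_rms

# Hypothetical levels, all in volts RMS (illustration only)
fund = 1.0
harmonics = [0.001, 0.0005, 0.0002]   # 2nd, 3rd, 4th harmonic
noise_wideband = 0.01                 # noise with no bandwidth limit applied
noise_audio_band = 0.0003             # noise within 20 Hz to 20 kHz only

print(f"THD               = {100 * thd(fund, harmonics):.3f} %")
print(f"THD+N (wideband)  = {100 * thd_plus_n(fund, harmonics, noise_wideband):.3f} %")
print(f"THD+N (20-20k Hz) = {100 * thd_plus_n(fund, harmonics, noise_audio_band):.3f} %")
```

With numbers like these, the wideband THD+N is dominated by the noise term, which is why the bandwidth matters so much.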
1. Note that the output impedance is probably capacitive, and a speaker (if you're using it as load during this test) isn't very much of anything, it's all over the place. So to get a reasonable filter, you will probably need a constant-resistance type, or at least, informally, enough damping (at one or both ports) to keep it behaved.
2. Looser tolerances are found in lower-order filters, and the softer rolloff types (Bessel or Butterworth). You can probably get by with cored inductors just as well, too (but mind that they can introduce some distortion, so if you're down in the 0.01% range, you may consider air-core after all).
The filter order, and to a lesser extent the filter type, determines the attenuation beyond the cutoff frequency. All [all-pole*] filters of a given order have the same cutoff asymptote (-20dB/decade/pole); it's only shifted around depending on type. Your signal bandwidth, then, sets what order the filter must be (for a given type).
*There are other types; if we allow a zero in the response, we can notch a specific frequency, or mold the stopband as we please; but this comes at the cost of less asymptotic attenuation. In the extreme case, the elliptic (Cauer) filter has as many zeroes as poles, giving actually a flat stopband; such is the price paid for a wickedly sharp cutoff. There are also filter types of historic interest (m-derived) which are easier to design than modern (analytic, pole-zero) types; but, they require more inductors and capacitors for the same overall response.
To get -80dB at ~400kHz, that's four decade-poles; a single RC would thus have to roll off at 40Hz, which I'm guessing is a bit silly. To roll off by 20kHz instead, you only have about 1.3 decades to work with (20kHz to 400kHz), so you need 80 / (20 x 1.3), or a bit more than three poles: in practice a 4th order filter, or maybe a bit more (5th?) if it's a bit on the loose side (closer to a Bessel).
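The arithmetic above can be sketched in a few lines. This assumes the -20dB/decade/pole asymptote, plus the standard Butterworth magnitude formula for the exact-order case; the function names are just for illustration.

```python
import math

def single_pole_corner(f_stop, atten_db):
    """Corner frequency a single RC needs to reach atten_db at f_stop (asymptotic)."""
    return f_stop / 10 ** (atten_db / 20)

def asymptotic_poles(f_cut, f_stop, atten_db):
    """Number of poles needed, using only the -20 dB/decade/pole asymptote."""
    decades = math.log10(f_stop / f_cut)
    return atten_db / (20 * decades)

def butterworth_order(f_cut, f_stop, atten_db):
    """Smallest integer Butterworth order with -3 dB at f_cut and atten_db at f_stop."""
    n = math.log10(10 ** (atten_db / 10) - 1) / (2 * math.log10(f_stop / f_cut))
    return math.ceil(n)

print(single_pole_corner(400e3, 80))        # corner for one RC, in Hz
print(asymptotic_poles(20e3, 400e3, 80))    # fractional pole count for a 20 kHz cutoff
print(butterworth_order(20e3, 400e3, 80))   # rounded up to a buildable order
```

A softer type (Bessel) sits further inside the asymptote near cutoff, so it would need more margin than the Butterworth figure suggests.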
3. To apply damping, the dumbest, easiest method is simply resistive attenuators. Put a 3 or 6dB pad between the amplifier and filter, and again between the filter and load (if it's a reactive load like a speaker). This is a tee or pi network of resistors, set so that the input and output resistances (when the other port is loaded by the system resistance) equal the system resistance. So, in this case, 8 ohms. Easily calculated:
http://www.chemandy.com/calculators/matching-pi-attenuator-calculator.htm
e.g. 8 ohms, 8 ohms and 6dB requires 24, 24 and 6 ohms. (The network is always symmetrical if the input and output resistances are equal.) Or for tee, 2.7, 2.7 and 10.7 ohms.
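If you'd rather not rely on the calculator, the same values fall out of the standard symmetric pad formulas. A minimal sketch, assuming equal input and output resistance (function names are my own):

```python
def pi_pad(z0, loss_db):
    """Symmetric pi attenuator: returns (shunt, series, shunt) in ohms."""
    k = 10 ** (loss_db / 20)              # voltage ratio
    shunt = z0 * (k + 1) / (k - 1)
    series = z0 * (k * k - 1) / (2 * k)
    return shunt, series, shunt

def tee_pad(z0, loss_db):
    """Symmetric tee attenuator: returns (series, shunt, series) in ohms."""
    k = 10 ** (loss_db / 20)
    series = z0 * (k - 1) / (k + 1)
    shunt = 2 * z0 * k / (k * k - 1)
    return series, shunt, series

def parallel(a, b):
    return a * b / (a + b)

def pi_input_resistance(z0, loss_db):
    """Resistance seen looking into the pi pad with the far port loaded by z0."""
    shunt, series, _ = pi_pad(z0, loss_db)
    return parallel(shunt, series + parallel(shunt, z0))

print([round(r, 1) for r in pi_pad(8.0, 6.0)])
print([round(r, 1) for r in tee_pad(8.0, 6.0)])
print(round(pi_input_resistance(8.0, 6.0), 3))
```

The last line is a sanity check: a correctly sized pad terminated in 8 ohms should look like 8 ohms from the other side.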
If you can't afford loss in the circuit, then a parallel R+C and/or series R||L (Zobel network) can be applied, which has no effect at low frequencies, but which introduces losses around the transition frequency, and this works in the same way as the attenuator, making the filter less sensitive to the amplifier or load impedance. Downside is the filter response gets softer (or maybe lumpier) as a result, so some tuning should be done to keep a reasonable response. (Set up the circuit in SPICE and keep poking values until the response looks right, whether the source is a voltage source, resistor or current source.)
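Short of a full SPICE run, the effect of a parallel R+C damper can be sketched numerically. Everything here is an assumption for illustration: a 2nd order LC low-pass sized as Butterworth for 8 ohms at 20kHz, driven by an ideal voltage source, a 64 ohm resistor standing in for a speaker's impedance peak, and a damper of 8 ohms in series with four times the filter capacitance.

```python
import math

R_NOM = 8.0                      # nominal load, ohms
F_C = 20e3                       # cutoff frequency, Hz
W0 = 2 * math.pi * F_C
Q = 1 / math.sqrt(2)             # Butterworth Q for a 2nd order section
L = R_NOM / (W0 * Q)             # series inductor, henries
C = Q / (W0 * R_NOM)             # shunt capacitor, farads

def gain(f, r_load, damper=None):
    """|Vout/Vin| for series L, then shunt C, load, and optional series R+C damper."""
    w = 2 * math.pi * f
    y = 1j * w * C + 1 / r_load               # admittance across the output
    if damper is not None:
        rz, cz = damper
        y += 1 / (rz + 1 / (1j * w * cz))     # series R+C branch in parallel
    return abs(1 / (1 + 1j * w * L * y))

# 400 log-spaced points from 1 kHz to 100 kHz
freqs = [1e3 * 10 ** (2 * i / 399) for i in range(400)]

def peak_db(r_load, damper=None):
    return 20 * math.log10(max(gain(f, r_load, damper) for f in freqs))

damper = (R_NOM, 4 * C)          # hypothetical starting values, to be tuned
print(f"peak, 8 ohm load:           {peak_db(8.0):6.2f} dB")
print(f"peak, 64 ohm load:          {peak_db(64.0):6.2f} dB")
print(f"peak, 64 ohm load + damper: {peak_db(64.0, damper):6.2f} dB")
```

The undamped filter peaks badly once the load rises above nominal, and the damper pulls that peak back down at the cost of a shifted, softer corner. Treat the damper values as a starting point for the poking-around described above, not as a design.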
Tim