Let me continue here with the ADR1001#1 measured against the HP399 (the original reference inside my 34401A) and against the LM399H#2 (fitted into the same 34401A as the replacement for the HP399).
The old HP399 is pretty hairy - it pops and cracks a couple of times in every 12-second period on the o'scope screen when I look at it with my Noise Indicator (see some shots below). Its background noise is rather low, though, and perhaps once in many hours of staring at the scope I saw a pop-free, clean 3.1uVpp, which is at the best end of all my 399s.
While the HP399 was out of the meter I replaced it with an LM399H#2, which had shown no pops in the past (as seen with the Noise Indicator).
Below are the results of measuring my ADR1001#1-based 10V reference (which also showed no pops on the Noise Indicator).
HP399 - it pops and cracks always, wildly, like radon gas in a cloud chamber, but it is "pre-selected by HP" and has 25+ years in service, which is probably why its ADEV (pink) shows lower sigma at long taus. The amplitudes of its fluctuations seem to be lower than those of the LM399H#2.
LM399H#2 - it "cracked" only 6 times over the overnight run, and the amplitudes of its fluctuations are a bit higher.
Its ADEV (blue) is better at shorter taus but worse at longer taus compared to the HP399.
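For anyone who wants to reproduce this kind of ADEV comparison from a logged run, here is a minimal sketch of a simple non-overlapping Allan deviation. It is not necessarily how the plots in this post were produced; the function name `allan_deviation` and the parameter `m` (number of consecutive readings averaged per block, so tau = m times the sample interval) are my own choices for illustration.

```python
from math import sqrt

def allan_deviation(readings, m):
    """Non-overlapping Allan deviation for an averaging factor m.

    readings: evenly spaced meter readings (e.g. volts).
    m: number of consecutive readings averaged into one block,
       so tau = m * sample interval. (Hypothetical helper, a sketch only.)
    """
    n_blocks = len(readings) // m
    if n_blocks < 2:
        raise ValueError("need at least two full blocks of m readings")
    # Average the readings in consecutive, non-overlapping blocks of length m.
    means = [sum(readings[k * m:(k + 1) * m]) / m for k in range(n_blocks)]
    # Allan variance: half the mean squared difference of adjacent block means.
    sq_diffs = [(b - a) ** 2 for a, b in zip(means, means[1:])]
    return sqrt(sum(sq_diffs) / (2 * len(sq_diffs)))
```

Calling it for a range of `m` values (1, 2, 4, 8, ...) gives the points of a sigma-vs-tau curve. An overlapping estimator would use the data more efficiently at long taus, but the simple form is easier to check by hand.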
You may also see how a running STD can nicely detect the pops and cracks in the incoming data stream (like when you stare at the incoming numbers on the smartphone and see that the STD is 1.5uV - it means something happened during the last N samples).
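The running-STD idea can be sketched in a few lines. This is only an illustration under my own assumptions: the window length `window` (the "N samples") and the alarm level `threshold` are placeholders that would need tuning to the meter's NPLC setting and the normal noise floor, and the function names are made up for this example.

```python
from collections import deque
from statistics import pstdev

def running_std(samples, window=10):
    """Yield the standard deviation of the last `window` samples.

    Nothing is yielded until the window has filled up once.
    """
    buf = deque(maxlen=window)
    for x in samples:
        buf.append(x)
        if len(buf) == window:
            yield pstdev(buf)

def flag_pops(samples, window=10, threshold=1.0e-6):
    """Return the indices of samples at which the running STD over the
    last `window` samples exceeds `threshold` (same units as the samples).
    """
    flagged = []
    buf = deque(maxlen=window)
    for i, x in enumerate(samples):
        buf.append(x)
        if len(buf) == window and pstdev(buf) > threshold:
            flagged.append(i)
    return flagged
```

With a 10-sample window, a single 5uV step in an otherwise quiet stream lifts the running STD to about 1.5uV for as long as the step edge stays inside the window, and then it drops back once the window holds only the new level.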
Conclusion - we need a Popcorn Indicator
