Good graph and comments. In reply:
1.
Yes, these are all short-term sources; figure single-digit minutes of stability at best, so it's important to have the "standard" DMM side by side with the DUT to check for drift. The variance function can help confirm drift is occurring, but since it doesn't read out in real time you'll need to rely on LSD variations.
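If you log readings with timestamps, a least-squares slope gives a rough drift rate. A minimal sketch in plain Python; the readings below are hypothetical:

```python
def drift_uv_per_min(times_s, volts):
    """Least-squares slope of readings vs. time, returned as uV per minute."""
    n = len(times_s)
    t_mean = sum(times_s) / n
    v_mean = sum(volts) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times_s, volts))
    den = sum((t - t_mean) ** 2 for t in times_s)
    return (num / den) * 60 * 1e6  # V/s -> uV/min

# Hypothetical readings taken once a minute:
print(drift_uv_per_min([0, 60, 120, 180], [10.0, 10.00001, 10.00002, 10.00003]))  # ~10 uV/min
```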
Yes, electronic sources are noisy for this purpose.
I used stacks of good-quality NiMH batteries (Eneloops): 8 cells ~ 10.0 VDC, etc., to compare the 1 and 10 VDC ranges. There is a slow ~10 uV downward drift, worse the more batteries are stacked (each cell isn't at exactly the same potential as the others in the stack, so the stronger cells charge the weaker ones), but they hold long enough to compare the DUT and the standard down to the uV. To reduce voltages further I use a Kelvin-Varley voltage divider.
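For what it's worth, the ideal (unloaded) divider arithmetic is trivial; a sketch that ignores the loading and linearity errors a real Kelvin-Varley has:

```python
def kv_ratio(decades):
    """Ideal Kelvin-Varley output ratio for per-decade switch settings, MSD first."""
    return sum(d * 10 ** -(i + 1) for i, d in enumerate(decades))

v_in = 10.0  # e.g. the battery stack
print(kv_ratio([1, 0, 0]) * v_in)  # first decade at 1: 10 V / 10 = 1.0 V
```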
For the 100 VDC scale, I used as many batteries as I had, then reconfirmed the reading at full scale with an electronic reference at 100 VDC and again at 1 kVDC, comparing both the DUT and standard meters continuously. It isn't as stable as the batteries, but the point is to keep the DUT and standard readings as tight as possible to reduce error; see #3.
2.
I'd ensure all DMMs connected together are first working properly individually, particularly after a repair. A repaired meter may not be accurate without calibration, but it should be precise. Once the user is satisfied all is well, connecting them together should not be a problem unless you are concerned at the uV level.
On low voltages, once other low-level issues are stable (temperature, Seebeck, etc.), you're still left with the leads picking up EMI, so the fewer and shorter the leads, the better. To reduce pickup, avoid all unneeded additional connections.
3. There are problems even if the calibrating meter were identical to the 3456A in accuracy, giving a TUR of truly 1:1. The way this uncertainty is minimized is that each measurement comparing the DUT and the standard must differ as little as practical, tighter than the manual allows. This is done by reducing the limits within which the meter is allowed to vary: the idea of a guard band.
http://www.agilent.com/metrology/pdf/guardband_beginners.pdf
For example, at 24 hrs at 10 V the acceptable range is 9.999 90 to 10.000 10; this is 8 ppm + 2 digits, or ±100 uV. To adjust with a TUR of 1:1, the acceptable range must be reduced ≥4:1, to ±25 uV, so the acceptable range after adjustment is roughly 9.999 975 to 10.000 025 relative to the standard used. The Agilent paper details how to make a more accurate calculation of the new limits, but it's easier to keep the difference nearly nil between the DUT and the standard: keep making subtle adjustments until the DUT and standard are reading as tightly as possible.
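The limit arithmetic above can be sketched as follows; the spec figures are from the example (8 ppm of reading + 2 counts at 10 uV resolution), and the 4:1 factor is the simple guard-band reduction, not the paper's more exact method:

```python
def accept_limits(nominal, ppm, counts, resolution, guard=4.0):
    """Spec tolerance (ppm of reading + counts) and the guard-banded limits."""
    tol = nominal * ppm * 1e-6 + counts * resolution
    return ((nominal - tol, nominal + tol),
            (nominal - tol / guard, nominal + tol / guard))

spec, guarded = accept_limits(10.0, 8, 2, 10e-6)
print(spec)     # ~ (9.9999, 10.0001): the manual's 24 hr limits
print(guarded)  # ~ (9.999975, 10.000025): limits tightened 4:1
```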
A simple test after adjustment is to measure the same transfer reference simultaneously, say at 10 V. After some time, both meters should show approximately the same mean and variance.
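With logged readings from both meters, that check is one-line statistics; the readings below are made up for illustration:

```python
import statistics

# Hypothetical simultaneous readings of the same 10 V transfer reference:
dut = [10.000012, 10.000015, 10.000011, 10.000014, 10.000013]
std = [10.000010, 10.000013, 10.000012, 10.000011, 10.000014]

print(statistics.mean(dut) - statistics.mean(std))   # offset: should be near zero
print(statistics.stdev(dut), statistics.stdev(std))  # noise: should be comparable
```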
Given the 3456A must already be out of cal to warrant adjusting it, which would be better: leaving it out of cal, or approximating the 34401A? The 3456A's precision is unaffected either way. Regardless of TUR, the best that can be achieved is the accuracy of the lesser unit: if the standard is better than the DUT, the best you can achieve is the spec of the DUT; if the standard is worse than the DUT, then at best the DUT will be as good as the standard.
1. Even after warm-up, short-term (5-minute) fluctuations of some signal sources/power supplies I tried were 50 ppm and 200 ppm, …
2. What kind of interactions are you talking about? A defective DUT with a much lower impedance loading the voltage source? A DUT putting out a bias current much larger than specified? Unless you're talking about huge amounts of current or very low voltages, two parallel-connected meters should see the same voltage.
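To put a number on the loading concern, a quick sketch; the source and input resistances are assumed round values, not specs from either meter:

```python
def loading_error_ppm(r_source, r_meter):
    """Fractional reading error, in ppm, from the meter input resistance loading the source."""
    return r_source / (r_source + r_meter) * 1e6

# Assumed 1 kohm source into two 10 Gohm inputs in parallel (5 Gohm total):
print(loading_error_ppm(1e3, 5e9))  # ~0.2 ppm, i.e. negligible
```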
3. Another 6.5-digit DMM is likely to be worse than 1:1. For example, the 1-year specs for the HP 3456A and HP 34401A are 23 ppm : 35 ppm, or about 1:1.5. A 34401A calibrated within the last 90 days would be a little better than 1:1, and a 24 h calibration interval would give you 1.5:1. Much better instruments are needed if you want to calibrate the 3456A to its 90-day or even 24-hour specs (8 ppm for 10 VDC).
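The ratio arithmetic, using the 1-year spec figures quoted above:

```python
def tur(dut_tolerance_ppm, std_uncertainty_ppm):
    """Test uncertainty ratio: DUT tolerance over standard uncertainty."""
    return dut_tolerance_ppm / std_uncertainty_ppm

print(tur(23, 35))  # ~0.66, i.e. about 1:1.5, worse than 1:1
```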