This topic is dedicated to discussing the various methods for documenting the stability of our voltage references.
Each voltage reference deserves a few graphs and statistics so that we get to know them better.
But what is appropriate for sub-ppm voltage measurements? An HP3458A?
Maybe. But I think we can do better at much lower cost: use more than one reference and measure them back to back (i.e. in series opposition, so the DMM only sees the small difference voltage) with a 6.5- or 7.5-digit DMM.
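A short worked note on what the back-to-back measurement gives you. This is plain error propagation, not a result from my data, and it assumes the two references are independent (uncorrelated noise and drift). V_A and V_B are the reference outputs and ε is the DMM's relative gain error:

```latex
\begin{aligned}
V_\mathrm{diff} &= V_A - V_B \\
\sigma_\mathrm{diff}^2(\tau) &= \sigma_A^2(\tau) + \sigma_B^2(\tau)
  \quad\Rightarrow\quad
  \sigma_A(\tau) \approx \sigma_B(\tau) \approx \frac{\sigma_\mathrm{diff}(\tau)}{\sqrt{2}}
  \ \text{if } \sigma_A \approx \sigma_B \\
\delta V_\mathrm{gain} &= \varepsilon \, V_\mathrm{diff} \ll \varepsilon \, V_A
\end{aligned}
```

The flip side: any drift or temperature sensitivity that the two references share cancels in V_diff and is therefore invisible in the combined curve.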
I currently have two LTZ1000-based references: one has been powered on 24/7 for over 2 years and the other for almost one year.
Measuring them back to back for almost 2 months and calculating the overlapping Allan deviation results in the following "fingerprint" of the combined stability of the two references:

Note how the graph spans from 40 milliseconds to almost 2 million seconds.
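For anyone wanting to produce the same kind of plot from their own logs, here is a minimal sketch of the overlapping Allan deviation in Python/NumPy. It assumes an evenly spaced, gap-free series `y` (e.g. the difference voltage in volts or ppm) sampled every `tau0` seconds. The function names `overlapping_adev` and `octave_factors` and the octave spacing of τ are my own illustrative choices, not necessarily what produced the graph above; the allantools library (with its `oadev` function) is a ready-made alternative.

```python
import numpy as np


def overlapping_adev(y, tau0, m_values):
    """Overlapping Allan deviation of an evenly sampled, gap-free series.

    y        -- samples, e.g. the back-to-back difference voltage
    tau0     -- sampling period in seconds
    m_values -- averaging factors; tau = m * tau0
    Returns (taus, adevs) as NumPy arrays.
    """
    y = np.asarray(y, dtype=float)
    y = y - y.mean()  # ADEV ignores a constant offset; removing it keeps the cumsum well conditioned
    n = y.size
    c = np.concatenate(([0.0], np.cumsum(y)))  # c[k] = sum of y[:k]
    taus, adevs = [], []
    for m in m_values:
        if m < 1 or 2 * m > n:
            continue  # need at least two adjacent windows of m samples
        avg = (c[m:] - c[:-m]) / m  # avg[j] = mean(y[j:j+m]) for every start index j (overlapping)
        d = avg[m:] - avg[:-m]      # difference between adjacent, non-overlapping windows
        taus.append(m * tau0)
        adevs.append(np.sqrt(0.5 * np.mean(d ** 2)))
    return np.array(taus), np.array(adevs)


def octave_factors(n):
    """Averaging factors 1, 2, 4, ... up to n // 2."""
    m, out = 1, []
    while 2 * m <= n:
        out.append(m)
        m *= 2
    return out
```

Usage would be something like `taus, adev = overlapping_adev(v_diff, 10.0, octave_factors(len(v_diff)))`, where `v_diff` is a hypothetical array holding the 10 s session. Using all overlapping window positions, rather than only disjoint ones, is what gives the "overlapping" estimator its better confidence at long τ.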
Actually, this graph combines two separate measurement sessions:
1) 360,000 samples (NPLC=100, autozero on) with a sampling period of 10 seconds.
2) 92,000 samples (NPLC=1, autozero on) with a sampling period of 0.04 seconds.
Both data sets are shown here:

The two curves are joined at τ = 100 s.
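Joining the sessions amounts to taking the fast (NPLC=1) curve below the split point and the slow (NPLC=100) curve from it onwards. A minimal sketch; the helper name `stitch_adev` is hypothetical and the 100 s split is just a parameter:

```python
import numpy as np


def stitch_adev(taus_fast, adev_fast, taus_slow, adev_slow, tau_split=100.0):
    """Combine two ADEV curves: fast session below tau_split, slow session at or above it."""
    taus_fast, adev_fast = np.asarray(taus_fast, float), np.asarray(adev_fast, float)
    taus_slow, adev_slow = np.asarray(taus_slow, float), np.asarray(adev_slow, float)
    use_fast = taus_fast < tau_split   # short tau: only the 0.04 s session has the resolution
    use_slow = taus_slow >= tau_split  # long tau: only the 10 s session is long enough
    taus = np.concatenate((taus_fast[use_fast], taus_slow[use_slow]))
    adev = np.concatenate((adev_fast[use_fast], adev_slow[use_slow]))
    order = np.argsort(taus)
    return taus[order], adev[order]
```

Where both sessions provide points (roughly 10 s up to the fast session's maximum τ), comparing them can serve as a rough sanity check, keeping in mind the sessions were not recorded at the same time.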
Many other graphs are also worth showing:



Unfortunately, the pressure and relative humidity sensors were not ready until the last 300 hours.

With this much data available, some aggregation is the obvious next step.
Let's downsample the data from T = 10 s to T = 1 hour (the mean of each hour of measurements):

And we can go even further. Here is the mean value of each day:

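Both aggregations are simple block means. A sketch assuming the 10 s series has no gaps, so an hour is exactly 360 samples and a day 8640; with timestamped logs that contain gaps, averaging by timestamp (e.g. pandas `resample`) is the safer route. `block_means` is a hypothetical helper name.

```python
import numpy as np


def block_means(y, samples_per_block):
    """Mean of each consecutive, non-overlapping block; a trailing partial block is dropped."""
    y = np.asarray(y, dtype=float)
    n_blocks = y.size // samples_per_block
    return y[: n_blocks * samples_per_block].reshape(n_blocks, samples_per_block).mean(axis=1)


TAU0 = 10.0                          # sampling period of the long session, in seconds
SAMPLES_PER_HOUR = int(3600 / TAU0)  # 360
SAMPLES_PER_DAY = int(86400 / TAU0)  # 8640
```

For example, `hourly = block_means(v_diff, SAMPLES_PER_HOUR)` and `daily = block_means(v_diff, SAMPLES_PER_DAY)`, with `v_diff` again a hypothetical array of the measurements.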
And then there are statistics of all sorts:

(Disregard the pressure and humidity figures, since that data is missing for most of the run.)
Some are probably useless, but when hunting for sub-ppm improvements the correlation figures come in handy.
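The statistics and correlation figures are easy to reproduce. A sketch assuming each channel (difference voltage, temperature, pressure, humidity) is an array on a common time base, with NaN where a sensor was not yet logging; only samples valid in both series enter the correlation. The names `summary_stats` and `pearson_r` are hypothetical.

```python
import numpy as np


def summary_stats(v):
    """Basic descriptive statistics, ignoring NaN entries (e.g. sensors not yet logging)."""
    v = np.asarray(v, dtype=float)
    v = v[np.isfinite(v)]
    return {
        "n": int(v.size),
        "mean": float(v.mean()),
        "std": float(v.std(ddof=1)),       # sample standard deviation
        "min": float(v.min()),
        "max": float(v.max()),
        "p2p": float(v.max() - v.min()),   # peak-to-peak
    }


def pearson_r(a, b):
    """Pearson correlation coefficient over the samples where both series are valid."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    ok = np.isfinite(a) & np.isfinite(b)
    return float(np.corrcoef(a[ok], b[ok])[0, 1])
```

Keep in mind that two series that both drift slowly can show a large r without any causal link, so a high correlation is a hint to investigate rather than proof.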
Then there are TC (temperature coefficient) measurements, but I'm currently working on a setup with two thermal chambers, one for each reference. More on that when it's ready.
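Until then, one common way to reduce TC data is a least-squares fit of voltage against temperature, with the slope expressed in ppm/K of the nominal output. A sketch only: `tc_ppm_per_k` and `v_nominal` are hypothetical names, a straight-line model is assumed, and thermal lag between the temperature sensor and the reference will smear the result if the temperature changes too quickly.

```python
import numpy as np


def tc_ppm_per_k(temp_c, volts, v_nominal):
    """Linear temperature coefficient in ppm/K from paired temperature/voltage samples."""
    temp_c = np.asarray(temp_c, dtype=float)
    volts = np.asarray(volts, dtype=float)
    slope, _intercept = np.polyfit(temp_c, volts, 1)  # first-order fit, slope in V/K
    return slope / v_nominal * 1e6
```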
I hope this post will bring more focus to the actual characterization of all your reference builds.
Show me the stability of your voltage references below!