It seems to me that the heater should be attached to an aluminum enclosure, say 6 mm thick, with 50 mm of insulation around it. Placing the heater on the LM399 itself creates a large thermal gradient. Self heating makes gradients unavoidable, but they can be minimized.
I'd been working on this off and on all day. I'll just include it as is.
Some random comments.
The resistors are as much of a problem as the zener. The critical parts of the reference should be in an enclosure made from 5-6 mm aluminum. Two U shaped pieces with external heaters attached to two faces and embedded in insulation should be inexpensive and provide a uniform temperature. Place some thermistors in suitable spots to check for gradients, at least for the prototype.
One might let the user input the expected maximum ambient temperature and set the chamber temperature based on that. But that would conflict with fitting an aging curve calculated from the first 500-1000 hours of operation. Aside from temperature, that seems to me to be the largest error source. There is the question of whether it is better to burn in at elevated temperature or at operating temperature. I'm skeptical of having a stable aging curve with high temperature burn in, but it might not matter.
Interestingly, a 40 x 40 mm Peltier device is under $3 each on Amazon in lots of 10. For a shallow enclosure, two might allow setting the operating temperature independently of the ambient temperature. It's a big power drain, but it might be tolerable for a lab reference if the operating temperature were set to the average ambient. It would be quite a bit more complicated to design.
Such an enclosure is easily made using a 20 ton hydraulic press and a very simple die set. Anneal before and after forming and then mill the mating surfaces, drill and tap. Seal the enclosure with butyl sealant to eliminate humidity effects.
In reading back through the thread I realized I'd never properly answered the question about how you get 6.5 digits from a 5.5 digit meter. You deliberately add random Gaussian noise which exceeds the resolution of the 5th digit. You take two long measurements and cross-correlate them. The noise does not correlate, so the cross-correlation only sees the DC value. It's apparently called "dithering" in metrology. The limitation is the length of time it takes to make a measurement. TANSTAAFL.
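Here is a rough Python sketch of the dithering idea, just to show the mechanism. Everything numeric in it is an assumption of mine, not a measurement: an ideal meter that simply rounds to a 100 uV last digit, Gaussian dither of 1 LSB RMS that is independent between the two records, and a made-up 6.950037 V input. The function names are hypothetical. It ignores meter nonlinearity and any drift during the measurement, which a real setup would have to deal with.

```python
import math
import random

def quantize(v, lsb):
    """Ideal meter model (assumption): round to the nearest last-digit step."""
    return round(v / lsb) * lsb

def dithered_record(v_true, lsb, sigma, n, rng):
    """n quantized readings of v_true with Gaussian dither of RMS sigma added."""
    return [quantize(v_true + rng.gauss(0.0, sigma), lsb) for _ in range(n)]

def mean_estimate(record):
    """Plain average of one dithered record."""
    return math.fsum(record) / len(record)

def crosscorr_estimate(rec_a, rec_b):
    """Zero-lag cross-correlation of two records.

    With independent zero-mean noise in each record, the expected value of
    a*b is v_true**2, so the square root estimates the DC value.
    """
    r0 = math.fsum(a * b for a, b in zip(rec_a, rec_b)) / len(rec_a)
    return math.sqrt(r0)

if __name__ == "__main__":
    rng = random.Random(1)      # fixed seed so the run is repeatable
    V_TRUE = 6.950037           # made-up input voltage
    LSB = 100e-6                # assumed last-digit step of the meter
    N = 20000                   # readings per record
    rec_a = dithered_record(V_TRUE, LSB, LSB, N, rng)
    rec_b = dithered_record(V_TRUE, LSB, LSB, N, rng)
    print("single reading, no dither:", quantize(V_TRUE, LSB))
    print("mean of dithered record:  ", mean_estimate(rec_a))
    print("cross-correlation result: ", crosscorr_estimate(rec_a, rec_b))
```

Without dither every reading rounds to the same step, so averaging gains nothing. With dither the uncertainty shrinks roughly as the square root of the number of readings, which is where the long measurement time comes from.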
The early part of the aging curve has the most curvature. It might be that lowering the operating temperature would extend the duration of the large curvature region.
My original premise was that if you built a device from the best low-cost parts and then measured its behavior during a 500-1000 hour burn-in, you could describe future performance well enough to pick up an order of magnitude improvement. Such a burn-in would include temperature excursions to quantify resistor TC, tolerances and aging, in addition to the reference and buffer amp behavior.
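To make the "describe future performance" step concrete, here is a minimal Python sketch of fitting an aging curve to burn-in data and extrapolating it. The drift model is my assumption, not something established in this thread: output = v0 + k * sqrt(hours), a square-root-of-time form often used as a rule of thumb for reference aging. The function names and the numbers in the example are hypothetical and the data is synthetic. A real fit would also have to remove the temperature terms first.

```python
import math

def fit_sqrt_aging(hours, volts):
    """Least-squares fit of volts = v0 + k*sqrt(hours). Returns (v0, k).

    The sqrt-of-time model is an assumption; swap in another model if the
    burn-in data says otherwise.
    """
    x = [math.sqrt(h) for h in hours]
    n = len(x)
    mean_x = math.fsum(x) / n
    mean_y = math.fsum(volts) / n
    sxx = math.fsum((xi - mean_x) ** 2 for xi in x)
    sxy = math.fsum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, volts))
    k = sxy / sxx
    v0 = mean_y - k * mean_x
    return v0, k

def predict(v0, k, hours):
    """Extrapolate the fitted curve to a later time."""
    return v0 + k * math.sqrt(hours)

if __name__ == "__main__":
    # Synthetic burn-in log: one reading every 10 hours out to 1000 hours.
    true_v0, true_k = 6.95, -2e-6      # made-up values for illustration
    hours = list(range(10, 1001, 10))
    volts = [true_v0 + true_k * math.sqrt(h) for h in hours]
    v0, k = fit_sqrt_aging(hours, volts)
    print("fitted v0, k:", v0, k)
    print("predicted output at 8760 h:", predict(v0, k, 8760))
```

The sketch also shows why the chamber temperature has to stay fixed after burn-in: the fitted coefficients only describe aging at the temperature where the data was taken.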
Fluke believes that the observed hysteresis effects are due to stress on the die and has patented temperature annealing of the die on cold startup in the 7001. I'm not convinced that is sufficiently innovative to warrant a patent, but patents are really just a license to sue others.