Apologies for the delay in replying!
Happy New Year, BTW!

What power supply are you using in both cases, and what is the input voltage in both cases?
I'm using the same bench (regulated) power supply for all setups.
The input voltage is the same 15V for all setups.
Are you consistently getting the same readings, and are the readings stable?
Readings are stable per power cycle and, apparently, the same across power cycles.
Today, however, I am getting a different reading than before (still consistent during and across subsequent power cycles): 4.975V with the Gossen meter.
Do you have multiple AD586 that are doing the same thing?
I have another AD586, but I doubt the current one is faulty, since the readings differ between multimeters, not between measurements with a given meter.
And I'm trying to find out what difference between these meters can explain the difference in readings.
Try loading the reference with a 50k load and see if you get the same results.
I might try this, but what is the objective, given that it is some difference between the meters that influences the readings?
What happens if you hook the Fluke 8842A to the voltage reference output and then measure the output voltage one by one using the other multimeters? Do the other meters affect the Fluke 8842A reading too? Can you hook multiple multimeters to the voltage reference output at the same time, and how do the readings differ then?
I tried this and the interesting thing is that connecting any of the other meters at the same time as the Fluke 8842A did not affect the readings of the latter.
It's quite the opposite, actually: the Fluke 8842A seems to impact the readings of the other meters (the other meters showed 5.000V).
Any ideas why?
Do you have the additional noise reduction cap connected? Add a 1uF film cap from pin 8 to GND. It's probably noise that your DMMs are picking up.
No noise reduction cap.
I am aware of the function of pin 8, but I'm not convinced it's noise we're dealing with here (might be wrong, though).
The reason I'm not convinced: I've added an LT1001 op-amp connected as a buffer to the output of the AD586 and guess what reading I'm getting from the output of the op-amp (with all my meters): 5.000V.
If there were noise on the output of the AD586, I would expect it to be propagated by the op-amp, right?
Also, since using a buffer on the output seems to make the issue (i.e. the difference in readings) go away, I am inclined to think the problem is the load introduced by the meters...
But, again, all my meters in the list above have an input impedance of ~10Mohm...
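
To put a number on the loading idea, here is a rough Python sketch. It assumes the meter behaves as a purely resistive 10Mohm load and that the reference can be modelled as an ideal 5.000V source with a series resistance; both are simplifications, and the function names are just made up for illustration. It works out what source resistance would be needed to pull the reading down to 4.975V.

```python
# Sketch only: treats the reference as an ideal source plus a series
# resistance, and the meter as a pure resistor (both are assumptions).

def implied_source_resistance(v_open, v_loaded, r_load):
    """Series resistance that would explain v_loaded across r_load."""
    return r_load * (v_open / v_loaded - 1.0)

def loaded_voltage(v_open, r_source, r_load):
    """Voltage across r_load for a simple resistive divider."""
    return v_open * r_load / (r_source + r_load)

R_METER = 10e6      # nominal meter input resistance, ohms
V_BUFFERED = 5.000  # reading at the LT1001 buffer output, volts
V_DIRECT = 4.975    # reading with the Gossen meter directly on the AD586, volts

r_s = implied_source_resistance(V_BUFFERED, V_DIRECT, R_METER)
print(f"Implied source resistance: {r_s / 1e3:.1f} kOhm")
print(f"Check: {loaded_voltage(V_BUFFERED, r_s, R_METER):.3f} V")
```

Under these assumptions the implied source resistance comes out at roughly 50 kohm, which is hard to believe for a voltage reference output. So a plain resistive divider with a 10Mohm meter does not seem to explain the 25mV difference on its own.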