Let me share an update on my measurement results (10 V output, so we are not looking at the raw zener voltage).
Any step in the signals can be explained by instruments being switched on or off, equipment being removed from the setup, batteries being changed and the like, so what really counts is the overall behavior of the references, not the details.
What can be observed is an initial downward drift, then a change in sign, and at the moment a steady state. We will have to be patient and see how it evolves from here.
My current suggestion for a future experiment is to have three ADR references running at different oven temperatures:
1. running the ADR oven at its zero t.c. temperature (~50 °C, that is a resistor ratio of about 11.5:1 for the datasheet circuit or about 0.52 V for the circuit using the temperature sensing transistor as a diode)
2. running the ADR oven at 75 °C (that is a resistor ratio of about 13:1 for the datasheet circuit or about 0.465 V for the circuit using the temperature sensing transistor as a diode)
3. running the ADR oven at 100 °C (that is a resistor ratio of about 14.6:1 for the datasheet circuit or about 0.41 V for the circuit using the temperature sensing transistor as a diode)
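If you need a set point between the three listed temperatures (e.g. for the zero t.c. temperature of an individual zener), a simple linear interpolation over the numbers above should get you close. The diode voltages imply roughly -2.2 mV/K, which is in the usual range for a silicon junction. This is only a sketch: it assumes the relation is linear between the quoted points, and the table values are the approximate figures from the list, not calibrated data. The function name interpolate_setpoint is made up for illustration.

```python
# Approximate set points from the list above:
# (oven temperature in degC, resistor ratio for the datasheet circuit,
#  voltage in V for the circuit using the sensing transistor as a diode)
SETPOINTS = [
    (50.0, 11.5, 0.52),
    (75.0, 13.0, 0.465),
    (100.0, 14.6, 0.41),
]


def interpolate_setpoint(temp_c):
    """Return (resistor_ratio, diode_voltage) for temp_c by linear interpolation.

    Assumes linear behavior between the quoted points; only valid in 50..100 degC.
    """
    if not SETPOINTS[0][0] <= temp_c <= SETPOINTS[-1][0]:
        raise ValueError("temperature outside the 50..100 degC range of the quoted points")
    # Find the segment containing temp_c and interpolate both quantities.
    for (t0, r0, v0), (t1, r1, v1) in zip(SETPOINTS, SETPOINTS[1:]):
        if t0 <= temp_c <= t1:
            f = (temp_c - t0) / (t1 - t0)
            return r0 + f * (r1 - r0), v0 + f * (v1 - v0)


ratio, volts = interpolate_setpoint(62.5)
print(f"ratio ~ {ratio:.2f}:1, diode ~ {volts * 1000:.1f} mV")
```

Extrapolating outside 50 to 100 °C is deliberately refused, since nothing in the numbers above says the relation stays linear there.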
Comparing the drift of these three references should indicate how long the references take to stabilize at each temperature, though it is not an absolute measure. Once stabilized, the ovens of the latter two references can be set to the zero t.c. temperature of their zeners. For this experiment I would buffer the raw zener voltages and preferably use nanovoltmeters to measure the differences.
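One way to put a number on "time to stabilize" from the logged differences is a sliding-window least-squares slope. The reference counts as settled once every window from some point onward has a slope below a chosen limit. Requiring *all* later windows to pass matters here: the drift described above changes sign, so the slope passes through zero at the turning point, and a naive "first flat window" test would trigger there. This is a sketch under assumptions. The window length, the slope limit and the function names (window_slope, settling_time) are hypothetical, and the units are whatever your logger records (e.g. hours and ppm).

```python
from typing import List, Optional, Sequence


def window_slope(t: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y versus t (needs >= 2 distinct times)."""
    n = len(t)
    mt = sum(t) / n
    my = sum(y) / n
    num = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
    den = sum((ti - mt) ** 2 for ti in t)
    return num / den


def settling_time(t: List[float], y: List[float], window: int,
                  max_slope: float) -> Optional[float]:
    """Return the time at which the drift stays below max_slope for good.

    Every window starting at or after the returned point must have
    |slope| <= max_slope; a single flat window at a turning point is not enough.
    Returns None if the series never settles.
    """
    slopes = [window_slope(t[i:i + window], y[i:i + window])
              for i in range(len(t) - window + 1)]
    settled_from = None
    for i, s in enumerate(slopes):
        if abs(s) <= max_slope:
            if settled_from is None:
                settled_from = i      # start of a candidate settled stretch
        else:
            settled_from = None       # drift resumed, discard the candidate
    if settled_from is None:
        return None
    # Report the end time of the first window of the final settled stretch.
    return t[settled_from + window - 1]
```

Running the same function with identical parameters on all three references gives a relative comparison, which fits the point above that this is not an absolute measure.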
-branadic-