Thanks for the effort of finding out about the averaging method used.

Averaging over about 1 minute makes sense for the gain. Normally, if everything is done right, this should give relatively low noise values for the gain. I would not expect the gain to drift much faster than that, though with noisy resistors (e.g. the NOMCA arrays or, worse, thick film resistors) there can be multiplicative 1/f noise too. Normal thermal drift would be rather slow, so 1 minute steps sound OK for this.
However, for the offset, averaging over 1 minute is kind of a disaster, as it adds lots of 1/f noise from the amplifier and ADC. It sets the relevant frequency for the 1/f noise to some 0.01 Hz instead of the possible 1.5 or 15 Hz for 10 or 1 PLC. The added 1/f noise would be especially bad when using the BS-amplifier (AD548), which has lots of 1/f noise.
The natural way would be to use the zero readings directly, 1:1. This way one can also avoid the delay between the zero and signal readings. A good amount of averaging would be the zero reading before and the one after the signal, no more. Later digital filtering of the result (signal - zero) also filters the zero reading, so there is no need for extra zero filtering. Filtering the zero reduces the white noise a little but adds 1/f noise, so there is an optimum time window. That optimum may well be in the 10-50 ms range, so extra zero filtering is maybe a thing for faster readings < 1 PLC.
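To illustrate the point about the zero handling, here is a rough simulation sketch, not the meter's actual algorithm. It uses synthetic noise only: the 1/f generator, the noise mix (flicker with unit std plus white noise with std 0.2), the block length of 300 zero readings and all names are my assumptions. With a white-noise dominated front end the comparison can come out the other way round.

```python
import numpy as np

def flicker_noise(n, rng):
    """Synthetic 1/f noise: white noise shaped in the frequency domain, unit std."""
    freqs = np.fft.rfftfreq(n)
    spectrum = rng.standard_normal(len(freqs)) + 1j * rng.standard_normal(len(freqs))
    spectrum[0] = 0.0                    # no DC component
    spectrum[1:] /= np.sqrt(freqs[1:])   # amplitude ~ 1/sqrt(f), so power ~ 1/f
    x = np.fft.irfft(spectrum, n=n)
    return x / x.std()

rng = np.random.default_rng(1)
block = 300                 # assumed: zero readings per block (~1 minute at 10 PLC)
n_blocks = 50
n_pairs = block * n_blocks  # number of zero/signal pairs
n = 2 * n_pairs + 1         # readings alternate: zero, signal, zero, ..., zero

# Assumed noise mix: flicker dominated, as with a noisy amplifier.
noise = flicker_noise(n, rng) + 0.2 * rng.standard_normal(n)
zeros = noise[0::2]         # n_pairs + 1 zero readings (true offset is 0)
signals = noise[1::2]       # n_pairs signal readings (true signal is 0)

# Method A: subtract the mean of the zero readings just before and after.
result_adjacent = signals - 0.5 * (zeros[:-1] + zeros[1:])

# Method B: subtract one averaged zero per fixed block of readings.
block_zero = zeros[:n_pairs].reshape(n_blocks, block).mean(axis=1)
result_block = signals - np.repeat(block_zero, block)

print("std, adjacent zeros :", result_adjacent.std())
print("std, block averaged :", result_block.std())
```

The adjacent-zero version works like a short high-pass filter on the combined reading sequence, so it removes most of the slow 1/f part, at the price of a little more white noise.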
For some reason the early plot shows quite some drift in the voltage readings, possibly from the ADC gain. So there may also be some source of gain drift, like high TC resistors used in the hope that the real time gain measurement can correct for it. With so much drift one may have to shorten the interval and use something more like a running average instead of just fixed blocks.
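As a sketch of the difference between fixed blocks and a running average for the gain, assuming a simple linear drift: the drift rate (2 ppm per window), the noise level (0.2 ppm white) and the window length of 301 readings are made-up numbers for illustration, not values taken from the captured data.

```python
import numpy as np

def block_average(x, block):
    """Replace each sample by the mean of its fixed block (partial last block dropped)."""
    n_full = (len(x) // block) * block
    means = x[:n_full].reshape(-1, block).mean(axis=1)
    return np.repeat(means, block)

def running_average(x, window):
    """Centered moving average; 'valid' mode loses window // 2 samples at each end."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

rng = np.random.default_rng(0)
n, window = 3000, 301       # hypothetical: 3000 gain readings, ~1 minute window
half = window // 2

# Assumed gain behaviour: linear drift of 2 ppm per window plus 0.2 ppm white noise.
true_gain = 1.0 + 2e-6 * np.arange(n) / window
readings = true_gain + 0.2e-6 * rng.standard_normal(n)

blk = block_average(readings, window)
run = running_average(readings, window)

# Compare each estimate with the true gain at the same time index.
err_block = blk - true_gain[:len(blk)]
err_run = run - true_gain[half:half + len(run)]

rms_block = np.sqrt(np.mean(err_block ** 2))
rms_run = np.sqrt(np.mean(err_run ** 2))
print("RMS gain error, fixed blocks    :", rms_block)
print("RMS gain error, running average :", rms_run)
```

With fixed blocks the gain estimate is only right near the middle of each block and off by up to half the drift per block at the edges. The centered running average follows a linear drift, though it needs readings from after the point being corrected, so it fits post-processing of captured data better than real-time use.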
With the captured data one could in theory use a more sensible calculation to get a lower noise result.