In most of the later data files the raw ADC readings look good, so there is little residual noise for the 2 consecutive readings (e.g. columns 10 and 17) of the µC internal ADC. The version with the slower ADC clock still seems to have some problems (more noise, and the last column is always 0). So one can probably go back to the faster ADC clock.
The comparison of the noise for the different modulation speeds shows quite some difference: for the cases P, Q and W (double, normal and half the speed) I get noise of 2.9 µV, 2 µV and 1.4 µV respectively. So there is quite some noise related to the switching / jitter.
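Just as a rough plausibility check (the only real numbers here are the three noise values above): the ratios 2.9/2.0 and 2.0/1.4 are both close to sqrt(2), which is roughly what one would expect if each switching edge adds an independent error charge, so that the noise grows with the square root of the modulation speed.

from math import sqrt
noise_P, noise_Q, noise_W = 2.9e-6, 2.0e-6, 1.4e-6    # double / normal / half speed
print(noise_P / noise_Q, noise_Q / noise_W, sqrt(2))  # ~1.45, ~1.43 vs ~1.41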
One could try to slow down the modulation, e.g. to make the normal mode about as slow as case W, by increasing the xdelay constant in the ASM program (e.g. from 12 to something like 40; a rough estimate is sketched below). The slightly higher clock (16 vs 12 MHz) makes it run a bit faster anyway. The integration cap is still large enough for this.
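A back-of-the-envelope sketch of that scaling (assuming the extra delay is simply xdelay cycles at the CPU clock and ignoring the fixed loop overhead, so the real value may land closer to 40):

f_old, f_new = 12e6, 16e6          # old vs. new CPU clock in Hz
xdelay_old = 12
slowdown = 2                        # aim for about half the modulation speed
xdelay_new = xdelay_old * (f_new / f_old) * slowdown
print(xdelay_new)                   # -> 32, so somewhere in the 30...40 range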
The other point would be trying to find the actual jitter source. The main candidates are the oscillator, the HC74 and the LV4053. This could be the chip itself, or its supply / decoupling. For finding the weak point on the HW side the faster modulation would be an advantage. Normally the HC74 and LV4053 should not be so bad, unless their supply is unstable. Trouble with the clock decoupling would likely also be visible in the INL test via the difference test (B).
A very short xdelay (in the ASM program) could explain a little of it, though I still think there is more jitter than there needs to be. A value of 1000 is likely way too high and may drive parts into saturation. The upper limit is likely around 63, as there may be some 4*xdelay value that has to fit in 1 byte (see the quick check below). The numbers in the ASM and Pascal programs also have to match!
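Quick check of that byte limit (just arithmetic, nothing project specific):

XDELAY_MAX = 255 // 4               # = 63: 4*63 = 252 still fits in one byte
assert 4 * XDELAY_MAX <= 255
assert 4 * (XDELAY_MAX + 1) > 255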
A noise of only 100 nV would be too good to be true. The Johnson noise of the resistors should contribute about 300 nV. The best noise I got with the AVR version was with slow modulation, at some 420 nV. With faster modulation the noise is more like 500 nV. With the ARM version and a slightly slower modulation I get down to 360 nV, which I would consider good enough and better than hoped for.
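For reference, the Johnson noise estimate is just v_rms = sqrt(4*k*T*R*B); the resistance and bandwidth below are only placeholder values to show the order of magnitude, the real ones depend on the actual resistors and the effective conversion bandwidth:

from math import sqrt
k = 1.380649e-23                    # Boltzmann constant in J/K
T = 300.0                           # temperature in K
R = 50e3                            # example source resistance in ohm (assumed)
B = 100.0                           # example effective noise bandwidth in Hz (assumed)
print(sqrt(4 * k * T * R * B))      # ~2.9e-7 V, i.e. the ~300 nV order mentioned above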
edit:
I looked at the raw data: there is still some scattering in the raw results, it is just that the math for the 7 V ref reading is way off, and this then divides down the result so much. With the slower ADC clock the readings also show more scatter. The values for the K1 and K2 factors should normally be relatively stable. So no real change is needed unless there is some HW change or a very different temperature. If K2 changes with the setting of the trimmer (no need to be strictly in the center of the ADC range, just avoid hitting the bounds) this would indicate going too high in the residual voltage. I don't exactly know how the 2 transistors behave.
2nd edit:
If the ASM code still has xdelay=12 and the Pascal code had 0, then the scaling was wrong (with massive INL/DNL errors) before too. So the actual noise may already be better by about a factor of 2, and the apparent 1.4 µV noise would be more like 700 nV for the slow mode. Still not very good, but already useful. One can get a quick check of the linearity by watching a capacitor discharge: an error in the size of the runup steps is quite obvious there (a rough sketch of such a check is below).
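A rough sketch of how such a discharge check could look on the PC side (file name and column layout are made up here, this is not the actual Pascal readout code): an RC discharge is a clean exponential, so log(V) over time should be a straight line, and any error in the runup step size shows up as structure in the residuals.

import numpy as np
t, v = np.loadtxt("discharge.txt", usecols=(0, 1), unpack=True)   # time, reading
coeff = np.polyfit(t, np.log(v), 1)              # log(V) = log(V0) - t/(R*C)
residual = np.log(v) - np.polyval(coeff, t)
print("RC estimate [s]:", -1.0 / coeff[0])
print("peak-to-peak relative deviation:", residual.max() - residual.min())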