I wonder how linear it is.
A legitimate question - and there is a reason why I've stated that we have to expect up to 10% error...
I've done another test run and this time the measurements were different - the error was higher.
Here are the results:
As can be seen, the error is consistently between +3% and +7%; since that is a spread of only 4 percentage points, the linearity error is below 4%.
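To illustrate that arithmetic with a quick sketch (the test points and gain errors below are made up for illustration, not my actual measurements): as long as every reading's gain error stays inside a 4% band, the deviation from the best-fit straight line stays below 4%.

```python
# Hypothetical readings: a gain error confined to +3%..+7%
# bounds the linearity error by the width of that band.
nominal = [1, 2, 5, 10, 20, 50]  # mV, hypothetical test points
gain_err = [0.031, 0.068, 0.045, 0.052, 0.039, 0.061]  # within +3%..+7%
readings = [n * (1 + e) for n, e in zip(nominal, gain_err)]

# Best single gain: least-squares straight line through the origin
g = sum(r * n for r, n in zip(readings, nominal)) / sum(n * n for n in nominal)

# Linearity error: deviation from that line, relative to the reading
lin_err = [abs(r - g * n) / r for r, n in zip(readings, nominal)]
print(max(lin_err))  # stays below 0.04 while gain errors stay in a 4% band
```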
Still, I wondered where the constant error came from, so I did a long-term observation on the DSO, where I noticed some ultra-low-frequency interference with a peak-to-peak amplitude of almost 10% of the measured value.
I checked the 1mV test signal with a Keithley 2015 THD bench meter and sure enough, the reading was unstable there as well, fluctuating between 950µV and 1.047mV.
Finally I realized what was going on and changed the test frequency to 60Hz, clearly separating it from the mains frequency. Lo and behold, the DMM reading was suddenly a fairly stable 1.017mV.
The ultra-low-frequency interference component was simply the beat (difference) frequency between the AWG's 50Hz output and the 50Hz mains: the two are never exactly equal, and their difference was far too low to average out within a reasonable number of signal periods. On the DSO, I could use the line trigger and watch the trigger point move very slowly - slower than the minute hand of a clock...
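A quick numeric sketch of that beat effect (everything here is assumed for illustration: a mains frequency of 50.02Hz, 5% mains pickup, a 0.5s measurement gate): at 50Hz the beat period is 50s, far longer than any sane gate time, so successive readings wander; at 60Hz the 10Hz difference averages out within a single gate.

```python
import math

def rms_reading(f_sig, f_int, gate=0.5, fs=20_000, t0=0.0):
    """Simulated true-RMS reading over one gate window starting at t0."""
    n = int(gate * fs)
    acc = 0.0
    for i in range(n):
        t = t0 + i / fs
        v = math.sin(2 * math.pi * f_sig * t)          # test signal, 1.0 peak
        v += 0.05 * math.sin(2 * math.pi * f_int * t)  # 5% mains pickup (assumed)
        acc += v * v
    return math.sqrt(acc / n)

f_mains = 50.02  # Hz, hypothetical mains frequency, slightly off nominal

# 50Hz test signal: the 0.02Hz beat drifts through its 50s cycle much more
# slowly than the gate, so readings depend on when the gate happens to start:
r50 = [rms_reading(50.0, f_mains, t0=k * 5.0) for k in range(5)]

# 60Hz test signal: the ~10Hz difference completes many cycles per gate
# and cancels out, so every reading lands on essentially the same value:
r60 = [rms_reading(60.0, f_mains, t0=k * 5.0) for k in range(5)]

print(max(r50) - min(r50))  # several percent of the reading
print(max(r60) - min(r60))  # essentially zero
```

Stretching the gate time helps at 60Hz but not at 50Hz - to cancel a 0.02Hz beat you would need gates on the order of minutes.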
I can't be bothered to repeat the whole test at 60Hz right now, but might do it eventually. Anyway, because of this, the results of my test run can be taken as something of a worst-case scenario - and they serve as a warning at the same time: measuring mains-related signals may include interference that is hard to detect and cannot be separated from the signal.