You are right, of course. I did some quick calculations: with the number of points in the RAW subset, even 32-bit float would give an error of only ~0.01%, yet in your last example you are seeing ~0.8%. However, I don't think they do anything as sophisticated as suggested, as even 5000 points would probably give around 1% error, depending on the complexity of the waveform. With a fixed number of points the error would grow as the number of samples increased, but they could scale the subset by some fixed ratio to maintain reasonable accuracy.
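A back-of-the-envelope check of that scaling claim. The scope's actual subset selection is unknown; this sketch just assumes a random subset of the record and asks how the RMS error behaves with subset size for a multi-harmonic test waveform:

```python
import numpy as np

# Sketch only: random subsets of a "complex" waveform (the scope's real
# decimation scheme is unknown). Error should shrink roughly as 1/sqrt(N).
rng = np.random.default_rng(0)
n_full = 1_000_000
t = np.arange(n_full) / n_full
x = (np.sin(2 * np.pi * 137 * t)
     + 0.5 * np.sin(2 * np.pi * 411 * t + 1.0)
     + 0.25 * np.sin(2 * np.pi * 1097 * t + 2.0))

true_rms = np.sqrt(np.mean(x * x))       # exactly sqrt(0.65625) here

for n_sub in (1000, 5000, 100_000):
    sub = x[rng.choice(n_full, n_sub, replace=False)]
    err = abs(np.sqrt(np.mean(sub * sub)) - true_rms) / true_rms
    print(f"{n_sub:>7} random points: RMS error ~ {100 * err:.3f}%")
```

For this waveform the error lands in the low-percent range at a few thousand points, consistent with the ~1% figure above, though the exact numbers depend on the waveform's shape.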
It's likely they re-use existing code wherever possible from their older scopes, which, if I'm not mistaken, were Zynq based and had much less CPU power (one or two cores at under 1 GHz). They could definitely make things more accurate in real time with some effort. However, even in stop mode, calculating over the whole 500 Mpts dataset is going to be slow, so I can see why they would use a subset; 1% for this class of scope is fine IMO.
It is not OK when a scope that is 1/3 of the price uses 10 Mpts for calculations. Measuring complex signals is the name of the game. For a pure sine signal I don't need RMS measured; it is mathematically tied to the P-P value.
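For reference, that P-P/RMS link for a pure sine is Vrms = Vpp / (2·√2), with a crest factor of √2. A quick numeric check:

```python
import numpy as np

# Pure sine: Vrms = Vpp / (2 * sqrt(2)), crest factor Vpeak/Vrms = sqrt(2).
t = np.arange(100_000) / 100_000
vpp = 5.0
v = (vpp / 2) * np.sin(2 * np.pi * 10 * t)   # 10 full cycles

v_rms = np.sqrt(np.mean(v * v))
print(v_rms, vpp / (2 * np.sqrt(2)))         # both ~1.7678 V
print(v.max() / v_rms, np.sqrt(2))           # both ~1.4142
```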
Let me remind you what I wrote: RMS (AC or AC+DC) is vulnerable to crest factor errors.
While a large discrepancy found when measuring the scope's internal noise is what started this question about RMS measurements, that is the wrong type of signal for verifying that RMS works properly: because of ever-changing bandwidth and internal shenanigans it may be changing all the time. Also, noise can be measured even when undersampling with only 1000 points out of 100 Mpts; you just need to sample repetitively for some time. You need a periodic complex signal for the test, to expose the "holes in vision"...
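To see why noise is such a forgiving test signal, consider this sketch with white Gaussian noise (σ = 1 V, not the scope's real noise): even a single 1000-point grab estimates the RMS well, because uncorrelated samples have no "holes" to miss, and repeated grabs tighten it further:

```python
import numpy as np

# White-noise stand-in for scope internal noise, sigma = 1 V (an assumption).
rng = np.random.default_rng(42)
record = rng.normal(0.0, 1.0, 1_000_000)

one_grab = record[:1000]                     # a single 1000-point subset
rms_one = np.sqrt(np.mean(one_grab ** 2))    # typically within a few % of 1.0

# "sample repetitively for some time": pool the mean-square over 200 grabs
grabs = record[: 200 * 1000].reshape(200, 1000)
rms_avg = np.sqrt(np.mean(grabs ** 2))
print(rms_one, rms_avg)
```

A periodic signal with narrow features has no such luck: a decimated subset can systematically miss or over-count the features, which is exactly what the PWM test below exposes.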
So an external, known signal needs to be used.
Secondly, the signal has to have a large crest factor to reveal potential problems.
One scenario where the old DS1000Z made large errors was, for instance, a PWM signal with large amplitude and low duty cycle: a signal switching between 0 and 5 V at a 2% ON duty cycle. With 10-20 of those pulses on the screen it still calculated about right. Make the timebase long enough to capture 1000 cycles and it would start showing large errors, even though the sample rate did not drop and long memory was enabled.
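This failure mode is easy to reproduce in simulation. The decimation strategy below (a fixed 1000-point every-Nth-sample subset) is my assumption, not Rigol's documented algorithm, but it shows the mechanism: the true RMS of a 0→5 V, 2% duty PWM is 5·√0.02 ≈ 0.707 V no matter how many cycles fit in the record, yet the fixed subset can miss the narrow pulses entirely once the record holds many cycles:

```python
import numpy as np

# 0 -> 5 V PWM, 2% ON duty; pulse phase within the cycle is arbitrary.
def pwm(n_samples, n_cycles, high=5.0, duty=0.02, offset=0.37):
    per = n_samples // n_cycles
    start, width = int(offset * per), int(duty * per)
    x = np.zeros(n_samples)
    for k in range(n_cycles):
        x[k * per + start : k * per + start + width] = high
    return x

def rms(x):
    return np.sqrt(np.mean(x * x))

n = 1_000_000
for cycles in (20, 1000):
    x = pwm(n, cycles)
    sub = x[:: n // 1000]                # assumed fixed 1000-point subset
    print(f"{cycles:>5} cycles: full RMS = {rms(x):.4f} V, "
          f"subset RMS = {rms(sub):.4f} V")
```

With 20 cycles the subset still catches every pulse and reads about right; at 1000 cycles the pulses fall between the decimated samples and the subset RMS collapses. A different pulse phase relative to the decimation grid can just as easily inflate the reading instead.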
Other scopes, on the other hand, showed a measurement that stayed largely unchanged until the timing resolution dropped because of sampling.
With signals that have large crest factors and a more complex shape than PWM it was even more visible, but those signals are hard for everybody to replicate and therefore hard to use for comparing results.
Attached is an Arb signal at 100 ns/div and 200 µs/div, a 2000x time spread. All measurements include the AWG's errors and the fact that both the AWG and the scope had only been on for a few minutes, so they were still drifting slightly.