Thanks for the explanation, Performa01! It's good to know that it's not something wrong with my scope.
From an ignorant user's standpoint, the behavior does look like a bug to me. I can definitely understand that it's processing a lot of data, and presenting that data will not always be real-time. But as I could clearly see, presenting the data is not a bottleneck in itself, as it's quite fast when the memory length is shorter.
So the fact that the trace lags when updating measurements makes it look like the classic "doing background work on the UI thread" issue. That is usually solved by doing the time-consuming work in the background and updating the UI when done. There could be good reasons for doing it the current way, including technical difficulties or complexity, accuracy (though since it only updates once a second anyway, it's not really representing any current state), or they don't worry about it since no one has complained.
I understand what you're saying - and I also can only speculate about the reasons.
On the other hand, I have had quite a few discussions with Siglent R&D about measurements and I know that they do care, even though quite often I am the only one complaining about something.

The fact that it works smoothly with 1.4Mpts record length (which you generally should not take for granted) shows that the implementation in the Siglent DSOs isn't that bad after all. Manufacturers usually don't specify the max. data length for math and measurements, but you can bet not many will actually analyze up to 14Mpts. There is one particular (expensive) brand whose DSOs are infamous for becoming unresponsive as soon as they have anything demanding (like math or measurements) to do. I'm not sure you'd like such an approach better?
The bottleneck is certainly processing and bus transfer speed as well as the (limited) size of fast internal memory, and embedded platforms (even with the pretty powerful Xilinx Zynq) cannot be compared to a modern PC (which is what upper-midrange and high-end DSOs are based on). The scope cannot acquire and display new data while analyzing the sample memory at the same time, because the data being analyzed needs to remain consistent during analysis. With 14Mpts record length, there are "only" 3 records in the history (42Mpts total per channel pair!), hence the record currently being analyzed gets overwritten by the acquisition process much faster than the CPU can analyze it to get all the measurements. As a consequence, the acquisition process has to be halted until analysis is complete.
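To put rough numbers on this, here is a back-of-the-envelope sketch. Only the 42Mpts history depth and the two record lengths come from this thread; the acquisition and analysis rates are made-up placeholders (not Siglent figures), and the function names are purely illustrative. It also ignores trigger dead time, so treat it as an illustration of the ratio, nothing more.

```python
# Rough arithmetic only. HISTORY_PTS is taken from the post; the two
# rates passed in below are hypothetical placeholders, not specs.
HISTORY_PTS = 42_000_000  # total history memory per channel pair


def records_in_history(record_pts: int) -> int:
    """How many complete records fit into the history memory."""
    return HISTORY_PTS // record_pts


def must_pause(record_pts: int, acq_pts_per_s: float,
               analysis_pts_per_s: float) -> bool:
    """True if the analysed record would be overwritten before analysis ends.

    The history acts like a ring: acquisition wraps back onto the record
    under analysis once the other (n - 1) slots have been refilled.
    """
    n = records_in_history(record_pts)
    time_until_overwrite = (n - 1) * record_pts / acq_pts_per_s
    time_to_analyse = record_pts / analysis_pts_per_s
    return time_to_analyse > time_until_overwrite


for pts in (14_000_000, 1_400_000):
    # Placeholder rates: 500 Mpts/s acquisition, 50 Mpts/s analysis.
    print(pts, records_in_history(pts), must_pause(pts, 500e6, 50e6))
```

With these placeholder rates, the 14Mpts case has only 3 slots and analysis loses the race, while at 1.4Mpts there are 30 slots and analysis finishes well before the ring wraps around.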
A 4-channel SDS1004X-E has a total of 28Mpts sample memory (up to >100Mpts in History and Sequence mode). If we were able to take a "snapshot" of the sample memory, i.e. copy the entire 28Mpts to some internal memory of the ARM CPU (on a separate memory bus) and then analyse it locally in the background, then the interruptions of the acquisition and consequently display process could certainly be much shorter and would most likely not be noticeable at all even with 14Mpts record length. But my guess would be that there is just no local memory on a separate bus that is as big as the maximum record length for all channels together (at least 28MB) ...
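Just to illustrate the "snapshot" idea in software terms, here is a toy sketch. It says nothing about how the Siglent firmware actually works; `SampleMemory`, `snapshot` and `measure` are hypothetical names, and a 1000-byte buffer stands in for the real sample memory. The point is only that acquisition is blocked for the duration of the copy, not for the duration of the analysis.

```python
import threading


class SampleMemory:
    """Toy stand-in for the acquisition memory (hypothetical, not firmware)."""

    def __init__(self, n_points: int) -> None:
        self._data = bytearray(n_points)
        self._lock = threading.Lock()

    def acquire(self, value: int) -> None:
        """Overwrite the whole record, as a new acquisition would."""
        with self._lock:
            self._data[:] = bytes([value]) * len(self._data)

    def snapshot(self) -> bytes:
        """Copy the record; acquisition is blocked only while copying."""
        with self._lock:
            return bytes(self._data)


def measure(record: bytes) -> tuple[int, int]:
    """Slow analysis (here just min/max) working on the private copy."""
    return min(record), max(record)


mem = SampleMemory(1000)
mem.acquire(7)
snap = mem.snapshot()
mem.acquire(9)         # acquisition carries on while the snapshot is analysed
print(measure(snap))   # still reflects the record taken before the overwrite
```

The catch is exactly the one mentioned above: the copy needs somewhere to live, and that spare memory has to be as large as the record being analysed.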
To cut a long story short, neither the UI nor the display is blocked by the measurements; it is actually the acquisition that needs to be briefly paused in order to prevent it from corrupting the data currently being analysed. With shorter record lengths (either by limiting the record length in the Acquisition menu or by selecting a faster timebase) this problem eases pretty quickly.