Not sure if anybody is still interested, but I would like to write down some of my findings.
Implementing the digital delay was a bit trickier than expected, and I had to make some more modifications. The strangest thing is that the delay of one loop cycle seems to be an odd value: it is not constant, and it averages something like 41.546 ns. See attached image...
40.0 ns would make perfect sense to me, as the MCU is running at 100 MHz, and 4 processor cycles sound reasonable for incrementing a variable, doing a comparison (subtraction and zero-flag check), and a conditional jump. How can I be getting weird, non-constant fractions of a processor cycle? Or is it all down to the non-linearity of my oscilloscope? I take the measurements manually with the cursors, placing them on the edges of the pulse at the 50% level.
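To put numbers on it: at 100 MHz one cycle is 10 ns, so 40.0 ns is exactly 4 cycles and 41.546 ns is about 4.15 cycles. Below is a minimal sketch of that arithmetic. The second function rests on an assumption I have not verified: that every iteration takes either 4 or 5 cycles (an occasional stall, e.g. from flash wait states or an interrupt, would behave like that). Under that assumption roughly 15% of iterations would need the extra cycle to produce the measured average. The function names are made up for illustration.

```python
CPU_CLOCK_HZ = 100e6  # core clock stated in the post


def cycles_per_iteration(period_ns: float, clock_hz: float = CPU_CLOCK_HZ) -> float:
    """Convert a measured per-iteration period (ns) into CPU cycles."""
    cycle_ns = 1e9 / clock_hz  # 10 ns per cycle at 100 MHz
    return period_ns / cycle_ns


def extra_cycle_fraction(period_ns: float, base_cycles: int = 4,
                         clock_hz: float = CPU_CLOCK_HZ) -> float:
    """Hypothetical model: each iteration takes base_cycles or base_cycles + 1.

    Returns the fraction of iterations that would need the extra cycle
    to give the observed average period.
    """
    return cycles_per_iteration(period_ns, clock_hz) - base_cycles


print(cycles_per_iteration(40.0))                # → 4.0
print(round(cycles_per_iteration(41.546), 4))    # → 4.1546
print(round(extra_cycle_fraction(41.546), 4))    # → 0.1546
```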
The funny thing is, I'd need a DSO with way more than 1 GS/s to get the timing resolution my 40 MHz analog scope seems to offer.
Of course, the horizontal sweep speed of my analog oscilloscope is not 100% spot on, but I calculated a correction factor using my frequency generator. The error seems to be consistent across the ranges I use, so I can easily compensate for it before doing any other calculations.
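A sketch of such a correction, assuming the factor is simply the ratio of the known reference period to what the cursors read, averaged over a few reference frequencies (reasonable if the error really is consistent across ranges). The calibration values below are hypothetical placeholders, not readings from my scope.

```python
from statistics import mean


def correction_factor(pairs):
    """pairs: (true_period_ns, scope_reading_ns) tuples from a known reference.

    Returns the mean ratio true/reading.
    """
    return mean(true / reading for true, reading in pairs)


def correct(reading_ns: float, factor: float) -> float:
    """Apply the timebase correction to a raw cursor reading."""
    return reading_ns * factor


# Hypothetical calibration: 10 MHz (100 ns) read as 98 ns, 5 MHz (200 ns) read as 196 ns.
k = correction_factor([(100.0, 98.0), (200.0, 196.0)])
print(round(correct(49.0, k), 3))  # → 50.0
```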
I also checked the MCU clock input, which is spot on at 8 MHz (constant phase shift against the function generator set to 8 MHz).
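For reference, a phase drift converts to a frequency offset as Δf = Δφ / (360° · Δt), so a phase that stays put over a long observation bounds the offset very tightly. This only shows that the MCU clock agrees with the generator; the absolute accuracy is still limited by the generator's own reference. A small sketch, with hypothetical function names and example numbers:

```python
def frequency_offset_hz(phase_drift_deg: float, interval_s: float) -> float:
    """Frequency difference implied by a phase drift observed over interval_s."""
    return phase_drift_deg / (360.0 * interval_s)


def offset_ppm(offset_hz: float, nominal_hz: float) -> float:
    """Express a frequency offset in parts per million of the nominal frequency."""
    return offset_hz / nominal_hz * 1e6


# Hypothetical example: less than 36 degrees of drift seen over 60 s at 8 MHz.
df = frequency_offset_hz(36.0, 60.0)
print(df)                   # upper bound on the offset, in Hz
print(offset_ppm(df, 8e6))  # the same bound, in ppm
```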
Any ideas on how to resolve this?
Another big issue is the analog delay circuit. It is not as easy as I first thought: the delay time set by the digital potentiometer is not entirely linear. There are "jumps" of up to 2 ns when several of the least significant bits of the set value toggle, e.g. 95 -> 96 (0b01011111 -> 0b01100000). It took me a while to find that. If the cable length is close to a position where such a jump happens, the length is measured incorrectly. This also explains the small sawtooth spikes in the plot I showed in post #6.
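The suspicious steps can be listed up front: the number of bits that change going from n to n + 1 is the popcount of n XOR (n + 1). A minimal sketch, assuming an 8-bit pot; the threshold of how many toggling bits counts as "risky" is a guess and would need to be checked against measurements.

```python
def toggled_bits(code: int) -> int:
    """Number of bits that change when stepping from code to code + 1."""
    return bin(code ^ (code + 1)).count("1")


def risky_steps(threshold: int = 4, width: int = 8) -> list:
    """Codes n where the step n -> n + 1 flips at least `threshold` bits."""
    return [n for n in range((1 << width) - 1) if toggled_bits(n) >= threshold]


print(toggled_bits(95))  # → 6  (0b01011111 -> 0b01100000)
print(risky_steps(6))    # → [31, 63, 95, 127, 159, 191, 223]
```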
I think the reason for this is the internal structure of the digital pot. It seems as if the bits directly drive the internal switches of a binary-weighted resistor array (like a resistor decade box, but binary), and I am seeing the effect of the capacitance of the open internal switches (FETs). The bloody thing is rated for 600 kHz...
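If the bits really do drive individual switches, one possible compensation model (my assumption, not something from the pot's datasheet) is additive: each set bit contributes a fixed amount of delay on top of an offset. That model can be calibrated from only 9 measurements, code 0 plus the 8 single-bit codes. `measure` below is a hypothetical callback standing in for an actual delay measurement.

```python
from typing import Callable, List, Tuple


def per_bit_weights(measure: Callable[[int], float],
                    width: int = 8) -> Tuple[float, List[float]]:
    """Calibrate the assumed additive per-bit model from width + 1 measurements.

    Code 0 gives the offset; each single-bit code 1 << i gives that bit's
    contribution on top of the offset.
    """
    offset = measure(0)
    weights = [measure(1 << i) - offset for i in range(width)]
    return offset, weights


def predicted_delay(code: int, offset: float, weights: List[float]) -> float:
    """Delay predicted by summing the weights of all set bits (superposition)."""
    return offset + sum(w for i, w in enumerate(weights) if (code >> i) & 1)
```

Comparing `predicted_delay` against a handful of other measured codes would show whether the additive assumption holds. If the switch capacitances interact, it won't, and the brute-force table is the safer route.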
However, this should be rather easy to solve, either with a small compensation algorithm or by brute force, adding a 256-entry lookup table to the program.
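The brute-force variant could look roughly like this: measure the delay for every code once, then use `table[code]` in place of a linear formula, and search the table when a specific delay is needed. The names and the synthetic calibration curve below are hypothetical stand-ins for real measurements.

```python
from typing import Callable, List


def build_table(measure: Callable[[int], float], size: int = 256) -> List[float]:
    """Measure the delay (ns) for every pot code once, during calibration."""
    return [measure(code) for code in range(size)]


def code_for_delay(table: List[float], target_ns: float) -> int:
    """Pot code whose calibrated delay is closest to target_ns.

    A linear scan is used because the jumps mean the table is not guaranteed
    to be monotonic, which rules out a simple bisection.
    """
    return min(range(len(table)), key=lambda code: abs(table[code] - target_ns))


# Synthetic stand-in for calibration data: 0.5 ns per step plus a 2 ns jump at 96.
table = build_table(lambda c: 10.0 + 0.5 * c + (2.0 if c >= 96 else 0.0))
print(table[95], table[96])         # → 57.5 60.0
print(code_for_delay(table, 59.0))  # → 96
```

On the MCU itself, the same table would just be a constant array indexed by the pot code.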