The key to handling software-extended capture registers with a software-extended timer is to grab the timer high bit, the timer overflow interrupt flag, and all the capture interrupt flags right at the start of the ISR, and pack them into a byte or word for further processing. Then do it again and check they haven't changed. Repeat until two consecutive reads give the same set of high bit and flags; that avoids all the race conditions due to captures on different channels happening too close to each other or to the timer rollover. All the possible cases encoded in the flag set can then be handled without races or severe timing constraints, including incrementing the software timer extension if the stored overflow flag is set, and doing a soft capture of the timer extension if any capture flag is set.
If multiple close captures on a single channel are required, that capture channel's capture register must be read in the same loop that gets a consistent flag set, and also checked for consistency.
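To make the consistent-snapshot idea concrete, here is a minimal sketch in C that runs on a PC against a simulated timer. Everything named in it is hypothetical: the bit layout (ST_OVF, ST_CAP1, ST_HIBIT), the sim_hw_t "registers", and the rule that a capture value with its high bit clear is taken to have landed after the rollover (which assumes ISR latency is well under half a timer period). On a real MCU read_status() would read the actual timer and flag registers instead.

```c
#include <stdint.h>

/* Packed status bits (layout is an assumption for this sketch). */
#define ST_OVF   0x01u  /* timer overflow interrupt flag        */
#define ST_CAP1  0x02u  /* capture channel 1 interrupt flag     */
#define ST_HIBIT 0x80u  /* copy of the timer's high bit         */

/* Hypothetical register model so the sketch runs on a PC. */
typedef struct {
    uint16_t timer;     /* free-running 16-bit timer            */
    uint8_t  flags;     /* ST_OVF | ST_CAP1                     */
    uint16_t capture1;  /* capture register for channel 1       */
} sim_hw_t;

static sim_hw_t hw;
static uint16_t ext;           /* software extension (high 16 bits) */
static uint32_t last_capture;  /* most recent 32-bit soft capture   */

/* Simulation only: one timer tick, setting OVF on rollover. */
static void sim_tick(void)
{
    if (++hw.timer == 0)
        hw.flags |= ST_OVF;
}

/* Pack the flags and the timer high bit into one byte.
   The simulated timer advances one tick per read. */
static uint8_t read_status(void)
{
    uint8_t s = (uint8_t)(hw.flags | ((hw.timer & 0x8000u) ? ST_HIBIT : 0u));
    sim_tick();
    return s;
}

/* Re-read until two consecutive reads of status AND capture agree. */
static uint8_t consistent_status(uint16_t *cap)
{
    uint8_t  s1, s2 = read_status();
    uint16_t c1, c2 = hw.capture1;
    do {
        s1 = s2;
        c1 = c2;
        s2 = read_status();
        c2 = hw.capture1;
    } while (s1 != s2 || c1 != c2);
    *cap = c2;
    return s2;
}

/* Common ISR body for the overflow and capture interrupts. */
static void timer_isr(void)
{
    uint16_t cap;
    uint8_t  s = consistent_status(&cap);
    uint16_t cap_ext = ext;          /* extension valid before any rollover */

    if (s & ST_OVF) {
        hw.flags &= (uint8_t)~ST_OVF;
        ext++;
        if (!(cap & 0x8000u))        /* assumed: small value = after rollover */
            cap_ext = ext;
    }
    if (s & ST_CAP1) {
        hw.flags &= (uint8_t)~ST_CAP1;
        last_capture = ((uint32_t)cap_ext << 16) | cap;
    }
}
```

Only the flags present in the snapshot are cleared, so anything that sets a flag after the snapshot is left pending and simply re-enters the ISR.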
On MCUs with vectored interrupts, like the AVRs, point the timer overflow and all the capture interrupt vectors at the same ISR, written as described above. Arduino attachInterrupt() is no use here, as it only supports interrupt-capable digital pins, and you need to route the timer overflow to the same ISR.
However, that's a *LOT* of work, and it's only really appropriate when you have multiple input events (and possibly generated output events) that you need to relate to a common long-period timebase. If you just want to measure an interval (or a frequency), it's conceptually much simpler to use an internally or externally hardware-gated timer that simply counts up from zero, with or without software extension, and only needs to be read once the gate signal has ended and the timer is therefore stopped. I described this for a PIC18 Timer 0 in my previous post.
The original method of clocking a prescaler under program control, to extract the count in it that you can't read directly, comes from Microchip AN592.
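As I understand the technique, once the gate has closed you feed extra clock pulses to the prescaler from software and count how many it takes for the visible timer register to increment; the hidden count is then the prescaler modulus minus that number. A rough PC-runnable sketch against a simulated 8-bit timer with a 1:256 prescaler follows. The sim_timer_t model, pulse() and read_full_count() are all hypothetical names for illustration; on real hardware pulse() would toggle whatever pin clocks the prescaler.

```c
#include <stdint.h>

/* Simulated 8-bit timer behind an 8-bit (1:256) prescaler. */
typedef struct {
    uint8_t prescaler;  /* hidden on real hardware: cannot be read */
    uint8_t timer;      /* readable timer register                 */
} sim_timer_t;

/* One clock pulse into the prescaler; it carries into the timer on wrap. */
static void pulse(sim_timer_t *t)
{
    if (++t->prescaler == 0)
        t->timer++;
}

/* Recover the full 16-bit count (timer:prescaler) of a stopped timer.
   Destructive: the timer and prescaler contents are changed by the read. */
static uint16_t read_full_count(sim_timer_t *t)
{
    uint8_t  t0 = t->timer;   /* visible high byte before clocking */
    uint16_t pulses = 0;

    do {                      /* clock until the visible register changes */
        pulse(t);
        pulses++;
    } while (t->timer == t0);

    /* The prescaler needed 'pulses' more counts to wrap at 256. */
    uint8_t hidden = (uint8_t)(256u - pulses);
    return (uint16_t)(((uint16_t)t0 << 8) | hidden);
}
```

Note the read only uses the timer register, never the prescaler field directly, which is the whole point of the trick.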