If the captured signal is not synchronous with the MCU clock, there is no guarantee that two I/O pins will pick it up in the same cycle. Perhaps you could figure out some way of cleaning it up in software.
So I tried this on an ATMEGA128 board that I had lying around. Like the ATMEGA32U4, it has two 16-bit timers with input capture. Sure enough, with the two ICP pins tied together (in a not very tidy fashion), and the two timers synchronised and clocked at the same rate, the capture counts are occasionally out by one. Just for the heck of it, I changed the test signal to a 1 kHz ramp, and with the timers running at 16 MHz, the captured counts can be out by 40 or 50.
The 2nd capture would run from a slower clock. If this is still significantly faster (e.g. 2x or more) than the overflow rate of the first timer, one would not even need to care whether the two really capture at exactly the same µC clock cycle. So f_clk/1024 is sensible for the 2nd timer.
I guess I was thinking of in effect concatenating the timer counters (wired together externally if required), rather than just select different prescaler taps for the two timers, but the latter sounds like a better option.
It still needs some care and checks for the borderline cases close to the steps of the slower timer. So the coding does not get much simpler than with counting the overflows in software.
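To make the borderline check concrete, here is a rough sketch of how the two captures might be merged. It assumes both 16-bit timers were started together from zero, the fast one at f_clk and the slow one at f_clk/1024, and that the two captures of the same edge differ by no more than a few clock cycles. The function name combine_captures is made up for illustration, and this is plain C with no register access, so it is not tied to any particular AVR.

```c
#include <stdint.h>

/* Merge a capture from the fast timer (f_clk) with a capture of the same
 * edge from the slow timer (f_clk/1024) into one 26-bit timestamp.
 * Assumption: both timers were started together from zero, so bits 10..15
 * of the fast count should match bits 0..5 of the slow count. */
uint32_t combine_captures(uint16_t fast, uint16_t slow)
{
    uint8_t fast_overlap = (uint8_t)(fast >> 10);   /* 0..63 */
    uint8_t slow_overlap = (uint8_t)(slow & 0x3F);  /* 0..63 */

    /* Signed disagreement between the overlapping bits, range -32..31.
     * Non-zero when the slow timer was caught just before or just after
     * one of its steps. */
    int8_t diff = (int8_t)((fast_overlap - slow_overlap) & 0x3F);
    if (diff >= 32) {
        diff -= 64;
    }

    /* Pull the slow count into agreement with the fast count, then keep
     * only the part that counts overflows of the fast timer. */
    uint16_t slow_fixed = (uint16_t)(slow + diff);
    uint32_t overflows = (uint32_t)(slow_fixed >> 6);  /* 10 bits */

    return (overflows << 16) | fast;
}
```

The fast count is trusted for the low 16 bits and the slow count only supplies the overflow number, so a capture that lands one slow tick early or late still gives the right result. The combined value wraps after 2^26 clock cycles, which is a bit over 4 seconds at 16 MHz.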
Yes, having looked at it in more detail now it seems to cause more trouble than it is worth, with other things to worry about.
One thing that it does allow, and for the type of things I do could perhaps turn out to be useful one day, is to synchronise the two timers and set one timer to capture the rising edge and the other the falling edge. Otherwise you have to be quick-smart to get in and change ICESx.
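With one timer capturing the rising edge and the other the falling edge, the pulse width is just the difference of the two capture values. A minimal sketch, assuming the two timers are synchronised and run at the same rate, and that the pulse is shorter than one full timer period (65536 ticks); the function name is only for illustration:

```c
#include <stdint.h>

/* Pulse width in timer ticks from a rising-edge capture on one timer and
 * a falling-edge capture on the other.
 * Assumption: both timers are synchronised and clocked at the same rate,
 * and the pulse is shorter than 65536 ticks. */
uint16_t pulse_width_ticks(uint16_t rise_capture, uint16_t fall_capture)
{
    /* Unsigned 16-bit subtraction wraps modulo 65536, so a timer overflow
     * between the two edges is handled without extra code. */
    return (uint16_t)(fall_capture - rise_capture);
}
```

For example, a rising edge captured at 0xFFF0 and a falling edge at 0x0010 gives a width of 0x20 ticks, even though the counter wrapped in between.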
The simplest cascade of timers is via a true hardware option, but not all MCUs offer this capability.
SiLabs parts and others can cascade capture on two 16-bit timers, but the mega328 is a very old part that lacks this feature.
If changing hardware is an option, some of the not-so-bleeding-edge processors I tinker with these days have two 32-bit timers, and for all I know maybe those can be chained together. 64 bits should be enough for anybody?