If F_CPU is 8 MHz, then F_CPU / 1024 is approximately 8 kHz, or about 8 counts per period of a roughly 1 kHz signal. So the uncertainty is +/- 1 count, or +/- 125 µs, or +/- 125 Hz. For better resolution, you would either have to increase the speed of the timer, or switch to conventional counting. An alternative might be averaging the period over multiple cycles.
It's running on an Arduino board, with an ATmega328P @ 16 MHz, so F_CPU / 1024 is actually 15.625 kHz. But if I'm following your line of thought correctly, the uncertainty is then about +/- 64 Hz. I note this happens to be very close to the difference between 1041 and 976, so I guess this is the cause of the inaccuracy!
But, if I am understanding correctly that we're saying the MCU only has 64 timer clock cycles in which to capture the edge of the signal, why is this not enough? Seems like plenty to me.
So, I need to increase the timer clock speed. Hmm, but leaving the code as-is, I can't use a prescaler smaller than clk/256, as then my OCR value would be too large for 16 bits. I shall try clk/256 and see what results I get.
Incidentally, would disabling the noise reduction feature also help? As I understand it from the datasheet, enabling this means it will wait until it has a consistent input level for 4 clock cycles before capturing the counter value. I probably don't need to use it, as I will eventually be running the input signal through a low-pass RC filter first.
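To put numbers on the resolution argument, here is a small host-side C sketch (plain arithmetic, not the AVR code itself). The function names are made up for illustration, and the roughly 1 kHz input is only assumed from the 1041/976 readings mentioned in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Timer tick rate in Hz for a given CPU clock and prescaler.
 * Hypothetical helper, not part of any library. */
double timer_tick_hz(uint32_t f_cpu, uint16_t prescaler)
{
    assert(prescaler != 0);
    return (double)f_cpu / prescaler;
}

/* Frequency that gets reported when one signal period spans
 * `counts` timer ticks. */
double freq_from_counts(uint32_t f_cpu, uint16_t prescaler, uint32_t counts)
{
    assert(counts != 0);
    return timer_tick_hz(f_cpu, prescaler) / counts;
}

/* Gap between the readings for `counts` and `counts + 1` ticks,
 * i.e. the size of one measurement step at that point. */
double freq_step_hz(uint32_t f_cpu, uint16_t prescaler, uint32_t counts)
{
    return freq_from_counts(f_cpu, prescaler, counts)
         - freq_from_counts(f_cpu, prescaler, counts + 1);
}
```

With 16 MHz and clk/1024, 15 and 16 counts come out at roughly 1041.7 Hz and 976.6 Hz, a step of about 65 Hz. With clk/256 the timer ticks at 62.5 kHz, and the step near 1 kHz (62 vs. 63 counts) shrinks to about 16 Hz.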
The idea of using the input capture function is good. However, the way it is implemented is not that good. There is some delay in resetting the timer to zero - the delay can vary, depending on which ISR is active at the time. Also the timer is running at a low speed and thus the resolution is limited - this could become a problem at relatively high frequencies. So a first step would be running the timer faster.
The better way to do the measurement would be to let the timer run freely and use the ICP function to capture the time of a start and a stop event. The period length is then calculated from the difference in time between subsequent events.
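A minimal sketch of that difference approach, assuming a free-running 16-bit Timer1 whose prescaler and input capture interrupt are configured elsewhere. The variable and function names are hypothetical; `ICR1` and `TIMER1_CAPT_vect` are the avr-libc names for the ATmega328P, and an ATtiny target may use different ones. The AVR part is guarded so the arithmetic can also be checked on a PC.

```c
#include <stdint.h>

/* Ticks elapsed between two captures of a free-running 16-bit timer.
 * Unsigned subtraction wraps modulo 65536, so one timer overflow
 * between the two captures is handled without extra code. The result
 * is only valid if the period is shorter than 65536 ticks. */
uint16_t period_ticks(uint16_t previous_capture, uint16_t current_capture)
{
    return (uint16_t)(current_capture - previous_capture);
}

#ifdef __AVR__
#include <avr/io.h>
#include <avr/interrupt.h>

/* Hypothetical state: the previous timestamp and the latest result. */
static volatile uint16_t last_capture;
static volatile uint16_t latest_period;

/* Runs on every captured edge; the timer itself is never reset. */
ISR(TIMER1_CAPT_vect)
{
    uint16_t now = ICR1;                 /* timestamp latched by hardware */
    latest_period = period_ticks(last_capture, now);
    last_capture = now;
}
#endif
```

Because the hardware latches the timestamp at the edge, ISR latency no longer affects the result, and the cost compared to resetting the timer is one extra 16-bit variable.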
I had initially thought about taking the delta between the current and previous counter values, but as I intend to ultimately run this on an ATtiny with limited memory, I decided on going forward with my current solution that involves the least number of variables in memory, hoping that it'd be okay. Guess not!