If you are using Timer 0 in 16-bit mode, you must be using a PIC18, as all other families supported by XC8 have an 8-bit Timer 0.
The PIC18 Timer 0 has a latch for the high byte, which is used for both read and write operations. TMR0H is the latch, not the actual timer high byte. To write the timer, you must first write TMR0H to load the latch, then write TMR0L, which transfers the latch to the timer high byte while simultaneously writing the low byte. To read the timer, you must first read TMR0L, which also updates the latch from the high byte of the timer, then read TMR0H to get the value that was just latched.
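To make the latch behaviour concrete, here is a minimal host-side model of it in plain C. It is only a sketch of the mechanism described above: the tmr0_model struct and the model_* function names are hypothetical, invented for illustration, and are not part of the XC8 headers or the hardware.

```c
#include <stdint.h>

/* Hypothetical host-side model of the PIC18 Timer 0 high-byte latch.
   Nothing here exists in XC8; it only mimics the behaviour described above. */
typedef struct {
    uint16_t timer;  /* the real 16-bit counter inside the peripheral */
    uint8_t  latch;  /* what the CPU actually sees as TMR0H */
} tmr0_model;

/* Reading TMR0L returns the live low byte and, as a side effect,
   copies the live high byte into the latch. */
uint8_t model_read_tmr0l(tmr0_model *t)
{
    t->latch = (uint8_t)(t->timer >> 8);
    return (uint8_t)(t->timer & 0xFFu);
}

/* Reading TMR0H returns the latch, not the live high byte. */
uint8_t model_read_tmr0h(const tmr0_model *t)
{
    return t->latch;
}

/* Writing TMR0H only loads the latch; the counter is untouched. */
void model_write_tmr0h(tmr0_model *t, uint8_t value)
{
    t->latch = value;
}

/* Writing TMR0L writes the low byte and transfers the latch
   into the high byte in the same operation. */
void model_write_tmr0l(tmr0_model *t, uint8_t value)
{
    t->timer = (uint16_t)(((uint16_t)t->latch << 8) | value);
}
```

In this model, if the counter ticks from 0x12FF to 0x1300 between the two reads, reading the low byte first still yields the consistent pair 0xFF and 0x12, because the high byte was captured at the moment the low byte was read.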
Although the XC8 headers include a definition for a 16-bit TMR0, it should NOT be used, as you would be relying on behaviour that the C standard leaves to the implementation, namely the order of the byte reads or writes that make up an access to a 16-bit volatile unsigned short.
Even your:
counts = TMR0H*256+TMR0L; // load current TMR0 counter value
is unsafe, as the compiler is free to rearrange operations between sequence points and the RHS doesn't contain any, so TMR0H may be read before TMR0L and return a stale latched value.
To guarantee a correct read, even if you upgrade the compiler or change optimisation level, you *must* use:
counts = TMR0L;
counts|=TMR0H*256U; // load current TMR0 counter value
or a similar sequence to ensure you get the high byte that was latched most recently.
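One possible way to package both access orders is a pair of small helpers, sketched below. The names tmr0_read16 and tmr0_write16 are my own, not from the XC8 headers. The sketch assumes the compiler predefines __XC8 and that <xc.h> declares TMR0L and TMR0H for your device. The #else branch only provides plain variables so the code can be compiled off-target; those stand-ins have no latch.

```c
#include <stdint.h>

#ifdef __XC8
#include <xc.h>   /* real TMR0L / TMR0H special function registers */
#else
/* Host stand-ins (assumption, for off-target compilation only).
   They are ordinary variables and do NOT emulate the hardware latch. */
volatile uint8_t TMR0L, TMR0H;
#endif

/* Hypothetical helper: read Timer 0 as a 16-bit value.
   Each volatile access is in its own statement, so the order is fixed. */
uint16_t tmr0_read16(void)
{
    uint16_t counts = TMR0L;                      /* 1st: low byte, latches high byte */
    counts |= (uint16_t)((uint16_t)TMR0H << 8);   /* 2nd: latched high byte */
    return counts;
}

/* Hypothetical helper: write Timer 0 as a 16-bit value. */
void tmr0_write16(uint16_t value)
{
    TMR0H = (uint8_t)(value >> 8);     /* 1st: load the latch only */
    TMR0L = (uint8_t)(value & 0xFFu);  /* 2nd: commits low byte and latch together */
}
```

Putting each register access in a separate statement places a sequence point between them, which is what pins down the order regardless of compiler version or optimisation level.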