Your problem is:
TMR1 = PRELOAD_TIMER1 - TMR1;
See Section 12 (Timer1) of the PICmicro Mid-Range MCU Family Reference Manual, specifically 12.5, 12.5.2, and 12.12.
You can't safely write to a running Timer1 in asynchronous mode#. If the next 32768/8 Hz increment occurs during the instruction that writes the low byte (or the high byte when the low byte is 0xFF), the result is unpredictable. As you are writing a large value to the timer to get the next rollover interrupt one second later, a randomly corrupted timer will lose an average of 7 seconds for every high byte collision, and an average of 31.25 ms for every low byte collision; the low byte collisions are 256 times more frequent. Also, if a low byte rollover occurs between reading the two bytes of the timer, there will be a 62.5 ms error.
Additionally, every time you write to the timer, the prescaler is cleared, discarding the T1OSC/T1CKI rising edges it has accumulated since the last timer increment. That loses on average 10.5 seconds a day (assuming the timer is modified once a second and the two individual writes to the high and low bytes of the timer occur within one T1OSC cycle).
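The figures above can be reproduced with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, the intended high byte of 0xF0 is inferred from a one-second period at 4096 increments per second (preload 0xF000), and a corrupted byte is assumed to land at mid-scale (128) on average.

```c
#include <stdint.h>

#define T1OSC_HZ     32768.0                    /* watch crystal frequency */
#define T1_PRESCALE  8.0                        /* 1:8 prescaler, as in the question */
#define TICK_HZ      (T1OSC_HZ / T1_PRESCALE)   /* 4096 timer increments per second */

/* Mean time lost (s) when the high byte ends up random instead of the
 * intended value. Assumption: a random byte averages mid-scale (128). */
double high_byte_loss_s(uint8_t intended_high)
{
    return ((double)intended_high - 128.0) * 256.0 / TICK_HZ;
}

/* Mean error (s) when the low byte ends up random (same assumption). */
double low_byte_error_s(void)
{
    return 128.0 / TICK_HZ;
}

/* Error (s) when the low byte rolls over between the two byte reads. */
double torn_read_error_s(void)
{
    return 256.0 / TICK_HZ;
}

/* Time lost per day (s) if every write discards, on average, half a
 * timer increment's worth of T1OSC cycles held in the prescaler. */
double prescaler_loss_s_per_day(double writes_per_day)
{
    return writes_per_day * (T1_PRESCALE / 2.0) / T1OSC_HZ;
}
```

With an intended high byte of 0xF0 and one write per second (86400 writes a day), these give roughly 7 s, 31.25 ms, 62.5 ms and 10.5 s/day respectively.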
The preferred solution depends on how precise you need your clock to be, but in all cases you need to let Timer1 free-run (i.e. don't write to it or stop/start it). If you need to display seconds, the natural rollover rate with no (1:1) prescaler is only 0.5 Hz, so you have to poll the high bit of the timer and increment the seconds count whenever it changes. If you need to combine this with SLEEP (only worth it if you are driving a 'bare glass' LCD display, or power down other display types during sleep), one possibility is to use the watchdog timer to wake from sleep. Alternatively, if the display is powered down, you can wake up on the Timer1 rollover interrupt, increment by TWO seconds, then return to sleep, only resuming polling for the intervening seconds once user action has caused an exit from sleep mode.
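A minimal sketch of the polling approach, assuming a free-running Timer1 with a 1:1 prescaler so that bit 15 changes state once a second. The type and function names are hypothetical, and the timer's high byte is passed in as a parameter so the logic can be tried off-target; on the PIC you would pass in TMR1H.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for a seconds counter driven by polling. */
typedef struct {
    bool     last_msb;  /* last observed state of Timer1 bit 15 */
    uint32_t seconds;   /* seconds counted so far */
} rtc_poll_t;

/* Call from the main loop, more often than once per second, with the
 * current high byte of Timer1. The timer itself is never written.
 * Returns true when a new second has been counted. */
bool rtc_poll(rtc_poll_t *rtc, uint8_t tmr1h)
{
    bool msb = (tmr1h & 0x80u) != 0;

    if (msb == rtc->last_msb)
        return false;           /* no edge on bit 15 since last poll */

    rtc->last_msb = msb;        /* bit 15 toggled: one second elapsed */
    rtc->seconds++;
    return true;
}
```

Only the high byte is examined, so the two-byte read problem described above doesn't arise. Initialise `last_msb` from the timer's actual state at startup to avoid counting a spurious first second.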
If you don't display seconds, you can simply use a 1:2 prescaler for a 0.25 Hz rollover interrupt rate (one interrupt every 4 seconds), and count 15 interrupts per minute.
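The minute counter could look something like this. It is a sketch with hypothetical names, assuming the function is called once per Timer1 rollover interrupt (after the interrupt flag has been cleared) with the 1:2 prescaler selected.

```c
#include <stdbool.h>
#include <stdint.h>

#define OVERFLOWS_PER_MINUTE 15u  /* 60 s / 4 s per rollover at 1:2 prescale */

/* Hypothetical state for a minutes counter driven by rollover interrupts. */
typedef struct {
    uint8_t  overflows;  /* rollovers seen in the current minute */
    uint32_t minutes;    /* minutes counted so far */
} rtc_min_t;

/* Call once per Timer1 rollover interrupt. The timer is left alone.
 * Returns true when a full minute has just elapsed. */
bool rtc_on_overflow(rtc_min_t *rtc)
{
    if (++rtc->overflows < OVERFLOWS_PER_MINUTE)
        return false;

    rtc->overflows = 0;
    rtc->minutes++;
    return true;
}
```

Because the ISR never touches TMR1H/TMR1L, neither the write collisions nor the prescaler clearing can occur.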
Once you've got cycle-accurate timekeeping, you then need to trim the crystal frequency* by adjusting the load capacitance, as unadjusted it may result in up to +/-1.73 seconds per day error, or more if your load capacitance is out of tolerance. You can reasonably expect to reduce this error by an order of magnitude, which will get it down to under 0.2 seconds per day short term; together with the crystal's aging drift, that gives an annual error of under +/- 4 minutes (assuming reasonably constant temperature and supply voltage). If you need to do better, you need a higher quality external time reference, e.g. a radio time signal, GPS, NTP, or, in 1st world countries, the mains frequency, which is tightly controlled so that the long term average of the daily cycle count is locked to real time +/- a defined error band.
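For converting between a frequency error and a daily drift while trimming, something like the following helps. It is a sketch with hypothetical function names; the +/-1.73 seconds per day figure works out to a frequency error of about 20 ppm.

```c
#define SECONDS_PER_DAY 86400.0

/* Clock drift in seconds per day for a frequency error given in ppm. */
double ppm_to_s_per_day(double ppm)
{
    return ppm * 1e-6 * SECONDS_PER_DAY;
}

/* Frequency error in ppm from a drift (s) observed over an interval (s). */
double drift_to_ppm(double drift_s, double interval_s)
{
    return drift_s / interval_s * 1e6;
}
```

For example, `ppm_to_s_per_day(20.0)` is about 1.73, and a clock found to be 1.4 seconds off after a week is running about 2.3 ppm out.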
# ... unless you poll for an LSB transition before writing and can complete writing the timer well before the next T1OSC/T1CKI rising edge. However, polling loops in ISRs are *EVIL*!
* The method depends on the available test equipment. If all you have is a phone or radio time signal, it's a PITA: you need to stick on a circle of card, punched to fit tightly over the trimcap adjusting screw so you can reposition it precisely, note the error over a day or more, and adjust the trimmer to reduce the error, each time marking the card disk and noting the exact time/date of adjustment and the disk position vs. error, to build up a calibration curve for the trimcap. As you narrow in on the correct setting, you'll need to check over longer intervals, e.g. weekly. If you have a >6 digit frequency counter with a GOOD reference oscillator (e.g. a GPSDO or rubidium oscillator), you can simply use an active probe weakly capacitively or inductively coupled by proximity to the crystal, without contact so it doesn't significantly 'pull' the frequency, or even a microphone with HF response >33 kHz, plus a high gain tuned amplifier, and directly adjust the frequency to 32768 Hz +/- <0.5 Hz. If you don't have or can't build a suitable tuned amplifier, an alternative method is to clock the PIC with a 20 MHz external clock to minimize jitter and latency, write code to set up Timer1 with no prescaler and toggle a pin for a 1 Hz output, then *slooooowly* adjust the trimcap for 1.00000 Hz. You can also use this method without a frequency counter by triggering a scope set to a slow (>1 s) timebase from a GPS 1 pps signal, with the PIC's 1 Hz output on a different channel, and adjusting for minimal long term drift while ignoring short-term jitter.