Nothing on the board is getting really hot: it always draws around 250 uA, whether the glitch has already occurred or not. But it is a fact that if I keep U2 very cold, it keeps running. I tried it several times and it's not random: if I do nothing, the meter stops working after 10 seconds, whereas with the air duster I managed to keep it running for 5 minutes, and the glitch came soon after I stopped cooling the IC.
I tried removing all the decoupling caps (C13, C4, C10, C14) and also C2, which generates the initial reset signal, and the meter behaves exactly the same: it works for several seconds, then glitches.
I'm more and more inclined to think that U2 has some kind of subtle defect. The communication between U1 and U2 is not documented, but I tried to understand how it works. There are two 4-bit buses, C0-3 (probably stands for Control) and D0-3 (data?), then a CLK signal and finally NDAV.
- CLK just looks like a cleaned up digital version of the 32 kHz crystal signal and it works even after the glitch, so no worries here.
- The data bus looks unidirectional, from U1 to U2. I think it's dedicated to A/D samples. After the glitch I still see U1 sending data on this bus.
- NDAV is maybe "new data available"? It is driven at roughly 25 Hz by U1, and 25 Hz is the update rate of the bar graph, so I think U1 pulses this line each time it has a new sample. This line also keeps working after the glitch.
- Finally, the control bus probably carries data from U2 to U1 (or maybe it is bidirectional?) in order to tell U1 to switch to a different range. This bus has a lot of activity before the glitch and none after.
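
If anyone wants to check the same thing on a logic analyzer capture instead of eyeballing the scope, here is a rough Python sketch of how I would look for the lines that go quiet. It is only an illustration: the sample format (time in seconds plus a dict of signal levels), the function names and the synthetic capture are all my own assumptions, not something exported by a particular analyzer, and only C0 and NDAV are simulated.

```python
from typing import Dict, List, Optional, Tuple

# Assumed format: (time in seconds, {signal name: 0 or 1}) for every sample.
Sample = Tuple[float, Dict[str, int]]


def last_transition_times(samples: List[Sample]) -> Dict[str, Optional[float]]:
    """Return, per signal, the time of its last level change (None if it never changed)."""
    last_change: Dict[str, Optional[float]] = {}
    previous: Dict[str, int] = {}
    for t, levels in samples:
        for name, level in levels.items():
            if name not in previous:
                last_change[name] = None      # first time we see this signal
            elif level != previous[name]:
                last_change[name] = t         # level changed at this sample
            previous[name] = level
    return last_change


def quiet_signals(samples: List[Sample], quiet_for: float) -> List[str]:
    """Signals with no transition during the last `quiet_for` seconds of the capture."""
    if not samples:
        return []
    end = samples[-1][0]
    changes = last_transition_times(samples)
    return sorted(
        name for name, t in changes.items()
        if t is None or end - t > quiet_for
    )


def synthetic_capture() -> List[Sample]:
    """Made-up 2 s capture at 1 ms steps: NDAV keeps toggling, C0 freezes at t = 1 s."""
    samples: List[Sample] = []
    for ms in range(2000):
        ndav = (ms // 20) % 2                      # 40 ms period, i.e. 25 Hz
        c0 = (ms // 5) % 2 if ms < 1000 else 0     # busy, then stuck low
        samples.append((ms / 1000.0, {"NDAV": ndav, "C0": c0}))
    return samples


if __name__ == "__main__":
    capture = synthetic_capture()
    print(last_transition_times(capture))
    print(quiet_signals(capture, quiet_for=0.5))
```

On a real capture, the time of the last transition on C0-3 would give the moment of the glitch, which could then be compared with what the data bus and NDAV were doing just before.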