I have also noticed that sometimes just touching the output of the voltage reference using my multimeter probe fixes the problem until the board is powered again. The multimeter wasn't even switched on when I did this.
That is a classic symptom of oscillation. You have multiple boards, some of which work and some of which don't. You need to probe it with a scope, not just a DVM, and Q2_Drain is a key place to look. Use a digital storage oscilloscope to capture the start-up timing.
Remember that the scope probe itself can suppress oscillations. If it always starts up correctly when the scope is connected, that is another clue.
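If your scope can export the captured trace as a list of voltage samples, a few lines of Python can put numbers on what you are seeing. This is only a rough sketch: the function name `ripple_stats` and the zero-crossing approach are my own choices for illustration, and it assumes you already have the samples and the sample rate from the capture. It does not know anything about your particular circuit.

```python
def ripple_stats(samples, sample_rate_hz):
    """Return (peak_to_peak_volts, estimated_frequency_hz) for one captured trace.

    samples: voltage readings exported from the scope (hypothetical input)
    sample_rate_hz: the scope's sample rate for that capture
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")

    # Remove the DC level so only the ripple/oscillation remains.
    mean = sum(samples) / len(samples)
    ac = [s - mean for s in samples]

    peak_to_peak = max(ac) - min(ac)

    # Count rising zero crossings of the AC component; each one marks
    # roughly one cycle of whatever is riding on the DC level.
    rising = sum(1 for a, b in zip(ac, ac[1:]) if a < 0 <= b)

    duration_s = (len(samples) - 1) / sample_rate_hz
    return peak_to_peak, rising / duration_s
```

A clean reference output should give a peak-to-peak value down near the scope's noise floor. A board that is oscillating will show a much larger peak-to-peak value and a stable frequency estimate from one capture to the next. Comparing a good board against a bad one this way is more convincing than eyeballing the trace.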
I designed one of these myself, where each cell was around 1 kg (that is BIG). The circuit needs to be protected against single faults, which requires extra circuitry to cope with things like a failed over-charge signal.