Thanks all!
Wondering out loud whether something like the Jumperless https://www.crowdsupply.com/architeuthis-flux/jumperless-v5 plus some sort of pogo pin "jig" that runs code to cycle through a battery of tests designed for each IC would be a feasible way to detect such issues?
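To make the "battery of tests" idea concrete, here is a minimal sketch in Python of what the per-IC test code might look like for one 2-input NAND gate. Everything in it is an assumption for illustration: `run_battery`, `nand_vectors`, and the two simulated gates are hypothetical names, and the `drive` callable stands in for whatever a real jig would use to set the input pins and read the output back. It does not use any actual Jumperless API. It also only checks logic function, so it says nothing about leakage or other parametric changes.

```python
from itertools import product
from typing import Callable, List, Tuple

# One test vector: the input pin levels and the output a good part should give.
Vector = Tuple[Tuple[int, ...], int]


def nand_vectors() -> List[Vector]:
    """Full truth table for one 2-input NAND gate (for example one gate of a 7400)."""
    return [((a, b), 1 - (a & b)) for a, b in product((0, 1), repeat=2)]


def run_battery(drive: Callable[[Tuple[int, ...]], int],
                vectors: List[Vector]) -> List[Vector]:
    """Apply every vector through `drive` and return the vectors that failed."""
    return [(inputs, expected)
            for inputs, expected in vectors
            if drive(inputs) != expected]


# Hypothetical stand-ins for the jig: a healthy gate and one with its output stuck low.
def good_gate(inputs: Tuple[int, ...]) -> int:
    return 1 - (inputs[0] & inputs[1])


def stuck_low_gate(inputs: Tuple[int, ...]) -> int:
    return 0


if __name__ == "__main__":
    print("good gate failures:", run_battery(good_gate, nand_vectors()))
    print("stuck-low failures:", run_battery(stuck_low_gate, nand_vectors()))
```

On real hardware, `drive` would be replaced by a function that talks to the jig, and each IC type would get its own list of vectors.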
I think there will be ESD damage that won't be detectable by exercising the chip electrically, until it fails. However, I cannot be absolutely certain. I doubt that a comprehensive study has been made. It would require hundreds or thousands of chips of all different types, electrostatic discharges of varying strength applied to a random subset of them, and comprehensive testing and characterisation of input currents, functionality, power currents and temperatures of all the chips. Then they would each need to be installed in a "product" (in reality, a test bed that exercises their functionality) and run until they fail, which means (in my experience) at least a year. Or don't fail, of course.
Only then can we know whether there were any signs at the beginning of the test that predicted a failure a year later.
As I say, I doubt it has been done, but I might be wrong. Perhaps some sort of characterisation is performed on chips destined for spacecraft, undersea cables - anywhere a failure would be catastrophic and prohibitively expensive. I would imagine the tests would be to select chips right from the middle of the range on the test parameters. I doubt there would be much interest in then testing the rejected chips to see if they do fail. They'd be onto the next job.
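As a rough illustration of selecting chips "from the middle of the range", the sketch below keeps only the parts whose every measured parameter sits in a central band of its allowed range. This is just my guess at how such a screen could be coded, not a description of any real qualification procedure. The function name `mid_range_chips`, the parameter names, and the default `band` of 0.5 (the central half of each range) are all assumptions.

```python
from typing import Dict, List, Tuple


def mid_range_chips(measurements: Dict[str, Dict[str, float]],
                    limits: Dict[str, Tuple[float, float]],
                    band: float = 0.5) -> List[str]:
    """Return IDs of chips whose every parameter lies in the central
    `band` fraction of that parameter's allowed (low, high) range."""
    lo_edge = 0.5 - band / 2
    hi_edge = 0.5 + band / 2
    selected = []
    for chip_id, params in measurements.items():
        in_band = True
        for name, (lo, hi) in limits.items():
            # Position of the reading within its range: 0.0 = low limit, 1.0 = high limit.
            position = (params[name] - lo) / (hi - lo)
            if not (lo_edge <= position <= hi_edge):
                in_band = False
                break
        if in_band:
            selected.append(chip_id)
    return selected


if __name__ == "__main__":
    # Hypothetical limits and readings, for illustration only.
    limits = {"input_leakage_uA": (0.0, 1.0), "supply_mA": (10.0, 20.0)}
    readings = {
        "A": {"input_leakage_uA": 0.5, "supply_mA": 15.0},
        "B": {"input_leakage_uA": 0.9, "supply_mA": 15.0},
        "C": {"input_leakage_uA": 0.4, "supply_mA": 19.0},
    }
    print(mid_range_chips(readings, limits))
```

A chip that passes the datasheet limits but sits near an edge on any one parameter is rejected, which is the conservative behaviour you would want for a part that cannot be replaced.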
I would love to hear others' experience on this topic.
Back in the 80s BT (or was it still the GPO at the time?) did extensive studies of component failures, trying to find out why early electronic telephone exchanges, like the TXE4, worked smoothly for ages, crashed, and were usually OK after a reset. The biggest issue turned out to be tantalum capacitors growing whiskers, and we moved to solid aluminium ones. Those appeared on the market at just the right time. However, as part of this work a lot of dead semiconductors were inspected, and a lot of writeups and photographs were distributed to BT's pool of suppliers. The photographs of ESD damage were really surprising. Everyone expected the damage would be around the I/O ring, and mostly it was. However, quite often a pin hole could be seen punched somewhere in the heart of the device. It seemed very strange, and I don't remember seeing an analysis that really got to the true cause. Just a lot of speculation. Most total failures leave physical damage that goes well beyond the failing device, so there is so much damage that something critical to the product's operation has almost certainly been affected. However, when the stress is quite bad, but not at the popping level, a visual inspection with a SEM doesn't seem to show anything, yet things like pin protection diodes may no longer be doing their job.
In the 70s I had several MPUs of different types that responded to shaking. Shake them and things were OK, shake them again and they crashed, shake them again and they might be OK again. I did a detailed analysis of two of these. One was an MC6800. I can't remember the other. I was able to determine exactly what was breaking and fixing itself. The MC6800 had one bit in the ALU stop working. The other, I think, had an internal bus bit failing.