That other component is actually a GS7133.
It and the G9661 are both 3A LDOs with 5V input.
That, combined with the fact that the failures occur while the machines are off, suggests they are on the 5Vsb input (you can check this by measuring continuity between the input of the LDO and pin 9 of the ATX connector). It also agrees with the arcing scenario: the high-frequency noise that arcing creates could be sending spikes into the 5Vsb rail and killing the components connected to it, such as that LDO.
Measure the 5Vsb voltage of the PSUs to make sure they haven't been damaged by the arcing. Otherwise they could keep killing mobos even after the cause has been fixed.
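If you can capture the 5Vsb rail under load (for example with a scope or a logging multimeter), a quick script can flag a PSU whose standby output has drifted. This is a minimal sketch that assumes the commonly quoted ATX12V limits for +5VSB (±5% DC regulation, 50 mV peak-to-peak ripple); check the design guide revision your PSUs were built to. The sample readings are made up for illustration and are not measurements from this thread. A reading taken with no load may hide a weak supply, so load the rail when measuring.

```python
from statistics import mean

# ASSUMED limits, taken as the commonly quoted ATX12V figures for +5VSB.
NOMINAL_V = 5.0          # +5VSB nominal voltage
DC_TOLERANCE = 0.05      # +/-5 % regulation window
RIPPLE_LIMIT_V = 0.050   # 50 mV peak-to-peak


def check_5vsb(samples):
    """Judge a +5VSB rail from voltage samples (volts) taken under load.

    Returns the average DC level, the peak-to-peak ripple and pass/fail flags.
    """
    if not samples:
        raise ValueError("need at least one sample")
    dc = mean(samples)
    ripple = max(samples) - min(samples)
    low = NOMINAL_V * (1 - DC_TOLERANCE)
    high = NOMINAL_V * (1 + DC_TOLERANCE)
    return {
        "dc": dc,
        "ripple_pp": ripple,
        "dc_ok": low <= dc <= high,
        "ripple_ok": ripple <= RIPPLE_LIMIT_V,
    }


if __name__ == "__main__":
    # HYPOTHETICAL readings from two PSUs, for illustration only.
    healthy = [5.04, 5.06, 5.05, 5.07, 5.05]
    suspect = [4.70, 4.62, 4.78, 4.66, 4.74]
    for name, s in (("PSU A", healthy), ("PSU B", suspect)):
        r = check_5vsb(s)
        print(name, f"{r['dc']:.3f} V", f"{r['ripple_pp'] * 1000:.0f} mV p-p",
              "DC ok" if r["dc_ok"] else "DC FAIL",
              "ripple ok" if r["ripple_ok"] else "ripple FAIL")
```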
A C14/C13 connection should not be loose. Loose connections create heat and are a fire hazard.
Thanks, that confirms my suspicions. As Rasz states in his post, those LDOs apparently have inherent issues as it is, and fluctuations are definitely not helping. The cleaning staff must be either knocking the cable around or plugging their hoover into the same socket. Since these sockets face the middle of the room and are the most easily accessible power points to plug into, this seems the most likely scenario. I had asked them not to use those sockets before any of the issues started, and I have also asked them to be careful around the area when cleaning, but I don't think they took it seriously enough. I will have a sterner chat with them.
I tested the PSUs after the failures; they seem fine and are currently in rotation with the new motherboards.
The C14/C13 isn't really that bad, but I will bite the bullet and replace the PSUs anyway. You are right, better safe than sorry.
Quick question: would I get away with replacing the GS7133 on the MSI board with the G9661M that I ordered for the ASRock? Looking at the datasheets, both use resistors on the motherboard to set the output voltage via the feedback/ADJ pin (pin 7 on both LDOs). Would the G9661M give the same output voltage as the GS7133 with the same set of resistors? I'm not knowledgeable enough to tell. I can see the SOP-8 pinout is the same for both, assuming that pin 7, marked FB (feedback) on one and ADJ (adjust) on the other, performs the same function.
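For a typical adjustable LDO the divider sets the output as Vout = Vref × (1 + R_top/R_bottom), where R_top runs from OUT to FB/ADJ and R_bottom from FB/ADJ to GND. The same resistors therefore give the same output only if the two parts share the same reference voltage. This is a minimal sketch assuming that topology and negligible current into the FB/ADJ pin; the Vref values and resistor values below are hypothetical placeholders, so substitute the resistors read off the board and the Vref from each of the two datasheets.

```python
# Adjustable-LDO divider maths, ASSUMING the common topology:
#   R_top between OUT and FB/ADJ, R_bottom between FB/ADJ and GND,
#   and negligible current flowing into the FB/ADJ pin.


def ldo_vout(vref, r_top, r_bottom):
    """Output voltage set by the divider: Vout = Vref * (1 + R_top / R_bottom)."""
    return vref * (1 + r_top / r_bottom)


def r_top_for(vout, vref, r_bottom):
    """R_top needed to reach vout with a given vref, keeping r_bottom unchanged."""
    if vout <= vref:
        raise ValueError("vout must be above vref for this topology")
    return r_bottom * (vout / vref - 1)


if __name__ == "__main__":
    # HYPOTHETICAL values: read the real resistors off the board and take
    # each part's Vref from its own datasheet.
    R_TOP, R_BOTTOM = 1500.0, 1000.0
    VREF_ORIGINAL = 0.8      # placeholder for the original part's Vref
    VREF_REPLACEMENT = 1.25  # placeholder for the replacement part's Vref

    v_orig = ldo_vout(VREF_ORIGINAL, R_TOP, R_BOTTOM)
    v_repl = ldo_vout(VREF_REPLACEMENT, R_TOP, R_BOTTOM)
    print(f"original part:    {v_orig:.3f} V")
    print(f"replacement part: {v_repl:.3f} V with the same resistors")

    # To keep the original output, R_top would need to change to this value.
    r_new = r_top_for(v_orig, VREF_REPLACEMENT, R_BOTTOM)
    print(f"R_top for replacement part: {r_new:.0f} ohm")
```

If both datasheets list the same Vref, the original resistors should give the same output; if not, R_top (or R_bottom) has to change to suit. It is also worth confirming that the remaining pins match, not just pin 7.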
What do you have for ESD precautions when handling the motherboards and installing them? Usually I see ESD cause failures about 5-15 days after a system is put into service.
If you are rough when installing the ATX power connectors, or flex the motherboard, or its mounting standoffs and screws are shorting to parts, I could see parts failing.
I have seen cleaning staff plug their vacuum cleaner into the same power bar as office PCs and cause damage. That was very hard to track down: failures and HDD corruption every two business days, overnight, when they came to clean.
Anti-static mats, wristbands, and gloves (for more sensitive/expensive parts, CPUs etc.). I have been building and repairing PCs and servers for 14 years, the last 8 professionally. It is very unlikely to have been caused by me. Like anyone else I may make a silly mistake here or there, but those are normally caught straight away, and certainly not on 3 out of 6 machines on the same day. I have had maybe 2-3 minor hiccups through my own fault (not paying attention) in the last 14 years. The lab has been used in its current form for the past 3 years and has seen close to a thousand machines go through it, so an environmental cause is very unlikely. Sorry, I don't mean to come off standoffish, just answering your questions.
Man, that must have been annoying. I know the feeling: trying to explain this to people not familiar with IT/electronics is fairly frustrating, as they think you are being an ass, trying to seem "smart" and looking for things to "annoy" them about. I will certainly have a sterner chat with them and might even arrange a demonstration just to drive it home.
Judging from your ability to find the faulty components, I suspect the cause is thermal, since you could easily have spotted ripple and harmonic disturbances if you had looked for them.
You didn't mention the cooling. I suspect it is heat-induced: improper channelling of airflow across the board, insufficient airflow, or fans not spinning fast enough.
Put thermal pads or a small aluminum heatsink onto the ICs [yes, small heatsinks exist].
Use a thermal camera to see what is glowing, and put the heatsink or pads onto it.
Edit: Your cooling design should also cater for the worst ambient environment the PC could be placed in, e.g. an unconditioned factory floor or local heat sources, or else the PC should be clearly labelled "Ambient 25 Deg.C MAX!"
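To put rough numbers on the heat argument: a linear regulator dissipates P = (Vin - Vout) × Iout (ignoring quiescent current), and its steady-state junction temperature is roughly Tj = Ta + P × θJA. This sketch uses that first-order model only. The output voltage, load current, θJA figures and the 125 C junction limit are hypothetical placeholders; the real θJA depends heavily on the package, the copper area and any heatsink, so take it from the datasheet or measure the case temperature instead.

```python
# First-order thermal estimate for a linear regulator (LDO).
# Model: P = (Vin - Vout) * Iout, Tj = Ta + P * theta_JA.
# Quiescent current and transient heating are ignored.


def ldo_dissipation(vin, vout, iout):
    """Power (W) dropped across the pass element."""
    return (vin - vout) * iout


def junction_temp(t_ambient, power, theta_ja):
    """Estimated junction temperature (deg C) for theta_ja in deg C/W."""
    return t_ambient + power * theta_ja


def max_ambient(tj_max, power, theta_ja):
    """Highest ambient (deg C) that keeps the junction at or below tj_max."""
    return tj_max - power * theta_ja


if __name__ == "__main__":
    # HYPOTHETICAL operating point, thermal resistances and junction limit.
    TJ_MAX = 125.0
    p = ldo_dissipation(vin=5.0, vout=2.0, iout=0.5)
    for label, theta in (("bare package", 50.0), ("with heatsink", 35.0)):
        tj = junction_temp(27.0, p, theta)
        print(f"{label}: P = {p:.2f} W, Tj ~ {tj:.1f} C, "
              f"max ambient for Tj <= {TJ_MAX:.0f} C: "
              f"{max_ambient(TJ_MAX, p, theta):.1f} C")
```

The ambient that matters here is the air around the regulator inside the case, which can sit well above room temperature; running max_ambient() with real datasheet figures gives a defensible number for an ambient rating label.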
Thanks for the input. My ability to troubleshoot electronic issues isn't really anything to write home about; I got lucky that in both cases the components were shorted and getting fairly hot, so I found them by hand, feeling for hot spots while the board was plugged in. If I had to find them with a multimeter and no schematic, it would be another story. Just to fill the gaps in my knowledge of electronics: "ripples and harmonics disturbances" makes no sense to me. I'm assuming you mean fluctuations/surges in the grid?
With regards to airflow, it is very unlikely. I have a considerable amount of experience building PCs and am familiar with airflow best practices, and in both cases there is a fan blowing right over the component. The ambient temperature does not go above 27 C in summer, and the machines are monitored when on; none of the monitored temperatures ever get high enough to be a cause for concern. I am more inclined to go with my initial suspicion and the comment from Amyk.
Once the replacement LDOs arrive, I will fix some heatsinks onto them with thermal glue, as you suggested. I have a bunch lying around the office and was looking for an excuse to use them.
I will also invest in a thermal camera, both to ease troubleshooting and to add another QA step when building: checking the airflow visually. I really should have done it a long time ago.
ASRock H110M-DGS R3.0, the failed component is the power regulator G9661M
ASRock has been selling factory-defective motherboards since socket 1150. They all die when powered off _and_ unplugged; it is something to do with the CMOS battery/standby circuit resulting in a fried standby LDO (something reverse biasing? feeding power back? or latching?).
https://www.reddit.com/r/buildapc/comments/2jrt1b/psarequest_possible_reoccurring_problem_with_the/?sort=new
I have a Z87 Pro4 board with the same issue. It worked great for 4 years plugged in until I swapped GPUs (had to disconnect power). Then it started having problems booting: it would boot on the second try, then on the fifth, then on the 50th, and now it is completely dead. In most cases the fault gets progressively worse until the board dies.
Maybe someone is turning off the power in that office during the weekends, or the cleaning lady uses the power socket for the vacuum cleaner.
Edit: Ha, floobydust beat me to it! Although I meant staff unplugging the main PC power strip to plug in her equipment.
Thanks for your post; the inherent issues with the LDO would certainly not help matters. I do believe it is the cleaning crew either plugging into the same socket or messing with the power cable. Too many things point to that.
Wow, my first guess for your Z87 Pro4 issue would have been leaking caps in the PSU or on the board itself, as in my experience they exhibit exactly the symptoms you described, but I assume you have done your research/testing on the matter.