Pulling my hair out. Circuit boards stop working once shipped to client and more
Jackster:
--- Quote from: Psi on June 01, 2019, 11:10:28 am ---Note: I am adding to this list as i think of things. So you may want to re-read it in case i have added more since you read it last.
Questions
- Have you got any units back that a client says don't work? If so do they work for you or are they dead.
- Is your QC test automated? If so are you sure it is not faulty and letting dead units out the door? Maybe a manual test is needed so you know for sure all units leaving you are 100% working.
- Is there anything unusual about your location or the places you ship to? Some places irradiate all their mail with high-energy X-rays, and this can destroy electronics.
- Does the MCU/software system interface with or talk to something that differs in different parts of the world? Here are examples of what i mean: maybe bluetooth to a phone, or comms to a desktop PC app. Maybe some people have their phone/desktop set to a different timezone/country/language and maybe your app is incompatible with this.
- Do you have the sourcecode to your ATMega or does the freelancer hold that?
Few possibilities i can think of.
- Are users connecting the battery the wrong way around and damaging the device? (9V the wrong way around for a split second, that sort of thing.)
- Do you have any floating input pins on the MCU? Maybe the code only works if an input is read as either low or high but it keeps changing with ambient noise. When built it might stay in one state, but in noisy environments maybe it floats high and stops the code running. (Floating inputs should have MCU pullups enabled in software but maybe they are not set in your code?)
- Have you tried powering the device from 4.5V with, let's say, a 50mA current limit? Not all USB ports are created equal. Maybe your product is quite critical on power and not all USB ports can power it.
- Where are you getting your parts from? Maybe you are getting lots of fake ICs.
- Could be a PCB track routing issue where tracks run too close to a hole or board edge and sometimes get cut by the drill/router, etc. Some PCBs work, some don't, some are intermittent.
- Are you sure you have the ATmega fuse bits set correctly? Maybe the startup delay, brownout detector or crystal settings are wrong and this is making it run intermittently.
- Does the product have protection from ESD or PSU spikes, like a TVS? Does the product get used in a location where it might need this, e.g. automotive/industrial?
- How are you programming the MCUs? I once had a crappy USBASP programmer that would brick 2 out of 5 AVRs it flashed. Not sure why, maybe clock was out of spec and kept erasing fuse bits.
- Does your MCU programming system include a verify check?
- There is one AVR MCU, can't remember which, that comes with a fuse bit set to put it into a compatibility mode where it pretends to be a different AVR chip. Some of the IO/peripherals don't work until you get it out of that mode. :palm: (my guess is they have a supply agreement to sell a compatible chip for 25 years for MIL/MED/AERO) EDIT: All ATmega128 pretend to be an ATmega103 until you change the M103C fuse bit.
--- Quote from: Jackster on June 01, 2019, 10:56:11 am ---And once the PCBs arrived, none of them would allow me to burn the boot loader onto the ATMEGA via the programming header.
Had to pull all the ATMEGAs off and flash them in a socket.
Put them back on and 90% of the boards I made just don't work as expected.
--- End quote ---
This makes me lean towards a PCB/SCH issue.
Are you using MISO MOSI SCK pins for anything else other than programming?
You can use them for other things too, but you need to make sure you don't load the lines so much that programming is affected.
It can become intermittent if you load them or have caps on the line to gnd.
Also, grab one of those boards that doesn't program and use a DMM to check the tracks between the programming header pins (GND, VCC, MOSI, MISO, SCK, RESET) and the ATmega pads for those pins. Also check that none are shorted together.
Help
- Are your PCB files in Altium? If so, i'm happy to take a look at your SCH/PCB/CODE and see if i can spot any potential problems.
--- End quote ---
Thanks for the long reply and the offer to check the boards. I'll export them and DM you the files.
I have the boards that have failed from when clients used them. More than half worked as expected. Some did fail testing.
QC is done by hand. I run the boards on a test program for 24 hours. We also test after this that the PWM input, switches and firmware all work.
We ship from the UK to all over. I'll need to dig into the locations that we have shipped to and that have had units fail, but France and Sweden have had boards fail. The boards are encased in an aluminium block with a >1.5mm wall all around, with a few cutouts for buttons and display.
The device has a PWM input from a sensor and displays it on some 7 segment displays. It has an NRF24L01 that talks to another one of the same device that I make.
I have source code.
I have a diode protecting reverse polarity though I don't know if I did it the correct way. But the power is only able to be plugged in one way using a LEMO socket and cable, which we provide.
We have 1 pin that is pulled to ground if it is a receiver and not a transmitter.
USB is only for updates, though the device works using it. The actual power source is 16v batteries and I use an off the shelf power regulator board to take 13-20v down to 12v, which then gets taken down to 5v with a normal REG.
This might be a good point. All but a few of the new boards I have made up from batch 3 used parts from LCSC. I will check this tonight.
I have used the default Circuit Maker design rules. But the outside edge I made quite far in. At least 1-2mm.
Not sure whether the fuse bits are set correctly.
This product gets used all over, but most boards that have failed arrived at the user already bad. My hunch was X-ray but I have nothing to back that up with.
I use an Arduino to load the boot loader onto the ATMEGA and then an FTDI for flashing firmware.
avrdude runs its verify check and we make sure it confirms all is good.
Not sure whether the ATmega328P does this or not.
I do use those programming pins for the NRF24L01.
I'll send you Altium files later but here is top and bottom.
Board on left is batch 1 and 2. Board on right is batch 3.
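On the fuse-bit question raised earlier in the thread: since the MCU here is an ATmega328P, one way to sanity-check the three fuse bytes read back from the chip is to decode them into the settings Psi lists (clock source, startup delay, brownout level). What follows is only a rough sketch. The bit positions and brownout levels come from my reading of the ATmega328P datasheet and should be checked against it. The function names and the example byte values are made up for illustration: 0x62/0xD9/0xFF is, I believe, the factory default set, and 0xFF/0xDE/0xFD is a set often seen on Arduino Uno style boards.

```python
# Sketch: decode ATmega328P fuse bytes into human-readable settings.
# Assumption: bit layout as in the ATmega328P datasheet; verify before relying on it.
# On AVR a fuse bit reads 0 when "programmed" (active) and 1 when unprogrammed.

BOD_LEVELS = {
    0b111: None,  # brownout detector disabled
    0b110: 1.8,   # volts
    0b101: 2.7,
    0b100: 4.3,
}


def clock_source(cksel):
    """Coarse clock-source class for a 4-bit CKSEL value."""
    if cksel == 0b0000:
        return "external clock"
    if cksel == 0b0010:
        return "internal 8 MHz RC"
    if cksel == 0b0011:
        return "internal 128 kHz RC"
    if cksel in (0b0100, 0b0101):
        return "low-frequency crystal"
    if cksel in (0b0110, 0b0111):
        return "full-swing crystal"
    if cksel >= 0b1000:
        return "low-power crystal"
    return "reserved"


def decode_fuses(low, high, ext):
    """Return a dict of the settings most relevant to intermittent startup."""
    cksel = low & 0x0F
    return {
        "ckdiv8": not (low & 0x80),      # True: clock divided by 8
        "sut": (low >> 4) & 0b11,        # raw startup-time selection
        "cksel": cksel,
        "clock": clock_source(cksel),
        "rstdisbl": not (high & 0x80),   # True: RESET pin disabled
        "spien": not (high & 0x20),      # True: SPI programming enabled
        "bootrst": not (high & 0x01),    # True: reset jumps to bootloader
        "bod_volts": BOD_LEVELS.get(ext & 0b111, "reserved"),
    }


if __name__ == "__main__":
    for name, fuses in (("factory", (0x62, 0xD9, 0xFF)),
                        ("uno-style", (0xFF, 0xDE, 0xFD))):
        print(name, decode_fuses(*fuses))
```

A brownout detector that is disabled or set well below the supply, or a clock source that does not match the fitted crystal, would both fit the intermittent behaviour Psi describes, so those are the fields worth comparing between a good board and a bad one.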
vk6zgo:
--- Quote from: dmills on June 01, 2019, 01:07:24 pm ---What does your power input stage look like? Ceramic cap and LDO by any chance?
If so, try plugging in the power with the supply already switched on, and with reasonably long power leads. You might be blowing the regulator due to ringing in the LC circuit formed by the power cable and the input cap. (The cure is a jellybean electrolytic, about 10 times the value of the input ceramic, in parallel with it; the ESR damps the ringing.)
Are ALL of your external IO lines fitted with some form of ESD protection?
No floating inputs on the micro?
Is everything run well within datasheet ratings?
I had an issue with a production board once where we suddenly started getting a very high failure rate. It turned out the SPI bus was being driven on the wrong clock edge, and the first batch of the peripheral chips just happened to work with zero hold time; the next ones, not so much.
You need to get a few duds back and investigate.
Regards, Dan.
--- End quote ---
Reminds me of the Transmitter remote control system we had in to modify.
While doing so, I mislaid one of a number of monostables which were used in it.
No worry,-- plenty in stock!
Fitted a new one-- damn thing wouldn't work!
It turns out that with that particular IC, people had been having problems getting stable operation at short "on" times.
The manufacturer's answer was to split the line into two devices, one with the original part number, which they redesigned to be optimised for short "on" times, sacrificing its performance (which had been perfectly satisfactory) for long "on" times.
For long "on" times, they produced a new device & type number.
Not knowing this, we got caught out when the thing didn't work.
In the end, we had to get a stock of the new devices in, replace them in both remote control systems in use, & change the documentation.
I wonder how many "young (& old) players" got caught by that, & "tore their hair out" over the years?
Jackster:
--- Quote from: mariush on June 01, 2019, 03:13:40 pm ---Maybe the flux you're using needs to be cleaned off the boards, otherwise it may be causing some resistance or short circuits?
Maybe you're accidentally joining two pins of your microcontrollers during soldering?
Do you have screw holes near traces? maybe you're shorting traces with screws or breaking them with friction over traces?
If usb powered... are you assuming you're getting clean 5v? Maybe the guys at the other end have too long usb leads causing voltage drop, or maybe they have stupid unregulated phone charger style usb things pumping 5.5-6v in your boards?
Inductance on the long usb cable causing voltage spikes? Not enough capacitance on input and output of regulators that could damage the regulators or cause them to reset/glitch? Bad output capacitors on regulators?
Where do they install these products? Are there powerful magnets or induction devices or something else that could be picked up by your circuit and affect it?
maybe share at the very least a picture of the assembled board ... if it's too much of a secret to show a schematic or something more complex
--- End quote ---
I can check the flux idea later.
No screws near traces. Only ground planes.
Not USB powered but can be powered via USB. USB only really for firmware updates.
The product is not installed near anything like that. Just near other electronics. But issues have been happening before being used.
--- Quote from: imo on June 01, 2019, 12:09:39 pm ---Hard to help without details. Anyhow, the fact the 5-10% that get shipped end up not working is an indication there is something wrong with the product or processes around and the manufacturer should have stopped shipping such a product.
--- End quote ---
Looking at it, it is roughly 5%. Out of a total of around 100 boards, only 5-6 have failed, and one of those just needed a repair after user error.
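With only about 100 boards shipped, the failure percentage itself is fairly uncertain. As a rough illustration (not something anyone in the thread computed), a Wilson score interval puts bounds on the underlying failure rate given 5 or 6 failures in 100. The function name `wilson_interval` and the 95% level (z = 1.96) are my own choices here.

```python
import math


def wilson_interval(failures, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    p = failures / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    for failed in (5, 6):
        low, high = wilson_interval(failed, 100)
        print(f"{failed}/100 failed: plausible rate {low:.1%} to {high:.1%}")
```

For 5 failures in 100 this comes out at roughly 2% to 11%, so a sample this small is consistent with both a rate under 5% and one in the 5-10% range mentioned in the quote above.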
--- Quote from: dmills on June 01, 2019, 01:07:24 pm ---What does your power input stage look like? Ceramic cap and LDO by any chance?
If so, try plugging in the power with the supply already switched on, and with reasonably long power leads. You might be blowing the regulator due to ringing in the LC circuit formed by the power cable and the input cap. (The cure is a jellybean electrolytic, about 10 times the value of the input ceramic, in parallel with it; the ESR damps the ringing.)
Are ALL of your external IO lines fitted with some form of ESD protection?
No floating inputs on the micro?
Is everything run well within datasheet ratings?
I had an issue with a production board once where we suddenly started getting a very high failure rate. It turned out the SPI bus was being driven on the wrong clock edge, and the first batch of the peripheral chips just happened to work with zero hold time; the next ones, not so much.
You need to get a few duds back and investigate.
Regards, Dan.
--- End quote ---
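To put rough numbers on the ringing Dan describes: a series LC circuit hit with a voltage step can, if undamped, overshoot to about twice the step, and its characteristic impedance sqrt(L/C) gives a feel for how much resistance is needed to damp it. The sketch below assumes about 1 µH of lead inductance and a 10 µF ceramic input cap. Both values are illustrative guesses, not measurements from this board, and `lc_ringing` is a made-up helper name.

```python
import math


def lc_ringing(v_in, l_henry, c_farad):
    """Resonant frequency, characteristic impedance and ideal peak voltage
    for a lossless series LC hit by a step of v_in volts."""
    f0 = 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))
    z0 = math.sqrt(l_henry / c_farad)
    v_peak = 2.0 * v_in  # lossless worst case; real circuits ring less
    return f0, z0, v_peak


if __name__ == "__main__":
    # Assumed values: 16 V battery, 1 uH of cable, 10 uF ceramic.
    f0, z0, v_peak = lc_ringing(16.0, 1e-6, 10e-6)
    print(f"rings at about {f0 / 1e3:.0f} kHz, Z0 = {z0:.2f} ohm, "
          f"ideal peak = {v_peak:.0f} V")
```

On those assumptions a 16 V battery could in the ideal case put up to 32 V on the regulator input for a moment. A common rule of thumb is that the damping electrolytic should have an ESR on the order of the characteristic impedance, which is why a lossy jellybean part works better here than another low-ESR ceramic. Real leads and caps have some loss, so the actual overshoot should be lower than this bound.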
Power regulation is a bit of a hack tbh.
I am using a drone regulator board to take 12-20v input down to 12v, I then have a 5v REG for the IC.
I did this for a few reasons, the main one being heat: I found that the LDO and other REG SMD packages got too hot with upwards of 20v.
The other being height. I was able to get these drone power regulators on a PCB less than 4mm which was important.
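The heat problem follows from how a linear regulator works: it drops the difference between input and output as heat, so the dissipation is roughly (Vin - Vout) x Iload. Here is a quick sketch of the numbers, assuming a 100 mA load on the 5 V rail (the real current draw is not given in the thread) and a hypothetical helper name.

```python
def linear_reg_dissipation(v_in, v_out, i_load):
    """Power in watts burned in a linear regulator, ignoring quiescent current."""
    return (v_in - v_out) * i_load


I_LOAD = 0.100  # amps; assumed for illustration, not from the thread

if __name__ == "__main__":
    direct = linear_reg_dissipation(20.0, 5.0, I_LOAD)     # 5 V REG fed from 20 V
    two_stage = linear_reg_dissipation(12.0, 5.0, I_LOAD)  # 5 V REG fed from 12 V
    print(f"20 V -> 5 V: {direct:.2f} W")
    print(f"12 V -> 5 V: {two_stage:.2f} W")
```

On those assumptions the 5 V regulator sheds about 1.5 W when fed straight from 20 V but about 0.7 W from the 12 V intermediate rail, with the rest of the drop handled by the drone regulator board. For a small SMD package the difference is significant, which matches the overheating described above.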
OwO:
First thing you really have to do before considering anything else is to triage the failures and get to the bottom of it: exactly which part failed, and in what way? Otherwise all we can do is speculate and brainstorm 1000s of unrelated and probably irrelevant possibilities.
OwO:
You did mention half of the reported failed boards then subsequently passed testing on your end; in these cases what did the customer observe? Wireless comms not working? Device not responding to any input and appearing "dead"? Is each customer different or is there a pattern to the type of failure? We still need far, far more info before we can be helpful.
--- Quote from: Jackster on June 01, 2019, 03:54:04 pm ---I have the boards that have failed from when clients used them. More than half worked as expected. Some did fail testing.
--- End quote ---
Action items right now: debug the boards that do fail testing, try to find the exact root cause, record all observations. Dig out all failure reports with devices that subsequently DID pass tests, record them somewhere and look for patterns. Ask customers for more info if necessary.
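One lightweight way to "record them somewhere and look for patterns" is a flat list of failure reports tallied by field. The sketch below uses made-up records and hypothetical field names (`batch`, `destination`, `symptom`, `passed_retest`); none of this data comes from the thread.

```python
from collections import Counter

# Hypothetical example records; replace with real failure reports.
reports = [
    {"batch": 3, "destination": "A", "symptom": "no radio link", "passed_retest": True},
    {"batch": 3, "destination": "B", "symptom": "dead", "passed_retest": False},
    {"batch": 2, "destination": "A", "symptom": "no radio link", "passed_retest": True},
    {"batch": 3, "destination": "C", "symptom": "no radio link", "passed_retest": False},
]


def tally(records, field):
    """Count how often each value of one field appears."""
    return Counter(r[field] for r in records)


def tally_pairs(records, field_a, field_b):
    """Count combinations of two fields, e.g. symptom versus retest result."""
    return Counter((r[field_a], r[field_b]) for r in records)


if __name__ == "__main__":
    for field in ("batch", "destination", "symptom"):
        print(field, tally(reports, field).most_common())
    print(tally_pairs(reports, "symptom", "passed_retest").most_common())
```

Even a table this simple makes it obvious whether failures cluster by batch, destination or symptom, and it separates boards that fail on the bench from those that only fail in the customer's hands.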