EEVblog Electronics Community Forum

Electronics => Beginners => Topic started by: Jackster on June 01, 2019, 10:56:11 am

Title: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on June 01, 2019, 10:56:11 am
So I designed a small product a few years ago and it has been selling well.
But around 5-10% that get shipped end up not working once received by the end user.

The product is pretty simple and only requires power to work.
But a small % just don't work once they are in the hands of my clients.

All they have to do is plug in power, which is either via a battery (a standard used in the industry) or via USB.

I did the whole PCB design, case CAD and offloaded the software to a freelancer.
In house (literally my house) this all works fine and passes all our QA testing.



Another problem I have just had is that I needed more PCBs the other month.

I wanted to improve one thing: when installing the PCB, this corner was getting in the way a tad.
So I chopped off this corner in PCB design software and re-poured the ground plane.

And once the PCBs arrived, none of them would allow me to burn the boot loader onto the ATMEGA via the programming header.
Had to pull all the ATMEGAs off and flash them in a socket.
Put them back on and 90% of the boards I made just don't work as expected.
This is after P&P and hand soldering over £300 worth of components >.<


I just don't know what to do any more.

Would love to just hand off the PCB design to someone else but I don't have enough money to fund that right now.
I don't understand why one minute the thing works but then dies as soon as it arrives at the end user.

Any advice on what I should do?



[edit]
For clarification as I know I have not added a lot of details here.



Any boards that have failed have been replaced with working boards. Full warranty was provided.

The two issues above are not the same issue/related.
 I have done 3 batches of boards.
 Batch 1 out of a total of 50 boards, 2-3 failed and were replaced.
 Batch 2 we have had 3-4 fail and were replaced. One has failed again, we are investigating.
 Batch 3 with a slight PCB change, all but a handful have failed to work as expected. None have shipped.

Boards that arrived back here were examined but I don't have the tools or knowledge to go deep into scoping pins or anything like that.

The flashing of the boot loader is only done once, which is why we can flash them on a socket and then P&P.
The USB interface is used to flash firmware and updates.
I am aware that there might be underlying issues with batch 3, which is why we are not shipping any of these boards.

My clients are aware that this is a project being done out of my garage and that I am not a "pro" at this.



Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Psi on June 01, 2019, 11:10:28 am
Note: I am adding to this list as i think of things. So you may want to re-read it in case i have added more since you read it last.

Questions
- Have you got any units back that a client says don't work? If so, do they work for you or are they dead?
- Is your QC test automated? If so are you sure it is not faulty and letting dead units out the door? Maybe a manual test is needed so you know for sure all units leaving you are 100% working.
- Is there anything unusual about your location or the places you ship to? Some places irradiate all their mail with high-energy X-rays, and this can destroy electronics.
- Does the MCU/software system interface with or talk to something that differs in different parts of the world? Here are examples of what I mean: maybe Bluetooth to a phone, or maybe comms to a desktop PC app. Maybe some people have their phone/desktop set to a different timezone/country/language and maybe your app is incompatible with this.
- Do you have the sourcecode to your ATMega or does the freelancer hold that?

Few possibilities i can think of.
- Are users connecting the battery around the wrong way and damaging the device? (9V around the wrong way for a split second, that sort of thing.)
- Do you have any floating input pins on the MCU? Maybe the code only works if an input is read as either low or high, but it keeps changing with ambient noise. When built it might stay in one state, but in noisy environments maybe it floats high and stops the code running. (Floating inputs should have the MCU pull-ups enabled in software, but maybe they are not set in your code?)
- Have you tried powering the device from 4.5V with, let's say, a 50mA current limit? Not all USB ports are created equal. Maybe your product is quite critical on power and not all USB ports can power it.
- Where are you getting your parts from? Maybe you are getting lots of fake ICs.
- Could be a PCB track routing issue where tracks run too close to a hole or board edge and sometimes get cut by the drill/router, etc. Some PCBs work, some don't, some are intermittent.
- Are you sure you have the ATmega fuse bits set correctly? Maybe the startup delay, brownout detector or crystal settings are wrong and this is making it run intermittently.
- Does the product have protection from ESD or PSU spikes, like a TVS? Does the product get used in a location where it might need this, e.g. automotive/industrial?
- How are you programming the MCUs? I once had a crappy USBASP programmer that would brick 2 out of 5 AVRs it flashed. Not sure why, maybe the clock was out of spec and kept erasing fuse bits.
- Does your MCU programming system include a verify check?
- There is one AVR MCU, can't remember which, that comes with a fuse bit set to put it into a compatibility mode where it pretends to be a different AVR chip. Some of the IO/peripherals don't work until you get it out of that mode.   :palm:  (my guess is they have a supply agreement to sell a compatible chip for 25 years for MIL/MED/AERO) EDIT: All ATMega128 pretend to be an ATMega103 until you change the M103C fuse bit.
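
For the fuse-bit question above, here is a rough Python sketch that decodes the three ATmega328P fuse bytes into the settings mentioned (clock source, startup delay, brownout level). The bit positions follow the ATmega328P datasheet; the function name `decode_fuses` and the example byte values are my own choices for illustration and are not taken from this thread. Read the real values off your own chips with your programmer before drawing conclusions.

```python
# Sketch only: decode ATmega328P fuse bytes.
# AVR fuse bits are active-low: 0 means "programmed", 1 means "unprogrammed".

# Extended fuse bits 2..0 (BODLEVEL) -> typical brownout trip level.
BODLEVEL = {0b111: "disabled", 0b110: "1.8 V", 0b101: "2.7 V", 0b100: "4.3 V"}


def programmed(byte: int, bit: int) -> bool:
    """Return True if the given fuse bit is programmed (reads as 0)."""
    return ((byte >> bit) & 1) == 0


def decode_fuses(low: int, high: int, ext: int) -> dict:
    """Pull out the fuse fields most likely to cause intermittent behaviour."""
    return {
        "ckdiv8": programmed(low, 7),      # system clock divided by 8
        "ckout": programmed(low, 6),       # clock output on CLKO pin
        "sut": (low >> 4) & 0b11,          # startup time selection
        "cksel": low & 0b1111,             # clock source selection
        "rstdisbl": programmed(high, 7),   # RESET pin disabled (blocks ISP)
        "spien": programmed(high, 5),      # serial programming enabled
        "bootrst": programmed(high, 0),    # reset vector points at bootloader
        "brownout": BODLEVEL.get(ext & 0b111, "reserved"),
    }


if __name__ == "__main__":
    # Example values (assumptions): datasheet factory default, and the
    # values commonly used for an Arduino Uno style bootloader.
    for name, fuses in {"factory": (0x62, 0xD9, 0xFF),
                        "uno-style": (0xFF, 0xDE, 0xFD)}.items():
        print(name, decode_fuses(*fuses))
```

Comparing the decoded fields from a good board and a bad board is usually quicker than comparing raw hex bytes by eye.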

And once the PCBs arrived, none of them would allow me to burn the boot loader onto the ATMEGA via the programming header.
Had to pull all the ATMEGAs off and flash them in a socket.
Put them back on and 90% of the boards I made just don't work as expected.

This makes me lean towards a PCB/SCH issue.
Are you using MISO MOSI SCK pins for anything else other than programming?
You can use them for other things too, but you need to make sure you don't load the lines so much that programming is affected.
It can become intermittent if you load them or have caps on the line to gnd.
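
To put a number on "loading the lines", here is a small Python sketch that treats a loaded programming line as a single RC and compares its 10-90% rise time (about 2.2*R*C) against the SCK half-period. The resistor, capacitor and clock values are made-up examples, and the "use at most a quarter of the half-period" margin is just a rule of thumb I picked, not a datasheet limit.

```python
def rise_time_s(r_ohms: float, c_farads: float) -> float:
    """10-90% rise time of a single-pole RC network, roughly 2.2 * R * C."""
    return 2.2 * r_ohms * c_farads


def edge_fits(r_ohms: float, c_farads: float, sck_hz: float,
              fraction: float = 0.25) -> bool:
    """True if the edge uses less than `fraction` of one SCK half-period.

    `fraction` is an assumed safety margin, not a specified limit.
    """
    half_period = 1.0 / (2.0 * sck_hz)
    return rise_time_s(r_ohms, c_farads) < fraction * half_period


if __name__ == "__main__":
    # Hypothetical cases: 1 kOhm of source/series resistance with
    # either 100 pF of stray load or a 10 nF cap to ground on the line.
    for c in (100e-12, 10e-9):
        for sck in (125_000, 1_000_000):
            print(f"C={c:.0e} F, SCK={sck} Hz ->", edge_fits(1000.0, c, sck))
```

If a case only passes at a slow SCK, that would fit the "intermittent" symptom: programming works with one programmer or clock setting and fails with another.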

Also, grab one of those boards that doesn't program and use DMM to check the tracks between the programming header pins GND VCC MOSI MISO SCK RESET and the ATmega pads for those pins. Also check none are shorted together.



Help
- Are your PCB files in Altium? if so i'm happy to take a look at your SCH/PCB/CODE and see if i can spot any potential problems.


Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: vk6zgo on June 01, 2019, 12:00:25 pm
I remember many years ago, a separate State branch of the organisation I worked for were tasked with making boards that would automatically ring various phone numbers  when required.

We duly received our portion of those devices, but unfortunately they didn't work.
When we complained to the other State, they protested:
 "But we tested them & they all rang up who they were supposed to!"

Yup! They dutifully programmed the whole number needed to call those sites from their State into the PROMs.
Those additional numbers, of course, weren't needed in the State they were intended for, & would "freak the exchange out".
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: iMo on June 01, 2019, 12:09:39 pm
Hard to help without details. Anyhow, the fact that 5-10% of the shipped units end up not working indicates there is something wrong with the product or the processes around it, and the manufacturer should have stopped shipping such a product.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: dmills on June 01, 2019, 01:07:24 pm
What does your power input stage look like? Ceramic cap and LDO by any chance?

If so, try plugging in the power with the supply already switched on, and with reasonably long power leads, you might be blowing the regulator due to ringing in the LC circuit formed by the power cable and the input cap (Cure is a jellybean electrolytic about 10 times the value of the input ceramic in parallel with it, the ESR damps the ringing).
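
A rough Python sketch of why hot-plugging into a ceramic input cap can overshoot. It models the cable inductance, the total series resistance and the input cap as a simple series RLC hit with a voltage step, and uses the standard second-order overshoot formula. All component values are assumptions for illustration. It also lumps the damping into one series resistance, which is a simplification of the parallel electrolytic described above, so treat it as showing the trend only.

```python
import math


def damping_ratio(r_ohms: float, l_henries: float, c_farads: float) -> float:
    """Damping ratio of a series RLC: zeta = (R / 2) * sqrt(C / L)."""
    return (r_ohms / 2.0) * math.sqrt(c_farads / l_henries)


def peak_voltage(v_in: float, r_ohms: float, l_henries: float,
                 c_farads: float) -> float:
    """Peak capacitor voltage after a step of v_in into a series RLC."""
    zeta = damping_ratio(r_ohms, l_henries, c_farads)
    if zeta >= 1.0:
        return v_in  # critically damped or overdamped: no overshoot
    overshoot = math.exp(-math.pi * zeta / math.sqrt(1.0 - zeta ** 2))
    return v_in * (1.0 + overshoot)


if __name__ == "__main__":
    # Assumed values: 16 V battery, about 1 uH of lead inductance,
    # 10 uF ceramic input cap. Low resistance vs. about 1 ohm of damping.
    for r in (0.05, 1.0):
        v = peak_voltage(16.0, r, 1e-6, 10e-6)
        print(f"R = {r} ohm -> peak about {v:.1f} V")
```

With very little resistance the peak approaches twice the supply voltage, which can exceed the regulator's maximum input rating even though the battery itself is within spec.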

Are ALL of your external IO lines fitted with some form of ESD protection?

No floating inputs on the micro?
Is everything run well within datasheet ratings?

I had an issue with a production board once where we suddenly started getting a very high failure rate, turned out the spi bus was being driven on the wrong clock edge and the first batch of the peripheral chips just happened to work with zero hold time, the next ones not so much.

You need to get a few duds back and investigate.

Regards, Dan.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Psi on June 01, 2019, 01:23:45 pm
turned out the spi bus was being driven on the wrong clock edge and the first batch of the peripheral chips just happened to work with zero hold time, the next ones not so much.

hehe
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: AndyC_772 on June 01, 2019, 02:26:25 pm
It sounds like you've got two problems here.

1) Difficulty programming the MCU, for reasons unknown

2) Customers reporting that products don't work, when they passed your own in-house testing

It's possible these are related, but not necessarily. For now, I'd treat them as separate issues, and if fixing one leads to a solution for the other, that's a bonus. I'd also concentrate on exactly one example of each problem; don't get bogged down with the fact that there are many boards involved, even if their symptoms aren't identical, they may all have the same root cause.

Pick one board that doesn't program, check carefully all the signals required to program it, and find the definitive root cause for why that specific board doesn't work. There really aren't many things it can be; power supplies, clocks and timing, logic thresholds are about it.

Also, make sure you get back from customers at least some of the boards which are reported as faulty. Test them the exact same way you test boards prior to shipment, and see if they now fail. This tells you whether the difference is that the boards have stopped working, or if they have an inherent flaw which your testing has failed to pick up.

This sort of forensic fault-finding and testing is one of the things I do for a living. Feel free to PM me if you need some more detailed, specific advice.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: wraper on June 01, 2019, 02:52:52 pm
And once the PCBs arrived, none of them would allow me to burn the boot loader onto the ATMEGA via the programming header.
Had to pull all the ATMEGAs off and flash them in a socket.
Put them back on and 90% of the boards I made just don't work as expected.
This is after P&P and hand soldering over £300 worth of components >.<
As a first thing, you just brute-forced a workaround instead of solving the actual problem. Instead you should find the root cause of the problem, and then take measures to fix it. The same goes for the boards delivered DOA. You should get some of them back from the customers and find out exactly what causes them not to work.
Quote
I just don't know what to do any more.

You apparently did not even try to find the problem but already don't know what to do next.  |O
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: mariush on June 01, 2019, 03:13:40 pm
Maybe the flux you're using needs to be cleaned off the boards, otherwise it may be causing some resistance or short circuits?
Maybe you're accidentally joining two pins of your microcontrollers during soldering?

Do you have screw holes near traces? maybe you're shorting traces with screws or breaking them with friction over traces?

If usb powered... are you assuming you're getting clean 5v? Maybe the guys at the other end have too long usb leads causing voltage drop, or maybe they have stupid unregulated phone charger style usb things pumping 5.5-6v in your boards?
Inductance on the long usb cable causing voltage spikes? Not enough capacitance on input and output of regulators that could damage the regulators or cause them to reset/glitch?  Bad output capacitors on regulators?

Where do they install these products? are there powerful magnets or some induction things or something that could be picked up by your circuit and affect it


maybe share at the very least a picture of the assembled board ... if it's too much of a secret to show a schematic or something more complex
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: typematrix on June 01, 2019, 03:52:49 pm
Hi

Have you gotten boards back from customer?
i.e. customer returns.

If you get a customer return board back on your bench does it work?
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on June 01, 2019, 03:54:04 pm
Note: I am adding to this list as i think of things. So you may want to re-read it in case i have added more since you read it last.

Questions
- Have you got any units back that a client says don't work? If so, do they work for you or are they dead?
- Is your QC test automated? If so are you sure it is not faulty and letting dead units out the door? Maybe a manual test is needed so you know for sure all units leaving you are 100% working.
- Is there anything unusual about your location or the places you ship to? Some places irradiate all their mail with high-energy X-rays, and this can destroy electronics.
- Does the MCU/software system interface with or talk to something that differs in different parts of the world? Here are examples of what I mean: maybe Bluetooth to a phone, or maybe comms to a desktop PC app. Maybe some people have their phone/desktop set to a different timezone/country/language and maybe your app is incompatible with this.
- Do you have the sourcecode to your ATMega or does the freelancer hold that?

Few possibilities i can think of.
- Are users connecting the battery around the wrong way and damaging the device? (9V around the wrong way for a split second, that sort of thing.)
- Do you have any floating input pins on the MCU? Maybe the code only works if an input is read as either low or high, but it keeps changing with ambient noise. When built it might stay in one state, but in noisy environments maybe it floats high and stops the code running. (Floating inputs should have the MCU pull-ups enabled in software, but maybe they are not set in your code?)
- Have you tried powering the device from 4.5V with, let's say, a 50mA current limit? Not all USB ports are created equal. Maybe your product is quite critical on power and not all USB ports can power it.
- Where are you getting your parts from? Maybe you are getting lots of fake ICs.
- Could be a PCB track routing issue where tracks run too close to a hole or board edge and sometimes get cut by the drill/router, etc. Some PCBs work, some don't, some are intermittent.
- Are you sure you have the ATmega fuse bits set correctly? Maybe the startup delay, brownout detector or crystal settings are wrong and this is making it run intermittently.
- Does the product have protection from ESD or PSU spikes, like a TVS? Does the product get used in a location where it might need this, e.g. automotive/industrial?
- How are you programming the MCUs? I once had a crappy USBASP programmer that would brick 2 out of 5 AVRs it flashed. Not sure why, maybe the clock was out of spec and kept erasing fuse bits.
- Does your MCU programming system include a verify check?
- There is one AVR MCU, can't remember which, that comes with a fuse bit set to put it into a compatibility mode where it pretends to be a different AVR chip. Some of the IO/peripherals don't work until you get it out of that mode.   :palm:  (my guess is they have a supply agreement to sell a compatible chip for 25 years for MIL/MED/AERO) EDIT: All ATMega128 pretend to be an ATMega103 until you change the M103C fuse bit.

And once the PCBs arrived, none of them would allow me to burn the boot loader onto the ATMEGA via the programming header.
Had to pull all the ATMEGAs off and flash them in a socket.
Put them back on and 90% of the boards I made just don't work as expected.

This makes me lean towards a PCB/SCH issue.
Are you using MISO MOSI SCK pins for anything else other than programming?
You can use them for other things too, but you need to make sure you don't load the lines so much that programming is affected.
It can become intermittent if you load them or have caps on the line to gnd.

Also, grab one of those boards that doesn't program and use DMM to check the tracks between the programming header pins GND VCC MOSI MISO SCK RESET and the ATmega pads for those pins. Also check none are shorted together.



Help
- Are your PCB files in Altium? if so i'm happy to take a look at your SCH/PCB/CODE and see if i can spot any potential problems.


Thanks for the long reply and offer to check boards. I'll export them and DM you the files.

I have the boards that have failed from when clients used them. More than half worked as expected. Some did fail testing.
QC is done by hand. I run the boards on a test program for 24 hours. We also test after this that the PWM input, switches and firmware all work.
We ship from the UK to all over. I'll need to dig into locations that we have shipped to and have had units fail. But France and Sweden have had boards fail. The boards are encased in an aluminium block with >1.5mm wall around with a few cutouts for buttons and display.
The device has a PWM input from a sensor and displays it on some 7 segment displays. It has an NRF24L01 that talks to another one of the same device that I make.
I have source code.


I have a diode protecting reverse polarity though I don't know if I did it the correct way. But the power is only able to be plugged in one way using a LEMO socket and cable, which we provide.
We have 1 pin that is pulled to ground if it is a receiver and not a transmitter.
USB is only for updates, though the device works using it. The actual power source is 16v batteries and I use an off the shelf power regulator board to take 13-20v down to 12v, which then gets taken down to 5v with a normal REG.

This might be a good point, all but a few of the new boards I have made up from batch 3, have been with parts from LCSC. I will check this tonight.

I have used the default Circuit Maker design rules. But the outside edge I made quite far in. At least 1-2mm.
Not sure.
This product gets used all over, but most boards that have failed, arrived to the user bad. My hunch was xray but I have nothing to back that up with.
I use an Arduino to load the boot loader onto the ATMEGA and then a FTDI for flashing firmware.
AVRdude checks and we make sure it confirms all is good.
Not sure about the ATMega328p doing this or not.



I do use those programming pins for the NRF24L01.
I'll send you Altium files later but here is top and bottom.
Board on left is batch 1 and 2. Board on right is batch 3.
(https://i.postimg.cc/qhgyrmC8/Bottom.png) (https://postimg.cc/qhgyrmC8)

(https://i.postimg.cc/Q9P5dppv/top.png) (https://postimg.cc/Q9P5dppv)

Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: vk6zgo on June 01, 2019, 04:03:03 pm
What does your power input stage look like? Ceramic cap and LDO by any chance?

If so, try plugging in the power with the supply already switched on, and with reasonably long power leads, you might be blowing the regulator due to ringing in the LC circuit formed by the power cable and the input cap (Cure is a jellybean electrolytic about 10 times the value of the input ceramic in parallel with it, the ESR damps the ringing).

Are ALL of your external IO lines fitted with some form of ESD protection?

No floating inputs on the micro?
Is everything run well within datasheet ratings?

I had an issue with a production board once where we suddenly started getting a very high failure rate, turned out the spi bus was being driven on the wrong clock edge and the first batch of the peripheral chips just happened to work with zero hold time, the next ones not so much.

You need to get a few duds back and investigate.

Regards, Dan.

Reminds me of the Transmitter remote control system we had in to modify.
While doing so, I mislaid one of a number of monostables which were used in it.
No worry,-- plenty in stock!
Fitted a new one-- damn thing wouldn't work!

It turns  out that with that particular IC, people had been having problems getting stable operation at short "on" times.
The manufacturer's answer was to split the line into two devices, one with the original part number, which they redesigned to be optimised for short "on" times, sacrificing its performance (which had been perfectly satisfactory) for long "on" times.

For long "on" times, they produced a new device & type number

Not knowing this, we got caught out when the thing didn't work.
In the end, we had to get a stock of the new devices in, replace them in both remote control systems in use, & change the documentation.

I wonder how many "young (& old) players" got caught by that, & "tore their hair out" over the years?
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on June 01, 2019, 04:49:31 pm
Maybe the flux you're using needs to be cleaned off the boards, otherwise it may be causing some resistance or short circuits?
Maybe you're accidentally joining two pins of your microcontrollers during soldering?

Do you have screw holes near traces? maybe you're shorting traces with screws or breaking them with friction over traces?

If usb powered... are you assuming you're getting clean 5v? Maybe the guys at the other end have too long usb leads causing voltage drop, or maybe they have stupid unregulated phone charger style usb things pumping 5.5-6v in your boards?
Inductance on the long usb cable causing voltage spikes? Not enough capacitance on input and output of regulators that could damage the regulators or cause them to reset/glitch?  Bad output capacitors on regulators?

Where do they install these products? are there powerful magnets or some induction things or something that could be picked up by your circuit and affect it


maybe share at the very least a picture of the assembled board ... if it's too much of a secret to show a schematic or something more complex

I can check the flux idea later.

No screws near traces. Only ground planes.

Not USB powered but can be powered via USB. USB only really for firmware updates.

The product is not installed near anything like that. Just near other electronics. But issues have been happening before being used.

(https://i.postimg.cc/qhgyrmC8/Bottom.png) (https://postimg.cc/qhgyrmC8)

(https://i.postimg.cc/Q9P5dppv/top.png) (https://postimg.cc/Q9P5dppv)





Hard to help without details. Anyhow, the fact that 5-10% of the shipped units end up not working indicates there is something wrong with the product or the processes around it, and the manufacturer should have stopped shipping such a product.

Looking at it, it is less than 5%. Total of around 100 boards, only 5-6 have failed. One just needed a repair after user error.
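
With only about 100 boards shipped, the true failure rate is not pinned down very tightly by 5-6 failures. A quick Python sketch of a Wilson score interval shows the plausible range; the choice of 6 failures out of 100 and the 95% level (z = 1.96) are my assumptions for the example.

```python
import math


def wilson_interval(failures: int, total: int, z: float = 1.96):
    """Wilson score confidence interval for a failure proportion."""
    p = failures / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total
                         + z * z / (4 * total * total)) / denom
    return center - half, center + half


if __name__ == "__main__":
    low, high = wilson_interval(6, 100)
    print(f"observed 6.0%, plausible range {low:.1%} to {high:.1%}")
```

So the earlier "5-10%" estimate and the "less than 5%" estimate are both within what this sample size can support.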




What does your power input stage look like? Ceramic cap and LDO by any chance?

If so, try plugging in the power with the supply already switched on, and with reasonably long power leads, you might be blowing the regulator due to ringing in the LC circuit formed by the power cable and the input cap (Cure is a jellybean electrolytic about 10 times the value of the input ceramic in parallel with it, the ESR damps the ringing).

Are ALL of your external IO lines fitted with some form of ESD protection?

No floating inputs on the micro?
Is everything run well within datasheet ratings?

I had an issue with a production board once where we suddenly started getting a very high failure rate, turned out the spi bus was being driven on the wrong clock edge and the first batch of the peripheral chips just happened to work with zero hold time, the next ones not so much.

You need to get a few duds back and investigate.

Regards, Dan.

Power regulation is a bit of a hack tbh
I am using a drone regulator board to take 12-20v input down to 12v, I then have a 5v REG for the IC.
I did this for a few reasons, the main one being heat, I found that the LDO and other REG SMD packages got too hot with upwards of 20v.
The other being height. I was able to get these drone power regulators on a PCB less than 4mm which was important.
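
The heat problem follows from how a linear regulator works: it burns (Vin - Vout) * Iload as heat. A small Python sketch of that arithmetic is below. The 100 mA load current and the 60 C/W junction-to-ambient thermal resistance are assumed figures for illustration only; the real numbers come from measuring the board and from the regulator's datasheet.

```python
def linear_reg_dissipation_w(v_in: float, v_out: float,
                             i_load_a: float) -> float:
    """Power dissipated in a linear regulator, ignoring quiescent current."""
    return (v_in - v_out) * i_load_a


def junction_temp_c(power_w: float, theta_ja_c_per_w: float,
                    ambient_c: float = 25.0) -> float:
    """Estimated junction temperature for a given thermal resistance."""
    return ambient_c + power_w * theta_ja_c_per_w


if __name__ == "__main__":
    # Assumed: 100 mA load, 60 C/W package, 25 C ambient.
    for v_in in (20.0, 12.0):
        p = linear_reg_dissipation_w(v_in, 5.0, 0.1)
        t = junction_temp_c(p, 60.0)
        print(f"{v_in:.0f} V -> 5 V: {p:.2f} W, junction about {t:.0f} C")
```

Under those assumptions, dropping from 12 V instead of 20 V roughly halves the heat in the 5 V regulator, which matches the reason given for the two-stage arrangement. Inside a sealed aluminium case the ambient figure will be higher than 25 C.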
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: OwO on June 01, 2019, 05:07:42 pm
First thing you really have to do before considering anything else is to triage the failures and get to the bottom of it; exactly which part failed and in what way? otherwise all we can do is speculate and brainstorm 1000s of unrelated and probably irrelevant possibilities.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: OwO on June 01, 2019, 05:11:05 pm
You did mention half of the reported failed boards then subsequently passed testing on your end; in these cases what did the customer observe? Wireless comms not working? device not responding to any input and appearing "dead"? Is each customer different or is there a pattern of a type of failure? We still need far far more info before we can be helpful.

I have the boards that have failed from when clients used them. More than half worked as expected. Some did fail testing.
Action items right now: debug the boards that do fail testing, try to find the exact root cause, record all observations. Dig out all failure reports with devices that subsequently DID pass tests, record them somewhere and look for patterns. Ask customers for more info if necessary.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: mcinque on June 01, 2019, 11:18:40 pm
First thing you really have to do before considering anything else is to triage the failures and get to the bottom of it

Quote
Action items right now: debug the boards that do fail testing, try to find the exact root cause, record all observations.
Exactly. Investigation of dead boards is essential. There are too many causes and considerations to find the problem without proper debugging. Absolutely analyze where those boards failed and report your findings, possibly together with a schematic.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Psi on June 02, 2019, 12:56:44 am
I have the boards that have failed from when clients used them. More than half worked as expected. Some did fail testing.
QC is done by hand. I run the boards on a test program for 24 hours. We also test after this that the PWM input, switches and firmware all work.
...
The boards are encased in an aluminium block with >1.5mm wall around with a few cutouts for buttons and display.

Do you by any chance put screws into this aluminum case? If so are they pre-threaded?
Thread forming screws (or just tight screws) into aluminium creates lots of metal filings!
The metal flakes may not cause a problem initially but after being shaken around in transport maybe they get all over the place and short IC pins.
Test: Put down a clean sheet of copy paper on a desk. Grab a finished unit and carefully open it up on the paper. Tap out the board & case and see if any metal flakes come out. The paper will give a good contrast to make them easy to see.

This product gets used all over, but most boards that have failed, arrived to the user bad. My hunch was xray but I have nothing to back that up with.
Normal airport x-ray is totally fine, that won't do anything. Only the high power X-rays used to sterilize mail are a concern. Those are usually found at government buildings.

I use an Arduino to load the boot loader onto the ATMEGA and then a FTDI for flashing firmware.
AVRdude checks and we make sure it confirms all is good.
Not sure about the ATMega328p doing this or not.
FYI - There's a new chip out, the ATMega328PB which is not the same as an ATMega328P.
It's easy to think 'oh that's just the lead version' but no, it's a different chip with some different pinouts.

I do use those programming pins for the NRF24L01.
Right, so the ATMega328 SPI pins are used for flash programming of the MCU and also for talking to the NRF24L01 chip over SPI?
How are you handling the reset line on the ATMega328? Is it pulled high externally? Is it connected to an external button or something?
I just wonder if it's possible for the ATmega to go into reset state for some reason while comms to the NRF24L01 are active and somehow get garbage sent to the ATmega while it's in reset low state (program mode).
I'm not sure this is actually possible, because there should be no SPI clock once the MCU goes into reset.
I'm just thinking out loud. Maybe someone else will have a thought reading this.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Psi on June 02, 2019, 01:20:06 am
Have a look at these 2 areas on dead PCBs.
There may be issues where the track has broken or shorted etc.

Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Psi on June 02, 2019, 01:23:00 am
How are you handling merging of the USB Vbus power onto the 5V from the voltage regulator output?
Normally you would diode OR the two sources, but from the pcb layout it looks more like connecting 5V usb to reg output?

Voltage regulators do not like a higher voltage on their output than their input. They tend to die.
That can happen if you connect 5V from USB onto the output of a 5V reg and then remove the battery that's powering the input!

I could see you doing all QC tests with a battery always connected, but a user connecting USB first because they have a shiny new toy and can't wait to plug it in before they can source a battery.
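
A very simplified Python sketch of the two arrangements, just to make the failure scenario concrete. It uses an ideal-diode model with a fixed forward drop (0.3 V, an assumed Schottky-like figure) and ignores regulator internals entirely. The function names are made up for this example.

```python
def diode_or_rail(v_usb: float, v_reg_out: float,
                  v_forward: float = 0.3) -> float:
    """Rail voltage when each source feeds the rail through its own diode.

    The higher source wins and the rail sits one diode drop below it.
    """
    return max(max(v_usb, v_reg_out) - v_forward, 0.0)


def reg_reverse_voltage(v_rail: float, v_reg_in: float) -> float:
    """How far the regulator output sits above its input when the rail is
    wired straight to the regulator output (no OR-ing diode)."""
    return max(v_rail - v_reg_in, 0.0)


if __name__ == "__main__":
    # Scenario from the post: USB plugged in, battery not connected.
    print("direct tie, reverse stress on reg:",
          reg_reverse_voltage(5.0, 0.0), "V")
    print("diode-OR rail from USB only:", diode_or_rail(5.0, 0.0), "V")
```

In the direct-tie case the regulator sees the full 5 V on its output with nothing on its input. In the diode-OR case the regulator output is isolated by its own diode, at the cost of the rail running a diode drop low.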
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on June 02, 2019, 10:54:59 am
I have the boards that have failed from when clients used them. More than half worked as expected. Some did fail testing.
QC is done by hand. I run the boards on a test program for 24 hours. We also test after this that the PWM input, switches and firmware all work.
...
The boards are encased in an aluminium block with >1.5mm wall around with a few cutouts for buttons and display.

Do you by any chance put screws into this aluminum case? If so are they pre-threaded?
Thread forming screws (or just tight screws) into aluminium creates lots of metal filings!
The metal flakes may not cause a problem initially but after being shaken around in transport maybe they get all over the place and short IC pins.
Test: Put down a clean sheet of copy paper on a desk. Grab a finished unit and carefully open it up on the paper. Tap out the board & case and see if any metal flakes come out. The paper will give a good contrast to make them easy to see.

This product gets used all over, but most boards that have failed, arrived to the user bad. My hunch was xray but I have nothing to back that up with.
Normal airport x-ray is totally fine, that won't do anything. Only the high power X-rays used to sterilize mail are a concern. Those are usually found at government buildings.

I use an Arduino to load the boot loader onto the ATMEGA and then a FTDI for flashing firmware.
AVRdude checks and we make sure it confirms all is good.
Not sure about the ATMega328p doing this or not.
FYI - There's a new chip out, the ATMega328PB which is not the same as a ATMega328P.
It's easy to think 'oh that's just the lead version' but no, it's a different chip with some different pinouts.

I do use those programming pins for the NRF24L01.
Right, so the ATMega328 SPI pins are used for flash programming of the MCU and also for talking to the NRF24L01 chip over SPI?
How are you handling the reset line on the ATMega328? Is it pulled high externally? Is it connected to an external button or something?
I just wonder if it's possible for the ATmega to go into reset state for some reason while comms to NRF24L01 are active and somehow get garbage sent to the ATmega while it's in reset low state (program mode).
I'm not sure this is actually possible, because there should be no SPI clock once MCU goes into reset.
I'm just thinking out loud. Maybe someone else will have a thought reading this.



I have the screw holes pre-threaded on the cnc.
The whole case is then cleaned, bead blasted then anodised. They are super clean.


I am aware of the ATMega328PB. I make sure not to order or use them.


I burn the bootloader before installing the WiFi board. But burning while it is on, not had any problems with that either.
The reset pin on the ATMega328p is shared between ICSP header and the FTDI chip.
There is a 0.1uF cap to ground. This is all to Arduino spec I believe.
(https://i.postimg.cc/J0R4m2vb/Capture.png)



How are you handling merging of the USB Vbus power onto the 5V from the voltage regulator output?
Normally you would diode OR the two sources, but from the pcb layout it looks more like connecting 5V usb to reg output?

Voltage regulators do not like a higher voltage on their output than their input. They tend to die.
That can happen if you connect 5V from USB onto the output of a 5V reg and then remove the battery that's powering the input!

I could see you doing all QC tests with a battery always connected but a user connecting USB first because they have a shiny new toy and can't wait to plug it in before they can source a battery.

So the power is always delivered via the 16v input. The only time users use USB is to test the device and update firmware.
Both are not used at the same time and we make that very clear in the documentation.

The power is as per Arduino spec for the Arduino Nano.


Have a look at these 2 areas on dead PCBs.
There may be issues where the track has broken or shorted etc.



I'll check, thanks for spotting.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on June 04, 2019, 05:40:11 pm
Ignore how dirty it is but the board on the left has an ATMega328p from LCSC and the one on the right is from RS.
The text on the LCSC one is barely visible; the photo shows it better than I can see with my eyes.

The RS one burned the boot loader just fine.
Will add all the other components to see if it runs my firmware without issue.

(https://i.postimg.cc/KvhXDxsz/IMG-20190604-183426.jpg)
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: wraper on June 04, 2019, 06:10:36 pm
The text on the LCSC one is barely visible; the photo shows it better than I can see with my eyes.
Text is not visible because IC is dirty.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: viperidae on June 05, 2019, 11:04:27 am
LCSC part looks counterfeit.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: bd139 on June 05, 2019, 11:45:09 am
Yeah. Package is different as well. Only slightly.

There’s a company out there, the name I forget (green something) which sells “compatible” mega328p clones. Wonder if some of them got rebranded and chucked in the supply chain.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: mikerj on June 05, 2019, 11:53:49 am
LCSC part looks counterfeit.

Counterfeit 328P devices (https://forum.mysensors.org/topic/9388/atmega328p-au-counterfeit/15) certainly exist, but it's a little premature to say this one looks counterfeit when the text can't even be seen through the crud on the board.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Psi on June 05, 2019, 11:59:15 am
Also, it's common for even genuine chips to come in a few different-looking versions depending on the factory they were manufactured in.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: OwO on June 05, 2019, 12:25:58 pm
 :palm:
No triaging work has been done so far, and already starting to blame parts.
This is not how you do troubleshooting, you start from the symptoms and work backwards from there. The OP has not even said HOW the devices are malfunctioning.

Example troubleshooting steps:
Device not showing up on WiFi? => Attach the debugger and verify the MCU is running. MCU not running? => Attach debugger again and see where the execution is at, or even reset MCU from the debugger and step through the code. Debugger does not see MCU? Check all supply rails. Unsolder MCU, solder onto an arduino board to check if the MCU is functional.

NOT
Device not showing up on WiFi? => Is my LDO blown? Is my programmer broken? The MCU is counterfeit!

Please if you aren't going to actually troubleshoot it, at least describe the symptoms so we can have a better guess.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on June 05, 2019, 12:33:55 pm
I am not blaming parts or anyone. Just pointing out something that looks odd to me. I am not making any presumptions here.
The text was barely visible when I got the tray.

Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: mcinque on June 05, 2019, 12:38:20 pm
:palm:
No triaging work has been done so far
I think it's correct, debugging is essential.
You absolutely must first understand WHERE the boards failed.
Only after discovering a problem can you solve it.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on June 05, 2019, 01:12:57 pm
Just soldered on a new ATMega328p that I ordered last night.
Does not burn the boot loader.

Thinking the ATMega that I had already had a boot loader or by pure chance it burnt on that board.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Psi on June 06, 2019, 02:49:55 am
Have you checked the fuse bits to confirm settings are correct?
- Clock source (often there are different settings for <8MHz and >8MHz xtal). If you use the <8MHz setting for a 16MHz clock it works, just not reliably.
- Startup delay (set max delay if unsure)
- Brownout setting.
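
To make the three checks above concrete, here is a minimal sketch that decodes ATmega328P fuse bytes once a programmer has read them back. The bit positions and tables are taken from my reading of the ATmega328P datasheet, so check them against your own copy; `decode_fuses` and the dictionary names are made up for illustration, and the example bytes (0xFF / 0xFD) are values often quoted for 16MHz Arduino-style boards, not values read from the OP's boards.

```python
# Sketch: decode ATmega328P fuse bytes into clock source, start-up bits and
# brown-out level. A fuse bit that reads 0 counts as "programmed".

# CKSEL3:0 values that select something other than a crystal
CLOCK_SOURCES = {
    0b0000: "external clock",
    0b0010: "internal 8 MHz RC oscillator",
    0b0011: "internal 128 kHz RC oscillator",
}

# CKSEL3:1 values for the crystal oscillator modes
CRYSTAL_MODES = {
    0b010: "low-frequency (32 kHz) crystal",
    0b011: "full-swing crystal",
    0b100: "low-power crystal, 0.4-0.9 MHz",
    0b101: "low-power crystal, 0.9-3.0 MHz",
    0b110: "low-power crystal, 3.0-8.0 MHz",
    0b111: "low-power crystal, 8.0-16.0 MHz",
}

# BODLEVEL2:0 in the extended fuse byte
BROWNOUT_LEVELS = {
    0b111: "disabled",
    0b110: "1.8 V",
    0b101: "2.7 V",
    0b100: "4.3 V",
}


def decode_fuses(low, extended):
    """Return the clock, start-up and brown-out settings as a dict."""
    cksel = low & 0x0F
    clock = CLOCK_SOURCES.get(cksel) or CRYSTAL_MODES.get(cksel >> 1, "reserved")
    return {
        "clock": clock,
        "sut": (low >> 4) & 0x03,        # raw SUT1:0 start-up time bits
        "ckdiv8": (low & 0x80) == 0,     # True when divide-by-8 is programmed
        "brownout": BROWNOUT_LEVELS.get(extended & 0x07, "reserved"),
    }


if __name__ == "__main__":
    print(decode_fuses(0xFF, 0xFD))  # typical 16 MHz crystal setup
    print(decode_fuses(0x62, 0xFF))  # factory default low/extended fuses
```

If a board with a 16MHz crystal decodes as the 3.0-8.0 MHz range, that is the "works, just not reliably" case described above.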
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: mcinque on June 06, 2019, 07:54:48 am
Quote from: Jackster
I have the boards that have failed from when clients used them. More than half worked as expected. Some did fail testing.
So you had perfectly working boards and now some of them aren't working.
So why don't you trace what's failed on those boards? Is it the MCU? Is it one of the power rails? The crystal? What else? That is essential! Knowing that would be a huge help.

Please, do it and report it here.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on June 06, 2019, 07:58:36 am
Quote from: Jackster
I have the boards that have failed from when clients used them. More than half worked as expected. Some did fail testing.
So you had perfectly working boards and now some of them aren't working.
So why don't you trace what's failed on those boards? Is it the MCU? Is it one of the power rails? The crystal? What else? That is essential! Knowing that would be a huge help.

Please, do it and report it here.

I am not sure what was wrong with them other than the firmware stops running as it should be.
Power is fine, sensor input is fine.

The only thing it could be is the MCU or WiFi board.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: mikeselectricstuff on June 06, 2019, 08:25:47 am
Only read the first post and skimmed the rest but my guess would be oscillator startup issues - assuming you use a crystal or ceramic resonator. This would be consistent with inability to program and some units working and some not.
There can be a number of causes, bad PCB layout being one, in particular capacitance across the device, as well as non-optimal load capacitance.
A quick fix can often be to add some resistance across the resonator ( 1-10M), or if possible select a different oscillator power - I don't recall the AVR options offhand but many MCUs have drive-level options to trade off power draw vs startup time.

If you have a dead board in front of you, poke the oscillator pins and see if it starts.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: mcinque on June 06, 2019, 08:35:03 am
I am not sure what was wrong with them other than the firmware stops running as it should be.
Power is fine, sensor input is fine.
The only thing it could be is the MCU or WiFi board.
As I (and the wise Mike) said: check the crystal/oscillator. Put an oscilloscope to see if it's working. Keep in mind correct probe range/capacitance while probing.
Also, "power is fine": did you check it with a DMM or with an oscilloscope?
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: mikeselectricstuff on June 06, 2019, 08:37:19 am
I am not sure what was wrong with them other than the firmware stops running as it should be.
Power is fine, sensor input is fine.
The only thing it could be is the MCU or WiFi board.
As I (and the wise Mike) said: check the crystal/oscillator. Put an oscilloscope to see if it's working. Keep in mind correct probe range/capacitance while probing.
Also, "power is fine": did you check it with a DMM or with an oscilloscope?
Don't forget that the scope probe itself can start it oscillating. If possible use a x100 probe. Or better look at an output pin that is being toggled by software after startup
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on June 06, 2019, 12:01:50 pm
I don't have an oscilloscope. Just multi meter.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: mcinque on June 06, 2019, 12:03:36 pm
I bet that with an oscilloscope you could find the issue in minutes. Since you're doing circuits for work, consider purchasing one.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on June 06, 2019, 12:25:38 pm
I bet that with an oscilloscope you could find the issue in minutes. Since you're doing circuits for work, consider purchasing one.

Will a £50 USB one be enough?
https://www.amazon.co.uk/HANTEK-Portable-Based-Digital-Oscilloscope/dp/B00EEX1W5G/ref=sr_1_28?keywords=oscilloscope&qid=1559823810&s=gateway&sr=8-28 (https://www.amazon.co.uk/HANTEK-Portable-Based-Digital-Oscilloscope/dp/B00EEX1W5G/ref=sr_1_28?keywords=oscilloscope&qid=1559823810&s=gateway&sr=8-28)
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: mikeselectricstuff on June 06, 2019, 01:31:55 pm
I don't have an oscilloscope. Just multi meter.
Buy one.
Without it you are pissing away your time.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: mcinque on June 06, 2019, 01:43:08 pm
Will a £50 USB one be enough?

A USB scope like that has very low bandwidth; you need at least double the bandwidth of your clock.
However, a proper DSO with real knobs on the chassis is much more comfortable to use.

Like Mike said, without one you're really wasting your time.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: tggzzz on June 06, 2019, 01:44:08 pm
I don't have an oscilloscope. Just multi meter.

That was the equipment I used as an amateur when I was still at school. It got me a long way, but I knew when it wasn't sufficient and used a scope where necessary.

Quite frankly I'm surprised you do still have paying clients.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: tggzzz on June 06, 2019, 01:45:44 pm
you need at least double the bandwidth of your clock.

False. All that matters is the transition time; the period/frequency is irrelevant.

For an outline of the theory and some measurements, see https://entertaininghacks.wordpress.com/2018/05/08/digital-signal-integrity-and-bandwidth-signals-risetime-is-important-period-is-irrelevant/
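
As a rough illustration of the point about transition time, the sketch below uses the common rule of thumb BW ≈ 0.35 / t_rise (10-90% rise time, assuming a roughly single-pole response) and the root-sum-of-squares estimate for what a scope will display. Both are textbook approximations rather than anything taken from the linked article, and the function names are invented for this example.

```python
import math


def bandwidth_for_risetime(t_rise_s):
    """Approximate bandwidth (Hz) needed for a given 10-90% rise time."""
    return 0.35 / t_rise_s


def displayed_risetime(t_signal_s, scope_bandwidth_hz):
    """Estimate the rise time a scope shows for a given real signal edge."""
    t_scope = 0.35 / scope_bandwidth_hz  # the scope's own rise time
    return math.sqrt(t_signal_s ** 2 + t_scope ** 2)


if __name__ == "__main__":
    edge = 3.5e-9  # a 3.5 ns logic edge; the clock period does not appear anywhere
    print(f"bandwidth for that edge: {bandwidth_for_risetime(edge) / 1e6:.0f} MHz")
    for bw in (20e6, 100e6):
        shown = displayed_risetime(edge, bw)
        print(f"{bw / 1e6:.0f} MHz scope shows about {shown * 1e9:.1f} ns")
```

Note that the clock frequency is not an input to either function: a 1 kHz square wave with fast edges asks as much of the scope as a 16 MHz one with the same edges.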
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: tszaboo on June 06, 2019, 03:13:21 pm
You don't have power supply bypassing. Whatever you put there is not enough, too far away and so on. You don't even have a ceramic cap next to that FTDI chip. I'm not surprised it is not working.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on June 06, 2019, 04:08:35 pm
So something like the Hantek DSO5102P?

Even so, would it even be worth it if I need to hire someone to remake the circuit anyway...
And it would take a while to learn how to use the thing..
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: daqq on June 06, 2019, 04:32:41 pm
One of the issues we've had were crystal load capacitors - in the prototype everything worked. After a batch was made, quite a lot didn't work. This was because the load capacitance was above the one specified by the processor (STM32L1). The oscillator simply wouldn't start. We had to replace all of the capacitors.
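
For anyone wanting to check their own numbers, here is a minimal sketch of the usual load capacitance estimate: the two capacitors appear in series across the crystal, plus stray capacitance from pins and tracks. The 5pF stray figure is only a placeholder assumption; the real value depends on the layout and the MCU, and the function names are invented for this example.

```python
def load_capacitance_pf(c1_pf, c2_pf, stray_pf):
    """Load seen by the crystal: C1 and C2 in series, plus stray capacitance."""
    return (c1_pf * c2_pf) / (c1_pf + c2_pf) + stray_pf


def capacitor_for_load_pf(target_load_pf, stray_pf):
    """Value for two equal capacitors that gives the crystal's rated load."""
    return 2.0 * (target_load_pf - stray_pf)


if __name__ == "__main__":
    # Two 22 pF capacitors with an assumed 5 pF of stray capacitance
    print(load_capacitance_pf(22.0, 22.0, 5.0), "pF seen by the crystal")
    # Capacitors needed for a crystal specified at 18 pF load
    print(capacitor_for_load_pf(18.0, 5.0), "pF each")
```

Compare the result with both the crystal's rated load capacitance and the range the MCU's oscillator is specified to drive.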
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on June 06, 2019, 11:32:48 pm
So I managed to get some time in the office today and got one of the bad boards and started removing components and testing.

After a few, I removed the FTDI and it burned the boot loader.
Put all the other components back other than the FTDI and again it burned.

I then put the same FTDI chip back and it did not burn. Removed it and it burned.
I then got a known working FTDI chip from RS and it burnt the boot loader.

I need to double check all of this in the morning so this is more of a note for myself.
The LCSC ones are dated year 17 and the RS ones are dated 18.

Will take a working board and remove the working FTDI chip and replace with one of the new FTDI chips and see if it burns the boot loader in the morning.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: mcinque on June 07, 2019, 10:13:54 am
If I may, I would invest time in understanding WHAT has failed instead of trying to fix something when you don't know exactly what is wrong.

The Hantek DSO5102P is fine; you could also choose a Siglent SDS1052DL or a used classic Rigol DS1052E.
I know that you would have to learn how to use it properly, but it's mandatory and not so difficult (if I can use it, anyone else can). You can't live without it.

Once you have an oscilloscope, you can't (and you wouldn't) go back!

Imagine this: without that instrument you are blind.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on June 07, 2019, 10:32:18 am
If I may, I would invest time in understanding WHAT has failed instead of trying to fix something when you don't know exactly what is wrong.

The Hantek DSO5102P is fine; you could also choose a Siglent SDS1052DL or a used classic Rigol DS1052E.
I know that you would have to learn how to use it properly, but it's mandatory and not so difficult (if I can use it, anyone else can). You can't live without it.

Once you have an oscilloscope, you can't (and you wouldn't) go back!

Imagine this: without that instrument you are blind.

IDK what I am looking for though with it.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Ian.M on June 07, 2019, 10:34:44 am
So I managed to get some time in the office today and got one of the bad boards and started removing components and testing.

After a few, I removed the FTDI and it burned the boot loader.
Put all the other components back other than the FTDI and again it burned.

I then put the same FTDI chip back and it did not burn. Removed it and it burned.
I then got a known working FTDI chip from RS and it burnt the boot loader.

I need to double check all of this in the morning so this is more of a note for myself.
The LCSC ones are dated year 17 and the RS ones are dated 18.

Will take a working board and remove the working FTDI chip and replace with one of the new FTDI chips and see if it burns the boot loader in the morning.

Possible fake FTDI chips?  Hack a bad board to loop the FTDI's RX and TX pins, open the USB serial port with a terminal program and see if it responds
Code: [Select]
NON GENUINE DEVICE FOUND!
character by character as you type instead of the loopback echoing what you type.  (see: https://www.eevblog.com/forum/microcontrollers/ftdi-gate-2-0/ (https://www.eevblog.com/forum/microcontrollers/ftdi-gate-2-0/) )
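
If you want to script the check rather than eyeball a terminal, the sketch below classifies what came back from the looped port. It only contains the comparison logic: actually opening the port (for example with a serial library) is left out, and the assumption that the marker text repeats from the start once it has been sent in full is mine. The function and constant names are invented for this example.

```python
MARKER = b"NON GENUINE DEVICE FOUND!"


def classify_loopback(sent, received):
    """Classify the bytes read back after writing `sent` to a looped port.

    Use a test string that does not itself start with the marker text,
    otherwise a genuine echo and the marker cannot be told apart.
    """
    if not sent or not received:
        return "no response"
    if received == sent:
        return "echo"
    # Assumed behaviour: one marker character per typed character, repeating.
    repeats = len(received) // len(MARKER) + 1
    if (MARKER * repeats).startswith(received):
        return "non-genuine marker"
    return "unexpected"


if __name__ == "__main__":
    print(classify_loopback(b"hello", b"hello"))  # plain loopback
    print(classify_loopback(b"hello", b"NON G"))  # marker text instead of echo
    print(classify_loopback(b"hello", b""))       # nothing came back
```

A plain "echo" result only says the driver did not flag the chip; it does not prove the part is genuine.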
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: tggzzz on June 07, 2019, 10:38:43 am
If I may, I would invest time in understanding WHAT has failed instead of trying to fix something when you don't know exactly what is wrong.

The Hantek DSO5102P is fine; you could also choose a Siglent SDS1052DL or a used classic Rigol DS1052E.
I know that you would have to learn how to use it properly, but it's mandatory and not so difficult (if I can use it, anyone else can). You can't live without it.

Once you have an oscilloscope, you can't (and you wouldn't) go back!

Imagine this: without that instrument you are blind.

IDK what I am looking for though with it.

I agree with your previous inference that you will be getting someone else to redesign it (viz. "Even so, would it even be worth it if I need to hire someone to remake the circuit anyway..."). That seems like a sensible course of action.

If the hardware is poorly designed/implemented, then any software you add would be building castles on sand.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: AndyC_772 on June 07, 2019, 10:58:20 am
If you think the FTDI chip might be involved, the next step is to work out possible ways in which it could cause the symptoms you're seeing.

Are there any physical pins on the FTDI chip that are also shared with pins which are needed to burn your boot loader?

Without knowing the details of your design, I'd suggest two possibilities - either:

a) there's a logic signal (or signals) in common. Are your programming (SPI / reset) pins connected to the FTDI, or are they separate? If they're completely separate, then it really shouldn't be able to interfere with boot loading via that route.

b) they share a common power supply, and something bad is happening which is causing the voltage at the MCU to go out of spec during programming. Does anything get warm?

Another option (c) is that the FTDI chip is a complete red herring, and the difference is caused by heating, cooling and flux contamination of your PCB when you remove and replace components. Be sure to thoroughly clean the board after every rework operation, especially in and around the MCU crystal if it has one.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: NivagSwerdna on June 07, 2019, 11:08:22 am
Be careful to not conflate problems... looks like the earlier revision boards have a reliability issue but that needs proper post-mortem analysis.

Rev3 boards...
And once the PCBs arrived, none of them would allow me to burn the boot loader onto the ATMEGA via the programming header.
This is probably a different problem; consider your design/manufacturing to be flawed and work from first principles on one board.  Does your PCB software have a rules check?  Check the uP pins for things that are GND that shouldn't be.
Populate one board with the bare minimum to program the uP onboard and work backwards.

Good Luck!
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: ebastler on June 07, 2019, 11:10:05 am
Have you gotten boards back from customer? i.e. customer returns.
If you get a customer return board back on your bench does it work?

I second that question, and don't think the OP has answered it (unless I overlooked some comment). This is an important step in finding the root cause of the failures in the field.

@Jackster: Do you actually know that the boards somehow "broke" in transit? Or are they still in the same shape they were in when you sent them out, but don't work at the customers while they still work when you (re-)test them at home?
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on June 07, 2019, 11:16:08 am
I am not saying the FTDI chip is the cause btw. Just that when removed and replaced with a known working one, there is no boot loader issues.
I am pretty much on board with the issue being the board design. Probably just coincidence that the old FTDI is working and the new ones are not?
Only tested a handful.

Possible fake FTDI chips?  Hack a bad board to loop the FTDI's RX and TX pins, open the USB serial port with a terminal program and see if it responds
Code: [Select]
NON GENUINE DEVICE FOUND!
character by character as you type instead of the loopback echoing what you type.  (see: https://www.eevblog.com/forum/microcontrollers/ftdi-gate-2-0/ (https://www.eevblog.com/forum/microcontrollers/ftdi-gate-2-0/) )

I don't think so. They all have different serial numbers. They were cheap though at $2.78 per chip.
Doing as you said, it just echoes back what I type.


If you think the FTDI chip might be involved, the next step is to work out possible ways in which it could cause the symptoms you're seeing.

Are there any physical pins on the FTDI chip that are also shared with pins which are needed to burn your boot loader?

Without knowing the details of your design, I'd suggest two possibilities - either:

a) there's a logic signal (or signals) in common. Are your programming (SPI / reset) pins connected to the FTDI, or are they separate? If they're completely separate, then it really shouldn't be able to interfere with boot loading via that route.

b) they share a common power supply, and something bad is happening which is causing the voltage at the MCU to go out of spec during programming. Does anything get warm?

Another option (c) is that the FTDI chip is a complete red herring, and the difference is caused by heating, cooling and flux contamination of your PCB when you remove and replace components. Be sure to thoroughly clean the board after every rework operation, especially in and around the MCU crystal if it has one.

No shared pins for boot loading other than RESET.
(https://i.postimg.cc/gjZgvtBx/Capture.png) (https://postimg.cc/9wcP2b22)

Boards are pretty clean. Been using acetone as that is all I have :/




Be careful to not conflate problems... looks like the earlier revision boards have a reliability issue but that needs proper post-mortem analysis.

Rev3 boards...
And once the PCBs arrived, none of them would allow me to burn the boot loader onto the ATMEGA via the programming header.
This is probably a different problem; consider your design/manufacturing to be flawed and work from first principles on one board.  Does your PCB software have a rules check?  Check the uP pins for things that are GND that shouldn't be.
Populate one board with the bare minimum to program the uP onboard and work backwards.

Good Luck!

I tried with just the ATMega328p, crystal and cap on the reset pin.
 Pretty sure it burned the boot loader. Was a few days ago and forgot to write it down. I'll try again.



Have you gotten boards back from customer? i.e. customer returns.
If you get a customer return board back on your bench does it work?

I second that question, and don't think the OP has answered it (unless I overlooked some comment). This is an important step in finding the root cause of the failures in the field.

@Jackster: Do you actually know that the boards somehow "broke" in transit? Or are they still in the same shape they were in when you sent them out, but don't work at the customers while they still work when you (re-)test them at home?

The boards are sent out in an aluminium case.
They are physically fine. They just develop a fault where the software no longer cycles.

This can happen on new boards too. It will go through the code 3-6 times and then hang.

Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Mr. Scram on June 07, 2019, 11:21:26 am
I remember many years ago, a separate State branch of the organisation I worked for were tasked with making boards that would automatically ring various phone numbers  when required.

We duly received our portion of those devices, but unfortunately they didn't work.
When we complained to the other State, they protested:
 "But we tested them & they all rang up who they were supposed to!"

Yup! They dutifully programmed the whole number needed to call those sites from their State into the PROMs.
Those additional numbers, of course, weren't needed in the State they were intended for, & would "freak the exchange out".
I'd argue that's an error on the exchange end, but you'll still have to deal with it.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: AndyC_772 on June 07, 2019, 11:55:07 am
Why have you got reset connected to DTR via a capacitor?

What happens if you remove that capacitor?
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: ubbut on June 07, 2019, 12:35:16 pm
Did not read the whole thread just leaving this here:


ATMegaxxxPB devices do not support the 'high amplitude crystal oscillator mode' only the 'low power mode'.
If you set the fuse bits like for a PA device, some will work, others won't. But always unreliably. Usually first programming action is fine, but then they might just be unresponsive..
Cost me ~100 faulty atmegas until the problem was discovered.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: NivagSwerdna on June 07, 2019, 12:37:25 pm
FWIW I have a high value R across my XTAL... no idea why... think I must have stolen the idea from the Uno reference design.

.... but https://forum.arduino.cc/index.php?topic=176297.0 so I wouldn't lose any sleep over that one  :D
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: oPossum on June 07, 2019, 01:18:01 pm
Why have you got reset connected to DTR via a capacitor?

Copied from Arduino. Not a good design IMO.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: mcinque on June 07, 2019, 02:35:58 pm
I'm puzzled.

We don't know what the device in question does exactly. We know only it uses Wi-Fi and USB.
We have no data on what is wrong on the board or what does not work (hardware? software?). We know only that some boards don't work anymore.
We don't know in which mode the boards fail (what should they do? They work partially? They don't turn on? The microcontroller won't start up? The Wi-Fi does not connect? There is no output? What?)
There is no diagnostic data except "Power is fine, sensor input is fine".

I don't think we can guess the problem because there are so many possibilities, from fake chips to esd to user fault... the list goes on and on.

I think this is not the right way to proceed. We need data.

Can you post here (or privately via PM) a full schematic and at least a working diagram of your sketch and principle of operation together with detailed symptoms?

There are many people here who want to help you, but try to provide something more to work on.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: mcinque on June 07, 2019, 02:41:13 pm
It will go through the code 3-6 times and then hang.
THIS. You must understand why it hangs. If it runs at least 3-6 times, the hardware is probably not "broken" (in the sense of burnt or physically damaged). Maybe something in the hardware is changing, or is marginal, and is affecting the software readings.

Could be a power issue or an input issue.

Assuming the power is perfectly fine and there isn't some glitch, noise or bounce affecting the MCU (which you can only be sure of with a DSO) when something happens in the running code, you should use a debugger to understand what's happening inside the MCU.

If you cannot use a debugger, try adding plenty of status messages to your code and output them over serial, to use them as a "poor man's debugger", so you can see at which point it hangs.

Depending on what the code is doing when it happens, you could guess better what's the cause.

But be sure about the power rail.
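
To show what the serial-message approach can give you on the PC side, here is a minimal sketch that takes captured serial output and reports the last checkpoint reached before the hang. The `CP <cycle> <name>` line format and the checkpoint names are purely hypothetical; the firmware would have to print lines like that itself, and `last_checkpoint` is a name made up for this example.

```python
def last_checkpoint(log_lines):
    """Return (cycle, name) of the last well-formed checkpoint line, or None.

    Expected (hypothetical) firmware output: "CP <cycle number> <checkpoint name>".
    Anything else, such as garbage from a brown-out, is ignored.
    """
    last = None
    for line in log_lines:
        parts = line.strip().split(maxsplit=2)
        if len(parts) == 3 and parts[0] == "CP" and parts[1].isdigit():
            last = (int(parts[1]), parts[2])
    return last


if __name__ == "__main__":
    captured = [
        "CP 1 read_pwm",
        "CP 1 update_display",
        "CP 1 radio_send",
        "CP 2 read_pwm",
        "CP 2 update_display",
    ]
    cycle, name = last_checkpoint(captured)
    print(f"last checkpoint: cycle {cycle}, after '{name}'")
```

The hang sits between the last checkpoint printed and the next one that should have followed. If it is always the same step across several boards, suspect that step's code or hardware; if it moves around, go back to the power rail.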
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Mr. Scram on June 07, 2019, 03:38:07 pm
What a messy debugging process this is, if you can call it that. You really need to describe what you're dealing with in as much detail as possible. Then you systematically start eliminating potential issues, the most likely first. Right now it's just a haphazard scramble with major parts left in the dark.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on June 07, 2019, 05:30:03 pm
I'm puzzled.

We don't know what the device in question does exactly. We know only it uses Wi-Fi and USB.
We have no data on what is wrong on the board or what does not work (hardware? software?). We know only that some boards don't work anymore.
We don't know in which mode the boards fail (what should they do? They work partially? They don't turn on? The microcontroller won't start up? The Wi-Fi does not connect? There is no output? What?)
There is no diagnostic data except "Power is fine, sensor input is fine".

I don't think we can guess the problem because there are so many possibilities, from fake chips to esd to user fault... the list goes on and on.

I think this is not the right way to proceed. We need data.

Can you post here (or privately via PM) a full schematic and at least a working diagram of your sketch and principle of operation together with detailed symptoms?

There are many people here who want to help you, but try to provide something more to work on.


I can't go into any more detail into the debugging as I am not an electronic engineer and can only go as deep as "it is not burning the boot loader" and "it stops after X many cycles".
I know what each part of the circuit does but not how it works at the sort of level required for debugging.

The device takes a PWM input from a sensor and displays the result on 7 segment displays.
It can transmit this info over the WiFi interface to another that will take the data from the WiFi and display it on its seven segment display.



Happy to PM the files.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: ebastler on June 07, 2019, 05:58:13 pm
The device takes a PWM input from a sensor and displays the result on 7 segment displays.
It can transmit this info over the WiFi interface to another that will take the data from the WiFi and display it on its seven segment display.

Is that the unit conversion device for sonar measurements which you had posted about earlier, by any chance? These are used on boats, I would assume. Are you sure they handle the vibrations and humidity well?

(Is this maybe also the "three boards connected via FFC connectors" design you have also asked questions about in earlier threads? If so, are you sure the connectors are robust enough, and are you sure the signals make it across the connections in good shape?)

May I add a personal comment:  It would not hurt if you had the courtesy to let us know what your product is and does, and give us a link to the website you presumably have. You are selling these for profit, it seems, and are asking for free advice here. At least satisfy our curiosity in return; and the information may help with the troubleshooting as well.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: DimitriP on June 07, 2019, 06:27:17 pm
Quote
I'm puzzled.
The lack of an oscilloscope is also a bit puzzling. 
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on June 07, 2019, 06:41:02 pm
The device takes a PWM input from a sensor and displays the result on 7 segment displays.
It can transmit this info over the WiFi interface to another that will take the data from the WiFi and display it on its seven segment display.

Is that the unit conversion device for sonar measurements which you had posted about earlier, by any chance? These are used on boats, I would assume. Are you sure they handle the vibrations and humidity well?

(Is this maybe also the "three boards connected via FFC connectors" design you have also asked questions about in earlier threads? If so, are you sure the connectors are robust enough, and are you sure the signals make it across the connections in good shape?)

May I add a personal comment:  It would not hurt if you had the courtesy to let us know what your product is and does, and give us a link to the website you presumably have. You are selling these for profit, it seems, and are asking for free advice here. At least satisfy our curiosity in return; and the information may help with the troubleshooting as well.


Yea a few years back. Same project but a lot further on. Not used on boats, and the boards are conformally coated.


I am not using the FFC connectors. That design was stupid and over complex. Moved all the processing to a single board and now use 4 pin cables with JST connectors to transmit power and data.
Note that this is for a different board. This topic is about the other board I make.


I very much don't want to link the product. This is obviously not a good look to have.
Yes I do sell for profit, I hope I made that clear in the OP. This topic was more for knowing what to do next rather than actual debugging which I think is above my skill level.

As mentioned previously, quite happy to pay for someone to take a look and re-make the boards.


Quote
I'm puzzled.
The lack of an oscilloscope is also a bit puzzling.

I would not know what to do with it..
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: GreggD on June 07, 2019, 07:18:01 pm
You might want to inject an external oscillator into the Atmel crystal input pin. Then try to program and set the crystal drive level fuses. Works for me.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: soldar on June 07, 2019, 09:37:44 pm
... my guess would be oscillator startup issues - assuming you use a crystal or ceramic resonator. This would be consistent with inability to program and some units working and some not.
...
If you have a dead board in front of you, poke the oscillator pins and see if it starts.
good thinking!
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: langwadt on June 07, 2019, 11:36:52 pm
I am not saying the FTDI chip is the cause btw. Just that when removed and replaced with a known working one, there is no boot loader issues.
I am pretty much on board with the issue being the board design. Probably just coincidence that the old FTDI is working and the new ones are not?
Only tested a handful.

Possible fake FTDI chips?  Hack a bad board to loop the FTDI's RX and TX pins, open the USB serial port with a terminal program and see if it responds
Code:
NON GENUINE DEVICE FOUND!
character by character as you type instead of the loopback echoing what you type.  (see: https://www.eevblog.com/forum/microcontrollers/ftdi-gate-2-0/ )

I don't think so. They all have different serial numbers. They were cheap though at $2.78 per chip.
Doing as you said, it just echoes back what I type.


the FTDI "NON GENUINE DEVICE FOUND!" is a driver thing so it might work just fine with an old driver
and fail if using a newer driver



Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Ian.M on June 07, 2019, 11:53:48 pm
OTOH that particular FTDI Windows driver quirk will definitely cause AVRDUDE to fail to communicate with a serial bootloader in an ATmega, without any other evidence of why it has failed.   If you are even slightly suspicious of the authenticity of an FTDI USB<=>serial chip, you are using a Windows PC, and the FTDI driver is more recent than the 'FTDIgate' event, doing that loopback test is essential.   Don't bother if you are using Linux or a Mac - it's a Windows driver only issue.

If you still have FTDI on your 'design in' list in spite of their shenanigans, and you aren't a big enough player to buy direct from FTDI to avoid supply chain contamination, IMHO it's essential to design your board to make it easy to do that loopback test - a loopback jumper would be a reasonable choice.
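
A rough sketch of automating that loopback check, in Python. Everything named here is an assumption for illustration: `check_loopback` and `FakeLoopback` are made-up names, `port` is anything with pyserial-style `write()` and `read()` methods, and RX and TX are assumed to be jumpered together on the board under test.

```python
def check_loopback(port, payload=b"The quick brown fox 0123456789\r\n"):
    """Send payload and return True only if exactly the same bytes come back.

    `port` is a hypothetical object with write(bytes) and read(n) methods,
    such as an already-opened pyserial port with a read timeout set.
    """
    port.write(payload)
    echoed = port.read(len(payload))
    return echoed == payload


class FakeLoopback:
    """Stand-in for a real port with RX wired to TX, for a dry run only."""

    def __init__(self):
        self._buf = bytearray()

    def write(self, data):
        self._buf.extend(data)
        return len(data)

    def read(self, n):
        out = bytes(self._buf[:n])
        del self._buf[:n]
        return out


# Dry run against the fake port; a real test would use real hardware.
print(check_loopback(FakeLoopback()))
```

With real hardware one would presumably pass an opened pyserial port instead, for example `serial.Serial("COM5", 115200, timeout=1)`, where the port name and baud rate are placeholders. A chip that answers with anything other than the bytes sent makes the check return False.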

Another possibility with FTDI clones is failure at higher baud rates.  It only takes one lost or corrupted character in a few thousand to totally screw up bootloading, so you may wish to try a lower baud rate.  However the bootloader *MUST* support the baud rate you choose as both ends need to match.
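
To put a number on why one bad character in a few thousand is enough, here is a back-of-envelope sketch in Python. It assumes errors are independent and nothing is retransmitted; the error rate and image size are invented for illustration and are not figures from this thread.

```python
def upload_success_probability(byte_error_rate, bytes_transferred):
    """Chance that every byte of a transfer arrives intact, assuming
    independent errors and no retransmission."""
    return (1.0 - byte_error_rate) ** bytes_transferred


# Assumed figures for illustration only: one bad byte in 5000 and a
# 30000-byte firmware image.
p = upload_success_probability(1 / 5000, 30000)
print(f"chance of a clean upload: {p:.2%}")
```

Under those assumptions the chance of a clean upload comes out well under 1%, which is why dropping the baud rate is worth trying.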
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: NivagSwerdna on June 08, 2019, 10:29:43 am
This is a fun thread... it has something for everyone and not enough information for any proper conclusions...

.. The schematic showed an ICSP header... I presume that is what is being used to program the uP?  The requirements for ICSP are minimal... if it isn't working on Rev3 boards you either have some extra shorting to ground (due to errant ground pour) or dodgy chips.

Ignore any talk about Flux, ESD etc... until all the obvious has been eliminated.  For now just use a multimeter on continuity to determine if any ground shorts on Rev3 boards.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on June 08, 2019, 10:49:16 am
This is a fun thread... it has something for everyone and not enough information for any proper conclusions...

.. The schematic showed an ICSP header... I presume that is what is being used to program the uP?  The requirements for ICSP are minimal... if it isn't working on Rev3 boards you either have some extra shorting to ground (due to errant ground pour) or dodgy chips.

Ignore any talk about Flux, ESD etc... until all the obvious has been eliminated.  For now just use a multimeter on continuity to determine if any ground shorts on Rev3 boards.

Yes the ICSP header is for programming.

I can't find any pins on the MCU that are grounded that should not be.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: mcinque on June 10, 2019, 08:27:40 am
it has something for everyone and not enough information for any proper conclusions...
In my opinion it lacks something. Not enough data.
I renew my offer: PM complete schematics and at least a flowchart of the principle of operation (not the sketch) together with failure symptoms (what the board should do and what it does instead).
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: thinkfat on June 10, 2019, 08:48:05 am
If you look at the top of package of the Atmega from LCSC, it looks like the chamfer is a lot thinner, close to nonexistent. Were I to venture a guess, I'd say the top face of a chip has been milled or ground down a few thou to get rid of the marking and then a new marking has been stamped on. It's hard to see on the photo, however. To get a better image, maybe  wet the chip with IPA and try a different light angle.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on July 01, 2019, 04:03:44 pm
Turns out PCBway messed up a bit.

I was unable to see this with my eyes at first but now I have seen it, it is obvious.

(https://i.postimg.cc/3NL4M89n/pins.png)

The green PCB is correct. The Black one is one from the bad batch from PCBway.
The spacing between the holes is 1.27mm and the distance between the solder pads should be 0.375mm.


Not saying my problems are all their fault.
We have worked out what was causing the other programming issues that affected other boards in earlier batches.
I am working on improving those issues now with a Tag-Connect connector for programming.

I know there are a few other things that are not correct but we will see if they need fixing.


Thank you all for the support.
Learnt a lot.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: bd139 on July 01, 2019, 04:14:21 pm
Might want to look at JLCPCB. They actually test the boards...
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on July 01, 2019, 04:15:38 pm
The green board is from them.

I have had hundreds of boards from PCBway and this is the first issue I have had.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: mcinque on July 01, 2019, 04:25:08 pm
Thank you for letting us know about the issue

I have had hundreds of boards from PCBway and this is the first issue I have had.

As a suggestion for the future, remember to always order board testing for production boards.


Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on July 01, 2019, 04:29:22 pm
Thank you for letting us know about the issue

I have had hundreds of boards from PCBway and this is the first issue I have had.

As a suggestion for the future, remember to always order board testing for production boards.

I thought they did do that over x number of boards ordered?
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: mcinque on July 01, 2019, 04:55:49 pm
I thought they did do that over x number of boards ordered?
Probably. But it's the "every X boards" that makes the difference.

Of course class 2 testing (all traces are tested individually) is not included.

Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: AndyC_772 on July 01, 2019, 05:02:51 pm
The shorted pins on the Wifi connector are certainly interesting, but they don't obviously explain your symptoms.

You've described cases where boards work for you but then stop working for your customer, or where swapping an apparently unrelated part can allow your CPU to program correctly (or not). Those symptoms aren't consistent with a bunch of shorted pins.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some years ago, I was working for a company making fairly complex ISA cards for PCs. On one batch, we were seeing roughly a 25% failure rate during testing, and we traced the fault to a broken PCB trace in the exact same position on every board.

The boards were made locally by a reputable supplier, who invited us in for a meeting to discuss what had happened.

As it turned out, the board was made 4 up on a panel, and the master artwork for one of the layers had a scratch in just the same position as our fault. We pointed out that the boards were supposed to be 100% tested before shipment.

Upon questioning, the operator doing the testing admitted that he'd removed the test of that particular net because it was failing too often. He had neither checked a board manually by himself, nor told anyone else that there was a problem. The defect was on an outer layer, plainly visible and easy to check with a multimeter, so no excuse whatsoever.

If I recall correctly, the operator in question was fired, the artwork was reprinted and we had no more issues until the board was discontinued some years later.

Bare board test: *always* do it, *usually* believe it.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: free_electron on July 01, 2019, 05:18:59 pm
a couple of other things ( apart from the fouled up pcb spacing )

- not enough bulk capacitance in the design
- not enough local capacitance in design
- the crystal you use is a resonator. your processor fuse bits may need to be tuned for that ! check what the load capacitance and bleed resistor is in those things. you can buy those in different variants and the tuning needs to be done.
- aluminum case.... do you connect that electrically to your system ground ?
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: mcinque on July 01, 2019, 05:45:47 pm
Upon questioning, the operator doing the testing admitted that he'd removed the test of that particular net because it was failing too often. He had neither checked a board manually by himself, nor told anyone else that there was a problem.
This is insane.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: bd139 on July 01, 2019, 05:51:24 pm
Upon questioning, the operator doing the testing admitted that he'd removed the test of that particular net because it was failing too often. He had neither checked a board manually by himself, nor told anyone else that there was a problem.
This is insane.

If you think that’s insane come to the software industry. Unit test not working? Delete!
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on July 01, 2019, 06:19:22 pm
a couple of other things ( apart from the fouled up pcb spacing )

- not enough bulk capacitance in the design
- not enough local capacitance in design
- the crystal you use is a resonator. your processor fuse bits may need to be tuned for that ! check what the load capacitance and bleed resistor is in those things. you can buy those in different variants and the tuning needs to be done.
- aluminum case.... do you connect that electrically to your system ground ?

I have copied the Arduino Nano circuit. Is their design wrong or am I not placing the caps in the correct place?

The case is anodised so is not conductive. So no I do not. Should I?
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: free_electron on July 01, 2019, 06:49:48 pm
Arduino is not exactly a case of 'proper' design. They take too many shortcuts.
That aside , what else is on your board. Anything drawing pulsed currents like muxed displays , rf transmitters etc ?
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on July 01, 2019, 07:16:38 pm
Arduino is not exactly a case of 'proper' design. They take too many shortcuts.
That aside , what else is on your board. Anything drawing pulsed currents like muxed displays , rf transmitters etc ?

Yea I have 4-8 seven segments displays and a nrf24l01.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Psi on July 02, 2019, 12:05:18 am
Arduino is not exactly a case of 'proper' design. They take too many shortcuts.
That aside , what else is on your board. Anything drawing pulsed currents like muxed displays , rf transmitters etc ?

The main problem with using arduino in a professional product is obscure bugs in the libraries.
They are written by random people who may not be very good at it.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: OwO on July 02, 2019, 04:42:27 am
The green PCB is correct. The Black one is one from the bad batch from PCBway.
The spacing between the holes is 1.27mm and the distance between the solder pads should be 0.375mm.

Check what the annular ring width is. It's possible their software enlarged the pads because the annular ring spec was violated.

If it took this long to triage a fault in production as simple as some shorted pins, then I would say the process needs some work too and not just the design. Were these first few boards soldered manually or did you put the whole batch to automated assembly? The contractor I use always assembles one board from each batch by hand as a sanity check before starting any automated assembly.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: free_electron on July 02, 2019, 07:22:08 am
Arduino is not exactly a case of 'proper' design. They take too many shortcuts.
That aside , what else is on your board. Anything drawing pulsed currents like muxed displays , rf transmitters etc ?

Yea I have 4-8 seven segments displays and a nrf24l01.

That would be one possibility. Muxed displays draw peak currents. Any kind of noise on your power rail and the CPU may brown out. Same for RF transmitters. It looks like you have those mounted above the CPU ...
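
A quick way to see why local capacitance matters for pulsed loads: while a capacitor alone supplies a current step, the rail sags by roughly dV = I * dt / C. The sketch below is Python, and the current and duration are invented for illustration, not measurements from this board.

```python
def droop_volts(step_current_a, duration_s, capacitance_f):
    """Approximate rail sag while a capacitor alone supplies a current step.

    Uses dV = I * dt / C and ignores the regulator response, ESR and trace
    inductance, so treat the result as an order-of-magnitude estimate.
    """
    return step_current_a * duration_s / capacitance_f


# Assumed load: a 100 mA step lasting 10 microseconds.
for label, cap in (("0.1 uF", 0.1e-6), ("10 uF", 10e-6)):
    print(label, round(droop_volts(0.1, 10e-6, cap), 3), "V")
```

With those assumed numbers the 0.1 uF case cannot hold the rail at all, while 10 uF keeps the sag to about a tenth of a volt.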
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Ysjoelfir on July 02, 2019, 08:33:00 am
I know there are a few other things that are not correct but we will see if they need fixing.
I have to admit that I am slightly furious after reading this phrase. If there is something wrong with your design and you know it but decide to go like "nah, isn't that bad, people buying this won't notice!" you are in for a very bad surprise. Reputation is slowly getting more important again, after years of cheap electronics that fail just after warranty ends because of obvious* design flaws that are not corrected because of cost and "well, it works NOW, why should I care if it works in 3 years?" - mentality.


* or intentionally created
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on July 02, 2019, 09:30:17 am
eMail from PCBway

Quote
Thanks for your information, I checked that your order BATCH3 is with the same pad design
and the production file of it is with smaller pads as your design. The difference is that order BATCH3
and BATCH2 produced at different production line, and it need different way to prepare production file.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: bd139 on July 02, 2019, 09:32:45 am
So they fucked up basically
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Rerouter on July 02, 2019, 09:41:10 am
Fun to know to avoid them,

Another one agreeing that the default Arduino layout is not always the best, but the software libraries can be used in commercial systems without issue, provided you review the code and test the crap out of it.

There are about 1000 Arduino based card readers of my creation floating out in the wild in some of the worst electrical and environmental conditions you can imagine, and yet I have not had a single lock up and only 2 replacements due to vehicles being submerged in flood waters. Because at the end of the day, Arduino code is just AVR code for most things. If you make sure everything checks out, then you're sweet (verbose output for compilation is a good way to catch potential issues early).
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Dubbie on July 02, 2019, 10:14:38 am
I can’t really see how this could happen. If the Gerbers have whatever size in them, how is it possible for them to magically change?
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: AndyC_772 on July 02, 2019, 10:21:13 am
It's very normal for PCB manufacturers to make changes to artwork in order to adjust for the physical characteristics of their process. If, for example, they know that their etching process will over-etch by <x>, then they'll adjust the artwork to increase track width by <x> to compensate.

The problem comes if the process doesn't actually do what the (possibly modified) artwork was intended for.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Rerouter on July 02, 2019, 10:27:01 am
Same for how they will usually erase silkscreen from copper pads and treat any hole with 2 touching copper pads as plated. These are simplifications in their process that generally lead to the best customer relation outcome. They just goofed, and somehow missed it in testing (likely because the production file itself was flawed, so the optical inspection never caught it).

The part that the less cheap PCB suppliers will do is give you feedback on ways to make your PCB more production ready for the next run. Allpcb ironically lets you download their production gerbers, which let you see what has changed. Generally on the second run of boards I'll see what they shifted and adjust accordingly. It's fun to see just what they let through. Oh you want 0.4/0.3mm vias? Yep, straight on through without modification, it's your own fault if you get a breakout.
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on July 02, 2019, 10:59:09 am
Arduino is not exactly a case of 'proper' design. They take too many shortcuts.
That aside , what else is on your board. Anything drawing pulsed currents like muxed displays , rf transmitters etc ?

Yea I have 4-8 seven segments displays and a nrf24l01.

That would be one possibility. muxed displays draw peak currents. Any kind of noise on your power rail and the cpu may brown out. Same for RF transmitters. it looks like you have those mounted above the cpu ...

So I added a 0.1uF in the original design for the NRF24L01 board.
Reading up on it, people are recommending 10uF and some are recommending either a second cap or a tantalum as well.
I gave the NRF24L01 board its own 3.3v power supply (everything else is 5v) and added the 0.1uF between the regulator and the NRF24L01.
The NRF24L01 boards I use are a bit higher spec than the Arduino hobby boards bought on eBay. It has a +10 or +20 dB gain circuit in it too.

The driver for the LED 7 segment displays is pretty close to the main 5v regulator.
There are no caps near it though. I looked at off-the-shelf boards for that IC to see how they did it.
No caps on it other than on the signal lines in.

Any recommendations?
Thank you
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Siwastaja on July 02, 2019, 02:05:04 pm
Looking at the clues posted in this thread, I'm almost 99.9% positive there's more to your problems than just the PCB mishap. Although it's possible such "almost shorted" pads could give intermittent operation, it's unlikely it happens in multiple units. You have so many unexplained incidents of it failing, working again, then failing again.

If you want to become a professional design engineer, do yourself a favor and as soon as you have a moment of silence, don't go on to design more features, or a more advanced product, but instead, try to do a proper root cause analysis. Instead of just building products, try to build a process/a "factory" where you can robustly build these products without wasting a lot of time.

You seem to have many issues, some are likely correlated, some are not.

In a stressful situation, we tend to fall back into trying to just get things to work by whatever means. Like can't get the MCU flashed? It's not a total showstopper, swap the board and go on. But in the long run, solving the problem once and for all would pay back in time used, and, it could turn out it's connected to your other issues, so they would be solved as well.

When I was looking at this comment of yours:

"They just develop a fault where the software no longer cycles. This can happen on new boards too. It will go through the code 3-6 times and then hang. "

I thought, you are very lucky. You have a lot of specimens that do fail, on your hands. And you have consistent failures. Like you don't need to operate a well-performing product for weeks to see a failure. If I understand correctly, you have at least one (1) unit in your hands which you can demonstrate a failure with, within minutes or hours. That's great.

It doesn't matter what the fault is and what do you think it might be caused by. Given this particular failure you can demonstrate, go for full-blown root cause analysis and see what you find.

You just need to make your steps smaller, and lower level. Whenever you hit a wall of not knowing how to do it, Google it, learn it.

I don't personally use debuggers a lot, but this could be a case where you'd get a starting point. Failing to have one, just make your code turn an LED on/off at different points of code, after a few iterations of moving around where you turn the LED on/off you have found the exact place in code where it hangs.

If your MCU isn't flashing, look at the communication signals with an oscilloscope, decode the contents. It may take several hours, but then you know exactly where it hangs. Chances are, you find some analog signaling issue (stuck logic level, bad rise/fall time)... in two seconds after looking at the scope screen.

Get yourself the basic tools, a 50MHz 2-channel digital storage oscilloscope being a bare minimum to debug such a design. A $400 4-channel Rigol or similar is more than enough, but I'm sure you can get an older generation thing used for maybe $100.
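
The LED trick above is really a search over checkpoints in the code, and it can be done as a bisection rather than line by line. Below is a sketch of that bookkeeping in Python. The checkpoint names and the `led_lit_at` callback are hypothetical; in practice each call means "reflash with the LED toggle at that point, run the board and watch". It assumes the hang always happens at the same place.

```python
def last_checkpoint_reached(checkpoints, led_lit_at):
    """Bisect an ordered list of checkpoints for the last one reached.

    `led_lit_at(name)` stands in for one manual experiment: move the LED
    toggle to that checkpoint, run the board and report whether it lit.
    Assumes every checkpoint before the hang is reached and none after.
    """
    lo, hi = -1, len(checkpoints)  # lo: known reached, hi: known not reached
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if led_lit_at(checkpoints[mid]):
            lo = mid
        else:
            hi = mid
    return checkpoints[lo] if lo >= 0 else None


# Made-up checkpoint names, with a simulated hang after "update_display".
points = ["setup", "read_sensor", "update_display", "radio_send", "loop_end"]
print(last_checkpoint_reached(points, lambda name: points.index(name) <= 2))
```

The hang then sits between the returned checkpoint and the next one in the list, and the same search can be repeated with finer checkpoints inside that stretch of code.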
Title: Re: Pulling my hair out. Circuit boards stop working once shipped to client and more
Post by: Jackster on July 02, 2019, 02:24:50 pm
Looking at the clues posted in this thread, I'm almost 99.9% positive there's more to your problems than just the PCB mishap. Although it's possible such "almost shorted" pads could give intermittent operation, it's unlikely it happens in multiple units. You have so many unexplained incidents of it failing, working again, then failing again.

If you want to become a professional design engineer, do yourself a favor and as soon as you have a moment of silence, don't go on to design more features, or a more advanced product, but instead, try to do a proper root cause analysis. Instead of just building products, try to build a process/a "factory" where you can robustly build these products without wasting a lot of time.

You seem to have many issues, some are likely correlated, some are not.

In a stressful situation, we tend to fall back into trying to just get things to work by whatever means. Like can't get the MCU flashed? It's not a total showstopper, swap the board and go on. But in the long run, solving the problem once and for all would pay back in time used, and, it could turn out it's connected to your other issues, so they would be solved as well.

When I was looking at this comment of yours:

"They just develop a fault where the software no longer cycles. This can happen on new boards too. It will go through the code 3-6 times and then hang. "

I thought, you are very lucky. You have a lot of specimens that do fail, on your hands. And you have consistent failures. Like you don't need to operate a well-performing product for weeks to see a failure. If I understand correctly, you have at least one (1) unit in your hands which you can demonstrate a failure with, within minutes or hours. That's great.

It doesn't matter what the fault is and what do you think it might be caused by. Given this particular failure you can demonstrate, go for full-blown root cause analysis and see what you find.

You just need to make your steps smaller, and lower level. Whenever you hit a wall of not knowing how to do it, Google it, learn it.

I don't personally use debuggers a lot, but this could be a case where you'd get a starting point. Failing to have one, just make your code turn an LED on/off at different points of code, after a few iterations of moving around where you turn the LED on/off you have found the exact place in code where it hangs.

If your MCU isn't flashing, look at the communication signals with an oscilloscope, decode the contents. It may take several hours, but then you know exactly where it hangs. Chances are, you find some analog signaling issue (stuck logic level, bad rise/fall time)... in two seconds after looking at the scope screen.

Get yourself the basic tools, a 50MHz 2-channel digital storage oscilloscope being a bare minimum to debug such a design. A $400 4-channel Rigol or similar is more than enough, but I'm sure you can get an older generation thing used for maybe $100.

Thanks for the info.

We worked out that the major flashing issue was down to the bad boards, and the occasional issues were down to the header pins I was using not being repeatable over many units.

I hopefully have fixed this with the latest revision that uses a pogo pin style cable to do the programming rather than just some 2.54mm headers stuck into the PCB.


As for the software stopping: my friend worked out that there was an overflow due to the tight timings. Slight differences between the boards could have meant that most were fine but some failed after a while.