We have an industrial controller that has been in production for several years at reasonable volume (~1000/yr). We very rarely saw anything unusual. The CPU is a MOT 68337. On chip is a CAN module (TOUCAN) that we use to communicate over DeviceNet. For years we never heard of any systemic problems. Then a customer described an issue: several times an hour the DeviceNet would stop communicating. We looked at the obvious, like field wiring, power, noise, etc., and found nothing. It was then discovered that inputs occurring at a rate close to 100 Hz, whether on the discrete inputs, certain RS232 comms, or even the analog inputs, would kill the DeviceNet.
With no obvious cause, we spent a month or so on code analysis. I was convinced that there was an interrupt sneaking in and overwhelming the CPU. Tests and more tests confirmed that the CAN had among the highest interrupt priorities, and no unknown or "sneak" interrupts were present. No matter what we changed, the sensitive frequency seemed to be around 96-103 Hz. We thought there might be some strange code interaction when certain functions ran at 100 Hz, so the next step was to change the clock parameters to alter the main loop "scan" frequency, which is 500 Hz. We changed the divider and readjusted the CAN baud rate so that we had a scan of 250 Hz. Same problem, still at 100 Hz inputs. Another change in clock rate still resulted in the 100 Hz sensitivity.
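For anyone who has not had to retune a CAN baud rate after a clock change, here is a minimal sketch of the arithmetic. It uses the generic relation bit rate = CAN clock / (prescaler x time quanta per bit). The clock frequencies, the 1..256 prescaler range and the function name are assumptions for illustration only, not the values or code from this board.

```python
def find_prescalers(f_clk_hz, bit_rate, max_prescaler=256):
    """Return (prescaler, tq_per_bit) pairs that hit bit_rate exactly.

    CAN allows 8 to 25 time quanta per bit. max_prescaler is an assumed
    hardware limit, not a figure taken from the post.
    """
    results = []
    for tq_per_bit in range(8, 26):
        divisor = bit_rate * tq_per_bit
        if f_clk_hz % divisor == 0:
            prescaler = f_clk_hz // divisor
            if 1 <= prescaler <= max_prescaler:
                results.append((prescaler, tq_per_bit))
    return results


if __name__ == "__main__":
    # Hypothetical clocks: an original rate and the same rate halved.
    for f_clk in (16_000_000, 8_000_000):
        print(f_clk, find_prescalers(f_clk, 1_000_000))
```

The point is only that every change to the system clock means picking a new prescaler and bit segmentation to land back on the same baud rate, which is why the CAN setup had to be readjusted each time.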
The next step took us to the +5 volt switcher on the board. We looked at the frequency response of the power supply: swept it, injected test signals, etc., and found that 100 Hz injected into the power supply summing junction would replicate the problem. It looked like a real possibility, but response tests of the power supply showed no sensitivity, peaks or notches at 100 Hz. Seemed like another near dead end.
Finally, late into the evening one day, some circuit mods resulted in a breakthrough. It turns out that the PLL on the CPU seems to have a sensitivity to 100 Hz. The PLL is powered through a pin called VddSYN. This pin was coupled to Vdd through a simple R-C filter network. By increasing the C value and upping the R, I could reduce the 100 Hz sensitivity (but not eliminate it). There was a limit to how high the R value could be, since the VddSYN current draw would reduce the voltage to unacceptable levels.

I performed a simple test using an FFT response analyzer. The card has inputs that can measure frequency using the CPU's counter/timers. These timers are clocked off of the CPU clock. We applied a known constant frequency to a frequency input using a high precision signal generator. This input was converted to an analog value and made available on a D/A channel, using the CPU PWMs. Statically the analog output was a very constant value, corresponding to the input frequency. I then applied a frequency sweep from the FFT analyzer to an analog input, and monitored the D/A output with the analyzer. Sure enough, at around 100 Hz the input came rolling out of the D/A.
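The R-C trade-off described above can be put into numbers with a rough sketch. It models a single-pole R-C low-pass and the DC drop across R. The component values and the PLL supply current below are hypothetical, picked only to show the trend; they are not the values from this board.

```python
import math


def rc_lowpass_gain(r_ohms, c_farads, f_hz):
    """Magnitude response of a single-pole R-C low-pass (1.0 = no attenuation)."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_hz * r_ohms * c_farads) ** 2)


def dc_drop(r_ohms, i_amps):
    """Voltage lost across the series resistor at a given supply current."""
    return r_ohms * i_amps


if __name__ == "__main__":
    RIPPLE_HZ = 100.0        # the sensitive frequency from the post
    PLL_CURRENT_A = 0.002    # hypothetical VddSYN current draw
    # Hypothetical (R, C) pairs, from a small filter to a much larger one.
    for r, c in [(100.0, 0.1e-6), (100.0, 10e-6), (1000.0, 10e-6)]:
        gain = rc_lowpass_gain(r, c, RIPPLE_HZ)
        drop = dc_drop(r, PLL_CURRENT_A)
        print(f"R={r:g} ohm C={c:g} F gain@100Hz={gain:.4f} drop={drop:.3f} V")
```

A small filter has its corner far above 100 Hz, so the ripple passes almost untouched. Pushing the corner down takes a bigger R or C, and the bigger R eats into the VddSYN voltage, which is the limit described above.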
So what was happening was the CPU PLL was sensitive to supply variations at around 100 Hz. The analog input caused some CPU activity to occur at 100 Hz, which caused some +5 volt power supply wiggle at 100 Hz (around 5 mV). This power supply noise was also present on the VddSYN PLL power, resulting in the CPU clock frequency wobbling a bit at 100 Hz. Normally this would not be an issue of any great concern, but the CAN baud rate was also varying, since it is derived internally from the CPU clock. Since baud rate accuracy is relatively critical with CAN (at 1 Mbaud), errors on the CAN messages would quickly kill the DeviceNet comms and result in the problem.
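To give a feel for how little clock error CAN tolerates, here is a minimal sketch using the two commonly quoted oscillator tolerance conditions from the Bosch CAN bit timing rules. The bit segmentation below (8 time quanta per bit, SJW of 1) is an assumed example, not the configuration used on this card. The formulas also describe a steady frequency offset between nodes, so treat the result as a rough margin rather than a model of PLL wobble.

```python
def can_clock_tolerance(nbt_tq, phase_seg1_tq, phase_seg2_tq, sjw_tq):
    """Max fractional oscillator deviation per node (0.005 means 0.5%).

    nbt_tq is the nominal bit time in time quanta. Two conditions must
    both hold, so the smaller limit wins.
    """
    # Condition 1: resynchronization must absorb drift over 10 bit times.
    resync_limit = sjw_tq / (20.0 * nbt_tq)
    # Condition 2: sampling must stay correct across an error flag sequence.
    error_flag_limit = min(phase_seg1_tq, phase_seg2_tq) / (
        2.0 * (13.0 * nbt_tq - phase_seg2_tq)
    )
    return min(resync_limit, error_flag_limit)


if __name__ == "__main__":
    # Hypothetical timing: 1 sync + 2 prop + 3 phase1 + 2 phase2 = 8 tq.
    tol = can_clock_tolerance(nbt_tq=8, phase_seg1_tq=3, phase_seg2_tq=2, sjw_tq=1)
    print(f"allowed clock deviation: +/-{tol * 100:.3f}%")
```

With the assumed numbers the margin comes out well under one percent, which is why a clock that is fine for everything else on the CPU can still break the bus.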
It took a long time to find, with hardware and software all coming under the microscope. The fix was relatively easy. We powered the VddSYN pin (PLL power) from a 5 volt reference supply on the card that is used for the A/D reference power. This completely resolved the issue with no ill effects on the A/D operation or accuracy. Although Motorola has a circuit suggestion for powering the VddSYN for "use in noisy environments", it was nothing more than a 2-pole R-C and did not fully resolve the problem.
This was my trickiest one to find.
Paul