Author Topic: PIC16F18856...small programs are less noise susceptible?  (Read 2615 times)


Offline ocsetTopic starter

  • Super Contributor
  • ***
  • Posts: 1516
  • Country: 00
PIC16F18856...small programs are less noise susceptible?
« on: October 15, 2017, 01:49:00 pm »
Hello,
Our remote software engineer recently wrote some code for our new DALI dimmable lamp.
It’s just a simple dimmable lamp; however, because it uses the DALI protocol, the code is enormous. When I loaded his code and turned the lamp on at full power, the lamp stayed lit for several minutes and then, for no apparent reason, turned off and stayed off. This should not have happened.
 :scared:

Every time it turned off, I cycled the power to bring it back on again, but it kept turning off, again at random after a few minutes or so.  :(  :palm:

Anyway, we told the software engineer what was happening and he said that his code was fine, and that the turning off must be due to our “noisy” hardware crashing his good software.  :bullshit:
Anyway, I then wrote some very simple code for the lamp: code that simply turned the lamp on at full power and left it on. With my simple code running, the lamp correctly stayed on at full power and did not turn off.  :clap:
I also wrote several other similar simple programs, each one to demonstrate that all of our lamp hardware was working fine. For example, I wrote one program which simply did PWM dimming at 50%, to prove that it would correctly dim the lamp…and it did.  :clap:

We then replied back to the software engineer and informed him that our own simple code was running fine on the lamp…no errant turning off and no problems. We informed him that if our hardware was “noisy” and thereby crashing his software, then why was the “noisy” hardware not crashing our own simple software(?).  :-//
-Anyway, he replied that our simple software test programs were much shorter than his own software…and he told us that short software programs are less susceptible to noise corruption. Is this true?  :-//
The microcontroller is PIC16F18856.
 8)
 

Offline Kalvin

  • Super Contributor
  • ***
  • Posts: 2145
  • Country: fi
  • Embedded SW/HW.
Re: PIC16F18856...small programs are less noise susceptible?
« Reply #1 on: October 15, 2017, 01:57:57 pm »
Take a stopwatch and measure the times. Is there any systematic time after which the device fails to operate properly? Possibly a watchdog or a timer resetting the device or crashing the software? Have you checked the power supply with an oscilloscope for any noise issues? Have you provided the SW guy with your actual hardware so that he can test his code on it? If his code works with the hardware you provided, but fails when you test it with your hardware, then there is something wrong with the setup.
 
The following users thanked this post: ocset

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: PIC16F18856...small programs are less noise susceptible?
« Reply #2 on: October 15, 2017, 02:08:50 pm »
You keep posting about this "software engineer" who's not really a software engineer, but simply some sort of incompetent person.  The problem is that you believe that this incompetent person can somehow write good code for you. This is absolutely, 100% impossible. By continuing on this road you simply throw good money after bad and you're unlikely to gain anything. So, stop wasting your time and do something about it.

You have two options. Either find the money to hire someone better, or bite the bullet and write it yourself. The DALI protocol is simple and doesn't require a lot of code. If the code is big, your "software engineer" might have just used some sort of library which he doesn't understand and cannot fix.
 
The following users thanked this post: JPortici, ocset

Offline woody

  • Frequent Contributor
  • **
  • Posts: 291
  • Country: nl
Re: PIC16F18856...small programs are less noise susceptible?
« Reply #3 on: October 15, 2017, 02:46:56 pm »
To answer your question: no. Longer programs are not more susceptible to noise corruption. Longer programs are more susceptible to programming errors.
 
The following users thanked this post: ocset

Online hans

  • Super Contributor
  • ***
  • Posts: 1639
  • Country: nl
Re: PIC16F18856...small programs are less noise susceptible?
« Reply #4 on: October 15, 2017, 03:08:39 pm »
How do you dim the lamp? Using PWM? If so, write a test program that exercises several PWM frequencies and/or duty cycles. A full-on or full-off test is likely to generate the least noise, because there are no edges switching the load on and off.
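
As a rough sketch of such a sweep test, the helper below computes a Timer2 period value and a 10-bit duty value for a requested PWM frequency and duty cycle. It assumes the usual PIC relationship, PWM period = (PR2 + 1) * 4 * Tosc * prescale, and duty ratio = duty value / (4 * (PR2 + 1)). The names `pwm_compute` and `pwm_setting_t` are hypothetical, not part of any Microchip library, and the numbers should be checked against the PIC16F18856 datasheet before use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two values a PWM setup needs. */
typedef struct {
    uint8_t  pr2;     /* Timer2 period register value (assumed 8-bit) */
    uint16_t duty10;  /* 10-bit duty cycle value */
} pwm_setting_t;

/* Returns false when the request cannot be met with this prescaler. */
bool pwm_compute(uint32_t fosc_hz, uint32_t fpwm_hz, uint8_t prescale,
                 uint8_t duty_percent, pwm_setting_t *out)
{
    assert(out != NULL);
    if (fpwm_hz == 0 || prescale == 0 || duty_percent > 100) {
        return false;
    }

    /* counts = PR2 + 1 = Fosc / (4 * prescale * Fpwm) */
    uint64_t divisor = 4ull * prescale * fpwm_hz;
    uint64_t counts = fosc_hz / divisor;
    if (counts == 0 || counts > 256) {
        return false;  /* period does not fit in an 8-bit register */
    }

    uint32_t duty = (uint32_t)((4u * counts * duty_percent) / 100u);
    if (duty > 1023u) {
        duty = 1023u;  /* clamp to the 10-bit range */
    }

    out->pr2 = (uint8_t)(counts - 1u);
    out->duty10 = (uint16_t)duty;
    return true;
}
```

Calling it in a loop over, say, 10% to 90% duty at a few frequencies, and loading the results into the PWM registers, gives a test image that produces switching edges, unlike the full-on test.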

Second question I would have; does the microcontroller fully reset itself? Is there any way you could diagnose this by e.g. having an onboard LED indicator (attached to maybe a spare pin) flash briefly whenever the firmware boots up?

A firmware reboot could be caused by a bug (I think most PIC16s reset themselves on e.g. a hardware stack overflow). Larger programs have more bugs, statistically speaking.
However if you would see reboots with your own firmware (e.g. after PWM test), then that points to a hardware problem.
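
One way to tell a firmware-induced reset from a power glitch is to read the reset-cause flags at boot and blink the indicator LED a different number of times for each cause. The sketch below decodes a copy of the PCON0 register. The bit masks are written from memory of the PIC16F188xx register layout and are an assumption to verify against the datasheet; `decode_reset_cause` and `reset_cause_name` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed PCON0 bit masks; the "n" flags are active low. */
#define PCON0_STKOVF 0x80u
#define PCON0_STKUNF 0x40u
#define PCON0_nRWDT  0x10u
#define PCON0_nRMCLR 0x08u
#define PCON0_nRI    0x04u
#define PCON0_nPOR   0x02u
#define PCON0_nBOR   0x01u

typedef enum {
    RST_POWER_ON, RST_BROWN_OUT, RST_STACK_OVERFLOW, RST_STACK_UNDERFLOW,
    RST_WATCHDOG, RST_MCLR, RST_INSTRUCTION, RST_UNKNOWN, RST_COUNT
} reset_cause_t;

/* Power-on is checked first because the other flags are not meaningful after it. */
reset_cause_t decode_reset_cause(uint8_t pcon0)
{
    if (!(pcon0 & PCON0_nPOR))   return RST_POWER_ON;
    if (!(pcon0 & PCON0_nBOR))   return RST_BROWN_OUT;
    if (pcon0 & PCON0_STKOVF)    return RST_STACK_OVERFLOW;
    if (pcon0 & PCON0_STKUNF)    return RST_STACK_UNDERFLOW;
    if (!(pcon0 & PCON0_nRWDT))  return RST_WATCHDOG;
    if (!(pcon0 & PCON0_nRMCLR)) return RST_MCLR;
    if (!(pcon0 & PCON0_nRI))    return RST_INSTRUCTION;
    return RST_UNKNOWN;
}

const char *reset_cause_name(reset_cause_t cause)
{
    static const char *const names[RST_COUNT] = {
        "power-on", "brown-out", "stack overflow", "stack underflow",
        "watchdog", "MCLR", "RESET instruction", "unknown"
    };
    assert(cause < RST_COUNT);
    return names[cause];
}
```

On the target, the firmware would pass PCON0 to the decoder early in main() and then re-arm the flags as the datasheet describes, so that the next reset can be told apart. A stack overflow or watchdog result points at the firmware; repeated power-on or brown-out results point at the supply.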

Note that the job of firmware is not only to function correctly (happy path testing), but also to behave well with abnormal inputs. I'm totally unfamiliar with the protocols you're using, but e.g. I expect a protocol parser not to crash/hang/become unresponsive whenever you send a totally different or corrupted packet.
Coming up with counterexamples for these kinds of things is a bit like guessing where the wind comes from today. So perhaps testing that your hardware is not at fault is the most straightforward way to go here.
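
To illustrate what "not crashing on a corrupted packet" can look like, here is a minimal sketch of a decoder for a Manchester (bi-phase) coded 16-bit frame with one start bit, which is how a DALI forward frame is commonly described. It takes half-bit samples and rejects anything malformed instead of guessing. The polarity (low then high means logical 1), the sampling into half-bits, and all names are assumptions for illustration, not taken from the firmware discussed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HALF_BITS_PER_FRAME 34u  /* 1 start bit + 16 data bits, two half-bits each */

typedef enum {
    FRAME_OK, FRAME_BAD_LENGTH, FRAME_BAD_START, FRAME_BAD_ENCODING
} frame_status_t;

/* Test helper: builds the half-bit pattern for a 16-bit frame, MSB first. */
void encode_forward_frame(uint16_t frame, uint8_t half_bits[HALF_BITS_PER_FRAME])
{
    assert(half_bits != NULL);
    half_bits[0] = 0;  /* start bit is a logical 1: low, then high (assumed) */
    half_bits[1] = 1;
    for (unsigned bit = 0; bit < 16; bit++) {
        uint8_t value = (uint8_t)((frame >> (15u - bit)) & 1u);
        half_bits[2u + 2u * bit] = (uint8_t)!value;
        half_bits[3u + 2u * bit] = value;
    }
}

/* Decodes a frame; *frame is written only when FRAME_OK is returned. */
frame_status_t decode_forward_frame(const uint8_t *half_bits, size_t count,
                                    uint16_t *frame)
{
    assert(half_bits != NULL && frame != NULL);
    if (count != HALF_BITS_PER_FRAME) {
        return FRAME_BAD_LENGTH;
    }
    if (half_bits[0] != 0 || half_bits[1] == 0) {
        return FRAME_BAD_START;
    }

    uint16_t value = 0;
    for (unsigned bit = 0; bit < 16; bit++) {
        uint8_t first = half_bits[2u + 2u * bit] != 0;
        uint8_t second = half_bits[3u + 2u * bit] != 0;
        if (first == second) {
            return FRAME_BAD_ENCODING;  /* no mid-bit transition: corrupted */
        }
        value = (uint16_t)((value << 1) | second);
    }
    *frame = value;
    return FRAME_OK;
}
```

The point of the design is that every failure path returns a status and leaves the caller's state untouched, so a noisy bus produces ignored frames rather than a hung or confused lamp.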
 
The following users thanked this post: ocset

Offline Kjelt

  • Super Contributor
  • ***
  • Posts: 6460
  • Country: nl
Re: PIC16F18856...small programs are less noise susceptible?
« Reply #5 on: October 15, 2017, 03:36:08 pm »
Quote
Hello,
Our remote software engineer recently wrote some code for our new DALI dimmable lamp.
It’s just a simple dimmable lamp, however, due to the use of the DALI protocol, the code is enormous.
When I loaded his code, and then turned the lamp on at full power, the lamp lit for several minutes and then for no reason, just turned off and stayed off. This should not have happened.  :scared:
1) You have the source code? That is good! If not, get it ASAP!

2) What do you mean by "loaded his code"? Did you run it in debug mode? You should flash his compiled code into the uC and run it autonomously, without a debugger, for the least chance of problems. In debug mode a lot of things can go wrong, since it uses more RAM, adds breakpoints, etc.

3) What device did you use to test the DALI protocol? If it is some kind of DALI master controller, monitor its output and make sure that it is not the one issuing an extra off command after a timeout or an expired SW timer. Also test your hardware with other DALI commands that might occur, such as the commissioning commands, which it should accept even as group commands. Commands for other device types that it cannot process, such as colour commands, it should ignore.

If this does not help and the lamp still turns off:
4) Ask the SW engineer to demonstrate the working of his code. A good SW engineer will test his code and can demonstrate the correct operation of the code. It is not uncommon to test software for a few hours to make sure everything works ok.

Another remark: did you give the SW engineer a proper HSI document, so he knows what to expect and how to write the code to match the hardware? Did he have the hardware to test his code on?

Last remark: please get some knowledge of software yourself, or have a colleague in your company do so. From your previous post it also sounds like you only know HW and look at SW as something magical.
These days you cannot build something without knowing the basics of the entire system; that includes the DALI protocol in detail, the interaction of DALI devices, SW (firmware) and HW.
 
The following users thanked this post: ocset

Offline Kjelt

  • Super Contributor
  • ***
  • Posts: 6460
  • Country: nl
Re: PIC16F18856...small programs are less noise susceptible?
« Reply #6 on: October 15, 2017, 03:40:31 pm »
Quote
DALI protocol is simple and doesn't require a big code.
Have you ever written a DALI implementation of a lamp driver? You're looking at 4k to 8k of compiled code if you make it complete, and testing it against the test specifications takes a lot of time to get right.
It is not just listening to a broadcast on/off/dim command. It is also the commissioning (addressing) phase, the protocol for what to do if two drivers have chosen the same address, responding to queries, and ignoring commands your device does not support. It is pretty simple if you have done it multiple times and have a library, but from scratch it could take a while to get it right and complete.
Also don't forget that your device has to operate successfully, in a standardized protocol way of working, with possibly 20 different brands of controllers and other drivers.
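
As a small illustration of the "ignore what is not for you" part, the sketch below checks whether the address byte of a forward frame targets this driver. It assumes the commonly documented DALI address byte layouts (0AAAAAAS for a short address, 100AAAAS for a group, 1111111S for broadcast, with S as the selector bit); the struct and function names are hypothetical, and special commands are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of what this driver answers to. */
typedef struct {
    uint8_t  short_addr;  /* 0..63, or 0xFF when no short address is assigned */
    uint16_t groups;      /* bit n set means member of group n (0..15) */
} dali_identity_t;

/* Returns true when the frame's address byte targets this driver. */
bool dali_addressed_to_me(uint8_t addr_byte, const dali_identity_t *me)
{
    assert(me != NULL);
    uint8_t a = (uint8_t)(addr_byte >> 1);  /* drop the selector bit S */

    if (a == 0x7Fu) {
        return true;                              /* 1111111S: broadcast */
    }
    if ((a & 0x40u) == 0) {
        return me->short_addr == (a & 0x3Fu);     /* 0AAAAAAS: short address */
    }
    if ((a & 0x70u) == 0x40u) {
        return ((me->groups >> (a & 0x0Fu)) & 1u) != 0;  /* 100AAAAS: group */
    }
    return false;  /* special or reserved range: not handled in this sketch */
}
```

Frames that fail this check are simply dropped, which is the behaviour a driver needs when it shares a bus with controllers and drivers from other brands.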
 
The following users thanked this post: ocset

Offline donotdespisethesnake

  • Super Contributor
  • ***
  • Posts: 1093
  • Country: gb
  • Embedded stuff
Re: PIC16F18856...small programs are less noise susceptible?
« Reply #7 on: October 15, 2017, 03:51:28 pm »
Quote
-Anyway, he replied that our simple software test programs were much shorter than his own software…and he told us that short software programs are less susceptible to noise corruption. Is this true?

Ah, now it is clear he is bullshitting you. It is theoretically possible, but quite unlikely, that some feature of longer code could be more susceptible to noise, but without any hard evidence of that it would be last on my list of things to consider.

Even if it was possible, I would expect a worthwhile engineer to say "let me test the software with your hardware setup and try to reproduce the problem. Failing that, I will make an on-site visit". I'm afraid to say this guy is not worth dealing with.

I guess it is relative, but DALI is not really enormous; quite small, I would say. Some embedded systems have code running into millions of lines.
Bob
"All you said is just a bunch of opinions."
 
The following users thanked this post: ocset

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: PIC16F18856...small programs are less noise susceptible?
« Reply #8 on: October 18, 2017, 06:22:50 pm »
That it fails after a time makes me want to think about memory leaks.  Sometimes data is allocated on the heap and when it is no longer required, the allocation is recovered and reissued.  The problem comes up when it isn't properly released.

Some heap managers don't do garbage collection and some others don't coalesce contiguous unallocated blocks.  In other words, these heap managers are trash.

GCC uses the heap for string functions and for embedded systems, heaps are a problem.  It is far better to write your own string functions by copying the code out of Kernighan & Ritchie "The C Programming Language".

The GCC library does clean up after itself.  The problem comes up when programmers allocate memory and forget to release it.  The heap grows without bound and eventually clobbers the stack.  It takes time for this to occur.

 
The following users thanked this post: ocset

Offline kalel

  • Frequent Contributor
  • **
  • Posts: 880
  • Country: 00
Re: PIC16F18856...small programs are less noise susceptible?
« Reply #9 on: October 18, 2017, 06:29:36 pm »
Quote
That it fails after a time makes me want to think about memory leaks.  Sometimes data is allocated on the heap and when it is no longer required, the allocation is recovered and reissued.  The problem comes up when it isn't properly released.

Some heap managers don't do garbage collection and some others don't coalesce contiguous unallocated blocks.  In other words, these heap managers are trash.

GCC uses the heap for string functions and for embedded systems, heaps are a problem.  It is far better to write your own string functions by copying the code out of Kernighan & Ritchie "The C Programming Language".

The GCC library does clean up after itself.  The problem comes up when programmers allocate memory and forget to release it.  The heap grows without bound and eventually clobbers the stack.  It takes time for this to occur.

I don't see it as much anymore, but I remember a lot of programs returning all kinds of overflow errors on Windows. Usually not big commercial stuff but small utilities, but it might have happened with the commercial programs as well.
 
The following users thanked this post: ocset

Offline woody

  • Frequent Contributor
  • **
  • Posts: 291
  • Country: nl
Re: PIC16F18856...small programs are less noise susceptible?
« Reply #10 on: October 18, 2017, 08:34:49 pm »
Well, looking back at my own mistakes in this field, my guess would be that heaps have little to do with errors like this. More likely a counter overflows somewhere, an index points beyond the end of an array, or a 16-bit counter is tested in the main code while it is incremented in an interrupt. There are numerous ways to make a PIC misbehave that have nothing to do with the compiler and everything with the person using that compiler  ;D
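
For illustration, here is a minimal sketch of defensive versions of those three cases: reading a 16-bit counter that an interrupt updates (on an 8-bit core the two bytes are read separately, so a read can be torn), measuring elapsed time in a way that survives counter wrap-around, and refusing out-of-range array writes. All names are hypothetical; on the real target the usual alternative to the double read is to disable interrupts briefly around the access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tick counter, incremented by a timer interrupt on the target. */
volatile uint16_t g_ticks;

/* Reads the counter twice and retries until both reads agree, so a value
   torn by an interrupt between the two byte reads is never returned. */
uint16_t ticks_now(void)
{
    uint16_t a, b;
    do {
        a = g_ticks;
        b = g_ticks;
    } while (a != b);
    return a;
}

/* Unsigned subtraction stays correct across one wrap from 0xFFFF to 0. */
uint16_t ticks_elapsed(uint16_t now, uint16_t since)
{
    return (uint16_t)(now - since);
}

/* Writes only when the index is inside the buffer. */
bool buffer_put(uint8_t *buf, size_t size, size_t index, uint8_t value)
{
    assert(buf != NULL);
    if (index >= size) {
        return false;
    }
    buf[index] = value;
    return true;
}
```

A timeout written as `ticks_elapsed(ticks_now(), start) >= limit` keeps working when the counter wraps, whereas comparing `ticks_now() >= start + limit` fails once every 65536 ticks, which is exactly the kind of fault that shows up "after a few minutes".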

But with so little real data on the problem we're all flying blind.
 
The following users thanked this post: ocset

Online hans

  • Super Contributor
  • ***
  • Posts: 1639
  • Country: nl
Re: PIC16F18856...small programs are less noise susceptible?
« Reply #11 on: October 18, 2017, 08:52:02 pm »
Note that this is a PIC16 device.

XC8 is not GCC, XC8 does not support heap.

Quote
5.5.7 Dynamic Memory Allocation
Dynamic memory allocation, (heap-based allocation using malloc, etc.) is not supported on any 8-bit device. This is due to the limited amount of data memory, and that this memory is banked. The wasteful nature of dynamic memory allocation does not  suit itself to the 8-bit PIC device architectures.
Just guessing XC8 is being used, but AFAIK other compilers don't support a heap on PIC16 either.

Could very well be one of the errors woody mentions.

Or another race condition between ISR and main code, or some kind of code that keeps hanging in a while() loop. And if the programmer was responsible, a watchdog timer would be set, run out and reset the controller.
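
A minimal sketch of the alternative to an open-ended `while (!flag);` is a wait with a poll budget, shown below. The callback type, `wait_until_ready`, and the demo stand-in for a peripheral flag are all hypothetical names. On the target, the watchdog would be cleared only from the main loop (with XC8, via the `CLRWDT()` macro), not inside wait loops, so that a wait which really hangs still ends in a reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback that reports whether the awaited condition holds. */
typedef bool (*ready_fn)(void *ctx);

/* Polls at most max_polls times; returns false on timeout instead of hanging. */
bool wait_until_ready(ready_fn is_ready, void *ctx, uint16_t max_polls)
{
    assert(is_ready != NULL);
    while (max_polls-- > 0) {
        if (is_ready(ctx)) {
            return true;
        }
    }
    return false;  /* caller decides: retry, report an error, or fall back */
}

/* Demo stand-in for a peripheral flag: ready once the countdown hits zero. */
bool ready_after_countdown(void *ctx)
{
    uint16_t *remaining = ctx;
    if (*remaining == 0) {
        return true;
    }
    (*remaining)--;
    return false;
}
```

With this shape, a flag that never arrives because of a missed interrupt or a glitch on the bus becomes an error return that the firmware can handle, instead of a lamp that silently stops responding.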
 
The following users thanked this post: ocset

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: PIC16F18856...small programs are less noise susceptible?
« Reply #12 on: October 18, 2017, 08:56:22 pm »
Quote
Note that this is a PIC16 device.

XC8 is not GCC, XC8 does not support heap.

I forgot we were talking about a PIC 16F.  My bad...
 
The following users thanked this post: ocset

Offline MarkF

  • Super Contributor
  • ***
  • Posts: 2548
  • Country: us
Re: PIC16F18856...small programs are less noise susceptible?
« Reply #13 on: October 20, 2017, 11:45:18 am »
"Anyway, I then wrote some very very simple code for the lamp, -code that simply turned the lamp on at full power and leaves it on. With my simple code running, the lamp correctly stayed running at full power and did not turn off."

..."for example, I wrote one code which simply did PWM dimming at 50%, to prove that that would correctly dim the lamp…and it did."

..."We informed him that if our hardware was “noisy” and thereby crashing his software, then why was the “noisy” hardware not crashing our own simple software(?)."
He can generate a lot of noise just by writing to the hardware at needlessly high update rates.  That said, you should be able to determine this with your small test programs by trying various update rates and update rate patterns and then looking at the results on a scope to see how much noise you're creating.

You could also check with a scope how often his software actually updates the hardware (periodically, just when needed, or something else).  Then see if you can reproduce the failure by duplicating his update rates with your small test programs.

You should also check that your test programs are selecting the same MCU clock as his program.  Or at least run at the highest clock speed while trying to reproduce the failure.
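
As a sketch of the "just when needed" update pattern, the wrapper below forwards a duty value to the hardware only when it differs from the last value written, and counts the writes so the update rate can be checked. The names `dimmer_t`, `dimmer_set` and the demo writer are hypothetical; on the target, the writer function would load the real PWM duty registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook that performs the actual register write. */
typedef void (*duty_writer_fn)(uint16_t duty);

typedef struct {
    duty_writer_fn write;
    uint16_t last;    /* last value sent to the hardware */
    bool     valid;   /* false until the first write */
    uint32_t writes;  /* number of hardware writes, for diagnostics */
} dimmer_t;

void dimmer_init(dimmer_t *d, duty_writer_fn write)
{
    assert(d != NULL && write != NULL);
    d->write = write;
    d->last = 0;
    d->valid = false;
    d->writes = 0;
}

/* Returns true when the hardware was actually written. */
bool dimmer_set(dimmer_t *d, uint16_t duty)
{
    assert(d != NULL);
    if (d->valid && d->last == duty) {
        return false;  /* unchanged: skip the needless write */
    }
    d->write(duty);
    d->last = duty;
    d->valid = true;
    d->writes++;
    return true;
}

/* Demo stand-in for the PWM duty register. */
uint16_t g_demo_register;
void demo_write_duty(uint16_t duty) { g_demo_register = duty; }
```

A small test program can then call `dimmer_set` from its main loop at whatever rate is under test, and the `writes` counter shows how many of those calls reached the hardware.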

Quote
"Anyway, he replied that our simple software test programs were much shorter than his own software…and he told us that short software programs are less susceptible to noise corruption."
Back in the day (30+ years ago), a computer manufacturer told us that our software was burning out their memory because our execution loops were too small.

Shorter programs are less susceptible to programming errors.  I guess it is possible to cause an erroneous interrupt from noise.  You need a NEW software person or at least get him on-site.


I'm an Elec Engr with 35 years of software development experience.  Rule of thumb-- "ALWAYS blame the hardware while fixing the code".   ;)
 
The following users thanked this post: ocset

