Author Topic: did i finally find the need for a better debugger?  (Read 2842 times)

Offline JPortici (Topic starter)

  • Super Contributor
  • ***
  • Posts: 3461
  • Country: it
did i finally find the need for a better debugger?
« on: November 23, 2016, 07:44:52 pm »
Today I wasted many work hours tracing an issue (not solved yet, but I found a workaround that still needs more testing).

BACKSTORY:
I am rewriting my SENT TX emulation library for dsPIC because I needed it to be more flexible (and *faster*). While I left it running as a test in this setup:
sensor w/ SENT -> SENT in -> dspic33ev256gm106 -> SwSENT out -> scope
it kept hanging at random times. After much looking I found that the dedicated timer had its period register set to zero, so of course the corresponding interrupt would not fire again until a manual reboot.

The code that changes the dedicated timer's period register wasn't modified, so probably something else in the program is messing with me. (Is a pointer writing somewhere it isn't allowed, even though boundary checks are implemented? Is a nested interrupt modifying accumulators that aren't saved? Are nested interrupts actually enabled? I don't remember at the moment. I'm exhausted.)
For now I implemented a workaround: at every operation that modifies PR, a do-while loop keeps repeating while PR is zero. That's okay for the moment, because I know that is where the micro is stuck when it hangs, but I haven't solved the issue.
* A similar problem showed up with multiplications, which seem to return zero at random times even when both operands are nonzero.
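
A minimal sketch of a guard like the do-while workaround described above, written so it can run on a PC. Everything named here is hypothetical: `fake_PR` stands in for the real PRx register, `compute_period` for whatever code calculates the new period, and `demo_period` only mimics the glitch. Unlike the loop in the post, this variant is bounded and counts each rejected zero, so the fault stays visible instead of being silently absorbed.

```c
#include <stdint.h>

/* Hypothetical stand-in for the timer period register (PRx on the real part). */
static volatile uint16_t fake_PR;

/* Counts how often a zero period was rejected, so the fault can be inspected later. */
static volatile uint16_t pr_zero_faults;

/*
 * Write a new period, retrying while the register reads back as zero.
 * compute_period is a hypothetical callback that recomputes the value.
 * max_tries bounds the loop so the caller is never stuck here forever.
 * Returns 1 on success, 0 if the period was still zero after max_tries.
 */
static int set_period_checked(volatile uint16_t *pr,
                              uint16_t (*compute_period)(void),
                              unsigned max_tries)
{
    unsigned tries = 0;
    do {
        *pr = compute_period();
        if (*pr != 0) {
            return 1;
        }
        pr_zero_faults++;
    } while (++tries < max_tries);
    return 0;
}

/* Demo source that yields zero twice before a valid value, to mimic the glitch. */
static unsigned demo_calls;
static uint16_t demo_period(void)
{
    demo_calls++;
    return (demo_calls <= 2) ? 0u : 1350u;
}
```

Calling `set_period_checked(&fake_PR, demo_period, 5)` retries past the two zero results and leaves the fault counter at 2, which is also a convenient place for a breakpoint.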

The problem is, even if I stop the debugger when the new PRx is zero (which can never legitimately happen, so it's always a fault condition), I can only look at the call stack (PICkit 3). While that can be useful, at least for tracing back trap causes, having a list of the last 5, 10 or 15 executed instructions would be a tremendous help.
 

Offline RogerRowland

  • Regular Contributor
  • *
  • Posts: 193
  • Country: gb
    • Personal web site
Re: did i finally find the need for a better debugger?
« Reply #1 on: November 24, 2016, 06:25:39 am »
Doesn't a data breakpoint get you what you need?

http://microchip.wikidot.com/mplabx:set-data-breakpoint
 

Offline JPortici (Topic starter)

Re: did i finally find the need for a better debugger?
« Reply #2 on: November 24, 2016, 07:06:51 am »
But does it?
Even if it stops when (in this case) PRx = 0, as far as I know I can't see the last N instructions executed (which is NOT the same as looking at the disassembly file, I believe).
 

Offline RogerRowland

Re: did i finally find the need for a better debugger?
« Reply #3 on: November 24, 2016, 08:07:29 am »
Well, if it breaks at the instruction that wrote zero to the register, presumably that's your bug - bad pointer or whatever, per your earlier postulations. No need to see any further back is there?
 

Offline JPortici (Topic starter)

Re: did i finally find the need for a better debugger?
« Reply #4 on: November 24, 2016, 08:29:16 am »
Well, the issue at hand seems to be a multiplication returning zero. One of the operands is a positive constant and the other is a variable that is always greater than zero (both are small enough that the result fits in 16 bits, so no overflow). Since the operands were verified to be in the correct ranges, that means one of the accumulators was zero. Manually forcing the routine to proceed clears the problem for an indefinite number of cycles. Having the instruction trace history would tell me exactly what happened with that multiplication.
 

Offline RogerRowland

Re: did i finally find the need for a better debugger?
« Reply #5 on: November 24, 2016, 08:37:25 am »
Sounds intriguing. Any chance you can post a snippet of the relevant code? More eyeballs on it could be useful.
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: did i finally find the need for a better debugger?
« Reply #6 on: November 24, 2016, 11:35:18 am »
How about checking the output and using printf (output to a serial port) to print the input variables when the result is outside the range? If you format the output sensibly, you can collect logging data, read it into Excel and perform some analysis on it.
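
A rough sketch of that idea in C, assuming the record is formatted into a buffer first, so it can go out over whatever channel is available. The function name, the column order and the range limits are all made up for illustration. One line per fault in CSV form loads straight into a spreadsheet.

```c
#include <stdio.h>
#include <stdint.h>

/*
 * Format one CSV record "index,tick,nibble,result" into buf when result is
 * outside [lo, hi]. Returns the number of characters written, or 0 when the
 * result is in range (nothing to log) or the buffer is too small.
 */
static int log_if_out_of_range(char *buf, size_t len, unsigned index,
                               uint16_t tick, uint16_t nibble,
                               uint16_t result, uint16_t lo, uint16_t hi)
{
    if (result >= lo && result <= hi) {
        return 0;
    }
    int n = snprintf(buf, len, "%u,%u,%u,%u\n", index,
                     (unsigned)tick, (unsigned)nibble, (unsigned)result);
    return (n > 0 && (size_t)n < len) ? n : 0;
}
```

The limits passed as `lo` and `hi` are an assumption about what counts as plausible; they would have to come from the known operand ranges of the calculation being checked.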
 

Online enz

  • Regular Contributor
  • *
  • Posts: 134
  • Country: de
Re: did i finally find the need for a better debugger?
« Reply #7 on: November 24, 2016, 11:39:46 am »
Check your stack.
Weird problems like this often happen due to a stack overflow.
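
One common way to check is stack painting: fill the stack region with a known pattern at startup and later count how much of it is still untouched. The sketch below is a generic illustration run against a plain array, not code from this project; the pattern value and function names are arbitrary. It assumes a stack that grows upward from the start of the region, as on dsPIC33/PIC24, which also have a stack limit register (SPLIM) and a stack error trap.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5u  /* arbitrary pattern, unlikely to appear by chance */

/* Paint the whole region with the fill pattern (done once, early at startup). */
static void stack_paint(uint16_t *region, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        region[i] = STACK_FILL;
    }
}

/*
 * Count words still holding the pattern, scanning from the far end of the
 * region back toward the start. Assumes the stack grows upward from
 * region[0]; flip the scan direction for a descending stack.
 */
static size_t stack_words_unused(const uint16_t *region, size_t words)
{
    size_t unused = 0;
    while (unused < words && region[words - 1 - unused] == STACK_FILL) {
        unused++;
    }
    return unused;
}
```

If the unused count ever reaches zero, or gets close to it, the stack has most likely run into whatever sits behind it.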
 

Offline JPortici (Topic starter)

Re: did i finally find the need for a better debugger?
« Reply #8 on: November 24, 2016, 11:57:08 am »
After I put in the failsafe do-while loop for PRx, the funny stuff happened here:
Code:
  unsigned int SwSENT2_Ticks[8], SwSENT2_Tick, SwSENT2_Nibbles[8];
  ...
  /* pulse length per nibble: Tick * (nibble value + 12) */
  for (i = 0; i < 8; i++) {
    SwSENT2_Ticks[i] = SwSENT2_Tick * (SwSENT2_Nibbles[i] + 12);
  }
  ...
(i is a local variable)
The nibbles come from a previous step where each is masked with 0xF, so they are limited to 15. Tick is not too large; it depends on Fp and a couple of prescalers, and is around 50 in this specific program, I think, so the zero is not an effect of overflow.
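
To put numbers on that: with Tick around 50 and a nibble of at most 15, the largest product is 50 * (15 + 12) = 1350, far below 65535. A minimal sketch of how the same loop could flag a zero result on the spot follows. The function name and return convention are invented for illustration, and the zero test only proves corruption for small Tick values like the one described, since a large Tick can legitimately wrap to zero.

```c
#include <stdint.h>

#define SENT_NIBBLES  8
#define NIBBLE_OFFSET 12u  /* offset taken from the snippet above */

/*
 * Fill ticks[] from tick and nibbles[], checking each product as it is made.
 * Returns -1 when no entry is zero, otherwise the index of the first zero
 * entry. For small tick values a zero cannot come from the arithmetic
 * itself, so it points at corrupted operands or registers.
 */
static int fill_ticks_checked(uint16_t ticks[SENT_NIBBLES], uint16_t tick,
                              const uint16_t nibbles[SENT_NIBBLES])
{
    for (int i = 0; i < SENT_NIBBLES; i++) {
        uint16_t factor = (uint16_t)((nibbles[i] & 0xFu) + NIBBLE_OFFSET);
        ticks[i] = (uint16_t)(tick * factor);
        if (tick != 0 && ticks[i] == 0) {
            return i;  /* good spot for a breakpoint or a fault counter */
        }
    }
    return -1;
}
```

Returning the index rather than just a flag keeps the failing operand pair identifiable when the debugger halts.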

XC16, optimization level 0.
The disassembly showed that the compiler was using almost only W0 and W1, which is asking for trouble: entering an interrupt in the meantime could mess things up. It actually has in other projects where I relied too much on assembly.

I too thought about a stack overflow, especially because I only have 600 bytes or so of RAM left, but when that has happened before, something spectacular followed, like an address error trap or Reserved Trap 7 being called (by the way, what the hell is that? Neither the support guy nor users in the forum would tell me).

Using printf? I wish. There are no pins left for a second UART and the present UART is already used by another tool (the development unit is attached to a testbench).

Anyway, looking at the disassembly made me think about a stack problem, so I immediately disabled nested interrupts to see if things changed. They did! About three hours without SwSENT ever locking up, definitely a big improvement. Clearing the lockup by forcing the start of a new transmission is just a matter of setting a bit, so I prepared a kind of watchdog too. It will be added once I'm sure I've solved this one.
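
A sketch of what such a software watchdog could look like, under the assumption that it is polled from some slow periodic tick and kicked whenever a frame completes. None of these names come from the library, and `restart_flag` only stands in for the real "start new transmission" bit.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical state for a stall monitor; none of these names are from the library. */
typedef struct {
    uint16_t idle_ticks;   /* polls since the last completed frame */
    uint16_t limit;        /* polls tolerated before forcing a restart */
    bool     restart_flag; /* stands in for the "start new transmission" bit */
} tx_monitor_t;

/* Call from the transmit-complete path: the sender is alive. */
static void tx_monitor_kick(tx_monitor_t *m)
{
    m->idle_ticks = 0;
}

/* Call periodically (e.g. from a slow timer). Returns true when it forced a restart. */
static bool tx_monitor_poll(tx_monitor_t *m)
{
    if (m->idle_ticks < m->limit) {
        m->idle_ticks++;
        return false;
    }
    m->restart_flag = true;  /* on hardware: set the bit that starts a new frame */
    m->idle_ticks = 0;
    return true;
}
```

Counting how often the poll returns true would also show whether the underlying fault is really gone or just being papered over.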

(But my OP was intended to be more about "what do I do when the basic debugger is not enough help? Oh, I wish I could have X".)
 

Offline RogerRowland

Re: did i finally find the need for a better debugger?
« Reply #9 on: November 24, 2016, 02:06:40 pm »
Yes, understood and I concur. Embedded stuff is my hobby; I've spent 40 years writing software on mainframes, minis and PCs, and the debugging tools on desktops rather spoil you when you try to do similar stuff with MPLAB X and a PICkit 3 (as I do).

The cost of something "better" like an ICD 3 or REAL ICE is a touch too much for a hobby, so I tend to fall back on the old methods I used before desktop debuggers were a thing: frequent function/module testing during development, extensive logging, etc. (which sometimes causes more problems, especially with time-critical code). Just toggling a spare output pin is also a great help for tracing the flow. I use different numbers of toggles to checkpoint sections of code, and I also use the pin to measure function time to see if it meets expectations (often I've cocked up a timer setting or an interrupt priority or something and things run much slower than planned).
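
A small sketch of the pin-toggle checkpoint idea, written against a fake pin so it can run on a PC. `debug_pin` is a hypothetical stand-in for a spare output latch bit, and `debug_edges` exists only so the sketch can be checked without a scope.

```c
#include <stdint.h>

/* Hypothetical stand-in for an output latch bit (a spare pin on real hardware). */
static volatile uint8_t debug_pin;

/* Host-side bookkeeping so the sketch can be checked without a scope. */
static unsigned debug_edges;

/* Drive the pin and count every level change. */
static void debug_pin_write(uint8_t level)
{
    if (level != debug_pin) {
        debug_edges++;
    }
    debug_pin = level;
}

/* Emit `count` short pulses: the pulse count identifies the checkpoint on the scope. */
static void checkpoint(unsigned count)
{
    while (count--) {
        debug_pin_write(1);
        debug_pin_write(0);
    }
}
```

For timing, the same pair of writes can bracket a function call instead, so the pulse width on the scope shows how long the function took.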

In your case, multiple nested interrupts and a high risk of stack overflow give you the worst of everything. Those types of issues on desktops (i.e. multi-threaded, memory-intensive tasks) are also difficult to debug because the cause is often far removed, in code lines and in time, from the observed error. Having said that, most of the PICs I've used (8-bit and 32-bit) allow you to trap stack issues with a forced reset and a status register on restart that at least tells you it happened.
« Last Edit: November 24, 2016, 03:03:53 pm by RogerRowland »
 

