Author Topic: Problems with HardFault_Handler (Read 4423 times)

Yaro · « **on:** February 12, 2016, 11:12:30 am »

Hi all,

I've recently run into problems with HardFault_Handler. I'm using an ATSAM4S connected via JTAG. I have two boards based on the same ATSAM4S chip model in the same package. There are only small differences between the two board schematics: some changed pin connections (LEDs) and a different MCU position on the board. The other components are the same and connected to the same pins.
The first board, which I'll call board_1, has no problems: the code runs perfectly without any errors or fault handlers triggered.
board_2, running exactly the same code with just GPIO pinout adapted, triggers HardFault_Handler very frequently at different points in the code. The BusFault and UsageFault handlers are also active, but only HardFault triggers. Over several runs I've checked that the HardFault doesn't depend on a particular point in the code: sometimes it's triggered by an I2C instruction, sometimes by a variable assignment or a function call.
Since the code is the same on both boards, I tried very simple code on board_2, like an LED blink, and ran it without the debugger connected. The code runs with these problems:
- sometimes it freezes and recovers after a very short time (the blink restarts);
- sometimes it freezes without recovering (or recovers only after a long time);
- sometimes the MCU reboots at random (I found this by blinking another LED only at boot time).

After this test I'm quite sure (unless someone tells me otherwise) that it's a hardware problem.

But I can't figure out whether the problem is:
- a PCB or schematic fault;
- a flawed MCU;
- an MCU damaged during reflow.

I was quite sure that it's an MCU fault, but since the MCU seems to program without problems and runs code (with problems), what else could it be? Any suggestions are appreciated and I'll check them. I'd prefer to solve this problem (or be sure that it's an MCU fault) before assembling another board or replacing the MCU.

Bear in mind that I'm not used to this kind of problem and I don't know how likely it is that the MCU is flawed or damaged. Also, since the schematics are the same, it may be a problem with another component, a short, or a faulty power component. But the MCU does power up, and the problems show up every few seconds. Any advice is appreciated.

Thank you!

dannyf · « **Reply #1 on:** February 12, 2016, 12:11:16 pm »

You can save a set of variables that point to the code that generated the fault condition. Check the ARM programmer's manual.
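
A minimal sketch of that idea, assuming the basic Cortex-M exception frame (no FPU extension): on exception entry the core pushes r0-r3, r12, lr, pc and xPSR onto the active stack, so copying those eight words gives the context of the faulting code. The names `fault_frame_t`, `g_last_fault` and `capture_fault_frame` are hypothetical, and the small assembly shim that passes the stack pointer to the function is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic frame pushed by the core on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, psr;
} fault_frame_t;

/* Hypothetical global, kept so a debugger can inspect it after the fault. */
volatile fault_frame_t g_last_fault;

/* `sp` must be the stack pointer value at exception entry, that is, before
 * the handler itself has pushed anything onto the stack. */
void capture_fault_frame(const uint32_t *sp)
{
    assert(sp != NULL);
    g_last_fault.r0  = sp[0];
    g_last_fault.r1  = sp[1];
    g_last_fault.r2  = sp[2];
    g_last_fault.r3  = sp[3];
    g_last_fault.r12 = sp[4];
    g_last_fault.lr  = sp[5];  /* return address of the interrupted function */
    g_last_fault.pc  = sp[6];  /* address at or near the faulting instruction */
    g_last_fault.psr = sp[7];
}
```

If the handler's own prologue pushes registers before the stack pointer is read, every field ends up shifted, so the pointer is best taken in a naked or assembly entry stub.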

Godzil · « **Reply #2 on:** February 12, 2016, 04:32:44 pm »

Does your MCU use external memory?
If so, a faulty track layout for the memory would be the main suspect.

Brutte · « **Reply #3 on:** February 12, 2016, 05:53:05 pm »

Quote from: Yaro on February 12, 2016, 11:12:30 am

Hi all, I've recently encountered problems with HardFault_Handler.

HardFault might be raised either because a pesky event directly escalates to HardFault (for example, hitting asm("bkpt...") with debugging disabled), or as a derived exception (for example, trying to divide by zero with the UsageFault handler disabled). These are two different animals.
So which one is it?

Quote

The BusFault and UsageFault handlers are also active, but only HardFault triggers

If you say so.

Quote

Since the code is the same on both boards, I tried very simple code on board_2, like an LED blink, and ran it without the debugger connected. The code runs with these problems:

Are you aware that hitting asm("bkpt ... ") or triggering any other debugging event with debugging/DebugMon handler disabled escalates to HardFault? If you want to run without hard faults, you have to remove any 0xbe** opcodes, at minimum. BTW, 0xbeab is used by semihosting.

Mind that asm("bkpt") always executes unconditionally inside an IT block, so even this might not avoid a HardFault:

Code:

volatile int x = 0;
if (x) {
    asm("bkpt 0x00");  /* may end up inside an IT block and execute regardless of x */
}
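
One rough way to look for leftover 0xbe** opcodes is to scan the raw binary image for halfwords whose upper byte is 0xBE, which is the Thumb BKPT encoding. This is only a sketch under stated assumptions: the image is little-endian Thumb code, the function name `count_bkpt_candidates` is hypothetical, and the scan is naive, so it also reports literal-pool data and the second halfword of 32-bit instructions that happen to match. Each hit still needs checking against a disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count halfwords of the form 0xBExx (Thumb BKPT #xx) in a little-endian image.
 * Hits are candidates only: data and 32-bit instruction halves can match too. */
size_t count_bkpt_candidates(const uint8_t *image, size_t len)
{
    assert(image != NULL);
    size_t hits = 0;
    for (size_t off = 0; off + 1 < len; off += 2) {
        /* Low byte is stored first in a little-endian image. */
        uint16_t hw = (uint16_t)(image[off] | (image[off + 1] << 8));
        if ((hw & 0xFF00u) == 0xBE00u) {
            hits++;
        }
    }
    return hits;
}
```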

Yaro · « **Reply #4 on:** February 12, 2016, 06:47:38 pm »

Quote from: Godzil on February 12, 2016, 04:32:44 pm

Does your MCU use external memory?
If so, a faulty track layout for the memory would be the main suspect.

No, there's no external memory. The board is very simple: some MOSFETs to drive LEDs, a clock source, a VR, a sensor on the I2C bus and nothing else. I've checked the sensor readings several times and I2C works fine; I've checked the clock source with a digital analyzer and it seems stable. I've changed power supplies and I still get the same errors.

I've checked the call stack to see where it crashes, but it crashes very randomly, not in a particular function: sometimes at an ADC call, sometimes in another peripheral, sometimes when I initialize the values of a vector.

I've noticed that when code runs without the debugger plugged in, as with the simple blink code, it reboots randomly, probably because it hits some fault.

I don't have any debugging events, since that's handled by the Atmel IDE and ASF.

I'm more and more convinced that it's a HW fault. But I don't know how to prove it, since more complex code gives me a HardFault when the debugger is active and restarts continuously without the debugger, while simple code doesn't give me a HardFault (the LEDs just have some glitches when blinking) but restarts when the debugger is not plugged in (less frequently than the complex code).

Brutte · « **Reply #5 on:** February 12, 2016, 07:53:31 pm »

Quote from: Yaro on February 12, 2016, 06:47:38 pm

since more complex code gives me a HardFault when the debugger is active

OK, so the core is halted on entry to the HardFault handler.
Now what are the values of HFSR and CFSR? Those indicate the cause of the fault.
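
As a rough illustration of how HFSR separates the two cases, here is a sketch that classifies the register value. The bit positions (VECTTBL = bit 1, FORCED = bit 30, DEBUGEVT = bit 31) and the HFSR address 0xE000ED2C are the architectural ones; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCB_HFSR_ADDR  0xE000ED2Cu   /* HardFault Status Register */
#define HFSR_VECTTBL   (1u << 1)     /* bus fault on a vector table read */
#define HFSR_FORCED    (1u << 30)    /* escalated configurable fault: look at CFSR */
#define HFSR_DEBUGEVT  (1u << 31)    /* debug event (e.g. BKPT) with debug disabled */

typedef enum { HF_UNKNOWN, HF_VECTTBL, HF_FORCED, HF_DEBUGEVT } hf_cause_t;

/* Classify a HardFault from the HFSR value read through `hfsr_reg`. */
hf_cause_t classify_hard_fault(const volatile uint32_t *hfsr_reg)
{
    assert(hfsr_reg != NULL);
    uint32_t hfsr = *hfsr_reg;
    if (hfsr & HFSR_DEBUGEVT) return HF_DEBUGEVT;
    if (hfsr & HFSR_FORCED)   return HF_FORCED;
    if (hfsr & HFSR_VECTTBL)  return HF_VECTTBL;
    return HF_UNKNOWN;
}
```

On the target this would be called as `classify_hard_fault((const volatile uint32_t *)SCB_HFSR_ADDR)`; passing a pointer keeps the function testable on a host with a plain variable.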

Yaro · « **Reply #6 on:** February 13, 2016, 12:05:08 pm »

Quote from: Brutte on February 12, 2016, 07:53:31 pm

Quote from: Yaro on February 12, 2016, 06:47:38 pm
since more complex code gives me a HardFault when the debugger is active
OK, so the core is halted on entry to the HardFault handler.
Now what are the values of HFSR and CFSR? Those indicate the cause of the fault.

I have these values:
Address 0xE000ED2C HFSR - 0x40000000
Address 0xE000ED28 CFSR - 0x00008200

But I've also recorded all the fault register values several times; these are the values I get:

Register     1           2           3           4           5           6
stacked_r0   0x60000001  0x60000001  0x404086f1  0x404086f1  0x64000000  0x00403ee9
stacked_r1   0xfffffff9  0xfffffff9  0xfffffff9  0xfffffff9  0xfffffff9  0xfffffff9
stacked_r2   0x00045c13  0x00045c13  0xffffffff  0xffffffff  0x20001142  0x40460e00
stacked_r3   0x00000000  0x00000000  0x41600000  0x41600000  0x00000000  0x10000000
stacked_r12  0x0000f00d  0x0000f00d  0x0000f00d  0x0000f00d  0x64000004  0x00000001
stacked_lr   0x60000001  0x60000001  0x404086f1  0x404086f1  0x64000000  0x00403ee9
stacked_pc   0x00000001  0x00000001  0x79ee6ed6  0x7d759044  0xfffffff8  0x20001600
stacked_psr  0x004011e5  0x004011e5  0x00401369  0x00401335  0x00404b97  0x0040405d
_CFSR        0x00000100  0x00000100  0x00000100  0x00000100  0x00000400  0x00000400
_HFSR        0x40000000  0x40000000  0x40000000  0x40000000  0x40000000  0x40000000
_DFSR        0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  0x00000000
_AFSR        0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  0x00000000
_MMAR        0xe000ed34  0xe000ed34  0xe000ed34  0xe000ed34  0xe000ed34  0xe000ed34
_BFAR        0xe000ed38  0xe000ed38  0xe000ed38  0xe000ed38  0xe000ed38  0xe000ed38

Captures 1 and 2 were at an ADC initialization call, 3 and 4 at the same point when calling an I2C read, 5 when writing an array (not related to any peripheral), and 6 was related to GPIO pins.

The errors trigger at different points, sometimes at the same one, but most of the time at random.

andersm · « **Reply #7 on:** February 13, 2016, 12:41:19 pm »

Did you record the value of the stack pointer register (r13)? The stacked PC values are all invalid (instruction zero, or pointing to an even address), and only in the last case does the stacked link register point into code space. In all cases, r0 and lr also hold the same values, which is improbable. This is the kind of garbage data I'd expect to see when the stack gets trashed.

dannyf · « **Reply #8 on:** February 13, 2016, 12:49:57 pm »

Accessing a peripheral without enabling its clock will often trigger a hard fault.

Using SWV / ITM will help you.

Yaro · « **Reply #9 on:** February 13, 2016, 01:48:43 pm »

Quote from: andersm on February 13, 2016, 12:41:19 pm

Did you record the value of the stack pointer register (r13)? The stacked PC values are all invalid (instruction zero, or pointing to an even address), and only in the last case does the stacked link register point into code space. In all cases, r0 and lr also hold the same values, which is improbable. This is the kind of garbage data I'd expect to see when the stack gets trashed.

I can try to add it; I used a guide to create the debug handler. Are you saying that I'm reading r0 and lr the wrong way, or that this is an uncommon error to see? I can capture the fault more times and provide the results.

Quote from: dannyf on February 13, 2016, 12:49:57 pm

Accessing a peripheral without enabling its clock will often trigger a hard fault.

Using SWV / ITM will help you.

I've used the same peripheral code as on board_1. I can check whether some delay or wait time isn't respected when enabling a peripheral, but since it works on board_1 it's unlikely that the error is there. Also, the peripheral initialization code I've used is provided by Atmel in their application notes.
Consider that I'm using code (for system and peripherals) auto-generated by ASF in Atmel Studio, and since board_1 and board_2 have the same code, the same MCU and the same peripheral pinout, it looks very strange to me that the first board works and the second doesn't.

I've read some comments and found that when errors are this random they are probably related to HW, PCB or MCU faults, but I've also read that an MCU fault is unlikely since MCUs are well checked before being sold (or maybe only samples are tested). But the MCU on board_2, unlike board_1, gave me problems from the start, like a wrong device ID, errors with SWD, the security bit being set when I didn't set it, or the device not being recognized. Also, with very simple code it has glitches and continuously reboots after some seconds of activity. That makes me think that it's an MCU or PCB fault.
I'll check the PCB and its connections more carefully, and maybe replace the MCU.
Any advice on where to look on the PCB for errors that may cause a reboot or HardFault after some seconds or milliseconds? Maybe a power line, a power sequence not respected, wrong capacitors, shorts, etc.?

andersm · « **Reply #10 on:** February 13, 2016, 04:09:38 pm »

Quote from: Yaro on February 13, 2016, 01:48:43 pm

Are you saying that I'm reading r0 and lr the wrong way, or that this is an uncommon error to see? I can capture the fault more times and provide the results.

The link register stores the function return address, and r0 is used to pass the first function parameter. It would be unusual for them to consistently hold the same value, especially when the value does not look like a valid return address. The stack could be trashed, you could be looking at the wrong stack pointer, or your handler may be buggy.
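
On the "wrong stack pointer" possibility, a minimal sketch: inside the handler, lr holds an EXC_RETURN value, and bit 2 of it says whether the exception frame was pushed on the main stack (MSP) or the process stack (PSP). The function name `select_fault_stack` is hypothetical, and the caller is assumed to have read MSP and PSP before anything else was pushed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXC_RETURN_SPSEL (1u << 2)  /* 0: frame is on MSP, 1: frame is on PSP */

/* Pick the stack that holds the exception frame, based on EXC_RETURN. */
const uint32_t *select_fault_stack(uint32_t exc_return,
                                   const uint32_t *msp,
                                   const uint32_t *psp)
{
    assert(msp != NULL && psp != NULL);
    return (exc_return & EXC_RETURN_SPSEL) ? psp : msp;
}
```

For example, 0xFFFFFFF9 selects MSP and 0xFFFFFFFD selects PSP. Since 0xfffffff9 is itself an EXC_RETURN value, seeing it in every stacked_r1 slot of the captures above would be consistent with the handler reading the frame at the wrong offset, though that is only a guess without seeing the handler code.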

In all your cases, the "forced" bit of HFSR is set, indicating that another fault has been escalated to a hard fault. The CFSR value 0x100 (IBUSERR) indicates an instruction bus error, and 0x400 (IMPRECISERR) indicates an imprecise data bus error. Install handlers for the other faults so you can capture as much context as possible.
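
For reference, a small sketch that turns the BusFault part of CFSR (bits 8 to 15) into flag names. The bit assignments are the architectural ones; the table and function names are hypothetical, and the MemManage and UsageFault bits are left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct { uint32_t mask; const char *name; } cfsr_bit_t;

static const cfsr_bit_t k_busfault_bits[] = {
    { 1u << 8,  "IBUSERR"     },  /* bus error on instruction fetch */
    { 1u << 9,  "PRECISERR"   },  /* precise data bus error */
    { 1u << 10, "IMPRECISERR" },  /* imprecise data bus error */
    { 1u << 11, "UNSTKERR"    },  /* fault while unstacking on exception return */
    { 1u << 12, "STKERR"      },  /* fault while stacking on exception entry */
    { 1u << 15, "BFARVALID"   },  /* BFAR holds the faulting address */
};

/* Write a space-separated list of the BusFault flags set in `cfsr` to `out`. */
void decode_busfault(uint32_t cfsr, char *out, size_t out_len)
{
    assert(out != NULL && out_len > 0);
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof k_busfault_bits / sizeof k_busfault_bits[0]; i++) {
        if (cfsr & k_busfault_bits[i].mask) {
            int n = snprintf(out + used, out_len - used, "%s%s",
                             used ? " " : "", k_busfault_bits[i].name);
            if (n < 0 || (size_t)n >= out_len - used) {
                return;  /* output truncated: stop here */
            }
            used += (size_t)n;
        }
    }
}
```

With the values posted in this thread, 0x00000100 decodes to IBUSERR, 0x00000400 to IMPRECISERR, and 0x00008200 to PRECISERR plus BFARVALID.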

Quote

I've read some comments and found that when errors are this random they are probably related to HW, PCB or MCU faults, but I've also read that an MCU fault is unlikely since MCUs are well checked before being sold (or maybe only samples are tested).

MCU hardware errors are possible, but far less likely than software bugs. Buffer overruns or pointer errors can cause symptoms that look essentially random, especially if interrupts are involved. Of course you should check your power supply and ground connections, and also enable brown-out detection and run the MCU from the internal oscillator.
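
One cheap way to test the buffer-overrun theory is to surround a suspect buffer with guard words and check them periodically. This is only a sketch with hypothetical names (`guarded_buf_t`, `guarded_init`, `guarded_intact`) and an arbitrary guard pattern and buffer size; it detects that an overrun happened, not where it came from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_WORD 0xDEADBEEFu  /* arbitrary pattern, unlikely to occur by chance */

typedef struct {
    uint32_t guard_lo;   /* sits just below the data */
    uint8_t  data[16];   /* the buffer under suspicion (size is an example) */
    uint32_t guard_hi;   /* sits just above the data */
} guarded_buf_t;

/* Clear the data and arm both guard words. */
void guarded_init(guarded_buf_t *b)
{
    assert(b != NULL);
    b->guard_lo = GUARD_WORD;
    memset(b->data, 0, sizeof b->data);
    b->guard_hi = GUARD_WORD;
}

/* Returns false if something has written outside `data` on either side. */
bool guarded_intact(const guarded_buf_t *b)
{
    assert(b != NULL);
    return b->guard_lo == GUARD_WORD && b->guard_hi == GUARD_WORD;
}
```

Calling `guarded_intact()` from the main loop, or from a breakpoint condition, narrows down when the corruption happens even if the eventual fault shows up somewhere unrelated.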

In your first post you indicated that the two boards aren't actually running the same software ("with just GPIO pinout adapted"). Can you reproduce the problem if you program both boards with the exact same binary?

