Topic: Beginner question: Do any microcontrollers have stack/heap collision detection?


Offline peter-h

  • Super Contributor
  • ***
  • Posts: 3698
  • Country: gb
  • Doing electronics since the 1960s...
That is what a watchdog is for - it is the last guard to prevent a hanging product.

In fact in many applications the watchdog tripping will never be noticed no matter how frequent :)
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
I think it's a really silly idea to let the stack and the heap grow into each other as OP suggests.

If you have a heap, it should be restricted to a certain memory area, as opposed to growing indefinitely in one direction and corrupting everything in its way. With a restricted heap, if you run out of memory, or memory gets fragmented, nothing gets corrupted; your next allocation simply fails, which you can then handle gracefully.

Waiting until memory is corrupted and then resetting is a recipe for disaster. Once memory is corrupted, the consequences are unpredictable. In an embedded application, where your chip controls hardware, if your program goes astray, something may get damaged. You should avoid memory corruption at all costs.

 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14480
  • Country: fr
"I think it's a really silly idea to let the stack and the heap grow into each other as OP suggests.

If you have a heap, it should be restricted to a certain memory area, as opposed to growing indefinitely in one direction and corrupting everything in its way. With a restricted heap, if you run out of memory, or memory gets fragmented, nothing gets corrupted; your next allocation simply fails, which you can then handle gracefully."

Yes, and a heap limit is relatively easy to enforce. If I need heap allocation, I personally do this by implementing my own _sbrk() function. Heap start and end are defined in the linker script and taken from there. Allocations fail if they would exceed heap end. End of story.
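
For illustration only, here is a minimal sketch of such a bounded sbrk hook. It is not my actual code. It assumes a newlib-style C library, where malloc() obtains memory through _sbrk(). The names bounded_sbrk, heap_region and HEAP_SIZE are made up for this example. On a real target the bounds would come from linker-script symbols, whose names vary between projects, so a static array stands in for the heap region to let the sketch build and run on a host.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumption: on a real target the heap bounds come from linker-script
 * symbols. A static array stands in for that region so the sketch also
 * builds and runs on a host.
 */
#define HEAP_SIZE 1024
static uint8_t heap_region[HEAP_SIZE];
static uint8_t *heap_brk = heap_region;   /* current end of the used heap */

/*
 * sbrk-style hook (hypothetical name). With a newlib-style library this
 * logic would live in _sbrk().
 */
void *bounded_sbrk(ptrdiff_t incr)
{
    uint8_t *const heap_start = heap_region;
    uint8_t *const heap_end   = heap_region + HEAP_SIZE;

    /* Invariant: the break always stays inside the reserved region. */
    assert(heap_brk >= heap_start && heap_brk <= heap_end);

    ptrdiff_t used  = heap_brk - heap_start;
    ptrdiff_t avail = heap_end - heap_brk;

    /* Refuse to grow past heap_end or to shrink below heap_start. */
    if (incr > avail || incr < -used) {
        errno = ENOMEM;
        return (void *)-1;
    }

    uint8_t *prev = heap_brk;
    heap_brk += incr;
    return prev;   /* old break = start of the newly granted block */
}
```

When the hook returns (void *)-1, the allocator has nothing to hand out, so malloc() returns NULL and the caller can deal with it instead of the heap silently running into the stack.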

Of course, the other way around - preventing the stack from growing too much - is, OTOH, much harder to do, as discussed in this thread and on a regular basis.

"Waiting until memory is corrupted and then resetting is a recipe for disaster. Once memory is corrupted, the consequences are unpredictable. In an embedded application, where your chip controls hardware, if your program goes astray, something may get damaged. You should avoid memory corruption at all costs."

That's for sure, even though you must always be prepared for the worst - which here means adding ways of mitigating memory corruption, at least any that you can reasonably detect. If triggering a reset is the best you can do, then so be it. In most cases that is still better than doing nothing, as long as the whole system is designed to accommodate spurious resets. Of course that must be decided at the system level. If firmware developers decide on their own that the CPU might reset arbitrarily, and the hardware team is not aware of this, things could get really bad.
 

Online DavidAlfa

  • Super Contributor
  • ***
  • Posts: 5911
  • Country: es
When you have such issues, most of the answers on forums are "Don't use computer things on microcontrollers".

When you have only a few KB of RAM, dynamic allocation starts to make sense: it lets you run much larger programs, since often not all tasks run concurrently.

- Malloc should fail if you're trying to allocate too much.
- Most compilers have a static stack analyzer, showing the local stack cost of each function and the worst-case cost of a call chain (e.g. in a call chain 15 functions deep, each function adds its own stack usage).
   That's what you should reserve to stay on the safe side, unless you know very well how the program flows.

« Last Edit: September 15, 2021, 07:18:10 pm by DavidAlfa »
 

Online ajb

  • Super Contributor
  • ***
  • Posts: 2607
  • Country: us
"Waiting until memory is corrupted and then resetting is a recipe for disaster. Once memory is corrupted, the consequences are unpredictable. In an embedded application, where your chip controls hardware, if your program goes astray, something may get damaged. You should avoid memory corruption at all costs."

"That's for sure, even though you must always be prepared for the worst - which here means adding ways of mitigating memory corruption, at least any that you can reasonably detect. If triggering a reset is the best you can do, then so be it. In most cases that is still better than doing nothing, as long as the whole system is designed to accommodate spurious resets. Of course that must be decided at the system level. If firmware developers decide on their own that the CPU might reset arbitrarily, and the hardware team is not aware of this, things could get really bad."

For sure, fault handling has to be a system-level design conversation.  A big trap is that the method of reset in case of corruption or crash has to be sufficiently complete for the application.  In general, with a modern MCU you can probably expect that you have a good soft/hard reset capability that will reset all internal peripherals and put them into a state where your standard cold-start initialization will work.  However this requires reading the datasheet AND ERRATA carefully to verify, and there is no such guarantee for external peripherals.  With complex peripherals or multiple processors/FPGAs on board an external reset controller may be required, or you may need to spend significant effort to verify that each unit of the system will tolerate one of its neighbors resetting at an arbitrary time.  As a rule, initialization routines should be written defensively in that they assume as little as possible--preferably nothing--about the state of other parts of the system.  Any external peripherals (like displays or other IO) that have hardware reset lines should really have those tied to GPIO with an external pulldown or a global system reset line so you can guarantee their state every time the processor starts up.  Some parts (like IMUs, notoriously) may even require an external power switch if they don't have a proper external reset capability. 

"The problem is that there is no room for the heap to grow into, so it all has to be static allocation or stack.
Then again, I do think that is the way to do it for small MCUs, since the stuff that runs on them tends to be reasonably simple and you probably want a tight grip on where the precious kilobytes of memory go."

It's pretty much always possible to write an application with exclusively static allocation, it's just a question of how much of a pain it is that way versus using dynamic allocation.  I would venture that the vast majority of applications running on, say, Cortex M4 or below these days, don't REALLY need dynamic allocation.  The state space of the applications that tend to be written on such devices should be sufficiently well defined that you either just don't have any need to create storage objects on the fly, or you can define ahead of time the worst case storage requirements and statically allocate space for that amount of data.  Possibly if a device has different operating modes with different data storage requirements you might not want to pay for enough RAM to hold all of that at once, so you might need to do some tricks there with overlapping static allocations (ie, unions, if you want to just use the language to solve the problem), but it's often not that hard to avoid dynamic allocation.  There are of course exceptions where it truly is the best way to approach an application.   
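
As a rough sketch of the overlapping-static-allocation trick: the two operating modes, the buffer sizes and all type names below are invented for illustration, and static_assert needs C11. The point is only that the union reserves RAM for the larger mode rather than for the sum of both.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mode 1: buffers needed only while talking to a host. */
typedef struct {
    uint8_t rx_frame[512];
} comms_buffers_t;

/* Hypothetical mode 2: buffers needed only while logging samples. */
typedef struct {
    int16_t  samples[128];
    uint16_t count;
} logging_buffers_t;

/* The modes never run at the same time, so their storage may overlap. */
typedef union {
    comms_buffers_t   comms;
    logging_buffers_t logging;
} mode_storage_t;

/* One static allocation, sized for the larger of the two modes. */
static mode_storage_t mode_storage;

/* Compile-time check of the worst-case RAM budget (C11). */
static_assert(sizeof(mode_storage_t) <= 512,
              "mode storage exceeds its RAM budget");
```

The catch is that the contents are undefined after a mode switch, so each mode has to re-initialize its own view of the union when it starts.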
 

Offline ogden

  • Super Contributor
  • ***
  • Posts: 3731
  • Country: lv
Sorry if I repeat, but there's an ancient method: add a small (4..16 bytes) constant memory area filled with 0xDEADBEEF or similar between the stack and the heap, then add integrity checks of that area in the main loop (but not limited to it). That was usually more than enough to catch a stack/heap collision. The beauty of this method is that it works on any CPU/MCU.
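
A minimal sketch of the idea, under assumptions: guard_zone, guard_init and guard_intact are made-up names, and here the guard is just an ordinary static array. On a real target you would have to place it between heap and stack yourself, typically with a dedicated linker-script section, and how to do that depends on the toolchain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_WORDS   4u            /* 4 x 32 bits = 16 bytes */
#define GUARD_PATTERN 0xDEADBEEFu

/*
 * Assumption: on target the linker script places this array between the
 * heap and the stack. Here it is a plain static array.
 */
static volatile uint32_t guard_zone[GUARD_WORDS];

static_assert(sizeof(guard_zone) >= 4 && sizeof(guard_zone) <= 16,
              "guard zone is meant to be 4..16 bytes");

/* Call once at startup, before the heap and stack see heavy use. */
void guard_init(void)
{
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        guard_zone[i] = GUARD_PATTERN;
    }
}

/* Call from the main loop; false means something wrote into the guard. */
bool guard_intact(void)
{
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        if (guard_zone[i] != GUARD_PATTERN) {
            return false;
        }
    }
    return true;
}
```

In the main loop this would be something like: if (!guard_intact()) then go to whatever fault handling the system design calls for.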
 

Offline peter-h

  • Super Contributor
  • ***
  • Posts: 3698
  • Country: gb
  • Doing electronics since the 1960s...
"It's pretty much always possible to write an application with exclusively static allocation, it's just a question of how much of a pain it is that way versus using dynamic allocation.  I would venture that the vast majority of applications running on, say, Cortex M4 or below these days, don't REALLY need dynamic allocation."

How very true. But it depends a lot on where you come from. If you have an asm background then what's a malloc() ? :)

"Sorry if I repeat, but there's an ancient method: add a small (4..16 bytes) constant memory area filled with 0xDEADBEEF or similar between the stack and the heap, then add integrity checks of that area in the main loop (but not limited to it). That was usually more than enough to catch a stack/heap collision. The beauty of this method is that it works on any CPU/MCU."

That's clever but it won't pick up where one does a malloc() which spans right across that piece of memory, and results in corruption either side of it. It's the same with protecting some memory address in hardware.
« Last Edit: September 16, 2021, 06:01:41 pm by peter-h »
 

Offline ogden

  • Super Contributor
  • ***
  • Posts: 3731
  • Country: lv
"Sorry if I repeat, but there's an ancient method: add a small (4..16 bytes) constant memory area filled with 0xDEADBEEF or similar between the stack and the heap, then add integrity checks of that area in the main loop (but not limited to it). That was usually more than enough to catch a stack/heap collision. The beauty of this method is that it works on any CPU/MCU."

"That's clever but it won't pick up where one does a malloc() which spans right across that piece of memory, and results in corruption either side of it. It's the same with protecting some memory address in hardware."

Good point. For that you simply write your own special debug malloc() which does a memzero() of the allocated block as well, so that an allocation spanning the guard area overwrites the pattern and gets caught by the check.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
"That's for sure, even though you must always be prepared for the worst - which here means adding ways of mitigating memory corruption, at least any that you can reasonably detect."

Of course, if you detect some sort of memory corruption you should reset. But by the time you detect it, the damage may already be done. Your reset can only prevent future damage, not the damage that has already happened. Therefore, it's wise to design your program to minimize the probability of memory corruption, for example by restricting your heap or by not using the heap at all.
 

Offline Doctorandus_P

  • Super Contributor
  • ***
  • Posts: 3360
  • Country: nl
Too long, didn't read it all.

An old technique is called "stack painting", and it works (in principle) quite simply.
During initialization of your uC you fill all memory with either a static value or the output of a pseudo-random generator.

Then every once in a while you run an ISR (so you have all memory to yourself) and in that ISR you walk through all RAM and examine its current content. This gives a quite good (but not 100% reliable) picture of which memory locations have been used, and you can determine the distance between the top of the heap and the bottom of the stack. Once you have a stack collision, there's a good chance you will not reach your ISR anymore, so to guard against that some "safety margin" should be respected. If stack and heap get too close, you're in trouble.
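
A minimal sketch of the painting and scanning steps, assuming a descending stack and a fixed paint byte. The function names and PAINT_BYTE are made up, and the region is passed in as a plain buffer so the sketch runs on a host. On a real target you would paint the RAM between the heap end and the current stack pointer from the startup code, taking care not to overwrite the part of the stack already in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAINT_BYTE 0xA5u   /* arbitrary static paint value */

/* Fill a not-yet-used stack region with the paint pattern at startup. */
void stack_paint(uint8_t *base, size_t size)
{
    assert(base != NULL);
    memset(base, PAINT_BYTE, size);
}

/*
 * Count the bytes that still carry paint, scanning up from the low end.
 * Assumption: a descending stack that starts at base + size and grows
 * down toward base, so never-touched bytes sit at the low addresses.
 * Not 100% reliable: stack data that happens to equal PAINT_BYTE at the
 * boundary makes the result slightly optimistic.
 */
size_t stack_unused_bytes(const uint8_t *base, size_t size)
{
    assert(base != NULL);
    size_t n = 0;
    while (n < size && base[n] == PAINT_BYTE) {
        n++;
    }
    return n;
}
```

The periodic ISR would compare the returned count against the chosen safety margin and raise the alarm while there is still some unpainted room left.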

But as others have already written, uC programs tend to be small, and often constructs such as malloc are not used at all, so no heap fragmentation can occur.

With small uCs, some static buffers are often declared. If your total RAM need is bigger than physical RAM, you can declare a union (or an array of unions) and re-use the same RAM for different buffers (but obviously not at the same time).

Memory corruption in small uCs is often caused by beginner errors, such as constant text strings not being declared const (or, on Harvard-architecture parts such as AVR, not being explicitly kept in Flash), and then being copied from Flash to RAM during initialisation. Such text strings can fill up RAM quickly in a small uC.

If you have detected memory corruption in your ISR, you can't rely on the uC still working properly. So don't return from the ISR, and don't call any functions (which use the stack). Some static code (stored in Flash of course) that first re-initializes a UART by writing to its registers directly (its registers may be corrupted, so do not rely on it being initialized) and then pushes out some bytes or a text string is still **likely** to work.

« Last Edit: September 18, 2021, 10:27:28 pm by Doctorandus_P »
 

