Any valid data below SP will be corrupted the instant an IRQ arrives and triggers the exception stacking operation.
The stack will be at the bottom of memory, with a fixed maximum size and growing downwards. The heap is above it, also with a fixed maximum size and growing upwards. When the stack exceeds its maximum size (SP goes below the first address of available RAM), an error fires.
#include <errno.h>
#include <sys/types.h>

caddr_t _sbrk(int incr)
{
    extern char end asm("end");
    extern char top asm("_top");
    static char *heap_end;
    char *prev_heap_end;

    // heap_end is a static, so (assuming the startup code zeroes .bss)
    // this test is true only on the first call
    // "end" = 0x2000edf8 (see linkfile)
    if (heap_end == 0)
        heap_end = &end;
    prev_heap_end = heap_end;

    // top = 2001e000 (top of 128k RAM minus 8k stack)
    if (heap_end + incr > &top)
    {
        // write(1, "Heap and stack collision\n", 25);
        // abort();
        errno = ENOMEM;
        return (caddr_t) -1;
    }
    heap_end += incr;
    return (caddr_t) prev_heap_end;
}
Thus, the advice that it is not a good idea to use dynamic memory management on a microcontroller with a very limited amount of RAM stems from practical and human reasons, not from technical details. You could say very concisely that "most often, it is not cost-effective to use dynamic memory management on a microcontroller in C". It does not mean it cannot be done, just that the effort needed to make it work is usually too much compared to the rewards.
I look at things another way.
There are tasks which inherently require dynamic memory management, based purely on the nature of their interface with the outside world.
For example, suppose we have a specification for a device that has a communication channel -- UART, WIFI, I2C ... it doesn't matter -- which sits in an event loop accepting the following messages (I will give it in text, but it could just as well be a binary format):
PUT key "value"
GET key
DEL key
"value" can have any length, at least up to some fairly high maximum .. it doesn't matter in principle whether this is a few hundred bytes or a few GB.
key might be restricted to something fairly short. Let's say 32 characters.
PUT associates the value with the key for later retrieval by GET. DEL forgets the value associated with key.
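As a rough illustration of why this interface pushes towards dynamic allocation, here is a minimal sketch of the PUT/GET/DEL store. The names `kv_put`, `kv_get`, `kv_del` and the `MAX_KEYS` table size are hypothetical, chosen only for this example; the 32-character key limit comes from the description above. Each value gets a heap copy sized to its actual length, so short values do not pay for the maximum.

```c
#include <stdlib.h>
#include <string.h>

#define KEY_MAX  32   /* key length limit from the description above */
#define MAX_KEYS 16   /* hypothetical table size for this sketch */

struct entry {
    char key[KEY_MAX + 1];
    char *value;      /* heap copy sized to the value; NULL marks a free slot */
};

static struct entry table[MAX_KEYS];

static struct entry *find(const char *key)
{
    for (int i = 0; i < MAX_KEYS; i++)
        if (table[i].value && strcmp(table[i].key, key) == 0)
            return &table[i];
    return NULL;
}

/* PUT: 0 on success, -1 if the key is too long, the table is full,
   or malloc() fails. An existing value for the key is replaced. */
int kv_put(const char *key, const char *value)
{
    if (strlen(key) > KEY_MAX)
        return -1;

    size_t n = strlen(value) + 1;
    char *copy = malloc(n);
    if (!copy)
        return -1;
    memcpy(copy, value, n);

    struct entry *e = find(key);
    if (e) {
        free(e->value);
        e->value = copy;
        return 0;
    }
    for (int i = 0; i < MAX_KEYS; i++) {
        if (!table[i].value) {
            strcpy(table[i].key, key);
            table[i].value = copy;
            return 0;
        }
    }
    free(copy);       /* no free slot */
    return -1;
}

/* GET: pointer to the stored value, or NULL if the key is unknown. */
const char *kv_get(const char *key)
{
    struct entry *e = find(key);
    return e ? e->value : NULL;
}

/* DEL: 0 on success, -1 if the key is unknown. */
int kv_del(const char *key)
{
    struct entry *e = find(key);
    if (!e)
        return -1;
    free(e->value);
    e->value = NULL;
    return 0;
}
```

The key table itself is still static here; only the values, whose size is unknown in advance, come from the heap.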
You simply *can't* use static memory allocation for this unless you are prepared to limit the values to relatively small ones AND limit the total number of keys to your storage size divided by the maximum size of a value. Which, assuming that the average value is much smaller than the maximum possible one, is very inefficient.
That leads me to these questions:
1) What else are you going to use the memory for?
2) How are you going to guarantee that there is always enough memory to handle the longest value?
3) Do you allow the system to offer limited functionality at times where memory is low?
The ARM uses the 2nd stack for ISR if allowed to do so
If in mbed (for some reason I think you are working with mbed now, maybe I'm confused) you can switch to the minimal-printf variant, not using malloc.
point 5.2.1.1 of the ARM EABI clearly states that a process may only access data placed at SP or above, not below SP:
I do not know of any bullet-proof method to check for heap-stack collisions
Quote: point 5.2.1.1 of the ARM EABI clearly states that a process may only access data placed at SP or above, not below SP:
Yes; you should never store data below the stack, because (in any "normal" system) interrupts will corrupt it. So local variables in a function are stored on a stack frame which sits above the SP.
int foo(){
    int a[10];
    return a[3];
}
foo():
mov eax, DWORD PTR [rsp-44]
ret
.L.foo():
lwa 3,-36(1)
blr
int foo(){
    int a[10];
    a[0] = 0; a[1] = 1;
    for (int i=2; i<10; ++i)
        a[i] = a[i-1] + a[i-2];
    return a[7];
}
SP is not adjusted and local variable a[] is stored below SP.
Cortex-M has 2 stack pointers. One (the Main Stack Pointer, MSP) is always used in handler mode. The other (Process Stack Pointer, PSP) may be used in thread mode. Whether thread mode uses MSP or PSP is controlled by the SPSEL bit in the CONTROL register; setting the bit enables use of PSP in thread mode. After a reset this bit is cleared, so the default behavior is to have a single stack.
/**
  \brief  Union type to access the Control Registers (CONTROL).
 */
typedef union
{
    struct
    {
        uint32_t nPRIV:1;        /*!< bit: 0 Execution privilege in Thread mode */
        uint32_t SPSEL:1;        /*!< bit: 1 Stack to be used */
        uint32_t FPCA:1;         /*!< bit: 2 FP extension active flag */
        uint32_t _reserved0:29;  /*!< bit: 3..31 Reserved */
    } b;                         /*!< Structure used for bit access */
    uint32_t w;                  /*!< Type used for word access */
} CONTROL_Type;
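To make the bit layout concrete, here is a small sketch that decodes a raw CONTROL value using plain masks. The macro names `CTRL_NPRIV_BIT`, `CTRL_SPSEL_BIT`, `CTRL_FPCA_BIT` and the helper `thread_uses_psp` are made up for this example (CMSIS has its own definitions); on a real target the value would typically come from the CMSIS intrinsic `__get_CONTROL()`, whereas here it is just a number so the code can be tried on a PC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as listed in the CONTROL_Type union above. */
#define CTRL_NPRIV_BIT (1u << 0)  /* 1 = thread mode is unprivileged */
#define CTRL_SPSEL_BIT (1u << 1)  /* 1 = thread mode uses PSP, 0 = MSP */
#define CTRL_FPCA_BIT  (1u << 2)  /* 1 = floating-point context active */

/* True if thread mode would run on the Process Stack Pointer. */
static bool thread_uses_psp(uint32_t control)
{
    return (control & CTRL_SPSEL_BIT) != 0;
}
```

With the reset value of 0, `thread_uses_psp(0)` is false, which matches the single-stack default described above.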
Quote: SP is not adjusted and local variable a[] is stored below SP.
I am not as clever as you but I can't see this. The whole array a[10] is stored on the stack of that function,
and SP points at the return address - otherwise the function could not return to the caller.
On x86 yes. On PowerPC the return address is not in RAM but in a register.
static void prvPortStartFirstTask( void )
{
    /* Start the first task. This also clears the bit that indicates the FPU is
    in use in case the FPU was used before the scheduler was started - which
    would otherwise result in the unnecessary leaving of space in the SVC stack
    for lazy saving of FPU registers. */
    __asm volatile(
        " ldr r0, =0xE000ED08 \n" /* Use the NVIC offset register to locate the stack. */
        " ldr r0, [r0]        \n"
        " ldr r0, [r0]        \n"
        " msr msp, r0         \n" /* Set the msp back to the start of the stack. */
        " mov r0, #0          \n" /* Clear the bit that indicates the FPU is in use, see comment above. */
        " msr control, r0     \n"
        " cpsie i             \n" /* Globally enable interrupts. */
        " cpsie f             \n"
        " dsb                 \n"
        " isb                 \n"
        " svc 0               \n" /* System call to start first task. */
        " nop                 \n"
    );
}
void vPortSVCHandler( void )
{
    __asm volatile (
        " ldr r3, pxCurrentTCBConst2        \n" /* Restore the context. */
        " ldr r1, [r3]                      \n" /* Use pxCurrentTCBConst to get the pxCurrentTCB address. */
        " ldr r0, [r1]                      \n" /* The first item in pxCurrentTCB is the task top of stack. */
        " ldmia r0!, {r4-r11, r14}          \n" /* Pop the registers that are not automatically saved on exception entry and the critical nesting count. */
        " msr psp, r0                       \n" /* Restore the task stack pointer. */
        " isb                               \n"
        " mov r0, #0                        \n"
        " msr basepri, r0                   \n"
        " bx r14                            \n"
        "                                   \n"
        " .align 4                          \n"
        "pxCurrentTCBConst2: .word pxCurrentTCB \n"
    );
}
Quote: On x86 yes. On PowerPC the return address is not in RAM but in a register.
Learn something every day
That cannot possibly work with interrupts unless the ISR auto-switches to a different stack.
What puzzles me, if anything, is why the RISC-V ABI didn't also adopt this common pattern of a "Red Zone" on the stack.
It is pretty obvious that the heap is pretty useless in an RTOS environment. Even if you mutex malloc() and free(), your system will eventually blow up due to fragmentation.
So the heap can be used from only one RTOS task, or perhaps from others if you can make sure there is no time overlap and the allocated memory is never freed.
For instance a network stack might have a fixed memory allocated for the whole stack and dynamically allocate packet buffers from it.
You can use power of two allocators with pools for each size to limit fragmentation.
Apart from the remaining 32F417 + FreeRTOS mystery of where SPSEL=1 gets set
void vPortSVCHandler( void )
{
    __asm volatile (
        " ldr r3, pxCurrentTCBConst2        \n" /* Restore the context. */
        " ldr r1, [r3]                      \n" /* Use pxCurrentTCBConst to get the pxCurrentTCB address. */
        " ldr r0, [r1]                      \n" /* The first item in pxCurrentTCB is the task top of stack. */
        " ldmia r0!, {r4-r11, r14}          \n" /* Pop the registers that are not automatically saved on exception entry and the critical nesting count. */
        " msr psp, r0                       \n" /* Restore the task stack pointer. */
        " isb                               \n"
        " mov r0, #0                        \n"
        " msr basepri, r0                   \n"
        " bx r14                            \n"
        "                                   \n"
        " .align 4                          \n"
        "pxCurrentTCBConst2: .word pxCurrentTCB \n"
    );
}
Quote: What puzzles me, if anything, is why the RISC-V ABI didn't also adopt this common pattern of a "Red Zone" on the stack.
Maybe to save a precious 128 bytes of memory, keeping in mind embedded use cases?
Quote: it is pretty obvious that the heap is pretty useless in an RTOS environment. Even if you mutex malloc() and free() your system will eventually blow up due to fragmentation.
It's neither obvious nor true, and there are several ways to deal with fragmentation. The most generic is to use memory handles, which were very common in systems without virtual memory. Pool allocators are useful for transaction processing, where you might need to allocate several small pieces of memory for a task but then free them all at once when the task is complete. You can use power of two allocators with pools for each size to limit fragmentation.
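As a rough sketch of the last idea, the following keeps one fixed pool per power-of-two block size and serves each request from the smallest class that fits. All names (`pool_init`, `pool_alloc`, `pool_free`) and the sizes (16 to 128 bytes, 8 blocks per class) are assumptions made up for this example, not taken from any particular library. Because every block in a class has the same size, a freed block can always be reused by the next request of that class, so the pool cannot fragment in the way a general-purpose heap can.

```c
#include <stddef.h>

#define NUM_CLASSES      4   /* block sizes 16, 32, 64, 128 (hypothetical) */
#define MIN_SHIFT        4   /* smallest block is 1 << 4 = 16 bytes */
#define BLOCKS_PER_CLASS 8
#define ARENA_BYTES (BLOCKS_PER_CLASS * (16 + 32 + 64 + 128))

/* Free blocks are chained through their own first bytes. */
struct free_node { struct free_node *next; };

static _Alignas(max_align_t) unsigned char arena[ARENA_BYTES];
static unsigned char *class_base[NUM_CLASSES + 1]; /* start address of each class */
static struct free_node *free_list[NUM_CLASSES];

/* Carve the arena into one region per size class and build the free lists. */
void pool_init(void)
{
    unsigned char *p = arena;
    for (int c = 0; c < NUM_CLASSES; c++) {
        size_t size = (size_t)1 << (MIN_SHIFT + c);
        class_base[c] = p;
        free_list[c] = NULL;
        for (int i = 0; i < BLOCKS_PER_CLASS; i++) {
            struct free_node *n = (struct free_node *)p;
            n->next = free_list[c];
            free_list[c] = n;
            p += size;
        }
    }
    class_base[NUM_CLASSES] = p;
}

/* Returns a block from the smallest class that fits, or NULL if that
   class is exhausted or the request exceeds the largest block size. */
void *pool_alloc(size_t size)
{
    for (int c = 0; c < NUM_CLASSES; c++) {
        if (size <= ((size_t)1 << (MIN_SHIFT + c))) {
            struct free_node *n = free_list[c];
            if (n)
                free_list[c] = n->next;
            return n;
        }
    }
    return NULL;
}

/* The size class is recovered from the address, so blocks carry no header. */
void pool_free(void *ptr)
{
    unsigned char *p = ptr;
    if (!p)
        return;
    for (int c = 0; c < NUM_CLASSES; c++) {
        if (p >= class_base[c] && p < class_base[c + 1]) {
            struct free_node *n = ptr;
            n->next = free_list[c];
            free_list[c] = n;
            return;
        }
    }
}
```

The price is internal waste (a 17-byte request occupies a 32-byte block) and a fixed split of RAM between classes, which has to be sized for the application.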
Quote: You can use power of two allocators with pools for each size to limit fragmentation.
If you do a malloc() of say 2048 bytes, doesn't it grab slightly more than 2048 (due to it being a linked list)? What units does one have to run to avoid fragmentation?
Quote: On x86 yes. On PowerPC the return address is not in RAM but in a register.
Quote: Learn something every day
Quote: That cannot possibly work with interrupts unless the ISR auto-switches to a different stack.
Sure it can, and does.
I don't know what they do with x86_64, but with PowerPC, Itanium and some others the interrupt routine simply decrements the stack pointer by 128 bytes before starting to push the return address and processor status and other registers.
Competent people writing things such as power of two allocators store metadata for the blocks somewhere else.
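One possible reading of "somewhere else", sketched under assumptions of my own: a pool of 2048-byte blocks whose in-use flags live in a separate bitmap, so each block costs exactly 2048 bytes of pool memory with no per-block header. The names `block_alloc` and `block_free` and the count of 8 blocks are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 2048
#define NUM_BLOCKS 8     /* hypothetical; must fit in the 8-bit map below */

static unsigned char blocks[NUM_BLOCKS][BLOCK_SIZE]; /* payload only, no headers */
static uint8_t in_use;                               /* one bit per block, outside the pool */

/* Returns the lowest-numbered free block, or NULL if all are taken. */
void *block_alloc(void)
{
    for (int i = 0; i < NUM_BLOCKS; i++) {
        if (!(in_use & (1u << i))) {
            in_use |= (uint8_t)(1u << i);
            return blocks[i];
        }
    }
    return NULL;
}

/* The block index is derived from the address; nothing is read from the block. */
void block_free(void *ptr)
{
    if (!ptr)
        return;
    size_t i = (size_t)((unsigned char *)ptr - &blocks[0][0]) / BLOCK_SIZE;
    if (i < NUM_BLOCKS)
        in_use &= (uint8_t)~(1u << i);
}
```

In this form the bitmap is fixed at build time, so it does cap the number of blocks; whether that is acceptable depends on the application.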
Quote: Competent people writing things such as power of two allocators store metadata for the blocks somewhere else.
That means the malloc() function needs an array allocated and that sets a limit on how many blocks can be in use. Or does it use its own allocation mechanism for the array, growing it as necessary?