The main thing is you cannot have "real" dynamic memory on MCU or any processor without MMU
Sooner or later it will ever end up with fragmentation with no way to solve it
Blocks must have fixed locations or be every time created and released in a fixed order
Absolutely incorrect. Even (older versions of) the Linux kernel can run MMU-less. A memory management unit can provide virtual memory and/or limit access to regions of memory, but is in no way necessary for dynamic memory management. In particular, see
uClibc-ng, a standard C library implementation for MMU-less processors.
Furthermore, indirect pointers can be used to avoid memory fragmentation. Here is an example implementation in C.
First, we have an indirect pointer structure:
struct iptr {
    uint32_t offset; /* Offset into the dynamic memory region */
    uint32_t length; /* Length of the allocation, in bytes */
    int16_t  used;   /* Number of ongoing accesses to the data */
    int16_t  refs;   /* Reference count; freed when this drops to zero */
};
We could use a direct pointer instead of offset, but an offset into the dynamic memory region lets the same code run both in fully featured C under an actual OS (where the region can shrink and grow, requesting more memory from the OS and returning unneeded memory back to it) and on a microcontroller.
The
length field is optional, and is useful if the indirect pointer is accessed via a helper function (say
get_u32(id, offset, errorval)) to eliminate buffer under- and overruns; the
errorval being returned if access is attempted outside the buffer (so that no "errors" can ever occur). The
id identifies which
struct iptr is being used, but more on that further below.
The
refs counter is a simple reference counter, and starts at 1 when the memory is allocated. If these objects are used in a list or a tree, or otherwise refer to each other, the
refs counter ensures that an allocation still referenced elsewhere is not freed before all references are gone. A typical interface is
get_iptr(id)/iptr_ref(id) to increment the counter, and
put_iptr(id)/iptr_unref(id) to decrement the counter. There is no "free()" per se; whenever the
refs counter drops to zero, the region will be freed.
The
used counter is a simple lock, indicating the data itself is being used. A typical interface is
iptr_use_begins(id) at the beginning of some operation that uses the dynamically allocated region, with a corresponding
iptr_use_ends(id) when that access or loop completes.
The 'trick' is that when the
used counter is zero, the data can be moved in memory, compacting the dynamically allocated memory area, so that the memory used by these allocations can be periodically defragmented. If the
used counters of all currently allocated
struct iptr entries are zero, then the compaction is a very efficient single copy loop over the data.
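To make the compaction step concrete, here is a minimal sketch (hypothetical names, a fixed-size heap, and the assumption that the iptr entries are ordered by increasing offset, so each move only slides data toward the start of the heap):

```c
#include <stdint.h>
#include <string.h>

struct iptr {
    uint32_t offset;
    uint32_t length;
    int16_t  used;
    int16_t  refs;
};

/* Hypothetical fixed-size setup standing in for the real allocator state. */
static unsigned char heap[1024];
static struct iptr indirect[8];
static int indirect_num = 8;

/* Slide all live allocations toward the start of the heap in one pass,
   updating each iptr's offset. Assumes entries are ordered by increasing
   offset and that no allocation is currently in use.
   Returns the size of the (single) free fragment left at the end. */
static size_t imem_compact_sketch(void)
{
    size_t target = 0;
    for (int id = 0; id < indirect_num; id++) {
        if (indirect[id].refs <= 0)
            continue;   /* Free slot: its memory is simply reclaimed. */
        if (indirect[id].used > 0)
            return 0;   /* In use: a real version would skip or defer. */
        if (indirect[id].offset != target)
            memmove(heap + target, heap + indirect[id].offset,
                    indirect[id].length);
        indirect[id].offset = (uint32_t)target;
        target += indirect[id].length;
    }
    return sizeof heap - target;
}
```

After this runs, all free space forms one contiguous fragment at the top of the heap, which is why keeping the "in use" windows short matters so much.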
A practical C implementation (that manages the heap size dynamically by obtaining memory from the OS) might have
volatile unsigned char *heap_ptr = NULL; /* Beginning of dynamic allocation area */
volatile size_t heap_len = 0; /* Size of dynamic allocation area */
int indirect_num = 0; /* Number of entries in the indirect pointer array */
struct iptr *indirect = NULL; /* Array of indirect pointers */
/* Compact dynamic allocations, defragmenting dynamic memory regions.
Returns the size of the largest free fragment. */
size_t imem_compact(void);
/* Allocate size * count bytes, returning the ID of the memory.
Returns negative if not enough memory available.
This function will call imem_compact() if necessary. */
int imem_alloc(size_t size, size_t count);
/* Mark the allocated memory as referenced by something else.
Before the memory is actually freed, one additional imem_discard()
call must be made for each call to this function. */
void imem_obtain(int id);
/* Discard previously allocated memory. */
void imem_discard(int id);
/* Begin an access to the allocated memory.
Returns a direct pointer to the data, or 'impossible' if invalid.
NOTE: Each begin must be paired with a corresponding end,
and the interval between the two should be minimized. */
static inline void *imem_use_begins(int id, void *impossible);
/* End of accesses via the pointer obtained via imem_use_begins(). */
static inline void imem_use_ends(int id);
where the last two are rather simple,
static inline void *imem_use_begins(int id, void *impossible)
{
/* Check that the ID number refers to a valid iptr. */
if (id < 0 || id >= indirect_num)
return impossible;
/* Check that the iptr can be referenced. */
if (indirect[id].refs <= 0)
return impossible;
indirect[id].used++;
return heap_ptr + indirect[id].offset;
}
static inline void imem_use_ends(int id)
{
if (id >= 0 && id < indirect_num && indirect[id].refs > 0 && indirect[id].used > 0)
indirect[id].used--;
}
Additionally, you would probably implement helper functions depending on the actual data being dynamically allocated, for example to copy and compare, or to get and set individual values:
static inline bool imem_copy_from_ptr(int dst_id, int dst_offset, int length, void *src);
static inline bool imem_copy_to_ptr(void *dst, int length, int src_id, int src_offset);
static inline bool imem_copy(int dst_id, int dst_offset, int length, int src_id, int src_offset);
static inline int imem_cmp_ptr(int id, int offset, int length, void *ptr, int invalid);
static inline int imem_cmp_imem(int id1, int offset1, int length, int id2, int offset2);
static inline uint32_t imem_get32(int id, int offset, uint32_t invalid);
static inline uint16_t imem_get16(int id, int offset, uint16_t invalid);
static inline float imem_getfloat(int id, int offset, float invalid);
static inline bool imem_set32(int id, int offset, uint32_t value);
static inline bool imem_set16(int id, int offset, uint16_t value);
static inline bool imem_setfloat(int id, int offset, float value);
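As an example of what one of these helpers might look like (a sketch assuming the globals above, with hypothetical names and a fixed-size stand-in heap), here is a bounds-checked imem_get32():

```c
#include <stdint.h>
#include <string.h>

struct iptr {
    uint32_t offset;
    uint32_t length;
    int16_t  used;
    int16_t  refs;
};

/* Stand-in allocator state for this self-contained sketch. */
static unsigned char heap_storage[64];
static unsigned char *heap_ptr = heap_storage;
static struct iptr indirect[4];
static int indirect_num = 4;

/* Bounds-checked 32-bit read: returns 'invalid' instead of faulting when
   the id or byte offset is out of range. memcpy() is used so that
   unaligned offsets are safe even on strict-alignment targets. */
static inline uint32_t imem_get32(int id, int offset, uint32_t invalid)
{
    uint32_t value;
    if (id < 0 || id >= indirect_num || indirect[id].refs <= 0)
        return invalid;
    if (offset < 0 || (uint32_t)offset + sizeof value > indirect[id].length)
        return invalid;
    memcpy(&value, heap_ptr + indirect[id].offset + (uint32_t)offset,
           sizeof value);
    return value;
}
```

This is the sense in which no "errors" can ever occur: any out-of-range access simply yields the caller-chosen invalid value.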
On microcontrollers with external RAM that is not fully directly accessible to the processor (very much like what DiTBho described, with on-disk structures!), a similar interface can be used to access that external RAM via memory apertures or caches of some sort.
These are certainly more complex, and incur run-time access cost (because we cannot just
use the allocated memory, but have to do bookkeeping around each access or set of accesses) compared to using statically allocated memory. However, this is definitely doable, and assuming the "user code" does not make any silly errors (like keeping an area "in use" just because the programmer thinks they can shave off a few cycles of run time that way, or forgetting to end a use somewhere), it is quite robust, and does not easily suffer any issues related to memory fragmentation.
Methods like this were heavily used on 8-bit and even 16-bit processors. At the hardware level,
memory bank switching was often used. Even today, there are microcontrollers using bank switching; there, a unique pointer is actually a tuple, (
bank,
address), because the contents of the banked address range depends on the active bank. Note that this is not really "virtual memory", because we cannot control the addresses; only the data visible in the banked address range can be
selected (from a small number of options, depending how many banks are mapped to the same region by the hardware). In many cases, the bank selection was actually just a standard serial-to-parallel latch, that controlled the highest address lines to the memory chips, as the processor or microcontroller itself did not have sufficient address lines to access all available memory. Sometimes, the bank switch only switched between ROM and RAM, by manipulating the chip enable pins on the ROM and RAM chips connected to the same address and data buses.
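To illustrate the tuple idea, a hypothetical far-pointer type for a bank-switched part might look like this (the 16 KiB window size and all names here are assumptions for the example; the actual bank-select register write is hardware-specific and omitted):

```c
#include <stdint.h>

/* A 'far' pointer on a bank-switched system is a (bank, address) tuple:
   the same address refers to different data depending on the active bank. */
struct far_ptr {
    uint8_t  bank;    /* Which bank to map into the window. */
    uint16_t address; /* Offset within the banked window. */
};

#define BANK_WINDOW_SIZE 0x4000u  /* Assumed 16 KiB banked window. */

/* Convert a linear address in external memory to a (bank, address) tuple. */
static inline struct far_ptr far_from_linear(uint32_t linear)
{
    struct far_ptr p;
    p.bank    = (uint8_t)(linear / BANK_WINDOW_SIZE);
    p.address = (uint16_t)(linear % BANK_WINDOW_SIZE);
    return p;
}
```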
Personally, I have no problems in using dynamic memory management even on small microcontrollers like AVRs. (The Atmel ATmega32u4 is my favourite 8-bit microcontroller.) In my use case (using
avr-gcc compiler and
avr-libc C library), the heap (the region of memory dynamic allocations are made from) and the stack share a single region, with allocations preferring the low end, and the stack growing downwards from the high address end. I have no problems with memory leaks; and for the kinds of things I use memory allocation for, memory fragmentation is not an issue either.
What I do have trouble with is determining at runtime whether the stack and heap collide/overlap. There are many methods to detect if such a collision has occurred (heuristically; no false positives, but with a small chance that a collision might be missed), for example via
canary values, but at that point the operation is already unsafe and the device should be restarted. Testing basically consists of using canary values or similar techniques, perhaps periodically measuring the difference between the top of the heap and the bottom of the stack (to determine the amount of leeway between the two),
and then trying to run the device through all possible scenarios. There are free and commercial tools that can do static analysis (just analyzing the code) and dynamic analysis (running or simulating the code) to stress test and find these things out, but they are architecture-specific, and the best ones mostly cost a lot of money.
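A minimal sketch of the canary technique (hypothetical names; on a real MCU the region would sit in the gap between the top of the heap and the bottom of the stack):

```c
#include <stdint.h>
#include <stddef.h>

#define CANARY_PATTERN 0xA5u

/* Stand-in for the no-man's-land between heap top and stack bottom. */
static uint8_t canary_region[32];

/* Fill the gap with a known pattern at boot. */
static void canary_init(void)
{
    for (size_t i = 0; i < sizeof canary_region; i++)
        canary_region[i] = CANARY_PATTERN;
}

/* Count intact canary bytes. If fewer than the full region size remain,
   the stack (or heap) has grown into the gap at some point. Note the
   heuristic's small false-negative chance: an overwrite that happens to
   store the pattern value itself goes undetected; false positives,
   however, cannot occur. */
static size_t canary_intact(void)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof canary_region; i++)
        if (canary_region[i] == CANARY_PATTERN)
            n++;
    return n;
}
```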
Only in the simplest cases can I prove how much heap will be used, how fragmented it can get, and how much stack will be needed in the worst case; and only if the available RAM is larger than that can I prove that the two will not collide.
In other words, this stuff is quite complicated and
sensitive, the more so the less RAM you have available, but not technically impossible.
When others (and I guess myself too!) say that dynamic memory management is not often used with microcontrollers, they do not really mean it is technically impossible; they mean
it is not worth the effort. That if you go down that road without enough experience and skill, you will find very little success and a lot of frustration.
Yet, it is a skill that can be taught, much more so than, say, playing go or chess. I find it frustrating that so few C programmers are interested in such skills, and instead concentrate on
being fast at writing code, because that's what employers pay for. (The frustration stems from the fact that the results are becoming less and less reliable, with things like
CVE-2021-4034 being commonplace. It's a severe local privilege escalation (letting any local user on the machine gain full superuser privileges), due to a very silly programming bug that nobody caught in over a decade.)
Thus, advice saying it is not a good idea to use dynamic memory management on a microcontroller with a very limited amount of RAM stems from practical reasons and human factors, not technical limitations. You could say very concisely that
"most often, it is not cost-effective to use dynamic memory management on a microcontroller in C". It does not mean it cannot be done, just that the effort needed to make it work is usually too much, compared to the rewards.
I have a bit of a dichotomy here.
On one hand, I want to encourage programmers to use dynamic memory management, and to get good at it, because once you do, you can pull off complex things even in constrained environments and achieve Very Nice Results. On the other hand, if you want programming to be your career, being
reasonable is much more valuable. Things like continued maintenance, documentation, writing useful comments (describing developer intent and reasoning, not what the code does), algorithm analysis, algorithm implementation, general problem solving skills, and software design (modularity, software engineering, and so on) are much more valuable (in terms of employment and wages) than skill in a detail like dynamic memory management.
So, when someone like Dadu@ asks about this, I want to both show
how and
why, but also show why you might NOT want to do it.
The end result? Walls of text with interspersed code snippets like this post, that nobody has the time or will to read. Apologies.