Despite some of the negativity here, I'd go for it; if you need dynamic memory allocation then you need it - try implementing an OSI or TCP/IP comms stack without it. Of course you might implement a buffer management scheme and call it something different, but if it looks and walks like a duck etc. If you don't have enough memory to preallocate it to all users, then you have to provide a mechanism to share it, with all the attendant risks, such as it being used by two or more users simultaneously, memory leaks etc. That might be something as simple as a flag to indicate which of two users currently has ownership of a buffer, but I'd argue it's still classified as dynamic memory allocation.
For a quick introduction take a look at:
http://www.cs.cmu.edu/afs/cs/academic/class/15213-f10/www/lectures/17-allocation-basic.pdf
If you do implement dynamic memory there's lots of example code out there, although it's quite easy to write a simple implementation yourself, which means you will be very familiar with its characteristics, strengths and limitations. You can find code in the free real-time operating systems such as ChibiOS etc., but some of those may be difficult to understand due to optimizations etc.
As someone else said, there are a million and one ways of implementing allocators, but a good place to start is with the simplest. If it's appropriate for your application, I'd use an allocation scheme whereby the memory pool is split into an array of fixed-size blocks, where the block size is determined by the largest block required by the application. Keeping the free blocks in a list avoids having to search for a free block - allocation is simply a case of returning the first entry in the free block list and removing that entry from the list. Freeing a block adds it to the start of the list - i.e. the list is a last-in-first-out (LIFO) stack. The free block list can be a linked list (along with a pointer to the start of the list), but it's simpler to use an array of pointers to the free blocks, or an array containing the indexes of the free blocks (with a 'start of the list' variable containing the index of the first entry in the free list).
This scheme has the advantage that it is very simple, very fast and deterministic in time - it is quick enough to allow blocks to be allocated or freed within an interrupt routine, provided interrupts are disabled whilst the free block list is being manipulated (but see the caveats below). It's fast enough not to bother with separate pools for different threads etc. For allocation, interrupts only need to be disabled long enough to copy and then increment the free list index (typically 2 instructions); when freeing a block, interrupts are disabled whilst the block's entry is written into the free block list addressed by the free list index and the free list index is decremented (again, typically 2 instructions).
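To make the idea concrete, here's a minimal sketch of such a fixed-block pool in C. The names, sizes and the critical-section macros are all illustrative (on a Cortex-M you'd implement the macros with interrupt disable/restore); it uses an array of free-block indexes managed as a LIFO stack, as described above.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

#define BLOCK_SIZE 32u  /* largest block the application needs (illustrative) */
#define NUM_BLOCKS 16u

static uint8_t pool[NUM_BLOCKS][BLOCK_SIZE];
static uint8_t free_stack[NUM_BLOCKS]; /* indexes of free blocks (LIFO) */
static uint8_t free_top;               /* number of entries on the stack */

/* Placeholders - on a real target these would disable/restore interrupts. */
#define ENTER_CRITICAL() do {} while (0)
#define EXIT_CRITICAL()  do {} while (0)

void pool_init(void)
{
    for (uint8_t i = 0; i < NUM_BLOCKS; i++)
        free_stack[i] = i;
    free_top = NUM_BLOCKS;
}

void *pool_alloc(void)
{
    void *p = NULL;
    ENTER_CRITICAL();
    if (free_top > 0)                      /* pop the first free index */
        p = pool[free_stack[--free_top]];
    EXIT_CRITICAL();
    return p;
}

void pool_free(void *p)
{
    /* Recover the block index from the pointer. */
    uint8_t idx = (uint8_t)(((uint8_t *)p - &pool[0][0]) / BLOCK_SIZE);
    ENTER_CRITICAL();
    free_stack[free_top++] = idx;          /* push the index back */
    EXIT_CRITICAL();
}
```

Note how small the critical sections are: a pop (copy plus decrement) on allocation, a push (store plus increment) on free, which is what makes the scheme usable from interrupt routines.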
Note that the above only applies to single-core processors (or where the allocator is only used by code running on one core) - things are more complex if you have multiple threads and multiple cores. Processors that implement out-of-order instruction execution, data caching or buffered writes also need more care - e.g. ARM Cortex-M processors provide memory barrier instructions to ensure memory access sequencing. Compiler optimizations also need to be considered.
This scheme also keeps the heap metadata out of the pool so that buffer over and under-runs are less likely to corrupt the heap structure.
If the sizes of the memory blocks requested vary significantly, wastage can be reduced by having multiple memory pools with different block sizes, but determining the size of each pool in advance may be difficult or impossible. One way around this is to start with one pool of large blocks, with the allocator splitting a large block into smaller blocks as required. Both schemes need separate free block lists for each block size. Freeing memory is more complicated if blocks must be returned to the large block pool when the last used sub-block of a large block is freed. (A reference counter can be associated with each large block to keep count of the number of its sub-blocks which are currently allocated, allowing the large block to be released when the count reaches 0.) This probably won't be necessary in many embedded systems though, as each smaller pool will grow as needed to match the worst-case allocation requirement and then stop growing.
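A very stripped-down sketch of the reference-counting idea might look like the following. It is deliberately simplified (one sub-block size, no free list for sub-blocks within a large block, linear searches) - all names and sizes are invented for illustration; the point is just the per-large-block counter that lets the block return to its pool when the last sub-block is freed.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

#define LARGE_SIZE  256u                  /* size of each large block */
#define SUB_SIZE    32u                   /* fixed sub-block size */
#define SUBS_PER_LB (LARGE_SIZE / SUB_SIZE)
#define NUM_LARGE   4u

typedef struct {
    uint8_t data[LARGE_SIZE];
    uint8_t refcount;   /* number of sub-blocks currently allocated */
    uint8_t next_sub;   /* next unused sub-block (no per-sub reuse here) */
    uint8_t in_use;     /* large block claimed from the large pool */
} large_block_t;

static large_block_t large_pool[NUM_LARGE];

void *sub_alloc(void)
{
    /* Prefer a large block that is already partly carved up... */
    for (unsigned i = 0; i < NUM_LARGE; i++) {
        large_block_t *lb = &large_pool[i];
        if (lb->in_use && lb->next_sub < SUBS_PER_LB) {
            lb->refcount++;
            return &lb->data[SUB_SIZE * lb->next_sub++];
        }
    }
    /* ...otherwise claim a fresh large block from the pool. */
    for (unsigned i = 0; i < NUM_LARGE; i++) {
        large_block_t *lb = &large_pool[i];
        if (!lb->in_use) {
            lb->in_use = 1;
            lb->refcount = 1;
            lb->next_sub = 1;
            return &lb->data[0];
        }
    }
    return NULL;
}

void sub_free(void *p)
{
    for (unsigned i = 0; i < NUM_LARGE; i++) {
        large_block_t *lb = &large_pool[i];
        if ((uint8_t *)p >= lb->data && (uint8_t *)p < lb->data + LARGE_SIZE) {
            if (--lb->refcount == 0) {  /* last sub-block freed: */
                lb->in_use = 0;         /* return block to the large pool */
                lb->next_sub = 0;
            }
            return;
        }
    }
}
```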
If you want to use a more traditional allocator then BGet looks OK. It's simple, but it suffers from the problems discussed by other posters - namely fragmentation, indeterminate execution time, and heap metadata stored in the heap where it is vulnerable to buffer over- and under-runs. Allocation requires the list of free blocks to be searched, so with lots of fragmentation and large memory pools, a request for a large block can take a relatively long time (which would probably make it unusable from an interrupt routine).
You could rewrite BGet to separate the metadata from the pool. You could implement 'improvements' as required to suit your application. For example, memory is often in short supply in embedded systems; BGet adds headers to every block in the pool, free or allocated, containing its size and a link to the previous block. If you have lots of small allocations, the overhead may be reduced by using shorter pointers and size variables appropriate to the size of the pool and the alignment of each block. E.g. a 2K pool with 2-byte alignment of blocks would need 10-bit pointers. Similarly, the length of the size variable can be reduced further if the maximum size that can be allocated is restricted. If it were 128 bytes say, and blocks are allocated in multiples of 2 bytes, then only 6 bits would be required. Both together would allow the header to be reduced to 16 bits or 2 bytes.
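In C the packed header can be expressed with bit-fields. This is just a sketch of the arithmetic above, not BGet's actual layout: the 10-bit field holds the previous-block offset in 2-byte units, and the 6-bit field holds the size encoded as (bytes/2 - 1) so that its 64 values cover 2 to 128 bytes.

```c
#include <stdint.h>
#include <assert.h>

/* Illustrative packed header: 10-bit prev link + 6-bit size = 16 bits.
   Offsets and sizes are stored in 2-byte units to exploit alignment. */
typedef struct {
    uint16_t prev : 10;  /* offset of previous block, in 2-byte units */
    uint16_t size : 6;   /* (size_in_bytes / 2) - 1, i.e. 2..128 bytes */
} block_hdr_t;

_Static_assert(sizeof(block_hdr_t) == 2, "header should pack into 2 bytes");

static inline uint16_t hdr_prev_offset(block_hdr_t h) { return h.prev * 2u; }
static inline uint16_t hdr_size_bytes(block_hdr_t h)  { return (h.size + 1u) * 2u; }
```

(The size-minus-one encoding matters: 2..128 bytes in 2-byte steps is 64 distinct values, which only fits in 6 bits if the all-zeros code means 2 bytes rather than 0.)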
Alternatively you could avoid the link to the previous block altogether, at the expense of having to search the free block list when freeing a block, in order to find its neighbours (if any) so that adjacent free blocks can be coalesced. I don't think the size element can be avoided though.
Mutual exclusion will be required for multi-threaded code, e.g. by using monitors or semaphores. Disabling interrupts probably would not be practical, as they could be disabled for too long - unless you have complicated code to restart the free list search in the event that another thread allocates or frees a block whilst the first thread is in the middle of a search.
There's nothing to stop you including more than one type of memory allocator, using the most suitable for each user's needs.
To aid in debugging, additional data structures may be used to record the ID of the code that allocated the memory and/or the file name and line number of the allocating code, and even a string supplied by the caller to help understand where, why and when the buffer was allocated. By allocating a larger buffer than the caller requested, magic numbers can be put just before and after the buffers returned so that the free routine can check for buffer over and under-runs. See
https://www.fourmilab.ch/smartall/ for an example.
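A guard-band scheme along those lines might look like this in C. Everything here is invented for the sketch (magic value, layout, function names) - real implementations such as SMARTALL differ in the details - but it shows the over-allocate-and-bracket idea: the allocator adds a magic number before and after the caller's buffer, and the free routine checks both.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#define GUARD_MAGIC 0xDEADBEEFu  /* illustrative guard value */

/* Layout of the raw allocation: [guard][size][ user data ][guard] */
void *dbg_alloc(size_t n)
{
    uint8_t *raw = malloc(n + 2 * sizeof(uint32_t) + sizeof(size_t));
    if (!raw)
        return NULL;
    uint32_t magic = GUARD_MAGIC;
    memcpy(raw, &magic, sizeof magic);                     /* front guard */
    memcpy(raw + sizeof magic, &n, sizeof n);              /* stashed size */
    memcpy(raw + sizeof magic + sizeof n + n,              /* rear guard  */
           &magic, sizeof magic);
    return raw + sizeof magic + sizeof n;
}

/* Returns 0 if both guards are intact, nonzero on over- or under-run. */
int dbg_free(void *p)
{
    uint8_t *raw = (uint8_t *)p - sizeof(uint32_t) - sizeof(size_t);
    uint32_t before, after;
    size_t n;
    memcpy(&before, raw, sizeof before);
    memcpy(&n, raw + sizeof before, sizeof n);
    memcpy(&after, (uint8_t *)p + n, sizeof after);
    int bad = (before != GUARD_MAGIC) || (after != GUARD_MAGIC);
    free(raw);
    return bad;
}
```

In a real system the check failure would typically log the recorded caller ID / file / line rather than just return a flag.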
By keeping stats about the memory pool(s), you can examine them periodically, or at the end of a test run, to get some idea of the behavior of the system, including the amount of fragmentation, the maximum number of blocks allocated at any one time, the size profile of the blocks allocated at any one time, and blocks which get allocated but never freed, indicating a leak or a buffer which should probably be statically allocated.
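The stats themselves can be as simple as a handful of counters updated on every alloc and free; the field names below are invented, but the high-water mark and the allocs-minus-frees delta are the figures that tend to matter when sizing pools and hunting leaks.

```c
#include <assert.h>

/* Minimal per-pool statistics, updated by the allocator on every call. */
typedef struct {
    unsigned allocs;       /* total successful allocations */
    unsigned frees;        /* total frees */
    unsigned failed;       /* allocation requests that returned NULL */
    unsigned in_use;       /* blocks currently allocated */
    unsigned peak_in_use;  /* high-water mark: worst-case demand */
} pool_stats_t;

static pool_stats_t stats;

void stats_on_alloc(int ok)
{
    if (!ok) { stats.failed++; return; }
    stats.allocs++;
    if (++stats.in_use > stats.peak_in_use)
        stats.peak_in_use = stats.in_use;  /* record worst case */
}

void stats_on_free(void)
{
    stats.frees++;
    stats.in_use--;
}
```

At the end of a run, `peak_in_use` tells you how big the pool actually needs to be, a nonzero `failed` count means it was too small, and an `in_use` figure that never returns to its baseline points at a leak.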