Despite some of the negativity here, I'd go for it; if you need dynamic memory allocation then you need it - try implementing an OSI or TCP/IP comms stack without it. Of course you might implement a buffer management scheme and call it something different, but if it looks and walks like a duck etc. If you don't have enough memory to preallocate it to all users, then you have to provide a mechanism to share it, with all the attendant risks, such as it being used by two or more users simultaneously, memory leaks etc. That might be something as simple as a flag to indicate which of two users currently has ownership of a buffer, but I'd argue it's still classified as dynamic memory allocation.
For a quick introduction take a look at:
http://www.cs.cmu.edu/afs/cs/academic/class/15213-f10/www/lectures/17-allocation-basic.pdf
If you do implement dynamic memory there's lots of example code out there, although it's quite easy to write a simple implementation yourself, which means you will be very familiar with its characteristics, strengths and limitations. You can find code in the free real-time operating systems such as ChibiOS etc., but some of those may be difficult to understand due to optimizations etc.
As someone else said, there are a million and one ways of implementing allocators, but a good place to start is with the simplest. If it's appropriate for your application, I'd use an allocation scheme whereby the memory pool is split into an array of fixed-size blocks, where the block size is determined by the largest block required by the application. Keeping the free blocks in a list avoids having to search for a free block - allocation is simply a case of returning the first entry in the free block list and removing that entry from the list. Freeing a block adds it to the start of the list - i.e. the list is a last-in-first-out (LIFO) stack. The free block list can be a linked list (along with a pointer to the start of the list), but it's simpler to use an array of pointers to the free blocks, or an array containing the indexes of the free blocks (with a 'start of the list' variable containing the index of the first entry in the free list).
This scheme has the advantage that it is very simple, very fast and deterministic in time - it is quick enough to allow blocks to be allocated or freed within an interrupt routine, provided interrupts are disabled whilst the free block list is being manipulated (but see the caveats below). It's fast enough not to bother with separate pools for different threads etc. For allocation, interrupts only need to be disabled long enough to copy and then increment the free list index (typically 2 instructions); when freeing a block, interrupts are disabled whilst the block's entry is written into the free block list addressed by the free list index and the free list index is decremented (again, typically 2 instructions).
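To make the idea concrete, here's a minimal sketch of such a fixed-block pool in C. The names, sizes and the critical-section macros are all illustrative (on a Cortex-M you'd implement the macros with interrupt disable/restore); it uses an array of free-block indexes managed as a LIFO stack, as described above.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

#define BLOCK_SIZE 32u  /* largest block the application needs (illustrative) */
#define NUM_BLOCKS 16u

static uint8_t pool[NUM_BLOCKS][BLOCK_SIZE];
static uint8_t free_stack[NUM_BLOCKS]; /* indexes of free blocks (LIFO) */
static uint8_t free_top;               /* number of entries on the stack */

/* Placeholders - on a real target these would disable/restore interrupts. */
#define ENTER_CRITICAL() do {} while (0)
#define EXIT_CRITICAL()  do {} while (0)

void pool_init(void)
{
    for (uint8_t i = 0; i < NUM_BLOCKS; i++)
        free_stack[i] = i;
    free_top = NUM_BLOCKS;
}

void *pool_alloc(void)
{
    void *p = NULL;
    ENTER_CRITICAL();
    if (free_top > 0)                      /* pop the first free index */
        p = pool[free_stack[--free_top]];
    EXIT_CRITICAL();
    return p;
}

void pool_free(void *p)
{
    /* Recover the block index from the pointer. */
    uint8_t idx = (uint8_t)(((uint8_t *)p - &pool[0][0]) / BLOCK_SIZE);
    ENTER_CRITICAL();
    free_stack[free_top++] = idx;          /* push the index back */
    EXIT_CRITICAL();
}
```

Note how small the critical sections are: a pop (copy plus decrement) on allocation, a push (store plus increment) on free, which is what makes the scheme usable from interrupt routines.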
Note that the above only applies to single-core processors (or where the allocator is only used by code running on one core) - things are more complex if you have multiple threads and multiple cores. Processors that implement out-of-order instruction execution, data caching or buffered writes also need more care - e.g. ARM Cortex-M processors provide memory barrier instructions to ensure memory access sequencing. Compiler optimizations also need to be considered.
This scheme also keeps the heap metadata out of the pool so that buffer over and under-runs are less likely to corrupt the heap structure.
If the sizes of the memory blocks requested vary significantly, wastage can be reduced by having multiple memory pools with different block sizes, but determining the size of each pool in advance may be difficult or impossible. One way around this is to start with one pool of large blocks, with the allocator splitting a large block into smaller blocks as required. Both schemes need separate free block lists for each block size. Freeing memory is more complicated if blocks must be returned to the large block pool when the last used sub-block of a large block is freed. (A reference counter can be associated with each large block to keep count of the number of its sub-blocks which are currently allocated, allowing the large block to be released when the count reaches 0.) This probably won't be necessary in many embedded systems though, as each smaller pool will grow as needed to match the worst-case allocation requirement and then stop growing.
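A very stripped-down sketch of the reference-counting idea might look like the following. It is deliberately simplified (one sub-block size, no free list for sub-blocks within a large block, linear searches) - all names and sizes are invented for illustration; the point is just the per-large-block counter that lets the block return to its pool when the last sub-block is freed.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

#define LARGE_SIZE  256u                  /* size of each large block */
#define SUB_SIZE    32u                   /* fixed sub-block size */
#define SUBS_PER_LB (LARGE_SIZE / SUB_SIZE)
#define NUM_LARGE   4u

typedef struct {
    uint8_t data[LARGE_SIZE];
    uint8_t refcount;   /* number of sub-blocks currently allocated */
    uint8_t next_sub;   /* next unused sub-block (no per-sub reuse here) */
    uint8_t in_use;     /* large block claimed from the large pool */
} large_block_t;

static large_block_t large_pool[NUM_LARGE];

void *sub_alloc(void)
{
    /* Prefer a large block that is already partly carved up... */
    for (unsigned i = 0; i < NUM_LARGE; i++) {
        large_block_t *lb = &large_pool[i];
        if (lb->in_use && lb->next_sub < SUBS_PER_LB) {
            lb->refcount++;
            return &lb->data[SUB_SIZE * lb->next_sub++];
        }
    }
    /* ...otherwise claim a fresh large block from the pool. */
    for (unsigned i = 0; i < NUM_LARGE; i++) {
        large_block_t *lb = &large_pool[i];
        if (!lb->in_use) {
            lb->in_use = 1;
            lb->refcount = 1;
            lb->next_sub = 1;
            return &lb->data[0];
        }
    }
    return NULL;
}

void sub_free(void *p)
{
    for (unsigned i = 0; i < NUM_LARGE; i++) {
        large_block_t *lb = &large_pool[i];
        if ((uint8_t *)p >= lb->data && (uint8_t *)p < lb->data + LARGE_SIZE) {
            if (--lb->refcount == 0) {  /* last sub-block freed: */
                lb->in_use = 0;         /* return block to the large pool */
                lb->next_sub = 0;
            }
            return;
        }
    }
}
```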
If you want to use a more traditional allocator then BGet looks OK. It's simple, but it suffers from the problems discussed by other posters - namely fragmentation, indeterminate execution time, and heap metadata stored in the heap where it is vulnerable to buffer over- and under-runs. Allocation requires the list of free blocks to be searched, so with lots of fragmentation and large memory pools, a request for a large block can take a relatively long time (which would probably make it unusable from an interrupt routine).
You could rewrite BGet to separate the metadata from the pool. You could implement 'improvements' as required to suit your application. For example, memory is often in short supply in embedded systems; BGet adds headers to every block in the pool, free or allocated, containing its size and a link to the previous block. If you have lots of small allocations, the overhead may be reduced by using shorter pointers and size variables appropriate to the size of the pool and the alignment of each block. E.g. a 2K pool with 2-byte alignment of blocks would need 10-bit pointers. Similarly, the length of the size variable can be reduced further if the maximum size that can be allocated is restricted. If it were 128 bytes say, and blocks are allocated in multiples of 2 bytes, then only 6 bits would be required. Both together would allow the header to be reduced to 16 bits or 2 bytes.
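In C the packed header can be expressed with bit-fields. This is just a sketch of the arithmetic above, not BGet's actual layout: the 10-bit field holds the previous-block offset in 2-byte units, and the 6-bit field holds the size encoded as (bytes/2 - 1) so that its 64 values cover 2 to 128 bytes.

```c
#include <stdint.h>
#include <assert.h>

/* Illustrative packed header: 10-bit prev link + 6-bit size = 16 bits.
   Offsets and sizes are stored in 2-byte units to exploit alignment. */
typedef struct {
    uint16_t prev : 10;  /* offset of previous block, in 2-byte units */
    uint16_t size : 6;   /* (size_in_bytes / 2) - 1, i.e. 2..128 bytes */
} block_hdr_t;

_Static_assert(sizeof(block_hdr_t) == 2, "header should pack into 2 bytes");

static inline uint16_t hdr_prev_offset(block_hdr_t h) { return h.prev * 2u; }
static inline uint16_t hdr_size_bytes(block_hdr_t h)  { return (h.size + 1u) * 2u; }
```

(The size-minus-one encoding matters: 2..128 bytes in 2-byte steps is 64 distinct values, which only fits in 6 bits if the all-zeros code means 2 bytes rather than 0.)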
Alternatively you could avoid the link to the previous block altogether, at the expense of having to search the free block list when freeing a block, in order to find its neighbours (if any) so that adjacent free blocks can be coalesced. I don't think the size element can be avoided though.
Mutual exclusion will be required for multi-threaded code, e.g. by using monitors or semaphores. Disabling interrupts probably would not be practical, as they could be disabled for too long - unless you have complicated code to restart the free list search in the event that another thread allocates or frees a block whilst the first thread is in the middle of a search.
There's nothing to stop you including more than one type of memory allocator, using the most suitable for each user's needs.
To aid in debugging, additional data structures may be used to record the ID of the code that allocated the memory and/or the file name and line number of the allocating code, and even a string supplied by the caller to help understand where, why and when the buffer was allocated. By allocating a larger buffer than the caller requested, magic numbers can be put just before and after the buffers returned so that the free routine can check for buffer over and under-runs. See
https://www.fourmilab.ch/smartall/ for an example.
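A guard-band scheme along those lines might look like this in C. Everything here is invented for the sketch (magic value, layout, function names) - real implementations such as SMARTALL differ in the details - but it shows the over-allocate-and-bracket idea: the allocator adds a magic number before and after the caller's buffer, and the free routine checks both.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#define GUARD_MAGIC 0xDEADBEEFu  /* illustrative guard value */

/* Layout of the raw allocation: [guard][size][ user data ][guard] */
void *dbg_alloc(size_t n)
{
    uint8_t *raw = malloc(n + 2 * sizeof(uint32_t) + sizeof(size_t));
    if (!raw)
        return NULL;
    uint32_t magic = GUARD_MAGIC;
    memcpy(raw, &magic, sizeof magic);                     /* front guard */
    memcpy(raw + sizeof magic, &n, sizeof n);              /* stashed size */
    memcpy(raw + sizeof magic + sizeof n + n,              /* rear guard  */
           &magic, sizeof magic);
    return raw + sizeof magic + sizeof n;
}

/* Returns 0 if both guards are intact, nonzero on over- or under-run. */
int dbg_free(void *p)
{
    uint8_t *raw = (uint8_t *)p - sizeof(uint32_t) - sizeof(size_t);
    uint32_t before, after;
    size_t n;
    memcpy(&before, raw, sizeof before);
    memcpy(&n, raw + sizeof before, sizeof n);
    memcpy(&after, (uint8_t *)p + n, sizeof after);
    int bad = (before != GUARD_MAGIC) || (after != GUARD_MAGIC);
    free(raw);
    return bad;
}
```

In a real system the check failure would typically log the recorded caller ID / file / line rather than just return a flag.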
By keeping stats about the memory pool(s), you can examine them periodically, or at the end of a test run, to get some idea of the behavior of the system, including the amount of fragmentation, the maximum number of blocks allocated at any one time, the size profile of the blocks allocated at any one time, and blocks which get allocated but never freed, indicating a leak or a buffer which should probably be statically allocated.
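The stats themselves can be as simple as a handful of counters updated on every alloc and free; the field names below are invented, but the high-water mark and the allocs-minus-frees delta are the figures that tend to matter when sizing pools and hunting leaks.

```c
#include <assert.h>

/* Minimal per-pool statistics, updated by the allocator on every call. */
typedef struct {
    unsigned allocs;       /* total successful allocations */
    unsigned frees;        /* total frees */
    unsigned failed;       /* allocation requests that returned NULL */
    unsigned in_use;       /* blocks currently allocated */
    unsigned peak_in_use;  /* high-water mark: worst-case demand */
} pool_stats_t;

static pool_stats_t stats;

void stats_on_alloc(int ok)
{
    if (!ok) { stats.failed++; return; }
    stats.allocs++;
    if (++stats.in_use > stats.peak_in_use)
        stats.peak_in_use = stats.in_use;  /* record worst case */
}

void stats_on_free(void)
{
    stats.frees++;
    stats.in_use--;
}
```

At the end of a run, `peak_in_use` tells you how big the pool actually needs to be, a nonzero `failed` count means it was too small, and an `in_use` figure that never returns to its baseline points at a leak.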