It is pretty obvious that the heap is pretty useless in an RTOS environment. Even if you wrap malloc() and free() in a mutex, your system will eventually blow up due to fragmentation.
It's neither obvious nor true, and there are several ways to deal with fragmentation. The most generic is to use memory handles, which were very common in systems without virtual memory. Pool allocators are useful for transaction processing, where you might need to allocate several small pieces of memory for a task but then free them all at once when the task is complete. You can use power-of-two allocators with a pool for each size to limit fragmentation.
It certainly makes sense to have a small number of different sizes and round each allocation up to the next size. Power of two might be a bit extreme, though. I've done memory managers for what is essentially embedded software (mobile phones with a few hundred KB of RAM, old 6502 and Z80 machines with 64 KB or less). Often having blocks of 1.0 and 1.5 times a power of two works well, e.g. 8, 12, 16, 24, 32, 48, 64, 96... In my experience, going more fine-grained than that gets counter-productive.
When such a block is freed, it goes onto a list of free blocks of that size and can be immediately reused.
Usually I create groups of small blocks by subdividing a large block of maybe 256 bytes or 1 KB or 4 KB (fixed, depending on total RAM size) into equal-sized small blocks. Then, if an entire large block of small blocks becomes free, it can be reallocated for a different block size if needed. Keeping the free blocks of a given size ordered by address promotes this happening (maybe just a heap, not a fully sorted list).
Many systems have periodic memory needs. Often there is a startup phase that is very distinct from the running phase, so it is good to reclaim as much memory as possible at the end of startup. It is good if you can get the application programmer to call a special memory management function at this point.
Pools are good for periodic allocation and mass freeing when the task doing that can be easily separated from other things happening at the same time (or only one thing happens at a time).
One mistake many memory managers have historically made is splitting up a freed large block to service smaller requests, as "first fit" or "worst fit" algorithms do. If the demand is periodic, a large block of the same size will very likely be needed soon, and if the original has been split up then a new large block must be found. Repeated many times, this can leave a large number of small, unneeded, unusable memory blocks but no contiguous space for a large block to be allocated.
Real memory managers stopped making that mistake decades ago, but alas not people rolling their own by following their old university textbook!
Another thing I do when I have full control over the whole program is to simply forbid individual memory allocations over a certain size. On a machine with a few hundred KB of RAM that might be as small as 256 bytes. On a machine with hundreds of MB to a couple of GB it might be 4 KB (or the same as the VM/TLB page size). If the programmer wants an array bigger than that then they have to use an array of pointers to blocks, or a tree of blocks, or a list. I provide classes for arrays and strings and buffers that do this internally, without complicating the external interface.