If it can't move stuff around the only reasonable thing to expect is a fast OOM error. I don't see how you could do a defragmenting allocator in C without some terribly hacky macro shenanigans, but someone probably made one in C++.
What I did to fix this was to allocate 100% of it at boot time and then release it.
I would do everything possible to avoid having a heap. I copy my string and conversion functions from "The C Programming Language" book (Kernighan and Ritchie) rather than use those from the C library. I don't use printf() for the same reason. I can usually get by with some form of puts() call.
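As an illustration of the kind of K&R-style helper meant here (a minimal sketch, not anyone's actual code): an unsigned-to-decimal conversion that needs no heap and no printf(), only a puts()/fputs()-style output call.

/* Minimal, heap-free unsigned-to-decimal conversion, K&R style.
   Assumes only that some puts()/fputs()-like output routine exists. */
#include <stdio.h>

static void put_uint(unsigned v)
{
    char buf[12];                    /* enough for 32-bit decimal + NUL */
    char *p = buf + sizeof buf - 1;

    *p = '\0';
    do {
        *--p = (char)('0' + v % 10u);
        v /= 10u;
    } while (v != 0u);

    fputs(p, stdout);                /* or a bare-metal putstr() */
}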
I always look at the linker output and hope not to find _sbrk(). I won't provide one, and I hope the environment doesn't provide one for me by default. If the library doesn't include _sbrk(), then it may be in syscalls.c provided by the IDE (e.g. STM32Cube).
I have no idea how you prove correctness when using dynamic allocation and an increasing call stack.
because even philosophically, it has its meaning: "eternalism" vs. "presentism"
Kind of an interesting question to turn on its head: if your habit is to divide things into "eternal" vs. "present", why is that?
what's actually dangerous about memory allocation?
Not so impossible, but slower by design -- namely, double-indirect pointers have to be used at a minimum. The issue is the syntactic sugar. With C++ you can add that as a variation of smart pointers; with C it would require a lot of macro hacking, and the code becomes even more error-prone than standard C.
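To make the double-indirection idea concrete, here is a minimal C sketch (all names hypothetical): the application keeps only a handle, and the allocator owns the single table entry holding the real address, so a compacting pass is free to move the block and patch that one entry.

#include <stddef.h>

#define MAX_HANDLES 64

/* The allocator owns this table; a compacting pass may move a block
   and then rewrite the corresponding entry. */
static void *handle_table[MAX_HANDLES];

typedef int handle_t;                 /* index into handle_table */

/* The application re-derives the raw pointer from the handle every
   time it needs it, and never caches it across calls that may compact. */
static inline void *handle_deref(handle_t h)
{
    return handle_table[h];
}

In C++ this handle_deref() call is what you would wrap in a smart-pointer-like class so the extra indirection disappears syntactically; in C you are stuck writing it out (or hiding it behind macros).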
With C++ you can add that as a variation of smart pointers
any C++ code example of that? :-//
what's actually dangerous about memory allocation?
you need to allocate a buffer for a task that must be completed within a deadline, and there is no free memory.
This is all a matter of software design, and not an intrinsic problem.
Believe I heard the original Mac did something like this (as a strong preference, or a stipulation of the languages that ran on the OS?), so it's not without precedent.
This is all a matter of software design, and not an intrinsic problem.
static allocation is safe by design; dynamic allocation always exposes a probability of disaster due to Murphy's law, and this can be demonstrated mathematically.
- Again, using a stack is dynamic allocation. Interestingly enough, many people seem to either ignore that, or think it can be risky only when using recursion.
Kind of an interesting question to turn on its head: if your habit is to divide things into "eternal" vs. "present", why is that?
Only because I usually prefer to study things that I can describe in phase space, a space in which all possible states of a system are known and represented, and each possible state corresponds to one unique point in the phase space.
If you want a "predictable" memory allocator, the phase space offers the best way to mathematically study and describe it.
and then re-use that array for different sorts of data in different parts of the program
The risk there is that the different parts might end up wanting the memory at the same time, perhaps after some later rework.
I think we might be flip-flopping between extremes, assuming all allocation is either dynamic or static.
Still interested in getting opinions about this. As long as they are not limited to "don't do it". Or when they are, at least give concrete examples we can further discuss and see what kind of approach would be more problematic.
Your input is always appreciated.
You're talking about a rather general purpose allocator, here, as far as I understand. So again that's not quite what I'm after.
Even more interesting is to, OTOH, find use cases for which dynamic allocation (done properly with the right allocators) could actually be safer than just relying on static allocation. Or even if not safer, could at least be the only reasonable way of implementing things.

One of my favourite examples is fgets() versus getline(). The former uses a fixed-size buffer (or one you can reallocate yourself), and the latter uses a dynamically allocated buffer. And since the typical use case is
char *line = NULL;
size_t size = 0;
ssize_t len;

while (1) {
    len = getline(&line, &size, stdin);
    if (len < 0)
        break;

    /* Use line; the pointer may vary from iteration to iteration.
       If you steal the dynamically allocated buffer, set line = NULL, size = 0. */
}
free(line);
line = NULL;
size = 0;

if (ferror(stdin)) {
    /* Read error occurred */
}
it's not like it's that much more code, either. You can even detect embedded nul bytes in the input (strlen(line) != len).

With that said, [brucehoult's] points do confirm that it's pretty much all about object lifetime, and favoring fixed-size allocations, as I pointed out before. That's indeed the direction I'm taking.

I find myself doing that as well. If I don't have specific knowledge of the object lifetime or size, I just use the default general-purpose one, until I know more.
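For contrast with the getline() loop above, a minimal sketch of the fgets() equivalent (the 256-byte buffer size is arbitrary): the fixed-size buffer forces you to pick an upper bound up front and to handle lines that arrive split across iterations.

#include <stdio.h>
#include <string.h>

char line[256];     /* upper bound chosen up front */

while (fgets(line, sizeof line, stdin)) {
    size_t len = strlen(line);

    /* If the buffer filled without a newline, this is only a fragment
       of the line; the rest arrives on the next iteration(s). */
    int partial = (len + 1 == sizeof line) && (line[len - 1] != '\n');

    /* Use line here (or accumulate fragments until !partial). */
    (void)partial;
}
if (ferror(stdin)) {
    /* Read error occurred */
}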
I think there are very strong parallels to be drawn between games on restricted mobile devices and typical embedded applications. Both are structured around an event loop and run, potentially, forever, with cyclic patterns of allocations and deallocations and a fairly fixed upper bound on the total memory actually being used. The thing you absolutely want to avoid is a cyclic pattern repeated thousands or millions of times slowly growing the heap without bound.

Fully agreed. These often have rather unique patterns to the sizes of the allocations, too, so simple rules of thumb are unlikely to work.
Growing the heap (or memory-mapped pools) without bound does not imply a memory leak, either: all you need is a pattern of allocations and releases that interacts badly with your free space coalescing and allocator selection routines, and you end up with holes in your heap that are too small to use for new allocations, with releases only adding new holes (sufficient only for same-sized or smaller allocations). Nasty. (This is something I have not investigated myself, and have left for the general-purpose allocator designers and implementers to worry about.)
As I said a couple of posts up: never take a free'd memory block and split it up to satisfy a smaller memory allocation. Always assume that whatever needed that large memory block before will need another one the same size again later.

Like I said, I haven't investigated this myself; it could indeed be that simple.
The bucket brigade approach –– for example, splitting larger objects like arrays and strings into a sequence of sub-objects containing a varying amount of data in a fixed-size container –– is more efficient, but programmatically much more complex. The two main users I'm familiar with are Apache and the seq_file (https://www.kernel.org/doc/html/latest/filesystems/seq_file.html) interface in the Linux kernel. In the Linux kernel, it's often used with e.g. pseudofiles (the interfaces where the kernel provides run-time information using a file-like interface under /proc (procfs) or /sys (sysfs)).
char ch = ptr[idx];
char ch = ptr[idx/256][idx%256];
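A minimal sketch of the fixed-chunk layout behind that second access pattern (chunk size 256 to match the example; names hypothetical): growing the array never moves existing data, it only fills in another chunk pointer.

#include <stdlib.h>

#define CHUNK      256
#define MAX_CHUNKS 64           /* fixed upper bound on total size */

/* Segmented array: a small fixed table of pointers to fixed-size chunks. */
typedef struct {
    char  *chunk[MAX_CHUNKS];
    size_t nchunks;
} segarr;

static int segarr_grow(segarr *a)
{
    if (a->nchunks >= MAX_CHUNKS)
        return -1;
    a->chunk[a->nchunks] = malloc(CHUNK);
    if (!a->chunk[a->nchunks])
        return -1;
    a->nchunks++;
    return 0;
}

/* The ptr[idx/256][idx%256] access pattern from above. */
static char segarr_get(const segarr *a, size_t idx)
{
    return a->chunk[idx / CHUNK][idx % CHUNK];
}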
The current dynamic memory management interfaces (malloc()/realloc()/free() or new()/delete()) are, in my opinion, rather simplistic. At least a hint "will-grow" or "will-shrink" via a flag should be useful; maybe also grouping allocations (for pools or groups that can all be released at once). Dunno for sure; I just find it interesting to try and work out.
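Purely to make that concrete, a hypothetical interface sketch: none of these names or flags exist in any standard or common library, they just illustrate the "hints plus groups" idea.

#include <stddef.h>

/* Hypothetical sketch only. */
typedef enum {
    ALLOC_HINT_NONE        = 0,
    ALLOC_HINT_WILL_GROW   = 1 << 0,  /* caller expects to realloc larger later */
    ALLOC_HINT_WILL_SHRINK = 1 << 1   /* caller expects to realloc smaller later */
} alloc_hint;

typedef struct alloc_group alloc_group;   /* opaque: a set of related allocations */

void *h_malloc(alloc_group *grp, size_t size, alloc_hint hints);
void *h_realloc(void *ptr, size_t size);
void  h_free(void *ptr);
void  h_group_release(alloc_group *grp);  /* release every allocation in the group */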
realloc() is evil. Don't use it. Don't try to grow things in place.

I disagree: the SegArr just shifts the complexity to different code.
Use SegArr, as described in my most recent comment.
So, @SiliconWizard, do you already have an API or interface you want your implementation to conform to?
And then you get to the question of how much effort is appropriate here, considering how well already existing Garbage Collectors perform? (I have no answer or opinion to that myself.)
Your input is always appreciated.
You're talking about a rather general purpose allocator, here, as far as I understand. So again that's not quite what I'm after.
No, not really.
I think there are very strong parallels to be drawn between games on restricted mobile devices and typical embedded applications.
Both are structured around an event loop and run, potentially, forever, with cyclic patterns of allocations and deallocations and a fairly fixed upper bound on the total memory actually being used. The thing you absolutely want to avoid is a cyclic pattern repeated thousands or millions of times slowly growing the heap without bound.
This is very different -- and uses a different allocator -- from programs such as compilers, or business batch update tasks, or a process that is started up to service a web request.
realloc() is evil. Don't use it. Don't try to grow things in place.

I disagree: the SegArr just shifts the complexity to different code.
Use SegArr, as described in my most recent comment.
The current dynamic memory management interfaces (malloc()/realloc()/free() or new()/delete()) are, in my opinion, rather simplistic. At least a hint "will-grow" or "will-shrink" via a flag should be useful; maybe also grouping allocations (for pools or groups that can all be released at once). Dunno for sure; I just find it interesting to try and work out.
realloc() is evil. Don't use it. Don't try to grow things in place. (except if you know the memory allocator gave you N bytes and you know you're using only M, and you want to grow it to something else also less than N -- but that's for you to know, not the memory allocator.)
True. I currently just see realloc() as being the proper place of complexity for certain cases –– not for all, by any means.

That's what engineering is :-)

realloc() is evil. Don't use it. Don't try to grow things in place.

I disagree: the SegArr just shifts the complexity to different code.
Use SegArr, as described in my most recent comment.
At the moment I'm just trying to write a set of specialized allocators that can each be used for specific use cases. As you saw, I'm not trying to reinvent the wheel either; I've - at least for now - implemented well known allocators, such as linear (also called "arena") and pool allocators.

Right. I asked because I'm problem-solving oriented and wanted to know whether there was a *particular* problem you were trying to solve, or whether you were just building tools for solving a general class of problems.
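For reference, the linear ("arena") kind boils down to something like this minimal sketch (names hypothetical): allocation is a bounds-checked pointer bump, and everything is released at once by resetting the offset, which is what makes it O(1) and independent of allocation history.

#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *base;
    size_t   size;
    size_t   used;
} arena;

/* O(1): bump the offset; fail cleanly when the arena is exhausted. */
static void *arena_alloc(arena *a, size_t n)
{
    n = (n + 7u) & ~(size_t)7u;          /* keep 8-byte alignment */
    if (n > a->size - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

/* O(1): everything allocated since the last reset is released at once,
   e.g. at the end of each cycle of the event loop. */
static void arena_reset(arena *a)
{
    a->used = 0;
}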
I was thinking of entire projects, actually; not just implementing the allocator or GC itself.

And then you get to the question of how much effort is appropriate here, considering how well already existing Garbage Collectors perform? (I have no answer or opinion to that myself.)

The effort, actually - as you can guess - is not much about implementation. It took me just a couple of hours at this point to get something working well. Sure, it's still just a WIP. The effort is all about using allocation properly and choosing the "right" allocation patterns.
Using GCs, even the "best" ones, for anything timing-critical is frankly out of the question in general. I just want allocators - at this point at least - that are ideally O(1), so that their execution doesn't depend on state such as what has already been allocated or freed.
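A fixed-size pool is the other obvious O(1) candidate; a minimal sketch (block size and count are arbitrary here): both alloc and free are a single free-list operation, regardless of what has been allocated or freed before.

#include <stddef.h>

#define BLOCK_SIZE  64
#define BLOCK_COUNT 32

typedef union block {
    union block  *next;              /* valid only while the block is free */
    unsigned char data[BLOCK_SIZE];
} block;

static block  pool[BLOCK_COUNT];
static block *free_list;

static void pool_init(void)
{
    for (size_t i = 0; i + 1 < BLOCK_COUNT; i++)
        pool[i].next = &pool[i + 1];
    pool[BLOCK_COUNT - 1].next = NULL;
    free_list = &pool[0];
}

/* O(1): pop the head of the free list (NULL when exhausted). */
static void *pool_alloc(void)
{
    block *b = free_list;
    if (b)
        free_list = b->next;
    return b;
}

/* O(1): push the block back onto the free list. */
static void pool_free(void *p)
{
    block *b = p;
    b->next = free_list;
    free_list = b;
}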
//
// File mymodule.c
//
#include <stdlib.h>      /* malloc(), free() */
#include <stdbool.h>

#define mymodule_malloc(N)  malloc(N)
#define mymodule_free(P)    free(P)

static bool mymodule_malloc_init(void)
{
    /* Nothing to do for standard malloc() */
    return true;
}

static void mymodule_malloc_cleanup(void)
{
    /* Nothing to do for standard free() */
}
That's what engineering is :-)
If a system implements multiple tasks that use dynamic memory allocation (like malloc()), it would be better to organize the architecture in such a way that each task also manages its own, separate heap.
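A minimal sketch of what that architecture can look like (sizes and task names are made up, and each region is backed by a simple bump allocator just for brevity): every task owns a statically sized region and its own allocator state, so exhaustion or fragmentation in one task's heap cannot affect the others.

#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *base;
    size_t   size;
    size_t   used;
} task_heap;

static uint8_t heap_mem_ui[4096];     /* sized per task at design time */
static uint8_t heap_mem_comms[8192];

static task_heap ui_heap    = { heap_mem_ui,    sizeof heap_mem_ui,    0 };
static task_heap comms_heap = { heap_mem_comms, sizeof heap_mem_comms, 0 };

static void *task_alloc(task_heap *h, size_t n)
{
    n = (n + 7u) & ~(size_t)7u;
    if (n > h->size - h->used)
        return NULL;                  /* only this task's heap is exhausted */
    void *p = h->base + h->used;
    h->used += n;
    return p;
}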
And I'm definitely not criticizing, just pointing out that at one end of the toolbox are the GC's, at the other end static allocations only, and a whole plethora of tools in the middle. I myself am very interested in the tools in the middle, and enjoy the discussion; just thought it was prudent to mention the GC end since the static end has already been mentioned.
As an example, let's say I'm writing a smart display appliance based on Teensy 4.x ($20) or NXP MIMXRT1062, running at up to 600 MHz, with 512k of tightly coupled RAM, another 512k on the MCU, let's say 8M of PSRAM, (...)
I am actually working on implementing a MIDI Synth on a Teensy 4.1 board, and after implementing all the base functions, I was thinking about ways to make efficient use of the external PSRAM! So that's kind of what triggered the whole thing here. But then of course I began thinking of all the potential applications of using dynamic allocations for typical embedded development while trying to avoid the usual pitfalls.

That's why I liked Kalvin's post regarding multiple heaps so much: it nicely tied in my example on the Teensy with my point about using (one) reallocated block, with multiple heaps.
Similar thoughts have crossed my mind for things like simple file storage. Which might not be too bad embedded: a stack of presets, for example, can just pack into a struct, slot into an array, and you're good. Harder if you want to do wear leveling.

If you have a sufficiently large, say 64-bit, monotonic sector generation counter at the beginning of each sector (or multi-sector struct), it only takes 3+ceil(log2 N) reads, IIRC, to find the oldest and the newest sector when using the entire N-sector device as a circular buffer. Also, one does not need to read the entire sector either, just the initial data (which doesn't speed up the search, but simplifies the code), unless I remember wrong.
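A sketch of the search that suggests, assuming every sector has been written at least once and the counters are strictly increasing in write order; read_counter() is a placeholder for the actual flash read. Laid out circularly, the counters form a rotated sorted sequence, so a binary search for the wrap point finds the oldest sector, and the newest is its predecessor.

#include <stdint.h>
#include <stddef.h>

/* Placeholder for the real flash/EEPROM access: returns the 64-bit
   monotonic counter stored at the start of sector 'idx'. */
extern uint64_t read_counter(size_t idx);

/* Find the OLDEST sector in an N-sector circular log.
   The newest sector is then (oldest + N - 1) % N. */
static size_t find_oldest_sector(size_t nsectors)
{
    size_t   lo = 0, hi = nsectors - 1;
    uint64_t c_hi = read_counter(hi);

    while (lo < hi) {
        size_t   mid   = lo + (hi - lo) / 2;
        uint64_t c_mid = read_counter(mid);
        if (c_mid > c_hi) {
            lo = mid + 1;          /* wrap point is to the right of mid */
        } else {
            hi   = mid;            /* wrap point is at mid or to its left */
            c_hi = c_mid;
        }
    }
    return lo;                     /* sector with the smallest counter */
}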
using (one) reallocated block, with multiple heaps.
Actually, I think the best here, in this discussion, would be to try and forget the *heap* and not even call memory available for allocation a "heap". Or even several of them. That would avoid being biased by what we are used to.

Right. In that case, let me rephrase: the pattern where I do use reallocation is very distinct.
- Do you typically use reallocation for growing blocks, or for both growing and shrinking? (My own use cases of realloc() are almost always only 'growing'.)

Typical cases are buffers for things like files that fit in memory. I do not use the pattern where one examines the file size and then reads that amount of data, because it is unreliable. Instead, I read into a dynamically growing buffer and, when complete, (do a whitespace compaction and comment removal pass, and then) shrink the allocation to the size used (see the sketch after this list). The shrinking is useful if additional allocations are done before the file area is freed, for example for key data needed later.
- Is that typically to implement dynamic tables?

No; the common denominator seems to be human readable text or a related format, like config files, JSON, HPGL, G-code, etc., and input of unknown length or complexity.
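A sketch of that grow-then-shrink pattern (initial buffer size arbitrary, error handling minimal): read the stream into a geometrically grown buffer, then hand the slack back with one final shrinking realloc().

#include <stdio.h>
#include <stdlib.h>

/* Read all of 'in' into a dynamically grown buffer, then shrink the
   allocation to the size actually used.  Returns NULL on error. */
static char *slurp(FILE *in, size_t *len_out)
{
    size_t cap = 4096, len = 0;
    char  *buf = malloc(cap);
    if (!buf)
        return NULL;

    for (;;) {
        if (len == cap) {
            char *grown = realloc(buf, cap * 2);   /* grow geometrically */
            if (!grown) { free(buf); return NULL; }
            buf = grown;
            cap *= 2;
        }
        size_t n = fread(buf + len, 1, cap - len, in);
        len += n;
        if (n == 0)                                /* EOF or read error */
            break;
    }
    if (ferror(in)) {
        free(buf);
        return NULL;
    }

    /* Shrink to fit; keeping the old block is fine if realloc declines. */
    char *shrunk = realloc(buf, len ? len : 1);
    if (shrunk)
        buf = shrunk;
    *len_out = len;
    return buf;
}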
An interesting detail: Current implementations for parsing floating-point numbers are slow. Even on spinny HDDs, the parser tends to be the bottleneck, not the storage I/O speed. Daniel Lemire's Number Parsing at a Gigabyte per Second (https://arxiv.org/abs/2101.11408) describes the issues and approaches well, but even it gives up and uses the old arbitrary precision approaches for values with more than 19 digits in the significand (decimal part excluding the power of ten exponent). (Arbitrary/multiprecision support does not need realloc(), because the new size is known when summing or multiplying anyway.)
That – not getting permission to publish – is really annoying.

An interesting detail: Current implementations for parsing floating-point numbers are slow. Even on spinny HDDs, the parser tends to be the bottleneck, not the storage I/O speed. Daniel Lemire's Number Parsing at a Gigabyte per Second (https://arxiv.org/abs/2101.11408) describes the issues and approaches well, but even it gives up and uses the old arbitrary precision approaches for values with more than 19 digits in the significand (decimal part excluding the power of ten exponent). (Arbitrary/multiprecision support does not need realloc(), because the new size is known when summing or multiplying anyway.)

Interesting to see someone else has done this now. I did the same thing in a proprietary system in 2006 and wanted to publish but was not allowed to. Most literals are far smaller than 19 digits, and if you're printing from a floating point value then you never need to print more than 17 digits to uniquely identify which IEEE double you started with anyway.
Also: you don't need arbitrary precision. Multiple-precision, yes, but not arbitrarily large. There is an upper limit that is, if I recall correctly, something like 1024+53+53 bits, or 144 bytes if you round to a multiple of 32/64/128 bits, no matter how long the input string of digits is (could be millions of digits).

Of course. The exponent in IEEE 754 Binary64 'double' has 11 bits and the mantissa 53 bits, so it cannot be more than 2048+53+1+1 bits (the extra bits for exact rounding and overflow detection), or 264 bytes.
The key is to use the first 17 digits to find which pair of doubles the input value lies between, then start to convert the mid-point of those two values back to decimal. You can stop as soon as you find an input digit that is greater than or less than the expansion of that mid-point.

In general, obtaining the limiting pair of doubles that bracket the decimal representation would be useful in many other ways, too: for example, general interval arithmetic (https://en.wikipedia.org/wiki/Interval_arithmetic).
The graphics hardware necessitates the linear addressing
@Marco: it would be interesting to give more precise examples of truly arbitrary-size allocations in game engines, because game engines most often use precisely the kind of allocators I'm talking about in this thread, among which pool allocators (fixed size) are one of the most commonly used.