Author Topic: What is the use of dynamic memory allocation in C language  (Read 13483 times)


Offline Dadu@Topic starter

  • Contributor
  • Posts: 34
  • Country: in
What is the use of dynamic memory allocation in C language
« on: February 12, 2022, 12:02:19 pm »
Can someone demonstrate, with an example, what the use of dynamic memory allocation is in the C language?

I have tried hard to understand it, but I still don't see its usefulness. I know that we can allocate either dynamic or static memory for any variable.


 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8167
  • Country: fi
Re: What is the use of dynamic memory allocation in C language
« Reply #1 on: February 12, 2022, 12:33:30 pm »
Example: your program is a photo editor.

The user opens a 100 MB image file. You allocate 100 MB for the image. Once the user closes the image, the memory is freed.

Now, if the user opens 100 MB, 10 MB and 1 MB images at the same time, only 111 MB of memory is actually consumed. The rest is free for opening more images, or for other programs running on the OS.

The other option? Statically allocate room for ten 100 MB images. That requires a whopping 1000 MB before anything happens, the user can't open any image bigger than 100 MB, and is limited to at most 10 images at the same time.

This idea is language independent. Some languages do this more automatically for you. In C, you use malloc() to allocate, calloc() to simultaneously allocate and clear to zero, realloc() to change the size of an allocation (for example, if the image becomes bigger or smaller during editing), and free() to deallocate when done.
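
As an illustration, a minimal sketch of that allocation lifecycle, reusing the 100 MB figure from the example above (no actual image handling, just malloc(), realloc() and free()):
Code: [Select]
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    size_t size = 100u * 1024u * 1024u;    /* 100 MB: known only at run time */

    /* Allocate exactly what this image needs. */
    uint8_t *pixels = malloc(size);
    if (!pixels) {
        fprintf(stderr, "Out of memory.\n");
        return EXIT_FAILURE;
    }

    /* ... edit the image; if it grows, try to resize the buffer ... */
    uint8_t *tmp = realloc(pixels, size + 10u * 1024u * 1024u);
    if (!tmp) {
        free(pixels);
        fprintf(stderr, "Out of memory.\n");
        return EXIT_FAILURE;
    }
    pixels = tmp;

    /* When the user closes the image, give the memory back. */
    free(pixels);
    return EXIT_SUCCESS;
}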

Or are you thinking more about typical small microcontroller applications? In those, you don't often find real use for dynamic memory allocation; and indeed, if you don't need it, don't use it in small embedded applications, because there it can do more harm than good. Use it wisely.
« Last Edit: February 12, 2022, 12:37:45 pm by Siwastaja »
 
The following users thanked this post: Dadu@

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6227
  • Country: fi
    • My home page and email address
Re: What is the use of dynamic memory allocation in C language
« Reply #2 on: February 12, 2022, 05:20:49 pm »
On embedded microcontroller implementations, dynamic memory allocation is rare.  However:

Let's say you have a controller device intended to be used in greenhouses.  It uses some kind of clever networking scheme, where the sensors can be enumerated at any point in time.  Let's say you need to support a few hundred sensors per controller, but the sensors can vary from temperature to moisture to CO2 and so on.

Do you reserve memory statically for each sensor type (because they need different structures in memory to record their state)?
Or do you instead have some memory set aside as a "pool", from which you allocate the needed structures, one per sensor, as they are enumerated?

Granted, in C, you probably would not use malloc()/free() for this (because in many microcontroller base libraries it uses a single pool of memory shared with the stack), and would instead write your own dedicated, optimized allocator.  (For example, that allocator can quite easily "repack" the structures to avoid memory fragmentation, if there is a separate array pointing to each sensor for quick lookup, and no other references to each structure except between a request and a response to that particular sensor.)
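
To make that concrete, here is a minimal sketch of such a dedicated pool; the sensor_state structure and function names are hypothetical, and this simple version only reserves and releases fixed-size slots, without the repacking described above:
Code: [Select]
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical per-sensor state, sized for the largest sensor type. */
struct sensor_state {
    unsigned char data[32];
};

#define MAX_SENSORS 256

static struct sensor_state  pool[MAX_SENSORS];
static bool                 in_use[MAX_SENSORS];

/* Grab a free slot when a sensor is enumerated; NULL if the pool is exhausted. */
struct sensor_state *sensor_alloc(void)
{
    for (size_t i = 0; i < MAX_SENSORS; i++) {
        if (!in_use[i]) {
            in_use[i] = true;
            return &pool[i];
        }
    }
    return NULL;
}

/* Return a slot to the pool when the sensor disappears. */
void sensor_free(struct sensor_state *s)
{
    if (s >= pool && s < pool + MAX_SENSORS)
        in_use[s - pool] = false;
}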

Similarly, whenever you have a situation where you support a limited number of run-time-detected modules or clients, but those can be of different types and require different structures in memory to properly handle them, you'll probably want some form of dynamic memory management to make sure you can support the maximum number of modules/clients without requiring extra hardware (like external RAM).  Then again, it is often cheaper to buy that extra hardware per unit than to pay the software development cost of making the same features run on cheaper hardware.

I suspect that IP networking, and especially TLS-secured connections (HTTPS in particular), is where dynamic memory management starts becoming necessary.  In particular, TLS has many different key exchange mechanisms and ciphers that both ends need to agree upon to form the secure connection.  This phase can easily require kilobytes of RAM that are not needed afterwards.  Some IP networking implementations only support one connection, or a small number of them, to avoid this; but with TLS it becomes more complicated to avoid dynamic memory allocation than to use it, unless you reserve the maximum amount of RAM and do not mind it sitting unused almost all of the time.  If the IP networking implementation is limited to a single connection, it can probably use a common buffer for the handshake data and the TCP transfer buffers, but that is very easy to cripple (denial of service) via repeated connection attempts; a more complicated implementation can be much more robust.  The fact that e.g. the BSD and Linux IP networking implementations are pretty darned rock solid (even though they are under constant attack) shows that it can be done.

In all cases the pattern is the same: there are more possible uses than one can statically allocate for.  Thus, some kind of sharing mechanism is needed.  Dynamic memory management is just the name for this sharing of memory.
 
The following users thanked this post: Dadu@

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 3894
  • Country: gb
Re: What is the use of dynamic memory allocation in C language
« Reply #3 on: February 12, 2022, 05:54:34 pm »
Code: [Select]
lib_btree_other # make run
#info, instrumented version, r194 (Sat Feb 12 2022)
#info, lib_btree_z_v1/btree_init_property: "initialized"
#info, lib_btree_z_v1/btree_init_methods: "initialized"
#info, lib_btree_z_v1/btree_init_bkpool: "initialized"
bkpool has 4 total blocks
bkpool has 3 empty blocks

> disk_format
disk "disk.bin", blocks low level initialization
|.................................|
|.................................|
|.................................|
|.............................    |

> coin_add 0..21
QA: root has changed: prev=(reserved)  curr=00000002  btree_start_new
QA: root has changed: prev=00000002  curr=00000007  node_insert_new_root
QA: root has changed: prev=00000007  curr=00000018  node_insert_new_root

> coin_list
0 1  | 10 11  | 12 13  | 14 15  | 16 17  | 18 19  | 2 20  | 21 3  | 4 5  | 6 7  | 8 9

> bkpool_show
iblock1=00000007  ticket=0001 _ M 0130 node_insert_new_root.bkpool_block_replace("p_bkroot")
iblock1=00000002  ticket=0002 _ _ 0059 node_make.op("p_bknode")
iblock1=00000016  ticket=0003 _ M 0002 btree_coins_show.bkpool_block_replace("p_bknode")
bkpool's
   trig: { ticket=3 lowest=2 }
   blocks:
   {
        3 total
        0 empty
      712 moved
   }

> disk_blocks
disk "disk.bin":
|.d.dddLGddLddLddLGRddLddLddLGddLd|
|dLGddL...........................|
|.................................|
|.............................    |

> disk_sync
bkpool_flushing ... done

> disk_blocks
disk "disk.bin":
|BdLdddLGddLddLddLGRddLddLddLGddLd|
|dLGddL...........................|
|.................................|
|.............................    |
>

This is an example of a "custom" dynamic allocator used in the embedded firmware of an industrial embroidery machine.

It's a polymorphic, disk-cache-based B*tree (who? what?  :o :o :o )

The algorithm basically works on { root_node(R), leaf_node(L), glue_node(G), bmap_node(B), data_node(d) }, which all share the same physical disk-block size: 512 bytes. It was initially developed on a Linux workstation and implemented with malloc() and free(), and its keys were not polymorphic (meaning able to handle different kinds of data types, e.g. strings rather than integers) but could only handle integer numbers.

Then it went through a round of refactoring, and I added the polymorphic layer, but it still used malloc() and free().

Then it went through another round of refactoring, and I replaced the malloc() and free() calls with a 64 MB static allocator, practically one and a half times the total amount of RAM of the physical target embedded board. It was a kind of dumb pool, written as an array.

Then it went through yet another round of refactoring, and I replaced the stalloc code with a block-bkpool model that replaces "pointers" with "iblock" addresses, coupled with a smart allocator that works somewhat like virtual memory on PCs.

It's a long story of step-by-step progress, but also one full of epic failures, catastrophic bugs, and moments so sad that I wanted to give up everything and flee to the north pole. Seriously, malloc() is easy to use, while this stuff is very complex and hard to test, and it took me several months to get something working decently.

The smart pool allocator is still under testing. In the example above, you can see that the B*tree algorithm only has four blocks of memory to store its nodes, one of which is reserved (for the bitmap, which is used to manage free blocks on the disk), while the allocator continuously flushes and reloads blocks as it runs.

The number of blocks is limited; as you can see, in order to insert 20 key-data nodes it needed to move 712 blocks through a pool of 3 blocks. Of course, this is a testing scenario that I purposely created this way to *stress* the algorithms. The final product will have many more, 192 blocks, in the bkpool.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 
The following users thanked this post: Dadu@

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9889
  • Country: us
Re: What is the use of dynamic memory allocation in C language
« Reply #4 on: February 12, 2022, 11:51:26 pm »
Any time you are using something like a linked list, you will dynamically allocate nodes added to the list.  This is the classic example because not only does it deal with allocation, but there is also plenty of opportunity to accidentally use a null pointer.  You would also deallocate items removed from the list.

There are other ways to do these things but dynamic allocation is the most common.
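
A minimal illustration of that classic case (the node layout and the push/pop helpers are just examples):
Code: [Select]
#include <stdlib.h>
#include <stdio.h>

struct node {
    int          value;
    struct node *next;
};

/* Allocate a node and prepend it to the list; returns the new head, or NULL on failure. */
struct node *push(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);
    if (!n)
        return NULL;              /* the caller must check for this */
    n->value = value;
    n->next  = head;
    return n;
}

/* Remove the head node, freeing its memory; returns the rest of the list. */
struct node *pop(struct node *head)
{
    if (!head)
        return NULL;
    struct node *rest = head->next;
    free(head);
    return rest;
}

int main(void)
{
    struct node *list = NULL;

    for (int i = 0; i < 5; i++) {
        struct node *n = push(list, i);
        if (!n)
            break;                /* allocation failed: keep what we already have */
        list = n;
    }

    while (list) {
        printf("%d\n", list->value);
        list = pop(list);
    }
    return 0;
}
Note that push() can fail, so the caller has to handle a NULL return; that is exactly the null-pointer opportunity mentioned above.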
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6227
  • Country: fi
    • My home page and email address
Re: What is the use of dynamic memory allocation in C language
« Reply #5 on: February 13, 2022, 01:17:56 am »
Any time you are using something like a linked list, you will dynamically allocate nodes added to the list.
Well, not necessarily: consider the array form.  The classic example is an array of nodes, where each node holds the relative index of the following node (and optionally of the preceding node), or zero at the final node in the list; plus of course the data for the current node.  All nodes are statically allocated as an array beforehand.  Unused nodes form one linked list.  To add a node to some other list, you pick one from the unused list, fill in the data members, and prepend/append/insert it into the list you want.  Each list is identified by the index of its first member.
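
A simplified sketch of that array form; this variant uses absolute indexes and -1 as the end-of-list marker instead of the relative-index/zero scheme described above:
Code: [Select]
#include <stdio.h>

#define NUM_NODES 16

struct node {
    int data;
    int next;                      /* index of the following node, or -1 at the end */
};

static struct node nodes[NUM_NODES];
static int free_list;              /* index of the first unused node */

/* Thread all nodes into the free list. */
void nodes_init(void)
{
    for (int i = 0; i < NUM_NODES - 1; i++)
        nodes[i].next = i + 1;
    nodes[NUM_NODES - 1].next = -1;
    free_list = 0;
}

/* Take a node from the free list and prepend it to the list starting at *head. */
int node_prepend(int *head, int data)
{
    int i = free_list;
    if (i < 0)
        return -1;                 /* no unused nodes left */
    free_list     = nodes[i].next;
    nodes[i].data = data;
    nodes[i].next = *head;
    *head         = i;
    return i;
}

int main(void)
{
    nodes_init();
    int list = -1;                 /* empty list */
    for (int v = 1; v <= 3; v++)
        node_prepend(&list, v);
    for (int i = list; i >= 0; i = nodes[i].next)
        printf("%d\n", nodes[i].data);
    return 0;
}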

Even binary heaps are often managed in array form.  (I personally use a binary min-heap with event timestamps to manage lots of timed events using only one timer.  If there is a fixed upper limit to the number of timers I need to support, I use a statically allocated array for the min-heap.)  No stored indexes are needed, because the node at index k (starting at 0) has its parent at (k-1)/2 (integer division), and its two children at 2*k+1 and 2*k+2, unless those exceed the valid indexes of the array.
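
For example, a statically allocated min-heap insert using exactly that indexing (timestamps only, no payload) could look like this:
Code: [Select]
#include <stdint.h>

#define MAX_EVENTS 32

static uint32_t heap[MAX_EVENTS];   /* event timestamps; the earliest is always heap[0] */
static int      heap_count;

/* Insert a timestamp, sifting it up using the parent-at-(k-1)/2 rule. */
int heap_push(uint32_t t)
{
    if (heap_count >= MAX_EVENTS)
        return -1;                  /* statically sized: heap is full */
    int k = heap_count++;
    heap[k] = t;
    while (k > 0 && heap[(k - 1) / 2] > heap[k]) {
        uint32_t tmp      = heap[(k - 1) / 2];
        heap[(k - 1) / 2] = heap[k];
        heap[k]           = tmp;
        k = (k - 1) / 2;
    }
    return 0;
}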

A related data structure, the disjoint set, is best implemented in array form, but logically forms a set of trees (until flattened to a direct lookup array for fastest use).  It is used to determine whether nodes belong to the same set or not, by starting with every node separate, then merging nodes pairwise (together with all nodes connected to them, directly or indirectly) into a single set.

Now, one can argue whether the above is dynamic memory management or not.  In C, I consider it dynamic object/structure/element management, since the size and type of the structure used is fixed at compile time.
Others disagree, and consider the above a variant of memory pools, and thus dynamic memory management.

The key difference, in my opinion, is when the type and size of each allocated object/structure is not known beforehand and may vary.  Then, you cannot statically allocate separate arrays for each possible size up to the maximum number of nodes; there just isn't enough RAM available for that.  So, you set up one or more regions of memory that can be near-arbitrarily split, and used as needed, to hold any structure or data the caller wants.  Which definitely is dynamic memory management.
 
The following users thanked this post: nctnico, Dadu@

Online nctnico

  • Super Contributor
  • ***
  • Posts: 26875
  • Country: nl
    • NCT Developments
Re: What is the use of dynamic memory allocation in C language
« Reply #6 on: February 13, 2022, 01:39:05 am »
I suspect that IP networking, and especially TLS-secured connections (HTTPS in particular), is where dynamic memory management starts becoming necessary.  In particular, TLS has many different key exchange mechanisms and ciphers that both ends need to agree upon to form the secure connection.  This phase can easily require kilobytes of RAM that are not needed afterwards.  Some IP networking implementations only support one connection, or a small number of them, to avoid this; but with TLS it becomes more complicated to avoid dynamic memory allocation than to use it, unless you reserve the maximum amount of RAM and do not mind it sitting unused almost all of the time.
I ran into this problem recently for a microcontroller and gave it some thought. I came to the conclusion that you will have to make fixed memory allocations (memory pools if you want) that support the largest amount of memory needed by every component of the application at any time, in order to make the application work reliably. Of course you could be clever and lock tasks out of a memory pool for longer periods of time, but this gets tricky quickly.

On PCs you can run into the same problem, but since there is a user at the helm who can restart the program, and the system has the ability to use a hard drive as extra memory, running out of memory is less of a reliability issue.
« Last Edit: February 13, 2022, 01:43:17 am by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 
The following users thanked this post: DiTBho, Dadu@

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 3894
  • Country: gb
Re: What is the use of dynamic memory allocation in C language
« Reply #7 on: February 13, 2022, 02:56:40 am »
I ran into this problem recently for a microcontroller and gave it some thought.

My approach here is lazy about HTTPS:
- only 4 connections are pre-allocated via a stalloc pool
- if the pool is full, the fifth and subsequent clients are kept pending in a FIFO
- they only receive a plain HTTP page (Firefox says "unsafe", lol) telling them to wait

This stuff runs on a Fon2100 running Xinu. It only has 8 MB of RAM; the OS takes 3 MB.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 
The following users thanked this post: Dadu@

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6227
  • Country: fi
    • My home page and email address
Re: What is the use of dynamic memory allocation in C language
« Reply #8 on: February 13, 2022, 05:07:52 pm »
OpenWRT is a good example of a full non-RT operating system (Linux) intended to be used in network appliances, especially routers.  It is a full 'hosted' environment despite being embedded, running on lots of devices with limited amounts of RAM.  However, as typical amounts of memory increase, so do the software requirements...  Most of OpenWRT, including the Linux kernel, is written in C.

The Linux kernel is one place where dynamic memory management is used a lot, but without any leaks.  (Many Linux machines keep running without interruption for years on end; any kernel memory leaks during normal operation would make that impossible, and require rebooting at more or less regular intervals.)

However, the Linux kernel code is definitely advanced stuff.  With a minimal userspace and a trimmed kernel (unnecessary options removed), something like 8 MiB of RAM is sufficient for a complete, full-featured Ethernet and WiFi stack.  (It's the rest of the stuff, the user interfaces and the userspace services, that cause the recommended amount of RAM available to be so huge (128 MiB) nowadays.)

Considering how microcontrollers like the NXP i.MX RT106x series (as in Teensy 4.x) already have 1 MiB of RAM available, and ESP32-S2-WROVERs have 2 MiB, even on microcontrollers we are approaching the point where dynamic memory management – that is, using a "generic" pool of memory, which is portioned and provided for each task at hand as needed, and then returned back to the pool; for example via C malloc()/calloc()/realloc()/free() – becomes useful enough to use, even considering the much increased complexity of the code.

In every case, we developers need to make decisions.  The complexity of the code is an important one.  Often it is cheaper and more reasonable to simplify things, even if it means more expensive hardware is needed.  I like to take the complex approach myself on my hobby projects, because I find it fun and interesting and challenging; but it would not make sense if I did this as a job; I'd spend way too much time and effort on things that do not yield corresponding rewards.  (Poor Ed Kloonk has been waiting for almost two weeks now for me to make a small change to a program –– a volume peak meter –– I posted here a year ago, because I just cannot add a kludge on top of a hack, and instead have the innate need to clean it up and make it more reliable and useful.)



The underlying question –– "When to use dynamic memory allocation in C?" –– is a hard one to answer.  It always depends on the situation and the developers involved.

When writing C in a fully featured operating system, there is one anti-pattern I often encounter and rant about that I'd like to point out: avoiding dynamic memory management when it really is useful.

A particular case is when using snprintf() (or worse, sprintf()) to construct a formatted string, using a statically allocated buffer that is arbitrarily limited to some size, hopefully large enough.

The following implements sprint2() for use with any C standard library (C99 or later), and a more efficient sprint() for use on Linux/MacOS/Android/BSDs:
Code: [Select]
#define  _GNU_SOURCE
#define  _DEFAULT_SOURCE
#define  _BSD_SOURCE
#define  _XOPEN_SOURCE 500
#include <stdlib.h>
#include <stdarg.h>
#include <stdio.h>

/* printf() to a dynamically allocated buffer.
   Given a printf() format (and optionally data to be formatted),
   this function returns a dynamically allocated pointer to the
   resulting string.
*/
char *sprint(const char *format, ...)
{
    va_list  args;
    char    *ptr = NULL;
    int      len;

    if (!format) {
        fprintf(stderr, "sprint(): NULL format string specified!\n");
        exit(EXIT_FAILURE);
    }

    va_start(args, format);
    len = vasprintf(&ptr, format, args);
    va_end(args);

    if (len < 0) {
        fprintf(stderr, "sprint(): vasprintf() failed, most likely due to running out of memory.\n");
        exit(EXIT_FAILURE);
    }

    return ptr;
}

/* Alternate, slower implementation for any C99 or later standard library.
*/
#ifndef  SPRINT_MIN_SIZE
#define  SPRINT_MIN_SIZE  64
#endif

char *sprint2(const char *format, ...)
{
    va_list  args;
    char    *ptr;
    size_t   size = SPRINT_MIN_SIZE;
    int      len;

    if (!format) {
        fprintf(stderr, "sprint2(): NULL format string specified!\n");
        exit(EXIT_FAILURE);
    }

    ptr = malloc(size);
    if (!ptr) {
        fprintf(stderr, "sprint2(): Out of memory.\n");
        exit(EXIT_FAILURE);
    }

    while (1) {
        va_start(args, format);
        len = vsnprintf(ptr, size, format, args);
        va_end(args);

        if (len < 0) {
            fprintf(stderr, "sprint(): vasprintf() failed, most likely due to running out of memory.\n");
            exit(EXIT_FAILURE);
        } else
        if ((size_t)len < size) {
            return ptr;
        }

        free(ptr);
        size = (size_t)len + 1; /* +1 for the end-of-string NUL char */
        ptr = malloc(size);
        if (!ptr) {
            fprintf(stderr, "sprint2(): Out of memory.\n");
            exit(EXIT_FAILURE);
        }
    }
}

int main(void)
{
    const char *s = "Hello";
    int         x = 4;
    double      d = 5;

    char *example = sprint2("s=\"%s\", x=%d, and d=%g.\n", s, x, d);
    fputs(example, stdout);
    free(example);

    return EXIT_SUCCESS;
}
You use the sprint()/sprint2() function exactly like you would use printf(), except that instead of printing to standard output, it returns a pointer to the dynamically allocated string.  You do whatever you want with it, and then free() it when you're done.

(We could wrap the function definition in preprocessor macros that choose the appropriate one for the current operating system and C library, so that one could always use just sprint().  I was just too lazy to do it here.)

I have similar rants about reading lines from standard input or files or other streams using fgets(), when all but MSVC provide getline() and getdelim(), that can allocate and reallocate the input buffer as needed.  For example, to output one or more files (or standard input) with line numbering, with both CR LF and LF newlines supported even in the same file:
Code: [Select]
#define  _POSIX_C_SOURCE  200809L
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

int do_file(FILE *in, unsigned long *linenumptr, FILE *out)
{
    unsigned long  linenum = 0;
    char          *line = NULL;
    size_t         size = 0;
    ssize_t        len;
    int            err = 0;

    if (!in) {
        fprintf(stderr, "do_file(): NULL input stream.\n");
        errno = EINVAL;
        return -1;
    }
    if (!out) {
        fprintf(stderr, "do_file(): NULL output stream.\n");
        errno = EINVAL;
        return -1;
    }

    if (linenumptr)
        linenum = *linenumptr;

    while (1) {
        len = getline(&line, &size, in);
        if (len < 0)
            break;

        linenum++;

        /* Remove newline (LF, or CR LF) at end. */
        if (len > 0 && line[len - 1] == '\n') {
            --len;
            if (len > 0 && line[len - 1] == '\r')
                --len;
            line[len] = '\0';
        }

        /* Print the line number, colon, and a space.  Check for errors. */
        if (fprintf(out, "%lu: ", linenum) < 0) {
            err = EIO;
            break;
        }

        /* Since the input may contain embedded NUL chars, we use fwrite() instead of fputs(). */
        if (fwrite(line, 1, (size_t)len, out) != (size_t)len) {
            err = EIO;
            break;
        }

        /* Append a standard newline. */
        fputc('\n', out);
    }

    /* The dynamically allocated line buffer is no longer needed. */
    free(line);
    line = NULL;
    size = 0;

    if (linenumptr)
        *linenumptr = linenum;

    /* Read error? */
    if (!err && (ferror(in) || !feof(in)))
        err = EIO;

    /* Write error? */
    if (ferror(out))
        err = EIO;

    /* Return 0 if no errors. */
    if (!err)
        return 0;

    errno = err;
    return -1;
}

int main(int argc, char *argv[])
{
    unsigned long  lines = 0;
    const char    *name;
    FILE          *in;

    if (argc < 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
        const char *arg0 = (argc > 0 && argv && argv[0] && argv[0][0] != '\0') ? argv[0] : "(this)";
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s [ -h | --help ]\n", arg0);
        fprintf(stderr, "       %s FILENAME | - ...\n", arg0);
        fprintf(stderr, "\n");
        fprintf(stderr, "This program will output the named files with continuous line numbering.\n");
        fprintf(stderr, "For standard input, use '-' as the file name.\n");
        fprintf(stderr, "\n");
        return EXIT_SUCCESS;
    }

    for (int arg = 1; arg < argc; arg++) {
        if (!strcmp(argv[arg], "-")) {
            name = "(standard input)";
            in = stdin;
        } else {
            name = argv[arg];
            in = fopen(name, "rb");
            if (!in) {
                fprintf(stderr, "%s: %s.\n", name, strerror(errno));
                return EXIT_FAILURE;
            }
        }

        if (do_file(in, &lines, stdout)) {
            fprintf(stderr, "%s: %s.\n", name, strerror(errno));
            return EXIT_FAILURE;
        }

        if (in != stdin) {
            if (fclose(in)) {
                fprintf(stderr, "%s: Read error (while closing file).\n", name);
                return EXIT_FAILURE;
            }
        }
    }

    return EXIT_SUCCESS;
}
Most of the code is error checking.  getline() reads a new line from the stream, and also takes a pointer to a dynamically allocated buffer and a pointer to the size allocated to the buffer, with NULL and 0 indicating "no buffer allocated yet", and (re)allocates a buffer large enough for the next line.
This is rock solid.  Even lines containing embedded NUL bytes are output correctly, because it uses fwrite(); fputs() would truncate the line at the first embedded NUL byte.  A simple cleanup function could be used to remove embedded NUL bytes.

Error checking is not something that can be left to be learned later on, or "to be added if I have time later on".  It is either baked in, or it is an afterthought bodge; like fixing a broken part with bubblegum and spittle.

Dynamic memory management in C is not really complicated.  When running under a full operating system, all dynamically allocated memory is released by the OS kernel when the process exits anyway, so memory allocations usually aren't freed just before the program exits.  Otherwise, whatever you allocate, you free() when it is no longer used.  You just have to make sure you don't accidentally leave a pointer pointing to the freed memory and try to access the freed memory that way; this is a common bug, "use after free", and it can be hard to debug unless you "poison" the data before freeing.  (Poisoning is a fancy name for clearing the data to an invalid value.)
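
A minimal sketch of that poisoning pattern; the helper name and the 0xAA poison value are arbitrary:
Code: [Select]
#include <stdlib.h>
#include <string.h>

/* Free a buffer, poisoning its contents and the caller's pointer first,
   so accidental use-after-free is easier to catch during debugging. */
void destroy_buffer(char **pbuf, size_t size)
{
    if (pbuf && *pbuf) {
        memset(*pbuf, 0xAA, size);   /* poison: stale copies of the data are now obviously invalid */
        free(*pbuf);
        *pbuf = NULL;                /* a stale dereference through this pointer now faults predictably */
    }
}

int main(void)
{
    size_t  size = 64;
    char   *buf  = malloc(size);
    if (!buf)
        return EXIT_FAILURE;

    /* ... use buf ... */

    destroy_buffer(&buf, size);      /* buf is NULL from here on */
    return EXIT_SUCCESS;
}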
 
The following users thanked this post: Siwastaja

Online nctnico

  • Super Contributor
  • ***
  • Posts: 26875
  • Country: nl
    • NCT Developments
Re: What is the use of dynamic memory allocation in C language
« Reply #9 on: February 13, 2022, 06:05:43 pm »
Still, having to allocate and free is very error prone. A better approach is to allocate data on the stack (if you need relatively small chunks of memory). In C you are allowed to size arrays at run time, and those go onto the stack. This also very elegantly allows a variable buffer size for vsnprintf().
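
A hedged sketch of that idea: one vsnprintf() call to measure the length, then a C99 variable-length array sized exactly to fit (VLAs are optional in C11 and later, and the stack is assumed to be large enough for the result):
Code: [Select]
#include <stdarg.h>
#include <stdio.h>

/* Format into a stack-allocated buffer sized at run time, and hand it to a
   callback while the buffer is still in scope. */
void with_formatted(void (*use)(const char *), const char *format, ...)
{
    va_list args;

    /* First pass: measure how long the result will be. */
    va_start(args, format);
    int len = vsnprintf(NULL, 0, format, args);
    va_end(args);
    if (len < 0)
        return;                      /* formatting error */

    /* Second pass: format into a VLA sized exactly to fit. */
    char buf[len + 1];
    va_start(args, format);
    vsnprintf(buf, sizeof buf, format, args);
    va_end(args);

    use(buf);
}

static void print_line(const char *s)
{
    puts(s);
}

int main(void)
{
    with_formatted(print_line, "x=%d, s=%s", 42, "hello");
    return 0;
}
Handing the buffer to a callback keeps it alive only while it is in scope, so nothing ever needs to be freed; the trade-off is that a very large formatted result could overflow the stack.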

If you need to use malloc/free, a good approach is to switch to C++, use objects, and make sure all allocated memory is freed when the object is destroyed.
« Last Edit: February 13, 2022, 06:09:58 pm by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6227
  • Country: fi
    • My home page and email address
Re: What is the use of dynamic memory allocation in C language
« Reply #10 on: February 13, 2022, 08:32:29 pm »
Still, having to allocate and free is very error prone.
It is more complex than static allocations or local stack allocations (alloca() in C), but I would not say it is inherently prone to error.

Because it is more complex, and many programmers do not bother to consider or track the lifetime of the dynamically allocated objects, it is common to have bugs in this.  But it is more like people who do not bother to differentiate between which and witch, or write would of or could of instead of the correct would have or could have.  Some do not see the difference – or the lifetime of the dynamically allocated objects – as worth considering, and that is why the errors occur.

In my examples above, both the 'buf = sprint(...); use(buf); free(buf);' and the 'len = getline(&line, &size, stream); use(line, len); free(line); line = NULL; size = 0;' patterns are absolutely rock solid.  Yes, you do need to check for errors, but no more than when using a statically allocated buffer.  Less, if you (like me) consider it okay for a program to abort with an out-of-memory error message instead of trying to work around it, as in the above examples.

Furthermore, there are even more complex things that a C developer needs to worry about when they use C 'in anger'.  A good example is nonblocking I/O, using select()/poll()/epoll().  One of the rants I often end up in is that when using Unix/POSIX low-level I/O, read() and write() from <unistd.h>, the OS kernel is allowed to return a short count, i.e. read or write only some of the initial data in the buffer; there is no guarantee that the full buffer will be read or written.  I often get the retort that "it never happens with normal files", but that is not true: POSIX signal delivery to a userspace handler that was installed without SA_RESTART can interrupt even file reads/writes.  Plus, it can happen at any point with pipes, FIFOs and most sockets (except that freshly created pipes and socket pairs have certain initial atomicity guarantees that depend on the operating system).  It often happens with network sockets, but many programmers think that is because network sockets are somehow special: they are not.  It is just that ordinary local files almost always behave nicely, and because us humans are stupid, we have started believing that just because they almost always behave nicely, we can expect them to always behave nicely.  Which isn't true.
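
A hedged sketch of the usual coping pattern for short counts and EINTR around the POSIX write() call (Linux/BSD/POSIX only; a read_all() counterpart looks much the same):
Code: [Select]
#include <unistd.h>
#include <errno.h>
#include <stddef.h>

/* Write the whole buffer, retrying on short writes and EINTR.
   Returns 0 on success, -1 with errno set on error. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n > 0) {
            p   += n;
            len -= (size_t)n;
        } else if (n == -1 && errno == EINTR) {
            continue;               /* interrupted by a signal delivery; just retry */
        } else {
            return -1;              /* a real error (or an unexpected zero-byte write) */
        }
    }
    return 0;
}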



I have also shown my own preferred approach to linear algebra/matrix interfaces in C.  It uses a simple matrix structure to represent any matrix or matrix view, and refers to a reference-counted data owner object.  The numerical data is stored in the data owner, and the data owner counts how many matrices refer to it.  When the last matrix using it is destroyed, the data owner is destroyed as well.  Because matrix views do not differ in any way from "normal" matrices, you can have two matrices, say t and t_transposed, refer to the exact same data, but with the latter having the data already transposed.  Any change in one is immediately visible in the other.  You can have views of individual rows or columns of another matrix, any diagonal, or any sub-matrix you want.  It Just Works.  And the code itself is trivial, immediately obvious from the data structures.
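
Roughly, a hedged guess at the general shape of such a scheme; the field and function names below are invented for this sketch and are not the actual interface referred to above:
Code: [Select]
#include <stdlib.h>

/* Owns the numerical data, and counts how many matrices (views) refer to it. */
struct owner {
    int     refs;
    double *data;
};

/* A matrix or matrix view: size plus strides into the shared data. */
struct matrix {
    struct owner *owner;
    size_t        rows, cols;
    size_t        rowstep, colstep;   /* element strides; swapping them transposes the view */
    double       *origin;             /* first element of this view */
};

struct matrix matrix_new(size_t rows, size_t cols)
{
    struct owner *o = malloc(sizeof *o);
    if (!o)
        return (struct matrix){ 0 };
    o->refs = 1;
    o->data = calloc(rows * cols, sizeof o->data[0]);
    if (!o->data) {
        free(o);
        return (struct matrix){ 0 };
    }
    return (struct matrix){ o, rows, cols, cols, 1, o->data };
}

/* A transposed view of the same data: no copy, just swapped strides. */
struct matrix matrix_transposed(const struct matrix *m)
{
    struct matrix t = *m;
    t.rows    = m->cols;
    t.cols    = m->rows;
    t.rowstep = m->colstep;
    t.colstep = m->rowstep;
    if (t.owner)
        t.owner->refs++;
    return t;
}

/* Destroying a matrix only frees the data when the last view goes away. */
void matrix_destroy(struct matrix *m)
{
    if (m->owner && --m->owner->refs == 0) {
        free(m->owner->data);
        free(m->owner);
    }
    *m = (struct matrix){ 0 };
}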

As long as you remember to destroy every matrix you create or obtain in any way, you don't need to worry about memory allocation at all.

Thus, management of dynamically allocated objects can be made very easy and robust as well.

Granted, if one were to use e.g. Boehm GC –– a garbage collection library –– then programmers would not even need to remember to destroy the matrices they don't need anymore, but I suspect that is the sort of hand-holding attempt that turned PHP from a promising idea into a pile of crappy security hole opportunities.  When one uses C, one needs to understand that the tool one wields is powerful, and has no safety guards to keep one's fingers and data intact if you make an awkward move.  If you aren't careful enough, C will cut off your arms and legs; it will happily let you create nasal daemons all day long.
However, its low-level power is the other side of the danger coin.  The fact that it does not try to stop the developer from doing something stupid/risky, is largely what makes it so powerful in the first place.
 
The following users thanked this post: DiTBho

Online nctnico

  • Super Contributor
  • ***
  • Posts: 26875
  • Country: nl
    • NCT Developments
Re: What is the use of dynamic memory allocation in C language
« Reply #11 on: February 13, 2022, 11:51:28 pm »
Recently I spent the better part of a week driving around with valgrind to fix a whole bunch of leaks in a C program that I inherited and which was direly needed for a demo. So forgive me for not being a fan of malloc & free at the moment  :o
« Last Edit: February 13, 2022, 11:53:26 pm by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 
The following users thanked this post: Kjelt

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6227
  • Country: fi
    • My home page and email address
Re: What is the use of dynamic memory allocation in C language
« Reply #12 on: February 14, 2022, 12:38:55 am »
Recently I spent the better part of a week driving around with valgrind to fix a whole bunch of leaks in a C program that I inherited and which was direly needed for a demo. So forgive me for not being a fan of malloc & free at the moment  :o
I don't do Perl because of having had to work with a horrible code base (even though I like the authors).  So I know how you feel.  However, it's not the language's fault that crappy code gets written in it; we humans can effortlessly crapify anything and everything.
 
The following users thanked this post: DiTBho

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: What is the use of dynamic memory allocation in C language
« Reply #13 on: February 14, 2022, 07:27:19 am »
Quote
A better approach is to allocate data on the stack (if you need relatively small chunks of memory).
Pretty much ALL of the data that I want to dynamically allocate is MUCH too long-lived (and much too "shared") to put on the stack(s).
Also, in many embedded systems, the stack is relatively limited in size, while the heap can consume all of free memory.
(I suppose you can say that any form of multitasking is also error-prone, dangerous, and to be avoided, but that certainly seems to be widely violated.)
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8167
  • Country: fi
Re: What is the use of dynamic memory allocation in C language
« Reply #14 on: February 14, 2022, 07:34:03 am »
I have also written horrible C beyond salvation, in a stressful situation with a lack of proper resources. I made it super clear to everyone that it needed total replacement. After I left, I felt sad to see an announcement that the company was looking for a C guru to salvage the 30000 LoC project. This changed to joy after hearing that no one took the job, and the main perpetrator ended up with pretty long jail time for unrelated businesses (run with similar attitudes, though).
 
The following users thanked this post: Bassman59, Nominal Animal

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 3894
  • Country: gb
Re: What is the use of dynamic memory allocation in C language
« Reply #15 on: February 14, 2022, 11:35:55 am »
But it is more like people who do not bother to differentiate between which and witch ...

LOL, this is a hilarious expression, which is not only absolutely true in C programming, but is also a distillation of *THE* problem  :o

Furthermore, there are even more complex things that a C developer needs to worry about

Yup. In my case, I "(re?)invented" the wheel(1) with "smart pointers": an object-like struct with function pointers used as callbacks, instead of a simple classic pointer.

This way, when I pass smart pointers between functions, each function can in turn interact directly with the pool in a coherent way, and the pool can do the same; so when you need to read or modify something pointed to by a smart pointer, the pointed-to block is loaded if it is not in memory, and if it's already in memory it also gets flushed back to the disk.

This also implies some intelligence in the pool allocator, in order to avoid wasting I/O cycles. That's why I also used callbacks.
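
A hedged guess at the general shape of such a smart pointer; all names and fields below are invented for this sketch, not taken from the actual code:
Code: [Select]
#include <stdint.h>
#include <stdbool.h>

typedef uint32_t iblock_t;   /* on-disk block address */

/* A "smart pointer": an iblock plus callbacks into the pool, instead of a raw pointer. */
typedef struct smart_ptr smart_ptr_t;
struct smart_ptr {
    iblock_t  iblock;                                 /* which disk block this refers to */
    void     *pool;                                   /* the block cache that owns it */
    void    *(*acquire)(smart_ptr_t *p);              /* load the block into RAM if needed, pin it */
    void     (*release)(smart_ptr_t *p, bool dirty);  /* unpin; schedule a flush if it was modified */
};

/* Every access goes through the callbacks, so the pool stays coherent with the disk. */
void set_first_byte(smart_ptr_t *p, uint8_t value)
{
    uint8_t *block = p->acquire(p);   /* may trigger a disk read and evict another block */
    if (!block)
        return;                       /* block could not be loaded */
    block[0] = value;
    p->release(p, true);              /* marked dirty, so the pool will flush it eventually */
}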

In my B*tree implementation you have to deal with disk blocks, memory, and an algorithm that believes it can address a practically infinite number of blocks (from 0x0000.0001 to 0xffff.fffe, ~(4G - 2) blocks), while the pool can only store a few blocks in memory (currently only 4), so at any instruction the pointers might not be valid and you can't assume anything.

It's probably easier in C++, but I don't know about that. What I do know is that in C ... I had to worry about memory<->disk coherency and minimizing I/O cycles.


(1) basically, it's what a "classic cache" system does, just embedded and hidden behind a C-typedef.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 
The following users thanked this post: Nominal Animal

Offline Miyuki

  • Frequent Contributor
  • **
  • Posts: 903
  • Country: cz
    • Me on youtube
Re: What is the use of dynamic memory allocation in C language
« Reply #16 on: February 14, 2022, 04:14:43 pm »
The main thing is that you cannot have "real" dynamic memory on an MCU or any processor without an MMU.
Sooner or later it will end up fragmented, with no way to solve it.
Blocks must have fixed locations, or be created and released in a fixed order every time.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6227
  • Country: fi
    • My home page and email address
Re: What is the use of dynamic memory allocation in C language
« Reply #17 on: February 14, 2022, 06:05:23 pm »
The main thing is that you cannot have "real" dynamic memory on an MCU or any processor without an MMU.
Sooner or later it will end up fragmented, with no way to solve it.
Blocks must have fixed locations, or be created and released in a fixed order every time.
Absolutely incorrect.  Even (older versions of) the Linux kernel can run MMU-less.  A memory management unit can provide virtual memory and/or limit access to regions of memory, but is in no way necessary for dynamic memory management.  In particular, see uClibc-ng, a standard C library implementation for MMU-less processors.

Furthermore, indirect pointers can be used to avoid memory fragmentation.  Here is an example implementation in C.

First, we have an indirect pointer structure:
Code: [Select]
struct iptr {
    uint32_t  offset;
    uint32_t  length;
    int16_t   used;
    int16_t   refs;
};
We could use a direct pointer instead of an offset, but if we use an offset into the dynamic memory region, this can be used in fully featured C code running under an actual OS (where the region can shrink and grow, requesting more memory from the OS, and returning unneeded memory back to the OS), as well as on a microcontroller.

The length field is optional, and is useful if the indirect pointer is accessed via a helper function (say get_u32(id, offset, errorval)) to eliminate buffer under- and overruns; the errorval being returned if access is attempted outside the buffer (so that no "errors" can ever occur).  The id identifies which struct iptr is being used, but more on that further below.

The refs counter is a simple reference counter, and starts at 1 when the memory is allocated.  However, if these objects are used in a list or a tree or in other ways refer to each other, the used counter can be used to make sure that an allocation still in use is not freed before it is no longer used.  A typical interface is get_iptr(id)/iptr_ref(id) to increment the counter, and put_iptr(id)/iptr_unref(id) to decrement the counter.  There is no "free()" per se; whenever the refs counter drops to zero, the region will be freed.

The used counter is a simple lock, indicating the data itself is being used.  A typical interface is iptr_use_begins(id) at the beginning of some operation that uses the dynamically allocated region, with a corresponding iptr_use_ends(id) when that access or loop completes.

The 'trick' is that when the used counter is zero, the data can be moved in memory, compacting the dynamically allocated memory area, so that the memory used by these allocations can be periodically defragmented.  If all currently used struct iptr used counters are zero, then the compaction is a very efficient single copy loop over the data.

A practical C implementation (that manages the heap size dynamically by obtaining memory from the OS) might have
Code: [Select]
volatile unsigned char  *heap_ptr = NULL;  /* Beginning of dynamic allocation area */
volatile size_t          heap_len = 0;  /* Size of dynamic allocation area */

int           indirect_num = 0;
struct iptr  *indirect_ptr = NULL;

/* Compact dynamic allocations, defragmenting dynamic memory regions.
   Returns the size of the largest free fragment. */
size_t  imem_compact(void);

/* Allocate size * count bytes, returning the ID of the memory.
   Returns negative if not enough memory available.
   This function will call imem_compact() if necessary. */
int  imem_alloc(size_t size, size_t count);

/* Mark the allocated memory as used by something else.
    Before the memory gets actually freed, as many additional imem_discard()
    calls need to be made as the calls to this function. */
void  imem_obtain(int id);

/* Discard previously allocated memory. */
void  imem_discard(int id);

/* Begin an access to the allocated memory.
   Returns a direct pointer to the data, or 'impossible' if invalid.
   NOTE: Each begin must be paired with a corresponding end,
         and the interval between the two should be minimized. */
static inline void *imem_use_begins(int id, void *impossible);
/* End of accesses via the pointer obtained via imem_use_begins(). */
static inline void  imem_use_ends(int id);
where the last two are rather simple,
Code: [Select]
static inline void *imem_use_begins(int id, void *impossible)
{
    /* Check that the ID number refers to a valid iptr. */
    if (id < 0 || id >= indirect_num)
        return impossible;

    /* Check that the iptr can be referenced. */
    if (indirect_ptr[id].refs <= 0)
        return impossible;

    indirect_ptr[id].used++;
    return (void *)(heap_ptr + indirect_ptr[id].offset);
}

static inline void imem_use_ends(int id)
{
    if (id >= 0 && id < indirect_num && indirect_ptr[id].refs > 0 && indirect_ptr[id].used > 0)
        indirect_ptr[id].used--;
}
Additionally, you probably would implement helper functions depending on the actual data dynamically allocated, for example to copy and compare, or get and set individual values:
Code: [Select]
static inline bool  imem_copy_from_ptr(int dst_id, int dst_offset, int length, void *src);
static inline bool  imem_copy_to_ptr(void *dst, int length, int src_id, int src_offset);
static inline bool  imem_copy(int dst_id, int dst_offset, int length, int src_id, int src_offset);
static inline int  imem_cmp_ptr(int id, int offset, int length, void *ptr, int invalid);
static inline int  imem_cmp_imem(int id1, int offset1, int length, int id2, int offset2);

static inline uint32_t  imem_get32(int id, int offset, uint32_t invalid);
static inline uint16_t  imem_get16(int id, int offset, uint16_t invalid);
static inline float  imem_getfloat(int id, int offset, float invalid);
static inline bool  imem_set32(int id, int offset, uint32_t value);
static inline bool  imem_set16(int id, int offset, uint32_t value);
static inline bool  imem_setfloat(int id, int offset, float value);
On microcontrollers with external RAM that is not fully directly accessible to the processor –– very much like what DiTBho described, with on-disk structures! –– a similar interface can be used to access that external RAM via memory apertures or caches of some sort.

These are certainly more complex, and incur a run-time access cost (because we cannot just use the allocated memory, but have to do bookkeeping around each access or set of accesses), compared to using statically allocated memory.  However, this is definitely doable, and assuming the "user code" does not make any silly errors (like keeping an area "in use" just because the programmer thinks they can shave off a few cycles of run time that way, or forgetting to end use somewhere), it is quite robust, and does not easily suffer any issues related to memory fragmentation.

Methods like this were heavily used on 8-bit and even 16-bit processors.  At the hardware level, memory bank switching was often used.  Even today, there are microcontrollers using bank switching; there, an unique pointer is actually a tuple, (bank, address), because the contents of the banked address range depends on the active bank.  Note that this is not really "virtual memory", because we cannot control the addresses; only the data visible in the banked address range can be selected (from a small number of options, depending how many banks are mapped to the same region by the hardware).  In many cases, the bank selection was actually just a standard serial-to-parallel latch, that controlled the highest address lines to the memory chips, as the processor or microcontroller itself did not have sufficient address lines to access all available memory.  Sometimes, the bank switch only switched between ROM and RAM, by manipulating the chip enable pins on the ROM and RAM chips connected to the same address and data buses.

Personally, I have no problems in using dynamic memory management even on small microcontrollers like AVRs.  (The Atmel ATmega32u4 is my favourite 8-bit microcontroller.)  In my use case (using avr-gcc compiler and avr-libc C library), the heap (region of memory dynamic allocations happen from) and stack are a single shared region, with allocations preferring the low end, and stack growing downwards from the high address end.  I have no problems with memory leaks; and the types of things I use memory allocation for, memory fragmentation is not an issue either.

What I do have trouble with is determining at run time whether the stack and heap collide/overlap.  There are many methods to detect if such a collision has occurred (heuristically; no false positives, but with a small chance that a collision might be missed), for example via canary values, but at that point the operation is already unsafe and the device should be restarted... Testing basically consists of using canary values or similar techniques, perhaps periodically measuring the difference between the top of the heap and the bottom of the stack (to determine the amount of leeway between the two), and then trying to run the device through all possible scenarios.  There are free and commercial simulators that can do static (just analyzing the code) and dynamic (running or simulating the code) analysis to stress test and find these things out, but they are architecture-specific, and the best ones mostly cost a lot of money.
Only in the simplest cases can I prove how much heap will be used and how fragmented it can get, how much stack will be needed in the worst case, and, if the available RAM is larger than both combined, that the two will not collide.
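
The periodic heap-to-stack distance check mentioned above can look something like the classic avr-gcc/avr-libc sketch below; the __heap_start and __brkval symbols are what avr-libc provides, and should be treated as an assumption for other toolchains:
Code: [Select]
#include <stddef.h>
#include <stdint.h>

extern char  __heap_start;    /* start of the heap region (after .bss) */
extern char *__brkval;        /* current heap break; NULL if malloc() was never called */

/* Approximate gap between the top of the heap and the bottom of the stack. */
size_t stack_heap_gap(void)
{
    char  marker;             /* lives at (roughly) the current bottom of the stack */
    char *heap_top = __brkval ? __brkval : &__heap_start;
    return (size_t)((uintptr_t)&marker - (uintptr_t)heap_top);
}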

In other words, this stuff is quite complicated and sensitive, the more so the less RAM you have available, but not technically impossible.

When others (and I guess myself too!) say that dynamic memory management is not often used with microcontrollers, they do not really mean it is technically impossible; they mean it is not worth the effort.  That is, if you go down that road without enough experience and skill, you will find very little success and a lot of frustration.

Yet, it is a skill that can be taught, much more so than, say, playing go or chess.  I find it frustrating that so few C programmers are interested in such skills, and instead concentrate on being fast at writing code, because that's what employers pay for.  (The frustration stems from the fact that the results are becoming less and less reliable, with things like CVE-2021-4034 being commonplace.  It's a severe local privilege escalation (letting any local user on the machine get full superuser privileges), due to a very silly programming bug that nobody caught in over a decade.)

Thus, advice that says it is not a good idea to use dynamic memory management on a microcontroller with a very limited amount of RAM stems from practical reasons and human factors, not technical details.  You could say very concisely that "most often, it is not cost-effective to use dynamic memory management on a microcontroller in C".  It does not mean it cannot be done, just that the effort needed to make it work is usually too much, compared to the rewards.



I have a bit of a dichotomy here. :-[

On one hand, I want to encourage programmers to use dynamic memory management, and get good at it, because when you do get good at it, you can do even those complex things in constrained environments, and achieve Very Nice Results.  On the other hand, if you want programming to be your career, being reasonable is much more valuable.  Things like continued maintenance, documentation, and even writing useful comments (describing developer intent and reasoning, and not what the code does), algorithm analysis, algorithm implementation, general problem solving skills, and software design (modularity, software engineering, and so on) are much more valuable (in terms of employment and wages) than skills in a detail like dynamic memory management.

So, when someone like Dadu@ asks about this, I want to both show how and why, but also show why you might NOT want to do it.

The end result?  Walls of text with interspersed code snippets like this post, that nobody has the time or will to read.   Apologies. :-[
 
The following users thanked this post: Ed.Kloonk, newbrain, emece67, DiTBho

Offline emece67

  • Frequent Contributor
  • **
  • !
  • Posts: 614
  • Country: 00
Re: What is the use of dynamic memory allocation in C language
« Reply #18 on: February 14, 2022, 06:25:39 pm »
.
« Last Edit: August 19, 2022, 05:13:05 pm by emece67 »
 
The following users thanked this post: Ed.Kloonk, Nominal Animal

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 3894
  • Country: gb
Re: What is the use of dynamic memory allocation in C language
« Reply #19 on: February 14, 2022, 07:58:43 pm »
Absolutely incorrect.  Even (older versions of) the Linux kernel can run MMU-less.

Yup; for example the Lineo uCsimm/uc68ez328 board (2001-2003) doesn't have any MMU under the hood, but runs a Linux 2.2 kernel patched with the uclib-linux-ng (flat binary) stuff.

Arcturus Networks had something similar.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 3894
  • Country: gb
Re: What is the use of dynamic memory allocation in C language
« Reply #20 on: February 14, 2022, 08:11:13 pm »
On microcontrollers with external RAM that is not fully directly accessible to the processor –– very much like what DiTBho described, with on-disk structures! –– a similar interface can be used to access that external RAM via memory apertures or caches of some sort.

Not a coincidence  :D

Vladimir said he has a stack of kit to dispose of this year, to downsize a lifetime's collection and to free up space, so he sent me a Lineo card (in exchange for a couple of good bottles of vodka), and he was so kind as to also include a CD with the original support provided by Lineo back in 2001.

That code inspired my solution.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3138
  • Country: ca
Re: What is the use of dynamic memory allocation in C language
« Reply #21 on: February 14, 2022, 09:44:11 pm »
I don't think the problem with heap allocation is that you need to free what you allocate. Certainly, if you don't allocate something, or if you free too early, or if you free twice, the effect may be devastating and unpredictable. But this is rather easy to detect and fix.

One problem is fragmentation. The heap may fragment, in which case the memory may become unusable for further allocations. With arbitrary allocation sizes and arbitrary lifespans, it may be difficult to predict how bad the fragmentation may get, and therefore difficult to assess how much memory you need.

Another big problem is that an allocation may fail. In many cases, it is very difficult to figure out what to do if an allocation fails. If you want to do something sensible for every failed allocation, you may need to write lots of code with this sole purpose, and it will take a long time to test all the possible allocation failures.

On PCs, where you will never face fragmentation (as the memory may be remapped by the OS), and your allocations will never fail unless you have a memory leak (as memory is abundant), using the heap is easy and an average program will use lots of dynamic allocations.

On a small MCU, though, these two problems are very real. It may be worth avoiding heap allocations altogether. There are other forms of dynamic allocation, such as stacks (LIFO) or queues (FIFO), which you can use to allocate objects of varying sizes (although C doesn't have any built-in support for that aside from the call stack). If your allocation pattern allows using a stack or a queue instead of a heap, it may be worth considering. These things will never fragment, although you can still run out of memory. Of course, static allocation is the most reliable approach.
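
To illustrate, a minimal sketch of the LIFO (stack-like) variant: variable-sized allocations carved from one static buffer and released strictly in reverse order, so fragmentation cannot occur (names invented for this sketch):
Code: [Select]
#include <stddef.h>
#include <stdint.h>
#include <stdalign.h>

#define ARENA_SIZE 4096

static alignas(max_align_t) uint8_t arena[ARENA_SIZE];
static size_t arena_top;                 /* everything below this offset is in use */

/* Allocate size bytes from the top of the arena; NULL if it would not fit. */
void *arena_alloc(size_t size)
{
    const size_t a = alignof(max_align_t);

    if (size == 0 || size > ARENA_SIZE - arena_top)
        return NULL;
    size = (size + a - 1) / a * a;       /* keep the next allocation suitably aligned */
    if (size > ARENA_SIZE - arena_top)
        return NULL;

    void *p = &arena[arena_top];
    arena_top += size;
    return p;
}

/* Free strictly in reverse (LIFO) order, back down to a previously saved mark. */
size_t arena_mark(void)            { return arena_top; }
void   arena_release(size_t mark)  { arena_top = mark; }
A FIFO/ring-buffer variant works similarly, except that allocations are released from the oldest end instead.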
 
The following users thanked this post: tellurium

Online nctnico

  • Super Contributor
  • ***
  • Posts: 26875
  • Country: nl
    • NCT Developments
Re: What is the use of dynamic memory allocation in C language
« Reply #22 on: February 14, 2022, 10:08:40 pm »
In addition to what NorthGuy wrote: at some point you will want / have to provide proof that your implementation has predictable behaviour. When sharing memory with dynamic allocation between several tasks, this is almost impossible to do. With fixed blocks of memory, using pools with fixed element sizes or FIFO buffers, the proof is much easier to provide.

Take LWIP for example: with LWIP's internal memory pooling, you can precisely predict the worst-case amount of memory it needs per connection. This means that you can set clear boundaries for when the system behaves as expected and at what point it will break.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline dietert1

  • Super Contributor
  • ***
  • Posts: 2053
  • Country: br
    • CADT Homepage
Re: What is the use of dynamic memory allocation in C language
« Reply #23 on: February 14, 2022, 11:18:00 pm »
Maybe one term to mention is "handle". I first met that when learning the original Macintosh OS on one of the shoebox Macs. They used another layer of indirection to keep most of the memory blocks movable, in order to facilitate heap defragmentation. I think it existed in MS Windows, too, but was later absorbed into hardware. I don't know whether that method still exists in any of the free operating systems for MCUs.
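
A hedged sketch of that double indirection; the names are invented here, and this is only the concept, not the actual Mac OS Memory Manager API:
Code: [Select]
#include <stdlib.h>
#include <string.h>

/* A handle is a pointer to a "master pointer" owned by the allocator, so the
   allocator can move the block during compaction and fix up only the master pointer. */
typedef void **Handle;

Handle handle_new(size_t size)
{
    void **master = malloc(sizeof *master);
    if (!master)
        return NULL;
    *master = malloc(size);
    if (!*master) {
        free(master);
        return NULL;
    }
    return master;
}

/* Called by the allocator while compacting: existing handles remain valid. */
void handle_relocate(Handle h, void *new_location, size_t size)
{
    memcpy(new_location, *h, size);
    *h = new_location;
}

/* Application code dereferences the handle only at the point of use; the original
   Mac OS also required "locking" a handle so it was not moved mid-access. */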

Once I managed to load a 2 MB text file into the 4 MB of RAM of the Mac, after several days of trial and error. I don't remember what it was meant to do there.

Regards, Dieter
 

Offline dare

  • Contributor
  • Posts: 39
  • Country: us
Re: What is the use of dynamic memory allocation in C language
« Reply #24 on: February 15, 2022, 12:46:25 am »
Quote
On PCs, where you will never face fragmentation (as the memory may be remapped by the OS), and your allocations will never fail unless you have a memory leak (as memory is abundant), using the heap is easy and an average program will use lots of dynamic allocations.
This isn't really true.  The current discussion is around the use of application-level heap allocators such as malloc/free, not OS page allocators.  In this context, desktop and server applications can absolutely experience fragmentation.  And when this fragmentation gets bad enough the result can indeed be a failure of the application (which I've seen a number of times in my career).  The existence of an MMU and the OS's ability to do things like paging doesn't change this.

It is more correct to say that, in general, the effects of fragmentation are easier to ignore in desktop/server applications because most of these applications run on systems where the total amount of memory is significantly larger than the application's peak memory requirements.  As others have said, once you get into small embedded environments, where total memory is much more constrained, this relationship doesn't hold and heap fragmentation can become a major issue.

 

