Not freeing dynamically allocated memory keeps that memory assigned to your current process. Many operating systems allow administrators to set policies limiting the amount of memory allowed per process (ulimit -v in Linux, Unix, and the BSDs; ulimit -m also exists, but many systems no longer enforce it), and if you exceed the limit, allocations fail.
If you have a Linux, BSD, or macOS computer (it may even work under WSL2 on Windows), here is a related, illustrative POSIX C example you might consider:
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
    char *line = NULL;
    size_t size = 0;
    ssize_t len;
    unsigned long linenum = 0;

    while (1) {
        // Read the next line into the dynamically resized line buffer.
        len = getline(&line, &size, stdin);
        // getline() returns -1 at end of input or on error.
        if (len == -1)
            break;
        // Count lines.
        linenum++;
    }

    // The line buffer is no longer needed, so we free it.
    free(line);
    line = NULL; // Optional, but resets the variables to their original state,
    size = 0;    // allowing safe reuse.

    // Check for errors. If feof() reports not-end-of-file, getline() failed,
    // most likely because it ran out of memory.
    if (ferror(stdin) || !feof(stdin)) {
        fprintf(stderr, "Error reading from standard input.\n");
        return EXIT_FAILURE;
    }

    printf("Read %lu lines.\n", linenum);
    return EXIT_SUCCESS;
}
If you run it on the command line, pressing Ctrl+D at the beginning of a new line ends the input.
Internally, getline() reuses the buffer it is given if the size pointed to is sufficient. Otherwise, it calls realloc() (or equivalent internal magic) to grow the buffer until it is large enough to hold the entire line, including a terminating nul ('\0') character, updating both the pointer and the size. It returns the number of characters read, including the newline but excluding the terminating nul, or -1 if end of input occurs or it cannot allocate enough memory for the line. This means it has no line length limitation, and it can handle input with embedded nuls (end-of-string characters) just fine.
Do we need the free(line); at all? Because the process is about to exit, it is not strictly needed, and technically it just does extra work. If we did more work afterwards, it would make sense. However, it is the nice thing to do, and if you always do it, you can use tools like Valgrind to detect memory leaks.
What happens if there is no input at all? Nothing bad, because
free(NULL); is
explicitly safe to do, and does nothing (see
man 3 free).
If you add fputs(line, stdout) near where the line count is incremented, it will print each input line as-is, except that it stops at the first embedded nul, if any; fwrite(line, 1, len, stdout) will print each input line as-is (and return len if successful: it too may fail, for example because the output went to a pipe and the pipe closed early due to the reader exiting).
However, you can also discard the entire line buffer there (
free(line); line=NULL; size=0;), and nothing bad will happen. On the next call,
getline() will see the zero size, and just allocate a new one. Nice!
Let's say you are looking for a line that matches a specific pattern: a POSIX regular expression via <regex.h> (regcomp(), regexec(), regfree(); see POSIX Basic Regular Expressions), a directory glob pattern via <fnmatch.h>, a fixed substring via strstr(), or an exact full line match via strcmp(). If you wanted a copy of that line, you cannot just remember the value of
line, because it always points to the buffer. (It may change if the buffer is resized, though.) You would create a new dynamically allocated
copy of it via e.g.
strdup() or
malloc()+
memcpy().
Let's say we want to
tokenize each line into whitespace-separated substrings (or words):
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

int main(void)
{
    char *line = NULL;
    size_t size = 0;
    ssize_t len;
    char **token = NULL;
    size_t maxtokens = 0;
    unsigned long linenum = 0;

    while (1) {
        // Read the next line into the dynamically resized line buffer.
        len = getline(&line, &size, stdin);
        // getline() returns -1 at end of input or on error.
        if (len == -1)
            break;
        // Count lines.
        linenum++;

        // Split the line into tokens.
        size_t tokens = 0;
        char *saveptr, *next;
        next = strtok_r(line, "\t\n\v\f\r ", &saveptr);
        while (next) {
            // Make sure the token array has room for another token
            // and a terminating NULL pointer.
            if (tokens + 2 > maxtokens) {
                size_t new_maxtokens = (tokens | 7) + 5; // Growth policy!
                char **new_token = realloc(token, new_maxtokens * sizeof token[0]);
                if (!new_token) {
                    fprintf(stderr, "Out of memory.\n");
                    exit(EXIT_FAILURE);
                }
                token = new_token;
                maxtokens = new_maxtokens;
            }
            token[tokens++] = next;
            token[tokens] = NULL; // So there is always a NULL pointer after the valid tokens.
            next = strtok_r(NULL, "\t\n\v\f\r ", &saveptr);
        }

        // We can access token[i] here, as long as i >= 0 and i < tokens.
        // Note that tokens may be zero, in which case token may even be NULL!
        printf("Line %lu: %zu words:\n", linenum, tokens);
        for (size_t i = 0; i < tokens; i++)
            printf("  token[%zu] = \"%s\"\n", i, token[i]);
    }

    // The token pointer array is no longer needed, so we discard it.
    free(token);
    token = NULL;
    maxtokens = 0;

    // The line buffer is no longer needed either, so we free it.
    free(line);
    line = NULL; // Optional, but resets the variables to their original state,
    size = 0;    // allowing safe reuse.

    // Check for errors. If feof() reports not-end-of-file, getline() failed,
    // most likely because it ran out of memory.
    if (ferror(stdin) || !feof(stdin)) {
        fprintf(stderr, "Error reading from standard input.\n");
        return EXIT_FAILURE;
    }

    printf("Read %lu lines.\n", linenum);
    return EXIT_SUCCESS;
}
Now, the pointers in token point into the current line buffer, so they are only valid as long as line is valid. strtok_r() modifies the line buffer contents, overwriting the delimiting ASCII whitespace characters with end-of-string nul characters. There is no fixed limit on the number of tokens/words on each line, because we grow the token array as needed; we can discuss what kind of growth policy one should use here. Because off-by-one errors are common (and because argv[] has it too), we make sure that whenever there is at least one token/word pointer, the next one is always NULL.
(This one uses a linear growth policy: the currently needed token count is rounded up to the next multiple of eight, plus four, so the size sequence is 12, 20, 28, 36, and so on. I chose this because the typical number of words per line is small. In general, exponential growth policies are more effective. Here, any policy that makes the new count at least 2 larger than the current number of tokens will work, even a simple maxtokens + 2. Reallocations are relatively slow, but we do not want to waste too much memory at run time either; growth policies balance the reallocation work against reserving memory whether it is used or not.)
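As a sketch of the alternative, an exponential (roughly 1.5x) growth policy could look like the helper below; grow_tokens() and its parameter names are mine, not part of the program above:

```c
#include <stdlib.h>

// Exponential growth policy sketch: grow the token pointer array by
// about 50% whenever 'want' entries no longer fit. Returns the (possibly
// moved) array and updates *maxtokens, or NULL on allocation failure,
// in which case the caller still owns the old array.
static char **grow_tokens(char **token, size_t *maxtokens, size_t want)
{
    if (want <= *maxtokens)
        return token;

    size_t new_max = *maxtokens + *maxtokens / 2;
    if (new_max < want)
        new_max = want;
    if (new_max < 8)
        new_max = 8;

    char **new_token = realloc(token, new_max * sizeof new_token[0]);
    if (!new_token)
        return NULL;

    *maxtokens = new_max;
    return new_token;
}
```

The same caller pattern applies as with the linear policy: assign the result to a temporary, and only overwrite the original pointer when the reallocation succeeded.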
Again, to save one token, you need to copy it. To save multiple tokens, you need to allocate space for the character data (including string-terminating nul characters) as well as the pointers. If you just copy the pointers from
token, you get copies of pointers pointing to where the line buffer was at one point in time. Even if you take a copy of the line buffer, you need to adjust your copied pointers to point to the new buffer. A proper way to do that would be something like
typedef struct {
    size_t  count;    // Number of tokens in this buffer
    char   *token[];  // Token pointers (token[count] == NULL)
} tokenbuf;

tokenbuf *tokenbuf_create(ssize_t n, char **t)
{
    // n must be positive, and t non-NULL.
    if (!t || n < 1) {
        errno = EINVAL;
        return NULL;
    }

    // Since n is of ssize_t type, but we know it is positive,
    // we can use (size_t)n to 'cast' it to the normal size_t type.

    // Find out the size needed for the character data in the tokens.
    size_t ntoks = 0;
    size_t chars = 0;
    for (size_t i = 0; i < (size_t)n; i++) {
        // Do not count NULL tokens.
        if (t[i]) {
            chars += strlen(t[i]) + 1; // Include the end-of-string nul byte.
            ntoks++;
        }
    }

    // If there are no tokens to copy, return NULL.
    if (!ntoks) {
        errno = 0; // It is not an error.
        return NULL;
    }

    // Allocate a sufficient area for the structure, including the
    // pointers and the string data.
    tokenbuf *tb = malloc(sizeof (tokenbuf) + (ntoks + 1) * sizeof tb->token[0] + chars);
    if (!tb) {
        errno = ENOMEM;
        return NULL;
    }

    // Locate the start of the character data for the first token,
    char *next = (char *)(tb->token + ntoks + 1);
    // and copy each token and set the pointer.
    size_t ti = 0;
    for (size_t i = 0; i < (size_t)n; i++) {
        if (t[i]) {
            size_t tlen = strlen(t[i]);   // Number of chars in token (excluding '\0')
            tb->token[ti] = next;         // Set the pointer to where the copy will be,
            memcpy(next, t[i], tlen + 1); // and copy the chars plus the trailing '\0'.
            next += tlen + 1;
            ti++;
        }
    }
    tb->token[ti] = NULL;
    tb->count = ti;

    // A careful sanity check follows.
    if (ti != ntoks || next != (char *)(tb->token + ntoks + 1) + chars) {
        // Something changed between when we scanned the tokens and when we copied them.
        free(tb);
        errno = EINTR; // "Interrupted", yes, but not by a signal... Good enough description.
        return NULL;
    }

    return tb;
}
which you can also use to copy command-line arguments, say tokenbuf *args = tokenbuf_create(argc - 1, argv + 1); (assuming you declared int main(int argc, char *argv[])); or, in the above program, calling tokenbuf *args = tokenbuf_create(tokens, token); whenever tokens > 0.
Then, args->count is the number of tokens in the buffer, and args->token[i], for i >= 0 and i < args->count, points to each token in it. To discard both the pointers and the data they point to, just call free(args), since tokenbuf_create() allocates everything in one linear memory chunk.
The careful check at the end verifies that the token data did not change between scanning and copying. It does not make it safe to concurrently modify the line buffer or the token pointers in another thread; it only makes it easier to detect such modification if it were ever to happen. Since we have the information about whether everything went as expected, I think it is sensible to let the caller know!
Within the code, the expression tb->token is a char **, i.e. a pointer to a pointer to char, approximately equivalent to an array of pointers to char, or an array of strings. sizeof tb->token[0] is an expression that yields the size (in chars) of each element in the tb->token array, which is sizeof (char *). It is safe even if tb is NULL or uninitialized, because sizeof is an operator, not a function: it only looks at the type of its operand, so tb is not actually dereferenced and no memory is accessed here. The very reason I write sizeof expression and not sizeof(expression) is exactly that it does NOT behave like a function. Even sizeof i++ does not actually increment i! If I used parentheses, I might mistake it for a function call, forget this special behaviour, and that can lead to bugs.
The expression (tb->token + ntoks + 1) is exactly the same as &(tb->token[ntoks + 1]). I read it as a pointer to where the pointer at index ntoks+1 is in the tb->token array, because, as explained above, tb->token is an array of pointers. Note the 'is': it is not 'points to'. It is the location in memory where that pointer is stored, not where that pointer would point to. When it is cast to char *, it becomes a pointer to just past the token pointers (including the final NULL pointer), which is where the pointed-to character data begins.
Reading and understanding pointer expressions correctly is very, very important in C, so I suggest you spend some extra effort with real-life code.