Not freeing dynamically allocated memory keeps that memory assigned to your current process. Many operating systems allow administrators to set policies limiting the amount of memory allowed per process (ulimit -v in Linux, Unix, and the BSDs; ulimit -m also exists, but many systems no longer enforce it), and if you exceed the limit, allocations fail.
If you have a Linux, BSD, or macOS computer (it may even work under WSL2 on Windows), here is a related, illustrative POSIX C example you might consider:
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
    char *line = NULL;
    size_t size = 0;
    ssize_t len;
    unsigned long linenum = 0;

    while (1) {
        // Read the next line into the dynamically resized line buffer.
        len = getline(&line, &size, stdin);
        // getline() returns -1 at end of input or on error.
        if (len == -1)
            break;
        // Count lines.
        linenum++;
    }

    // The line buffer is no longer needed, so we free it.
    free(line);
    line = NULL; // Optional, but resets the variables to their original state,
    size = 0;    // allowing safe reuse.

    // Check for errors. If feof() reports not-end-of-file, getline() failed,
    // most likely because it ran out of memory.
    if (ferror(stdin) || !feof(stdin)) {
        fprintf(stderr, "Error reading from standard input.\n");
        return EXIT_FAILURE;
    }

    printf("Read %lu lines.\n", linenum);
    return EXIT_SUCCESS;
}
If you run it on the command line, pressing Ctrl+D at the beginning of a new line ends the input.
Internally, getline() reuses the buffer it is given if the size pointed to is sufficient. Otherwise, it calls realloc() (or equivalent internal magic) to grow the buffer until it is large enough to hold the entire line, including a terminating nul ('\0') character, updating both the pointer and the size. It returns the number of characters read, including the newline but excluding the terminating nul, or -1 if end of input occurs or it cannot allocate enough memory for the line. This means it has no line length limitation, and it can handle input with embedded nuls (end-of-string characters) just fine.
Do we need the free(line); at all? Because the process is about to exit, it is not strictly needed, and technically it just does extra work. If we did more work afterwards, it would make sense. However, it is the nice thing to do, and if you always do it, you can use tools like Valgrind to detect memory leaks.
What happens if there is no input at all? Nothing bad, because
free(NULL); is
explicitly safe to do, and does nothing (see
man 3 free).
If you add fputs(line, stdout) near where the line count is incremented, it will print each input line as-is, except that it stops at the first embedded nul, if any; fwrite(line, 1, len, stdout) will print each input line as-is (and return len if successful: it too may fail, for example because the output went to a pipe and the pipe closed early due to the reader exiting).
However, you can also discard the entire line buffer there (
free(line); line=NULL; size=0;), and nothing bad will happen. On the next call,
getline() will see the zero size, and just allocate a new one. Nice!
Let's say you are looking for a line that matches a specific pattern: a POSIX regular expression via <regex.h> (regcomp(), regexec(), regfree(); see POSIX Basic Regular Expressions), a directory glob pattern via <fnmatch.h>, a fixed substring via strstr(), or an exact full line match via strcmp(). If you wanted a copy of that line, you cannot just remember the value of
line, because it always points to the buffer. (It may change if the buffer is resized, though.) You would create a new dynamically allocated
copy of it via e.g.
strdup() or
malloc()+
memcpy().
Let's say we want to
tokenize each line into whitespace-separated substrings (or words):
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

int main(void)
{
    char *line = NULL;
    size_t size = 0;
    ssize_t len;
    char **token = NULL;
    size_t maxtokens = 0;
    unsigned long linenum = 0;

    while (1) {
        // Read the next line into the dynamically resized line buffer.
        len = getline(&line, &size, stdin);
        // getline() returns -1 at end of input or on error.
        if (len == -1)
            break;
        // Count lines.
        linenum++;

        // Split the line into tokens.
        size_t tokens = 0;
        char *saveptr, *next;
        next = strtok_r(line, "\t\n\v\f\r ", &saveptr);
        while (next) {
            // Make sure the token array has room for another token
            // and a terminating NULL pointer.
            if (tokens + 2 > maxtokens) {
                size_t new_maxtokens = (tokens | 7) + 5; // Growth policy!
                char **new_token = realloc(token, new_maxtokens * sizeof token[0]);
                if (!new_token) {
                    fprintf(stderr, "Out of memory.\n");
                    exit(EXIT_FAILURE);
                }
                token = new_token;
                maxtokens = new_maxtokens;
            }
            token[tokens++] = next;
            token[tokens] = NULL; // So there is always a NULL pointer after the valid tokens.
            next = strtok_r(NULL, "\t\n\v\f\r ", &saveptr);
        }

        // We can access token[i] here, as long as i >= 0 and i < tokens.
        // Note that tokens may be zero, in which case token may even be NULL!
        printf("Line %lu: %zu words:\n", linenum, tokens);
        for (size_t i = 0; i < tokens; i++)
            printf("  token[%zu] = \"%s\"\n", i, token[i]);
    }

    // The token pointer array is no longer needed, so we discard it.
    free(token);
    token = NULL;
    maxtokens = 0;

    // The line buffer is no longer needed either, so we free it.
    free(line);
    line = NULL; // Optional, but resets the variables to their original state,
    size = 0;    // allowing safe reuse.

    // Check for errors. If feof() reports not-end-of-file, getline() failed,
    // most likely because it ran out of memory.
    if (ferror(stdin) || !feof(stdin)) {
        fprintf(stderr, "Error reading from standard input.\n");
        return EXIT_FAILURE;
    }

    printf("Read %lu lines.\n", linenum);
    return EXIT_SUCCESS;
}
Now, the pointers in token point into the current line buffer, so they are only valid as long as line is valid. strtok_r() modifies the line buffer contents, overwriting the delimiting ASCII whitespace characters with end-of-string nul characters. There is no fixed limit on the number of tokens/words on each line, because we grow the token array as needed; we can discuss what kind of growth policy one should use here. Because off-by-one errors are common (and because argv[] has it too), we make sure that whenever there is at least one token/word pointer, the next one is always NULL.
(This one uses a linear growth policy: the currently needed token count is rounded up to the next multiple of eight, plus four, so the size sequence is 12, 20, 28, 36, and so on. I chose this because the typical number of words per line is small. In general, exponential growth policies are more effective. Here, any policy that makes the new count at least 2 larger than the current number of tokens will work, even a simple maxtokens + 2. Reallocations are relatively slow, but we do not want to waste too much memory at run time either; growth policies balance the reallocation work against reserving memory whether it is used or not.)
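As a sketch of the alternative, an exponential (roughly 1.5x) growth policy could look like the helper below; grow_tokens() and its parameter names are mine, not part of the program above:

```c
#include <stdlib.h>

// Exponential growth policy sketch: grow the token pointer array by
// about 50% whenever 'want' entries no longer fit. Returns the (possibly
// moved) array and updates *maxtokens, or NULL on allocation failure,
// in which case the caller still owns the old array.
static char **grow_tokens(char **token, size_t *maxtokens, size_t want)
{
    if (want <= *maxtokens)
        return token;

    size_t new_max = *maxtokens + *maxtokens / 2;
    if (new_max < want)
        new_max = want;
    if (new_max < 8)
        new_max = 8;

    char **new_token = realloc(token, new_max * sizeof new_token[0]);
    if (!new_token)
        return NULL;

    *maxtokens = new_max;
    return new_token;
}
```

The same caller pattern applies as with the linear policy: assign the result to a temporary, and only overwrite the original pointer when the reallocation succeeded.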
Again, to save one token, you need to copy it. To save multiple tokens, you need to allocate space for the character data (including string-terminating nul characters) as well as the pointers. If you just copy the pointers from
token, you get copies of pointers pointing to where the line buffer was at one point in time. Even if you take a copy of the line buffer, you need to adjust your copied pointers to point to the new buffer. A proper way to do that would be something like
typedef struct {
    size_t  count;    // Number of tokens in this buffer
    char   *token[];  // Token pointers (token[count] == NULL)
} tokenbuf;

tokenbuf *tokenbuf_create(ssize_t n, char **t)
{
    // n must be positive, and t non-NULL.
    if (!t || n < 1) {
        errno = EINVAL;
        return NULL;
    }

    // Since n is of ssize_t type, but we know it is positive,
    // we can use (size_t)n to 'cast' it to the normal size_t type.

    // Find out the size needed for the character data in the tokens.
    size_t ntoks = 0;
    size_t chars = 0;
    for (size_t i = 0; i < (size_t)n; i++) {
        // Do not count NULL tokens.
        if (t[i]) {
            chars += strlen(t[i]) + 1; // Include the end-of-string nul byte.
            ntoks++;
        }
    }

    // If there are no tokens to copy, return NULL.
    if (!ntoks) {
        errno = 0; // It is not an error.
        return NULL;
    }

    // Allocate a sufficient area for the structure, including the
    // pointers and the string data.
    tokenbuf *tb = malloc(sizeof (tokenbuf) + (ntoks + 1) * sizeof tb->token[0] + chars);
    if (!tb) {
        errno = ENOMEM;
        return NULL;
    }

    // Locate the start of the character data for the first token,
    char *next = (char *)(tb->token + ntoks + 1);
    // and copy each token and set the pointer.
    size_t ti = 0;
    for (size_t i = 0; i < (size_t)n; i++) {
        if (t[i]) {
            size_t tlen = strlen(t[i]);   // Number of chars in token (excluding '\0')
            tb->token[ti] = next;         // Set the pointer to where the copy will be,
            memcpy(next, t[i], tlen + 1); // and copy the chars plus the trailing '\0'.
            next += tlen + 1;
            ti++;
        }
    }
    tb->token[ti] = NULL;
    tb->count = ti;

    // A careful sanity check follows.
    if (ti != ntoks || next != (char *)(tb->token + ntoks + 1) + chars) {
        // Something changed between when we scanned the tokens and when we copied them.
        free(tb);
        errno = EINTR; // "Interrupted", yes, but not by a signal... Good enough description.
        return NULL;
    }

    return tb;
}
which you can also use to copy command-line arguments, say tokenbuf *args = tokenbuf_create(argc - 1, argv + 1); (assuming you declared int main(int argc, char *argv[])); or, in the above program, calling tokenbuf *args = tokenbuf_create(tokens, token); whenever tokens > 0.
Then, args->count is the number of tokens in the buffer, and args->token[i], for i >= 0 and i < args->count, points to each token in it. To discard both the pointers and the data they point to, just call free(args), since tokenbuf_create() allocates everything in one linear memory chunk.
The careful check at the end verifies that the token data did not change between scanning and copying. It does not make it safe to concurrently modify the line buffer or the token pointers in another thread; it only makes it easier to detect such modification if it were ever to happen. Since we have the information about whether everything went as expected, I think it is sensible to let the caller know!
Within the code, the expression tb->token is a char **, i.e. a pointer to a pointer to char, approximately equivalent to an array of pointers to char, or an array of strings. sizeof tb->token[0] is an expression that yields the size (in chars) of each element in the tb->token array, which is sizeof (char *). It is safe even if tb is NULL or uninitialized, because sizeof is an operator, not a function: it only looks at the type of its operand, so tb is not actually dereferenced and no memory is accessed here. The very reason I write sizeof expression and not sizeof(expression) is exactly that it does NOT behave like a function. Even sizeof i++ does not actually increment i! If I used parentheses, I might mistake it for a function call, forget this special behaviour, and that can lead to bugs.
The expression (tb->token + ntoks + 1) is exactly the same as &(tb->token[ntoks + 1]). I read it as a pointer to where the pointer at index ntoks+1 is in the tb->token array, because, as explained above, tb->token is an array of pointers. Note the 'is': it is not 'points to'. It is the location in memory where that pointer is stored, not where that pointer would point to. When it is cast to char *, it becomes a pointer to just past the token pointers (including the final NULL pointer), which is where the pointed-to character data begins.
Reading and understanding pointer expressions correctly is very, very important in C, so I suggest you spend some extra effort with real-life code.