How to change a buffer from main RAM (DATA or BSS) to the heap?

DavidAlfa:
Holy sh**. I will need some time to process this :-DD
What about this?

--- Code: ---  *(ucHeap+0)=0;                               // Write 0 to first heap byte
  *(ucHeap+configTOTAL_HEAP_SIZE-1)=0xFF;      // Write 0xFF to last heap byte

--- End code ---

newbrain:

--- Quote from: Nominal Animal on May 31, 2021, 11:29:46 pm ---Our two answers are in complete agreement, just look at it from different perspectives.  (Assume any differences are my wording errors; me often fail English.)

--- End quote ---
Of course they are (I never find anything to object to in your code): unless someone is using implementation-defined or unspecified behaviour (neither is the case here), the standard will tell you exactly what to expect, and for implementation-defined behaviour it also describes the alternatives/ranges.


--- Quote from: Nominal Animal, edits by me ---I honestly just don't perceive those problems [with pointers and pointer dereferencing] at all anymore, and don't remember if I ever did (nor how);

--- End quote ---
Same here - I had to unlearn bad C picked up from bad books (may Schildt experience nasal demons for eternity) - my cure was reading (and, I hope, understanding) the actual C standard(s), but I used to have more problems with Pascal pointers!

Ah, slicing in Python: love it, it reminds me of when I dabbled in APL. But I always need more than one try to get it right  :-[

newbrain:

--- Quote from: DavidAlfa on June 01, 2021, 06:50:58 am ---What about this?

--- Code: ---  *(ucHeap+0)=0;                               // Write 0 to first heap byte
  *(ucHeap+configTOTAL_HEAP_SIZE-1)=0xFF;      // Write 0xFF to last heap byte

--- End code ---

--- End quote ---
DavidAlfa, I think I see the fundamental problem with your understanding.
As I said in my first post, eschewing the use of pointers to arrays is in general a good thing, as their behaviour hides some traps for players of all ages.

I'll pick this apart again, hoping I make myself clearer, using the simpler expressions above.

What is ucHeap?
Its definition tells us that ucHeap is a pointer to an object of ucHeap_t type.

What is an object of ucHeap_t type?
The typedef tells us a ucHeap_t object is an array of configTOTAL_HEAP_SIZE elements of type uint8_t.

What's the size of such object?
Easy: you correctly used it in the malloc() statement. It's configTOTAL_HEAP_SIZE times the size of uint8_t, which must be one, hence configTOTAL_HEAP_SIZE.

What happens (looking at actual memory addresses) when we add (or subtract) an integer to a pointer?
The integer is multiplied by the size of the object pointed to and added to (subtracted from) the pointer to yield the new address.

In our case, the offset is multiplied by configTOTAL_HEAP_SIZE, so in the second statement we are adding (configTOTAL_HEAP_SIZE - 1) × configTOTAL_HEAP_SIZE to the address contained in ucHeap, which is definitely a bit too much!

The only offset (or index; [] is just "syntactic sugar" for pointer arithmetic) you can use with ucHeap is, in fact, 0, as there is only one element of ucHeap_t type (also 1, but only if the result is never dereferenced).

The main takeaways are:

* A pointer to an array is not the same as a pointer to an element of the array:
Though they might correspond to numerically equal addresses in memory, they are objects of different types, and follow the rules accordingly.
* Array names, in most contexts (exceptions: the sizeof and & operators), 'decay' to pointers to their first element, but they are not pointers to arrays (well, unless the elements themselves are arrays, as in int a[10][20]; )
Standard references are the same as my previous post, plus, for array decaying into pointers: C11, 6.3.2.1  Lvalues, arrays, and function designators, §3

dunkemhigh:

--- Quote ---I'll pick this apart again, hoping I make myself clearer, using the simpler expressions above.
--- End quote ---

Whilst you give excellent detail of what's going on, I think the solutions proposed so far kind of muddy the water. Instead of addressing the intent of the original and why/where it falls over, they put forward detailed solutions with a different intent.

Perhaps fixing the original in a minimal way would highlight better what the problem is there. With that in mind...


--- Code: ---#include <stdint.h>
#include <stdlib.h>
// configTOTAL_HEAP_SIZE is assumed to be defined elsewhere (e.g. FreeRTOSConfig.h)

typedef uint8_t ucHeap_t;
ucHeap_t *ucHeap;

int main(void){
    ucHeap = malloc(sizeof(ucHeap_t) * configTOTAL_HEAP_SIZE);          // Malloc heap, assign pointer to ucHeap
    ucHeap[0]=0;                               // Write 0 to first heap byte
    ucHeap[configTOTAL_HEAP_SIZE-1]=0xFF;      // Write 0xFF to last heap byte
    return 0;
}

--- End code ---

As I saw it, the intent was to malloc a bunch of memory and then access that memory as an array, which is what this does. The main problem with it was just the invalid use of *, presumably through not grasping what it means (a common issue when learning this stuff). The solutions proposed, whilst obviously correct, have tried to include the * and thus hidden what the original problem was.

Nominal Animal:
I agree with dunkemhigh above, but want to take it further.

In dunkemhigh's minimal example, the size of the dynamically allocated array ucHeap is a compile-time constant, configTOTAL_HEAP_SIZE.  What do we gain by having it as a compile-time constant? Not much, really.  The array now resides in the heap instead of uninitialized data, but that's it.

I know OP explicitly asked how to do this.  I just think the question itself is wrong.  Or rather, that the answer to the question solves the actual problem OP is trying to solve by doing this, but that solution is not the sensible/efficient/correct one.

So, let's explore this stuff a bit further, because I don't think we can hope for OP to describe what they are trying to solve by doing this.

First thing we could do to dunkemhigh's minimal example is add the size as a variable, too:

--- Code: ---#include <stdint.h>
#include <stdlib.h>

typedef  uint8_t  ucHeap_t;
ucHeap_t  *ucHeap;
size_t  ucHeap_size;

int main(void) {
    ucHeap_size = configTOTAL_HEAP_SIZE;
    ucHeap = malloc(ucHeap_size * sizeof ucHeap[0]);

    ucHeap[0] = 0;  // Set first heap element (byte) to zero
    ucHeap[ucHeap_size-1] = 0xFF;  // Set last heap element (byte) to 0xFF

    return 0;
}

--- End code ---
What did we win here?  Nothing yet.  However, what if we don't allocate it before we actually need it?  Then, we can initialize the array to zero size (or "not available", whatever you wish), and only allocate it when it is needed.  Then, we can also release it when no longer needed, if there are long periods when we do other stuff that does not use ucHeap at all:

--- Code: ---#include <stdint.h>
#include <stdlib.h>

typedef  uint8_t  ucHeap_t;
ucHeap_t  *ucHeap = NULL;
size_t  ucHeap_size = 0;

void ucHeap_init(void) {
    if (ucHeap_size != configTOTAL_HEAP_SIZE) {
        if (ucHeap_size > 0) {
            free(ucHeap);
        }
        ucHeap_size = configTOTAL_HEAP_SIZE;
        ucHeap = malloc(ucHeap_size * sizeof ucHeap[0]);
    }
}

void ucHeap_free(void) {
    free(ucHeap);
    ucHeap = NULL;
    ucHeap_size = 0;
}

int main(void) {
    ucHeap_init();

    ucHeap[0] = 0;  // Set first heap element (byte) to zero
    ucHeap[ucHeap_size-1] = 0xFF;  // Set last heap element (byte) to 0xFF

    return 0;
}

--- End code ---
However, now the compile-time size setting, configTOTAL_HEAP_SIZE, no longer makes much sense.  Instead of setting a compile-time constant, why don't we just pass the size we need when we first need the heap?  That's even simpler, you see:

--- Code: ---#include <stdint.h>
#include <stdlib.h>

uint8_t  *ucHeap = NULL;
size_t  ucHeap_size = 0;

void ucHeap_need(size_t size) {
    // If we already have a large enough heap allocated, we're good.
    if (ucHeap_size >= size)
        return;

    // Resize the existing heap to the size needed.
    // Note: realloc(NULL, size) is equivalent to malloc(size).
    void *old = ucHeap;
    ucHeap = realloc(ucHeap, size * sizeof ucHeap[0]);
    if (!ucHeap) {
        // Reallocation failed, but the old heap (at old) still exists.
        // TODO: Fail, abort, restart, whatever here.  Cannot continue.
    }
    ucHeap_size = size;
}

void ucHeap_free(void) {
    free(ucHeap);
    ucHeap = NULL;
    ucHeap_size = 0;
}

int main(void) {

    // We do something, that needs a heap of 5000 bytes.
    ucHeap_need(5000);
    ucHeap[0] = 0;
    ucHeap[4999] = 0xFF;

    // Something else needs a heap of 7000 bytes.
    ucHeap_need(7000);
    // Note, ucHeap[0] is still 0 and ucHeap[4999] is still 255.
    ucHeap[6999] = 0xFF;

    return 0;
}

--- End code ---
See how, by factoring out the allocation/reallocation logic, the array-using code becomes much more logical?

We did create a new problem, though: what to do when the allocation/reallocation fails.  The reasons for such a failure are twofold: running out of available memory, or having the available memory fragmented, with allocated and freed memory scattered about so that, although the sum total of unused memory is greater than what we need, there isn't a large enough contiguous section we could use.

In cases where the previous contents of the array are no longer needed, we can just destroy the old one, then allocate the new one.  That way the C library and the OS have a better chance of finding a suitable free memory fragment:

--- Code: ---void ucHeap_init(size_t size, unsigned char zero)
{
    if (ucHeap_size >= size) {
        memset(ucHeap, zero, size * sizeof ucHeap[0]);
        return;
    }

    free(ucHeap);
    ucHeap = malloc(size * sizeof ucHeap[0]);
    if (!ucHeap) {
        // Out of memory. Abort, exit, reboot etc.
    }
    memset(ucHeap, zero, size * sizeof ucHeap[0]);
    ucHeap_size = size;
}

--- End code ---

Anyway, memory fragmentation is the reason why so many developers consider dynamic memory management "bad", especially in embedded environments.
Using malloc() to allocate a buffer for the lifetime of the process (or until the microcontroller is restarted, in OS-less environments) does not suffer from memory fragmentation at all, because such buffers are never free()d.

Yet, if we wanted to, we could just use more sensible data structures, instead of just a Big Dumb Buffer!

A commonly used example is a bucket brigade.  A single array is split into sub-arrays of fixed size, so that memory fragmentation will not be a problem – the fragments being the same size.  For example:

--- Code: ---typedef struct  ucBucket  ucBucket;
struct ucBucket {
    struct ucBucket  *next;
    size_t  size;
    unsigned char data[UCBUCKET_DATA_SIZE];
};

--- End code ---
The size field in the structure is an unsigned integer between 0 and UCBUCKET_DATA_SIZE, inclusive, the latter being a compile-time constant; it indicates the amount of data in that bucket.  Because all allocated buckets take the same amount of memory, memory fragmentation is no longer an issue.  Say you have a very large string, and you wish to modify it in the middle.  Instead of copying the rest of the array over, it is sufficient to modify just the buckets that contain the to-be-replaced part, even if the replacement is shorter or longer than the original.

The downside here is that instead of operating on one contiguous array, we must also traverse the linked list.  For example, to find the index of a specific character, or the number of characters in a bucket brigade:

--- Code: ---ssize_t  ucBB_find_char(ucBucket *bucket, unsigned char c) {
    size_t  offset = 0;
    while (bucket) {
        if (bucket->size > 0) {
            unsigned char *p = memchr(bucket->data, c, bucket->size);
            if (p) {
                return offset + (p - bucket->data);
            }
            offset += bucket->size;
        }
        bucket = bucket->next;
    }
    // Not found.
    return -1;
}

size_t  ucBB_len(ucBucket *bucket)
{
    size_t  len = 0;

    while (bucket) {
        len += bucket->size;
        bucket = bucket->next;
    }

    return len;
}

--- End code ---
In a multithreaded environment, we also need a mutex (or an rwlock, or a similar locking structure) protecting access to each bucket brigade, so we usually have a separate "handle" structure, which often contains a pointer to the final bucket (for very fast append operations), and sometimes even a reference count, in addition to the mutex and initial bucket pointer.  All this is just "library" stuff, however, and not visible to the end developer, who just uses the function interfaces provided.
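Such a handle might look like the following sketch (all field and type names are hypothetical, not from any actual library):

```c
#include <pthread.h>
#include <stddef.h>

typedef struct ucBucket ucBucket;   /* the bucket type defined earlier; only pointers are needed here */

/* Hypothetical brigade handle wrapping the bucket list. */
typedef struct {
    pthread_mutex_t  lock;      /* protects the whole brigade           */
    ucBucket        *head;      /* first bucket in the brigade          */
    ucBucket        *tail;      /* last bucket, for fast appends        */
    size_t           refcount;  /* optional, for sharing between owners */
} ucBrigade;
```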

In freestanding C (without the C library), for example in microcontroller environments, it is possible to not have "malloc" at all, just a statically allocated array of "buckets".  Then, completely unused buckets can be put into a separate linked list, so that whenever a new bucket is needed, the first one in that list is used; and when a bucket is no longer needed, it is put back onto that list.
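A sketch of such a malloc-free bucket pool (the pool size and function names are illustrative; the ucBucket definition from above is repeated to keep the sketch self-contained):

```c
#include <stddef.h>

#define UCBUCKET_DATA_SIZE 64   /* illustrative value */
#define UCBUCKET_COUNT     32   /* illustrative pool size */

typedef struct ucBucket ucBucket;
struct ucBucket {               /* same layout as the ucBucket above */
    ucBucket      *next;
    size_t         size;
    unsigned char  data[UCBUCKET_DATA_SIZE];
};

static ucBucket  pool[UCBUCKET_COUNT];
static ucBucket *free_list = NULL;

/* Put every bucket of the static pool onto the free list. */
void ucBucket_pool_init(void) {
    free_list = NULL;
    for (size_t i = 0; i < UCBUCKET_COUNT; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* Take a bucket from the free list; returns NULL when the pool is exhausted. */
ucBucket *ucBucket_get(void) {
    ucBucket *b = free_list;
    if (b) {
        free_list = b->next;
        b->next = NULL;
        b->size = 0;
    }
    return b;
}

/* Return a no-longer-needed bucket to the free list. */
void ucBucket_put(ucBucket *b) {
    b->next = free_list;
    free_list = b;
}
```

Allocation and release are then O(1) pointer operations, and because every bucket has the same size, fragmentation cannot occur.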

In fact, that's very much how C libraries internally implement malloc(), with just two big differences: the buckets are not of fixed size, and the arrays are not statically allocated, but obtained from the operating system via sbrk or mmap calls (in POSIX systems; similarly in other operating systems), and often are not even consecutive in the process address space.  (You can think of the malloc() implementation having a super-malloc, maintaining these "allocation pools" it obtains from the OS.  Although the details vary between implementations, as a first rough approximation, that gives a reasonable intuition on how things happen "under the hood".)

In systems programming – operating system services and daemons like web servers – allocation pools are sometimes used explicitly.  For example, if each connected client has its own allocation pool, the memory use per client can be trivially monitored and controlled; and when the client disconnects, all the related memory allocations can be discarded at once, with minimal effort.

It is true that automatic garbage collection has advanced so much that, for typical applications, it can perform better than explicit dynamic memory management.  There is a lot of peer-reviewed literature on how GC is done correctly, and the Boehm–Demers–Weiser garbage collector is a perfect place to start for a C or C++ developer.

Interestingly, Boehm GC provides support for cords, immutable strings that implement something very similar to bucket brigades above.  Instead of a linked list, cords are based on a tree, with leaves containing the data (either C strings or the functional description of the contents).

What can we say based on this wall of text as a summary, then?

Dynamic memory management is a tool among other tools.  It makes sense, when it makes sense.

Usually the actual problem you're trying to solve has a better solution that you can only find by digging deeper; very much like the XY problem.

Having a large array with mutable values can be a completely incorrect solution.  More complex solutions can be even more efficient, and as shown by Boehm GC in general and its cords in particular, even more robust and reliable than the simple method.  Things that appear to be single continuous entities do not need to be so "under the hood".

In this thread, most posts have been talking about how to use the ucHeap buffer correctly, but the true question should be, what is it used for?

As it is, we're really discussing how one can use a wood-splitting axe as a hammer, instead of asking what we should be asking: What is that buffer used for? Why are you using a wood-splitting axe as a hammer in the first place; don't you have more appropriate tools you can use?

The latter question is why I seriously believe Data Structures should be discussed in depth immediately when introducing Pointers in C.  Abstract data structures like lists, trees, heaps, disjoint sets, and directed graphs make a lot of problems rather easy to solve.  They do not always need dynamic memory management, either; for example, (binary) heaps are usually represented as arrays.  Heaps are often used for things like priority queues, or when you need more software timers than you have hardware for, even in embedded environments and freestanding C.
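As a closing sketch, here is a minimal binary min-heap stored in a plain statically allocated array, of the kind used for software timer deadlines (all sizes and names are illustrative; a real implementation would also store callback data):

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TIMERS 16   /* illustrative capacity */

static uint32_t heap[MAX_TIMERS];
static size_t   heap_len = 0;

/* Insert a deadline; caller must ensure heap_len < MAX_TIMERS. */
void timer_push(uint32_t deadline) {
    size_t i = heap_len++;
    heap[i] = deadline;
    while (i > 0 && heap[(i - 1) / 2] > heap[i]) {   /* sift up */
        uint32_t t = heap[i];
        heap[i] = heap[(i - 1) / 2];
        heap[(i - 1) / 2] = t;
        i = (i - 1) / 2;
    }
}

/* Remove and return the earliest deadline; caller must ensure heap_len > 0. */
uint32_t timer_pop(void) {
    uint32_t top = heap[0];
    heap[0] = heap[--heap_len];
    size_t i = 0;
    for (;;) {                                       /* sift down */
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < heap_len && heap[l] < heap[m]) m = l;
        if (r < heap_len && heap[r] < heap[m]) m = r;
        if (m == i) break;
        uint32_t t = heap[i]; heap[i] = heap[m]; heap[m] = t;
        i = m;
    }
    return top;
}
```

timer_pop() always returns the earliest deadline, so a single hardware timer can service up to MAX_TIMERS software timers.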
