Topic: Having trouble avoiding malloc on embedded ARM, suggestions?

jnz · « **on:** May 04, 2018, 05:26:29 am »

Have an ARM chip with 128K RAM. I almost certainly have enough for anything I’ll want to do, but I'm not entirely certain, and I don’t really have a way to solve my problem that I like.

Scenario is I have a stream of data coming in. It could be 20 bytes to 2,048 bytes. I need to receive the entire stream, block decrypt it, then store it for another function to use. There are 10 types of data that can come in, and while I can only handle one “7” type at a time, I could handle a “3” and a “7” concurrently. If a “7” comes in while I’m working on a “7”, I ignore it. A further complication is that the “6”, “7” and “8” types cannot run concurrently, so they “could” share memory.

Problem is, I either statically allocate 20K of RAM in 2K chunks, which seems very wasteful as most data blocks will be far smaller than the 2,048-byte worst case; I use malloc() and free() carefully; or I try to manage the memory myself by declaring a block and allocating and freeing from inside that manually.

Am I missing an obvious option?

I read a lot of malloc() hate for embedded but it would make things seem clearer.

ataradov · « **Reply #1 on:** May 04, 2018, 05:37:48 am »

malloc-style allocation works if you have predictable and limited processing time, because that eliminates problems with memory fragmentation. In that case, even if holes are formed, they are naturally eliminated and don't hang around for a long time. It will work for quick processing of frames.

Just never allocate long-living structures, that's a path to disaster.

I would not use malloc directly, but a custom implementation that may benefit from specific knowledge of the data size limits and allocation patterns. This approach also eliminates casual use for other things.
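One common shape for such a custom allocator is a fixed-size block pool. The sketch below is only an illustration, not ataradov's code: the names (`pool_init`, `pool_alloc`, `pool_free`) and the numbers (four blocks of 2,048 bytes) are assumptions chosen to fit the sizes mentioned in the first post. Because every block is the same size, the pool cannot fragment, and both operations take constant time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCK_SIZE  2048u  /* assumption: worst-case packet size */
#define POOL_BLOCK_COUNT 4u     /* assumption: max packets alive at once */

/* A free block reuses its own storage to hold the free-list link. */
typedef union pool_block {
    union pool_block *next;
    uint8_t           bytes[POOL_BLOCK_SIZE];
} pool_block_t;

static pool_block_t  pool_storage[POOL_BLOCK_COUNT];
static pool_block_t *pool_free_list;

/* Put every block on the free list. Call once at startup. */
void pool_init(void)
{
    pool_free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        pool_storage[i].next = pool_free_list;
        pool_free_list = &pool_storage[i];
    }
}

/* Returns a POOL_BLOCK_SIZE-byte block, or NULL when the pool is empty. */
void *pool_alloc(void)
{
    pool_block_t *b = pool_free_list;
    if (b != NULL) {
        pool_free_list = b->next;
    }
    return b;
}

/* Returns a block obtained from pool_alloc() to the pool. */
void pool_free(void *p)
{
    pool_block_t *b = p;
    /* Debug check: the pointer must lie inside the pool storage. */
    assert(b >= &pool_storage[0] && b < &pool_storage[POOL_BLOCK_COUNT]);
    b->next = pool_free_list;
    pool_free_list = b;
}
```

This version is not interrupt-safe; if blocks are allocated in an ISR and freed in the main loop, the list updates would need a critical section around them.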

Kalvin · « **Reply #2 on:** May 04, 2018, 06:05:20 am »

Here is a nice article on memory pools in embedded systems:
https://barrgroup.com/Embedded-Systems/How-To/Malloc-Free-Dynamic-Memory-Allocation

Mechatrommer · « **Reply #3 on:** May 04, 2018, 06:37:56 am »

Quote from: jnz on May 04, 2018, 05:26:29 am

Problem is, I either statically allocate 20K of RAM in 2K chunks, which seems very wasteful as most data blocks will be far smaller than the 2,048-byte worst case

What datatype is sized 2048? Is it possible for 10 of these to arrive consecutively? If yes, allocating 20K is not a wasteful strategy. But not enough information is given, such as how fast you can process data when it keeps coming. You need to decide the worst-case scenario: how much data is left in the buffer before you can process it, how much data to ignore, etc. The keyword for this type of problem is "circular buffer": you set how many entries you want and then decide how fast you need to process them. AFAIK this is done in any hardware communication channel such as SPI, UART, I2C, etc. Some MCUs may provide only 1 or 2 bytes of buffer, some maybe many. If you process slower than the data rate coming in, you lose data, simple as that. Knowing your worst-case scenario and specification will tell you how much memory to allocate permanently.
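For readers who have not built one, a byte-oriented circular buffer can be sketched as below. The names (`ring_t`, `ring_put`, `ring_get`) and the 1,024-byte capacity are assumptions for illustration. The read and write positions are free-running counters, so "full" and "empty" are distinguished without wasting a slot; that trick requires the capacity to be a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 1024u  /* assumption: capacity in bytes, power of two */

static_assert((RING_SIZE & (RING_SIZE - 1u)) == 0u,
              "RING_SIZE must be a power of two");

typedef struct {
    uint8_t data[RING_SIZE];
    size_t  head;  /* total number of bytes ever written */
    size_t  tail;  /* total number of bytes ever read */
} ring_t;

/* Number of bytes currently stored. */
static inline size_t ring_used(const ring_t *r)
{
    return r->head - r->tail;
}

/* Producer side (e.g. the receive handler). Returns false and drops
 * the byte when the buffer is full. */
bool ring_put(ring_t *r, uint8_t byte)
{
    if (ring_used(r) == RING_SIZE) {
        return false;
    }
    r->data[r->head & (RING_SIZE - 1u)] = byte;
    r->head++;
    return true;
}

/* Consumer side. Returns false when there is nothing to read. */
bool ring_get(ring_t *r, uint8_t *out)
{
    if (ring_used(r) == 0u) {
        return false;
    }
    *out = r->data[r->tail & (RING_SIZE - 1u)];
    r->tail++;
    return true;
}
```

If the producer runs in an interrupt, the counters would also need `volatile` or proper atomics; that is left out here to keep the sketch short.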

Quote from: jnz on May 04, 2018, 05:26:29 am

I read a lot of malloc() hate for embedded but it would make things seem clearer.

malloc is just a tool, if you don't know how to use it, it will bite you in the butt...

MosherIV · « **Reply #4 on:** May 04, 2018, 07:03:34 am »

Hi

As Mechatrommer says, circular buffers are definitely a good idea.

Use the concept of a 'communications protocol stack'. What that means in practice is that your code to handle each stage is separated into different parts/layers.
Start with the receive layer. Keep this small, fast and short. No need to assemble the whole message packet. Just get data and pass it on to the next layer.
Next layer assembles the whole message/packet. Pass it on to the next layer.
Next layer decodes the type of message and passes it on to the correct decoder. Etc.

Keep the buffers for each layer the appropriate size.

Not using malloc/free is only essential in safety-critical applications.

poorchava · « **Reply #5 on:** May 04, 2018, 07:30:48 am »

I'd consider using an RTOS of some type as they generally have heap allocation implemented as well as mechanisms for detecting overflows and such. If you don't want / don't need / can't use multitasking, then just run everything in one thread and that's it.

DeanCording · « **Reply #6 on:** May 04, 2018, 07:35:47 am »

Quote from: jnz on May 04, 2018, 05:26:29 am

Have an ARM chip with 128K RAM, I almost certainly have enough for anything I’ll want to do

Quote

Problem is, I either statically allocate 20K of RAM in 2K chunks, which seems very wasteful as most data blocks will be far smaller than the 2,048-byte worst case; I use malloc() and free() carefully; or I try to manage the memory myself by declaring a block and allocating and freeing from inside that manually.

Why is statically allocating memory a waste? It's not like you can't save it up for another project! The only wasted memory in an embedded single process system is the memory you never put to use.

If, at any point, you are going to need to malloc 20K, then it makes no difference whether it is statically allocated at startup or dynamically allocated during runtime. It is going to need to be allocated sometime, and you might as well find out that you are short of space during development instead of in production. Besides, by the time you add in the extra code to do the malloc, keep track of the allocations, and clean up after yourself, you will probably end up using nearly as much in code space.

IanB · « **Reply #7 on:** May 04, 2018, 08:43:57 am »

Quote from: jnz on May 04, 2018, 05:26:29 am

Problem is, I either statically allocate 20K of RAM in 2K chunks, which seems very wasteful as most data blocks will be far smaller than the 2,048-byte worst case; I use malloc() and free() carefully; or I try to manage the memory myself by declaring a block and allocating and freeing from inside that manually.

Am I missing an obvious option?

Quote from: DeanCording on May 04, 2018, 07:35:47 am

Why is statically allocating memory a waste?

Precisely. What is the "waste" you are concerned about? You either have 20 K available on the system or you don't. If you "may" have 20 K available but you are not sure then you have an unreliable design and a bug waiting to happen. Your question betrays faulty logic in your thinking.

You should create a memory table, laying out what memory is to be used for what purpose, and then allocate the appropriate memory buffers statically at startup.
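As a rough illustration of such a memory table, the sketch below gives each data type a statically allocated slot and lets types "6", "7" and "8" share one, since the first post says they never run concurrently. Everything here is an assumption for the sake of the example: the names (`slot_claim`, `slot_release`), the buffer sizes, and the fact that only types 3, 6, 7 and 8 are filled in. The remaining types would get their own rows in the same way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_TYPES 10u  /* data types are assumed to be numbered 1..10 */

/* Statically allocated storage. Sizes are assumptions. */
static uint8_t mem_678[2048];  /* shared by types 6, 7 and 8 */
static uint8_t mem_3[256];     /* type 3 only */

typedef struct {
    uint8_t *mem;
    size_t   cap;
    bool     busy;
} slot_t;

static slot_t slots[] = {
    { mem_678, sizeof mem_678, false },
    { mem_3,   sizeof mem_3,   false },
};

/* The memory table: maps data type to (slot index + 1); 0 = no slot. */
static const uint8_t slot_for_type[NUM_TYPES + 1u] = {
    [3] = 2,
    [6] = 1, [7] = 1, [8] = 1,
};

/* Returns the buffer for this type, or NULL if the type has no slot,
 * the packet is too long, or the slot is still in use (in which case
 * the caller ignores the packet). */
uint8_t *slot_claim(unsigned type, size_t len)
{
    if (type > NUM_TYPES || slot_for_type[type] == 0u) {
        return NULL;
    }
    slot_t *s = &slots[slot_for_type[type] - 1u];
    if (s->busy || len > s->cap) {
        return NULL;
    }
    s->busy = true;
    return s->mem;
}

/* Marks the slot for this type as free again. */
void slot_release(unsigned type)
{
    assert(type <= NUM_TYPES && slot_for_type[type] != 0u);
    slots[slot_for_type[type] - 1u].busy = false;
}
```

A second "7" arriving while the first is still in use simply gets NULL back, which matches the "ignore it" rule from the first post, and the total RAM cost is visible at link time.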

ebastler · « **Reply #8 on:** May 04, 2018, 08:50:24 am »

Quote from: DeanCording on May 04, 2018, 07:35:47 am

Why is statically allocating memory a waste? It's not like you can['t] save it up for another project! The only wasted memory in an embedded single process system is the memory you never put to use.

My thoughts exactly. Allocate the memory you need for your application, organize it in a suitable structure (a ring buffer of blocks may make sense), and be done with it.

Berni · « **Reply #9 on:** May 04, 2018, 09:15:12 am »

To me 20K does not sound that bad if you have 128K of RAM.

For C programming it works quite well to have a big typedef struct sitting statically allocated in memory. It's harder to garble it by doing something silly with memory access, and in debugging you can easily place a watch on it. This means you can stop your program at any time in its execution and be able to see all the most recent packets.

It's nice to see programmers still being worried about using 20KB rather than just assuming RAM is infinite as most non-MCU programmers do. But if this means a lot more complex code just to turn the 20KB into perhaps 10KB, while you don't really need those extra 10KB to get your program to fit, then it's mostly a waste of time. Heck, in your example, if you get a big burst of data and processing is slow, dynamic allocation might use more than 20KB when the packets pile up. Heck, perhaps in a rare oddball case it might use up all the available dynamic RAM and cause your program to crash in some very strange ways that are difficult to reproduce. This is also how a lot of security vulnerabilities appear in software, where barfing an unexpected stream of data into it causes something to run out of memory or overflow, leading to a 'soft crash' in just the right way to trample over some memory and cause the software to do something it shouldn't later on.

Brutte · « **Reply #10 on:** May 04, 2018, 09:52:35 am »

Quote from: jnz on May 04, 2018, 05:26:29 am

There are 10 types of data that can come in, and while I can only handle one “7” type at a time, I could handle a “3” and a “7” concurrently. If a “7” comes in while I’m working on a “7”, I ignore it. A further complication is that the “6”, “7” and “8” types cannot run concurrently, so they “could” share memory.

Here is my proposition. It is statically allocated and occupies only the necessary amount.

Code:

typedef enum {
    notUsed = 0x000,
    type001 = 0x001,
    type002 = 0x002,
    type003 = 0x003,
    type031 = 0x031,  // layout: data1 followed by data3
    type013 = 0x013,  // layout: data3 followed by data1
    //... all possible scenarios
} Scenario_t;

// This layout is only for debugging and memory allocation
struct {
    Scenario_t currentScenario;
    // Anonymous union (C11): all scenarios overlay the same storage,
    // so the struct is only as large as the biggest scenario.
    union {
        struct { int data1[99]; } type001;
        struct { int data2[10]; } type002;
        struct { int data3[17]; } type003;
        struct { int data1[99]; int data3[17]; } type031;
        struct { int data3[17]; int data1[99]; } type013;
        //.....
    };
} staticAllocation;

And whenever a new packet arrives, you can calculate what is already in the allocated buffer and push in the new data. Mind that the staticAllocation layout is mostly for debugging, as the runtime calculations can be based on masking currentScenario. Below I show a naive implementation that you should not use, as it would require tons of switch()-es.

Code:

  //...
  //if a type3 arrives when type1 is already at place:
  if(staticAllocation.currentScenario == type001){
    staticAllocation.currentScenario = type031;
    staticAllocation.type031.data3[14]=0x01234567;
  }

andyturk · « **Reply #11 on:** May 04, 2018, 10:32:08 am »

Consider a non-malloc heap allocator. The advantage there is that you know that no other code will be using it and you're free to wipe it out and start fresh as your program runs. You'll have to statically allocate the space for the heap itself, but then you can be dynamic within its boundaries.

Here's one you might want to look at: https://github.com/rhempel/umm_malloc
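To make the "statically allocate the heap, be dynamic inside it, wipe it when done" idea concrete, here is a much simpler cousin of a real heap allocator: a bump (arena) allocator. This is not umm_malloc's API or implementation, just a minimal sketch with assumed names (`arena_alloc`, `arena_reset`) and an assumed 8K size. It cannot free individual blocks; instead the whole arena is reset at a point where nothing is in use, which is what makes fragmentation impossible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE  8192u  /* assumption: bytes reserved for the arena */
#define ARENA_ALIGN 8u     /* assumption: alignment of every allocation */

static_assert(ARENA_SIZE % ARENA_ALIGN == 0u,
              "ARENA_SIZE must be a multiple of ARENA_ALIGN");

/* The "heap" itself is an ordinary static array. */
static _Alignas(ARENA_ALIGN) uint8_t arena_mem[ARENA_SIZE];
static size_t arena_used;

/* Hands out the next n bytes (rounded up to ARENA_ALIGN), or NULL if
 * the arena does not have that much space left. */
void *arena_alloc(size_t n)
{
    size_t rounded = (n + ARENA_ALIGN - 1u) & ~(size_t)(ARENA_ALIGN - 1u);
    if (rounded < n || rounded > ARENA_SIZE - arena_used) {
        return NULL;  /* size overflow or out of space */
    }
    void *p = &arena_mem[arena_used];
    arena_used += rounded;
    return p;
}

/* Throws away every allocation at once. Only call this when no
 * pointer into the arena is still in use. */
void arena_reset(void)
{
    arena_used = 0u;
}
```

Since no other code shares the arena, a test harness can hammer it on a PC and the worst-case behaviour is easy to reason about. A full free-list allocator like the one linked above is the next step up if individual frees are needed.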

SiliconWizard · « **Reply #12 on:** May 04, 2018, 12:10:32 pm »

Quote from: ebastler on May 04, 2018, 08:50:24 am

Quote from: DeanCording on May 04, 2018, 07:35:47 am
Why is statically allocating memory a waste? It's not like you can['t] save it up for another project! The only wasted memory in an embedded single process system is the memory you never put to use.

My thoughts exactly. Allocate the memory you need for your application, organize it in a suitable structure (a ring buffer of blocks may make sense), and be done with it.

That's usually the preferred way, especially for small systems.

Besides, dynamic memory allocation can be dangerous and usually has unpredictable execution time.
Some safety-critical guidelines (such as MISRA-C) and standards (such as IEC 61508) either explicitly prohibit all forms of dynamic allocation, or at least strongly recommend against using it.

nctnico · « **Reply #13 on:** May 04, 2018, 01:10:12 pm »

I'm wondering how long the data needs to be stored. Can't the data be processed as soon as it arrives? That way you'd need to hold a maximum of 2 blocks of data (one which is being received and one which is being processed). If you can't process the data as fast as it comes in then there is always the chance data will be lost.

A circular buffer is one way of doing it. Another way is chopping the memory up into small chunks and using a linked list to create buffers which have a variable size. The advantage is that fragmentation isn't a problem, but you'd have to iterate through each chunk of a buffer (you can't have a single pointer to a contiguous buffer!). I have implemented this in a really space-constrained device and it worked well.
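A minimal sketch of the chunked approach follows. It is not nctnico's implementation; the names (`chain_store`, `chain_read`, `chain_free`), the 32-byte chunk payload and the 64-chunk pool are assumptions. Chunks are linked by one-byte indices rather than pointers to keep the overhead small, and a buffer is identified by the index of its first chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_PAYLOAD 32u    /* assumption: data bytes per chunk */
#define CHUNK_COUNT   64u    /* assumption: 64 * 32 = 2048 bytes total */
#define CHUNK_NONE    0xFFu  /* end-of-chain / "no chunk" marker */

static_assert(CHUNK_COUNT < CHUNK_NONE, "indices must fit in a uint8_t");

typedef struct {
    uint8_t next;                 /* index of next chunk, or CHUNK_NONE */
    uint8_t len;                  /* payload bytes used in this chunk */
    uint8_t data[CHUNK_PAYLOAD];
} chunk_t;

static chunk_t chunks[CHUNK_COUNT];
static uint8_t free_head;

/* Link all chunks into the free list. Call once at startup. */
void chunks_init(void)
{
    for (uint8_t i = 0; i < CHUNK_COUNT; i++) {
        chunks[i].next = (i + 1u < CHUNK_COUNT) ? (uint8_t)(i + 1u) : CHUNK_NONE;
        chunks[i].len = 0;
    }
    free_head = 0;
}

/* Copies len bytes into a new chain. Returns the index of the first
 * chunk, or CHUNK_NONE if len is 0 or there are too few free chunks. */
uint8_t chain_store(const uint8_t *src, size_t len)
{
    size_t needed = (len + CHUNK_PAYLOAD - 1u) / CHUNK_PAYLOAD;
    size_t avail = 0;
    for (uint8_t i = free_head; i != CHUNK_NONE && avail < needed; i = chunks[i].next) {
        avail++;
    }
    if (needed == 0u || avail < needed) {
        return CHUNK_NONE;
    }
    /* Chunks are taken in free-list order, so they are already linked. */
    uint8_t first = free_head, last = CHUNK_NONE;
    while (len > 0u) {
        uint8_t c = free_head;
        free_head = chunks[c].next;
        size_t n = (len < CHUNK_PAYLOAD) ? len : CHUNK_PAYLOAD;
        memcpy(chunks[c].data, src, n);
        chunks[c].len = (uint8_t)n;
        src += n;
        len -= n;
        last = c;
    }
    chunks[last].next = CHUNK_NONE;
    return first;
}

/* Walks the chain and copies it out, stopping before a chunk that
 * would not fit in cap. Returns the number of bytes copied. */
size_t chain_read(uint8_t first, uint8_t *dst, size_t cap)
{
    size_t total = 0;
    for (uint8_t c = first; c != CHUNK_NONE; c = chunks[c].next) {
        if (chunks[c].len > cap - total) {
            break;
        }
        memcpy(dst + total, chunks[c].data, chunks[c].len);
        total += chunks[c].len;
    }
    return total;
}

/* Returns a whole chain to the free list. */
void chain_free(uint8_t first)
{
    uint8_t last = first;
    while (chunks[last].next != CHUNK_NONE) {
        last = chunks[last].next;
    }
    chunks[last].next = free_head;
    free_head = first;
}
```

With these numbers a 32-byte message costs one chunk (34 bytes including the header) instead of a full 2K buffer. The price is that consumers have to walk the chain chunk by chunk, as `chain_read` does, rather than take one flat pointer.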

Mechatrommer · « **Reply #14 on:** May 04, 2018, 01:55:58 pm »

Quote from: nctnico on May 04, 2018, 01:10:12 pm

I'm wondering how long the data needs to be stored. Can't the data be processed as soon as it arrives? That way you'd need to hold a maximum of 2 blocks of data (one which is being received and one which is being processed). If you can't process the data as fast as it comes in then there is always the chance data will be lost.

Yes, this is another good solution: double buffering and page flipping. When each packet/data block/type arrives, the buffer is flipped and the earlier buffer is processed ASAP. But this requires that processing be completed before the buffer is flipped again to receive the 3rd packet. So you can just allocate 2 max-size packets, i.e. 2 x 2048 = 4KB of memory. Or you can just use a circular buffer of size 4KB, without the requirement imposed by the page-flipping technique. You only need get_next_byte() or something similar in the processing layer to send each byte to the appropriate data type processor. If you can process fast, you don't have to ignore anything; otherwise you can use move_to_next_byte() or something similar to skip data and advance the start pointer of your circular buffer faster, giving more room for incoming data. YMMV.
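The page-flipping variant can be sketched as follows. The function names (`rx_byte`, `rx_flip`, `proc_done`) are hypothetical, and the drop-on-busy policy is an assumption that mirrors the "lose data if you process too slowly" point. Two 2,048-byte buffers take turns: one is being filled while the other is being processed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PACKET 2048u  /* worst-case packet size from the first post */

static uint8_t  bufs[2][MAX_PACKET];
static unsigned rx_index;   /* which buffer is currently being filled */
static size_t   fill_len;   /* bytes received into that buffer so far */
static bool     proc_busy;  /* true while the other buffer is in use */

/* Receive side: append one byte to the buffer being filled.
 * Returns false if the packet exceeds MAX_PACKET. */
bool rx_byte(uint8_t b)
{
    if (fill_len == MAX_PACKET) {
        return false;
    }
    bufs[rx_index][fill_len++] = b;
    return true;
}

/* Call when a packet is complete. On success the buffers are flipped
 * and the finished packet is returned with its length in *len.
 * If the previous packet is still being processed, the new packet is
 * dropped and NULL is returned. */
const uint8_t *rx_flip(size_t *len)
{
    assert(len != NULL);
    if (proc_busy) {
        fill_len = 0;  /* discard and reuse the same fill buffer */
        return NULL;
    }
    const uint8_t *done = bufs[rx_index];
    *len = fill_len;
    rx_index ^= 1u;
    fill_len = 0;
    proc_busy = true;
    return done;
}

/* Processing side: call when finished with the buffer from rx_flip(). */
void proc_done(void)
{
    proc_busy = false;
}
```

The returned pointer stays valid and contiguous until `proc_done()` is called, which suits the "pass a pointer to another function" style of use, at a fixed cost of 4KB.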

Quote from: DeanCording on May 04, 2018, 07:35:47 am

Why is statically allocating memory a waste? It's not like you can't save it up for another project! The only wasted memory in an embedded single process system is the memory you never put to use.

I believe what he wants is to maximize the memory available for other things; maybe he has another elastic (size keeps changing) memory region. So if his 20K buffer is only 50% utilized all the time, the unused 10KB cannot be freed to let that other elastic region expand. But this will end up as a messy housekeeping job IMHO; in the end, "get a bigger memory size" applies if you want stable operation.

OTOH with malloc, if it were me, I would allocate the critical, fixed-size and static parameters first, once and until the end, and put the elastic (expandable) memories at the end of the row. That means less housekeeping/memory shuffling, whether you track them manually or use the malloc.h facility. YMMV.

jnz · « **Reply #15 on:** May 04, 2018, 05:21:01 pm »

There is some great data here but some questions about time.

The shortest answer is IDK.

Datatype “7” sized 1k comes in. I need somewhere to put it. I need to decrypt or other “process” it. I need to use it which ideally would just be passing a pointer to it to another function. When that function is done I can remove it.

This could be 1 ms or 30 s until I’m done with it.

In that time no other “7” type of data will be expected/allowed.

At any time I can expect an average use max of 5K data, so it seems bad to allow 20k of storage that basically can’t happen.

I don’t mind the cyclic buffer, but I want to store and use this data “in place” so at the wrap around that’s a bad idea. I’d have to store here and copy somewhere else - and in that case I have 10x 2K buffers just somewhere else sitting around. Using cyclic buffer without copying the data out means if I pass a pointer and the object is on the roll over I need all sorts of logic supporting that.

I don’t mind malloc-ing a region and try and divide that up.

I still need to check the links posted. Just wanted to update the time requirements and explain that wherever I store these, I also need to work with them until they are of no use anymore.

ebastler · « **Reply #16 on:** May 04, 2018, 05:26:37 pm »

Quote from: jnz on May 04, 2018, 05:21:01 pm

The shortest answer is IDK.

Datatype “7” sized 1k comes in. I need somewhere to put it. I need to decrypt or other “process” it. I need to use it which ideally would just be passing a pointer to it to another function. When that function is done I can remove it.

Enlighten me, please -- so what does IDK stand for? "Incoming, Decrypt, Kremove"?

IanB · « **Reply #17 on:** May 04, 2018, 06:04:54 pm »

Quote from: ebastler on May 04, 2018, 05:26:37 pm

Enlighten me, please -- so what does IDK stand for? "Incoming, Decrypt, Kremove"?

"I Don't Know"

IanB · « **Reply #18 on:** May 04, 2018, 06:12:51 pm »

Quote from: jnz on May 04, 2018, 05:21:01 pm

At any time I can expect an average use max of 5K data, so it seems bad to allow 20k of storage that basically can’t happen.

If your system has 128 K of memory and your application when it's running leaves 50 K unused, then you are wasting that 50 K of memory.

On an embedded system there is no concept of "waste" in the way you are thinking. You own the whole system; you are not sharing it and you are not competing with anyone or anything else for resources (?).

Therefore, the only problem with allocating a 20 K buffer at startup and keeping it available is if it prevents you successfully executing other parts of your application. If you have enough memory to allocate that buffer permanently, then allocate it permanently. Your application will execute faster, have fewer bugs, be more reliable.

ebastler · « **Reply #19 on:** May 04, 2018, 06:13:03 pm »

Quote from: IanB on May 04, 2018, 06:04:54 pm

"I Don't Know"

Ah, thanks -- that's one I had not come across yet.

jnz · « **Reply #20 on:** May 04, 2018, 08:40:49 pm »

Quote from: IanB on May 04, 2018, 06:12:51 pm

Quote from: jnz on May 04, 2018, 05:21:01 pm
At any time I can expect an average use max of 5K data, so it seems bad to allow 20k of storage that basically can’t happen.

If your system has 128 K of memory and your application when it's running leaves 50 K unused, then you are wasting that 50 K of memory.

On an embedded system there is no concept of "waste" in the way you are thinking. You own the whole system; you are not sharing it and you are not competing with anyone or anything else for resources (?).

Therefore, the only problem with allocating a 20 K buffer at startup and keeping it available is if it prevents you successfully executing other parts of your application. If you have enough memory to allocate that buffer permanently, then allocate it permanently. Your application will execute faster, have fewer bugs, be more reliable.

IT'S NOT FINISHED. I can't tell you how much of the RAM is left. What I know right now is that I don't want to be wasteful just because I didn't think ahead about how this would work. I don't want to devote 20k to 10 buffers of 2k when realistically most cases will be 32 bytes or so.

I have whole swaths of future features that will surely need RAM. Just pretend that I can't budget every byte. This isn't a toy with a fixed life and no update schedule.

nctnico · « **Reply #21 on:** May 04, 2018, 08:53:40 pm »

But how about optimising when really needed instead of banging your head against the wall to solve a problem which may never arrive?
Still, given your problem, I think a circular receive buffer is the easiest way to solve it. Read a packet from the receive buffer and then process it in a separate buffer. Sure, you'll need an extra buffer, but that is only 2k extra. However, you could put that 2k buffer onto the stack (declare it inside a function and do the processing from there) so it gets freed when it is no longer needed, without needing to worry about memory fragmentation. I often declare temporary buffers on the stack to share memory.
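A small sketch of the stack-buffer idea is shown below. `process_packet` is a hypothetical name, and `toy_decrypt` with its XOR key is purely a stand-in for whatever block cipher is really used. The point is only that `work` exists for the duration of the call and costs nothing afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PACKET 2048u

/* Stand-in for the real decrypt step (assumption): XOR with a fixed
 * key byte. Replace with the actual cipher. */
static void toy_decrypt(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] ^= 0x5Au;
    }
}

/* Copies a received packet into a temporary stack buffer, decrypts it
 * there and uses it. Here "using it" is just summing the bytes so the
 * function has a visible result. Returns 0 for oversized packets. */
uint32_t process_packet(const uint8_t *rx, size_t len)
{
    assert(rx != NULL);
    uint8_t work[MAX_PACKET];  /* on the stack only during this call */
    if (len > sizeof work) {
        return 0;
    }
    memcpy(work, rx, len);
    toy_decrypt(work, len);

    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += work[i];
    }
    return sum;
}
```

Two caveats: the stack (or the task stack, under an RTOS) has to be sized for the extra 2K, and the data is gone when the function returns, so this only fits if everything that needs the packet happens inside that call.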

Howardlong · « **Reply #22 on:** May 04, 2018, 08:57:04 pm »

Quote from: jnz on May 04, 2018, 08:40:49 pm

IT'S NOT FINISHED. I can't tell you how much of the RAM is left. What I know right now is that I don't want to be wasteful just because I didn't think ahead about how this would work. I don't want to devote 20k to 10 buffers of 2k when realistically most cases will be 32 bytes or so.

I have whole swaths of future features that will surely need RAM. Just pretend that I can't budget every byte. This isn't a toy with a fixed life and no update schedule.

It sounds very much like you’re coming up with a solution before you know what the requirements are.

Edit:

I’ll add to that, you need to scribble down all of your requirements, and prioritise them.

Accept that compromises will be made: compromises are a fundamental part of engineering, driven by priorities.

I’ll bet that getting a product out of the door is far more important than possible future updates.

NorthGuy · « **Reply #23 on:** May 04, 2018, 09:04:22 pm »

This may sound paradoxical, but ... You achieve better resource utilization when you use substantially all of the available resources, because you have already paid for all your memory. On the contrary, if you have 128K of memory and only use 8K, that's a waste of memory.

Therefore, if using more memory helps you create better code (as in the static allocation case), there's absolutely no doubt that this is the way to go.

MosherIV · « **Reply #24 on:** May 04, 2018, 09:05:21 pm »

Quote

I don't want to devote 20k to 10 buffers of 2k when realistically most cases will be 32 bytes or so.

Nctnico pretty much made this suggestion in a different way.

Layer your functions.
Layer 0, with a fixed 2K buffer, receives data. When a full msg is received, copy and post it to the next layer.
Layer 1 decides what the packet is and what needs to be done, 2K fixed buffer. Copies data to buffers in the next layer.
Layer 2, 1 fixed-size buffer per object. Holds data until done, buffers can then be reused.

OK, it still uses 6K, but that is better than 10K and there is no dynamic allocation.
You could collapse layers 1 and 2 if you can process fast enough.

amyk · « **Reply #25 on:** May 05, 2018, 04:14:17 am »

Early UNIX ran with substantially less than 128K of RAM and used malloc()/free(). From that perspective, 128K is far beyond the realm of "static allocation only".

However, I would recommend static allocation in this case since it's a fixed-function embedded system. You don't need 20K, only as much as you will use at most.

Mechatrommer · « **Reply #26 on:** May 05, 2018, 07:40:08 am »

Quote from: jnz on May 04, 2018, 05:21:01 pm

Datatype “7” sized 1k comes in. I need somewhere to put it. I need to decrypt or other “process” it. I need to use it which ideally would just be passing a pointer to it to another function. When that function is done I can remove it.

This could be 1 ms or 30 s until I’m done with it.

In that time no other “7” type of data will be expected/allowed.

At any time I can expect an average use max of 5K data, so it seems bad to allow 20k of storage that basically can’t happen.

There's another way you can do it if analyzing the worst-case condition vs. incoming data rate is too much hassle to do up front: a linked list. You just add to the list when an "acceptable" data type arrives. Each list member is only a pointer to newly allocated memory whose size corresponds to the data type received. Accepting new incoming data is done by a sort of filter function based on your machine state, e.g. we just received data 7, so we can ignore data 7, 8 and so on, and accept the rest, including data 10 (2048 bytes).

This FIFO elastic structure should normally be created and destroyed as items come and go, so malloc and free will be your tools, and any fragmentation should stay around this elastic structure. But getting some level of statistical assurance about the worst-case scenario is inevitable, as this is the engineering of it. Imagine you receive data 7 and spend 30 s processing it, and during that time 63 packets of data type 10 arrive (2048 bytes each). If you keep accepting them blindly, you can end up with lost data, corrupted program state or an MCU in limbo. YMMV.
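A sketch of that malloc-backed FIFO, with a byte budget added so the "63 packets of type 10" case is refused instead of exhausting the heap, might look like this. The names (`queue_push`, `queue_pop`, `queue_release`) and the 8K budget are assumptions, and the state-based filter function described above is left out to keep it short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define QUEUE_BUDGET 8192u  /* assumption: max payload bytes queued */

typedef struct node {
    struct node *next;
    uint8_t      type;    /* data type of this packet */
    size_t       len;     /* payload length in bytes */
    uint8_t      data[];  /* payload, allocated together with the node */
} node_t;

static node_t *q_head, *q_tail;
static size_t  q_bytes;   /* payload bytes queued or held by a consumer */

/* Appends a copy of the packet. Returns false if the budget would be
 * exceeded or malloc() fails; the caller then drops the packet. */
bool queue_push(uint8_t type, const uint8_t *src, size_t len)
{
    if (len > QUEUE_BUDGET - q_bytes) {
        return false;
    }
    node_t *n = malloc(sizeof *n + len);
    if (n == NULL) {
        return false;
    }
    n->next = NULL;
    n->type = type;
    n->len = len;
    memcpy(n->data, src, len);
    if (q_tail != NULL) {
        q_tail->next = n;
    } else {
        q_head = n;
    }
    q_tail = n;
    q_bytes += len;
    return true;
}

/* Removes and returns the oldest packet, or NULL if the queue is
 * empty. The memory stays counted until queue_release() is called. */
node_t *queue_pop(void)
{
    node_t *n = q_head;
    if (n != NULL) {
        q_head = n->next;
        if (q_head == NULL) {
            q_tail = NULL;
        }
    }
    return n;
}

/* Frees a packet obtained from queue_pop() and returns its bytes to
 * the budget. */
void queue_release(node_t *n)
{
    assert(n != NULL);
    q_bytes -= n->len;
    free(n);
}
```

The budget is what turns "malloc until something breaks" into a bounded design: the worst case is known (budget plus per-node overhead), and overload shows up as dropped packets rather than as heap exhaustion somewhere unrelated.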

nctnico · « **Reply #27 on:** May 05, 2018, 10:39:48 am »

Quote from: amyk on May 05, 2018, 04:14:17 am

Early UNIX ran with substantially less than 128K of RAM and used malloc()/free(). From that perspective, 128K is far beyond the realm of "static allocation only".

But that probably had virtual memory on disk creating a memory pool which was much bigger.

Howardlong · « **Reply #28 on:** May 05, 2018, 08:30:38 pm »

Quote from: nctnico on May 05, 2018, 10:39:48 am

Quote from: amyk on May 05, 2018, 04:14:17 am
Early UNIX ran with substantially less than 128K of RAM and used malloc()/free(). From that perspective, 128K is far beyond the realm of "static allocation only".
But that probably had virtual memory on disk creating a memory pool which was much bigger.

...and in those days a small team of operators to reboot it when things went wrong.

SiliconWizard · « **Reply #29 on:** May 06, 2018, 12:34:40 am »

Well, the first Macintosh had 128 KB of RAM and its OS used dynamic allocation everywhere. There is no doubt you can make use of dynamic allocation for this amount of memory - even on 64 KB computers, that made sense.

The issue is still that it can be very unsafe for an embedded system, because it's a potential source of critical bugs and that dynamic allocators usually have unpredictable execution time. (You can of course write your own allocator that would be guaranteed to have a fixed execution time, but then it would probably manage memory in an inefficient way, and the cases in which it would not, you could do the same with a simple static scheme anyway.)

andyturk · « **Reply #30 on:** May 06, 2018, 02:42:00 am »

Quote from: SiliconWizard on May 06, 2018, 12:34:40 am

The issue is still that it can be very unsafe for an embedded system, because it's a potential source of critical bugs and that dynamic allocators usually have unpredictable execution time. (You can of course write your own allocator that would be guaranteed to have a fixed execution time, but then it would probably manage memory in an inefficient way, and the cases in which it would not, you could do the same with a simple static scheme anyway.)

Dynamic memory allocation isn't unsafe for all embedded systems. But depending on the requirements of a particular application, it could be. So, you've got to know what you're building and what's important.

Using a non-system dynamic allocator has some advantages too. Since you're in complete control, you can write a test harness to verify things like allocation time or fragmentation. System allocators often have mutual exclusion built into them (so they're safe to call from multiple threads). That might cause problems. Again, this depends on your actual requirements. A non-system allocator that's exclusively called by one thread can simplify things.

JS · « **Reply #31 on:** May 06, 2018, 02:16:58 pm »

I'd go for the static 20k or just for an RTOS. FreeRTOS has 5 different approaches for memory allocation, one of them should fit your needs. I think it's the second chapter on their mastering guide. It has the libraries to implement them safely, using custom malloc() and free() functions for each case.

JS

Howardlong · « **Reply #32 on:** May 06, 2018, 10:00:13 pm »

Quote from: SiliconWizard on May 06, 2018, 12:34:40 am

Well, the first Macintosh had 128 KB of RAM and its OS used dynamic allocation everywhere. There is no doubt you can make use of dynamic allocation for this amount of memory - even on 64 KB computers, that made sense.

The issue is still that it can be very unsafe for an embedded system, because it's a potential source of critical bugs and that dynamic allocators usually have unpredictable execution time. (You can of course write your own allocator that would be guaranteed to have a fixed execution time, but then it would probably manage memory in an inefficient way, and the cases in which it would not, you could do the same with a simple static scheme anyway.)

If memory serves me right (no pun intended!) the Mac used handles (double dereferencing) rather than pointers for all its memory management. While that makes defragging easy, it’s a crappy scenario in real-time systems due to its non-deterministic nature. I am going back to the mid 80s here, I used to write terminal emulators for Macs in those days.

SiliconWizard · « **Reply #33 on:** May 06, 2018, 11:02:04 pm »

Quote from: Howardlong on May 06, 2018, 10:00:13 pm

If memory serves me right (no pun intended!) the Mac used handles (double dereferencing) rather than pointers for all its memory management.

That's absolutely right! You needed to "lock" a handle to access the corresponding block, then "unlock" it when you were done.
That allowed the OS to move blocks around to get the most out of the very limited memory available. That was quite clever, although it would slow things down, so you still had to be pretty conservative as to the number of memory blocks you allocated.

Quote from: Howardlong on May 06, 2018, 10:00:13 pm

While that makes defragging easy, it’s a crappy scenario in real time systems due to its indeterministic nature.

Yes. And as I said, even when using more conventional allocators where memory blocks have a fixed location, the allocator still has to find the fittest free area, which takes an unpredictable amount of time which depends on the current distribution of all the already allocated blocks.

To be completely fair though, even when you're not explicitly using dynamic allocation, you most often still use some form of it. It's called the stack.
A huge advantage of the stack is that it can't fragment memory, and that it's automatic, so there can't be memory leaks or access to freed memory. But it's still dynamic allocation, albeit a lot less flexible.

Berni · « **Reply #34 on:** May 07, 2018, 05:27:20 am »

Yep, a stack is a pretty reliable dynamic memory allocation method, since there is a strictly defined period of time during which the memory is valid: from the start until the end of the function. Also, perhaps a potential plus is that if you overflow an array stored on the stack you will likely also run over something else pretty important, like a function return address, so the application will catastrophically crash soon after. It might sound like a bad thing, but this makes array overflow memory corruption much easier to detect and debug by making it crash on the spot rather than trampling over less critical data that just makes the program act weird in a completely different place.

On the other hand, keeping arrays on the stack makes certain outside exploits easier, since it's easier for the attacker to inject malicious code by causing an array to overflow in just the right way. It's easy to get at the pointers, and everything is allocated in the same spots.

SiliconWizard · « **Reply #35 on:** May 07, 2018, 10:20:35 pm »

Well, not all architectures are plagued with this potential issue. Architectures in which code and data memory areas (and sometimes even the call stack) are completely separate can't suffer from this kind of code injection. I know it's up for heated debate, but IMO any other type of architecture (namely those that allow mixing code and data in the same areas) should be banned forever.

NorthGuy · « **Reply #36 on:** May 07, 2018, 10:59:29 pm »

Quote from: SiliconWizard on May 07, 2018, 10:20:35 pm

Well, not all architectures are plagued with this potential issue. Architectures in which code and data memory areas (and sometimes even the call stack) are completely separate can't suffer from this kind of code injection. I know it's up for heated debate, but IMO any other type of architecture (namely those that allow mixing code and data in the same areas) should be banned forever.

Von Neumann architecture is certainly easier for everyone, but Harvard architecture is much more suitable for the embedded world because it completely eliminates collisions between program and data buses thus making the whole thing much more predictable.

nctnico · « **Reply #37 on:** May 08, 2018, 12:13:29 am »

Quote from: SiliconWizard on May 07, 2018, 10:20:35 pm

Well, not all architectures are plagued with this potential issue. Architectures in which code and data memory areas (and sometimes even the call stack) are completely separate can't suffer from this kind of code injection. I know it's up for heated debate, but IMO any other type of architecture (namely those that allow mixing code and data in the same areas) should be banned forever.

Well, microcontrollers run from flash, so you can't overwrite the code anyway. Some of the bigger ARM Cortex controllers have an MPU, which lets you disable code execution from SRAM.
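As a rough sketch of what "disable execution from SRAM" looks like, the code below builds the region attribute word for an ARMv7-M MPU (Cortex-M3/M4/M7) with the execute-never bit set. It assumes the ARMv7-M `MPU_RASR` bit layout and normal, cacheable, shareable attributes for internal SRAM; the macro and function names are hypothetical, and the register layout should be checked against the reference manual for the actual part.

```c
#include <stdint.h>

/* Bit fields of the ARMv7-M MPU_RASR register (assumed layout). */
#define RASR_ENABLE   (1u << 0)             /* region enable              */
#define RASR_SIZE(n)  ((uint32_t)(n) << 1)  /* region is 2^(n+1) bytes    */
#define RASR_C        (1u << 17)            /* cacheable                  */
#define RASR_S        (1u << 18)            /* shareable                  */
#define RASR_AP_FULL  (3u << 24)            /* read/write for everyone    */
#define RASR_XN       (1u << 28)            /* execute never              */

/* Smallest SIZE field whose region covers 'bytes' (minimum 32 bytes). */
static uint32_t rasr_size_field(uint32_t bytes)
{
    uint32_t n = 4u;  /* 2^(4+1) = 32 bytes, the smallest region */
    while (n < 31u && ((uint64_t)1 << (n + 1u)) < bytes) {
        n++;
    }
    return n;
}

/* Attribute word for a read/write, non-executable SRAM region. */
uint32_t sram_noexec_rasr(uint32_t bytes)
{
    return RASR_XN | RASR_AP_FULL | RASR_S | RASR_C
         | RASR_SIZE(rasr_size_field(bytes)) | RASR_ENABLE;
}
```

On target, with the CMSIS core headers, this value would typically be written to `MPU->RASR` after selecting a region in `MPU->RNR` and setting `MPU->RBAR` to the SRAM base (0x20000000 on Cortex-M parts), then the MPU enabled through `MPU->CTRL`. For the 128K of RAM in this thread the word works out to 0x13060021.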

SiliconWizard · « **Reply #38 on:** May 09, 2018, 04:48:32 pm »

Quote from: nctnico on May 08, 2018, 12:13:29 am

Well, microcontrollers run from flash, so you can't overwrite the code anyway. Some of the bigger ARM Cortex controllers have an MPU, which lets you disable code execution from SRAM.

Harvard-like architectures are good in that respect indeed.

But a lot of microcontrollers (and most CPUs) are still Von Neumann architectures or derivatives, allowing code to run from memory areas that can be used for data storage. ARM cores are an example, but MIPS also allows this (like on PIC32 MCUs), as do all 8051 derivatives (there are still many in use!) and most bigger CPUs.

As you mentioned, one way that has been developed to circumvent the security issues associated with Von Neumann architectures is the MPU. This was a good step forward, but it's still not completely safe. New security holes are constantly discovered on most MPUs, and there's still a fundamental issue: MPUs are software programmed, so the memory isolation they can provide is entirely software-dependent.

nctnico · « **Reply #39 on:** May 09, 2018, 05:05:36 pm »

Quote from: SiliconWizard on May 09, 2018, 04:48:32 pm

Quote from: nctnico on May 08, 2018, 12:13:29 am
Well, microcontrollers run from flash, so you can't overwrite the code anyway. Some of the bigger ARM Cortex controllers have an MPU, which lets you disable code execution from SRAM.
Harvard-like architectures are good in that respect indeed.

But a lot of microcontrollers (and most CPUs) are still Von Neumann architectures or derivatives, allowing code to run from memory areas that can be used for data storage. ARM cores are an example, but MIPS also allows this (like on PIC32 MCUs), as do all 8051 derivatives (there are still many in use!) and most bigger CPUs.

All those examples are NOT Von Neumann architectures. The 8051 is a classic example of the Harvard architecture, with separate address spaces for everything. ARM and MIPS are modified Harvard, with separate buses for code and data, but you can tie these buses together into a unified memory space.

The Harvard architecture by itself sucks to write code for. Try to implement a fast look-up table in ROM, for example. You have to copy it to RAM first, so you waste a lot of memory AND your look-up table can be overwritten. Or try to use (function) pointers.
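For contrast, on a part with a unified address space (like the ARM chip in this thread), a `const` table is normally linked into flash and read in place, with no RAM copy. A minimal sketch, assuming a typical toolchain that places file-scope `const` data in `.rodata`; the table and the function `popcount8` are just an illustration:

```c
#include <stdint.h>

/* File-scope const data: typical ARM linker scripts place this in flash,
 * so it costs no RAM and cannot be overwritten by a stray pointer. */
static const uint8_t bits_in_nibble[16] = {
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4
};

/* Counts the set bits in a byte with two table look-ups. */
uint8_t popcount8(uint8_t v)
{
    return (uint8_t)(bits_in_nibble[v & 0x0Fu] + bits_in_nibble[v >> 4]);
}
```

On a pure Harvard 8-bit part such as AVR, the same table would need something like avr-libc's `PROGMEM` attribute and `pgm_read_byte()` to stay in flash, which is the inconvenience described above.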

SiliconWizard · « **Reply #40 on:** May 09, 2018, 06:26:23 pm »

Quote from: nctnico on May 09, 2018, 05:05:36 pm

All those examples are NOT Von Neumann architectures. The 8051 is a classic example of the Harvard architecture, with separate address spaces for everything. ARM and MIPS are modified Harvard, with separate buses for code and data, but you can tie these buses together into a unified memory space.

That's why I said "or derivatives". The 8051 had means of separating the areas, but most 8051 implementations I've come across (I admit I've never worked with the original ones) could run code from RAM without any protection, so the isolation is moot. IMO, if there is any bridge that can allow access to the same areas, then those areas are not isolated by definition. The presence of a proper MPU can mitigate this somewhat, but as I said, it's certainly not safe per se. Again, vulnerabilities are discovered all the time, and this is to be expected, since they are soft-programmable.

Quote from: nctnico on May 09, 2018, 05:05:36 pm

The Harvard architecture by itself sucks to write code for. Try to implement a fast look-up table in ROM, for example. You have to copy it to RAM first, so you waste a lot of memory AND your look-up table can be overwritten.

This is not a problem with the Harvard architecture per se. You could have separate spaces in ROM/Flash for code and constant data, and thus you wouldn't need any copying to RAM. The fact that most implementations do not provide this separation is probably just for cost reasons (compounded by the fact that it would be hard to select the right proportion of non-volatile memory dedicated to constant data versus code and still please the whole market). But this mixed storage in NVM is IMO a violation of the Harvard arch, even if it's read-only.

Strictly speaking, the Harvard arch. separates code and data areas, not ROM and RAM memory spaces.

Quote from: nctnico on May 09, 2018, 05:05:36 pm

Or try to use (function) pointers.

As long as they point to code memory, there is no issue with them. The processor just needs to implement indirect calls properly, and the compilers must enforce a strict non-compatibility between function pointers and data pointers.

That said, function pointers are still potential security holes, since they can be modified at run-time. They can still only point to code areas in a Harvard arch., but they may be modified to point to addresses that could run unwanted code (even if it wouldn't be arbitrary code). So granted, the Harvard arch. is not a panacea and should definitely evolve to take all those points into consideration. But I don't think that current MPUs are the right answer. They are just much better than nothing.
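One common way to reduce the run-time modification risk in C is to keep function pointers in a `const` table and validate the index before calling. This is only a sketch: the handler names are hypothetical, and the ten message types are assumed to be numbered 0 to 9.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*handler_fn)(const uint8_t *data, size_t len);

/* Hypothetical handlers; real ones would decrypt and store the data. */
static int handle_type3(const uint8_t *d, size_t n) { (void)d; return (int)n; }
static int handle_type7(const uint8_t *d, size_t n) { (void)d; return (int)n * 2; }

/* const table: it can be placed in flash, so the pointers themselves
 * cannot be rewritten by a buffer overflow in RAM. */
static const handler_fn handlers[10] = {
    [3] = handle_type3,
    [7] = handle_type7,
};

/* Returns the handler result, or -1 for an unknown or unhandled type. */
int dispatch(unsigned type, const uint8_t *data, size_t len)
{
    if (type >= sizeof handlers / sizeof handlers[0] || handlers[type] == NULL) {
        return -1;
    }
    return handlers[type](data, len);
}
```

This does not stop an attacker who controls `type` from choosing among the valid handlers, but it does limit calls to the entry points listed in the table.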

orin · « **Reply #41 on:** May 09, 2018, 07:13:12 pm »

Quote from: nctnico on May 05, 2018, 10:39:48 am

Quote from: amyk on May 05, 2018, 04:14:17 am
Early UNIX ran with substantially less than 128K of RAM and used malloc()/free(). From that perspective, 128K is far beyond the realm of "static allocation only".
But that probably had virtual memory on disk creating a memory pool which was much bigger.

FWIW

I first met Unix on a PDP11/34. The address space in which your program ran was 64K - that was for both instructions and data. Sure, there could be more physical RAM and a swap file, but the per process address space was still only 64K. Later PDP11s had separate instruction and data spaces which gave you a whopping 128K per process address space.

If you wanted more address space, some programs ran several processes. The Ingres database was one example - you could only run one instance on the 11/34 without it grinding to a halt, or getting in trouble with an out of swap space panic. The operators got kind of upset when we did that...

tszaboo · « **Reply #42 on:** May 09, 2018, 09:14:23 pm »

Quote from: jnz on May 04, 2018, 08:40:49 pm

Quote from: IanB on May 04, 2018, 06:12:51 pm
Quote from: jnz on May 04, 2018, 05:21:01 pm
At any time I can expect an average use max of 5K data, so it seems bad to allow 20k of storage that basically can’t happen.

If your system has 128 K of memory and your application when it's running leaves 50 K unused, then you are wasting that 50 K of memory.

On an embedded system there is no concept of "waste" in the way you are thinking. You own the whole system; you are not sharing it and you are not competing with anyone or anything else for resources (?).

Therefore, the only problem with allocating a 20 K buffer at startup and keeping it available is if it prevents you successfully executing other parts of your application. If you have enough memory to allocate that buffer permanently, then allocate it permanently. Your application will execute faster, have fewer bugs, be more reliable.

IT'S NOT FINISHED. I can't tell you how much of the RAM is left; what I know right now is that I don't want to be wasteful just because I didn't think ahead about how this would work. I don't want to devote 20k to 10 buffers of 2k when realistically most cases will be 32 bytes or so.

I have whole swaths of future features that will surely need RAM. Just pretend that I can't budget every byte. This isn't a toy with a fixed life and no update schedule.

You design embedded systems with worst-case scenarios in mind. The worst case is that you will get 10x2K data. If you don't prepare the system for it, it will crash. If you do prepare it for it, you could just as well allocate it statically.
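A minimal sketch of worst-case static allocation, using the constraint from the first post that types "6", "7" and "8" never run concurrently and so can share one buffer. The type numbering 0 to 9 and all names here are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MSG_MAX    2048u  /* worst-case message size from the first post */
#define TYPE_COUNT 10u    /* assumed numbering: types 0..9               */

typedef struct {
    uint8_t bytes[MSG_MAX];
} msg_buf;

/* 8 buffers instead of 10: types 6, 7 and 8 map to the same slot. */
static msg_buf pool[8];
static const uint8_t slot_of_type[TYPE_COUNT] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 7 };

/* Returns the fixed buffer for a message type, or NULL if out of range. */
uint8_t *buffer_for_type(unsigned type)
{
    if (type >= TYPE_COUNT) {
        return NULL;
    }
    return pool[slot_of_type[type]].bytes;
}
```

With the shared slot the pool is 16K rather than 20K, and the cost shows up in the map file at link time instead of as an allocation failure at run time.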

Berni · « **Reply #43 on:** May 10, 2018, 05:12:02 am »

Exactly.

You don't want to build it in such a way that it relies on the input data being small and slow enough for it to keep up. There might be quirks or failures in the rest of the system that cause more data than usual to get to it, or the system might be updated at some point to send more stuff. Sure, you might not have the processing power to process the data at the max speed, but you want to at least safely throw away packets rather than crash catastrophically or, even worse, trample over memory and cause your program to act unpredictably.
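"Safely throw away packets" could look something like the sketch below: a fixed receive slot that rejects anything oversized, and anything arriving while the previous message is still in use, and counts the drops. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_MAX 2048u  /* worst-case message size from the first post */

typedef struct {
    uint8_t  data[MSG_MAX];
    size_t   len;
    bool     busy;     /* true while a consumer still owns the data */
    uint32_t dropped;  /* discarded packets, handy for diagnostics  */
} rx_slot;

/* Stores a packet if the slot is free and the packet fits.
 * Otherwise drops it, bumps the counter and returns false. */
bool rx_store(rx_slot *slot, const uint8_t *src, size_t len)
{
    if (slot->busy || len == 0 || len > MSG_MAX) {
        slot->dropped++;
        return false;
    }
    memcpy(slot->data, src, len);
    slot->len  = len;
    slot->busy = true;
    return true;
}

/* Called by the consumer once it is done with the data. */
void rx_release(rx_slot *slot)
{
    slot->len  = 0;
    slot->busy = false;
}
```

A second "7" arriving while the first is still being processed is simply counted and ignored, which matches the behaviour described in the first post. If `rx_store` is called from an interrupt, `busy` would need to be accessed atomically, which this sketch does not handle.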

Usually you can do some planning and predict what might need a lot of RAM for the rest of your project. Things like variables or small structs scattered around your program tend to use very little memory (a few KB) unless it's a giant program. It's the large arrays that can make a dent in your 128K of memory. If you go in with the mentality of not declaring big arrays willy-nilly all over your code, you will not use up your 128K quickly at all, so don't worry about the 20KB that much.

