Author Topic: What's eating my RAM? (Atmel SAM D11) (Read 5794 times)

ajb · « **on:** October 18, 2016, 08:04:25 pm »

I've been working on a project using an Atmel SAMD11D14A, which is a Cortex-M0+ with 16kB flash and 4kB SRAM, in Atmel Studio 7, and I am NOT using ASF. I've been thinking that the RAM usage on the project seemed a bit high, but hadn't gotten around to investigating that until today, when I finally ran out.

I've been fumbling my way through trying to figure out exactly what's eating all of my RAM so that I can decide how best to proceed. I'm far from an expert on the linker, though, so it's been slow. Here's what I think I know so far/questions I have:

I'm using the default Atmel Studio-generated flash linker script, which appears to be setting the stack size to 2k. That seems rather high, especially since the RAM linker script has the stack at 1k. Would it be wise to reduce the stack size to 1k? I'm not doing a lot of nested function calls (again, no ASF), so I would think 2k is a bit excessive.

Trying to get a handle on where the other 2k of RAM is going, I've been looking at the .map file, both directly and in amap. That identified a couple of things that weren't as const as they could be, so that helped a bit. The section summary looks like this:

Quote

Section          Size     Length   Number of Records
.text            9264     9264     80
.vfp11_veneer    0        0        0
.v4_bx           0        0        0
.iplt            0        0        0
.eh_frame        0        0        0
.rel.dyn         0        0        0
.jcr             0        0        0
.igot.plt        0        0        0
.ARM.exidx       0        0        0
.relocate        1444     1444     11
.bss             470      470      18
.stack           0        0        0
.ARM.attributes  1176     1176     27
.comment         1079     1079     12
.debug_info      77316    77316    12
.debug_abbrev    5642     5642     12
.debug_loc       5018     5018     11
.debug_aranges   712      712      12
.debug_ranges    664      664      11
.debug_macro     62317    62317    83
.debug_line      15950    15950    12
.debug_str       379729   379729   12
.debug_frame     1844     1844     20

From my understanding, .bss and .relocate are the only things that wind up in RAM, although it seems that some debug info ends up there as well, but I'm not sure which debug info in particular that might be. Anyway, .bss + .relocate plus 2k of stack add up to about the total data memory usage AS reports when I build the project. Are the variables that wind up in .relocate simply those which are explicitly initialized, versus .bss which are uninitialized or initialized to zero?

I can easily account for most of the names that are listed in .bss and .relocate, except for one, which is this:

Code: [Select]

Section		SubSection	Address		Size	Demangled Name	Module Name												File Name
.relocate	.data		200005a8	1064	_erelocate = .	c:/program files (x86)/atmel/studio/7.0/toolchain/arm/arm-gnu-toolchain/bin/../lib/gcc/arm-none-eabi/5.3.1/../../../../arm-none-eabi/lib/armv6-m\libc.a	lib_a-impure.o

That accounts for fully half of my nonstack RAM allocation! I'm not using printf or malloc or, as far as I know, any of the other intensive C standard lib functions. Is it unavoidable that libc will consume so much memory, and if so can I keep it out of RAM? Or is something else going on?

Any advice is appreciated!

Edit: The forum eats repeated tab characters when you edit a post, so tabular formatting is a pain. This is helpful

andersm · « **Reply #1 on:** October 18, 2016, 08:35:51 pm »

Quote from: ajb on October 18, 2016, 08:04:25 pm

From my understanding, .bss and .relocate are the only things that wind up in RAM, although it seems that some debug info ends up there as well, but I'm not sure which debug info in particular that might be.

Debug info isn't written to the device, it's used by the debugger running on your PC.

Quote

Are the variables that wind up in .relocate simply those which are explicitly initialized, versus .bss which are uninitialized or initialized to zero?

Initialized data and code that should be executed from RAM. The startup code copies the contents of the section from flash to its runtime location in RAM.
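
As a rough illustration of that copy step, a startup routine might look something like the sketch below. The function name init_ram_sections and its parameters are made up for this example; in a real startup file the boundaries would normally come from linker symbols such as _etext, _srelocate, _erelocate, _szero and _ezero, and the actual code may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy initialized data from its load address (flash)
 * to its run address (RAM), then clear the zero-initialized region.
 * All pointers are assumed to be word-aligned, which is what the ALIGN(4)
 * statements in a typical linker script arrange. */
void init_ram_sections(const uint32_t *load_src,
                       uint32_t *data_start, uint32_t *data_end,
                       uint32_t *zero_start, uint32_t *zero_end)
{
    /* .relocate: initial values live in flash and are copied word by word. */
    for (uint32_t *dst = data_start; dst < data_end; ) {
        *dst++ = *load_src++;
    }

    /* .bss: nothing is stored in flash; the region is simply zeroed. */
    for (uint32_t *dst = zero_start; dst < zero_end; ) {
        *dst++ = 0u;
    }
}
```

This is why an initialized variable costs both flash (for the initial value) and RAM (for the live copy), while a .bss variable costs RAM only.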

Quote

Is it unavoidable that libc will consume so much memory, and if so can I keep it out of RAM?

AFAIK, the impure_data section is related to re-entrancy of Newlib, and the suggested solution seems to be to use Newlib-nano. If you're not using any C library functions, you could also just not link it at all (-nodefaultlibs/-nostdlib).
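
To give a feel for what that re-entrancy data is: Newlib keeps library state that would otherwise be hidden globals (errno, the standard streams, the state behind functions like strtok and rand) in one statically allocated structure. The sketch below is a toy model only, not Newlib's actual layout; toy_reent, toy_impure_data, toy_rand and the 96-byte stream placeholder are all invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a C library's per-context state block (NOT Newlib's
 * real layout; every name and size here is hypothetical). */
struct toy_reent {
    int err;                       /* errno-style last error */
    char *strtok_last;             /* where a strtok-style scan left off */
    uint32_t rand_next;            /* state for a rand-style generator */
    unsigned char streams[3][96];  /* placeholder for stdin/stdout/stderr bookkeeping */
};

/* One statically allocated instance. Because it has a non-zero initializer,
 * it is placed with the initialized data, so it costs flash for the initial
 * values and RAM for the live copy. */
struct toy_reent toy_impure_data = { 0, NULL, 1u, { { 0 } } };

/* Library-style function that keeps its state in the context block
 * instead of in a hidden static variable. */
uint32_t toy_rand(struct toy_reent *ctx)
{
    ctx->rand_next = ctx->rand_next * 1103515245u + 12345u;
    return (ctx->rand_next >> 16) & 0x7FFFu;
}
```

The point of the model is that the whole block is linked in as one object, so its full size shows up in RAM even if the application only ever touches one field of it.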

ataradov · « **Reply #2 on:** October 18, 2016, 08:56:53 pm »

Quote from: ajb on October 18, 2016, 08:04:25 pm

I'm using the default Atmel Studio-generated flash linker script, which appears to be setting the stack size to 2k. That seems rather high, especially since the RAM linker script has the stack at 1k. Would it be wise to reduce the stack size to 1k? I'm not doing a lot of nested function calls (again, no ASF), so I would think 2k is a bit excessive.

It is better to put stack at the end of the RAM and not worry about its exact size. This will also exclude it from memory size calculation, so you see totals of your actual variables.

ataradov · « **Reply #3 on:** October 18, 2016, 09:01:18 pm »

Also, look at the output of "arm-none-eabi-nm -Sn build/Bootloader.elf" (probably a bit cleaner than raw map file) and check if you see big things with names you don't recognize. Then investigate what those things are.

ajb · « **Reply #4 on:** October 19, 2016, 03:57:32 am »

Thanks, this is all quite helpful. Part of the reason I've stuck with Atmel Studio and AVRs as long as I have is so I didn't have to deal with all of this startup and linker stuff

But I'm learning things, and that's good!

I started a new bare project that I could poke and prod with impunity, and found that the default flash linker script has the stack as only 1k, unlike the other project I've been working with, not sure what's up with that. I tried building it with --specs=nano.specs, and my little blinky program builds and runs with no trouble and only 32 bytes of .bss+.relocate, which appears to be a result of the autogenerated startup and system files.

I'd like to know why the standard lib requires such a large RAM footprint. I see vague mentions that it's to do with reentrancy of library functions, but nothing more detailed than that.

Quote from: ataradov on October 18, 2016, 08:56:53 pm

It is better to put stack at the end of the RAM and not worry about its exact size. This will also exclude it from memory size calculation, so you see totals of your actual variables.

Here's the part where I admit my near total lack of linker knowledge. I've included the linker script for reference below. I see where the stack size is defined, and I see where .stack is described. I guess _sstack and _estack define the start and end of the stack, respectively, right? And if I'm understanding the overall structure of the linker script, the stack simply starts at the end of everything else, and ends 1K later, right? Is it as simple as setting _estack to the highest RAM address, and _sstack = _estack - 1k?

Clearly the linker is something I could stand to learn more about, as well as what's going on under the hood of a typical C program, so if anyone has any recommendations for good reading material that goes deep into those sorts of details I'd be happy to hear about it.

samd11d14am_flash.ld:

Code: [Select]

OUTPUT_FORMAT("elf32-littlearm", "elf32-littlearm", "elf32-littlearm")
OUTPUT_ARCH(arm)
SEARCH_DIR(.)

/* Memory Spaces Definitions */
MEMORY
{
  rom      (rx)  : ORIGIN = 0x00000000, LENGTH = 0x00004000
  ram      (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00001000
}

/* The stack size used by the application. NOTE: you need to adjust according to your application. */
STACK_SIZE = DEFINED(STACK_SIZE) ? STACK_SIZE : DEFINED(__stack_size__) ? __stack_size__ : 0x400;

/* Section Definitions */
SECTIONS
{
    .text :
    {
        . = ALIGN(4);
        _sfixed = .;
        KEEP(*(.vectors .vectors.*))
        *(.text .text.* .gnu.linkonce.t.*)
        *(.glue_7t) *(.glue_7)
        *(.rodata .rodata* .gnu.linkonce.r.*)
        *(.ARM.extab* .gnu.linkonce.armextab.*)

        /* Support C constructors, and C destructors in both user code
           and the C library. This also provides support for C++ code. */
        . = ALIGN(4);
        KEEP(*(.init))
        . = ALIGN(4);
        __preinit_array_start = .;
        KEEP (*(.preinit_array))
        __preinit_array_end = .;

        . = ALIGN(4);
        __init_array_start = .;
        KEEP (*(SORT(.init_array.*)))
        KEEP (*(.init_array))
        __init_array_end = .;

        . = ALIGN(4);
        KEEP (*crtbegin.o(.ctors))
        KEEP (*(EXCLUDE_FILE (*crtend.o) .ctors))
        KEEP (*(SORT(.ctors.*)))
        KEEP (*crtend.o(.ctors))

        . = ALIGN(4);
        KEEP(*(.fini))

        . = ALIGN(4);
        __fini_array_start = .;
        KEEP (*(.fini_array))
        KEEP (*(SORT(.fini_array.*)))
        __fini_array_end = .;

        KEEP (*crtbegin.o(.dtors))
        KEEP (*(EXCLUDE_FILE (*crtend.o) .dtors))
        KEEP (*(SORT(.dtors.*)))
        KEEP (*crtend.o(.dtors))

        . = ALIGN(4);
        _efixed = .;            /* End of text section */
    } > rom

    /* .ARM.exidx is sorted, so has to go in its own output section.  */
    PROVIDE_HIDDEN (__exidx_start = .);
    .ARM.exidx :
    {
      *(.ARM.exidx* .gnu.linkonce.armexidx.*)
    } > rom
    PROVIDE_HIDDEN (__exidx_end = .);

    . = ALIGN(4);
    _etext = .;

    .relocate : AT (_etext)
    {
        . = ALIGN(4);
        _srelocate = .;
        *(.ramfunc .ramfunc.*);
        *(.data .data.*);
        . = ALIGN(4);
        _erelocate = .;
    } > ram

    /* .bss section which is used for uninitialized data */
    .bss (NOLOAD) :
    {
        . = ALIGN(4);
        _sbss = . ;
        _szero = .;
        *(.bss .bss.*)
        *(COMMON)
        . = ALIGN(4);
        _ebss = . ;
        _ezero = .;
    } > ram

    /* stack section */
    .stack (NOLOAD):
    {
        . = ALIGN(8);
        _sstack = .;
        . = . + STACK_SIZE;
        . = ALIGN(8);
        _estack = .;
    } > ram

    . = ALIGN(4);
    _end = . ;
}

ataradov · « **Reply #5 on:** October 19, 2016, 04:05:10 am »

Quote from: ajb on October 19, 2016, 03:57:32 am

Thanks, this is all quite helpful. Part of the reason I've stuck with Atmel Studio and AVRs as long as I have is so I didn't have to deal with all of this startup and linker stuff

That's because those linker scripts were written by thinking people

Quote from: ajb on October 19, 2016, 03:57:32 am

I'd like to know why the standard lib requires such a large RAM footprint. I see vague mentions that it's to do with reentrancy of library functions, but nothing more detailed than that.

It normally does not. You need to check what exactly is using all that memory.

Quote from: ajb on October 19, 2016, 03:57:32 am

And if I'm understanding the overall structure of the linker script, the stack simply starts at the end of everything else, and ends 1K later, right?

That is correct. Here https://github.com/ataradov/mcu-starter-projects you will find really small and simple starter projects for various MCUs that include better linker files.

To fix that linker file, remove the whole ".stack (NOLOAD): {}" section and add "PROVIDE(_estack = ORIGIN(ram) + LENGTH(ram));" instead. This will move the stack to the end of the RAM, where it can use all of the remaining memory, as it should.
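
To see what that change buys, the arithmetic can be sketched in C. The helper name stack_headroom is made up; the numbers to plug in are the RAM origin and length from the linker script above and the .relocate and .bss sizes from the first post, and the result ignores any alignment padding.

```c
#include <stdint.h>

/* Hypothetical helper: bytes available to the stack when the initial stack
 * pointer sits at the very end of RAM and the stack grows down toward the
 * end of the statically allocated data. */
uint32_t stack_headroom(uint32_t ram_origin, uint32_t ram_length,
                        uint32_t static_bytes)
{
    uint32_t stack_top  = ram_origin + ram_length;   /* value given to _estack */
    uint32_t static_end = ram_origin + static_bytes; /* end of .relocate + .bss */
    return stack_top - static_end;
}
```

With the figures from this thread, stack_headroom(0x20000000, 0x1000, 1444 + 470) comes to 2182 bytes, and that number grows automatically whenever the static data shrinks.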

ajb · « **Reply #6 on:** October 19, 2016, 04:54:26 am »

Quote from: ataradov on October 19, 2016, 04:05:10 am

Quote from: ajb on October 19, 2016, 03:57:32 am
Thanks, this is all quite helpful. Part of the reason I've stuck with Atmel Studio and AVRs as long as I have is so I didn't have to deal with all of this startup and linker stuff
That's because those linker scripts were written by thinking people

And I immensely respect those people and very much appreciate their work! I've just always been more interested in designing the hardware and in writing the actual application than mucking around in linker scripts and toolchain settings. Each to their own; I'm grateful for those hardworking people who have heretofore facilitated my blissful ignorance.

Quote from: ataradov on October 19, 2016, 04:05:10 am

Quote from: ajb on October 19, 2016, 03:57:32 am
I'd like to know why the standard lib requires such a large RAM footprint. I see vague mentions that it's to do with reentrancy of library functions, but nothing more detailed than that.
It normally does not. You need to check what exactly is using all that memory.

Even creating a brand new empty project in Atmel Studio (using the "GCC C Executable" Template, no ASF) and immediately building it with NO additional user code results in >1k of RAM usage. That's with no printf or any of its ilk, no math.h, nothing but main(){while(1){}}. Again, lib_a-impure.o comprises almost all of the non-stack RAM allocation. (In my real project I've explicitly avoided printf and math.h because of the ridiculous code size impact, but even so they didn't impact RAM utilization that badly.) Using --specs=nano.specs drops that down to practically nothing, so that seems to be a good solution, but I'd like to have a better understanding of what the actual difference is and why it's so drastic.

Thank you for your github link, I will take a look at it tomorrow.

ataradov · « **Reply #7 on:** October 19, 2016, 05:11:48 am »

Quote from: ajb on October 19, 2016, 04:54:26 am

the actual application than mucking around in linker scripts and toolchain settings

I personally consider linker scripts to be a part of the program rather than the toolchain, so I like that the ARM world makes me carry a linker script, even if it is a stock one. At some point you will need to make a change, to accommodate a bootloader or some special area in flash that your application needs, and that's where you have everything ready and no drastic changes are needed for an ARM project.

Quote from: ajb on October 19, 2016, 04:54:26 am

Again, lib_a-impure.o comprises almost all of the non-stack RAM allocation.

Application from that link compiles with this size results:

Quote

size:
   text    data     bss     dec     hex filename
    660       0       0     660     294 build/Demo.elf
    660       0       0     660     294 (TOTALS)

And there are custom AS projects. I have no idea what settings AS uses by default, but they are clearly not great for a clean project.

ale500 · « **Reply #8 on:** October 19, 2016, 09:34:46 am »

Something else to keep in mind is that globals (and statics inside functions) end up in RAM. Say you have some initialized array, a key-map or something like that: if such things are not declared const, they are stored in flash and copied to RAM at startup, because the compiler doesn't know that you are not going to modify them (but lint probably does, some versions of it).

Code: [Select]

char some_array[16] = { 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8 };

This is better because it only lives in flash:

Code: [Select]

const char some_array[16] = { 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8 };

westfw · « **Reply #9 on:** October 19, 2016, 10:15:13 am »

I've also been under the impression that "newlib" originated on rather larger "embedded systems" than the current crop of Cortex microcontrollers - those embedded ARM7/ARM9 things with several MB of RAM... I'm rather surprised that AS7 doesn't default to using the "nano" version which is smaller. (perhaps it does, and it's still big compared to the SAMD11?)

For example:

Quote

Newlib's standard input and output facilities are surprisingly complete. The complete C file API is also provided, complete with read and write buffering, seeking, and stream flushing capabilities. Variations like sprintf, fprintf, and vfprintf (takes va_list arguments) are also included, which makes a newlib environment look strikingly similar to one you'd expect to see in a more workstation-oriented programming environment.

Someone should port avr-libc :-)

snarkysparky · « **Reply #10 on:** October 20, 2016, 12:16:49 pm »

Does the linker link in functions that are not even called in the code if a library is included? If so, WHY??

Kalvin · « **Reply #11 on:** October 20, 2016, 12:51:18 pm »

Quote from: snarkysparky on October 20, 2016, 12:16:49 pm

Does the linker link in functions that are not even called in the code if a library is included? If so, WHY??

The linker needs options to be enabled in order to remove the unused functions. The C compiler may need some compiler options in order to place each function in its own section. That's why.

Spend a few minutes reading the documentation, and get to know how to read the linker map files and how to make the compiler produce assembly output.

andersm · « **Reply #12 on:** October 20, 2016, 12:54:20 pm »

Quote from: snarkysparky on October 20, 2016, 12:16:49 pm

Does the linker link in functions that are not even called in the code if a library is included? If so, WHY??

In the GNU toolchain, libraries are linked at object-file granularity: if anything in one of the library's object files is referenced, that whole object file is linked in. If the library is built so that each function is placed in its own section, unused functions can be removed via the linker's garbage collector.

mikeselectricstuff · « **Reply #13 on:** October 20, 2016, 09:17:43 pm »

Quote from: ale500 on October 19, 2016, 09:34:46 am

Something else to keep in mind is that globals (and statics inside functions) end up in RAM. Say you have some initialized array, a key-map or something like that: if such things are not declared const, they are stored in flash and copied to RAM at startup, because the compiler doesn't know that you are not going to modify them (but lint probably does, some versions of it).

Code: [Select]

char some_array[16] = { 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8 };

This is better because it only lives in flash:

Code: [Select]

const char some_array[16] = { 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8 };

There are some compilers that put const arrays in RAM unless you add a qualifier to force them into flash. A quick check is to add another element and see if RAM usage changes.
Some linkers include a default allocation for a heap, which is unused if malloc isn't used (which it generally shouldn't be, but that's a whole 'nother thread). Note that printf may use heap and/or significant stack.
Unless you're using a lot of formatting, printf can often be replaced with a much simpler application-specific function. Some compilers have options to select different flavours of printf.
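
As an example of such an application-specific replacement, here is a minimal sketch that formats an unsigned 32-bit value as decimal text without pulling in printf. The name u32_to_dec and the buffer convention are assumptions for this example, not something taken from a particular library.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the decimal form of `value` into `out` (NUL-terminated) and returns
 * the number of digits written. `out` must have room for at least 11 bytes:
 * up to 10 digits for a 32-bit value plus the terminator. */
size_t u32_to_dec(uint32_t value, char *out)
{
    char tmp[10];
    size_t n = 0;

    /* Produce digits least-significant first. */
    do {
        tmp[n++] = (char)('0' + (value % 10u));
        value /= 10u;
    } while (value != 0u);

    /* Reverse into the caller's buffer. */
    for (size_t i = 0; i < n; i++) {
        out[i] = tmp[n - 1 - i];
    }
    out[n] = '\0';
    return n;
}
```

The resulting string can then be handed to whatever byte-output routine the application already has, for example a UART write function.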

dgtl · « **Reply #14 on:** October 20, 2016, 10:36:08 pm »

Enable the -ffunction-sections and -fdata-sections compiler flags and the -Wl,--gc-sections linker flag. The compiler flags place each function and each global (or local static) variable in its own linker section (so you have .text.myfunc1, .text.myfunc2 etc., not just one big .text).
This has 2 advantages:
* The linker can check what is needed and what is not, and throw out functions that are never referenced even though they are not static (and thus have external linkage)
* The linker output map file has detailed information about every function and global variable flash/ram usage.
In addition to viewing a plain map file, a useful visualization of where the space goes is the trace_analyze.py tool made for the Linux kernel ( https://github.com/ezequielgarcia/trace_analyze ). This can be used to generate a PNG donut diagram corresponding to the project folder structure. Some minor changes are needed to graph .data and .text sections separately.
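
A small sketch of what those flags act on. The file and function names are made up; the flags in the comment are the usual GCC and GNU ld spellings of the options mentioned above, and whether unused_helper is actually dropped depends on the rest of the build.

```c
#include <stdint.h>

/* Assumed build (hypothetical file name demo.c):
 *   arm-none-eabi-gcc -ffunction-sections -fdata-sections -c demo.c
 *   arm-none-eabi-gcc -Wl,--gc-sections demo.o ...
 * With -ffunction-sections each function below is emitted into its own
 * input section (.text.scale_reading, .text.unused_helper), so the linker
 * can discard the one that nothing references. */

/* Referenced from the application, so its section is kept. */
uint32_t scale_reading(uint32_t raw)
{
    return (raw * 3u) / 4u;
}

/* Has external linkage but no callers; without per-function sections it
 * would stay in the image together with everything else in this object
 * file's shared .text section. */
uint32_t unused_helper(uint32_t raw)
{
    return raw + 1u;
}
```

In the map file, sections removed this way are typically listed under a "Discarded input sections" heading, which is a quick way to confirm the flags took effect.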

