Topic: A question on GCC .ld linker script syntax

Online peter-h (Topic starter)

  • Super Contributor
  • ***
  • Posts: 3640
  • Country: gb
  • Doing electronics since the 1960s...
Re: A question on GCC .ld linker script syntax
« Reply #75 on: March 22, 2023, 08:37:28 am »
Back to the topic, I think I managed to catch all relevant __weak functions which are used but not being replaced with real code, but I am not 100% sure.

They are easy enough to find with a Search, without even using objdump etc. to list them. If you have, say, 100 of them then it may take you some hours to see if any don't tie up, and put
#if 0
#endif
around any suspicious ones. This is safe because the linker will complain if you went too far.
« Last Edit: March 22, 2023, 10:38:49 am by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6130
  • Country: fi
    • My home page and email address
Re: A question on GCC .ld linker script syntax
« Reply #76 on: March 22, 2023, 11:54:02 am »
Quote from: peter-h
"Win7-64. However I have cygwin and so have bash and gawk, but not awk. 267 executables ...

I have a dir called awk but it contains grcat and pwcat."

Okay.  I don't use Windows, so the following script might need massaging wrt. path separator stuff.

Quote from: peter-h
"AFAICT the ST code was never intended for a library. It is in source form and is meant to be loaded into Cube IDE in that form. I suppose somebody might have then made a .a lib out of it but I haven't come across that. Can't see the point of a lib if you have the sources, ever.

They supply newlib printf etc as a lib and w/o sources and that lib is not weak ;) As previously posted, I had to weaken it to replace the printf code."

Heh, so it is cargo cult programming on behalf of ST, then.  Not a big surprise, though; Enterprise-grade code is rarely of high (or even medium) quality.
 

Nominal Animal
« Reply #77 on: March 22, 2023, 02:04:02 pm »
Here is the script I cobbled together.  It is not intended to be minimal or the fastest possible; I wanted it to be as easily modified and adapted as possible.
Save it somewhere as say checksym.sh or something, make it executable, and specify the object and archive names on the command line.
Code: [Select]
#!/bin/bash
# SPDX-License-Identifier: CC0-1.0
# -*- coding: utf-8 -*-

OBJDUMP="${OBJDUMP:-arm-none-eabi-objdump}"
AWK="${AWK:-gawk}"
SED="${SED:-sed}"
SORT="${SORT:-sort}"
XARGS="${XARGS:-xargs}"
TOUCH="${TOUCH:-touch}"

# Use default locale, since we're parsing command outputs
export LANG=C LC_ALL=C

# Create an auto-removed temporary directory for our temporary files
Work="$(mktemp -d)" || exit 1
trap "rm -rf '$Work'" EXIT

# We list all object files in "$Work/objfiles",
# and all libraries/archives in "$Work/libfiles",
# one name or path per line.
printf '%s\n' "$@" | "$SED" -ne '/\.o$/p' > "$Work/objfiles"
printf '%s\n' "$@" | "$SED" -ne '/\.a$/p' > "$Work/libfiles"
# Note that you can replace this with e.g.
#   find . -name '*.o' -printf '%p\n' > "$Work/objfiles"
#   find . -name '*.a' -printf '%p\n' > "$Work/libfiles"
# Feel free to replace the above with whatever mechanism you like.

# To ensure we only look at each library file once, we sort the list.
# The sed is magic: it changes to tab separators, with symbol table
# having five entries (addr,flags,section,addr,name), and file names
# two entries (object,archive):
"$SORT" -u "$Work/libfiles" | "$XARGS" -r -d '\n' "${OBJDUMP}" -t \
 | "$SED" -e '/^$/d ; /^SYMBOL TABLE:/d ; s|^In archive \(.*\): *$|\t\1|; s|: \+file format .*$|\t|; s| \([^ \t]\+\t[0-9A-Fa-f]\+\) |\t\1\t| ; s|^\([0-9A-Fa-f]\+\) |\1\t|' \
 > "$Work/symbols"
#
# Object files are handled in a very similar manner, with the
# only exception being that there are no archive file names, and object file names are single-field records.
"$SORT" -u "$Work/objfiles" | "$XARGS" -r -d '\n' "${OBJDUMP}" -t \
 | "$SED" -e '/^$/d ; /^SYMBOL TABLE:/d ; s|: \+file format .*$||; s| \([^ \t]\+\t[0-9A-Fa-f]\+\) |\t\1\t| ; s|^\([0-9A-Fa-f]\+\) |\1\t|' \
 >> "$Work/symbols"

#
# We use awk to process the combined symbol list.  There are four types of lines/records, with TAB separators:
#   object-file-name                        Names an object file as the source for the following symbols
#   object-file-name<TAB>                   Names an object file within the current archive/library for the following symbols
#   <TAB>archive-file-name                  Names the archive/library for the following symbols
#   hex<TAB>flags<TAB>section<TAB>hex<TAB>symbol    Names a symbol.  Flags are per objdump -t. References name *UND* as their section.
"$AWK" 'BEGIN {
            FS = "\t";
            split("", funs)
            split("", objs)
            split("", refs)
            aname = ""  # Archive file name
            oname = ""  # Object file name within an archive
            fname = ""  # File name (or combined aname ":" oname)
        }

        NF==1 { # Solitary object file name
            aname = ""
            oname = ""
            fname = $1
        }

        NF==2 { # Archive file
            if (length($2) > 0) {
                aname = $2
                oname = ""
                fname = aname
            } else
            if (length($1) > 0) {
                oname = $1
                fname = aname ":" oname
            }
        }

        NF==5 { # Symbol table record

            # Only consider symbols that start with _ or a letter
            if (!($5 ~ /^[_A-Za-z]/)) next;

            # Skip local, debug, dynamic, indirect, file, and warning symbols
            if ($2 ~ /[lWIiDdf]/) next;

            # Skip common and absolute-address stuff
            if ($3 == "*COM*" || $3 == "*ABS*") next;

            # If the symbol or reference is weak, we prefix the symbol name with !.
            if ($2 ~ /w/) {
                weak = 1
                sym = "!" $5
            } else {
                weak = 0
                sym = $5
            }

            if ($3 == "*UND*") {
                # Symbol reference. Add file name to refs under this symbol.
                if (sym in refs)
                    refs[sym] = refs[sym] "\t" fname
                else
                    refs[sym] = fname
            } else {
                if ($2 ~ /F/) {
                    # Function definition
                    if (sym in funs)
                        funs[sym] = funs[sym] "\t" fname
                    else
                        funs[sym] = fname
                } else {
                    # Non-function definition
                    if (sym in objs)
                        objs[sym] = objs[sym] "\t" fname
                    else
                        objs[sym] = fname
                }
            }
        }

        END {

            # Find strong function definitions defined in more than one file
            split("", syms)
            for (sym in funs)
                if (!(sym ~ /^!/) && (funs[sym] ~ /\t/))
                    syms[sym] = funs[sym]
            if (length(syms) > 0) {
                printf "%d duplicate (non-weak) function definitions:\n", length(syms)
                for (sym in syms)
                    printf "  %s in %s\n", sym, syms[sym]
                printf "\n"
            } else {
                printf "There are no duplicate (non-weak) function definitions.\n\n"
            }

            # Find weak function definitions without corresponding strong symbol definitions
            split("", syms)
            for (wsym in funs) if (wsym ~ /^!/) {
                sym = substr(wsym, 2)
                if (!(sym in funs))
                    syms[sym] = funs[wsym]
            }
            if (length(syms) > 0) {
                printf "%d weak functions without strong function definitions:\n", length(syms)
                for (sym in syms)
                    printf "  %s defined in %s\n", sym, syms[sym]
                printf "\n"
            } else {
                printf "All weak function definitions have corresponding strong function definitions.\n\n"
            }

            # Find strong function symbols that are never referenced
            split("", syms)
            for (sym in funs) if (!(sym ~ /^!/)) {
                wsym = "!" sym
                if (!(sym in refs) && !(wsym in refs))
                    syms[sym] = funs[sym]
            }
            if (length(syms) > 0) {
                printf "%d (non-weak) functions that are never referenced:\n", length(syms)
                for (sym in syms)
                    printf "  %s defined in %s\n", sym, syms[sym]
                printf "\n"
            } else {
                printf "All (non-weak) functions are referenced at least once.\n\n"
            }

            # Find weak function symbols that are never referenced
            split("", syms)
            for (wsym in funs) if (wsym ~ /^!/) {
                sym = substr(wsym, 2)
                if (!(sym in refs) && !(wsym in refs))
                    syms[sym] = funs[wsym]
            }
            if (length(syms) > 0) {
                printf "%d weak functions that are never referenced:\n", length(syms)
                for (sym in syms)
                    printf "  %s defined in %s\n", sym, syms[sym]
                printf "\n"
            } else {
                printf "All weak functions are referenced at least once.\n\n"
            }

            # Find references that cannot be resolved
            split("", syms)
            for (ref in refs) {
                if (ref ~ /^!/) {
                    sym = substr(ref, 2)
                    wsym = ref
                } else {
                    sym = ref
                    wsym = "!" ref
                }
                if (!(sym in funs) && !(wsym in funs) && !(sym in objs) && !(wsym in objs)) {
                    if (sym in syms)
                        syms[sym] = syms[sym] "\t" refs[ref]
                    else
                        syms[sym] = refs[ref]
                }
            }
            if (length(syms) > 0) {
                printf "%d unresolved symbols:\n", length(syms)
                for (sym in syms)
                    printf "  %s referenced in %s\n", sym, syms[sym]
                printf "\n"
            } else {
                printf "No unresolved symbols.\n\n"
            }

        }' "$Work/symbols"
This has only been tested on Linux.

I used OBJDUMP, SED, etc. for the corresponding executables' pathnames.  The Bash substitution VAR="${VAR:-default}" uses the already set non-empty value, or default if none set or empty.  So, one can use e.g. bash -c 'AWK=some-other-awk-variant checksym.sh' to override the script-set value.
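A minimal sketch of that override idiom on its own, assuming nothing beyond a POSIX shell. The variable name TOOL and the value some-other-awk are placeholders for illustration; they are not part of the script above.

```bash
# Hypothetical demo of the VAR="${VAR:-default}" idiom; TOOL is a placeholder name.
pick_tool() {
    # Keep a non-empty value that is already set, otherwise fall back to gawk.
    TOOL="${TOOL:-gawk}"
    printf '%s\n' "$TOOL"
}

unset TOOL;            pick_tool   # unset: prints the default, gawk
TOOL="";               pick_tool   # empty: prints the default as well
TOOL="some-other-awk"; pick_tool   # set:   prints some-other-awk
```

The same mechanism is what lets a one-off assignment in front of the command (as in the bash -c example above) reach the script without editing it.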

Since we parse objdump output, we set the default C locale, to ensure the output is not unexpectedly localized.  (Compare e.g. date and LANG=C LC_ALL=C date).

printf is a bash builtin, a bit nicer than echo.  It is used to save all object file names, one per line, to "$Work/objfiles", and all library or archive file names, one per line, to "$Work/libfiles".  These are separated only so that we can differentiate between standalone object files and object files within an archive in our output.

The reason we kick the names to a file, is so that huuuge projects with tens of thousands of files, can be supported.  On some systems, the number of command-line parameters is limited, you see.  If you encounter that limit, then this initial part of the script can be modified to pass library and object file names in a different way.  (-@file-name would be a common way we could use to specify a file containing object or archive/library file names.  We could also specify name patterns looked for in an entire subtree, via find.)

Application startup causes significant latencies in batch scripts.  To minimize this, and to ensure we only dump each library or object file once, we sort the file containing the names of files to be processed, and feed it to xargs, which executes objdump with as many parameters as is possible, for all files.
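The sort-then-xargs step can be tried in isolation. This is only a sketch: the archive names are made up, echo stands in for objdump so the batched command line is printed rather than executed, and the -r and -d options assume GNU xargs, as the script itself does.

```bash
# Hypothetical archive names; libb.a is listed twice on purpose.
batched_cmd() {
    printf '%s\n' libb.a liba.a libb.a \
      | sort -u \
      | xargs -r -d '\n' echo objdump -t
}

batched_cmd   # prints: objdump -t liba.a libb.a
```

The duplicate name is dropped by sort -u, and both remaining names end up on a single command line, so the tool is started once instead of once per file.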

The objdump -t output is fed through a SED script, that manipulates the output in the following ways:
  • /^$/d;
    Deletes empty lines
  • /^SYMBOL TABLE:/d;
    Deletes lines beginning with SYMBOL TABLE:
  • s|^In archive \(.*\): *$|\t\1|;
    Replaces lines beginning with In archive and ending with a colon with a TAB and the archive name
  • s|: \+file format .*$|\t|;
    Replaces colon, space(s), followed by "file format", with a single TAB skipping the rest of the line.
  • s| \([^ \t]\+\t[0-9A-Fa-f]\+\) |\t\1\t| ;
    If there is a space followed by a token, tab, a hexadecimal number, and a space, replaces those spaces with TABs
  • s|^\([0-9A-Fa-f]\+\) |\1\t|
    Replaces the space following the first hexadecimal number with a TAB.
The object file sed is similar, except it omits the archive name, and the object file name itself is converted to a single-field record.
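To see what the substitutions do, the object-file variant of the sed script can be run on a few hand-written lines. This is a sketch under assumptions: the sample lines only imitate the shape of objdump -t output (they were not produced by a real toolchain), and the \t and \+ escapes rely on GNU sed, as in the script above.

```bash
# Sample lines shaped like `objdump -t` output; names and values are made up.
sample_dump() {
    printf 'demo.o:     file format elf32-littlearm\n'
    printf 'SYMBOL TABLE:\n'
    printf '00000000 g     F .text\t00000010 main\n'
    printf '00000000         *UND*\t00000000 printf\n'
}

# The object-file sed expression from the script, unchanged.
to_records() {
    sed -e '/^$/d ; /^SYMBOL TABLE:/d ; s|: \+file format .*$||; s| \([^ \t]\+\t[0-9A-Fa-f]\+\) |\t\1\t| ; s|^\([0-9A-Fa-f]\+\) |\1\t|'
}

# Show the field count of each record, with TABs rendered as "|" for readability.
sample_dump | to_records | awk -F'\t' '{ n = NF; gsub(/\t/, "|"); print n ": " $0 }'
```

The file-name line comes out as a one-field record, and each symbol line as a five-field record (address, flags, section, size, name), which is what the awk stage keys on.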
Both are emitted to $Work/symbols, which contains TAB-separated fields, one record per line.  It has
  • Records with two fields naming an object file within an archive.
    If the first field is empty, the second field names a new archive file.  If the second field is empty, the first field names the object file.
    I use the convention archive-file:object-file for these, in the above script
  • Records with just one field name a separate object file (not inside any archive).
  • Records with five fields specify a symbol table entry.
    First and fourth fields are hexadecimal numbers, and not interesting.
    Second field contains flag characters:
      g: Global
      u: Unique global
      !: Global and local
      w: Weak
      F: Function
    For others, see man 1 objdump, under -t or --syms option.
Finally, we feed the $Work/symbols to awk, which tracks the file name (for each record), adding symbol references (those with section *UND*) to refs[] array, function definitions to funs[] array, and other object definitions to objs[] array.  Weak symbols are internally differentiated by adding a ! in front of the symbol name.
Each of the three arrays (funs, objs, and refs) has the symbol name as a key, preceded with a ! if the symbol or reference is weak, and the value is a TAB-separated list of file names where the definition or reference occurs.

The report is output in the END rule, which is triggered after all input records have been processed.

If any non-weak function symbol in funs has a value with a TAB in it, it is defined in more than one file.

If there is a weak function symbol in funs without a corresponding non-weak symbol (say, there is !foo but no foo), then we have a weak function symbol definition without a corresponding strong symbol definition.

Each key in funs must be defined in refs also, or that key (weak or non-weak function) is not referenced at all.  Such functions are either unnecessary, or unnecessarily global.  (They might be used in the same object file they are defined in; to resolve these, we'd need to look at the relocation table for that particular object file.)

If there is a key in refs, but no corresponding (weak or non-weak) key in funs or objs, we have a dangling, unresolvable reference.
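The checks described above can be sketched on a handful of hand-written records. This is a reduced illustration, not the script itself: it keeps only funs[] and refs[], runs only two of the checks, and the file and symbol names (main.o, stubs.o, missing_fn and so on) are invented.

```bash
# Hand-written records in the TAB-separated "$Work/symbols" format; names are made up.
sample_symbols() {
    printf 'main.o\n'
    printf '00000000\tg     F\t.text\t00000010\tmain\n'
    printf '00000000\t       \t*UND*\t00000000\tSysTick_Handler\n'
    printf '00000000\t       \t*UND*\t00000000\tmissing_fn\n'
    printf 'stubs.o\n'
    printf '00000000\tw     F\t.text\t00000002\tSysTick_Handler\n'
}

# Reduced awk logic: weak symbols are stored under a key with a "!" prefix.
check_symbols() {
    awk 'BEGIN { FS = "\t" }
         NF == 1 { fname = $1 }
         NF == 5 {
             sym = ($2 ~ /w/) ? "!" $5 : $5
             if ($3 == "*UND*")
                 refs[sym] = (sym in refs) ? refs[sym] "\t" fname : fname
             else if ($2 ~ /F/)
                 funs[sym] = (sym in funs) ? funs[sym] "\t" fname : fname
         }
         END {
             # Weak function definitions with no strong counterpart
             for (wsym in funs) if (wsym ~ /^!/) {
                 sym = substr(wsym, 2)
                 if (!(sym in funs)) print "weak-only: " sym
             }
             # References with neither a strong nor a weak definition
             for (ref in refs) {
                 sym = ref
                 sub(/^!/, "", sym)
                 wsym = "!" sym
                 if (!(sym in funs) && !(wsym in funs)) print "unresolved: " sym
             }
         }'
}

sample_symbols | check_symbols | sort
```

With these records, SysTick_Handler is reported as weak-only (it is defined only in stubs.o, weakly) and missing_fn as unresolved, while main raises neither report.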
« Last Edit: March 22, 2023, 02:06:14 pm by Nominal Animal »
 
The following users thanked this post: peter-h

peter-h (Topic starter)
« Reply #78 on: June 30, 2023, 02:34:56 pm »
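Unfortunately, after some hours on this, I am asking for help again with the unbelievable GCC LD syntax.

[Attached image not preserved: numbered linker script excerpt, including the lines 268 to 281 discussed below]
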
Thank you in advance :)

If I simply comment-out line 271, I get a syntax error on line 281.

The purpose of those two blocks is to place initialised data for main.c into RAM first, and then place all other initialised data into RAM after that. I then repeat the exercise for BSS (main.c first, rest later).
« Last Edit: June 30, 2023, 02:37:05 pm by peter-h »
 

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 8090
  • Country: fi
Re: A question on GCC .ld linker script syntax
« Reply #79 on: June 30, 2023, 02:50:25 pm »
I have never seen either a dummy, empty output section entry (line 268) or anonymous output section (line 281) in any linker script file. Documentation ( https://sourceware.org/binutils/docs/ld/Output-Section-Description.html ) does show section name and braces as mandatory:

"The colon and the curly braces are also required."

I don't think there is anything "unbelievable" in this. What do you think omitting the content would do? It's like, in C, doing

Code: [Select]
    switch(var)

    go_on_with_program_forgetting_the_cases();

and then complaining the compiler is stupid.

If you comment line 271 out, then that part becomes valid because line 268 is now the output section name for braces starting at 272, but there is another error, at line 280 you would need another output section name.
 

peter-h (Topic starter)
« Reply #80 on: June 30, 2023, 03:07:59 pm »
Thanks.

The label all_nonboot_data was intended to facilitate
_si_nonboot_data = LOADADDR(.all_nonboot_data);
which in turn is used in

Code: [Select]
// Initialise DATA

extern char _s_nonboot_data;
extern char _e_nonboot_data;
extern char _si_nonboot_data;
memcpy(&_s_nonboot_data, &_si_nonboot_data, &_e_nonboot_data - &_s_nonboot_data);

I have now solved it (but need to examine the memory to make sure it is actually working) as follows

Code: [Select]

 
/* Initialized data sections for non boot block code. These go into RAM. LMA copy is loaded after code. */
/* This stuff is copied from FLASH to RAM by C code in the main stub */
. = ALIGN(4);

/* main.c stuff is loaded first, for RAM address consistency between factory and customer code */
  .XXX_main_data :
  {
    . = ALIGN(4);
    _s_nonboot_data = .;        /* create a global symbol at data start */
    KEEP(*(.XXX_main))
    *XXX_main.o (.data .data*)      /* .data sections */
    . = ALIGN(4);
  } >RAM  AT >FLASH_APP

/* Remaining DATA stuff */
.XXX_other_data :
  {
    . = ALIGN(4);
    *(.data .data*)      /* .data sections */
      . = ALIGN(4);
    _e_nonboot_data = .;        /* define a global symbol at data end */
  } >RAM  AT >FLASH_APP

  /* used by the main stub C code to initialize data */
  _si_nonboot_data = LOADADDR(.XXX_main_data);


  /* Uninitialized data section (BSS) for non block boot code */
/* This stuff is zeroed by C code in the main stub */
 
  .XXX_main_bss :
  {
    . = ALIGN(4);
    _s_nonboot_bss = .;        /* create a global symbol at BSS start */
    KEEP(*(.XXX_main))
    *XXX_main.o (.bss .bss* .COMMON .common .common*)      /* .bss sections */
    . = ALIGN(4);
  } >RAM
 
  /* Remaining BSS stuff */
 
  .XXX_other_bss :
  {
      . = ALIGN(4);
    *(.bss .bss* .COMMON .common .common*)
    . = ALIGN(4);
    _e_nonboot_bss = .;          /* define a global symbol at BSS end */
  } >RAM
 

I don't mind being called an idiot so long as I am offered a solution ;)

Those weird sections were in the original ST linkfiles.
 

Siwastaja
« Reply #81 on: June 30, 2023, 03:29:01 pm »
That : thing is not just some arbitrary label, it's part of output section description syntax, and mandatory parts are output section name, the :, and {}.

Assignment using LOADADDR is a completely different beast. https://sourceware.org/binutils/docs/ld/SECTIONS.html lists the four possible kinds of things that can appear in SECTIONS. Your example uses two of them, symbol assignments and output section descriptions.
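A small sketch of those two kinds of statement side by side, written to a scratch file from the shell. Everything in it is hypothetical: the region names, origins, lengths, section names and symbol names are placeholders for illustration and are not taken from the linker script under discussion.

```bash
# Write a tiny, hypothetical linker script into a temporary directory.
demo_dir="$(mktemp -d)" || exit 1
cat > "$demo_dir/demo_sections.ld" <<'EOF'
MEMORY
{
  FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 64K   /* placeholder values */
  RAM   (rwx) : ORIGIN = 0x20000000, LENGTH = 16K   /* placeholder values */
}

SECTIONS
{
  /* Output section description: the name, the colon and the braces are required. */
  .demo_text :
  {
    *(.text .text*)
  } >FLASH

  /* Addressed in RAM (VMA), but stored in FLASH (LMA) until startup code copies it. */
  .demo_data :
  {
    _s_demo_data = .;          /* symbol assignment inside an output section */
    *(.data .data*)
    _e_demo_data = .;
  } >RAM AT>FLASH

  /* Symbol assignment at SECTIONS level; this is not an output section. */
  _si_demo_data = LOADADDR(.demo_data);
}
EOF

# With a cross toolchain installed, something like the following would exercise it:
#   arm-none-eabi-ld -T "$demo_dir/demo_sections.ld" -Map=demo.map some_object.o -o demo.elf
echo "wrote $demo_dir/demo_sections.ld"
```

The `.demo_data :` line only makes sense together with the braces that follow it, whereas the `_si_demo_data = ...;` line is a complete statement on its own. That is the distinction between the two constructs.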
 

peter-h (Topic starter)
« Reply #82 on: June 30, 2023, 03:54:52 pm »
This stuff is a whole new world. One spends years to understand how to write C at a basic level and then some months to learn how GCC linkfiles work.

I've been doing this stuff for 40 years and have never seen anything as tacky as GCC LD syntax. It's obvious from projects one finds online that most people just use the one which came with the development board (which is basically what I did, a few years ago) and they never change it.

It's a whole new paradigm, whole new philosophy to learn.

It seems to run OK and the memcpy and memset statements are processing the correct addresses

Code: [Select]
// Initialise DATA

extern char _s_nonboot_data;
extern char _e_nonboot_data;
extern char _si_nonboot_data;
memcpy(&_s_nonboot_data, &_si_nonboot_data, &_e_nonboot_data - &_s_nonboot_data);

// Zero BSS and COMMON

extern char _s_nonboot_bss;
extern char _e_nonboot_bss;
memset(&_s_nonboot_bss, 0, &_e_nonboot_bss - &_s_nonboot_bss);

There is still a strange problem:

I have these two sections

Code: [Select]

  .XXX_main_bss :
  {
    . = ALIGN(4);
    _s_nonboot_bss = .;        /* create a global symbol at BSS start */
    KEEP(*(.XXX_main))
    *XXX_main.o (.bss .bss* .COMMON .common .common*)      /* .bss sections */
    . = ALIGN(4);
  } >RAM
 
  /* Remaining BSS stuff */
 
  .XXX_other_bss :
  {
      . = ALIGN(4);
    *(.bss .bss* .COMMON .common .common*)
    . = ALIGN(4);
    _e_nonboot_bss = .;          /* define a global symbol at BSS end */
  } >RAM
 
 

The intention is to have BSS variables from XXX_main.o first and then all other BSS variables after that. But it isn't working, looking at their addresses in the .map file.

The allocation of variables in the .map file (in address order) is fairly random and not in the order of declaration in the .c file. I guess this is not guaranteed.
« Last Edit: June 30, 2023, 05:02:58 pm by peter-h »
 

peter-h (Topic starter)
« Reply #83 on: July 01, 2023, 04:31:54 pm »
I have another LD syntax question:

[Attached image not preserved: numbered linker script excerpt, including line 160 mentioned below]

I get a LD syntax error at line 160.

Yet the same format is used elsewhere in the linkfile. Removing RAM AT removes the error but then I have code which is linked to run in FLASH and not in the RAM.

The intention is for this to work

Code: [Select]

// === At this point, interrupts and DMA must still be disabled ====
// Execute loader. Reboots afterwards.
// Parameters for loader are in the SSA, although they could have simply
// been passed as function parameters in loader_entry().

extern char _loader_start;
extern char _loader_end;
extern char _loader_ram_start;

// Copy loader code and its init data to RAM.
B_memcpy(&_loader_ram_start, &_loader_start, &_loader_end - &_loader_start);

// See comments in loader.c for why the long call.

extern void loader_entry() __attribute__((long_call));
loader_entry();

// never get here (loader always reboots)
for (;;);

It worked previously, but the RAM location for the loader was being set by a MEMORY statement, and the linker complained that it overlapped some other stuff (which was true but physically irrelevant). I spent days looking for other ways to specify the execution address for code (including posts here), but I never found anything which was supported by the arm32 GCC LD.

If there was a way to specify an execution address, either in the linkfile or in the .c file, that would do the job because when the loader is running I have the whole 128k/192k RAM to play with. The only place it cannot go is the CCM which cannot run code; I use it for the loader stack and various buffers.

Any info much appreciated as always.

I thought that perhaps it doesn't like "text" being placed in RAM because at that point it can't know the execution address for the code, but removing the "text" parts does not change it.
« Last Edit: July 01, 2023, 04:34:03 pm by peter-h »
 

Nominal Animal
« Reply #84 on: July 01, 2023, 04:54:53 pm »
What is RAM?  Shouldn't that be
    } >RAM AT>FLASH_BOOT
per Output Section Attributes?
 
The following users thanked this post: peter-h

peter-h (Topic starter)
« Reply #85 on: July 01, 2023, 06:30:53 pm »
Oh bugger I do apologise for something so simple!

It actually opened up a can of worms... the linkfile section is not collecting anything from b_loader.o

Code: [Select]

  /* b_loader.o - code goes last in the boot block (no special need for that) */
  /* code and initialised data is all lumped together */
  /* **** The loader must not use any BSS **** */
  /* The loader code is located to execute after the end of BSS above and gets */
  /* copied there in b_main */
     
  .b_loader_all :
  {
    . = ALIGN(4);
    _loader_ram_start = .;
    KEEP(*(.b_loader))
    *b_loader.o (.text .text* .rodata .rodata* .data .data*)
      . = ALIGN(4);
      _loader_ram_end = .;
  } >RAM AT>FLASH_BOOT
 
  _loader_flash_start = LOADADDR(.b_loader_all);
 
  /* this is just for .map file lookup */
 
  _loader_size = _loader_ram_end - _loader_ram_start;

Code: [Select]

.b_loader_all   0x000000002000fba0        0x0 load address 0x0000000008001ed8
                0x000000002000fba0                . = ALIGN (0x4)
                0x000000002000fba0                _loader_ram_start = .
 *(.b_loader)
 *b_loader.o(.text .text* .rodata .rodata* .data .data*)
                0x000000002000fba0                . = ALIGN (0x4)
                0x000000002000fba0                _loader_ram_end = .
                0x0000000008001ed8                _loader_flash_start = LOADADDR (.b_loader_all)
                0x0000000000000000                _loader_size = (_loader_ram_end - _loader_ram_start)

Notably _loader_size=0.

It is for this

Code: [Select]
extern char _loader_ram_start;
extern char _loader_ram_end;
extern char _loader_flash_start;

// Copy loader code and its init data to RAM.
B_memcpy(&_loader_ram_start, &_loader_flash_start, &_loader_ram_end - &_loader_ram_start);

I've had these constructs working before but this time there is something else. I did check that no earlier linkfile statement is referencing b_loader.o and stealing the code.

I think this block may be stealing the loader stuff

Code: [Select]
/* This collects all other stuff, which gets loaded into FLASH */
    .code_constants_etc :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
*(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _e_code_constants_etc = .;        /* define a global symbol at end of code */
} >FLASH_APP

but I can't put the loader block above this one because that would break something else.

Is there some way to ignore certain input e.g. ignore b_loader.o? That way a later block could pick it up. After a lot of time searching the LD manual and googling I found the EXCLUDE_FILE directive

Code: [Select]
 
/* This collects all other stuff, which gets loaded into FLASH */
    .code_constants_etc :
  {
 
      *(EXCLUDE_FILE(*b_loader.o) .text .text* .rodata .rodata* .data .data* )
     
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
*(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))
   
    . = ALIGN(4);
    _e_code_constants_etc = .;        /* define a global symbol at end of code */
} >FLASH_APP

but, hey, it doesn't work :) Either no error or another useless syntax error. The internet has loads of people trying the same thing and failing. It could be the ARM32 version of GCC LD doesn't support EXCLUDE_FILE.

EDIT: after spending many hours on this, I am sure that EXCLUDE_FILE doesn't work. Lots of examples on the web but not for ARM32 GCC LD, and some of them have typos so won't even compile. I think I have solved it now but only by a suitable section ordering.

There is also a /DISCARD/ directive but that discards the symbols permanently, there and then.
« Last Edit: July 02, 2023, 08:01:16 am by peter-h »
 

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 823
  • Country: es
Re: A question on GCC .ld linker script syntax
« Reply #86 on: July 02, 2023, 02:32:35 pm »
The answer is straight in the manual (again): https://sourceware.org/binutils/docs/ld/Input-Section-Basics.html
EXCLUDE_FILE doesn’t affect subsequent definitions, so your *(.text) etc below it are catching everything. Just comment them out, the EXCLUDE_FILE line itself also collects all sections listed in it from other files not matching the exclusion pattern.
 

peter-h (Topic starter)
« Reply #87 on: July 02, 2023, 03:14:30 pm »
I tried the exclude after the other statements, too.

I've given up on this now. A re-ordered linkfile seems to have done the job. Even reading that manual section is double dutch to me. One has to have a really deep understanding of this stuff.
 

abyrvalg
« Reply #88 on: July 02, 2023, 04:18:08 pm »
The ordering of exclude vs other *(xx) entries doesn’t matter. EXCLUDE_FILE itself does collect sections from its section list from all other files excluding the files from its exclusion list. It does not tell other collection statements to exclude anything. If they are present too - they work as usual (collecting everything in your case), you need to remove them completely.
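A sketch of how this could look, written to a scratch file from the shell. One detail is worth checking against the Input Section Basics page linked earlier: as that page describes it, an EXCLUDE_FILE placed inside the section list applies only to the pattern immediately following it, so the sketch repeats it for every pattern. The MEMORY origins and lengths are invented placeholders. The section, region and file names are borrowed from the posts above. The whole fragment is untested against the real project.

```bash
# Write a hypothetical linker script fragment into a temporary directory.
demo_dir="$(mktemp -d)" || exit 1
cat > "$demo_dir/exclude_demo.ld" <<'EOF'
MEMORY
{
  FLASH_BOOT (rx)  : ORIGIN = 0x08000000, LENGTH = 32K    /* placeholder values */
  FLASH_APP  (rx)  : ORIGIN = 0x08008000, LENGTH = 480K   /* placeholder values */
  RAM        (rwx) : ORIGIN = 0x20000000, LENGTH = 128K   /* placeholder values */
}

SECTIONS
{
  /* Everything except b_loader.o.  The exclusion is repeated because, inside
     the list, it covers only the pattern that comes right after it. */
  .code_constants_etc :
  {
    *(EXCLUDE_FILE(*b_loader.o) .text   EXCLUDE_FILE(*b_loader.o) .text*)
    *(EXCLUDE_FILE(*b_loader.o) .rodata EXCLUDE_FILE(*b_loader.o) .rodata*)
    /* No plain .text or .rodata patterns here: they would match b_loader.o too. */
  } >FLASH_APP

  /* The sections left behind above are still available for this later block. */
  .b_loader_all :
  {
    _loader_ram_start = .;
    *b_loader.o (.text .text* .rodata .rodata* .data .data*)
    _loader_ram_end = .;
  } >RAM AT>FLASH_BOOT
}
EOF
echo "wrote $demo_dir/exclude_demo.ld"
```

The same manual page also describes a form with EXCLUDE_FILE placed before the file pattern, as in `EXCLUDE_FILE (*b_loader.o) *(.text .rodata)`, which applies the exclusion to every section in the list. Whether a given toolchain's ld is recent enough to accept that form is something to verify locally.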
 
The following users thanked this post: peter-h

Nominal Animal
« Reply #89 on: July 02, 2023, 05:05:00 pm »
OP did start a new thread on this latest sub-question, to which I constructed a rather long description of how to analytically go about building your own linker script, using a spreadsheet or plain text file first to model your address space.  I also posted an example and explanation of the EXCLUDE_FILE syntax.

I fully understand the frustration with trying to construct a linker script, before one truly groks the paradigm, the approach or model or idea behind how the linker is controlled.  Thing is, it isn't too complicated; it is just strange, in the sense that the hurdle is understanding how it is supposed to be used and how the language works.  Yes, it will take several hours to grok it, and yes, it is annoying as hell to have to do that, but unless you do, you will suffer when trying to do anything but the simplest changes to existing linker scripts.  (Me, I like to do experiments when learning.  In this case, one can create a really trivial "firmware" with just the minimum symbols – at least one for each section – but it does not need to be functional; then, starting with a minimal linker script, and making changes and storing the resulting map with the copy of the linker script, will let you [spend a lot of time and] learn the ins-and-outs of the syntax.  Which is nothing like C at all.)

Having the entire memory, address ranges and output sections and the linker defined symbols in a spreadsheet or diagram, does mean that anyone with sufficient familiarity with linker scripts can help with its implementation.  Working on just the linker script itself, we have the same age-old problem we have with programming without comments: we can see what the script is doing, but we don't know the actual intent; the spreadsheet or diagram provides that, if sufficiently well organized.
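
If a spreadsheet feels too detached from the code, the same table can be kept as plain C data and checked mechanically. This is only a sketch of the idea: the struct layout, the function name and any addresses you put in the rows are assumptions for illustration, not something taken from OP's project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the memory-map table: what lives where, and why. */
struct map_row {
    const char *name;     /* output section or reserved block            */
    uint32_t    start;    /* first address                               */
    uint32_t    size;     /* size in bytes                               */
    const char *intent;   /* the "why" that the script alone cannot show */
};

/* Rows must be sorted by start address.
 * Returns the index of the first row that overlaps its predecessor, or -1. */
int first_overlap(const struct map_row *rows, size_t n)
{
    assert(rows || n == 0);
    for (size_t i = 1; i < n; i++) {
        /* 64-bit sum so a block ending at the top of the address space
         * does not wrap around. */
        uint64_t prev_end = (uint64_t)rows[i - 1].start + rows[i - 1].size;
        if (rows[i].start < prev_end)
            return (int)i;
    }
    return -1;
}
```

The intent column is the important part: it records what each block is for, which is exactly what is lost when only the linker script exists.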
 
The following users thanked this post: peter-h, SiliconWizard

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3640
  • Country: gb
  • Doing electronics since the 1960s...
Re: A question on GCC .ld linker script syntax
« Reply #90 on: July 02, 2023, 07:45:31 pm »
FWIW my linker script is heavily commented. I spend half the time updating the comments.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6130
  • Country: fi
    • My home page and email address
Re: A question on GCC .ld linker script syntax
« Reply #91 on: July 02, 2023, 08:55:54 pm »
FWIW my linker script is heavily commented. I spend half the time updating the comments.
From experience, I can say the comments are nothing compared to an actual memory map table or diagram.

I know you don't want to spend the "extra" time creating such a spreadsheet or table, and would prefer just to get it working and move on to more interesting things, but I promise, you will save time overall if you do the spreadsheet or table, and include the logic and descriptions from your linker script comments as musings in the same file as the spreadsheet or table.

I do not use Inkscape, Dia, LibreOffice Calc, Graphviz, etc. (and obviously pen and paper!) just because I like pretty pictures.  I use them because they make it possible for me to do things I would not be able to do without.  Even sketching each of the memory regions (flash, RAM, closely-coupled RAM) as a bar, and then marking regions with }-marks or hatches with lines to explanations will help a lot.  Just remember to include everything, and not leave subtle details to work out in the script, because it is exactly those subtle details that WILL bite you if you don't prepare for them.

Tools.  Use 'em.  Boil 'em, mash 'em, put 'em in a stew.
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3640
  • Country: gb
  • Doing electronics since the 1960s...
Re: A question on GCC .ld linker script syntax
« Reply #92 on: July 03, 2023, 08:37:45 pm »
A little Q on this linkfile line:

   RAM (xrw)           : ORIGIN = 0x20000000, LENGTH = 128K /*FOR 32F417 */

Am I right in that the ORIGIN = sets up the start of where all the >RAM directives deposit the variable allocations? Obviously that is right, but is there anything else? I cannot find any reference to "RAM" anywhere except in the >RAM directives.

Also AFAICT the LENGTH = parameter is used purely for a) checking for section overlap and b) generating the % bargraphs in the Build Analyser display. My code runs exactly the same if I put in LENGTH = 1024K, which is far more than is physically present.

But I wonder whether Cube might be digging these out and using them for something. I don't think so; I think it gets everything from the ELF file and from Cube Debug settings.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 3669
  • Country: us
Re: A question on GCC .ld linker script syntax
« Reply #93 on: July 03, 2023, 09:25:06 pm »
It is mostly used to detect if you try to link something that doesn't fit.
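
Roughly, the check amounts to the following. This is a sketch in C with made-up names, not what ld does internally; the RAM numbers in the note below are the ones from the MEMORY line quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* A memory region as given by ORIGIN and LENGTH in the MEMORY command. */
struct region {
    uint32_t origin;
    uint32_t length;
};

/* Returns 1 if [addr, addr + size) lies entirely inside the region, else 0. */
int fits_in_region(const struct region *r, uint32_t addr, uint32_t size)
{
    assert(r);
    if (addr < r->origin)
        return 0;
    uint32_t offset = addr - r->origin;
    /* Written this way to avoid overflow in addr + size. */
    return offset <= r->length && size <= r->length - offset;
}
```

With origin 0x20000000 and length 128K, an 8-byte object at 0x2001FFFC is rejected. With LENGTH inflated to 1024K the same object is accepted, which is why an oversized LENGTH changes nothing at run time but hides the error at link time.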
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6130
  • Country: fi
    • My home page and email address
Re: A question on GCC .ld linker script syntax
« Reply #94 on: July 03, 2023, 10:42:03 pm »
A little Q on this linkfile line:

   RAM (xrw)           : ORIGIN = 0x20000000, LENGTH = 128K /*FOR 32F417 */

Am I right in that the ORIGIN = sets up the start of where all the >RAM directives deposit the variable allocations?
Yes, but only when there is no AT> (or there is a superfluous AT>RAM).

But I wonder whether Cube might be digging these out and using them for something.
Your toolchain includes objdump (it is part of binutils, just like ld is; they're part of the same package).  Try running
    objdump -fwh path-to-ELF-file
It will tell you the architecture, start address (ENTRY), and list all sections (if this is the linked ELF result, these are the output sections).
The VMA column contains the logical addresses (expected runtime addresses), whereas the LMA column contains the storage addresses (controlled by AT).

This, or the equivalent, is what Cube is looking at, and what the tool that generates the firmware .hex file from the ELF object file looks at.

is there anything else? I cannot find any reference to "RAM" anywhere except in the >RAM directives.
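The access mode letters (Read, Write, eXecute) only tell ld which region may take input sections that the script does not map explicitly. The per-section flags that do end up in the ELF file come from the input sections, and for microcontrollers without virtual memory etc. they do not matter at run time.  It is useful for us humans, I guess, though.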

Technically, you don't need the MEMORY command, if you just specify the address and storage address for each output section; in that case, I'd use constant symbols to specify the start addresses, but I think it would just be messier and more text in the script.



Think of it this way:
    MEMORY { address space rules and memory region names }
    SECTIONS { output section definitions }
with each output section definition being
    outputsection { input section rules } >region
or
    outputsection { input section rules } >useregion AT>storageregion
(or one of the valid expressions involving symbols and the current output address).

If only region is specified, then useregion=region and storageregion=region.

useregion is where the linker assumes the contents (data or code) are during use.
storageregion is where the linker stuffs the data.

Most often you'll only see } >RAM AT>FLASH (in addition to normal } >region), because it tells the linker that "these output sections are stored somewhere in flash, but there is code that copies them to RAM before they are used".
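
That copying code is usually a few lines in the startup file. A minimal sketch in C, assuming word-aligned sections: in real startup code the three arguments would be linker-defined symbols (ST's scripts commonly call them _sidata, _sdata and _edata, but those names are an assumption here, so check your own script).

```c
#include <assert.h>
#include <stdint.h>

/* Copy initialised data from its storage address (LMA, in flash)
 * to its runtime address (VMA, in RAM), one 32-bit word at a time.
 *   load  : where the linker stored the initial values (AT>FLASH)
 *   start : first word of the section at its runtime address (>RAM)
 *   end   : one past the last word at the runtime address            */
void copy_data(const uint32_t *load, uint32_t *start, const uint32_t *end)
{
    assert(load && start && end && start <= end);
    while (start < end)
        *start++ = *load++;
}
```

Nothing in the linker does this for you; AT> only records where the bytes are stored, and it is up to the startup code to move them before main() touches any initialised variable.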
 
The following users thanked this post: peter-h

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3640
  • Country: gb
  • Doing electronics since the 1960s...
Re: A question on GCC .ld linker script syntax
« Reply #95 on: July 04, 2023, 08:35:25 am »
Great; thanks.

It sounds like this issue
https://www.eevblog.com/forum/microcontrollers/32f417-32f437-auto-detect-of-extra-64k-ram/
is not related to having LENGTH = 192K with a 32F417.

Experimentally I can confirm that but I wondered if there was something else.

It does produce a confusing Build Analyser display and I just have to document that...
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6130
  • Country: fi
    • My home page and email address
Re: A question on GCC .ld linker script syntax
« Reply #96 on: July 04, 2023, 11:41:23 am »
Yup.

If you want to order your functions in a specific way, you can use
    __attribute__ ((section (".text_NNNNN"))) function-definition
if, in the linker script, you replace input section
    *(.text*)
with the following three input sections:
    *(SORT(.text_*)) *(.text) *(SORT(.text*))

The same applies to input section rules that use EXCLUDE_FILE; for example
    EXCLUDE_FILE(*loader.o *boot.o) *(.text*)
becomes
    EXCLUDE_FILE(*loader.o *boot.o) *(SORT(.text_*))
    EXCLUDE_FILE(*loader.o *boot.o) *(.text)
    EXCLUDE_FILE(*loader.o *boot.o) *(SORT(.text*))

SORT() sorts files or sections by name.  If you use a nonnegative integer for NNNNN, make sure you use a fixed number of digits, because .text_09 < .text_10 < .text_9.
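
The digit-count pitfall is easy to reproduce outside the linker. This sketch assumes the sort is a plain byte-wise string comparison (strcmp), and the function name is made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: compare two C strings byte by byte. */
static int by_name(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort an array of section names the way a plain name sort would. */
void sort_section_names(const char **names, size_t n)
{
    assert(names || n == 0);
    qsort(names, n, sizeof names[0], by_name);
}
```

Given .text_9, .text_10 and .text_09, the result is .text_09, .text_10, .text_9, because '1' sorts before '9' no matter what follows. Padding every number to the same width (.text_00009, .text_00010) makes name order and numeric order agree.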

A nonnegative integer NNNNN is also compatible with compiling with gcc -ffunction-sections option, as gcc typically uses section names .text.functionname then.  (When compiling with -ffunction-sections, the linker can omit unused sections, and therefore unused functions, when --gc-sections is used.)

In the binary, the .text_NNNNN sections will be first, in sorted order of NNNNN.  If a section contains more than one symbol, those symbols are in random order.  Next come functions without the section attribute, compiled without -ffunction-sections.  Finally come functions without the section attribute, compiled with -ffunction-sections, sections in sorted order (by function name).
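
On the C side, that looks something like the following. The function names and numbers are hypothetical, and the sketch assumes GCC (or a compatible compiler) targeting ELF.

```c
#include <assert.h>

/* Explicitly ordered functions: five-digit suffixes, so name order
 * and numeric order agree. */
__attribute__((section(".text_00010"))) int early_init(void) { return 10; }
__attribute__((section(".text_00020"))) int late_init(void)  { return 20; }

/* No section attribute: lands in .text, or in .text.plain_helper
 * when compiled with -ffunction-sections. */
int plain_helper(int x)
{
    assert(x >= 0);
    return x + 1;
}
```

The attribute only chooses the input section name. Where that section ends up is still decided by the input section rules in the linker script, so check the .map file to confirm the order you got.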

(Edited to add a missing [/tt] tag that wonkified the text in the post.)
« Last Edit: July 04, 2023, 09:49:18 pm by Nominal Animal »
 
The following users thanked this post: peter-h

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3640
  • Country: gb
  • Doing electronics since the 1960s...
Re: A question on GCC .ld linker script syntax
« Reply #97 on: July 04, 2023, 09:19:11 pm »
That's a really interesting capability.

What order would the linker normally use when following the linkfile, say within just one block of it?

If it does it in the order in which .o files are found in the directory (using the traditional find-first and find-next directory listing method) then it could be totally random.

But looking at the .map file it looks like it is sorting the stuff by full pathname, which is interesting because it is sorting across the whole directory tree.

Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6130
  • Country: fi
    • My home page and email address
Re: A question on GCC .ld linker script syntax
« Reply #98 on: July 04, 2023, 10:06:39 pm »
The files should occur in the order they are specified in the link command line by default.  Perhaps Cube sorts them?

Remember, ld does not actually go look for any file names, it only considers the file names specified to the command, and uses that order by default.

Section order, however, can be affected by --sort-section=name, which causes all input section names to be wrapped in SORT, i.e. any section name glob pattern to be expanded in sorted order.  It shouldn't affect file name ordering at all, though.
 

