Here is the script I cobbled together. It is not intended to be minimal or the fastest possible; I wanted it to be as easily modified and adapted as possible.
Save it somewhere as, say, checksym.sh, make it executable, and specify the object and archive names on the command line.
#!/bin/bash
# SPDX-License-Identifier: CC0-1.0
# -*- coding: utf-8 -*-
OBJDUMP="${OBJDUMP:-arm-none-eabi-objdump}"
AWK="${AWK:-gawk}"
SED="${SED:-sed}"
SORT="${SORT:-sort}"
XARGS="${XARGS:-xargs}"
TOUCH="${TOUCH:-touch}"
# Use default locale, since we're parsing command outputs
export LANG=C LC_ALL=C
# Create an auto-removed temporary directory for our temporary files
Work="$(mktemp -d)" || exit 1
trap "rm -rf '$Work'" EXIT
# We list all object files in "$Work/objfiles",
# and all libraries/archives in "$Work/libfiles",
# one name or path per line.
printf '%s\n' "$@" | "$SED" -ne '/\.o$/p' > "$Work/objfiles"
printf '%s\n' "$@" | "$SED" -ne '/\.a$/p' > "$Work/libfiles"
# Note that you can replace this with e.g.
# find . -name '*.o' -printf '%p\n' > "$Work/objfiles"
# find . -name '*.a' -printf '%p\n' > "$Work/libfiles"
# Feel free to replace the above with whatever mechanism you like.
# To ensure we only look at each library file once, we sort the list.
# The sed is magic: it changes to tab separators, with symbol table
# having five entries (addr,flags,section,addr,name), and file names
# two entries (object,archive):
"$SORT" -u "$Work/libfiles" | "$XARGS" -r -d '\n' "${OBJDUMP}" -t \
| "$SED" -e '/^$/d ; /^SYMBOL TABLE:/d ; s|^In archive \(.*\): *$|\t\1|; s|: \+file format .*$|\t|; s| \([^ \t]\+\t[0-9A-Fa-f]\+\) |\t\1\t| ; s|^\([0-9A-Fa-f]\+\) |\1\t|' \
> "$Work/symbols"
#
# Object files are handled in a very similar manner, with the
# only exception being that there are no archive file names, and object file names are single-field records.
"$SORT" -u "$Work/objfiles" | "$XARGS" -r -d '\n' "${OBJDUMP}" -t \
| "$SED" -e '/^$/d ; /^SYMBOL TABLE:/d ; s|: \+file format .*$||; s| \([^ \t]\+\t[0-9A-Fa-f]\+\) |\t\1\t| ; s|^\([0-9A-Fa-f]\+\) |\1\t|' \
>> "$Work/symbols"
#
# We use awk to process the combined symbol list. There are four types
# of lines/records, with TAB separators:
#   object-file-name
#       Names an object file as the source for the following symbols
#   object-file-name<TAB>
#       Names an object file within the current archive/library for the following symbols
#   <TAB>archive-file-name
#       Names the archive/library for the following symbols
#   hex<TAB>flags<TAB>section<TAB>hex<TAB>symbol
#       Names a symbol. Flags are per objdump -t. References name *UND* as their section.
"$AWK" 'BEGIN {
    FS = "\t";
    split("", funs)
    split("", objs)
    split("", refs)
    aname = ""  # Archive file name
    oname = ""  # Object file name within an archive
    fname = ""  # File name (or combined aname ":" oname)
}
NF==1 { # Solitary object file name
    aname = ""
    oname = ""
    fname = $1
}
NF==2 { # Archive file
    if (length($2) > 0) {
        aname = $2
        oname = ""
        fname = aname
    } else
    if (length($1) > 0) {
        oname = $1
        fname = aname ":" oname
    }
}
NF==5 { # Symbol table record
    # Only consider symbols that start with _ or a letter
    if (!($5 ~ /^[_A-Za-z]/)) next;
    # Skip local, debug, dynamic, indirect, file, and warning symbols
    if ($2 ~ /[lWIiDdf]/) next;
    # Skip common and absolute-address stuff
    if ($3 == "*COM*" || $3 == "*ABS*") next;
    # If the symbol or reference is weak, we prefix the symbol name with !.
    if ($2 ~ /w/) {
        weak = 1
        sym = "!" $5
    } else {
        weak = 0
        sym = $5
    }
    if ($3 == "*UND*") {
        # Symbol reference. Add file name to refs under this symbol.
        if (sym in refs)
            refs[sym] = refs[sym] "\t" fname
        else
            refs[sym] = fname
    } else {
        if ($2 ~ /F/) {
            # Function definition
            if (sym in funs)
                funs[sym] = funs[sym] "\t" fname
            else
                funs[sym] = fname
        } else {
            # Non-function definition
            if (sym in objs)
                objs[sym] = objs[sym] "\t" fname
            else
                objs[sym] = fname
        }
    }
}
END {
    # Find strong function definitions defined in more than one file
    split("", syms)
    for (sym in funs)
        if (!(sym ~ /^!/) && (funs[sym] ~ /\t/))
            syms[sym] = funs[sym]
    if (length(syms) > 0) {
        printf "%d duplicate (non-weak) function definitions:\n", length(syms)
        for (sym in syms)
            printf " %s in %s\n", sym, syms[sym]
        printf "\n"
    } else {
        printf "There are no duplicate (non-weak) function definitions.\n\n"
    }
    # Find weak function definitions without corresponding strong symbol definitions
    split("", syms)
    for (wsym in funs) if (wsym ~ /^!/) {
        sym = substr(wsym, 2)
        if (!(sym in funs))
            syms[sym] = funs[wsym]
    }
    if (length(syms) > 0) {
        printf "%d weak functions without strong function definitions:\n", length(syms)
        for (sym in syms)
            printf " %s defined in %s\n", sym, syms[sym]
        printf "\n"
    } else {
        printf "All weak function definitions have corresponding strong function definitions.\n\n"
    }
    # Find strong function symbols that are never referenced
    split("", syms)
    for (sym in funs) if (!(sym ~ /^!/)) {
        wsym = "!" sym
        if (!(sym in refs) && !(wsym in refs))
            syms[sym] = funs[sym]
    }
    if (length(syms) > 0) {
        printf "%d (non-weak) functions that are never referenced:\n", length(syms)
        for (sym in syms)
            printf " %s defined in %s\n", sym, syms[sym]
        printf "\n"
    } else {
        printf "All (non-weak) functions are referenced at least once.\n\n"
    }
    # Find weak function symbols that are never referenced
    split("", syms)
    for (wsym in funs) if (wsym ~ /^!/) {
        sym = substr(wsym, 2)
        if (!(sym in refs) && !(wsym in refs))
            syms[sym] = funs[wsym]
    }
    if (length(syms) > 0) {
        printf "%d weak functions that are never referenced:\n", length(syms)
        for (sym in syms)
            printf " %s defined in %s\n", sym, syms[sym]
        printf "\n"
    } else {
        printf "All weak functions are referenced at least once.\n\n"
    }
    # Find references that cannot be resolved
    split("", syms)
    for (ref in refs) {
        if (ref ~ /^!/) {
            sym = substr(ref, 2)
            wsym = ref
        } else {
            sym = ref
            wsym = "!" ref
        }
        if (!(sym in funs) && !(wsym in funs) && !(sym in objs) && !(wsym in objs)) {
            if (sym in syms)
                syms[sym] = syms[sym] "\t" refs[ref]
            else
                syms[sym] = refs[ref]
        }
    }
    if (length(syms) > 0) {
        printf "%d unresolved symbols:\n", length(syms)
        for (sym in syms)
            printf " %s referenced in %s\n", sym, syms[sym]
        printf "\n"
    } else {
        printf "No unresolved symbols.\n\n"
    }
}' "$Work/symbols"
This has only been tested on Linux.
I used OBJDUMP, SED, etc. for the corresponding executables' pathnames. The Bash substitution VAR="${VAR:-default}" keeps the value of VAR if it is already set and non-empty, and uses default if VAR is unset or empty. So, one can use e.g. bash -c 'AWK=some-other-awk-variant checksym.sh' to override the script-set default.
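To make the three cases of that substitution concrete, here is a tiny sketch. DEMO_TOOL is a made-up variable name used only for this illustration; it is not part of the script.

```shell
# DEMO_TOOL is a hypothetical variable used only for this illustration.
unset DEMO_TOOL
DEMO_TOOL="${DEMO_TOOL:-gawk}"   # unset: the default is used
r1="$DEMO_TOOL"

DEMO_TOOL=""
DEMO_TOOL="${DEMO_TOOL:-gawk}"   # empty: the default is used as well
r2="$DEMO_TOOL"

DEMO_TOOL="mawk"
DEMO_TOOL="${DEMO_TOOL:-gawk}"   # non-empty: the existing value is kept
r3="$DEMO_TOOL"

printf '%s %s %s\n' "$r1" "$r2" "$r3"   # prints "gawk gawk mawk"
```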
Since we parse objdump output, we set the default C locale, to ensure the output is not unexpectedly localized. (Compare e.g. date and LANG=C LC_ALL=C date).
printf is a bash builtin, a bit nicer than echo. It is used to save all object file names, one per line, to "$Work/objfiles", and all library or archive file names, one per line, to "$Work/libfiles". These are separated only so that we can differentiate between standalone object files and object files within an archive in our output.
The reason we kick the names to a file is so that huuuge projects, with tens of thousands of files, can be supported. On some systems, the number of command-line parameters is limited, you see. If you encounter that limit, this initial part of the script can be modified to pass library and object file names in a different way. (-@file-name would be a common way we could use to specify a file containing object or archive/library file names. We could also specify name patterns looked for in an entire subtree, via find.)
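As a rough illustration of the name-splitting step, the same printf and sed filters can be run on a made-up argument list. The file names here are hypothetical, and the results go into shell variables rather than files under "$Work" just to keep the sketch self-contained.

```shell
# Hypothetical argument list; notes.txt matches neither pattern and is dropped.
set -- main.o libfoo.a util.o notes.txt

# One name per line; keep only names ending in .o or .a respectively.
objfiles="$(printf '%s\n' "$@" | sed -ne '/\.o$/p')"
libfiles="$(printf '%s\n' "$@" | sed -ne '/\.a$/p')"

printf 'objects:\n%s\n' "$objfiles"
printf 'archives:\n%s\n' "$libfiles"
```

Here objfiles ends up holding main.o and util.o (one per line), and libfiles holds libfoo.a.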
Application startup causes significant latencies in batch scripts. To minimize this, and to ensure we only dump each library or object file once, we sort the file containing the names of files to be processed, and feed it to xargs, which executes objdump with as many parameters as possible per invocation, until all files have been processed.
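To see what the sort and xargs stage does without needing any real archives, one can substitute echo for the actual dump command. This is only a sketch: the archive names are made up, and echo merely prints the command line that would have been run. (The -d option is a GNU xargs feature, same as in the script.)

```shell
# Duplicate names collapse in sort -u; xargs then builds one command line.
# "echo objdump -t" stands in for the real "$OBJDUMP" -t invocation.
cmd="$(printf '%s\n' libb.a liba.a libb.a \
       | LC_ALL=C sort -u \
       | xargs -r -d '\n' echo objdump -t)"

printf '%s\n' "$cmd"   # prints "objdump -t liba.a libb.a"
```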
The objdump -t output is fed through a sed script that manipulates the output in the following ways:
- /^$/d;
  Deletes empty lines
- /^SYMBOL TABLE:/d;
  Deletes lines beginning with SYMBOL TABLE:
- s|^In archive \(.*\): *$|\t\1|;
  Replaces lines beginning with In archive and ending with a colon with a TAB and the archive name
- s|: \+file format .*$|\t|;
  Replaces colon, space(s), followed by "file format", with a single TAB, skipping the rest of the line.
- s| \([^ \t]\+\t[0-9A-Fa-f]\+\) |\t\1\t| ;
  If there is a space followed by a token, tab, a hexadecimal number, and a space, replaces those spaces with TABs
- s|^\([0-9A-Fa-f]\+\) |\1\t|
  Replaces the space following the first hexadecimal number with a TAB.
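The effect of these rules is easiest to see on a small sample. The sketch below feeds hand-written text, imitating objdump -t output for one archive member, through the same sed expression; the names libdemo.a, foo.o, foo and bar are invented, and real objdump output may differ in details. TABs are turned into | at the end purely so the field boundaries are visible. GNU sed is assumed, as in the script.

```shell
# Hand-written imitation of "objdump -t" output (hypothetical names).
sample_dump() {
    printf 'In archive libdemo.a:\n\n'
    printf 'foo.o:     file format elf32-littlearm\n\n'
    printf 'SYMBOL TABLE:\n'
    printf '00000000 g     F .text\t00000010 foo\n'
    printf '00000000         *UND*\t00000000 bar\n'
}

# Same sed expression as the archive branch of the script,
# followed by tr to make the TAB separators visible as "|".
records="$(sample_dump \
  | sed -e '/^$/d ; /^SYMBOL TABLE:/d ; s|^In archive \(.*\): *$|\t\1|; s|: \+file format .*$|\t|; s| \([^ \t]\+\t[0-9A-Fa-f]\+\) |\t\1\t| ; s|^\([0-9A-Fa-f]\+\) |\1\t|' \
  | tr '\t' '|')"

printf '%s\n' "$records"
```

The first three output lines are |libdemo.a, then foo.o|, then 00000000|g     F|.text|00000010|foo. The last line is the five-field record for bar, with *UND* as its section and only spaces in the flags field.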
The object file sed is similar, except it omits the archive name, and the object file name itself is converted to a single-field record.
Both are emitted to $Work/symbols, which contains TAB-separated fields, one record per line. It has
- Records with two fields naming an object file within an archive.
  If the first field is empty, the second field names a new archive file. If the second field is empty, the first field names the object file.
  I use the convention archive-file:object-file for these, in the above script.
- Records with just one field name a separate object file (not inside any archive).
- Records with five fields specify a symbol table entry.
  First and fourth fields are hexadecimal numbers, and not interesting.
  Second field contains flag characters:
    g: Global
    u: Unique global
    !: Global and local
    w: Weak
    F: Function
  For others, see man 1 objdump, under -t or --syms option.
Finally, we feed the $Work/symbols to awk, which tracks the file name (for each record), adding symbol references (those with section *UND*) to refs[] array, function definitions to funs[] array, and other object definitions to objs[] array. Weak symbols are internally differentiated by adding a ! in front of the symbol name.
Each of the three arrays (funs, objs, and refs) has the symbol name as a key, preceded with a ! if the symbol or reference is weak, and the value is a TAB-separated list of file names where the definition or reference occurs.
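To illustrate the classification, here is a stripped-down sketch of the same idea. It is not the script's awk program: it skips the filtering and archive handling, and just prints which array each symbol record would land in, with the ! prefix for weak symbols. The input records and the classify function name are made up for the example.

```shell
# Simplified classifier: prints "kind key file" for each symbol record.
# kind is ref, fun, or obj; weak symbols get a "!" prefix on the key.
classify() {
    awk 'BEGIN { FS = "\t" }
         NF == 1 { fname = $1 }
         NF == 5 {
             sym = ($2 ~ /w/) ? ("!" $5) : $5
             if ($3 == "*UND*") kind = "ref"
             else if ($2 ~ /F/) kind = "fun"
             else               kind = "obj"
             printf "%s %s %s\n", kind, sym, fname
         }'
}

# Hypothetical records in the $Work/symbols format, for one object file.
classified="$(printf 'demo.o\n0\tg     F\t.text\t10\tfoo\n0\t w    F\t.text\t10\tbar\n0\t       \t*UND*\t0\tbaz\n0\tg     O\t.data\t4\tcounter\n' | classify)"

printf '%s\n' "$classified"
```

This prints fun foo demo.o, fun !bar demo.o, ref baz demo.o, and obj counter demo.o, one per line.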
The report is output in the END rule, which is triggered after all input records have been processed.
If any non-weak function symbol in funs has a value with a TAB in it, it is defined in more than one file.
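The TAB test can be tried in isolation with a simplified input. In this sketch the input is plain "symbol TAB file" pairs, which is my own shorthand for the example and not the $Work/symbols format; find_duplicates is likewise a made-up helper name. The output is piped through sort because awk does not guarantee any particular order for for-in loops.

```shell
# Builds a funs[]-like array from "symbol<TAB>file" pairs, then reports
# strong (no "!" prefix) symbols whose file list contains a TAB.
find_duplicates() {
    awk 'BEGIN { FS = "\t" }
         {
             if ($1 in funs) funs[$1] = funs[$1] "\t" $2
             else            funs[$1] = $2
         }
         END {
             for (sym in funs)
                 if (sym !~ /^!/ && funs[sym] ~ /\t/) {
                     files = funs[sym]
                     gsub(/\t/, ", ", files)
                     printf "%s in %s\n", sym, files
                 }
         }' | sort
}

# init is defined strongly twice; !hook is weak in both files, so it is ignored.
dups="$(printf 'init\ta.o\ninit\tb.o\n!hook\ta.o\n!hook\tb.o\nmain\tmain.o\n' | find_duplicates)"

printf '%s\n' "$dups"   # prints "init in a.o, b.o"
```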
If there is a weak function symbol in funs without a corresponding non-weak symbol (say, there is !foo but no foo), then we have a weak function symbol definition without a corresponding strong symbol definition.
Each key in funs should also appear as a key in refs (in either its weak or its non-weak form); otherwise that function is not referenced from any other file at all. Such functions are either unnecessary, or unnecessarily global. (They might be used in the same object file they are defined in; to resolve these, we'd need to look at the relocation table for that particular object file.)
If there is a key in refs, but no corresponding (weak or non-weak) key in funs or objs, we have a dangling, unresolvable reference.
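The dangling-reference check can be sketched the same way. Again the input format is a simplification of my own ("def" or "ref", a TAB, then the key as it would appear in the arrays), with definitions from funs and objs lumped into a single defs array; find_unresolved is a hypothetical helper name.

```shell
# Reports references for which neither the strong nor the weak ("!") form
# of the symbol is defined anywhere. Output is sorted for stable order.
find_unresolved() {
    awk 'BEGIN { FS = "\t" }
         $1 == "def" { defs[$2] = 1 }
         $1 == "ref" { refs[$2] = 1 }
         END {
             for (ref in refs) {
                 sym = ref
                 sub(/^!/, "", sym)          # strip the weak marker, if any
                 if (!(sym in defs) && !(("!" sym) in defs))
                     print sym
             }
         }' | sort
}

# foo resolves to a strong definition, bar to a weak one (!bar);
# the weak reference !baz and the strong reference qux have no definition.
unresolved="$(printf 'def\tfoo\ndef\t!bar\nref\tfoo\nref\tbar\nref\t!baz\nref\tqux\n' | find_unresolved)"

printf '%s\n' "$unresolved"
```

With this input, baz and qux are reported, one per line.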