Author Topic: Architecture of typical STM32 compilers and linker setups (Read 1439 times)

Sherlock Holmes · « **on:** January 10, 2023, 05:12:28 pm »

I'm seeking insights and documentation on the typical structure of object code and linker output code, in the context of a typical mid-end MCU like say the STM32 family.

I know there are a multitude of different compiler implementations and build tools, but do these all share a common OBJ file structure? Do all linkers consume a standard format OBJ? What do the linkers generate? I know it's some binary file, but what format is it? Is it specific to the STM32 devices or is it some kind of COFF standard?

What are the tools that go from source code to running code on a board? Are they just 1. Compiler, 2. Linker, 3. Utility to load "EXE" file onto board?

Any info much appreciated.

rstofer · « **Reply #1 on:** January 11, 2023, 08:52:57 pm »

There is very probably an assembler pass after compiling and before linking; at least for GCC.

https://stackoverflow.com/questions/14039843/does-a-compiler-always-produce-an-assembly-code

The output of the linker, for uC purposes, is usually a binary image from beginning to end including data and startup code to initialize the .data segment. The code will usually be statically linked so no external library runtime code is assumed. Dynamic linking could happen in a Linux environment because the operating system provides a lot of utility. Same with any other OS... But that's at a much higher level.

https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_node/ld_3.html

The entire GCC Compiler Collection is very well documented

https://gcc.gnu.org/onlinedocs/

Don't overlook 'binutils'

https://www.gnu.org/software/binutils/

GCC is a heck of a lot more than just a compiler/linker - that's the easy bit.

https://www.sourceware.org/gdb/

Probably need to study up on ELF files.

Nominal Animal · « **Reply #2 on:** January 11, 2023, 10:17:24 pm »

STM32 family of microcontrollers are based on ARM Cortex-M cores.

Cortex-M0, Cortex-M0+, and Cortex-M1 use ARMv6-M architecture, Cortex-M3 uses ARMv7-M, and Cortex-M4 and Cortex-M7 use ARMv7E-M architecture. STM32 family has all of these except Cortex-M1, which is a soft core intended for FPGAs. The instruction set and certain details are specified by ARM, but the peripherals vary from microcontroller to microcontroller.

To target STM32 microcontrollers, you therefore need a compiler that can target ARMv6-M, ARMv7-M, and/or ARMv7E-M architecture, depending on the STM32 microcontroller. The most commonly used C and C++ ones are GCC, Clang, Keil, and IAR; for a more complete list, see here (includes other programming languages as well).

GCC is a family of compilers, with each language translated to an internal abstract syntax tree data structure, which the common backend then converts to machine code. Each language is a separate frontend, which converts that language to the internal abstract syntax tree. Clang works in a very similar manner, being the C and C++ frontend, using LLVM as the backend. Some of the others are forks of GCC or Clang, or use LLVM backend.

The object file format used is ELF. There are two variants, 32-bit and 64-bit, but only 32-bit is used with Cortex-M cores. Compilers and linkers produce ELF files. As firmware update utilities tend to support some variant of Intel HEX file formats, linkers generate either those directly, or a binary memory image (or images, in case of Harvard architectures, but Cortex-M has a single unified memory addressing scheme) which is then converted to hex or similar format.
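To make the 32-bit versus 64-bit distinction concrete, here is a minimal Python sketch that decodes the start of an ELF header. The field offsets and constants (`EI_CLASS`, `EI_DATA`, `e_type`, `e_machine`, `EM_ARM = 40`) come from the ELF specification; the function names and the hand-built sample header are made up for illustration and are not part of any toolchain.

```python
import struct

# Values from the ELF specification.
ELF_MAGIC = b"\x7fELF"
EI_CLASS = {1: "ELF32", 2: "ELF64"}
EI_DATA = {1: "little-endian", 2: "big-endian"}
E_TYPE = {1: "relocatable (.o)", 2: "executable", 3: "shared object"}
EM_ARM = 40  # e_machine value for 32-bit ARM, which includes Cortex-M


def describe_elf(header: bytes) -> dict:
    """Decode the first 20 bytes of an ELF file into a small summary."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    # e_type and e_machine are 16-bit fields stored in the file's own byte order.
    fmt = ">HH" if header[5] == 2 else "<HH"
    e_type, e_machine = struct.unpack_from(fmt, header, 16)
    return {
        "class": EI_CLASS.get(header[4], "unknown"),
        "byte_order": EI_DATA.get(header[5], "unknown"),
        "type": E_TYPE.get(e_type, f"other ({e_type})"),
        "machine": "ARM" if e_machine == EM_ARM else f"other ({e_machine})",
    }


def describe_elf_file(path: str) -> dict:
    """Same summary, read from a file on disk (hypothetical helper)."""
    with open(path, "rb") as f:
        return describe_elf(f.read(20))


# A hand-built header resembling a Cortex-M firmware image:
# magic, ELF32, little-endian, version 1, padding, then e_type=2, e_machine=40.
sample = ELF_MAGIC + bytes([1, 1, 1]) + bytes(9) + struct.pack("<HH", 2, EM_ARM)
print(describe_elf(sample))
```

The same function should report a relocatable type for a compiler-produced `.o` file and an executable type for the linker output, whatever suffix the file carries.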

The final linkage is controlled by a linker file, which determines how sections are mapped to the memory, and so on. The section model used by ELF files is reflected there, and some features – like collecting information from different object files into a single consecutive array – are either necessary or at least extremely useful for microcontroller firmware development. Even things like interrupt vector arrays can be exposed by the linker, not the compiler (that is, linker determines the memory location, with the compiler only being told what symbol and type to use for it).
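As a rough illustration of what "mapping sections to memory" means, here is a toy Python model of section placement. It is not how any real linker is implemented, and it is no substitute for an actual linker file. The base addresses 0x08000000 (Flash) and 0x20000000 (SRAM) are the usual STM32 ones; the region sizes, section sizes, and all names in the code are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical memory map: base addresses are the usual STM32 ones,
# the sizes are made up for this example.
REGIONS = {"FLASH": (0x08000000, 512 * 1024), "RAM": (0x20000000, 128 * 1024)}


@dataclass
class Section:
    name: str
    size: int
    run_in: str              # region the section occupies at run time
    load_in: Optional[str]   # region storing its initial contents, if any


def align(value: int, boundary: int = 4) -> int:
    """Round value up to the next multiple of boundary (a power of two)."""
    return (value + boundary - 1) & ~(boundary - 1)


def layout(sections):
    """Assign a run address and a load address to each section, in order."""
    cursor = {name: base for name, (base, _) in REGIONS.items()}
    placed = []
    for s in sections:
        run_addr = align(cursor[s.run_in])
        cursor[s.run_in] = run_addr + s.size
        if s.load_in is None:
            load_addr = None            # e.g. .bss: nothing stored, zero-filled at startup
        elif s.load_in == s.run_in:
            load_addr = run_addr        # code and constants run where they are stored
        else:
            load_addr = align(cursor[s.load_in])   # e.g. .data: stored in Flash, copied to RAM
            cursor[s.load_in] = load_addr + s.size
        placed.append((s.name, run_addr, load_addr))
    for name, (base, size) in REGIONS.items():
        if cursor[name] > base + size:
            raise ValueError(f"region {name} overflows")
    return placed


SECTIONS = [
    Section(".isr_vector", 0x188, "FLASH", "FLASH"),
    Section(".text", 0x1000, "FLASH", "FLASH"),
    Section(".data", 0x10, "RAM", "FLASH"),
    Section(".bss", 0x100, "RAM", None),
]

for name, run_addr, load_addr in layout(SECTIONS):
    load = "none (zero-filled)" if load_addr is None else f"0x{load_addr:08X}"
    print(f"{name:12s} run=0x{run_addr:08X} load={load}")
```

The point of the toy is the `.data` row: its run address is in RAM while its initial contents sit in Flash, which is why startup code has to copy it before `main` runs.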

ARM also provides an abstraction layer for Cortex-M microcontrollers, called CMSIS (also at github.com/ARM-software/CMSIS_5). Vendors tend to provide compatible additions or variants to CMSIS, describing the peripherals provided by each microcontroller.
Often, vendors also provide a Hardware Abstraction Layer (HAL), but their quality and usefulness vary and are up for debate.

Many microcontrollers with native USB interfaces split their firmware into two parts: a bootloader, and the user firmware. The bootloader exposes an interface for easily replacing the user firmware, sometimes even in cases where the user firmware would normally lock up. The easiest of these is USB mass storage, which looks like a USB memory stick with just one file (the user firmware) in it. When one copies a new firmware file to the USB stick, the bootloader first saves it in RAM, calculates and verifies the checksum, and if acceptable, uploads the firmware.
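The buffer, verify, then write sequence can be sketched in a few lines of Python. This is only a model of the idea: real bootloaders differ in the checksum they use and in where it is stored. Here the checksum is assumed to be a CRC-32 appended to the image as four little-endian bytes, and `append_crc`, `verify_image`, and `receive_and_flash` are hypothetical names.

```python
import struct
import zlib


def append_crc(payload: bytes) -> bytes:
    """Build-side step: append the CRC-32 of the payload as 4 little-endian bytes."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def verify_image(image: bytes) -> bool:
    """Bootloader-side check on the copy buffered in RAM."""
    if len(image) < 4:
        return False
    payload = image[:-4]
    (stored,) = struct.unpack("<I", image[-4:])
    return zlib.crc32(payload) == stored


def receive_and_flash(image: bytes, flash: bytearray) -> bool:
    """Write the payload to 'flash' only if the checksum matches and it fits."""
    if not verify_image(image):
        return False          # reject: existing firmware stays untouched
    payload = image[:-4]
    if len(payload) > len(flash):
        return False
    flash[: len(payload)] = payload
    return True


flash = bytearray(b"\xff" * 32)           # erased Flash reads as 0xFF
good = append_crc(b"new firmware")
bad = bytes([good[0] ^ 0x01]) + good[1:]  # same image with one bit flipped

print(receive_and_flash(bad, flash), bytes(flash[:12]))
print(receive_and_flash(good, flash), bytes(flash[:12]))
```

Verifying before touching Flash is what lets the old firmware survive a corrupted or truncated upload.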

I personally like to use Teensies (based on various NXP microcontrollers; Teensy LC is Cortex-M0+, Teensy 3.2 is Cortex-M4, Teensy 4.x are Cortex-M7F), and they use a proprietary bootloader, which exposes the Teensy as a HID device, so that the upload program does not need administrator privileges, but it does limit the upload speed a bit. On the other hand, it is much smaller than a USB mass storage driver, so more of the total Flash is available for user firmware.

Without a bootloader, or to upload/update the bootloader, you use a separate programmer with a suitable JTAG adapter for your Cortex-M microcontroller. This also allows you to debug code running on the microcontroller. For STM32s, ST-LINK (/V2 or V3-*) are probably the most commonly used. Mouser sells ST-LINK V3-MINIE for less than 15 USD/EUR, so these do not tend to be expensive, and many are compatible across several manufacturers.

There are also Arduino add-ons and cores for several different Cortex-M microcontrollers, if you want to experiment with that toolchain first. For STM32s, see STM32duino.com, its Wiki, and GitHub sources (C++). It is licensed under permissive licenses.

DiTBho · « **Reply #3 on:** January 11, 2023, 11:05:00 pm »

@Nominal Animal
I have just checked my Yaroze kit (SONY Playstation1), and ... sorry, I was wrong about that in my last PM.

Their CodeWarrior builder actually produces an "exe" file. You build, and boom, "App.exe" appears in your target folder, ready to be "burned" on a CD (the Yaroze has no zone-code check) or downloaded (Yaroze has 2Mb extra ram for this) to the Playstation via Caetla.

It's a structured binary, and it's really ... weird, since ".exe" and ".com" are usually associated with Dos and Windows rather than with a console video game executable file, but that's SONY's choice ...

In my case, I have never cared too much, partly because I use a ROM emulator, so even when I use CodeWarrior I don't look at their builder output but rather at the last binary that comes out of the linker, which I then directly upload as "pure binary" rather than a structured .elf, a .coff, a .dwarf, or something.

Nominal Animal · « **Reply #4 on:** January 12, 2023, 03:23:36 pm »

Quote from: DiTBho on January 11, 2023, 11:05:00 pm

[Sony Playstation 1 Net Yaroze] CodeWarrior builder actually produces an "exe" file. You build, and boom, "App.exe" appears in your target folder, ready to be "burned" on a CD (the Yaroze has no zone-code check) or downloaded (Yaroze has 2Mb extra ram for this) to the Playstation via Caetla.

It's a structured binary, and it's really ... weird, since ".exe" and ".com" are usually associated with Dos and Windows rather than with a console video game executable file, but that's SONY's choice ...

I think I did have CodeWarrior for PowerPC Mac at some point (got an older copy at a steep discount, IIRC).

I was thinking about getting Net Yaroze for making (actually) educational games via my company in 1997-1998, but because the web stuff was more accessible (using Macromedia Shockwave, using the Macromedia Director 4.0 suite), we concentrated on that instead. The language used by Director, Lingo, although described as "object oriented", was really quite event-driven from the user-developer's point of view.

I never did much Flash development, as I closed down the company in 2005 after a bad burnout. Since then, I've done quite a bit of HTML+JS interactive stuff, although more on the server side. (Enough to recommend HTML+JS for embedded UI mockups and simple tools, definitely. Current browsers' JS interpreters are extremely well optimized, and run fast.)

westfw · « **Reply #5 on:** January 14, 2023, 11:29:35 am »

Note that the ELF file usually contains a great deal of symbol and debugging info, so you usually have some additional tool that extracts and/or converts that to only the bytes that actually need to be loaded into your microcontroller (producing .hex, .bin, .uf2, or some other uploader/programmer friendly format.)
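For a feel of what the .hex output looks like, here is a minimal Python sketch that formats a raw memory image as Intel HEX records. It is not a replacement for a real conversion tool: it only emits record types 00 (data), 01 (end of file), and 04 (extended linear address), and it assumes the base address is a multiple of the chunk size so that no record straddles a 64 KiB boundary. The function names are hypothetical.

```python
def ihex_record(address: int, record_type: int, data: bytes = b"") -> str:
    """Format one record: ':' + byte count, 16-bit address, type, data, checksum."""
    if not 0 <= address <= 0xFFFF or len(data) > 255:
        raise ValueError("address must fit in 16 bits and data in 255 bytes")
    body = bytes([len(data), address >> 8, address & 0xFF, record_type]) + data
    checksum = (-sum(body)) & 0xFF   # two's complement of the byte sum
    return ":" + (body + bytes([checksum])).hex().upper()


def to_ihex(image: bytes, base: int, chunk: int = 16) -> list:
    """Convert a raw image loaded at 'base' into a list of Intel HEX lines.

    Assumes 'base' is a multiple of 'chunk' and 'chunk' divides 65536.
    """
    lines = []
    upper = None
    for offset in range(0, len(image), chunk):
        addr = base + offset
        if addr >> 16 != upper:
            # Type 04 record: sets the upper 16 bits of the following addresses.
            upper = addr >> 16
            lines.append(ihex_record(0, 0x04, upper.to_bytes(2, "big")))
        lines.append(ihex_record(addr & 0xFFFF, 0x00, image[offset:offset + chunk]))
    lines.append(ihex_record(0, 0x01))   # end-of-file record
    return lines


# Four zero bytes placed at the usual STM32 Flash base address.
for line in to_ihex(bytes(4), 0x08000000):
    print(line)
```

The first line printed is the extended linear address record for 0x0800, which is why .hex files for STM32 parts typically begin with `:020000040800F2`.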

Sherlock Holmes · « **Reply #6 on:** January 29, 2023, 02:56:57 pm »

Quote from: westfw on January 14, 2023, 11:29:35 am

Note that the ELF file usually contains a great deal of symbol and debugging info, so you usually have some additional tool that extracts and/or converts that to only the bytes that actually need to be loaded into your microcontroller (producing .hex, .bin, .uf2, or some other uploader/programmer friendly format.)

Thanks, I've been studying LLVM and that offers an API that can optionally generate output as ELF or Windows COFF. I did read that ELF has a richer debug support.

Most annoying though that there seems to be a tradition of never appending a file suffix to ELF object files, like what's wrong with appending .ELF - oh well, that's the nature of this business!

LLVM looks very impressive, it exposes an abstract ISA and from that is able to generate X86/64, ARM, RISC-V and many others. It is aware of the differences and variants of each target ISA too, the IR is RISC-like, a load/store architecture.

This means that a compiler code generator need only generate LLVM "instructions", which is a huge gain. It's large though, lots involved so I'm toying with initially implementing a X86 code generator for COFF, this would enable a working compiler to be created with an LLVM backend as a future update.

I have the X86 code generator already, I wrote that in C and can recode it in C# pretty easily. I'm likely going to focus on LLVM's ISA, I can use Clang as an LLVM assembler so mastering that ISA will eventually be essential.

Also debugging a compiler on Windows is easier than an MCU where there's basically no OS to hold one's hand, so I can achieve a good level of stabilization first and then explore the MCU side.

I also got some very powerful COFF/ELF explorer tools courtesy of our very own NSA!

eutectique · « **Reply #7 on:** January 29, 2023, 09:02:23 pm »

Quote from: Sherlock Holmes on January 29, 2023, 02:56:57 pm

Most annoying though that there seems to be a tradition of never appending a file suffix to ELF object files, like what's wrong with appending .ELF - oh well, that's the nature of this business!

Are you talking about Unix or Linux? Every embedded project that 1) I was involved in, 2) used gcc, and 3) built around Cortex-M MCU generated the output elf file with .elf suffix, just for clarity. It seems to be a universal unwritten default, IME.

If you don't specify the output file name, gcc linker will produce a.out, by the nature of this business.

DiTBho · « **Reply #8 on:** January 29, 2023, 11:09:46 pm »

Quote from: eutectique on January 29, 2023, 09:02:23 pm

Are you talking about Unix or Linux?

Yes, "coff" was GNU/Linux userland, umm, in the days of kernel <= 2.0 and early gcc 2.*; then they switched to ELF. Developed by Unix System Laboratories, it is now the most widely used format in the Unix world. Several well-known Unix operating systems, such as System V Release 4 and Sun's Solaris 2, have adopted ELF as their main executable format.

DiTBho · « **Reply #9 on:** January 29, 2023, 11:39:54 pm »

(Ironically, it seems that gcc-m88k never had ELF support, only a.out and COFF. I need to check what the OpenBSD guys did for Luna88k, but for me and my m88100 dev-board the last stop is gcc-2.95.3.)

DiTBho · « **Reply #10 on:** January 29, 2023, 11:43:42 pm »

(P.S. The COFF format is limited and very old. We all know. But the great news is that you can convert COFF to ELF with the "objcopy" tool from GNU binutils.)

westfw · « **Reply #11 on:** January 29, 2023, 11:46:33 pm »

Aren't gcc .o files also in elf format? All of the normal binutils that read elf format work on them...

Sherlock Holmes · « **Reply #12 on:** January 30, 2023, 02:42:55 am »

Quote from: eutectique on January 29, 2023, 09:02:23 pm

Quote from: Sherlock Holmes on January 29, 2023, 02:56:57 pm
Most annoying though that there seems to be a tradition of never appending a file suffix to ELF object files, like what's wrong with appending .ELF - oh well, that's the nature of this business!

Are you talking about Unix or Linux? Every embedded project that 1) I was involved in, 2) used gcc, and 3) built around Cortex-M MCU generated the output elf file with .elf suffix, just for clarity. It seems to be a universal unwritten default, IME.

If you don't specify the output file name, gcc linker will produce a.out, by the nature of this business.

This is using VisualGDB (an extension for Visual Studio, I work on Windows), the code is compiled with GCC, it's an STM32 F4 development board, no idea other than the generated object file has no suffix, I examined the file too with analysis tools, it is indeed an ELF file.

See:

https://askubuntu.com/a/1188034

Quote

There's no established mandate in the Linux world that there must be extensions. Filesystems for UNIX-like operating systems do not separate the extension metadata from the rest of the file name. The dot character is just another character in the main filename. Instead, internal file metadata is popularly encoded within the beginning of the file to show what kind of app opens it.

Quote

indeed - but there are also files without extension that turn out to be something else - such as ELF 64-bit LSB shared object, ... or that catch-all, data. (j4nd3r53n, Nov 12, 2019 at 13:25)

SiliconWizard · « **Reply #13 on:** January 30, 2023, 07:33:54 pm »

Quote from: westfw on January 29, 2023, 11:46:33 pm

Aren't gcc .o files also in elf format? All of the normal binutils that read elf format work on them...

It depends on how GCC was configured. ELF is the most common these days, but it can be COFF too.
For instance, my GCC compilers for ARM and RISC-V generate ELF objects, but my GCC compilers for Windows generate COFF.

