Kittu20, are you familiar with the term address space? Simply put, it means that a given address, say address 24, can refer to completely different things if there is more than one address space.
Let's do a deep dive into the subject. (For those annoyed at my wall of text posts, I apologize; in my defense, I do believe the below information is useful or perhaps even necessary for learners interested in embedded/microcontroller development.)
On x86 and x86-64 (= AMD64), there are two address spaces: memory and I/O. I/O addresses are accessed with different machine instructions (IN and OUT) than normal memory. 32-bit x86 and x86-64 also support virtual memory, where a privileged kernel controls which addresses unprivileged code can access, and maps physical memory addresses into the address space the unprivileged code sees. For security reasons, current fully-featured 32-bit and 64-bit operating systems use address space layout randomization, where the actual address ranges used by some code are varied at run time. The binaries they run contain relocation records, so the run-time linker can "fix" the code and data to refer to the correct randomized addresses.
The aforementioned x86 and x86-64 are based on the von Neumann architecture, where code and data use the same address space. Older AVRs, for example, use a Harvard architecture, where code and data use separate address spaces (and even separate machine instructions, LPM/ELPM/SPM for program memory, LD/ST et al for data memory).
Clang supports multiple address spaces for both C and C++, but GCC currently only for C. This is why the Arduino environment is so "messy": it uses its own preprocessor, but compiles with g++ (the GNU C++ compiler), which has no named address space support and therefore treats everything as one von Neumann-style address space. To make that work on Harvard architectures, the Arduino developers had to cut some corners. For example, strings in RAM and strings in Flash use different functions (_P suffix in function name). (In my opinion, because of this, C is a more powerful language for older AVRs than C++ is, unless you limit yourself to Clang. When the compiler can understand pointer and variable address spaces, you can use much less RAM and Flash, and leave the compiler to optimize the low-level accesses as best it can. I like that.)
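
To make the named address space idea concrete, here is a minimal sketch in C. It assumes avr-gcc, which predefines __FLASH when its __flash address space is available; the IN_FLASH macro, the string, and the function names are made up for illustration. On any other compiler IN_FLASH expands to nothing, so the same source also builds on an ordinary von Neumann host.

```c
#include <assert.h>
#include <stddef.h>

/* avr-gcc (C only, not g++) predefines __FLASH when the __flash named
 * address space is supported. Elsewhere IN_FLASH expands to nothing, so
 * this file still compiles on a single-address-space host. */
#if defined(__AVR__) && defined(__FLASH)
#define IN_FLASH __flash
#else
#define IN_FLASH
#endif

/* Hypothetical example string. On AVR it stays in Flash and uses no RAM. */
static const IN_FLASH char greeting[] = "Hello from Flash";

/* Length of a string that lives in the IN_FLASH address space. Because
 * the pointer type carries the address space, the compiler picks the
 * right load instructions itself; no separate _P variant is needed. */
size_t flash_strlen(const IN_FLASH char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != '\0')
        n++;
    return n;
}

/* Accessor, so callers do not need to know where the string is stored. */
const IN_FLASH char *get_greeting(void)
{
    return greeting;
}
```

Passing a plain RAM pointer to flash_strlen() on AVR is a mix-up of address spaces that the compiler can diagnose from the types alone, which is the point of the feature.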
ARM Cortex-M microcontrollers use a single address space for everything. Different types of memory and even peripheral devices are accessible at different addresses. It is a memory-mapped architecture, one subtype of von Neumann architectures.
Most embedded development toolchains nowadays use ELF format object files. Every code and data object (or "thing" in general) in an ELF object file always belongs to a specific section. By default, the .text section corresponds to code, .data to initialized variables, .bss to uninitialized variables, and so on; but this is just a convention, not a requirement. When compilers generate code or allocate global variables, they do not assign them a fixed address, but an address relative to the relevant section. The compilers even allow you to specify the section name yourself, if you want. If you want to refer to a fixed address in your C or C++ code, you normally do so via a pointer: you declare a pointer to the code or data at that fixed address, and set the pointer value to that address.
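
As a rough sketch of the pointer approach: the register layout, the TIMER_ENABLE bit, and the address 0x40001000 below are all invented for illustration and do not describe any real part. The function takes the register block as a parameter, so the same code can be exercised on a host with an ordinary struct standing in for the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical peripheral register block. volatile tells the compiler
 * that every access matters and must not be cached or removed. */
struct timer_regs {
    volatile uint32_t control;
    volatile uint32_t count;
};

#define TIMER_ENABLE UINT32_C(1)   /* hypothetical control bit */

/* Hypothetical fixed address: a pointer whose value is set to the place
 * where the hardware puts the registers. Only dereference it on a
 * target that really has a peripheral there. */
#define TIMER0 ((struct timer_regs *)(uintptr_t)0x40001000u)

/* Load the counter and set the enable bit. */
void timer_start(struct timer_regs *t, uint32_t initial)
{
    assert(t != NULL);
    t->count = initial;
    t->control |= TIMER_ENABLE;
}
```

On the target you would call timer_start(TIMER0, 1000); on a host, pass the address of a local struct timer_regs instead.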
When the different ELF object files are linked to form an executable or an embedded system firmware image, it is the linker (often executed automagically by the compiler) that decides which addresses the different sections will occupy. The way it does this is controlled by a linker script. Fortunately, at least GNU/GCC/binutils (ld and gold) and Clang (LLVM LLD) linker scripts use the exact same format, documented here.
In practice, this means that if you declare a structure variable describing a memory-mapped peripheral, you can assign it a specific section, say hw_name, and in the linker script, assign that section the correct start address. If you have device variants where the addresses vary but the structures stay identical, you can just create a different linker script for each variant.
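
A sketch of the section approach, assuming GCC or Clang on an ELF target. The uart_regs layout, the variable name, and the address in the comment are hypothetical; only the section name hw_name comes from the text above. The C side never mentions an address at all.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout of a memory-mapped peripheral. */
struct uart_regs {
    volatile uint32_t data;
    volatile uint32_t status;
};

/* The layout must match the hardware exactly, so check the size. */
static_assert(sizeof(struct uart_regs) == 8, "unexpected register layout");

/* Place the variable in its own ELF section instead of .data or .bss. */
struct uart_regs uart0 __attribute__((section("hw_name")));

/* A matching linker script fragment, inside SECTIONS { ... }, could be
 * (the address is made up; NOLOAD keeps the section out of the image):
 *
 *     hw_name 0x4006A000 (NOLOAD) : { KEEP(*(hw_name)) }
 *
 * A device variant with the peripheral elsewhere only needs a different
 * address on that line; this C file stays the same. */

/* Read the status register through the ordinary variable. */
uint32_t uart_status(void)
{
    return uart0.status;
}
```

Linked with a default hosted linker script, hw_name simply ends up somewhere in normal RAM, which is handy for testing the logic off-target.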
As an example, consider Teensy 3.x, programmed in the Arduino environment, using the Teensyduino add-on. The core support files are here, including the linker scripts for mkl26z64 (Teensy LC), mk20dx128 (Teensy 3.0), mk20dx256 (Teensy 3.1 and 3.2), mk64fx512 (Teensy 3.5), and mk66fx1m0 (Teensy 3.6). There are some #if defined(__MKxxxx__) .. #endif snippets in the C/C++ code accounting for differences in e.g. peripherals, but the same codebase supports all these different Cortex-M0+ (LC), Cortex-M4 (3.0, 3.1, 3.2), and Cortex-M4F (3.5, 3.6) microcontrollers.
(sdcc, AKA the Small Device C Compiler, used for e.g. Intel MCS51 (8051 et al), HC08, Z80, MOS 6502, Padauk pdk14 and pdk15, et cetera, doesn't really use ELF format object files; its linkers take command-line arguments specifying the details that provide roughly similar but very limited functionality compared to "bigger compilers'" linker scripts.)
In summary, this means that there is no fixed order or mapping for the different sections, at all. Sometimes the processor/microcontroller, bootloader, and/or the board requires specific addresses for specific things, like peripherals at a specific address range, RAM at another, Flash at yet another, maybe PSRAM at yet another, and so on. The rest, especially RAM use for initialized and uninitialized data, heap, and stack, is up to the linker script (or equivalent configuration when using e.g. sdcc that has no explicit linker script).
Footnote: The GCC and Clang compiler options -ffunction-sections -fdata-sections -Wl,--gc-sections use ELF section and relocation information to drop completely unneeded code and data from the final linked object. They work by giving every function, variable, and object its own code/data/bss sub-section. Because ELF relocation records are section-specific, the linker can tell whether a function/variable/object is used simply by checking whether any relocation records target its section.
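
A small sketch of what that looks like in practice. The file names in the build comment and both functions are hypothetical; the options themselves are the ones named above, plus --print-gc-sections, which asks GNU ld or LLD to list what was discarded.

```c
#include <assert.h>

/* Build sketch for an ELF target, with main.c being some other file
 * that calls only scale_reading():
 *
 *   cc -Os -ffunction-sections -fdata-sections -c example.c main.c
 *   cc -Wl,--gc-sections -Wl,--print-gc-sections example.o main.o -o example
 *
 * With -ffunction-sections, each function below lands in its own
 * section (.text.scale_reading and .text.debug_scramble) rather than
 * in one shared .text, so the linker can discard them one by one. */

/* Referenced from elsewhere, so its section is kept. */
int scale_reading(int raw)
{
    assert(raw >= 0);
    return raw * 3 / 2;
}

/* Referenced by nothing in this sketch, so --gc-sections can drop its
 * section from the final binary. */
int debug_scramble(int raw)
{
    return raw ^ 0x5a5a;
}
```

Without -ffunction-sections both functions would share a single .text section in example.o, and the linker would have to keep or drop them together.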
I use ELF section support extensively, both in embedded development, and when writing POSIX/GNU C for Linux/macOS/BSDs. The most useful case is declaring structures in separate source files, and having the linker collect them into a single contiguous array. If you use Linux, you might want to examine my RPN calculator example: in it, you can create new mathematical operators by declaring them in a new C source file, compiling it, and linking it into the final binary, and the operators will be automagically available.
The one downside to this is that the order of the elements in the final array is unspecified: you cannot rely on the array elements being in any specific order.
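
For readers who have not seen the technique, here is a minimal single-file sketch. It is not the code from the RPN calculator example; every name in it is made up. It assumes GCC or Clang with GNU ld or LLD on an ELF target, where the linker provides __start_NAME and __stop_NAME symbols for any section whose name is a valid C identifier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry of the linker-collected array (hypothetical layout). */
struct rpn_op {
    const char *name;
    double (*apply)(double lhs, double rhs);
};

/* Emit one entry into the "rpn_ops" section. 'used' stops the compiler
 * from discarding the otherwise unreferenced object, and the explicit
 * alignment stops it from padding between entries. */
#define RPN_OPERATOR(ident, text, func)                           \
    static const struct rpn_op ident                              \
    __attribute__((used, section("rpn_ops"),                      \
                   aligned(__alignof__(struct rpn_op)))) = { text, func }

static double op_add(double a, double b) { return a + b; }
static double op_mul(double a, double b) { return a * b; }

/* In a real project these lines could live in separate source files. */
RPN_OPERATOR(entry_add, "+", op_add);
RPN_OPERATOR(entry_mul, "*", op_mul);

/* Start and end of the collected array, defined by the linker. */
extern const struct rpn_op __start_rpn_ops[];
extern const struct rpn_op __stop_rpn_ops[];

/* Number of entries the linker collected. */
size_t rpn_op_count(void)
{
    return (size_t)(__stop_rpn_ops - __start_rpn_ops);
}

/* Linear search by name, since the entry order is unspecified. */
const struct rpn_op *rpn_find(const char *name)
{
    assert(name != NULL);
    for (const struct rpn_op *op = __start_rpn_ops; op < __stop_rpn_ops; op++)
        if (strcmp(op->name, name) == 0)
            return op;
    return NULL;
}
```

Adding an operator then means adding one RPN_OPERATOR(...) line in any source file that gets linked in; nothing else has to change.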
I've written a few tools to deal with this. I'd love to have e.g. an external Python script that could sort or reorder the entries without changing their contents (Python because GDB also supports helper "commands" written in Python), or that could add a perfect hash table for the entries (say, for command lookup), but I'm still investigating how to make that robust enough for my paranoid tastes.