I saw a topic on the Microchip forum saying there is actually something that runs before main in the XC compiler, for initialization.
It is nothing special, really; it just feels odd the first time you encounter it. I can explain it all in a single post, if you want.
And there are two completely separate things before main(), not just one. One is that for dynamic executables, the run-time linker (dynamic loader) runs the initialization code of the shared libraries it loads for the executable. That, of course, only happens if you have a run-time linker, and you only have those on full-blown OSes like Linux and Windows, not on embedded devices.
The second is the one you alluded to. For various reasons, it is often called the C runtime, or crt, and it is part of the set of base libraries: usually avr-libc or newlib for AVR and ARM targets, respectively. I'm not sure which XC (Microchip compiler) you are referring to, and I wouldn't know for sure which base libraries it uses anyway.
If you look at, say, avr-libc, compiled with either GCC or Clang for various AVR targets, the magic happens in avr-libc/crt1/gcrt1.S. The compiler provides a default linker script, which describes how the various ELF sections are arranged and combined into an uploadable binary. See, say, hardware/tools/avr/avr/lib/ldscripts/avr5.x in the Arduino sources for the default linker script used with the ATmega32u4. (By the way, the suffix describes the criteria, a set of compiler and linker options, not any file format.)
Internally, avr-libc uses 21 ELF sections for the critical parts:
.vectors for the array of entries the hardware uses when an interrupt occurs; the first of them jumps to __init, the code the hardware runs when it starts up;
.init0 through .init8 for the machine code run before main() is called;
.init9 for doing that call or jump;
.fini9 through .fini1 for stuff that needs to be done after main() returns or exit() (or one of its variants) is called; and
.fini0 for the forever loop or system restart.
The exact same "trick" as I showed in my example above is used by the linker script to merge the ten .init sections into a single consecutive chunk of machine code that finally jumps into, or calls, the C main() function. Similarly, the .fini sections are merged into a single consecutive chunk of machine code that runs if main() returns or one of the exit() functions is called.
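The same idea, letting the linker gather pieces from many object files into one consecutive block that startup code then walks, can be tried on a hosted ELF system. A minimal sketch, assuming GCC or Clang with GNU ld (or lld) on Linux: for a section whose name is a valid C identifier, the linker defines __start_NAME and __stop_NAME symbols around it. The section name init_steps and the names INIT_STEP, step_a, step_b and run_init_steps are all hypothetical, and unlike avr-libc's .init0 to .init9, this collects function pointers rather than merging raw machine code.

```c
#include <stddef.h>

/* Hypothetical "init step" type: each entry is a plain function pointer. */
typedef void (*init_step_fn)(void);

/* Put a pointer to fn into the custom section "init_steps".
   'used' stops the compiler from discarding the unreferenced variable. */
#define INIT_STEP(fn) \
    static init_step_fn init_step_ptr_##fn \
    __attribute__((section("init_steps"), used)) = fn

/* Defined by the linker, bounding the merged "init_steps" section. */
extern init_step_fn __start_init_steps[];
extern init_step_fn __stop_init_steps[];

static int step_count;

static void step_a(void) { step_count += 1; }
static void step_b(void) { step_count += 10; }

/* These could just as well live in different .c files. */
INIT_STEP(step_a);
INIT_STEP(step_b);

/* Walk the merged section from start to stop, calling every entry.
   The order of entries is up to the linker, so nothing here relies on it. */
int run_init_steps(void)
{
    for (init_step_fn *p = __start_init_steps; p < __stop_init_steps; ++p)
        (*p)();
    return step_count;
}
```

Calling run_init_steps() once should return 11, however the linker orders the two entries.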
So, there is not much to it at all. If you use Arduino, go to Preferences and make sure "Show verbose output during: compilation" is checked; I also suggest setting "Compiler warnings" to "All". Then, when you Verify/Compile a sketch, the output shows the temporary directory and the names and paths of the ELF files it uses. Find the .ino.elf one; you can examine it with the avr-objdump utility included with Arduino. Use the -d flag to get the AVR disassembly; other flags show other useful stuff (for example, -h lists the section headers). In the disassembly, find the symbol __vectors. On AVRs, the interrupt vectors are actually jump instructions, and the very first one is taken at startup. In Arduino, for AVRs, it usually jumps to __ctors_end, but this may vary. One example I just checked contains 36 instructions before the call to main():
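__ctors_end (writes 0 to I/O port 0x3F, 0x0A to port 0x3E, and 0xFF to port 0x3D; that is, it clears the status register SREG and sets the stack pointer SPH:SPL to 0x0AFF, the top of RAM),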
__do_copy_data (copies initialized variable and object data from Flash to RAM),
__do_clear_bss, __do_clear_bss_loop and __do_clear_bss_start (clearing the RAM corresponding to uninitialized or zero-initialized variables), and
__do_global_ctors, which calls all ELF constructor functions via the __tablejump2__ helper function. That helper is just six additional instructions: it loads the address to jump to from program memory at byte address 2*Z (Z being the register pair r30:r31 on AVRs; the table holds word addresses, hence the doubling), then jumps to that address, so the constructor returns straight back into __do_global_ctors. Those ELF constructor functions are the ones marked with __attribute__((constructor)); for C++ global objects, the compiler automatically creates a parameterless function that calls the C++ constructor with the explicit object address, using the C++ calling convention.
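To make that order concrete, here is a host-side C model of what those routines do. It is only a sketch under loudly stated assumptions: "Flash" and "RAM" are plain arrays, and fake_flash_data, fake_ram, ctor_table and crt_startup_model are hypothetical names, not avr-libc or libgcc symbols. It copies the initialized-data image, zeroes the .bss area, then calls every entry of a constructor table, mirroring __do_copy_data, __do_clear_bss and __do_global_ctors.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical memory layout; none of these are real avr-libc symbols. */
#define DATA_SIZE 4
#define BSS_SIZE  4

/* Initial values of initialized variables, as stored in "Flash". */
static const unsigned char fake_flash_data[DATA_SIZE] = { 1, 2, 3, 4 };

/* "RAM": the .data area first, then .bss; filled with junk to mimic power-up. */
unsigned char fake_ram[DATA_SIZE + BSS_SIZE] = { 0xAA, 0xAA, 0xAA, 0xAA,
                                                 0xAA, 0xAA, 0xAA, 0xAA };

typedef void (*ctor_fn)(void);

static int ctor_calls;
static void ctor_one(void) { ctor_calls++; }
static void ctor_two(void) { ctor_calls++; }

/* Stand-in for the table of ELF constructor functions. */
static const ctor_fn ctor_table[] = { ctor_one, ctor_two };

/* Model of the startup sequence; returns the number of constructors called. */
int crt_startup_model(void)
{
    /* Like __do_copy_data: copy initial values from Flash to RAM. */
    memcpy(fake_ram, fake_flash_data, DATA_SIZE);
    /* Like __do_clear_bss: zero the uninitialized/zero-initialized area. */
    memset(fake_ram + DATA_SIZE, 0, BSS_SIZE);
    /* Like __do_global_ctors: call each constructor through the table. */
    for (size_t i = 0; i < sizeof ctor_table / sizeof ctor_table[0]; i++)
        ctor_table[i]();
    return ctor_calls;
}
```

After crt_startup_model() returns, the first four bytes of fake_ram hold 1, 2, 3, 4 and the last four are zero; on the real hardware, this is the point where the call to main() follows.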
I'm not the best, and although I want to know everything, it's only because I'm curious to a fault. The reason anyone else might want to know the details is that they sometimes come in handy.
For example, if I found I needed a C function to run in an AVR Arduino sketch before Arduino's hidden main() starts, then, knowing the above, I'd *know* that __attribute__((constructor)) static void myfunc(void) { /* do stuff before main */ } would do exactly that. I happen to know that not only does it work with ALL Arduino cores, but it also works the exact same way in normal hosted environments on Linux and the BSDs.
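A minimal sketch of that, assuming a hosted GCC or Clang toolchain on an ELF system such as Linux. The ran_before_main flag and the constructor_has_run() helper are hypothetical, added only so the effect can be observed from main().

```c
/* Static storage with no initializer, so it starts at 0 (it lives in .bss). */
static int ran_before_main;

/* Called by the C runtime startup code before main() starts. */
__attribute__((constructor)) static void myfunc(void)
{
    ran_before_main = 1;
}

/* Hypothetical helper so main() can check that myfunc() already ran. */
int constructor_has_run(void)
{
    return ran_before_main;
}
```

Calling constructor_has_run() from main() should return 1, because myfunc() was called by the startup code before main() was entered.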
Oh, and because, as far as I know, the XC compilers are derived from either GCC or LLVM/Clang, which both support all of the above as long as ELF object files are used, all of the above applies to XC too.
For an example XC use case, perhaps you have a largeish table in RAM whose contents can be computed, and you don't want to waste Flash on storing its initial values (initialized variables need their initial values stored in Flash, while uninitialized ones are simply zeroed at startup). What you do is leave it uninitialized and create a constructor function like the one above to fill it in. Done! For all intents and purposes, by the time main() starts, it will be initialized correctly. If you do need it initialized before other constructor functions get executed (perhaps a C++ object constructor uses that table?), it gets a bit more complicated: you need to check whether setting a priority suffices, or whether you need to add a linker script detail to ensure the order. Say, maybe five minutes of tinkering, and half an hour of testing that the results are correct.
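A sketch of the table case with priorities, assuming GCC or Clang on Linux, where constructor priorities are honored: a smaller priority number runs earlier, and 0 to 100 are reserved for the implementation. Whether priorities are honored on an AVR or XC target depends on its linker script, which is exactly the thing to check. The table contents (i ^ 0x5A) and the names fill_table, use_table, table_checksum and table_entry are hypothetical.

```c
#include <stddef.h>

#define TABLE_SIZE 256

/* No initializer: the table goes in .bss, so no Flash holds initial values. */
static unsigned char table[TABLE_SIZE];
static unsigned int table_sum;

/* Priority 101 runs before 102. Contents are a hypothetical formula. */
__attribute__((constructor(101))) static void fill_table(void)
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        table[i] = (unsigned char)(i ^ 0x5A);
}

/* A later constructor that relies on the table already being filled. */
__attribute__((constructor(102))) static void use_table(void)
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        table_sum += table[i];
}

unsigned int table_checksum(void) { return table_sum; }
unsigned char table_entry(size_t i) { return table[i]; }
```

Because XOR with a constant just permutes 0 to 255, table_checksum() should be 32640 (the sum 0 + 1 + ... + 255) when fill_table() ran first; if the order were wrong, use_table() would see an all-zero table and the sum would be 0.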
Granted, this information is a bit esoteric and not needed by everyone, but there definitely are use cases and reasons why one might care to know this.