See [url=https://github.com/tomstorey/m68k_bare_metal/tree/master]this project[/url] for a simple cross-compiler toolchain setup. 
Read the README, at least up to the "Getting Started" section, to understand its purpose.
Although it's for the 68K CPU, it can be a framework for any CPU given that you've got a C cross-compiler for it.
Booting a CPU/MCU into main() requires just four files: Makefile, platform.ld (linker script), crt0.S (C runtime startup), and your own main.c.

As a concrete example, I've provided a zip attachment of the four files plus two additional UART HAL files.  It's for a board with a 68008 CPU, 128K ROM, 32K RAM, and 16550 UART.

The 68008 looks (to software) like a 32-bit CPU capable of 32-bit integer operations and 32-bit addressing, but in hardware it has only an 8-bit data bus and a 20-bit address bus (1MB).  Address decoding uses just one 74LS138: each of its 8 output pins selects one 128KB chunk of the 1MB address space, and only 3 of the pins are used, to enable ROM, RAM, or the UART one at a time.  The memory map is:
    0x00000: ROM
    0x20000: RAM
    0xA0000: UART
The 68008 doesn't have a separate IO address space like x86 (which uses IN/OUT instructions to access it).  Instead, all IO is memory-mapped.  In this case the UART appears as memory at 0xA0000 to 0xA0007 (registers internal to the 16550).
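
To make the decoding concrete, here is a minimal sketch in C of which 74LS138 output a given address selects.  It assumes the top three address lines (A19..A17) drive the decoder's select inputs, which is what the 128KB chunk size implies; the function and enum names are hypothetical and not taken from the attached files.

```c
#include <stdint.h>

/* 128KB chunks: 1 << 17 bytes each, 8 chunks in the 1MB space. */
enum { CHUNK_SHIFT = 17 };

/* Hypothetical names for the three decoder outputs in use. */
enum { SEL_ROM = 0, SEL_RAM = 1, SEL_UART = 5 };

/* Returns the 74LS138 output number (0..7) selected by a 20-bit address.
   Assumption: A19..A17 are wired to the decoder's select inputs. */
static inline unsigned decoder_output(uint32_t addr)
{
    return (unsigned)((addr >> CHUNK_SHIFT) & 0x7u);
}
```

With this mapping, 0x00000 lands on output 0 (ROM), 0x20000 on output 1 (RAM), and 0xA0000 on output 5 (UART), matching the memory map.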

By design, the 68K CPU has its vector table in the first 1KB of the address space: 256 32-bit addresses that the CPU jumps through when events occur, e.g. reset, hardware interrupts, divide by zero, etc.  Notice that the linker script lays out the vector table at address 0 with the stack address (usually the top of RAM) as the first table entry, the first code execution address (_start) as the second table entry, and defaults for the rest of the vectors (although it doesn't do all 256 of them - lazy).  It then appends all the compiled code after the vector table, then the read-only data (constants), and finally the initialized data (read-write vars).  This is the binary that's created to be burned into the ROM.  The first code that executes (_start in crt0.S) clears the bss area in RAM (for uninitialized vars), copies the initialized data (read-write vars) from the ROM binary into RAM, then jumps to main (in ROM).
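
For illustration only, the startup steps described above can be sketched in C.  The real crt0.S is assembly and gets its addresses from symbols defined in platform.ld; the struct, function, and parameter names below are hypothetical stand-ins, passed as arguments so the logic can be tried on any host.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout of the 68K vector table: 256 entries of 32 bits = 1KB. */
struct vector_table {
    uint32_t initial_sp;   /* entry 0: initial stack pointer */
    uint32_t initial_pc;   /* entry 1: address of _start */
    uint32_t other[254];   /* remaining exception/interrupt vectors */
};

/* Sketch of what _start does before calling main():
   1. zero the bss range in RAM,
   2. copy the initialized-data image from ROM to its home in RAM. */
static void init_ram(uint8_t *bss_start, uint8_t *bss_end,
                     const uint8_t *data_load,   /* image in ROM */
                     uint8_t *data_start, uint8_t *data_end)
{
    for (uint8_t *p = bss_start; p < bss_end; ++p)
        *p = 0;
    for (uint8_t *p = data_start; p < data_end; ++p)
        *p = *data_load++;
}
```

The copy is needed because read-write variables must live in RAM at run time, while their initial values can only survive power-off in ROM.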

The main.c that is provided is just a simple serial terminal echo program, but it could be a monitor program or a full-blown OS.  The first order of business is to initialize the UART via the init_tty() call.  See the two HAL files tty_16550.h and tty_16550.c, where the UART driver specifics are kept: addresses of all the registers, code for the interrupt service routine (attached at vector IRQ2() including a systick; a 555 tickling the RI pin @50Hz), a line buffer, and the putchar()/getchar() primitives which will be called from main.c.  Other IO drivers will have .h and .c files with their own init_xxx() and primitives too.
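
To show what memory-mapped UART access looks like at its simplest, here is a polled sketch in C.  This is not the code in tty_16550.c, which is interrupt-driven with a line buffer.  The register offsets and status bits are the standard 16550 ones; the function names are hypothetical, and the base pointer is a parameter so the routines can be exercised against a plain array.

```c
#include <stdint.h>

/* Standard 16550 register offsets from the UART base address. */
enum { UART_RBR_THR = 0,   /* receive buffer (read) / transmit holding (write) */
       UART_LSR     = 5 }; /* line status register */

/* Standard 16550 line status bits. */
enum { LSR_DATA_READY = 0x01,   /* a received byte is waiting */
       LSR_THR_EMPTY  = 0x20 }; /* transmitter can accept a byte */

/* Busy-wait until the transmitter is free, then send one character. */
static void uart_putc(volatile uint8_t *uart, char c)
{
    while ((uart[UART_LSR] & LSR_THR_EMPTY) == 0)
        ;
    uart[UART_RBR_THR] = (uint8_t)c;
}

/* Return the received byte, or -1 if nothing has arrived yet. */
static int uart_getc_nonblocking(volatile uint8_t *uart)
{
    if ((uart[UART_LSR] & LSR_DATA_READY) == 0)
        return -1;
    return uart[UART_RBR_THR];
}
```

On this board the base would be the UART's address from the memory map, e.g. uart_putc((volatile uint8_t *)0xA0000, 'A').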

You don't need a bloated, obscuring IDE to do this for you.


To compile:
    make clean
    make all
This produces the bmbinary.rom file to be burned onto a ROM.

Also included in the zip are outputs produced via:
    make althexdump
    make dumps
The first is a hex+ascii dump of bmbinary.rom.  Notice the various sections:
    vector table between 0x0000 and 0x03FF, 
    code between 0x0400 and 0x06DF, 
    read-only (constants) data between 0x06E0 and 0x071F, and
    initialized (read-write vars) data between 0x0720 and 0x072F.
The second is a dump of the various sections in the bmbinary ELF file that is produced (incl. assembly and data).

