The ARM core itself is pretty straightforward. On Cortex-M, it fetches the initial SP and PC from the vector table at 0x0 (SP from the first word, PC from the second), and off it goes. Given even the simplest of C environments, you can probably just start your program with main(). There are a couple of useful peripherals (for Cortex-M, anyway) that are common across all the chips: the interrupt controller (NVIC) and the SysTick timer, so those are things you'll want to study in particular.
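To make the reset behavior concrete, here is a small host-side sketch in C that decodes the first two words of a Cortex-M flash image the way the core does at reset. The type and function names (reset_vectors_t, read_reset_vectors, entry_address) are made up for illustration and are not part of any vendor library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The first two entries of a Cortex-M vector table.
 * (Names here are illustrative, not from any library.) */
typedef struct {
    uint32_t initial_sp;    /* word at offset 0x0: loaded into SP at reset */
    uint32_t reset_handler; /* word at offset 0x4: loaded into PC at reset */
} reset_vectors_t;

/* Assemble a 32-bit little-endian word from four bytes. */
static uint32_t le32(const uint8_t *b)
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Decode the two words at the start of a flash image. */
static reset_vectors_t read_reset_vectors(const uint8_t *image, size_t len)
{
    assert(image != NULL && len >= 8);

    reset_vectors_t v;
    v.initial_sp    = le32(image);
    v.reset_handler = le32(image + 4);
    return v;
}

/* Vector entries carry the Thumb bit in bit 0; the address where
 * execution actually starts is the entry with that bit cleared. */
static uint32_t entry_address(reset_vectors_t v)
{
    return v.reset_handler & ~(uint32_t)1;
}
```

Feeding it the first 8 bytes of a .bin file shows where the stack starts and where execution begins. On the chip itself you never write this code: the hardware does the two loads, and the startup file supplied with your toolchain provides the table.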
The complications show up in configuring the other parts of the chips.
1) Most ARM chips come up running on a slow-ish internal clock. To run at maximum speed, you usually have to set up an external crystal oscillator at some base frequency and a PLL or FLL to multiply that up to the desired final clock rate, and then configure the memories for the appropriate number of wait states for that clock rate. This is chip-specific. It might be done in code that runs before main() (system_init() and/or board_init()), depending on the library setup. The flow is somewhat standardized, but not always implemented.
2) Most ARM chip peripherals initialize to "off", frequently even things like GPIO that you wouldn't think of as having an "off" state. Turning on a peripheral frequently involves "enable", "set clock", and "set up pins", in addition to what you'd think of as normal peripheral initialization. Chip and vendor dependent, of course.
3) Many ARM chips have peripherals that are much more complex than the typical 8-bit microcontroller peripheral, so you need to wade through either the datasheet (sometimes called a "User Manual" or "Technical Reference Manual", leaving little in the "datasheet" but pinout and electrical specs), or the library documentation, or both, to figure out how to actually do anything.
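The clock bring-up in (1) and the "enable, set clock, set up pins" sequence in (2) can be sketched roughly as follows. Everything here is hypothetical: the fake_chip_t register layout, the bit names, the one-wait-state-per-24-MHz rule, and the divide-by-16 baud formula are invented for illustration and match no real part. The "registers" are an ordinary struct, so the sketch runs on a PC; on real hardware the names and bit positions come from the chip's manual or vendor headers.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL register block; not any real chip's layout. */
typedef struct {
    volatile uint32_t osc_ctrl;      /* bit 0: crystal enable, bit 1: ready */
    volatile uint32_t pll_ctrl;      /* bits 7:0 mult, bit 8 enable, bit 9 locked */
    volatile uint32_t flash_wait;    /* flash wait states */
    volatile uint32_t clk_select;    /* 0 = internal RC, 1 = PLL */
    volatile uint32_t periph_clk_en; /* one clock-enable bit per peripheral */
    volatile uint32_t pin_mux[16];   /* function select, one word per pin */
    volatile uint32_t uart_baud_div;
    volatile uint32_t uart_ctrl;     /* bit 0: UART enable */
} fake_chip_t;

#define OSC_ENABLE    (1u << 0)
#define OSC_READY     (1u << 1)
#define PLL_MULT_MASK 0xFFu
#define PLL_ENABLE    (1u << 8)
#define PLL_LOCKED    (1u << 9)
#define CLK_SRC_PLL   1u
#define PERIPH_UART0  (1u << 4)
#define PINMUX_UART   2u
#define UART_ENABLE   (1u << 0)

/* Poll a status bit with a bounded retry count; 1 = set, 0 = timed out. */
static int wait_for_bit(const volatile uint32_t *reg, uint32_t mask)
{
    for (uint32_t tries = 0; tries < 1000u; ++tries) {
        if (*reg & mask) {
            return 1;
        }
    }
    return 0;
}

/* Assumed rule for this sketch: one extra wait state per 24 MHz. */
static uint32_t flash_wait_states(uint32_t hz)
{
    assert(hz > 0u);
    return (hz - 1u) / 24000000u;
}

/* Crystal -> PLL -> switch system clock. Returns the new clock in Hz,
 * or 0 if the oscillator or PLL never reports ready. */
static uint32_t clock_init(fake_chip_t *chip, uint32_t crystal_hz, uint32_t mult)
{
    assert(mult > 0u && mult <= PLL_MULT_MASK);
    uint32_t target_hz = crystal_hz * mult;

    chip->osc_ctrl |= OSC_ENABLE;
    if (!wait_for_bit(&chip->osc_ctrl, OSC_READY)) {
        return 0u;
    }

    /* Raise wait states BEFORE speeding up, so flash is never too slow. */
    chip->flash_wait = flash_wait_states(target_hz);

    chip->pll_ctrl = (chip->pll_ctrl & ~PLL_MULT_MASK) | mult | PLL_ENABLE;
    if (!wait_for_bit(&chip->pll_ctrl, PLL_LOCKED)) {
        return 0u;
    }

    chip->clk_select = CLK_SRC_PLL;
    return target_hz;
}

/* "Enable, set clock, set up pins", then the ordinary UART setup. */
static void uart_enable(fake_chip_t *chip, uint32_t clk_hz,
                        uint32_t baud, uint32_t tx_pin)
{
    assert(baud > 0u && tx_pin < 16u);

    chip->periph_clk_en |= PERIPH_UART0;          /* clock the peripheral */
    chip->pin_mux[tx_pin] = PINMUX_UART;          /* route the pin to it */
    chip->uart_baud_div = clk_hz / (16u * baud);  /* assumed 16x oversampling */
    chip->uart_ctrl |= UART_ENABLE;               /* normal peripheral init */
}
```

The ordering is the part that tends to carry over between chips: wait for the oscillator, raise the wait states before switching to the faster clock, wait for PLL lock, and only then switch the clock source. Likewise, a peripheral whose clock is still gated usually ignores register writes, so the clock enable comes first.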