In the early years of the ARM7TDMI, the startup code lived in crt.S. It set up the exception vectors, copied the initialized data (.data) into RAM and cleared .bss. It also set up the stack and enabled interrupts (if desired) before branching to main().
Incorporating a number of .S files is not a new idea, but one has to be a very good assembly language programmer to generate better code than the compiler does.
When the Cortex-M variants came along, startup code could be written in C, and that was a vast improvement. Assembly language was largely a thing of the past.
Still, there are opportunities to tweak the code and get a little better performance.
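
As a rough illustration of what C startup code can look like, here is a minimal sketch assuming a Cortex-M target, where the core loads the initial stack pointer from the first word of the vector table at reset, so no assembly is needed to set up the stack. The linker symbols (_sidata, _sdata, _edata, _sbss, _ebss, _estack), the .isr_vector section name, the init_memory() helper and the STARTUP_ON_TARGET macro are all assumptions made for this example; real names depend on the linker script and toolchain in use.

```c
#include <stdint.h>

/* Copy initialized data from its load address (flash) into RAM, then
 * zero .bss. Pointers are word-aligned; each end pointer is one past
 * the last word of its region. Helper name is hypothetical. */
void init_memory(const uint32_t *load,
                 uint32_t *data_start, uint32_t *data_end,
                 uint32_t *bss_start, uint32_t *bss_end)
{
    for (uint32_t *dst = data_start; dst < data_end; )
        *dst++ = *load++;
    for (uint32_t *dst = bss_start; dst < bss_end; )
        *dst++ = 0;
}

/* STARTUP_ON_TARGET is a hypothetical macro: define it only when building
 * for the microcontroller, where the linker script supplies the symbols. */
#ifdef STARTUP_ON_TARGET
/* Assumed linker-script symbols; only their addresses are meaningful. */
extern uint32_t _sidata, _sdata, _edata, _sbss, _ebss, _estack;

int main(void);

void Reset_Handler(void)
{
    init_memory(&_sidata, &_sdata, &_edata, &_sbss, &_ebss);
    main();
    for (;;) { }  /* trap here if main() ever returns */
}

/* Start of a Cortex-M vector table: word 0 is the initial stack pointer,
 * word 1 is the reset vector. The remaining exception and interrupt
 * entries are omitted from this sketch. */
__attribute__((section(".isr_vector"), used))
static void (*const vector_table[])(void) = {
    (void (*)(void))&_estack,
    Reset_Handler,
};
#endif
```

Passing the region boundaries as parameters, rather than reading the linker symbols inside the loops, is only a choice made here so that the copy and clear logic can be compiled and checked on a host machine. The word-by-word loops are also one obvious place where the tweaking mentioned above might happen.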