gcc is very good at eliminating high-level "pro forma" code. Take this C++ example, compiled for ARM:
// Return min, max for two operands of any type T that implements operator<()
template<typename T>
T max(const T& a, const T& b) {
    return b < a ? a : b;
}

template<typename T>
T min(const T& a, const T& b) {
    return b < a ? b : a;
}

int
foo(const char arg) {
    char a = 1;
    char b = 2;
    char c = 3;
    // d = b
    const char d = min(max(a, b), c);
    return min(d, arg);
}
Including the function and module wrappers, it generates exactly this for ARM7T (g++ 4.6.0):
        .cpu arm7tdmi
        .fpu softvfp
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 1
        .eabi_attribute 30, 2
        .eabi_attribute 18, 4
        .file "foo.cxx"
        .text
        .align 2
        .global _Z3fooc
        .type _Z3fooc, %function
_Z3fooc:
        .fnstart
.LFB2:
        @ Function supports interworking.
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        cmp r0, #1
        movhi r0, #2
        bx lr
        .cantunwind
        .fnend
        .size _Z3fooc, .-_Z3fooc
        .ident "GCC: (GNU) 4.6.0"
Basically, R0 holds the input; if it is greater than 1 (an unsigned compare, since plain char is unsigned on ARM) it gets changed to 2 before the function returns via LR. The function is exactly three instructions, which shows the compiler has completely folded the constant expressions, even through template functions with an inferred type parameter. For large software projects this is the truly golden stuff. Small checksumming loops and such are easily hand-rolled in inline assembly (which gcc will also inline nicely, since the asm statement declares what side effects it has). And this is an old compiler, even.

Gcc is an excellent compiler for large software projects, but it has never been good for small 8/16-bitters. People spent a ridiculous amount of effort trying to use it with the 8086, to no avail. It wants a 32/64-bit processor with lots of GPRs, a flat address space, and an orthogonal instruction set.
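
The folding in the ARM example can also be checked without reading assembly. Here is a minimal sketch, assuming a C++11 or later compiler: the same min/max templates are marked constexpr and wrapped in a hypothetical namespace `sketch` (both are additions of mine, not part of the original code), so a static_assert can confirm that the constant part is 2 and that foo() amounts to "min of 2 and arg". It says nothing about what instructions get emitted; it only confirms the arithmetic the optimizer arrives at.

```cpp
#include <cassert>

// Hypothetical namespace, used only to keep these names away from std::min/max.
namespace sketch {

// Same bodies as the original templates; constexpr is added for this sketch
// so the results can be evaluated at compile time.
template<typename T>
constexpr T max(const T& a, const T& b) {
    return b < a ? a : b;
}

template<typename T>
constexpr T min(const T& a, const T& b) {
    return b < a ? b : a;
}

constexpr char a = 1;
constexpr char b = 2;
constexpr char c = 3;

// The constant part of foo(): min(max(1, 2), 3)
constexpr char d = min(max(a, b), c);
static_assert(d == 2, "constant part folds to 2, i.e. d == b");

// What is left of foo() after folding: min(2, arg)
constexpr int foo(const char arg) {
    return min(d, arg);
}

// Inputs up to 2 pass through; larger inputs are clamped to 2,
// which matches the cmp/movhi pair in the listing.
static_assert(foo(0) == 0, "small input passes through");
static_assert(foo(1) == 1, "small input passes through");
static_assert(foo(2) == 2, "boundary value");
static_assert(foo(100) == 2, "large input is clamped to 2");

} // namespace sketch
```

If this compiles, the static_asserts have passed; the functions stay callable at run time as well, e.g. sketch::foo(5) yields 2.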