Just for yuks I tried the same code on 32-bit RISC-V. It compiles to 66 bytes of code. Unfortunately, the ABI says the stack pointer must always be 16-byte aligned, so it uses 16 bytes of stack per recursion level, or 160 bytes maximum (10 levels, one per decimal digit of a 32-bit value), even though it's storing less data than that. So my worst-case total is 226 bytes of code+stack.
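The worst-case figure is just code size plus one 16-byte frame per recursion level, and the depth is the number of decimal digits in the largest magnitude a 32-bit int can hold. A minimal sketch of that arithmetic follows; the helper names `decimal_digits` and `worst_case_bytes` are made up for illustration and are not part of the original code.

```c
#include <stdint.h>

/* Hypothetical helper: number of decimal digits in v.
   The recursion goes one level deep per digit. */
unsigned decimal_digits(uint32_t v)
{
    unsigned digits = 1;
    while (v >= 10) {
        v /= 10;
        digits++;
    }
    return digits;
}

/* Hypothetical helper: code size plus one stack frame per recursion level. */
unsigned worst_case_bytes(unsigned code_bytes, unsigned frame_bytes, unsigned depth)
{
    return code_bytes + frame_bytes * depth;
}
```

With the numbers above, `worst_case_bytes(66, 16, decimal_digits(2147483648u))` works out to 66 + 16 * 10 = 226, where 2147483648 is the magnitude of the most negative 32-bit int.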
Using riscv32-unknown-elf-gcc 6.1.0 with -Os.
Your function compiles to 146 bytes of code and uses 32 bytes of stack, for a total of 178 bytes.
Here is the listing for my version (66 bytes):
00010222 <print_dec>:
10222: 1141 addi sp,sp,-16
10224: c422 sw s0,8(sp)
10226: c226 sw s1,4(sp)
10228: c606 sw ra,12(sp)
1022a: 842a mv s0,a0
1022c: 00055a63 bgez a0,10240 <print_dec+0x1e>
10230: 8581a503 lw a0,-1960(gp) # 1b910 <_impure_ptr>
10234: 02d00593 li a1,45
10238: 40800433 neg s0,s0
1023c: 4510 lw a2,8(a0)
1023e: 3f55 jal 101f2 <__sputc_r>
10240: 45a9 li a1,10
10242: 02b45533 divu a0,s0,a1
10246: 02b47433 remu s0,s0,a1
1024a: 03040413 addi s0,s0,48
1024e: c111 beqz a0,10252 <print_dec+0x30>
10250: 3fc9 jal 10222 <print_dec>
10252: 8581a503 lw a0,-1960(gp) # 1b910 <_impure_ptr>
10256: 85a2 mv a1,s0
10258: 40b2 lw ra,12(sp)
1025a: 4422 lw s0,8(sp)
1025c: 4492 lw s1,4(sp)
1025e: 4510 lw a2,8(a0)
10260: 0141 addi sp,sp,16
10262: bf41 j 101f2 <__sputc_r>
One reason for the extra 14 bytes of code is that this toolchain inlined putchar as putc, which needs to load stdout from a global and then pass it. I didn't look closely enough to figure it out, but I think that might also be the reason for saving three registers rather than just two (not that it makes any difference with the mandated 16-byte alignment).
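For anyone who doesn't read RISC-V assembly, the listing reads back into C roughly as follows. This is a reconstruction from the disassembly, not the original source. The name `print_dec_to` and the `out` parameter are assumptions added so the sketch can be pointed at any stream; the compiled code has no such parameter and passes stdout.

```c
#include <stdio.h>

/* Reconstruction from the listing, not the original source.
   `out` is a hypothetical parameter; the compiled code uses stdout. */
void print_dec_to(FILE *out, int n)
{
    unsigned u = (unsigned)n;

    if (n < 0) {
        putc('-', out);  /* bgez falls through: emit the sign */
        u = 0u - u;      /* neg s0,s0: unsigned negate, so INT_MIN is safe */
    }

    unsigned rest = u / 10;   /* divu */
    unsigned digit = u % 10;  /* remu */

    if (rest != 0)
        print_dec_to(out, (int)rest);  /* higher digits first (jal print_dec) */

    putc((int)('0' + digit), out);     /* this digit last (tail call to putc) */
}
```

Calling `print_dec_to(stdout, -1234)` prints `-1234`. The quotient passed down is at most 429496729, so it always fits in a positive int and the sign check only fires on the outermost call.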