If I understand correctly, the return SP address is placed on the stack
No. The aapcs32 ABI for Cortex-M0 says that the return address is in LR, R0-R3 contain the first parameters, and any additional parameters are on the stack. The caller must reset the stack pointer after a call that passed additional parameters on the stack, because the callee (the function being called) must preserve R4-R11 and SP (= R13). The callee has no way to find out how many parameters were actually supplied, unless the caller tells it through an always-passed parameter (like the format string).
In your screenshots, the compiler options are such that it just happened to use R7 to store the old value of SP, so that [SP, R7) contained the additional parameters. The compiler is absolutely free to save the old value of SP in whatever way it wishes. Indeed, in the linked example, the compiler often simply adds to SP the amount of stack space used (less LR), then pops the saved LR value directly to PC. (This is because
bx lr is equivalent to
push {lr},
pop {pc}.)
Put simply, when
printf() gets called using the aapcs32 ABI on Cortex-M0, all it knows is that R0 contains the address of the format string, and that R1-R3 and possibly the stack contain the parameters it refers to. Because the parameters are all of basic types (
signed and
unsigned char,
short,
int,
long,
long long,
float,
double) or types mapping to basic types (
size_t, possibly
ssize_t,
intmax_t,
uintmax_t, and
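ptrdiff_t), in this ABI they occupy either one or two 32-bit words. (In a variadic call, the narrower integer types are promoted to int and float is promoted to double.) The format string is the only thing that can tell you how many parameters were supplied and what their types are.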
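To make the role of the format string concrete, here is a minimal sketch of a variadic function that relies entirely on a descriptor string to know what to fetch. The function name `sum_described()` and its one-letter descriptors (`i`, `l`, `d`) are made up for this illustration; they are not part of any library.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: sums the variadic arguments described by `spec`.
 * Each character of `spec` names the type of one argument:
 *   'i' = int, 'l' = long long, 'd' = double.
 * The callee cannot verify that the caller really passed these. */
double sum_described(const char *spec, ...)
{
    va_list args;
    double total = 0.0;

    assert(spec != NULL);

    va_start(args, spec);
    for (const char *p = spec; *p != '\0'; p++) {
        switch (*p) {
        case 'i': total += va_arg(args, int);               break;
        case 'l': total += (double)va_arg(args, long long); break;
        case 'd': total += va_arg(args, double);            break;
        default:  /* unknown descriptor: fetch nothing */   break;
        }
    }
    va_end(args);

    return total;
}
```

Calling `sum_described("id", 1, 2.5)` fetches one int and one double. If the descriptor string does not match what was actually passed, the function simply reads the wrong words, exactly like a mismatched `printf()` format.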
Essentially, there is no way in aapcs32 ABI on Cortex-M0 (or in most other ABIs) for the
func() function implementation to be able to differentiate between
func(5); func(5, 4); func(5, 4, 3); func(5, 4, 3, 2); func(5, 4, 3, 2, 1); func(5, 4, 3, 2, 1, 0); calls. See
godbolt.org/z/efTc8sKb7 for proof.
I feel as if the existence of va_copy() would require that gcc provide enough information to make a copy of the variadic arguments.
No.
In essence,
va_list could simply be a signed integer, with negative values describing registers, first parameter corresponding to the most negative value, and zero and positive values to stack offsets. Then,
va_start() initializes it to the value corresponding to the first variadic parameter,
va_end() does nothing,
va_arg() returns the one or two-word value at the register or stack offset and advances it accordingly, and
va_copy() copies the current signed integer to another
va_list variable.
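The scheme described above can be sketched as a toy model in plain C. Everything here (`struct toy_frame`, `toy_va_list`, and the `toy_va_*` functions) is hypothetical and only simulates the idea with a snapshot of r0-r3 and an array of stack words; it is not how any real compiler lays out `va_list`. The even-index rounding for 64-bit values is a simplified stand-in for the ABI's double-word alignment of such arguments, and the low word is assumed to come first (little-endian).

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: the argument words as the callee would find them. */
struct toy_frame {
    uint32_t regs[4];        /* snapshot of r0..r3 at function entry */
    const uint32_t *stack;   /* words the caller placed on the stack */
};

/* Cursor: -4..-1 select r0..r3, 0 and up select stack words. */
typedef int toy_va_list;

/* Point the cursor just past the fixed parameters (counted in words). */
void toy_va_start(toy_va_list *ap, int fixed_words)
{
    *ap = -4 + fixed_words;
}

/* Copying the cursor is all that a copy needs. */
void toy_va_copy(toy_va_list *dst, const toy_va_list *src)
{
    *dst = *src;
}

/* Fetch one 32-bit word and advance the cursor. */
uint32_t toy_va_arg_u32(const struct toy_frame *f, toy_va_list *ap)
{
    int i = (*ap)++;
    assert(i >= -4);
    return (i < 0) ? f->regs[4 + i] : f->stack[i];
}

/* Fetch a two-word value: round the cursor up to an even slot first. */
uint64_t toy_va_arg_u64(const struct toy_frame *f, toy_va_list *ap)
{
    if (*ap % 2 != 0)
        (*ap)++;
    uint64_t lo = toy_va_arg_u32(f, ap);
    uint64_t hi = toy_va_arg_u32(f, ap);
    return lo | (hi << 32);
}
```

Note that nothing in the cursor records where the arguments end. The model can copy and advance the position, but it cannot tell how many arguments there are.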
Most architectures do it somewhat like this, except that certain types of values may be passed in a separate register file. For example, in the System V ABI on x86-64, the xmm0 to xmm7 registers hold the first eight
double parameters. Thus,
va_list may be a pair of indices, or even a bitmap. As the
va_list type variable is passed as the first argument to the
va_ functions, and C passes simple types by value, the exact implementation (in the C library
stdarg.h) can be quite funky; for GCC, these are implemented as compiler built-ins (
__builtin_va_start(list,param),
__builtin_va_arg(list,type),
__builtin_va_copy(listcopy,list), and
__builtin_va_end(list)).
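As an illustration of what `va_copy()` is actually for, here is a minimal sketch of the common two-pass pattern: one pass to measure the formatted length, a second pass over a copy of the list to write the result. The function name `format_alloc()` is made up for this example; `vsnprintf()`, `malloc()`, and the `va_*` macros are standard C99.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a freshly allocated formatted string,
 * or NULL on failure. The caller must free() the result. */
char *format_alloc(const char *fmt, ...)
{
    va_list args, args_copy;
    char *buf;
    int needed;

    assert(fmt != NULL);

    va_start(args, fmt);
    va_copy(args_copy, args);   /* remember the starting position */

    /* First pass: consumes `args`, only measures the length. */
    needed = vsnprintf(NULL, 0, fmt, args);
    va_end(args);

    if (needed < 0) {
        va_end(args_copy);
        return NULL;
    }

    /* Second pass: walks the same arguments again via the copy. */
    buf = malloc((size_t)needed + 1);
    if (buf != NULL)
        vsnprintf(buf, (size_t)needed + 1, fmt, args_copy);
    va_end(args_copy);

    return buf;
}
```

The copy only duplicates the current position in the argument list. It is still the format string that tells `vsnprintf()` how far to walk in each pass.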
For example, GCC 9.2.1 for aapcs32 and Cortex-M0 tends to
push {r0, r1, r2, r3} at the beginning of a variadic function implementation, so that all the parameters are actually on the stack in order: r0 at SP, then r1, r2, r3, followed by any parameters pushed to the stack by the caller. You can see this clearly in
godbolt.org/z/efTc8sKb7. Note that
r4 is sometimes pushed to the stack even when it is not used, only to keep the stack double-word aligned; the number of registers pushed is always even. (Clang tends to use
r7, so it is not necessarily
r4; it could just as well be any of
r4-
r8 or
r10.) Also note that GCC generates non-optimal code here: it does not need to preserve r0-r3, but sometimes does, using an unnecessary amount of stack. Function bodies in aapcs32 must preserve r4-r11 and SP (r9 might be special).
In any case, the
func() implementation will not receive any information it could use to determine how many parameters were supplied, as you can see. That must be provided by the fixed parameter(s), which in the case of
printf() is the format string.
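When there is no format string, the simplest fixed parameter that carries this information is an explicit count. A minimal sketch, with `sum_ints()` being a name invented for this example:

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: adds up `count` int arguments.
 * `count` is the only thing telling the callee how many to fetch. */
long sum_ints(int count, ...)
{
    va_list args;
    long total = 0;

    assert(count >= 0);

    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);
    va_end(args);

    return total;
}
```

`sum_ints(3, 5, 4, 3)` returns 12, while `sum_ints(2, 5, 4, 3)` returns 9: the third value is passed but never looked at, because the callee trusts the count and has nothing else to go on.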
You can see the aapcs32 core register use
here.