In the Linux kernel, printk() uses printf()-style format strings, so the compiler can verify the conversion pattern against the parameters specified, and complain at compile time if they do not match. However, to extend the formats, the kernel adds extra characters after the conversion specifier (for example, %pI4 for an IPv4 address, or %pS for a symbol name), which the compiler just sees as plain text following a %p.
The full printf() formatting rules are quite extensive. Many do not realize you can refer to specific parameters by their position (the POSIX %n$ form, e.g. %2$s), as long as you do so for every conversion in the format string. (This is required for localization to work, for example when gettext() supplies a translated format pattern that needs the parameters in a different order.)
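A small sketch of numbered conversions. Note that %n$ is POSIX, not ISO C: glibc supports it, but minimal embedded C libraries may not. The helper format_date() is hypothetical; it only forwards a pattern to snprintf(), so the same argument list can be rendered in two different orders, as a translated pattern would require:

```c
#include <stdio.h>

/* Hypothetical helper: pattern must use only numbered (%n$) conversions,
   or only unnumbered ones; POSIX does not allow mixing the two. */
int format_date(char *buf, size_t size, const char *pattern,
                int year, int month, int day)
{
    return snprintf(buf, size, pattern, year, month, day);
}
```

With the arguments (2024, 3, 9), the pattern "%1$04d-%2$02d-%3$02d" gives "2024-03-09", while "%3$02d.%2$02d.%1$04d" gives "09.03.2024".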
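Similarly, the full escaping rules for string and character literals in C are nontrivial. Just look at e.g. C17, and check whether yours supports universal character names (\u… and \U…) and the L, u, and U literal prefixes. Also verify that '\x123' != '\0443' (even though 0x123 == 0443), because octal and hexadecimal escape sequences are parsed differently: an octal escape ends after at most three octal digits, whereas a hexadecimal escape consumes every hex digit that follows. Hint: '\0443' == '$3'. See C17 6.4.4.4; it even has examples describing these.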
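The same rules, checked in code. This sketch sticks to string literals and avoids writing '\x123' itself, because a hex escape whose value does not fit in an unsigned char is a constraint violation that compilers diagnose. Splitting the literal is the usual way to end a hex escape early (C17 6.4.4.4 shows '\0223' for the octal equivalent). The identifiers are made up for illustration:

```c
#include <string.h>

/* Octal escapes stop after at most three octal digits, so this is
   '\044' followed by '3'. In ASCII, '\044' is '$'. */
static const char octal_then_digit[] = "\0443";

/* A hex escape would swallow the '3' too ("\x123" is one escape).
   Escapes are processed before adjacent literals are concatenated,
   so splitting the literal yields '\x12' followed by '3'. */
static const char hex_then_digit[] = "\x12" "3";

/* Returns nonzero if the octal example decodes to "$3" (true in ASCII). */
int octal_example_is_dollar_three(void)
{
    return sizeof octal_then_digit == 3 && strcmp(octal_then_digit, "$3") == 0;
}
```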
_ _ _ _ _
In C11 or later (_Generic was added in C11), even in very memory-constrained situations, it is possible to write a macro that expands its arguments (starting at a specific argument) into separate calls, which the _Generic facility in turn expands to type-specific function calls. For example,

    emit(FOO, BAR, BAZ);

can be expanded to

    emit_init(); emit_int(FOO); emit_float(BAR); emit_str(BAZ);

where _Generic selects emit_int() for FOO, emit_float() for BAR, and emit_str() for BAZ. You can also have common initial parameters, for example specifying the string buffer to construct the string into. The number of macro arguments is limited (implementations need only support 127 arguments per macro invocation, and the argument-counting helper macros you write set their own cap), but in practice it is several dozen at least.
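A minimal sketch of the idea, assuming C11, an argument-counting macro that handles one to four arguments (extend the EMIT_n ladder for more), and a hypothetical global buffer out[] standing in for a real output sink. All names are illustrative, not from any library:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical output sink: a fixed global buffer, truncated when full. */
static char   out[128];
static size_t out_len;

static void emit_init(void) { out_len = 0; out[0] = '\0'; }

static void emit_append(const char *s)
{
    size_t n = strlen(s);
    if (n > sizeof out - 1 - out_len)
        n = sizeof out - 1 - out_len;
    memcpy(out + out_len, s, n);
    out_len += n;
    out[out_len] = '\0';
}

static void emit_int(int v)         { char t[16]; snprintf(t, sizeof t, "%d", v); emit_append(t); }
static void emit_float(double v)    { char t[32]; snprintf(t, sizeof t, "%g", v); emit_append(t); }
static void emit_str(const char *s) { emit_append(s); }

/* _Generic picks the function from the argument's type; string literals
   decay to char * in the controlling expression. */
#define EMIT_ONE(x) _Generic((x),     \
        int:          emit_int,       \
        float:        emit_float,     \
        double:       emit_float,     \
        char *:       emit_str,       \
        const char *: emit_str)(x)

/* Count 1..4 arguments, then paste EMIT_ and the count together. */
#define EMIT_N(_1, _2, _3, _4, N, ...) N
#define EMIT_COUNT(...) EMIT_N(__VA_ARGS__, 4, 3, 2, 1, 0)
#define EMIT_CAT_(a, b) a##b
#define EMIT_CAT(a, b)  EMIT_CAT_(a, b)

#define EMIT_1(a)      EMIT_ONE(a)
#define EMIT_2(a, ...) EMIT_ONE(a); EMIT_1(__VA_ARGS__)
#define EMIT_3(a, ...) EMIT_ONE(a); EMIT_2(__VA_ARGS__)
#define EMIT_4(a, ...) EMIT_ONE(a); EMIT_3(__VA_ARGS__)

#define emit(...) \
    do { emit_init(); EMIT_CAT(EMIT_, EMIT_COUNT(__VA_ARGS__))(__VA_ARGS__); } while (0)
```

With int cmd = 42; the call emit("Command ", cmd, " took ", 1.5); leaves "Command 42 took 1.5" in out[]. Passing a type that has no association (say, an unsigned long) makes compilation fail at the _Generic selection, rather than misbehaving at run time.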
This can actually save a lot of resources, because nothing is parsed at run time. Something like

    emit(buf, "Command ", cmd_id, " complete.\r\n");

can be transmogrified by the preprocessor into, say,

    emit_init(buf); emit_str(buf, "Command "); emit_uint(buf, cmd_id); emit_str(buf, " complete.\r\n");

which is definitely useful. With a good _Generic selection, you even get reasonable error messages if your variable's type is not supported.
I also like to have a final emit_end(), which terminates the string buffer and returns the length of the constructed string if successful, or a negative error code if an error occurred (for example, if space ran out). In that case,

    err = emit(buf, "Command ", cmd_id, " complete.\r\n");

actually expands to

    err = ( emit_init(buf), emit_str(buf, "Command "), emit_uint(buf, cmd_id), emit_str(buf, " complete.\r\n"), emit_end(buf) );

where err is assigned the return value of emit_end(buf), so it is basically equivalent to

    emit_init(buf), emit_str(buf, "Command "), emit_uint(buf, cmd_id), emit_str(buf, " complete.\r\n"), err = emit_end(buf);

The functions are called in the specified order, but the other return values are ignored. This is because of the C comma operator behaviour (see C17 6.5.17): the left operand is evaluated as a void expression, there is a sequence point, and then the right operand is evaluated and supplies the result.
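A sketch of this buffer-passing variant with emit_end(), again assuming C11 and at most four value arguments. struct emitbuf, emit_mem() and the -1 error value are my own assumptions, not a fixed API; emit_uint() converts digits by hand so that small targets need not pull in snprintf(). The buffer argument is evaluated several times, so it must not have side effects:

```c
#include <string.h>

/* Hypothetical buffer state. */
struct emitbuf {
    char   *data;
    size_t  size;   /* capacity, including the terminating '\0' */
    size_t  len;    /* characters written so far */
    int     error;  /* set once anything failed to fit */
};

static void emit_init(struct emitbuf *b) { b->len = 0; b->error = 0; }

/* Append n chars, keeping room for the '\0'; flag an error otherwise. */
static void emit_mem(struct emitbuf *b, const char *s, size_t n)
{
    if (b->error || n >= b->size - b->len) { b->error = 1; return; }
    memcpy(b->data + b->len, s, n);
    b->len += n;
}

static void emit_str(struct emitbuf *b, const char *s) { emit_mem(b, s, strlen(s)); }

static void emit_uint(struct emitbuf *b, unsigned int v)
{
    char t[3 * sizeof v];              /* enough decimal digits */
    char *p = t + sizeof t;
    do { *--p = (char)('0' + v % 10); v /= 10; } while (v);
    emit_mem(b, p, (size_t)(t + sizeof t - p));
}

/* Terminate the string; return its length, or -1 if it did not fit. */
static int emit_end(struct emitbuf *b)
{
    if (b->size == 0) return -1;
    b->data[b->len] = '\0';
    return b->error ? -1 : (int)b->len;
}

#define EMIT_ONE(b, x) _Generic((x),     \
        unsigned int: emit_uint,          \
        char *:       emit_str,           \
        const char *: emit_str)((b), (x))

#define EMIT_N(_1, _2, _3, _4, N, ...) N
#define EMIT_COUNT(...) EMIT_N(__VA_ARGS__, 4, 3, 2, 1, 0)
#define EMIT_CAT_(a, b) a##b
#define EMIT_CAT(a, b)  EMIT_CAT_(a, b)

/* Joined with the comma operator instead of semicolons. */
#define EMIT_1(b, x)      EMIT_ONE(b, x)
#define EMIT_2(b, x, ...) EMIT_ONE(b, x), EMIT_1(b, __VA_ARGS__)
#define EMIT_3(b, x, ...) EMIT_ONE(b, x), EMIT_2(b, __VA_ARGS__)
#define EMIT_4(b, x, ...) EMIT_ONE(b, x), EMIT_3(b, __VA_ARGS__)

/* Calls run left to right; the whole expression has emit_end()'s value. */
#define emit(b, ...) \
    (emit_init(b), EMIT_CAT(EMIT_, EMIT_COUNT(__VA_ARGS__))((b), __VA_ARGS__), emit_end(b))
```

With a 32-byte buffer and unsigned cmd_id = 7, err = emit(&eb, "Command ", cmd_id, " complete.\r\n") returns the string length; with an 8-byte buffer it returns -1 and leaves an empty string, since "Command " alone needs nine bytes.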
(If we want to add optional formatting, things get complicated. The best option I've discovered is to wrap the value in a macro, say FORMAT(variable, fmtspec), which expands to a compound literal

    ((const struct formatted){ .ref = &(variable), .spec = &(fmtspec) })

and thus, via _Generic detecting that structure type, to

    emit_fmt(buf, ((const struct formatted){ .ref = &(variable), .spec = &(fmtspec) }));

assuming fmtspec also defines the type of variable. The fmtspec itself could be a const structure (with a common field specifying the variable type, and a union of substructures, one per type), a simple string, a pointer to the formatting function to use, or even a struct initializer (with a small modification to the above), depending on what is the most useful way to implement emit_fmt() or its equivalent(s).)
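A sketch of the FORMAT() wrapper, assuming the first option above: fmtspec is a const structure whose type field tells emit_fmt() how to read the referenced value (simplified here to plain fields rather than a union). struct fmt_spec, its width and precision fields, struct outbuf and emit_raw() are hypothetical; only FORMAT() and struct formatted follow the description. _Generic matches the unqualified struct formatted, because lvalue conversion of the controlling expression drops the const:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical spec: names the value's type plus a few formatting knobs. */
enum fmt_type { FMT_INT, FMT_DOUBLE };
struct fmt_spec {
    enum fmt_type type;
    int           width;      /* minimum field width */
    int           precision;  /* decimals, used by FMT_DOUBLE only */
};

/* As described: a reference to the value plus a reference to its spec. */
struct formatted {
    const void            *ref;
    const struct fmt_spec *spec;
};

#define FORMAT(variable, fmtspec) \
    ((const struct formatted){ .ref = &(variable), .spec = &(fmtspec) })

/* Hypothetical fixed-size output buffer; excess output is truncated. */
struct outbuf { char data[64]; size_t len; };

static void emit_raw(struct outbuf *b, const char *s)
{
    size_t n = strlen(s), room = sizeof b->data - 1 - b->len;
    if (n > room) n = room;
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
}

static void emit_str(struct outbuf *b, const char *s) { emit_raw(b, s); }

/* Trusts spec->type to describe *ref; a mismatch reads the wrong type. */
static void emit_fmt(struct outbuf *b, struct formatted f)
{
    char tmp[32] = "";
    switch (f.spec->type) {
    case FMT_INT:
        snprintf(tmp, sizeof tmp, "%*d", f.spec->width, *(const int *)f.ref);
        break;
    case FMT_DOUBLE:
        snprintf(tmp, sizeof tmp, "%*.*f", f.spec->width, f.spec->precision,
                 *(const double *)f.ref);
        break;
    }
    emit_raw(b, tmp);
}

#define EMIT_ONE(b, x) _Generic((x),         \
        struct formatted: emit_fmt,           \
        char *:           emit_str,           \
        const char *:     emit_str)((b), (x))
```

For example, with int count = 7 and a spec { FMT_INT, 4, 0 }, EMIT_ONE(&ob, FORMAT(count, int4)) appends "   7". Nothing checks that the spec's type matches the variable, so this relies on discipline, or on one wrapper macro per type.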
_ _ _ _ _
On 8-bitters that emit such constructed strings via UART or similar, having to construct the string into a linear array first consumes memory unnecessarily. Instead, it would be better to record the formatting recipe, and have the UART interrupt generate the next character in the recipe dynamically, using a minimal amount of RAM.
One method is to have two reserved pointer values, END (0) and VALUE (-1), so that the recipe is just a sequence, stored in ROM/Flash, of pointers to ROM/Flash strings, terminated with an END pointer. Whenever the current sequence pointer is VALUE, the next character in the associated RAM buffer (where the pre-converted values are stored) is emitted; otherwise, if the current sequence pointer is not END, the next character in that ROM/Flash string is emitted. The UART interrupt overhead should be quite small and fairly constant. In RAM, we need space for the converted values as strings, one pointer to the current entry in the ROM/Flash pointer array, one pointer into the current ROM/Flash string, one pointer into the RAM value buffer, and a status byte (for a quick return when there is nothing to be sent), or so.
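A sketch of the recipe-walking transmitter, with the interrupt simulated as a function that returns the next byte to send. It assumes a unified address space (on AVR, for instance, the ROM reads would need PROGMEM and pgm_read_byte(), which this sketch omits), and that the converted values are stored back to back as NUL-terminated strings in one RAM buffer. tx_start(), tx_next_char() and the tx state are hypothetical names. VALUE is (const char *)-1 as in the text; that integer-to-pointer conversion is implementation-defined, though it works on common toolchains:

```c
#include <stddef.h>
#include <stdint.h>

/* Reserved recipe entries, following the text. */
#define END   ((const char *)0)
#define VALUE ((const char *)(uintptr_t)-1)

/* Hypothetical transmitter state in RAM: three pointers and a status byte. */
static struct {
    const char *const *seq;  /* current entry of the ROM/Flash pointer array */
    const char        *rom;  /* position in the current ROM/Flash string */
    const char        *ram;  /* position in the pre-converted values */
    uint8_t            busy; /* zero: nothing to send, return quickly */
} tx;

/* values: each converted value as a NUL-terminated string, back to back,
   in the same order as the VALUE entries in the recipe. */
void tx_start(const char *const *recipe, const char *values)
{
    tx.seq  = recipe;
    tx.rom  = NULL;
    tx.ram  = values;
    tx.busy = (*recipe != END);
}

/* What the TX-ready interrupt would call: next byte, or -1 when done. */
int tx_next_char(void)
{
    while (tx.busy) {
        const char *entry = *tx.seq;
        if (entry == END) {
            tx.busy = 0;
            break;
        }
        if (entry == VALUE) {
            if (*tx.ram)
                return (unsigned char)*tx.ram++;
            tx.ram++;                 /* skip this value's NUL */
        } else {
            if (!tx.rom)
                tx.rom = entry;       /* entering a new ROM string */
            if (*tx.rom)
                return (unsigned char)*tx.rom++;
            tx.rom = NULL;
        }
        tx.seq++;                     /* entry exhausted: move on */
    }
    return -1;
}
```

For example, the recipe { "Command ", VALUE, " complete, t=", VALUE, "\r\n", END } with values "42\0" "17" yields "Command 42 complete, t=17\r\n", one byte per call. (Writing the values as "42\017" instead would produce the octal escape \017, the same trap as before.)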
Unfortunately, I have NOT found any sensible way of describing such a recipe in C99 or later. Preprocessor macros can be magic, but the best I can do thus far is similar to emit() above, constructing everything (including the pointer array) in RAM, which is not optimal. Having the UART state buffer be global, with a dedicated uart_emit() function to append to it, makes for an acceptable interface, I guess. You do need to reserve some extra RAM for the conversion buffers, and take care to interact correctly with the UART TX interrupt, but it is possible to buffer another message while one or more previous ones are still being sent, as long as there is sufficient RAM for the pointers and the stringified values. To avoid sending partially constructed strings, the state needs shadow copies of the pointers and the status byte, fixed at the emit_end() phase.
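A sketch of just the commit step, assuming a simplified queue of string pointers (ROM strings, or RAM value strings that the caller keeps alive until sent) rather than the full END/VALUE recipe. The writer appends with uart_emit() to a private tail; uart_emit_end(), a hypothetical name for the emit_end() phase, publishes the message by storing a single byte, which is atomic on 8-bit MCUs, so the interrupt never sees half a message. On real hardware, uart_emit_end() would also enable the TX interrupt; that part is omitted:

```c
#include <stddef.h>
#include <stdint.h>

#define TXQ_LEN 8u   /* hypothetical capacity; a power of two dividing 256 */

static const char      *txq[TXQ_LEN];
static volatile uint8_t txq_head;    /* advanced only by the TX interrupt */
static volatile uint8_t txq_commit;  /* shadow tail: all the ISR ever sees */
static uint8_t          txq_tail;    /* writer's working tail */
static uint8_t          txq_failed;  /* current message overflowed */

/* Append one string to the message being built (not yet visible). */
void uart_emit(const char *s)
{
    if (txq_failed)
        return;
    if ((uint8_t)(txq_tail - txq_head) >= TXQ_LEN) {
        txq_failed = 1;
        return;
    }
    txq[txq_tail % TXQ_LEN] = s;
    txq_tail++;
}

/* Publish the whole message with one byte store, or discard it. */
int uart_emit_end(void)
{
    int ok = !txq_failed;
    if (ok)
        txq_commit = txq_tail;
    else
        txq_tail = txq_commit;
    txq_failed = 0;
    return ok ? 0 : -1;
}

/* Interrupt side: next byte of committed data, or -1 if none. */
static const char *tx_cur;

int uart_tx_next(void)
{
    for (;;) {
        if (!tx_cur) {
            if (txq_head == txq_commit)
                return -1;
            tx_cur = txq[txq_head % TXQ_LEN];
        }
        if (*tx_cur)
            return (unsigned char)*tx_cur++;
        tx_cur = NULL;
        txq_head++;   /* slot done; the writer may now reuse it */
    }
}
```

A message that overflows the queue is dropped as a whole at uart_emit_end(), and one being built is invisible to the interrupt until committed, even while an earlier message is still going out.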