Topic: [Solved] Saving "printf" arguments for later? Real-time on slow processor.


incf (topic starter)

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
I've got a situation where I am running a very slow processor that happens to have a lot of spare RAM. The CPU sits idle half of the time. It has some real time processing that it must do on time every handful of seconds. It is just barely fast enough to do its real time work without printfs.

The device firmware interfaces with a huge complicated third party library. This library implements a "debug mode" where it calls printf in various places.

In debug mode it easily generates several kilobytes per minute of formatted text containing numbers. And what the library does is so complicated that we really need the debug output to be generated all of the time (both in development and production).

Unfortunately, it turns out that printf is so slow that it causes the system to fail in a variety of subtle yet completely catastrophic ways (because it misses real-time deadlines and cannot detect that it has missed them).

Has anyone dealt with this before?

I've reached the conclusion that my best bet is to write a wrapper for printf.
The wrapper would save the printf format string, and variadic arguments to a buffer so that I can process the printfs later when the CPU is idle.
I think I can get away with this since all of the printfs just contain (formatted) numbers and short strings.
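A minimal sketch of such a wrapper. It assumes that the format strings and any `%s` arguments are static, so storing their pointers is safe. It also assumes a single calling thread and no `*` width/precision, no `%n`, and no `z`/`j`/`t`/`L` modifiers. The names `dlog_printf` and `dlog_replay` are hypothetical.

The foreground call only classifies each conversion and copies the raw argument with `va_arg`. The background call rebuilds the text by passing each saved conversion specification to `snprintf` one at a time, because a `va_list` cannot be rebuilt portably.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical deferred printf. Assumes static format strings and %s
 * arguments, one calling thread, and no '*', %n or z/j/t/L modifiers. */
#define DLOG_MAX_ARGS 8
#define DLOG_DEPTH    16                  /* ring holds DLOG_DEPTH - 1 records */

typedef union { long long i; unsigned long long u; double d; void *p; } dlog_arg;
typedef struct { const char *fmt; unsigned nargs; dlog_arg arg[DLOG_MAX_ARGS]; } dlog_rec;
static dlog_rec dlog_ring[DLOG_DEPTH];
static unsigned dlog_head, dlog_tail;     /* no locking or barriers shown */

/* Skips flags, width, precision and h/l modifiers after a '%', counting the
 * 'l's; returns a pointer to the conversion character (or the terminator). */
static const char *dlog_skip_spec(const char *s, int *longs)
{
    for (*longs = 0; *s && strchr("-+ #0123456789.hl", *s); s++)
        if (*s == 'l') (*longs)++;
    return s;
}

/* Foreground: classify each conversion and copy its raw argument. */
void dlog_printf(const char *fmt, ...)
{
    unsigned next = (dlog_head + 1) % DLOG_DEPTH;
    if (next == dlog_tail) return;        /* ring full: drop the message */
    dlog_rec *r = &dlog_ring[dlog_head];
    r->fmt = fmt; r->nargs = 0;
    va_list ap; va_start(ap, fmt);
    for (const char *s = fmt; *s && r->nargs < DLOG_MAX_ARGS; s++) {
        if (*s != '%') continue;
        int l;
        s = dlog_skip_spec(s + 1, &l);
        if (*s == '\0') break;
        dlog_arg *a = &r->arg[r->nargs];
        switch (*s) {
        case 'd': case 'i': case 'c':
            a->i = l == 0 ? va_arg(ap, int) : l == 1 ? va_arg(ap, long) : va_arg(ap, long long); break;
        case 'u': case 'x': case 'X': case 'o':
            a->u = l == 0 ? va_arg(ap, unsigned)
                 : l == 1 ? va_arg(ap, unsigned long) : va_arg(ap, unsigned long long);
            break;
        case 'f': case 'F': case 'e': case 'E': case 'g': case 'G':
            a->d = va_arg(ap, double); break;
        case 's': case 'p':
            a->p = va_arg(ap, void *); break;
        default:
            continue;                     /* "%%" or unsupported: no argument */
        }
        r->nargs++;
    }
    va_end(ap);
    dlog_head = next;
}

/* Background: formats the oldest record into out[n]; returns 0 if empty. */
int dlog_replay(char *out, size_t n)
{
    if (dlog_tail == dlog_head || n == 0) return 0;
    const dlog_rec *r = &dlog_ring[dlog_tail];
    size_t len = 0; unsigned ai = 0;
    char spec[24];
    for (const char *s = r->fmt; *s && len + 1 < n; s++) {
        if (*s != '%') { out[len++] = *s; continue; }
        const char *pct = s;
        int l, w;
        s = dlog_skip_spec(pct + 1, &l);
        size_t sl = (size_t)(s - pct) + 1;            /* length of "%...X" */
        if (*s == '\0' || sl >= sizeof spec) break;
        if (!strchr("diucxXofFeEgGsp", *s)) { out[len++] = *s; continue; }  /* "%%" */
        if (ai == r->nargs) break;                    /* ran out of saved args */
        memcpy(spec, pct, sl); spec[sl] = '\0';
        const dlog_arg *a = &r->arg[ai++];
        char *o = out + len; size_t m = n - len;
        switch (*s) {
        case 'd': case 'i': case 'c':
            w = l == 0 ? snprintf(o, m, spec, (int)a->i)
              : l == 1 ? snprintf(o, m, spec, (long)a->i) : snprintf(o, m, spec, a->i);
            break;
        case 'u': case 'x': case 'X': case 'o':
            w = l == 0 ? snprintf(o, m, spec, (unsigned)a->u)
              : l == 1 ? snprintf(o, m, spec, (unsigned long)a->u) : snprintf(o, m, spec, a->u);
            break;
        case 's': w = snprintf(o, m, spec, (const char *)a->p); break;
        case 'p': w = snprintf(o, m, spec, a->p); break;
        default:  w = snprintf(o, m, spec, a->d); break;   /* f F e E g G */
        }
        if (w < 0) break;
        len = (size_t)w < m ? len + (size_t)w : n - 1;  /* clamp on truncation */
    }
    out[len] = '\0';
    dlog_tail = (dlog_tail + 1) % DLOG_DEPTH;
    return 1;
}
```

Calling `dlog_replay` once per idle slot keeps each background step short. Whether dropping messages when the ring is full is acceptable depends on how the diagnostics are used.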

I'd be interested to hear other people's thoughts and experiences with this sort of problem.

Footnote: it is a shame I can't just spit the format string and the raw stack values out the serial port to be interpreted later by a PC program. I have to parse the format string.

Edit: CPU is an ARM Cortex-M0+

Edit: see the last post. After implementing everything, a test printf with 8 arguments takes 600 CPU clock cycles.
« Last Edit: February 04, 2025, 10:38:07 pm by incf »
 

tggzzz

  • Super Contributor
  • ***
  • Posts: 21682
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #1 on: January 18, 2025, 02:20:52 pm »
If printf is too slow, then don't printf so often nor so much. Don't bother wrapping printf.

Store the information in a RAM buffer fast. Information might be the bytes representing a number (not the human readable version thereof), or for predefined text simply store an integer indexing the text or a pointer to the text, or for an FSM state a pointer to the class implementing the state logic, or for an FSM event, an integer representing the type of the event plus the information in the event arguments.

When time allows, either transmit that information to another processor, or convert that information into a human readable format and then discard the information.
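A minimal sketch of that idea, assuming one producer and one consumer. The message IDs, texts, and the names `log_event`/`log_drain_one` are made up for illustration. The foreground stores only an index and the raw integer. The table lookup and decimal conversion happen later, in the background.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical message IDs and texts; the foreground never touches these strings. */
enum msg_id { MSG_TEMP_RAW, MSG_STATE_CHANGE, MSG_COUNT };
static const char *const msg_text[MSG_COUNT] = {
    [MSG_TEMP_RAW]     = "temperature raw",
    [MSG_STATE_CHANGE] = "state ->",
};

typedef struct { uint8_t id; int32_t value; } event_rec;

#define EVT_DEPTH 64u                     /* power of two, so the wrap is cheap */
static event_rec evt_ring[EVT_DEPTH];
static uint32_t  evt_head, evt_tail;      /* free-running; one producer, one consumer */

/* Foreground: store the index and the raw bits only. Drops events when full. */
void log_event(enum msg_id id, int32_t value)
{
    if (evt_head - evt_tail == EVT_DEPTH) return;
    evt_ring[evt_head % EVT_DEPTH] = (event_rec){ (uint8_t)id, value };
    evt_head++;
}

/* Background, when time allows: turn one event into text. Returns 0 if none. */
int log_drain_one(char *out, size_t n)
{
    if (evt_tail == evt_head) return 0;
    event_rec e = evt_ring[evt_tail % EVT_DEPTH];
    evt_tail++;
    snprintf(out, n, "%s %ld", msg_text[e.id], (long)e.value);
    return 1;
}
```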
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

incf (topic starter)
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #2 on: January 18, 2025, 02:34:05 pm »
I don't have any control over the third party library. It uses printf*, and all of its debugging functionality is either "on" or "off."

*or rather, it calls a function that I supply that is expected to behave like printf
« Last Edit: January 18, 2025, 02:35:59 pm by incf »
 

Siwastaja

  • Super Contributor
  • ***
  • Posts: 9528
  • Country: fi
 

DavidAlfa

  • Super Contributor
  • ***
  • Posts: 6466
  • Country: es
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #4 on: January 18, 2025, 02:46:53 pm »
Write a wrapper to _write that copies the string to a queue and runs the UART in the background in interrupt mode (or in main(), sending only one byte at a time), so it doesn't block program execution.
Also, try increasing the baud rate; it's likely causing a bottleneck.
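A sketch of the queue-plus-interrupt approach. `log_write` is a hypothetical name; on a newlib-based toolchain, its body would go inside `_write`. The UART TX interrupt or the idle loop is assumed to call `log_next_byte` and write the returned byte to the UART data register. The function reports every byte as written so the caller never retries, and it silently drops whatever does not fit.

```c
#include <stdint.h>

/* Hypothetical output queue; txq_head is written only by log_write() and
 * txq_tail only by log_next_byte() (single producer, single consumer). */
#define TXQ_SIZE 1024u                    /* power of two */
static uint8_t           txq[TXQ_SIZE];
static volatile uint32_t txq_head, txq_tail;

/* Called from the printf output path (e.g. inside _write on newlib):
 * copies bytes and returns immediately, never waiting for the UART. */
int log_write(const char *ptr, int len)
{
    for (int i = 0; i < len && txq_head - txq_tail < TXQ_SIZE; i++) {
        txq[txq_head % TXQ_SIZE] = (uint8_t)ptr[i];
        txq_head++;
    }
    return len;   /* report everything as written so the caller never retries;
                     bytes that did not fit are dropped */
}

/* Called from the UART TX-empty interrupt or the idle loop: hands out at
 * most one byte for the caller to write to the UART data register. */
int log_next_byte(uint8_t *b)
{
    if (txq_tail == txq_head) return 0;
    *b = txq[txq_tail % TXQ_SIZE];
    txq_tail++;
    return 1;
}
```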
« Last Edit: January 18, 2025, 02:50:20 pm by DavidAlfa »
Hantek DSO2x1x            Drive        FAQ          DON'T BUY HANTEK! (Aka HALF-MADE)
Stm32 Soldering FW      Forum      Github      Donate
 

incf (topic starter)
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #5 on: January 18, 2025, 02:49:37 pm »
Write a wrapper to _write which copies the data to a queue and runs the uart in the background in interrupt mode.
That won't block the program execution like now.
Also, try increasing the baudrate which is likely the culprit.
I already added buffering to _write and it still was too slow! It is a multi-threaded "real time" system that has a dedicated thread for the UART. I ended up giving each thread its own buffer to avoid contention, but even after that it was still too slow.

Parsing the format string and generating decimal digits in strings appears to be the culprit in terms of CPU cycle usage.
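One way the decimal step might be made cheaper, sketched for 32-bit integers. The Cortex-M0+ has no hardware divide instruction, so `/ 10` and `% 10` typically become calls to a software division routine. Repeated subtraction of powers of ten avoids division entirely, with at most 9 subtractions per digit. Whether this actually wins on the target would have to be measured. `u32_to_dec` and `i32_to_dec` are hypothetical names.

```c
#include <stdint.h>

static const uint32_t pow10_tab[10] = {
    1000000000u, 100000000u, 10000000u, 1000000u, 100000u,
    10000u, 1000u, 100u, 10u, 1u
};

/* Unsigned 32-bit to decimal using only compares and subtractions.
 * out needs 11 bytes; returns the number of digits written. */
int u32_to_dec(uint32_t v, char *out)
{
    int n = 0;
    for (int i = 0; i < 10; i++) {
        char d = '0';
        while (v >= pow10_tab[i]) { v -= pow10_tab[i]; d++; }  /* at most 9 times */
        if (d != '0' || n > 0 || i == 9)  /* suppress leading zeros */
            out[n++] = d;
    }
    out[n] = '\0';
    return n;
}

/* Signed version; out needs 12 bytes. Handles INT32_MIN via unsigned negation. */
int i32_to_dec(int32_t v, char *out)
{
    if (v >= 0) return u32_to_dec((uint32_t)v, out);
    out[0] = '-';
    return 1 + u32_to_dec(0u - (uint32_t)v, out + 1);
}
```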
« Last Edit: January 18, 2025, 02:54:25 pm by incf »
 

incf (topic starter)
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #6 on: January 18, 2025, 02:50:27 pm »
Did you see this thread, it mentioned some ideas:

https://www.eevblog.com/forum/microcontrollers/low-footprint-printf-replacement-for-embedded-dev/?all
That is a good reference. I will likely refer to it when writing a format string parser.
 

langwadt

  • Super Contributor
  • ***
  • Posts: 4931
  • Country: dk
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #7 on: January 18, 2025, 03:12:24 pm »
Write a wrapper to _write which copies the data to a queue and runs the uart in the background in interrupt mode.
That won't block the program execution like now.
Also, try increasing the baudrate which is likely the culprit.
I already added buffering to _write and it still was too slow! It is a multi-threaded "real time" system that has a dedicated thread for the UART. I ended up giving each thread its own buffer to avoid contention, but even after that it was still too slow.

Parsing the format string and generating decimal digits in strings appears to be the culprit in terms of CPU cycle usage.

do you really need the numbers in decimal or can you cheat and just print all the numbers in hex?
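For comparison, hex needs no division at all, only shifts and a 16-entry table. The following sketch always prints 8 digits (`u32_to_hex` is a hypothetical name):

```c
#include <stdint.h>

/* Formats v as exactly 8 lowercase hex digits; out must hold 9 bytes. */
void u32_to_hex(uint32_t v, char *out)
{
    static const char digits[] = "0123456789abcdef";
    for (int i = 7; i >= 0; i--) {
        out[i] = digits[v & 0xFu];        /* lowest nibble goes rightmost */
        v >>= 4;
    }
    out[8] = '\0';
}
```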

 

tggzzz
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #8 on: January 18, 2025, 03:20:59 pm »
I don't have any control over the third party library. It uses printf*, and all of its debugging functionality is either "on" or "off."

*or rather, it calls a function that I supply that is expected to behave like printf

You might have to ditch debugging inside the library, or ditch the library, or use two different debugging mechanisms. All of those would suck.

Does the library "play well" in a multithread/multicore environment? If not, that would be another reason to ditch the library.

"behave like printf" presumably means varargs and the return value, and nothing else. That's the only printf() behaviour visible within that library.

About 40 years ago I re-wrote _putc() so that printf() could be used in a hard realtime system. That system was, however, very well disciplined w.r.t. threads and priorities. I don't think it is a general purpose solution, but with discipline it worked.

Hasn't somebody written a C equivalent of the Java log4j library yet? That enables two-dimensional fine-grained logging to be turned on/off while running. Virtually zero overhead for the bits that are off.
 

tggzzz
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #9 on: January 18, 2025, 03:23:54 pm »
Write a wrapper to _write which copies the data to a queue and runs the uart in the background in interrupt mode.
That won't block the program execution like now.
Also, try increasing the baudrate which is likely the culprit.
I already added buffering to _write and it still was too slow! It is a multi-threaded "real time" system that has a dedicated thread for the UART. I ended up giving each thread its own buffer to avoid contention, but even after that it was still too slow.

Parsing the format string and generating decimal digits in strings appears to be the culprit in terms of CPU cycle usage.

do you really need the numbers in decimal or can you cheat and just print all the numbers in hex?

Store the bits in the foreground threads, mutate them into decimal digits in a background thread. Basic point: printf() sucks rocks in constrained environments; hence printk().

You'll have to be inventive with "behaves like printf" in the foreground threads, but that's not difficult.
« Last Edit: January 18, 2025, 03:27:06 pm by tggzzz »
 

incf (topic starter)
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #10 on: January 18, 2025, 03:29:26 pm »
.. ditch the library ..

The device's sole purpose is to run this library. The library is very nearly a "black box" from the perspective of the device.

Does the library "play well"

No, it is badly behaved and brittle. It does not tolerate even the fractional-millisecond delays generated by printf string processing. But it is effectively the only and highest-priority thing running, with higher priority than even things like the UART.

I fear even the small amount of time spent in timer interrupts (systick) might be too large for the library's thread when it is doing real-time work.
« Last Edit: January 18, 2025, 03:34:16 pm by incf »
 

incf (topic starter)
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #11 on: January 18, 2025, 03:32:06 pm »
do you really need the numbers in decimal or can you cheat and just print all the numbers in hex?
Unfortunately, most have to be formatted decimal to be of any value in system diagnostics.
 

incf (topic starter)
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #12 on: January 18, 2025, 03:36:06 pm »
... Basic point: printf() sucks rocks in constrained environments; hence printk(). ...

Does printk save the variadic arguments for later, instead of actually generating the string right away?
 

tggzzz
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #13 on: January 18, 2025, 03:43:45 pm »
... Basic point: printf() sucks rocks in constrained environments; hence printk(). ...

Does printk save the variadic arguments for later, instead of actually generating the string right away?

No idea; RTFM. Probably irrelevant for your application.
 

ejeffrey

  • Super Contributor
  • ***
  • Posts: 4072
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #14 on: January 18, 2025, 03:45:36 pm »
Does it have to be formatted on device? Can you send binary data and format strings over the uart and have the host do the formatting?

Otherwise processing in a background thread will probably work as long as your CPU is actually fast enough to keep up.  If you already have a preemptive multitasking RTOS it shouldn't be that difficult to implement.
 

tggzzz
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #15 on: January 18, 2025, 03:46:39 pm »
.. ditch the library ..

The device's sole purpose is to run this library. The library is very nearly a "black box" from the perspective of the device.

Does the library "play well"

No, it is badly behaved and brittle. It does not tolerate even the fractional-millisecond delays generated by printf string processing. But it is effectively the only and highest-priority thing running, with higher priority than even things like the UART.

I fear even the small amount of time spent in timer interrupts (systick) might be too large for the library's thread when it is doing real-time work.

With a library that is so brittle in a realtime environment, you need to ditch that library.

If you don't, you will be playing whack-a-mole in the field.
 

incf (topic starter)
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #16 on: January 18, 2025, 03:49:25 pm »
Does it have to be formatted on device? Can you send binary data and format strings over the uart and have the host do the formatting?

Otherwise processing in a background thread will probably work as long as your CPU is actually fast enough to keep up.  If you already have a preemptive multitasking RTOS it shouldn't be that difficult to implement.
I'm free to do anything I want, but only after it calls the printf function that I supply to the library.

It's just a matter of getting the formatting string and all of the arguments out into a buffer someplace fast enough so that I can process it later (either in another thread, or on the PC).
 

incf (topic starter)
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #17 on: January 18, 2025, 03:53:33 pm »
With a library that is so brittle in a realtime environment, you need to ditch that library.

If you don't, you will be playing whack-a-mole in the field.

I have no choice in the matter. The library, the processor, and the real-time requirements are not changeable in this application.

The best I can do is work diligently to ensure that worst-case timing margins are as large as possible.
 

incf (topic starter)
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #18 on: January 18, 2025, 03:57:01 pm »
... two-dimensional fine-grained logging to be turned on/off while running. ...

While I can't change my library, I am curious: what is "two-dimensional logging"? The Wiki page for log4j is silent on the matter.
 

Analog Kid

  • Super Contributor
  • ***
  • Posts: 1560
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #19 on: January 18, 2025, 03:57:53 pm »
My (possibly naive) suggestion, which of course you're free to ignore if it's not possible or useful to you:

It sounds as if you can substitute your own printf() function for the one the library uses.
If so, good: what I'd do is in your replacement function, save the printf() arguments to your RAM, in a globally-accessible location.
You'll need a little data structure that includes the # of arguments, easy to implement.
When your printf() gets called, set a global flag indicating "there's data that needs to be printed".
Then, in another process or loop that runs when the CPU is idle, go through your list (array) of data structures and print them with the "back half" of your printf(), the one that actually outputs the data.

Let me know if I'm just plain wrong about this. (Wouldn't be the first time.)
 

incf (topic starter)
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #20 on: January 18, 2025, 04:07:16 pm »
My (possibly naive) suggestion, which of course you're free to ignore if it's not possible or useful to you:

It sounds as if you can substitute your own printf() function for the one the library uses.
If so, good: what I'd do is in your replacement function, save the printf() arguments to your RAM, in a globally-accessible location.
You'll need a little data structure that includes the # of arguments, easy to implement.
When your printf() gets called, set a global flag indicating "there's data that needs to be printed".
Then, in another process or loop that runs when the CPU is idle, go through your list (array) of data structures and print them with the "back half" of your printf(), the one that actually outputs the data.

Let me know if I'm just plain wrong about this. (Wouldn't be the first time.)

You are correct, I think. I posted this thread to ask about the exact approach you suggested.

Really, I was hoping someone might have insight on making that process as fast as possible.

For example, if it might be possible to copy a chunk of the printf stack? Having to parse the format string (to get the number and type of arguments) and copy the variadic arguments one by one is a step that I would not mind pushing off to a background thread. But I'm not sure if it is even possible.

Edit: if I understand correctly, with GCC the variadic arguments appear to be stored on the stack just like regular function arguments. I don't really understand the specifics, though, such as how a printf-like function would get the stack pointer back to where it should be if it doesn't know the number of arguments.



Edit 2: https://stackoverflow.com/a/36881737

I don't think I can make an array of va_list in C.
« Last Edit: January 18, 2025, 05:08:20 pm by incf »
 

edavid

  • Super Contributor
  • ***
  • Posts: 3540
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #21 on: January 18, 2025, 04:37:44 pm »
If the library uses all or mostly static format strings, you could hash the format string address to access a lookup table with the number of arguments etc.  That should be faster than parsing the format string.  Of course you would have to regenerate the hash table when the library was updated.
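A sketch of that lookup, assuming the format strings are static so their addresses are stable. In practice, the table would be generated offline from the library's strings. `fmt_table_add` is a hypothetical stand-in for that step, and the hash function and table size are arbitrary choices.

```c
#include <stddef.h>
#include <stdint.h>

#define FMT_SLOTS 64u                     /* power of two */

typedef struct { const char *fmt; uint8_t nargs; } fmt_info;
static fmt_info fmt_table[FMT_SLOTS];
static unsigned fmt_count;

static unsigned fmt_hash(const char *fmt)
{
    uintptr_t a = (uintptr_t)fmt;
    return (unsigned)((a >> 2) ^ (a >> 9)) & (FMT_SLOTS - 1u);
}

/* Stand-in for a table generated offline; refuses to fill past half so
 * lookups always reach an empty slot. Returns 0 on success. */
int fmt_table_add(const char *fmt, uint8_t nargs)
{
    if (fmt_count >= FMT_SLOTS / 2) return -1;
    unsigned i = fmt_hash(fmt);
    while (fmt_table[i].fmt != NULL && fmt_table[i].fmt != fmt)
        i = (i + 1u) & (FMT_SLOTS - 1u);  /* linear probing */
    if (fmt_table[i].fmt == NULL) fmt_count++;
    fmt_table[i].fmt = fmt;
    fmt_table[i].nargs = nargs;
    return 0;
}

/* Returns the argument count for a known format string, or -1 so the
 * caller can fall back to parsing it. Compares addresses, not contents. */
int fmt_lookup(const char *fmt)
{
    for (unsigned i = fmt_hash(fmt); fmt_table[i].fmt != NULL; i = (i + 1u) & (FMT_SLOTS - 1u))
        if (fmt_table[i].fmt == fmt) return fmt_table[i].nargs;
    return -1;
}
```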

Or if you only/mostly need to handle numeric arguments rather than strings, you could do some tracing to figure out the maximum number of arguments, then always save that many.

P.S. Is there any way to upgrade your CPU, or add a "printf coprocessor" to the system?

P.P.S. Have you tried telling your management that the library doesn't work right?
« Last Edit: January 18, 2025, 04:39:26 pm by edavid »
 

tggzzz
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #22 on: January 18, 2025, 04:44:46 pm »
... two-dimensional fine-grained logging to be turned on/off while running. ...

While I can't change my library, I am curious: what is "two-dimensional logging"? The Wiki page for log4j is silent on the matter.

One dimension is the logging level, i.e. note, warning, error, panic and similar.

The other dimension is a hierarchical tree of components/subsystems, so a "comms" subsystem might have "uart", "ethernet", "led", "lcd" subcomponents.

Each dimension is independent, so you might have "panic" turned on for everything, "warning" (and worse) for ethernet and uart, and "note" for lcd.

If a level isn't turned on for a component, the only time penalty is a function call and testing a flag.
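A flat sketch of the two dimensions, leaving out the hierarchical tree: there is one threshold per component, and it is checked before any formatting work. The levels, components and the `LOG` macro are hypothetical. They are set up to match the example above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical levels and components, matching the example above. */
enum log_level { LVL_NOTE, LVL_WARNING, LVL_ERROR, LVL_PANIC };
enum log_comp  { COMP_UART, COMP_ETHERNET, COMP_LED, COMP_LCD, COMP_COUNT };

/* Lowest level still emitted, per component; can be changed while running. */
static enum log_level log_threshold[COMP_COUNT] = {
    [COMP_UART] = LVL_WARNING, [COMP_ETHERNET] = LVL_WARNING,
    [COMP_LED]  = LVL_PANIC,   [COMP_LCD]      = LVL_NOTE,
};

static inline bool log_enabled(enum log_comp c, enum log_level l)
{
    return l >= log_threshold[c];
}

/* When the check fails, the cost is one table load and a compare; the
 * printf arguments are neither evaluated nor formatted. */
#define LOG(c, l, ...) do { if (log_enabled((c), (l))) printf(__VA_ARGS__); } while (0)
```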
 

tggzzz
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #23 on: January 18, 2025, 04:55:28 pm »
With a library that is so brittle in a realtime environment, you need to ditch that library.

If you don't, you will be playing whack-a-mole in the field.

I have no choice in the matter. The library, the processor, and the real-time requirements are not changeable in this application.

The best I can do is work diligently to ensure that worst-case timing margins are as large as possible.

The only hard realtime system I know of where that can be done correctly is xC running on xCORE. The IDE inspects the optimised binaries to tell you the maximum number of cycles between here and there. None of this "run and hope I've spotted the worst case" rubbish. "Hope" because interrupts and cache misses make instruction timing in (other) modern processors very variable. The xC+xCORE system has no interrupts, no caches, and up to 4000 MIPS and 32 cores per chip (expandable).

There is a very solid theoretical underpinning to the hardware and software, dating back to the 1980s (hardware, Transputer) and 1970s (software, CSP, Occam).  Buy them at DigiKey. I found them very easy to use, e.g. short manual without errata.

In your system, I hope you can guarantee thread safety, e.g. that printf being called from multiple threads doesn't cause characters to be interleaved in the output stream.
 

tggzzz
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #24 on: January 18, 2025, 04:57:12 pm »
P.P.S. Have you tried telling your management that the library doesn't work right?

I fear the customers will report intermittent unreproducible errors.
 

tggzzz
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #25 on: January 18, 2025, 05:03:22 pm »
P.S. Is there any way to upgrade your CPU, or add a "printf coprocessor" to the system?

With xC+xCORE, I've used one core for each input, one for central controller, one for front panel, and the USB comms library used 8 cores (I think). The RTOS is implemented in hardware :)

Guaranteed by design hard realtime processing fully occupied the two cores processing the inputs (<2% spare cycles), at the same time as bidirectional comms with an application running on the PC.

The first time I tried using xC+xCORE it was so painless that I had the first very simple version doing all that within a day of receiving the hardware. That made hard realtime processing fun again. :)
« Last Edit: January 18, 2025, 05:05:44 pm by tggzzz »
 

ejeffrey
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #26 on: January 18, 2025, 05:06:43 pm »

For example, if it might be possible to copy a chunk of the printf stack? Having to parse the format string (to get the number and type of arguments) and copy the variadic arguments one by one is a step that I would not mind pushing off to a background thread. But I'm not sure if it is even possible.

You can't do that in portable C as the number and size of arguments is not stored explicitly.  The only portable way to operate on va_list is with the va_start, va_arg, va_copy, and va_end macros, or to pass them to another function that does the same.

Parsing the format string and finding the arguments is a bit of work to code but should be pretty fast.  So I would try that approach first.  The main overhead of printf is formatting the outputs.

If you are OK with non portable code it's likely possible to examine the stack frame to find the beginning and end of the varargs block.  How to do this depends on your platform ABI and likely requires some assembly.
« Last Edit: January 18, 2025, 05:08:30 pm by ejeffrey »
 

nctnico

  • Super Contributor
  • ***
  • Posts: 28609
  • Country: nl
    • NCT Developments
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #27 on: January 18, 2025, 05:22:25 pm »
With a library that is so brittle in a realtime environment, you need to ditch that library.

If you don't, you will be playing whack-a-mole in the field.

I have no choice in the matter. The library, the processor, and the real-time requirements are not changeable in this application.
One option is to print the floating point numbers as hex and mark these in the output for processing later on (like prefixing these numbers with a special character). Or go a step further and print every numerical argument as hexadecimal. You can't avoid going through the formatting string because you need to determine the length of the argument given (a double is 8 bytes for example) but you can skip expensive conversions. If you have enough flash or ram, byte to ASCII conversion can be done through a look-up table.
« Last Edit: January 18, 2025, 05:26:35 pm by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7441
  • Country: fi
    • My home page and email address
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #28 on: January 18, 2025, 05:46:45 pm »
In debug mode it easily generates several kilobytes of formatted number containing text per minute. And, what the library does is so complicated that we really need the debug output to be generated all of the time (both in development and production).

Unfortunately, It turns out that printf is so slow that it causes the system to fail in a variety of subtle yet completely catastrophic ways. (due to it missing real time deadlines and not being able to detect that it has missed them)

Has anyone dealt with this before?
Not the exact situation, but I've worked a lot on high-performance computing, especially on classical-type molecular dynamics simulators, which generate a LOT of numerical data.  For example, a colleague ran a large simulation with short snapshot time intervals which produced some 4 Tb of data, basically in Protein Data Bank file format –– essentially formatted text with many numeric fields.

I've specialized in simulator development.  Currently, the main problem is very similar to what you see: instead of interleaving communications and state snapshotting with computation, they compute, then communicate/snapshot, then compute, and so on, wasting a significant fraction of available CPU time.  (Some people have objected to my complaint by saying that if they fully utilized the CPUs in the clusters, it "might overwhelm their cooling capability", which to me is an asinine cop-out.)

The solution here is obvious: instead of formatting the output immediately, you need to store the output data, and optionally format it at leisure.

Because printf() format string parsing takes a surprising amount of unnecessary CPU time, and you have to do that in order to determine the number and type of each parameter, I would recommend reimplementing the output using individual functions per parameter type.  That is, you convert
    printf("Foo %d bar %.4f baz\n", x, y);
into something like
    // Converted from printf("Foo %d bar %.4f baz\n", x, y);
    emit_const("Foo ");
    emit_int(x);
    emit_const(" bar ");
    emit_float(y, 4);
    emit_const(" baz");
    emit_newline();
Note that I'm assuming you use C and not C++ here.  With C++, overloading << is a better approach.

The emit_() family of functions (or the overloaded operator in C++) appends the type of the parameter and the parameter value to a circular buffer.  This takes minimal time.

When the processor is idle, it converts one parameter from the circular buffer to output and returns, minimizing the latency/duration to one single conversion at a time.  You can then choose whether you allow the buffer to overrun, or whether you convert from the circular buffer before adding more to it.  It is even possible to flush this buffer to storage (say microSD card or similar).

In C, I would separate the parameter type into its own circular buffer, and the parameter data to its own fixed-size (32- or 64-bit/4- or 8-byte) records:
Code:
#include <stdint.h>

#define  LOG_SIZE  29127 // Number of elements in the circular buffer; 29127 elements * 9 bytes each = 262143 bytes == 256k - 1

union log_item_union {      /* Type, example values only */
                            //      0      reserved for unused item
    char            c[8];   //   1 .. 8,   length
    uint8_t         u8[8];  //   9 .. 16,  count+8
    int8_t          i8[8];  //  17 .. 24,  count+16
    uint16_t        u16[4]; //  25 .. 28,  count+24
    int16_t         i16[4]; //  29 .. 32,  count+28
    uint32_t        u32[2]; //  33 .. 34,  count+32
    int32_t         i32[2]; //  35 .. 36,  count+34
    uint64_t        u64[1]; //     37
    int64_t         i64[1]; //     38
    float           f32[2]; //  39 .. 40,  count+38
    double          f64[1]; //     41
    const char     *cc;     // 42 .. 255, length+41
};

uint8_t                 log_type[LOG_SIZE];
union log_item_union    log_item[LOG_SIZE];
uint_fast16_t           log_head = 0;   // Next free item in buffer
uint_fast16_t           log_tail = 0;   // Oldest item in buffer, unless equal to head
This uses 9 bytes of RAM per log item.  The type specifies the item type in the 64-bit union, and supports various numeric types and vectors (whatever fits into 64 bits), various constant strings stored in Flash, plus short strings of up to 8 bytes/chars long.  The reason for not interleaving the type and the item data is hardware alignment requirements; typically, larger than byte sized items either require an aligned address, or are slower to access from an unaligned address, depending on the hardware architecture used.

Note that the above format would also allow emit_2float(f1,f2), emit_i16(i1), emit_2i16(i1,i2), emit_3i16(i1,i2,i3), emit_4i16(i1,i2,i3,i4), and so on.
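A minimal sketch of the producer and idle-time consumer for the type/item arrays above. The names emit_i32(), emit_cc() and log_drain_one() are hypothetical, only two of the type codes are handled (35 = one int32, 42..255 = constant string of length 1..214, as in the union comments), and snprintf() in the drain step is only there for brevity. As written, it is not interrupt-safe. If emit runs in an interrupt while the idle loop drains, head/tail need volatile access and memory barriers.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LOG_SIZE 256          /* small size for the sketch; the post uses 29127 */
#define LOG_I32_1   35        /* one int32_t in the item */
#define LOG_CC_BASE 41        /* type = 41 + length of a constant string */

union log_item_union {
    char        c[8];
    int32_t     i32[2];
    const char *cc;
};

static uint8_t              log_type[LOG_SIZE];
static union log_item_union log_item[LOG_SIZE];
static unsigned             log_head = 0;   /* next free item */
static unsigned             log_tail = 0;   /* oldest item, unless equal to head */

static int log_put(uint8_t type, union log_item_union item) {
    unsigned next = (log_head + 1) % LOG_SIZE;
    if (next == log_tail)
        return -1;                          /* buffer full: drop the item */
    log_item[log_head] = item;
    log_type[log_head] = type;
    log_head = next;
    return 0;
}

int emit_i32(int32_t v) {
    union log_item_union item = { .i32 = { v, 0 } };
    return log_put(LOG_I32_1, item);
}

int emit_cc(const char *s) {                /* s must stay valid, e.g. a literal in flash */
    size_t len = strlen(s);
    if (len == 0 || len > 255 - LOG_CC_BASE)
        return -1;
    union log_item_union item = { .cc = s };
    return log_put((uint8_t)(LOG_CC_BASE + len), item);
}

/* Idle loop: converts at most one item. Returns characters written, -1 if empty. */
int log_drain_one(char *buf, size_t size) {
    if (log_tail == log_head)
        return -1;
    uint8_t              type = log_type[log_tail];
    union log_item_union item = log_item[log_tail];
    log_tail = (log_tail + 1) % LOG_SIZE;

    if (type == LOG_I32_1)
        return snprintf(buf, size, "%ld", (long)item.i32[0]);
    if (type > LOG_CC_BASE)
        return snprintf(buf, size, "%.*s", (int)(type - LOG_CC_BASE), item.cc);
    return snprintf(buf, size, "?");        /* other types omitted in this sketch */
}
```

Each emit call only copies at most 9 bytes. All formatting work is moved into log_drain_one(), which the idle loop can call once per pass.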

Note that you can then choose whether you convert the circular buffer contents to strings at your leisure, or whether you simply write them to some serial stream.  The key is to do it when your CPU is otherwise idle, and keep each conversion/write operation short enough to not block so long as to affect the main computation.  Writing the data synchronously just will not work; the circular buffer here is what makes the prints asynchronous, and lets you handle the actual operations later on in small enough chunks to not tie up your resources for too long.  For floating point types, using a fixed number of decimals and a limited range (say, -99999999.99999999 to +99999999.99999999) means you can speed up the conversion significantly compared to printf() et al.; it is supporting values very close to zero or very large in magnitude that makes generic precise conversion to string slow.
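Here is a sketch of the fixed-decimal idea. It assumes exactly 3 decimals and a range of ±99999999.999, and it rounds half away from zero. fmt_fixed3() is a hypothetical name. The Cortex-M0+ has no FPU, so the multiply, the double-to-integer conversion and the 64-bit division still go through software routines, but that is a handful of calls rather than a general-purpose conversion.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Writes v with exactly 3 decimals into buf (at least 14 bytes:
   sign, 8 integer digits, '.', 3 decimals, nul).
   Returns the number of characters written, or -1 if v is out of range. */
int fmt_fixed3(char *buf, double v) {
    char *p = buf;
    if (v < 0) {
        *p++ = '-';
        v = -v;
    }
    if (!(v <= 99999999.999))               /* also rejects NaN */
        return -1;

    uint64_t scaled = (uint64_t)(v * 1000.0 + 0.5);   /* round to 3 decimals */
    uint32_t ipart  = (uint32_t)(scaled / 1000);
    uint32_t frac   = (uint32_t)(scaled % 1000);

    char tmp[10];
    int  n = 0;
    do {                                    /* integer digits, least significant first */
        tmp[n++] = (char)('0' + ipart % 10);
        ipart /= 10;
    } while (ipart != 0);
    while (n > 0)
        *p++ = tmp[--n];

    *p++ = '.';
    *p++ = (char)('0' + frac / 100);
    *p++ = (char)('0' + frac / 10 % 10);
    *p++ = (char)('0' + frac % 10);
    *p = '\0';
    return (int)(p - buf);
}
```

For example, fmt_fixed3(buf, 3.14159) should produce "3.142".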
 

Offline magic

  • Super Contributor
  • ***
  • Posts: 7538
  • Country: pl
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #29 on: January 18, 2025, 06:27:48 pm »
If the library only uses integer numeric formats (%d, %u, %x, etc) and the format strings are always statically allocated then it should be simply a matter of saving all arguments to printf to a buffer (as raw 32/64 bit words) and calling real printf on them at a later time.

Not sure if it would work with floats, I'm not familiar with typical calling conventions.

%s could be a problem: if the strings passed to the format may be deallocated or overwritten by the library, your fake printf must make a copy.


If you find that some problematic cases exist, you may write your own lightweight format parser which only decides how to copy and store each argument. Maybe you could extract all format strings from the library ahead of time and generate optimized functions for each, then use a hash table to dispatch the functions based on format string pointer (this again assumes that format strings are statically allocated and immutable, which is the usual case).

Have fun :D
And, of course, you shouldn't be using crap libraries in the first place.
 

Online radiolistener

  • Super Contributor
  • ***
  • Posts: 4252
  • Country: 00
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #30 on: January 18, 2025, 06:32:09 pm »
Technically you can write your own printf that uses a linked list to store the passed arguments (format string and values), then process the list from a separate background thread: format the string and write it to the slow output. Note that you need to copy the format string and the arguments, because they can be changed after printf returns control to the caller.

The only issue here is detecting the size of each passed argument. I usually use this approach in managed languages, where you can check the actual data type at runtime; I'm not sure there is a fast and simple alternative in C. I think detecting the argument size for %s could be a problem, because the string may be longer than the standard 4/8 bytes of an int/float/double.

A more reliable way is just to store the data in a queue and then process it in a separate thread. It is less comfortable than the usual printf, but this is how I do such tasks in C.
« Last Edit: January 18, 2025, 06:40:45 pm by radiolistener »
 

Offline magic

  • Super Contributor
  • ***
  • Posts: 7538
  • Country: pl
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #31 on: January 18, 2025, 06:34:37 pm »
Don't bother with lists. If the machine has "lots of free RAM" then simply use an array of arrays, each long enough to store the worst case number of parameters.
 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 4072
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #32 on: January 18, 2025, 06:41:39 pm »
If the library only uses integer numeric formats (%d, %u, %x, etc) and the format strings are always statically allocated then it should be simply a matter of saving all arguments to printf to a buffer (as raw 32/64 bit words) and calling real printf on them at a later time.

"Saving all arguments" is the entire hard part.  The way varargs work in C makes this difficult to do.

However if you can enumerate the possible format strings at compile time, you could build a hash table of format strings -> precomputed # of arguments.  That would save parsing the format strings at runtime. 
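Below is a sketch combining the precomputed argument-count table with fixed-size records in an array instead of a linked list. It makes several assumptions. Every argument is int-sized: integer conversions only, as in the "integer numeric formats" case above. int and unsigned int are passed identically, which holds on ARM AAPCS. No format takes more than 8 arguments. The table contents and all names are hypothetical, and a linear strcmp() search stands in for the hash table.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_ARGS  8    /* assumption: no format takes more than 8 arguments */
#define QUEUE_LEN 32   /* fixed-size records, no linked list */

/* Hypothetical table extracted ahead of time from the library. */
static const struct { const char *fmt; unsigned char nargs; } known_formats[] = {
    { "tick %u\n",         1 },
    { "state=%d err=%x\n", 2 },
};

struct deferred { const char *fmt; int arg[MAX_ARGS]; };
static struct deferred queue[QUEUE_LEN];
static unsigned q_head, q_tail;   /* head: next free slot, tail: oldest record */

/* Stand-in for a hash table; comparing pointers is cheaper if formats are static. */
static int lookup_nargs(const char *fmt) {
    for (size_t i = 0; i < sizeof known_formats / sizeof known_formats[0]; i++)
        if (strcmp(known_formats[i].fmt, fmt) == 0)
            return known_formats[i].nargs;
    return -1;
}

/* Real-time side: copies raw argument words, does no formatting.
   Unsigned arguments are read as int (same representation on AAPCS). */
int deferred_printf(const char *fmt, ...) {
    int      nargs = lookup_nargs(fmt);
    unsigned next  = (q_head + 1) % QUEUE_LEN;
    if (nargs < 0 || next == q_tail)
        return -1;                          /* unknown format or queue full */

    struct deferred *d = &queue[q_head];
    va_list ap;
    va_start(ap, fmt);
    for (int i = 0; i < nargs; i++)
        d->arg[i] = va_arg(ap, int);
    va_end(ap);
    d->fmt = fmt;
    q_head = next;
    return 0;
}

/* Idle side: formats one record. Passing all MAX_ARGS words is allowed,
   because excess arguments to the printf family are evaluated but ignored. */
int deferred_flush_one(char *buf, size_t size) {
    if (q_tail == q_head)
        return -1;
    const struct deferred *d = &queue[q_tail];
    const int *a = d->arg;
    int n = snprintf(buf, size, d->fmt, a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]);
    q_tail = (q_tail + 1) % QUEUE_LEN;
    return n;
}
```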
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7441
  • Country: fi
    • My home page and email address
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #33 on: January 18, 2025, 07:50:36 pm »
If one uses an ELF-based toolchain, then you can do a "trick" at build time to record all the printf() formats your code uses.

With GCC and clang-based toolchains, you can do e.g.
    #define  logprintf(fmt, ...) \
             do { static const char format[] __attribute__((used, section ("printformats"))) = fmt; \
                  printf(fmt, ##__VA_ARGS__); } while (0)
and then compile your firmware; then run
    objcopy -O binary -j printformats input.o output.bin
to get all the format strings in input.o with at least one nul (\0) in between in output.bin.  I like to use something like the following C99 or later program to extract them in their original C string format with the number of bytes it takes:
Code: [Select]
// SPDX-License-Identifier: CC0-1.0
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>

const char special[9][4] = {
    [0] = "\\a",
    [1] = "\\b",
    [2] = "\\t",
    [3] = "\\n",
    [4] = "\\v",
    [5] = "\\f",
    [6] = "\\r",
    [7] = "\\\"",
    [8] = "\\\\",
};

void read_from(FILE *in, FILE *out) {
    while (1) {
        size_t  n = 0;
        int     c = getc(in);

        while (c == 0)
            c = getc(in);

        if (c == EOF)
            return;

        fputc('"', out);
        while (c != EOF && c != 0) {
            n++;
            if (c >= 7 && c <= 13) {
                fputs(special[c - 7], out);
                c = getc(in);
            } else
            if (c == 34) { // '"'
                fputs(special[7], out);
                c = getc(in);
            } else
            if (c == 92) { // '\\'
                fputs(special[8], out);
                c = getc(in);
            } else
            if (c >= 32 && c <= 126) {
                fputc(c, out);
                c = getc(in);
            } else {
                fputc('\\', out);
                fputc('0' + ((c / 64) & 3), out);
                fputc('0' + ((c / 8)  & 7), out);
                fputc('0' + ( c       & 7), out);
                c = getc(in);
            }
        }
        fprintf(out, "\" (%zu bytes)\n", n);
    }
}

static int  stdin_dumped = 0;

static int  dump_stdin(int status) {
    if (stdin_dumped)
        return status;

    stdin_dumped = 1;

    read_from(stdin, stdout);
    fflush(stdout);
    if (ferror(stdin)) {
        fprintf(stderr, "Error reading from standard input.\n");
        status = EXIT_FAILURE;
    }
    if (ferror(stdout)) {
        fprintf(stderr, "Error writing to standard output.\n");
        status = EXIT_FAILURE;
    }

    return status;
}

int main(int argc, char *argv[]) {
    int  status = EXIT_SUCCESS;

    if (argc > 1 && (!strcmp(argv[1], "-h") || !strcmp(argv[1], "--help"))) {
        const char *arg0 = (argc > 0 && argv != NULL && argv[0] != NULL && argv[0][0] != '\0') ? argv[0] : "(this)";

        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s -h | --help\n", arg0);
        fprintf(stderr, "       %s [ INPUT-FILE ... ]\n", arg0);
        fprintf(stderr, "\n");
        fprintf(stderr, "This program will output all null-terminated C strings in C notation\n");
        fprintf(stderr, "from the standard input (\"-\") or specified file(s).\n");
        fprintf(stderr, "\n");

        return EXIT_SUCCESS;
    }

    if (argc > 1) {
        for (int arg = 1; arg < argc; arg++) {
            if (!argv[arg] || argv[arg][0] == '\0' || !strcmp(argv[arg], "-")) {
                status = dump_stdin(status);
            } else {
                FILE *in = fopen(argv[arg], "rb");
                if (!in) {
                    fprintf(stderr, "%s: %s.\n", argv[arg], strerror(errno));
                    status = EXIT_FAILURE;
                    break;
                }
                read_from(in, stdout);
                if (ferror(in)) {
                    fprintf(stderr, "%s: Read error.\n", argv[arg]);
                    status = EXIT_FAILURE;
                    fclose(in);
                } else
                if (fclose(in)) {
                    fprintf(stderr, "%s: %s.\n", argv[arg], strerror(errno));
                    status = EXIT_FAILURE;
                }
                if (ferror(stdout)) {
                    fprintf(stderr, "Error writing to standard output.\n");
                    status = EXIT_FAILURE;
                }
            }
            if (status != EXIT_SUCCESS)
                break;
        }
    } else {
        status = dump_stdin(status);
    }

    return status;
}
In Linux, you can do the above for all object files in your build directory and filter the output through | sort | uniq to get only the unique format strings used (here the program above is assumed to be compiled as cstrings):
    find . -name '*.o' -exec objcopy -j printformats -O binary '{}' /dev/stdout ';' | cstrings | sort | uniq

I love the ELF shenanigans I can do in my build scripts and C/C++ sources! ;D
 

Offline edavid

  • Super Contributor
  • ***
  • Posts: 3540
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #34 on: January 18, 2025, 08:21:01 pm »
However if you can enumerate the possible format strings at compile time, you could build a hash table of format strings -> precomputed # of arguments.  That would save parsing the format strings at runtime.

OP said he doesn't have source to the library.  This is kind of obvious since if he did, he could replace the printf calls with some lighter weight tracing, or just debug the library so he wouldn't need the debug output.

If one uses an ELF-based toolchain, then you can do a "trick" at build time to record all the printf() formats your code uses.

Doesn't anyone read the thread before posting  :-//

 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 4072
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #35 on: January 18, 2025, 08:33:34 pm »
However if you can enumerate the possible format strings at compile time, you could build a hash table of format strings -> precomputed # of arguments.  That would save parsing the format strings at runtime.

OP said he doesn't have source to the library.  This is kind of obvious since if he did, he could replace the printf calls with some lighter weight tracing, or just debug the library so he wouldn't need the debug output.

That doesn't mean you can't extract the format strings, although it does make it more difficult.  The OP already said that the format strings were all just formatted numbers and short literals.  At the very worst you can make a printf that only emits the format strings, and run through enough tests to generate all important format strings.
 

Offline edavid

  • Super Contributor
  • ***
  • Posts: 3540
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #36 on: January 18, 2025, 08:35:30 pm »
However if you can enumerate the possible format strings at compile time, you could build a hash table of format strings -> precomputed # of arguments.  That would save parsing the format strings at runtime.

OP said he doesn't have source to the library.  This is kind of obvious since if he did, he could replace the printf calls with some lighter weight tracing, or just debug the library so he wouldn't need the debug output.

That doesn't mean you can't extract the format strings, although it does make it more difficult.  The OP already said that the format strings were all just formatted numbers and short literals.  At the very worst you can make a printf that only emits the format strings, and run through enough tests to generate all important format strings.

Yes, I suggested just that.  What you can't do, is do it at compile time.

(Except sort of, by running strings on the library object file.)
 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 4072
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #37 on: January 18, 2025, 08:44:47 pm »
Sorry, I didn't mean "use the compiler to extract the list"; I meant get the list ahead of time so you can build a compiled lookup table. It seems the OP likely already has the list.
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 21682
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #38 on: January 18, 2025, 09:28:59 pm »
Doesn't anyone read the thread before posting  :-//

Given the number of times the same partial solution has been suggested (starting in the first reply!), the answer is a resounding no.

To me it looks like the system+library as described is not "fit for purpose". If the management can't work that out and/or refute it, their customers will.

Place your bets :)
« Last Edit: January 18, 2025, 09:30:38 pm by tggzzz »
 

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #39 on: January 18, 2025, 09:30:10 pm »
If one uses an ELF-based toolchain, then you can do a "trick" at build time to record all the printf() formats your code uses.

With GCC and clang-based toolchains, you can do e.g.
    #define  logprintf(fmt, ...) \
             { static const char format[] __attribute__((section ("printformats"))) = fmt; } \
             printf(fmt, __VA_ARGS__)
and then compile your firmware; then run
    objcopy -O binary -j printformats input.o output.bin
to get all the format strings in input.o with at least one nul (\0) in between in output.bin.  I like to use something like the following C99 or later program to extract them in their original C string format with the number of bytes it takes:

...

I love the ELF shenanigans I can do in my build scripts and C/C++ sources! ;D

That is a fantastic trick. Not sure I can use it on this issue, but I can easily see myself doing something like that in the future.

I'm fairly convinced that I'll end up doing some assembly to copy the stack containing the variadic arguments list and format string into a buffer, and then "reconstitute" it as a call to snprintf later.
« Last Edit: January 18, 2025, 09:34:02 pm by incf »
 

Offline DavidAlfa

  • Super Contributor
  • ***
  • Posts: 6466
  • Country: es
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #40 on: January 18, 2025, 10:50:34 pm »
Do you really need printf?
For simple strings, use your own function.
If only printing integers, you may use itoa() instead...

va_copy() macro looks interesting, though I've never made variadic functions and I don't really know how they work under the hood.

https://learn.microsoft.com/en-en/cpp/c-runtime-library/reference/va-arg-va-copy-va-end-va-start?view=msvc-170.
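itoa() is not part of standard C. A hand-rolled replacement is short, though. This is a sketch with a hypothetical name, i32_to_dec(). The Cortex-M0+ has no divide instruction, so each /10 and %10 becomes a software routine, which is still likely much less work than a general printf().

```c
#include <stdint.h>
#include <string.h>

/* Writes v in decimal to buf and returns the length.
   buf must hold at least 12 bytes ("-2147483648" plus nul). */
int i32_to_dec(char *buf, int32_t v) {
    char tmp[10];
    int  n = 0, len = 0;
    /* Negate in uint32_t so INT32_MIN does not overflow. */
    uint32_t u = (v < 0) ? 0u - (uint32_t)v : (uint32_t)v;

    if (v < 0)
        buf[len++] = '-';
    do {                                    /* digits come out least significant first */
        tmp[n++] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);
    while (n > 0)
        buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}
```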
« Last Edit: January 18, 2025, 10:57:57 pm by DavidAlfa »
 

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #41 on: January 18, 2025, 11:07:02 pm »
Do you really need printf?
For simple strings, use your own function.
If only printing integers, you may use itoa() instead...

va_copy() macro looks interesting, though I've never made variadic functions and I don't really know how they work under the hood.

https://learn.microsoft.com/en-en/cpp/c-runtime-library/reference/va-arg-va-copy-va-end-va-start?view=msvc-170.

Yes, I really do need something fairly close to printf to execute eventually. And even a minimal implementation would be too slow to run in real time.

I was under the impression that I can't really store va_list due to it having an unknown size at runtime (correct me if I am wrong).
I believe I have to write some assembly instead in order to read the total size of the parameters list (from a combination of the stack pointer and R7 - although, I am not 100% certain)
« Last Edit: January 18, 2025, 11:17:09 pm by incf »
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 16100
  • Country: fr
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #42 on: January 18, 2025, 11:10:33 pm »
But again, as edavid said, the OP's problem is that this is done in a third-party library the source of which he doesn't have access to, if I got it right. So I guess we should stop suggesting replacing function calls on a source code level, like using macros.

After that, you could resort to doing it at the link level, but that quickly becomes a mess: you would redirect all references to printf to a different symbol of your own, a custom function with exactly the same interface. And that assumes the black-box library code only uses the "printf" function itself and never any of its variants (printf is a whole family of functions), which may well be a dead end.
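With GNU ld, that link-level redirection is available as --wrap. Linking with -Wl,--wrap=printf resolves every undefined reference to printf, including those inside a precompiled library, to __wrap_printf. The original function stays reachable as __real_printf. Whether the library really calls printf needs checking. GCC may compile printf("text\n") into puts("text"), so nm -u on the library archive is a quick way to see which printf-family symbols it actually references, and each of them needs its own --wrap. The capture hook below is a hypothetical placeholder.

```c
#include <stdarg.h>
#include <stddef.h>

/* Resolved by the linker to the original printf when -Wl,--wrap=printf is used. */
int __real_printf(const char *fmt, ...);

static unsigned    wrapped_calls;   /* for illustration only */
static const char *last_format;

int __wrap_printf(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    /* Hypothetical hook: parse fmt and copy the arguments into a RAM
       buffer here instead of formatting them now. */
    last_format = fmt;
    wrapped_calls++;
    va_end(ap);
    return 0;   /* real length is unknown until the deferred formatting runs */
}
```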

If most of the time spent in those "printf" calls is not in the formatting code itself, but in sending it to whatever it is redirected to (like an UART?), then you could, much easier, just re-implement the '_write' function and buffer whatever is passed to it in a RAM buffer rather than send it directly to a peripheral.
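For the case where output rather than formatting is the bottleneck, here is a sketch assuming newlib, whose stdio ends up calling _write(int file, char *ptr, int len) with already formatted text. The ring size and tx_ring_get() are hypothetical. _write() always reports the full length so that stdio does not retry, and bytes that do not fit are dropped. This does not reduce formatting time at all, so it only helps when the transmit side is the slow part.

```c
#include <stdint.h>

#define TX_RING_SIZE 1024u          /* assumption: power of two */

static char              tx_ring[TX_RING_SIZE];
static volatile uint32_t tx_head;   /* advanced only by _write() */
static volatile uint32_t tx_tail;   /* advanced only by the drainer */

/* Copies the formatted text to RAM and returns immediately. */
int _write(int file, char *ptr, int len) {
    (void)file;
    for (int i = 0; i < len; i++) {
        if (tx_head - tx_tail == TX_RING_SIZE)
            break;                  /* ring full: drop the rest */
        tx_ring[tx_head % TX_RING_SIZE] = ptr[i];
        tx_head++;
    }
    return len;                     /* claim success so stdio does not retry */
}

/* UART interrupt or idle loop: returns the next byte, or -1 if empty. */
int tx_ring_get(void) {
    if (tx_tail == tx_head)
        return -1;
    int c = (unsigned char)tx_ring[tx_tail % TX_RING_SIZE];
    tx_tail++;
    return c;
}
```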
 

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #43 on: January 18, 2025, 11:21:14 pm »
But again, as edavid said, the OP's problem is that this is done in a third-party library the source of which he doesn't have access to, if I got it right. So I guess we should stop suggesting replacing function calls on a source code level, like using macros.

After that, you could resort to doing it on a link level, but that becomes a mess quickly: replacing all references to printf to a different, own symbol for a custom function which has the exact same interface. And, that assumes that the black box library code only uses the "printf" function itself and never any of its variants (printf is a whole family of functions), which is possibly a dead-end.

If most of the time spent in those "printf" calls is not in the formatting code itself, but in sending it to whatever it is redirected to (like an UART?), then you could, much easier, just re-implement the '_write' function and buffer whatever is passed to it in a RAM buffer rather than send it directly to a peripheral.

I'm already doing two layers of buffering. IO is not an issue, it's a straightforward lack of CPU power doing string formatting operations as far as I can tell with my testing.

Hence the desire to use assembly to save the printf variable arguments list into a buffer for later processing. (Which is unfamiliar territory for me, and probably most people here - variadic arguments + C ABI for ARM cortex M0 + assembly, it's probably more than just architecture specific it is likely compiler specific - I'm using gcc)
« Last Edit: January 18, 2025, 11:29:06 pm by incf »
 

Online mwb1100

  • Frequent Contributor
  • **
  • Posts: 675
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #44 on: January 18, 2025, 11:31:03 pm »
va_copy() macro looks interesting, though I've never made variadic functions and I don't really know how they work under the hood.



I was under the impression that I can't really store va_list due to it having an unknown size at runtime (correct me if I am wrong).

You can't store the data in the va_list using only va_copy - va_copy stores the current state of the source va_list into the destination va_list.  It allows you to access the variables in the va_list again after having modified the original va_list (for example in order to make two passes over the variable arguments).
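A common use of va_copy() that shows this "two passes" behaviour follows. The first vsnprintf() only measures the length and the second formats. format_alloc() is a hypothetical name, and malloc() makes this a host-side illustration. Neither pass stores the argument values anywhere, and both va_lists become invalid once the function returns, which is why va_copy() cannot be used to keep the arguments for later.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

char *format_alloc(const char *fmt, ...) {
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);                         /* snapshot of the traversal state */
    int len = vsnprintf(NULL, 0, fmt, ap);    /* pass 1: measure only */
    va_end(ap);

    char *s = (len >= 0) ? malloc((size_t)len + 1) : NULL;
    if (s)
        vsnprintf(s, (size_t)len + 1, fmt, ap2);  /* pass 2: format */
    va_end(ap2);                              /* ap2 must not outlive this call */
    return s;
}
```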

Copying the data in the va_list would require some parsing of the printf() format string to identify the number of arguments and their size. However if a format string contains a specification that passes data via a pointer, such as "%s", then your wrapper would need to copy the buffer that the pointer points to instead of just the pointer (unless you perhaps determine that the pointer points into flash).  The copy operation wouldn't need to do any actual formatting though, so it's possible that it could be fast enough for your purposes. 
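Here is a sketch of the %s handling described above. It keeps the pointer when the string lies in a region that never changes, and otherwise copies a truncated version. The lo/hi bounds stand for hypothetical flash limits that would normally come from the linker script or datasheet. The 16-byte slot assumes the library's string arguments are short.

```c
#include <stdint.h>
#include <string.h>

#define STR_SLOT 16   /* assumption: %s arguments are short */

struct str_arg {
    const char *ptr;            /* non-NULL: safe to print directly later */
    char        copy[STR_SLOT]; /* used when ptr is NULL */
};

void capture_str(struct str_arg *a, const char *s, uintptr_t lo, uintptr_t hi) {
    uintptr_t addr = (uintptr_t)s;
    if (addr >= lo && addr < hi) {
        a->ptr = s;                           /* read-only region: keep the pointer */
    } else {
        a->ptr = NULL;                        /* RAM: may change before printing */
        strncpy(a->copy, s, STR_SLOT - 1);
        a->copy[STR_SLOT - 1] = '\0';         /* strncpy does not terminate on truncation */
    }
}

const char *str_arg_text(const struct str_arg *a) {
    return a->ptr ? a->ptr : a->copy;
}
```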

If it isn't fast enough, then I think you will probably need to perform the build time (or pre-build) analysis of format strings extracted from the library mentioned previously.  Keep in mind that format strings can be generated at runtime so you might need to handle that corner case.  But I suspect that for logging it's likely that all format strings you'd see in this application are literals.

« Last Edit: January 19, 2025, 12:08:09 am by mwb1100 »
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7441
  • Country: fi
    • My home page and email address
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #45 on: January 18, 2025, 11:32:30 pm »
I'm fairly convinced that I'll end up doing some assembly to copy the stack containing the variadic arguments list and format string into a buffer, and then "reconstitute" it as a call to snprintf later.
That will not work, because the size of each argument varies, and <stdarg.h> does not provide a way to copy the parameters as a "context" for later reuse.  Even va_copy() simply allows you to traverse the argument list more than once in the same block.  This is a fundamental limitation in C, because the callee simply cannot know how many parameters the caller provided; it can only assume the caller provided all that were specified in the prototype.  For variadic arguments, there is absolutely no way in C for the callee to find out how many the caller supplied.  It is easy to work around at the source code level, but not for already compiled code.

What you can do is implement your own printf() that does the conversion I described in my first suggestion internally, via say something like
Code: [Select]
#define  _POSIX_C_SOURCE  200809L
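#define  _GNU_SOURCE
#include <stdlib.h>
#include <stddef.h>     // size_t, ptrdiff_t
#include <stdint.h>     // intmax_t, uintmax_t, uintptr_t
#include <sys/types.h>  // ssize_t (POSIX; not guaranteed by <stdlib.h>)
#include <wchar.h>      // Only if %lc and %ls are supported (wint_t and wchar_t)
#include <stdarg.h>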

int printf(const char *format, ...) {
    va_list     args;

    if (!format || format[0] == '\0')
        return 0;

    va_start(args, format);

    while (*format) {
        if (*format == '%' && format[1] != '%' && format[1] != '\0') {
            // TODO: This simplified implementation ignores position and formatting, and accepts invalid formats!
            //       It does not support variable precision via %* or %*.* either.

            unsigned char  size = 0;
            while (*format != '\0') {
                if (*format == 'h') {
                    size |= ((size & 1) << 1) | 1;
                } else
                if (*format == 'l') {
                    size |= ((size & 4) << 1) | 4;
                } else
                if (*format == 'L') {
                    size |= 12;
                } else
                if (*format == 'z') {
                    size |= 16;
                } else
                if (*format == 'j') {
                    size |= 32;
                } else
                if (*format == 't') {
                    size |= 64;
                } else
                if (*format == 'd' || *format == 'i') {
                    if (size == 0) {
                        int  v = va_arg(args, int);
                        // TODO: Store integer 'v'
                    } else
                    if (size == 1) {
                        short  v = (short)va_arg(args, int);
                        // TODO: Store short 'v'
                    } else
                    if (size == 3) {
                        signed char  v = (signed char)va_arg(args, int);
                        // TODO: Store signed char 'v'
                    } else
                    if (size == 4) {
                        long  v = va_arg(args, long);
                        // TODO: Store long 'v'
                    } else
                    if (size == 12) {
                        long long  v = va_arg(args, long long);
                        // TODO: Store long long 'v'
                    } else
                    if (size == 16) {
                        ssize_t  v = va_arg(args, ssize_t);
                        // TODO: Store ssize_t 'v'
                    } else
                    if (size == 32) {
                        intmax_t  v = va_arg(args, intmax_t);
                        // TODO: Store intmax_t 'v'
                    } else
                    if (size == 64) {
                        ptrdiff_t  v = va_arg(args, ptrdiff_t);
                        // TODO: Store ptrdiff_t 'v'
                    } else {
                        // TODO: Unsupported conversion!
                    }
                    format++;
                    break;
                } else
                if (*format == 'o' || *format == 'u' || *format == 'x' || *format == 'X') {
                    if (size == 0) {
                        unsigned int  v = va_arg(args, unsigned int);
                        // TODO: Store unsigned integer 'v'
                    } else
                    if (size == 1) {
                        unsigned short  v = (unsigned short)va_arg(args, unsigned int);
                        // TODO: Store unsigned short 'v'
                    } else
                    if (size == 3) {
                        unsigned char  v = (unsigned char)va_arg(args, unsigned int);
                        // TODO: Store unsigned char 'v'
                    } else
                    if (size == 4) {
                        unsigned long  v = va_arg(args, unsigned long);
                        // TODO: Store unsigned long 'v'
                    } else
                    if (size == 12) {
                        unsigned long long  v = va_arg(args, unsigned long long);
                        // TODO: Store unsigned long long 'v'
                    } else
                    if (size == 16) {
                        size_t  v = va_arg(args, size_t);
                        // TODO: Store size_t 'v'
                    } else
                    if (size == 32) {
                        uintmax_t  v = va_arg(args, uintmax_t);
                        // TODO: Store uintmax_t 'v'
                    } else {
                        // TODO: Unsupported conversion!
                    }
                    format++;
                    break;
                } else
                if (*format == 'e' || *format == 'E' || *format == 'f' || *format == 'F' ||
                    *format == 'g' || *format == 'G' || *format == 'a' || *format == 'A') {
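                    if (size == 12) {
                        long double  v = va_arg(args, long double);   // %Lf, %Le, ...
                        // TODO: Store long double 'v'
                    } else {
                        double  v = va_arg(args, double);             // float is promoted to double
                        // TODO: Store double 'v'
                    }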
                    format++;
                    break;
                } else
                if (*format == 'c') {
                    if (size == 0) {
                        unsigned char  v = (unsigned char)va_arg(args, int);
                        // TODO: Store single character 'v'
                    } else
                    if (size == 4 || size == 12) {
                        wint_t  v = va_arg(args, wint_t);
                        // TODO: Store wide character 'v'
                    } else {
                        // TODO: Unsupported conversion!
                    }
                    format++;
                    break;
                } else
                if (*format == 's') {
                    if (size == 0) {
                        const char  *p = va_arg(args, char *);
                        // TODO: Store string at 'p'
                    } else
                    if (size == 4 || size == 12) {
                        const wchar_t  *p = va_arg(args, wchar_t *);
                        // TODO: Store wide string at 'p'
                    } else {
                        // TODO: Unsupported conversion!
                    }
                    format++;
                    break;
                } else
                if (*format == 'p') {
                    uintptr_t  p = (uintptr_t)va_arg(args, void *);
                    // TODO: Store the address in 'p' (output would be in hexadecimal).
                    format++;
                    break;
                } else
                if (*format == 'n') {
                    // TODO: %n is unsupported!
                    format++;
                    break;
                }

                format++;
            }

        } else {
            const char *ends = format;

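            while (*ends != '\0') {
                if (ends[0] == '%' && ends[1] == '%')
                    ends += 2;      // Skip "%%" as a unit, so "%%d" is not taken as a conversion
                else
                if (*ends == '%' && ends[1] != '\0')
                    break;          // Start of a real conversion specification
                else
                    ends++;
            }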

            // NOTE: This keeps %% for simplicity.
            // TODO: Store (size_t)(ends - format) -character string starting at format.

            format = ends;
        }
    }

    va_end(args);

    // Fake return value, because we do not actually know the length of the string yet.
    return 0;
}
Essentially, you need to parse the format string and all format specifications.  The above does not support the positional %N$... format, where N is the index of the parameter to be converted, nor does it support %n, but something along these lines might suffice for your needs.  Since you store the exact value to be printed, I don't see any particular need to remember how it should be formatted, either.
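The idle-time half could then look roughly like the sketch below, assuming the TODOs store each value with a small kind tag (the struct saved_arg layout and replay() are hypothetical). It copies literal text and turns "%%" into '%'. Each conversion specification, with its flags, width and precision, goes to snprintf() together with exactly one saved value. The sketch does not handle '*', %N$, length modifiers, or specs longer than 14 characters.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct saved_arg {
    char kind;                                   /* 'i' int, 'd' double, 's' string */
    union { int i; double d; const char *s; } v;
};

size_t replay(char *out, size_t size, const char *fmt, const struct saved_arg *args) {
    size_t len = 0;
    if (size == 0)
        return 0;
    while (*fmt && len + 1 < size) {
        if (fmt[0] != '%') { out[len++] = *fmt++; continue; }
        if (fmt[1] == '%') { out[len++] = '%'; fmt += 2; continue; }

        char   spec[16];
        size_t n = 0;
        spec[n++] = *fmt++;                      /* the '%' itself */
        while (*fmt && n < sizeof spec - 2 && !strchr("diouxXcsfFeEgGaA", *fmt))
            spec[n++] = *fmt++;                  /* flags, width, precision */
        if (*fmt == '\0')
            break;                               /* truncated spec: stop */
        spec[n++] = *fmt++;                      /* conversion character */
        spec[n] = '\0';

        const struct saved_arg *a = args++;
        int w;
        if (a->kind == 'i')
            w = snprintf(out + len, size - len, spec, a->v.i);
        else if (a->kind == 'd')
            w = snprintf(out + len, size - len, spec, a->v.d);
        else
            w = snprintf(out + len, size - len, spec, a->v.s);
        if (w < 0)
            break;
        /* On truncation snprintf() filled the rest of out; clamp len to match. */
        len += ((size_t)w < size - len) ? (size_t)w : size - len - 1;
    }
    out[len] = '\0';
    return len;
}
```

Formatting one specification at a time keeps each call short, so the idle loop can spread a long message over several passes if needed.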

If you can check all the formatting strings the closed-source code does (you can find them using objdump/objcopy rather easily, since you should have the object files or the library in linkable form; ELF file.a being just a collection of ELF object file.o files; see your toolchain ar documentation), you can probably omit quite a few of the cases.  In particular, %lc and %ls are exceedingly rare in embedded environments, so you can probably omit <wchar.h>, wint_t, and wchar_t * type support.  You may not need double support at all.  And so on.

My own next step in your shoes would be to locate and inspect all the formatting strings used.  I would go as far as decompiling the closed-source file around each call to printf(), examining what the value of the first parameter (register or stack depending on function call ABI used) is, to check if any are dynamically generated, and to ensure I find all possible formatting strings I need to support.
 

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #46 on: January 18, 2025, 11:46:34 pm »
I'm fairly convinced that I'll end up doing some assembly to copy the stack containing the variadic arguments list and format string into a buffer, and then "reconstitute" it as a call to snprintf later.
That will not work, because the size of each argument varies, and <stdarg.h> does not provide a way to copy the parameters as a "context" for later reuse
...

I'm looking at the assembly and I think gcc stores the size of the variable arguments list using SP and R7. Using that size, I might be able to copy it to a buffer someplace. I think.

Although I'm not sure about the case where there are fewer than 4 arguments. I guess I just need to spend a bunch of time analyzing the machine code that gcc emits under each possible scenario.

below is some quick and dirty (nonfunctional) test code in godbolt that shows how the printf arguments list get passed around.

R0 is a pointer to the formatting string
R1 is the first argument
R2 is the second
R3 is the third
Then arguments 4 through 8 get put onto the stack
R7 is the stack value that the printf must set its stack to before returning (I think)
SP is the bottom/top of the arguments list.

The stack also seems to contain a copy of what the stack needs set to when the printf-like function returns (I think via R7 - correct me if I am wrong)
That value, minus the stack pointer position when the printf-like function is called should give the size of the variable argument list in bytes.

Then it is just a matter of copying it around, and calling printf with the same register conventions later.
« Last Edit: January 19, 2025, 12:22:09 am by incf »
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 21682
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #47 on: January 18, 2025, 11:55:52 pm »
The OP hasn't said why that library must be used, nor what the library presumed about the environment in which it is executing.

If the library source is available, tweak that.

If not, talk to the creator and pay them to do a better job.

If not, use a more powerful processor or processors. If the processor is only just fast enough as it is, what happens when a small enhancement is requested? Or the cache misses are pessimal? Or an extra interrupt occurs?

Disassembling the source or what is on the stack seems a very brittle approach. Even if it appears to work now, changes to the library source or compiler or execution environment could subtly break things.
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 21682
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #48 on: January 18, 2025, 11:59:24 pm »
Although I'm not sure about the case where there are fewer than 4 arguments. I guess I just need to spend a bunch of time analyzing the machine code that gcc emits under each possible scenario.

You mean this version of the compiler with specified flags. Compiler output changes regularly. Ditto the library code and whatever is used to compile the library.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 16100
  • Country: fr
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #49 on: January 19, 2025, 12:01:34 am »
Disassembling the source or what is on the stack seems a very brittle approach. Even if it appears to work now, changes to the library source or compiler or execution environment could subtly break things.

Yes. As I mentioned, printf is a whole family of functions and the library may use a whole bunch of different variants for its printf-formatting needs. Catching all of them looks, as I said, like a dead-end to me, but hey. Good luck.
 

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #50 on: January 19, 2025, 12:13:23 am »
The OP hasn't said why that library must be used, nor what the library presumed about the environment in which it is executing.

If the library source is available, tweak that.

If not, talk to the creator and pay them to do a better job.

If not, use a more powerful processor or processors. If the processor is only just fast enough as it is, what happens when a small enhancement is requested? Or the cache misses are pessimal? Or an extra interrupt occurs?

Disassembling the source or what is on the stack seems a very brittle approach. Even if it appears to work now, changes to the library source or compiler or execution environment could subtly break things.

This is running on some proprietary application specific hardware, and the third party software is very "special" (over 100kLOC, formally verified to be "correct", very big, very complicated, written by someone else, and not at all economically maintainable by us) which makes it effectively immutable. Imagine reading system calls from ROM. "Just rewrite it", "Just use a different processor", etc. is not likely to happen on this particular device. The complexity and cost would be astronomical compared to just writing 20 lines of assembly language.

I have a suspicion that the Arm Cortex-M0 ABI documents might codify how these variadic calling conventions work. I need to check.

If we have to lock down the toolchain version "forever" (use GCC version x.y.z) that is fine.

We are using memory protection, which lets us be fairly comfortable with shenanigans: if the printf process goes wrong, the system will fail gracefully.
« Last Edit: January 19, 2025, 12:42:00 am by incf »
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 21682
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #51 on: January 19, 2025, 12:52:54 am »
That's the first time I have heard something '...formally verified to be "correct"...' since the 80s. Since you mention "100kLOC" and "complicated", I presume the weasel word is "correct"; maybe it is a meaning unfamiliar to me.

What's the cost of it missing deadlines intermittently, because your testing didn't uncover the absolute worst case timing? More or less than 20 lines of code?

Probably unmaintainable by you, which leaves paying the library author to tweak their code.

Good luck.
« Last Edit: January 19, 2025, 12:57:34 am by tggzzz »
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7441
  • Country: fi
    • My home page and email address
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #52 on: January 19, 2025, 01:01:22 am »
I'm looking at the assembly and I think gcc stores the size of the variable arguments list using SP and R7.
No; SP is the stack pointer, and that version of GCC for that particular code just used R7 to store the original value.

For 32-bit ARM, see Procedure Call Standard for the Arm Architecture (32 bit) aka aapcs32; you can download the PDF under Assets from github.com/ARM-software/abi-aa/releases.

Additionally, because the values to be printed are variadic arguments, you also need to check the appropriate C standard (or its final free preprint) for the implicit conversions (the default argument promotions) applied to variadic arguments.  Essentially, all integer types smaller than int, signed or unsigned, are passed as an int (on 32-bit ARM, int can represent all their values, and va_arg() may also read such a value back as an unsigned int, since it fits both types), and float values are passed as a double.
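To make the promotions concrete, here is a minimal sketch (read_promoted is a hypothetical function) of a callee reading back a char, an unsigned short and a float. va_arg() has to be given the promoted types int, int and double; asking for char, short or float instead is undefined behaviour, and GCC warns about it.

```c
#include <stdarg.h>

/* Hypothetical variadic function whose caller passes: char, unsigned short,
 * float.  After the default argument promotions they arrive as int, int and
 * double, so those are the types va_arg() must be given. */
double read_promoted(int dummy, ...)
{
    va_list ap;
    va_start(ap, dummy);
    int c = va_arg(ap, int);        /* passed as char           */
    int us = va_arg(ap, int);       /* passed as unsigned short */
    double f = va_arg(ap, double);  /* passed as float          */
    va_end(ap);
    return c + us + f;
}
```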

As an example, consider godbolt.org/z/5eMar68G9, to see the Thumb code generated for ARMv7e-m using AAPCS32 (unknown-abi) for various variadic function calls using GCC 5.4.1 with -O2; or godbolt.org/z/MT934Eer6 for same on ARMv6-m for Cortex-M0.  No parameter count in sight.  The format string is always in R0, and 32-bit parameters are packed in order to R1, R2, R3, and stack; with 64-bit parameters packed to "even" registers and then to stack.

Again, at the source code level this is much simpler.  Not all "third party libraries" are closed-source, so unless you explicitly say "closed source" or "source not available" ("control" can mean anything), I will not assume that the sources are unavailable.
« Last Edit: January 19, 2025, 01:04:32 am by Nominal Animal »
 
The following users thanked this post: SiliconWizard

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 4072
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #53 on: January 19, 2025, 01:06:15 am »
Disassembling the source or what is on the stack seems a very brittle approach. Even if it appears to work now, changes to the library source or compiler or execution environment could subtly break things.

Yes. As I mentioned, printf is a whole family of functions and the library may use a whole bunch of different variants for its printf-formatting needs. Catching all of them looks, as I said, like a dead-end to me, but hey. Good luck.

He doesn't need to actually intercept printf.  The library allows him to register a callback which is used for logging.  The callback is required by the library  to have the same signature as printf.

The only question is how to record the va_list parameters for later processing.  The only reliable, portable way is to use the va_* macros which requires at least partial parsing of the format string to identify the number and type/size of arguments.  I think this is the best way to solve this problem. Scanning the format string should be quite fast.  You don't need to care about anything except the data size.  I strongly suggest implementing this first and seeing if it meets the performance goals. 

If that's still not fast enough, then finding a way to memcpy the entire varargs list would be faster.  There is no portable way to do that, and generally not even a reliable non-portable way to do it.
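A minimal sketch of the va_* approach described above, assuming the callback has printf()'s signature and that the format strings are literals that stay valid. The format is scanned only far enough to pick the right type for va_arg(), and each value is stored in a fixed-size slot without formatting anything. The names (log_capture_v, log_callback, struct log_record) are hypothetical, and a real version would append to a ring buffer rather than the single static record used here.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One captured argument; which member is valid is implied by the format. */
union log_arg {
    long long i;      /* every integer conversion, widened */
    double d;         /* %f %e %g %a (float arrives as double anyway) */
    const void *p;    /* %s and %p */
};

struct log_record {
    const char *fmt;          /* assumed to be a string literal */
    size_t nargs;
    union log_arg args[8];    /* arbitrary limit for this sketch */
};

/* Scans fmt only far enough to pick the va_arg() type of each conversion;
 * nothing is formatted.  Returns 0, or -1 for specifiers this sketch does
 * not handle (*, %N$, %n, L, %lc, %ls) or for more than 8 arguments. */
int log_capture_v(struct log_record *rec, const char *fmt, va_list ap)
{
    rec->fmt = fmt;
    rec->nargs = 0;
    while ((fmt = strchr(fmt, '%')) != NULL) {
        fmt++;
        if (*fmt == '%') {                        /* "%%" takes no argument */
            fmt++;
            continue;
        }
        fmt += strspn(fmt, "-+ #0123456789.");   /* flags, width, precision */
        size_t m = strspn(fmt, "hljzt");          /* length modifier */
        char mod[3] = { 0 };
        if (m > 2)
            return -1;
        memcpy(mod, fmt, m);
        fmt += m;
        char conv = *fmt++;
        if (conv == '\0' || rec->nargs >= sizeof rec->args / sizeof rec->args[0]
            || (mod[0] != '\0' && strchr("csp", conv)))
            return -1;
        union log_arg *a = &rec->args[rec->nargs++];

        if (conv == 'd' || conv == 'i' || conv == 'c') {
            if (!strcmp(mod, "ll"))     a->i = va_arg(ap, long long);
            else if (!strcmp(mod, "l")) a->i = va_arg(ap, long);
            else if (!strcmp(mod, "j")) a->i = va_arg(ap, intmax_t);
            else if (!strcmp(mod, "z")) a->i = (long long)va_arg(ap, size_t);
            else if (!strcmp(mod, "t")) a->i = va_arg(ap, ptrdiff_t);
            else                        a->i = va_arg(ap, int);  /* hh, h, none */
        } else if (strchr("ouxX", conv)) {
            if (!strcmp(mod, "ll"))     a->i = (long long)va_arg(ap, unsigned long long);
            else if (!strcmp(mod, "l")) a->i = (long long)va_arg(ap, unsigned long);
            else if (!strcmp(mod, "j")) a->i = (long long)va_arg(ap, uintmax_t);
            else if (!strcmp(mod, "z")) a->i = (long long)va_arg(ap, size_t);
            else if (!strcmp(mod, "t")) a->i = va_arg(ap, ptrdiff_t);
            else                        a->i = va_arg(ap, unsigned int);
        } else if (strchr("fFeEgGaA", conv)) {
            a->d = va_arg(ap, double);
        } else if (conv == 's') {
            a->p = va_arg(ap, char *);
        } else if (conv == 'p') {
            a->p = va_arg(ap, void *);
        } else {
            return -1;                            /* n, *, $, L, ... */
        }
    }
    return 0;
}

/* printf-compatible entry point to register with the library. */
static struct log_record last;

int log_callback(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int rc = log_capture_v(&last, fmt, ap);
    va_end(ap);
    return rc;
}
```

Storing every argument in an 8-byte slot uses more RAM than packing 32-bit words, but it keeps the capture loop short and portable, and RAM is the resource this system has to spare.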
 
The following users thanked this post: Siwastaja

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #54 on: January 19, 2025, 01:09:17 am »
That's the first time I have heard something 'formally verified to be "correct" ' since the 80s. I presume the weasel word is "correct"; maybe a meaning unfamiliar to me.

What's the cost of it missing deadlines intermittently, because your testing didn't uncover the absolute worst case timing? More or less than 20 lines of code?

Good luck.
We are not getting new silicon to work with; the system and the software are what they are, even if painfully slow. (I'm writing this mostly to stave off future waves of duplicate suggestions.)

I appreciate the concern.

I'd appreciate it if future discussion were limited to the proposed assembly language solution for C variadic arguments with GCC on Arm Cortex-M0. I'm fairly confident that all the other avenues have been suggested, thoroughly discussed, and weighed against each other. Assembly language "won" as the fastest, easiest, and "best bang for buck" way of solving the problem, even if it is a bit unpleasant. Edit: they were right

(The sidebar conversation will no doubt continue, and many will try to convince me of the error of my ways, but I tried: some methods are slower than others. We need speed at any cost!)
« Last Edit: January 19, 2025, 04:49:35 am by incf »
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7441
  • Country: fi
    • My home page and email address
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #55 on: January 19, 2025, 01:10:33 am »
The only reliable, portable way is to use the va_* macros which requires at least partial parsing of the format string to identify the number and type/size of arguments.
My example code in reply #45 does this parsing for all types supported by basic printf() implementations, except for positional arguments (%N$), variable field width and precision (the *, .* and *.* format specifiers), and %n.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 16100
  • Country: fr
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #56 on: January 19, 2025, 01:18:23 am »
Disassembling the source or what is on the stack seems a very brittle approach. Even if it appears to work now, changes to the library source or compiler or execution environment could subtly break things.

Yes. As I mentioned, printf is a whole family of functions and the library may use a whole bunch of different variants for its printf-formatting needs. Catching all of them looks, as I said, like a dead-end to me, but hey. Good luck.

He doesn't need to actually intercept printf.  The library allows him to register a callback which is used for logging.  The callback is required by the library  to have the same signature as printf.

The only question is how to record the va_list parameters for later processing.  The only reliable, portable way is to use the va_* macros which requires at least partial parsing of the format string to identify the number and type/size of arguments.  I think this is the best way to solve this problem. Scanning the format string should be quite fast.  You don't need to care about anything except the data size.  I strongly suggest implementing this first and seeing if it meets the performance goals. 

Ok, then yes. Just use the standard variadic argument access. Those macros are pretty lightweight, but yes, just like printf and any variadic function, that requires knowing the number and types of arguments in advance.
I agree that it should all be much faster than what the actual standard printf does, but you still have to scan the format to determine the arguments and their types, and then access these arguments (using the va_ macros as you said) and store them somewhere. A mockup callback that just logs the format strings themselves will show all the argument types they actually use, and you can then restrict your final function to that set of types.

I certainly second using the standard variadic macros, and I'm not sure how much more efficient you could get using assembly directly. The only "expensive" part will indeed be scanning the format to determine each type, but if you support, say, only a few of them (maybe int, unsigned int, float and string), this should be pretty fast. You can just look for unescaped % in the format string and, for each one, look for the character of one of the types you support (which may not come right after the % char), and there should only be a few. I don't see how else it could be done, as you can't analyze format strings at compile time in C; but even if you could (in C 2050?), here you don't even have access to the format string literals at compile time, as all you have at your disposal is a parameter passed to a callback function.
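The mockup callback suggested above could look something like this minimal sketch (survey_callback and survey_dump are hypothetical names). It never touches the arguments; it just remembers each distinct format pointer and which conversion letters occur, so that the real capture function can be limited to those. It assumes the library passes string literals; pointer comparison would not deduplicate formats assembled in a buffer at run time.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_FORMATS 256               /* arbitrary limit for this sketch */

static const char *seen_formats[MAX_FORMATS];
static size_t n_seen;
static unsigned char seen_conv[128];  /* seen_conv['d'] != 0 once %d appears */

/* Survey callback with printf()'s signature; meant for a test build only. */
int survey_callback(const char *fmt, ...)
{
    for (size_t i = 0; i < n_seen; i++)
        if (seen_formats[i] == fmt)
            return 0;                 /* already recorded and scanned */
    if (n_seen < MAX_FORMATS)
        seen_formats[n_seen++] = fmt;

    for (const char *p = fmt; (p = strchr(p, '%')) != NULL; ) {
        p++;
        if (*p == '%') {              /* "%%" is not a conversion */
            p++;
            continue;
        }
        /* Skip flags, width, precision, '*', '$' and length modifiers. */
        p += strspn(p, "-+ #0123456789.*$hljztL");
        if (*p == '\0')
            break;
        seen_conv[(unsigned char)*p & 0x7f] = 1;
        p++;
    }
    return 0;
}

/* Prints the survey, e.g. over the debug UART once the test run is over. */
void survey_dump(FILE *out)
{
    for (size_t i = 0; i < n_seen; i++)
        fprintf(out, "format: %s\n", seen_formats[i]);
    fputs("conversions used:", out);
    for (int c = 0; c < 128; c++)
        if (seen_conv[c])
            fprintf(out, " %%%c", c);
    fputc('\n', out);
}
```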
« Last Edit: January 19, 2025, 01:20:49 am by SiliconWizard »
 

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #57 on: January 19, 2025, 01:20:23 am »
I'm looking at the assembly and I think gcc stores the size of the variable arguments list using SP and R7.
No; SP is the stack pointer,
...
Yes, it's the stack pointer. And I think it communicates the size of the variable size argument list in the sample assembly I posted. Correct? (In the specific example that I posted)

I'm fairly sure foo() and call_foo() use the area between the stack pointer and R7 as the variable parameter area.

call_foo() in particular makes it clear what is stored where in the stack.

I feel like I need to learn how to setup a QEMU emulator for ARM Cortex-M0+.

« Last Edit: January 19, 2025, 01:30:51 am by incf »
 

Offline ledtester

  • Super Contributor
  • ***
  • Posts: 3430
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #58 on: January 19, 2025, 01:34:49 am »
Pretty much the exact same question on Stackoverflow:

https://stackoverflow.com/questions/1562992/best-way-to-store-a-va-list-for-later-use-in-c-c

The last response suggests there is a way to determine the number of arguments passed, but it might be runtime/architecture dependent.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7441
  • Country: fi
    • My home page and email address
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #59 on: January 19, 2025, 01:36:56 am »
I'm fairly sure foo() and call_foo() use the area between the stack pointer and R7 as the variable parameter area.
If you look at godbolt.org/z/MT934Eer6, you'll see it does not.

R7 is a register that functions must save (i.e., when they return it must have the same value it had when the function was called), but it has no other special meaning in aapcs32.  In your case, you only see it used as such because of the compiler options you are using.  If you use -Os or -O2 (as is common) and -march=armv6-m -mcpu=cortex-m0 -mthumb (as is usual for Cortex-M0 code), you'll see R7 is no longer used as such.

Of course, you could disassemble the closed-source library object code using printf() and verify, but I believe it would make more sense to do that to find out the string pointed to by R0 at each bl printf, and make sure you parse those correctly.
 

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #60 on: January 19, 2025, 01:46:28 am »
I'm fairly sure foo() and call_foo() use the area between the stack pointer and R7 as the variable parameter area.
If you look at godbolt.org/z/MT934Eer6, you'll see it does not.

R7 is a register that functions must save (i.e., when they return it must have the same value it had when the function was called), but it has no other special meaning in aapcs32.  In your case, you only see it used as such because of the compiler options you are using.  If you use -Os or -O2 (as is common) and -march=armv6-m -mcpu=cortex-m0 -mthumb (as is usual for Cortex-M0 code), you'll see R7 is no longer used as such.

Of course, you could disassemble the closed-source library object code using printf() and verify, but I believe it would make more sense to do that to find out the string pointed to by R0 at each bl printf, and make sure you parse those correctly.

I initially only saw your first example for a different ISA than I was using.

If I understand correctly, the return SP address is placed on the stack when there are more than 3 arguments (and my sample code just happened to use R7 at both ends to get the desired return SP value into and out of the stack)
And I suppose SP is not advanced prior to the call if there are exactly 4 arguments.

I think the example you posted follows the same stack format as my gcc 14.2 example? And it shows the stack pointer, and the stack pointer return address stored as part of the call at least on the longer variants. I suppose when there are fewer than 3 arguments, it does not have to do anything to restore the stack pointer.



I'm fairly certain that gcc has to maintain ABI compatibility over long periods of time so that people can link against precompiled libc libraries and successfully call printf regardless of optimization level.

edit: some of my personal description of the register calling conventions might be wrong. I need to spend a day or two playing around to really see how it works.

The different optimization levels have to be doing the same thing. If it explicitly stores the stack return address with low optimization, it must be doing the same at higher optimizations, even if the compiler achieves it in a less obvious/direct way.

edit2: at higher optimization levels gcc appears to move responsibility for setting up the stack pointer/return address "stuff" from the callee to the caller. And it often decides to "stomp" on the current stack and then fix it after the call returns.
« Last Edit: January 19, 2025, 02:31:19 am by incf »
 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 4072
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #61 on: January 19, 2025, 02:53:07 am »
Caller- vs callee-saved registers are defined by the platform ABI.  They can be found in the ABI documentation and don't depend on optimization level.  Low optimization levels may not use all registers, or may redundantly save and restore registers that don't need it, so from the generated code alone it may not be clear what the ABI actually requires and what is merely incidental.  On some ARM platforms the caller-saved registers are effectively enforced by the architecture, as they are automatically saved on interrupt.

On systems with downward growing stacks, including ARM, on entry to a function, the stack pointer points to the beginning (lowest address) of the parameter passing area.  Since most calling conventions push right-to-left that is "near" the first argument passed (or the first non-register argument).  From there it's an "easy" fixed offset to find the beginning of the variadic arguments.  Finding the end is much harder.

When using a frame pointer, the frame pointer generally points to the bottom of the previous function's stack frame.  In the simplest case this will also be the top of the parameter passing region.  However it's not that simple.  Frame pointers are optional and subject to optimization, and even if used there may be other stack data dynamically added below it aside from the function parameters: for instance parameters for other function calls, alloca() allocations, and other temporary variables spilled from the register file.

 

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #62 on: January 19, 2025, 03:06:42 am »
I feel as if the existence of va_copy() would require that gcc provide enough information to make a copy of the variadic arguments.

I also see that each call to a printf always seems to allocate two extra words of "unused" stack. I need to confirm, but I believe those two words are populated later in the compilation process (at link time?) with the real stack bounds for use with gcc's builtin va_* functions.

I sort of wonder if any standards codify it. I also wonder about interoperability of va_copy with other compilers. I feel like my best bet is to understand how gcc's builtin works, and do whatever it does.

edit: fn_abi_va_list and arm_build_builtin_va_list appear to be relevant
« Last Edit: January 19, 2025, 04:24:45 am by incf »
 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 4072
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #63 on: January 19, 2025, 03:41:13 am »
I feel as if the existence of va_copy() would require that gcc provide enough information to make a copy of the variadic arguments.

va_copy just makes a backup copy of the pointer to the first argument, allowing you to iterate over the arguments twice.  It doesn't copy or even access the arguments themselves at all.
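A small sketch of what va_copy() actually does (sum_twice is a hypothetical function): the copy lets the same arguments be traversed twice, but it duplicates only the traversal state, not the argument values, and like the original it is only usable while the variadic function's call is still active. That is why a va_list cannot be stored for processing after the function returns.

```c
#include <stdarg.h>

/* Sums 'count' int arguments twice: once through ap, and once through a
 * copy taken with va_copy() before the first pass started. */
int sum_twice(int count, ...)
{
    va_list ap, again;
    va_start(ap, count);
    va_copy(again, ap);            /* copies the position, not the values */

    int first = 0, second = 0;
    for (int i = 0; i < count; i++)
        first += va_arg(ap, int);
    for (int i = 0; i < count; i++)
        second += va_arg(again, int);

    va_end(again);
    va_end(ap);
    return first + second;
}
```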
 

Online Analog Kid

  • Super Contributor
  • ***
  • Posts: 1560
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #64 on: January 19, 2025, 03:55:14 am »
For example, if it might be possible to copy a chunk of the printf stack? Having to parse the format string (to get the number and type of arguments) and copy the variadic arguments one by one is a step that I would not mind pushing off to a background thread.

You shouldn't have to parse anything. It's not as if you're going to be analyzing the text used to make the printf() call:
Code:
    printf (&formatString, arg1, arg2, arg3);

Since the arguments are pushed on the stack, you only need to look at the stack to retrieve them. Nope; see below. And the format string that tells printf() how to interpret the arguments is just another argument (a pointer to a string), one that you'll be passing to the "real" printf() once you have the time to do so, so no need to parse that either.

Or you could write your own printf() entirely. I did that years back, in assembly language (8086, for PC DOS), and it wasn't at all difficult. I wrote a stripped-down version, since I only needed to handle the following formats:

o %d
o %u
o %x
o %s


and of course various literals (\n, \t, \c, etc.)

If you can't do it in C you could use the microprocessor's assembly language.

Ack; it's been quite some time since I even used my printf(), so I forgot one basic thing that makes it a lot easier:
You don't have to look at the stack to determine how many arguments there are. You only need to look at the format string to see how many format specifiers (%d, %s, etc.) there are. There will be an argument for each specifier. (Assuming, of course, that the format string and argument list are properly defined. If they're not, woe be to the programmer anyhow.)

So you do need to parse the format string, looking for percent signs, but that turns out to be fairly trivial.
I could post the (assembly language) code if you like.
« Last Edit: January 19, 2025, 04:21:11 am by Analog Kid »
 

Offline cv007

  • Frequent Contributor
  • **
  • Posts: 878
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #65 on: January 19, 2025, 04:16:27 am »
Quote
I already added buffering
Buffer or no buffer, the uart hardware has the ultimate say in the matter. First step is to get the uart running as fast as possible. I run any uart debug output at 1MBaud, and any modern mcu like an M0+ will have no problem doing so. Depending on the combination of mcu speed and printf implementation, you will ultimately be limited either by the printf implementation or by the uart hardware. If the printf code becomes the limiting factor then a buffer is of little use; if the uart hardware is the limiting factor then a buffer could mean you simply wait 'later', but still wait.

If you have a Segger debugger, external or built into a dev board, there is also Segger RTT, which can move data quickly. It is easy enough to redirect existing printf output to Segger RTT. It has blocking (wait) and non-blocking (drop) options.

I think I would first figure out how much debug data is being generated. Create code that simply replaces the printf destination and only counts bytes. With that info you will at least be able to tell whether it is even doable at the baud rate you are using. It could be that the uart can output the data at that rate, but its hardware buffer is shallow and you end up in the TX-buffer-empty interrupt quite often. If you have DMA, that would be an option to eliminate a high rate of uart interrupts.
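A minimal sketch of such a byte-counting destination, assuming the library lets you install a printf-style callback (counting_callback and min_baud are hypothetical names). Note that vsnprintf() with a NULL buffer still does all the formatting work, so this measures the volume of debug output, not the cost of a faster path, and belongs in a test build.

```c
#include <stdarg.h>
#include <stdio.h>

static unsigned long debug_bytes;   /* total formatted bytes so far */

/* Writes nothing anywhere; only asks vsnprintf() how long the output
 * would have been and adds that to the running total. */
int counting_callback(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (n > 0)
        debug_bytes += (unsigned long)n;
    return n;
}

/* With 8N1 framing each byte costs 10 bit times on the wire, so this is the
 * minimum baud rate (rounded up) that keeps up with the measured volume. */
unsigned long min_baud(unsigned long bytes, unsigned long seconds)
{
    return seconds ? (bytes * 10UL + seconds - 1) / seconds : 0;
}
```

For example, 6000 bytes in 60 seconds works out to at least 1000 baud, before leaving any headroom for bursts.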
« Last Edit: January 19, 2025, 07:40:30 am by cv007 »
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7441
  • Country: fi
    • My home page and email address
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #66 on: January 19, 2025, 04:36:31 am »
If I understand correctly, the return SP address is placed on the stack
No.  The aapcs32 ABI for Cortex-m0 says that return address is in LR, R0-R3 contain parameters, and any additional parameters are on stack.  It is the caller that needs to reset the stack pointer after a call with additional parameters passed on stack, because the callee, the function being called, must preserve R4-R11 and SP (=R13).  There is no way to find out in the callee how many parameters were actually supplied, unless the caller tells it in an always-passed parameter (like the format string).

In your screenshots, the compiler options are such that it just happened to use R7 to store the old value of SP, so that [SP, R7) contained additional parameters.  The compiler is absolutely free to save the old value of SP whatever way it wishes.  Indeed, in the linked example, the compiler often simply adds to SP the amount of stack space used (less LR), then pops the saved LR value directly to PC.  (This is because bx lr is equivalent to push {lr}, pop {pc}.)

Put simply, when printf() gets called using aapcs32 ABI on Cortex-M0, all it knows is that R0 contains the address to the format string, and R1-R3 and possibly stack contain the parameters referred to.  Because the parameters are all of basic types (signed and unsigned char, short, int, long, long long, float, double) or types mapping to basic types (size_t, possibly ssize_t, intmax_t, uintmax_t, and ptrdiff_t), in this ABI they occupy either one or two 32-bit words.  The format string is the only one that can tell you how many and what the supplied parameters are.

Essentially, there is no way in aapcs32 ABI on Cortex-M0 (or in most other ABIs) for the func() function implementation to be able to differentiate between
    func(5);
    func(5, 4);
    func(5, 4, 3);
    func(5, 4, 3, 2);
    func(5, 4, 3, 2, 1);
    func(5, 4, 3, 2, 1, 0);
calls.  See godbolt.org/z/efTc8sKb7 for proof.

I feel as if the existence of va_copy() would require that gcc provide enough information to make a copy of the variadic arguments.
No.

In essence, va_list could simply be a signed integer, with negative values describing registers, first parameter corresponding to the most negative value, and zero and positive values to stack offsets.  Then, va_start() initializes it to the value corresponding to the first variadic parameter, va_end() does nothing, va_arg() returns the one or two-word value at the register or stack offset and advances it accordingly, and va_copy() copies the current signed integer to another va_list variable.
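To illustrate the signed-index idea, here is a toy model (not GCC's actual va_list) that applies the aapcs32 core-register rules to a snapshot of r0-r3 plus the caller's stack area, assuming that area starts 8-byte aligned. A negative index selects r0-r3, a non-negative one a stack word, and 64-bit values start at an even register or an 8-byte-aligned stack slot, so r1 or r3 may be skipped. struct toy_frame and the toy_ functions are hypothetical; Cortex-M0 has no FPU, so double is passed in core registers like long long.

```c
#include <stdint.h>

/* Snapshot of the argument registers and the stack-passed arguments. */
struct toy_frame {
    uint32_t r[4];        /* r0..r3 as they were at the call */
    const uint32_t *sp;   /* stack-passed arguments, sp[0] first */
};

typedef int toy_va_list;  /* -4..-1 select r0..r3, 0.. select sp[index] */

static uint32_t toy_word(const struct toy_frame *f, toy_va_list i)
{
    return i < 0 ? f->r[4 + i] : f->sp[i];
}

/* One-word argument: int, unsigned, pointer, promoted char/short. */
uint32_t toy_va_arg32(const struct toy_frame *f, toy_va_list *ap)
{
    return toy_word(f, (*ap)++);
}

/* Two-word argument: long long or double. */
uint64_t toy_va_arg64(const struct toy_frame *f, toy_va_list *ap)
{
    if ((*ap + 4) % 2 != 0)   /* round up to an even register or slot; */
        (*ap)++;              /* from r3 this moves to sp[0]           */
    uint64_t lo = toy_word(f, (*ap)++);
    uint64_t hi = toy_word(f, (*ap)++);
    return lo | (hi << 32);   /* little-endian: low word first */
}
```

For printf(fmt, ...) the index would start at -3, because r0 holds the format pointer.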

Most architectures do it somewhat like this, except that certain types of values may be stored in a separate register file.  For example, on SYSV ABI on x86-64, xmm0 to xmm7 registers are used to store the first eight double parameters.  Thus, va_list may be a pair of indices, or even a bitmap.  As the va_list type variable is passed as the first argument to the va_ functions, and C passes simple types by value, the exact implementation (in the C library stdarg.h) can be quite funky; for GCC, these are implemented as compiler built-ins (__builtin_va_start(list,param), __builtin_va_arg(list,type), __builtin_va_copy(listcopy,list), and __builtin_va_end(list)).

For example, GCC 9.2.1 for aapcs32 and Cortex-M0 tends to push {r0, r1, r2, r3} at the beginning of a variadic function implementation, so that all the parameters are actually on the stack in order: r0 at SP, then r1, r2, r3, followed by any parameters pushed to the stack by the caller.  You can see this clearly in godbolt.org/z/efTc8sKb7.  Note that r4 is sometimes pushed to the stack even when it is not used, only to keep the stack double-word aligned; the number of registers pushed is always even.  (Clang tends to use r7, so it is not necessarily r4; it could just as well be any of r4-r8 or r10.)  Also note that GCC generates non-optimal code here; it does not need to preserve r0-r3, but sometimes does, using an unnecessary amount of stack.  Function bodies in aapcs32 must preserve r4-r11 and SP (r9 might be special).
In any case, the func() implementation will not receive any information it could use to determine how many parameters were supplied, as you can see.  That must be provided by the fixed parameter(s), which in the case of printf() is the format string.

You can see the aapcs32 core register use here.
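A minimal portable illustration of the point above, using a hypothetical `sum_twice()`: the callee learns how many arguments it received only from a fixed parameter (here an explicit count; printf() derives it from the format string), and va_copy() duplicates just the traversal position, so both copies walk the same arguments.

```c
#include <stdarg.h>

/* Sum 'count' int arguments twice: once through the original va_list and
   once through a va_copy() of it. The count must come from the caller. */
int sum_twice(int count, ...) {
    va_list ap, ap2;
    int total = 0;

    va_start(ap, count);
    va_copy(ap2, ap);            /* copies the cursor, not the argument values */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    for (int i = 0; i < count; i++)
        total += va_arg(ap2, int);
    va_end(ap2);
    va_end(ap);
    return total;
}
```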
« Last Edit: January 19, 2025, 04:39:39 am by Nominal Animal »
 

Offline incf (Topic starter)

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #67 on: January 19, 2025, 05:08:34 am »
You all are right.

I genuinely have no choice but to spend hundreds of cycles scanning the format string so that the stack pointer can get set back to the correct position when the function returns (at typical optimization levels).

I am honestly a little bit relieved, and a bit horrified at the same time. The variadic ABI is not what I expected (at high optimization levels). While I now know exactly what hypothetical* things I would have to do to make stack copying possible, the changes would be far more invasive than cobbling together several layers of standards-compliant C.

*-finstrument-functions and -fprofile-filter-files=printf_wrapper.c
« Last Edit: January 19, 2025, 11:51:48 am by incf »
 

Offline tellurium

  • Frequent Contributor
  • **
  • Posts: 303
  • Country: ua
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #68 on: January 19, 2025, 09:22:16 am »
What's the main source of slowness though? Is it formatting, or output?
Some answers address the former, some answers address the latter (e.g. _write override). OP, do you have numbers for both?
Open source embedded network library https://github.com/cesanta/mongoose
TCP/IP stack + TLS1.3 + HTTP/WebSocket/MQTT in a single file
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 16100
  • Country: fr
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #69 on: January 19, 2025, 09:23:43 am »
But as I said, I think you can implement a pretty efficient scan of the format strings, as all you need is to identify the type of the arguments. You could make the scanning much faster by doing it in 32-bit chunks instead of byte by byte, but be aware that the Cortex-M0 doesn't support unaligned access, so if the format string isn't 4-byte aligned, you'd have to scan up to 3 bytes first and then proceed 32 bits at a time. I think that's probably worth a shot.
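A sketch of the 32-bit chunk idea, with a hypothetical `find_percent_or_nul()`. It uses the well-known SWAR "has a zero byte" test, once on the word itself (to find the terminating nul) and once on the word XORed with 0x25252525 (to find '%'). It assumes that reading the rest of the aligned word containing the terminator is harmless, which usually holds on a Cortex-M0 without an MMU and is what word-at-a-time strlen() implementations assume, but it is not strictly conforming C.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if any byte of w is 0x00 (classic SWAR test; exact as a yes/no answer). */
static inline uint32_t has_zero_byte(uint32_t w) {
    return (w - UINT32_C(0x01010101)) & ~w & UINT32_C(0x80808080);
}

/* Return a pointer to the first '%' or to the terminating nul of s. */
const char *find_percent_or_nul(const char *s) {
    /* Byte by byte until s is 4-byte aligned (no unaligned loads on Cortex-M0). */
    while (((uintptr_t)s & 3u) != 0) {
        if (*s == '%' || *s == '\0')
            return s;
        s++;
    }
    /* One aligned word at a time; stop at the first word holding a nul or a '%'.
       This may read up to 3 bytes past the nul, within the same aligned word. */
    for (;;) {
        uint32_t w;
        memcpy(&w, s, sizeof w);
        if (has_zero_byte(w) || has_zero_byte(w ^ UINT32_C(0x25252525)))
            break;
        s += 4;
    }
    /* Locate the exact byte inside that word. */
    while (*s != '%' && *s != '\0')
        s++;
    return s;
}
```

GCC may still emit byte loads for the memcpy() unless it can prove the pointer is aligned, so check the disassembly if the speedup matters.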
 

Offline magic

  • Super Contributor
  • ***
  • Posts: 7538
  • Country: pl
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #70 on: January 19, 2025, 09:28:22 am »
Quote
Certainly I second using the standard variadic macros, and I'm not sure how much more efficient you could get using assembly directly. The only "expensive" part will indeed be scanning the format to determine each type, but if you support, say, only a few of them (maybe int, unsigned int, float and string), this should be pretty fast. You can just look for unescaped % in the format string and, for each %, look for the character of one of the types you support (which may not be right after the % char), and there should only be a few. I don't see how else it could be done, as you can't analyze format strings at compile time in C; but even if you could (in C 2050?), here you don't even have access to the format string literals at compile time, as all you have at your disposal is a parameter passed to a callback function.

This can be done ahead of time, if you extract format strings from the binary blob and generate a C function for each of them.
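A sketch of the scan-and-capture approach for a small subset of conversions. The names (`struct capture`, `capture_va()`, `capture()`) and the 16-word limit are hypothetical; it assumes format strings live in .rodata (so only the pointer is stored), that '*' widths and floating point never appear, and a 32-bit target where a pointer fits in one word.

```c
#include <stdarg.h>
#include <stdint.h>
#include <string.h>

#define MAX_WORDS 16   /* hypothetical per-call limit */

struct capture {
    const char *fmt;            /* assumed to live in .rodata; only the pointer is kept */
    uint32_t    word[MAX_WORDS];
    uint8_t     nwords;
};

/* Walk fmt and fetch each argument with the type its conversion implies.
   Handles flags, width, precision, h/l/ll and d i u x X o c s p.
   Not handled: '*' width/precision, floating point, j/z/t modifiers. */
void capture_va(struct capture *c, const char *fmt, va_list ap) {
    c->fmt = fmt;
    c->nwords = 0;
    for (const char *p = fmt; *p; p++) {
        if (*p != '%')
            continue;
        if (*++p == '%')
            continue;                       /* "%%" consumes no argument */
        while (*p && strchr("-+ #0123456789.", *p))
            p++;
        int longs = 0;
        while (*p == 'l') { longs++; p++; }
        while (*p == 'h') p++;              /* short and char arrive promoted to int */
        if (*p == '\0' || c->nwords + 2 > MAX_WORDS)
            break;                          /* malformed tail, or out of room */
        int is_signed = (*p == 'd' || *p == 'i' || *p == 'c');
        if (longs >= 2) {                   /* long long: low word, then high word */
            uint64_t v = is_signed ? (uint64_t)va_arg(ap, long long)
                                   : (uint64_t)va_arg(ap, unsigned long long);
            c->word[c->nwords++] = (uint32_t)v;
            c->word[c->nwords++] = (uint32_t)(v >> 32);
        } else if (*p == 's' || *p == 'p') {
            /* One word on a 32-bit target; truncated on a 64-bit host. */
            c->word[c->nwords++] = (uint32_t)(uintptr_t)va_arg(ap, void *);
        } else if (longs == 1) {
            c->word[c->nwords++] = is_signed ? (uint32_t)va_arg(ap, long)
                                             : (uint32_t)va_arg(ap, unsigned long);
        } else {
            c->word[c->nwords++] = is_signed ? (uint32_t)va_arg(ap, int)
                                             : (uint32_t)va_arg(ap, unsigned int);
        }
    }
}

/* Variadic front end taking printf-like arguments. */
void capture(struct capture *c, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    capture_va(c, fmt, ap);
    va_end(ap);
}
```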
 

Offline incf (Topic starter)

Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #71 on: January 19, 2025, 12:49:52 pm »
If I had the freedom to change the third party library, now that I've learned what I currently know (the variadic ABI does not pass a return stack address, nor any size or argument count metadata, etc. etc.) the optimal way to solve this would have been:
  • write a program to scan for each type of printf-like function call (there are a handful of different flavors) and swap it with a fixed function call. For example, my_printf("foo %d bar %x baz %s", 1, 2, "qux") would become my_printf_dxs("foo %d bar %x baz %s", 1, 2, "qux")
  • (optional) write a program to do the reverse, so that one can verify that the modified software is identical to what you started with (you don't necessarily know it worked properly each time on the >100kLOC library unless you can perform a diff against the original source; not to mention that each new release of the library will have to be modified in the same way to avoid manually merging/maintaining it and inevitably making mistakes)
  • (I assume it could be made to work around some calls being several layers of macros versus others being straight function calls, by storing the original line in a comment or something; there are lots of multi-line statements)
  • manually implement a bunch of wrappers for my_printf_xxx to store arguments lists in buffers. Which would then be processed later
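For illustration only, this is roughly what one of the generated fixed-signature wrappers from the plan above might look like: `my_printf_dxs()` just stores its arguments in a single-producer, single-consumer ring of fixed-size records, and the idle loop pops and formats them later. `struct rec_dxs`, `pop_dxs()` and the 32-entry ring are hypothetical; C11 atomics order the record stores before the index update, and the `%s` argument must point to storage that outlives the record.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One fixed-shape record for call sites whose format is "... %d ... %x ... %s". */
struct rec_dxs {
    const char *fmt;      /* points into .rodata; never copied */
    int32_t     d;
    uint32_t    x;
    const char *s;        /* must point to storage that outlives the record */
};

#define RING_LEN 32u      /* hypothetical size; must be a power of two */
static struct rec_dxs ring[RING_LEN];
static _Atomic uint32_t head, tail;   /* head: written by producer, tail: by consumer */

/* Real-time side: a few stores, no formatting. Returns -1 (drops) when full. */
int my_printf_dxs(const char *fmt, int32_t d, uint32_t x, const char *s) {
    uint32_t h = atomic_load_explicit(&head, memory_order_relaxed);
    uint32_t t = atomic_load_explicit(&tail, memory_order_acquire);
    if (h - t >= RING_LEN)
        return -1;
    struct rec_dxs *r = &ring[h % RING_LEN];
    r->fmt = fmt; r->d = d; r->x = x; r->s = s;
    atomic_store_explicit(&head, h + 1, memory_order_release);   /* publish */
    return 0;
}

/* Idle side: copies out the oldest record and returns 1, or returns 0 if empty. */
int pop_dxs(struct rec_dxs *out) {
    uint32_t t = atomic_load_explicit(&tail, memory_order_relaxed);
    uint32_t h = atomic_load_explicit(&head, memory_order_acquire);
    if (t == h)
        return 0;
    *out = ring[t % RING_LEN];
    atomic_store_explicit(&tail, t + 1, memory_order_release);   /* free the slot */
    return 1;
}
```

In the idle loop, something like `while (pop_dxs(&r)) printf(r.fmt, (int)r.d, (unsigned)r.x, r.s);` would then produce the deferred output.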

Now, I'm unlikely to do that, since it seems too large a job to do correctly. A string scanner will likely be adequate.

Side note for future reference: it does occasionally emit about 10kb of text in less than 100ms at one less-critical point. That would mean at least ~50000 extra cycles (about 5 clocks per string byte to copy and dispatch through a jump table), or roughly 5 ms at ~10MHz. Not great.
« Last Edit: January 19, 2025, 01:10:32 pm by incf »
 

Offline cv007

  • Frequent Contributor
  • **
  • Posts: 878
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #72 on: January 19, 2025, 01:35:05 pm »
Quote
occasionally emit about 10kb of text in less than 100ms at one less-critical point
That either means 1Mbaud or 100kbaud depending on what your b in 10kb means.

What is your UART speed, what is your MCU clock speed, what speed is your MCU capable of, is the UART moving the data via DMA or interrupts, and do you have a Segger debugger in use, so that Segger RTT is available?

Speed is going to be the best solution.
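The arithmetic behind those two figures, assuming 8N1 framing (start bit + 8 data bits + stop bit = 10 bit times per byte) and decimal kilo; without framing overhead the second case is the 100 kbit/s quoted above:

```latex
\begin{aligned}
\frac{10\,000\ \text{bytes}}{0.1\ \text{s}} &= 100\,000\ \text{bytes/s}
  &&\Rightarrow\; 100\,000 \times 10 = 1\,000\,000\ \text{baud} \\
\frac{10\,000\ \text{bits}}{0.1\ \text{s}} &= 100\,000\ \text{bit/s}
  &&\Rightarrow\; 100\,000 \times \tfrac{10}{8} = 125\,000\ \text{baud}
\end{aligned}
```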
 

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 9528
  • Country: fi
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #73 on: January 19, 2025, 01:47:40 pm »
Maybe something like this:

* Have a buffer in RAM,
* Instead of copying characters from format string, store a pointer to the format string (because it's very likely at statically allocated .data/.rodata address)
* Parse the string enough to determine sizes (as in Nominal's example code); copy the data as is to the buffer
* When you have time, process the buffer by fetching the format string and running fully implemented printf - you can run again your own code which processes the format string to know how much size each argument occupies.
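A sketch of the "process the buffer later" step, assuming every argument is a 32-bit integer or a pointer to static storage. Since C offers no portable way to rebuild a va_list, the hypothetical `replay()` walks the format again and passes each conversion spec to snprintf() separately with a correctly typed value; `struct deferred` and the 8-argument limit are made up for the example.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_ARGS 8   /* hypothetical per-record limit */

/* One deferred call: the (static) format string plus raw argument words. */
struct deferred {
    const char *fmt;
    uintptr_t   arg[MAX_ARGS];
    uint8_t     nargs;
};

/* Expand a record into out (always nul-terminated); returns the length.
   Supports %d %i %u %x %X %o %c %s %p with flags/width/precision. h/l
   modifiers are dropped, which is only valid where int and long match. */
size_t replay(const struct deferred *d, char *out, size_t size) {
    const char *p = d->fmt;
    size_t len = 0;
    unsigned ai = 0;

    if (size == 0)
        return 0;
    out[0] = '\0';
    while (*p && len + 1 < size) {
        if (*p != '%' || p[1] == '%') {          /* literal text or "%%" */
            out[len++] = *p;
            p += (*p == '%') ? 2 : 1;
            out[len] = '\0';
            continue;
        }
        char spec[16];
        size_t n = 0;
        spec[n++] = *p++;                         /* the '%' */
        while (*p && strchr("-+ #0123456789.", *p)) {
            if (n >= sizeof spec - 2)
                return len;                       /* spec too long: stop here */
            spec[n++] = *p++;
        }
        while (*p == 'h' || *p == 'l')
            p++;
        if (*p == '\0')
            break;                                /* malformed trailing '%' */
        char conv = *p++;
        spec[n++] = conv;
        spec[n] = '\0';

        uintptr_t v = (ai < d->nargs) ? d->arg[ai++] : 0;
        int w;
        switch (conv) {
        case 's': w = snprintf(out + len, size - len, spec, (const char *)v); break;
        case 'p': w = snprintf(out + len, size - len, spec, (void *)v); break;
        case 'd': case 'i': case 'c':
                  w = snprintf(out + len, size - len, spec, (int)(unsigned)v); break;
        default:  w = snprintf(out + len, size - len, spec, (unsigned)v); break;
        }
        if (w < 0)
            break;
        /* snprintf() reports the untruncated length; count only what fit. */
        len += ((size_t)w < size - len) ? (size_t)w : size - len - 1;
    }
    return len;
}
```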
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 21682
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #74 on: January 19, 2025, 02:10:16 pm »
Quote
Maybe something like this:

* Have a buffer in RAM,
* Instead of copying characters from format string, store a pointer to the format string (because it's very likely at statically allocated .data/.rodata address)
* Parse the string enough to determine sizes (as in Nominal's example code); copy the data as is to the buffer
* When you have time, process the buffer by fetching the format string and running fully implemented printf - you can run again your own code which processes the format string to know how much size each argument occupies.

At runtime determine the address of a format string.
If it is the first time that address has been seen, parse it and store the definition somewhere in RAM.
If the address has been seen before, use the previously parsed definition.

Of course you have to hope:
  • format strings aren't dynamically created
  • you see all the format strings "early" in the execution run
  • no previously unseen format strings have to be parsed during a panic with short latency requirements
  • you don't care about the jitter introduced by parsing and lookup
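A sketch of the address-keyed lookup described above, as a small direct-mapped cache. `fmt_nargs()`, the 64-slot size and the miss counter are hypothetical; it only records the argument count, assumes no '*' widths and no 64-bit arguments, and on a collision it simply re-parses and evicts the old entry.

```c
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 64u   /* hypothetical size; must be a power of two */

struct fmt_entry { const char *fmt; uint8_t nargs; };
static struct fmt_entry fmt_cache[CACHE_SLOTS];
static unsigned fmt_parses;   /* number of cache misses, for illustration */

/* Count argument-consuming conversions (no '*', no 64-bit types assumed). */
static uint8_t count_args(const char *f) {
    uint8_t n = 0;
    while (*f) {
        if (*f++ != '%')
            continue;
        if (*f == '%') { f++; continue; }         /* literal percent */
        while (*f && strchr("-+ #0123456789.hlzjt", *f))
            f++;                                  /* flags, width, precision, length */
        if (*f) { f++; n++; }                     /* the conversion character */
    }
    return n;
}

/* Look the format up by its address; parse it only on a miss. */
uint8_t fmt_nargs(const char *fmt) {
    struct fmt_entry *e = &fmt_cache[((uintptr_t)fmt >> 2) & (CACHE_SLOTS - 1u)];
    if (e->fmt != fmt) {
        e->fmt = fmt;
        e->nargs = count_args(fmt);
        fmt_parses++;
    }
    return e->nargs;
}
```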

Your system isn't hard realtime, and does have sufficient "excess" processing power. Doesn't it?

So, what's the application domain? I hope it is neither military nor healthcare.

My bet is the FinTech industry. They do weird things[1] with strange constraints, and aren't known for being averse to getting things wrong occasionally. I've seen, ahem, less-than-stellar hardware and software deployed, which matches the "defined to be correct" and "can't modify" constraints.

[1] e.g.
  • buy up microwave telecom towers/masts between Chicago and New York, because the speed of light in air is faster than in glass
  • implement business rules in hardware, i.e. FPGAs
  • lay dedicated transatlantic fibre cables, to ensure latency and bandwidth isn't affected by third parties
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Offline incf (Topic starter)

Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #75 on: January 19, 2025, 02:13:18 pm »
Quote
occasionally emit about 10kb of text in less than 100ms at one less-critical point
That either means 1Mbaud or 100kbaud depending on what your b in 10kb means.

What is your UART speed, what is your MCU clock speed, what speed is your MCU capable of, is the UART moving the data via DMA or interrupts, and do you have a Segger debugger in use, so that Segger RTT is available?

Speed is going to be the best solution.
We already have a ~10kB thread-local buffer (larger than the maximum output), which is copied to the UART thread's ~10kB buffer when real-time stuff is not occurring, so IO speed is not an issue.
« Last Edit: January 19, 2025, 04:18:35 pm by incf »
 

Offline incf (Topic starter)

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #76 on: January 19, 2025, 02:39:31 pm »
Quote
Your system isn't hard realtime, and does have sufficient "excess" processing power. Doesn't it?

So, what's the application domain? I hope it is neither military nor healthcare.

Nothing special. Mostly proprietary wireless communication with a small amount of measurement, system control, etc. on networks of battery-powered devices that talk to each other. A lot of logging to flash. Slow, low-power Cortex-M0-type processor. Real-time scheduling of network communications plus a bit of external device control and data crunching. (It cannot cope with deadlines missed due to events like printfs that happen after it has decided to do something. Unfortunately, it has to decide and commit to the network that it is going to do something before it actually does anything loggable, which means everything important happens after it has irreversibly committed itself to meeting a particular deadline.)

The device has to move a chunk of data through the wireless network at a precisely scheduled time, and that process coexists (at a higher priority than everything else) with a pile of other software and state machines that influence the behavior of the networking process.

It is asleep most of the time. If it did not printf it would work fine, but it has to printf. Interactions between different components in the system/network are complex enough that thorough logging inside the device simply is a requirement. And changing third party software is difficult for technical and nontechnical reasons - and simply won't happen unless all other avenues have been exhausted.
« Last Edit: January 19, 2025, 03:12:41 pm by incf »
 

Offline incf (Topic starter)

Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #77 on: January 19, 2025, 03:17:35 pm »
Quote
What's the main source of slowness though? Is it formatting, or output?
Some answers address the former, some answers address the latter (e.g. _write override). OP, do you have numbers for both?
It's all formatting, plus a tiny bit of parsing/copying that occurs as part of snprintf() into a local exclusive buffer. The IO latency portion has been solved with lots of buffering.
« Last Edit: January 19, 2025, 03:21:28 pm by incf »
 

Offline incf (Topic starter)

Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #78 on: January 19, 2025, 03:20:21 pm »
Quote
But as I said, I think you can implement a pretty efficient scan of the format strings, as all you need is to identify the type of the arguments. You could make the scanning much faster by doing it in 32-bit chunks instead of byte by byte, but be aware that the Cortex-M0 doesn't support unaligned access, so if the format string isn't 4-byte aligned, you'd have to scan up to 3 bytes first and then proceed 32 bits at a time. I think that's probably worth a shot.

I agree, it is worth a shot.

(I think most of the suggestions about parsing, buffering, va_list/va_arg, etc. are correct - I appreciate the sample implementations)
« Last Edit: January 19, 2025, 03:21:53 pm by incf »
 

Online tggzzz

Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #79 on: January 19, 2025, 03:43:04 pm »
Quote
Your system isn't hard realtime, and does have sufficient "excess" processing power. Doesn't it?

So, what's the application domain? I hope it is neither military nor healthcare.

Nothing special. Mostly proprietary wireless communication with a small amount of measurement, system control, etc. on networks of battery-powered devices that talk to each other. A lot of logging to flash. Slow, low-power Cortex-M0-type processor. Real-time scheduling of network communications plus a bit of external device control and data crunching. (It cannot cope with deadlines missed due to events like printfs that happen after it has decided to do something. Unfortunately, it has to decide and commit to the network that it is going to do something before it actually does anything loggable, which means everything important happens after it has irreversibly committed itself to meeting a particular deadline.)

The device has to move a chunk of data through the wireless network at a precisely scheduled time, and that process coexists (at a higher priority than everything else) with a pile of other software and state machines that influence the behavior of the networking process.

When I've designed such systems, any one of those constraints would be sufficient for me to completely avoid printf! It isn't difficult; just use puts() and putchar() of hex numbers. Let the log consumer format them as decimal digits or pixels on a graph or whatever.

Shame you can't use a cluebat on the library creator.
 

Offline cv007

Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #80 on: January 19, 2025, 04:52:38 pm »
I guess you already said in the first post that it's the printf code that is the problem. I would also assume you are running your M0+ as fast as possible, at least when it is needed.

I didn't see mentioned anywhere which printf library is in use, but if it's a CM0+, maybe it's newlib. If the itoa in the printf engine is easily replaced, maybe the number conversion code can be changed to always output hex (a leading 0x can be added). That would shorten the conversion time, as no divide library call would be used on the CM0+, and the strings require little processing, so they can remain as-is.


An example I showed from previous thread, modified to allow for hex only formatting of numbers (and using x86 so online output can be used)-
https://godbolt.org/z/5vrEov3Yq
(end of main does nonumformat example, end of output shows result)

This example simply shows letting a print function bypass the normal number conversion (dec/bin/hex, which uses division) in favor of bit-shift code and a table lookup for each character. In this case the full 32-bit hex value is always output, and a +/- is prepended to indicate the sign, since the function takes an unsigned value plus a flag for negative values. With your own code it's easy enough to change this as you see fit, whereas an existing printf library may not offer many hooks into the process.

For a CM0+, eliminating the divide for each number conversion could be somewhat significant although testing would be needed to get some real values.
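A minimal sketch of the shift-and-table idea (not the code behind the godbolt link): the hypothetical `hex32()` always writes eight hex digits and uses no division, only a 4-bit shift and a 16-entry table lookup per digit.

```c
#include <stdint.h>
#include <string.h>

/* Write v as exactly 8 lowercase hex digits plus a nul into out[9]. */
char *hex32(char out[9], uint32_t v) {
    static const char digits[16] = "0123456789abcdef";   /* no nul needed */
    for (int i = 7; i >= 0; i--) {
        out[i] = digits[v & 0xFu];   /* low nibble -> character */
        v >>= 4;                     /* next nibble; no divide involved */
    }
    out[8] = '\0';
    return out;
}
```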
« Last Edit: January 19, 2025, 05:35:59 pm by cv007 »
 

Offline rhodges

  • Frequent Contributor
  • **
  • Posts: 358
  • Country: us
  • Available for embedded projects.
    • My public libraries, code samples, and projects for STM8.
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #81 on: January 19, 2025, 10:53:51 pm »
I don't know if this is helpful, but here is my STM32 M0 and M3 binary-decimal library. Just cut out the M3 code. The M0 code does not call divide.
Currently developing embedded RISC-V. Recently STM32 and STM8. All are excellent choices. Past includes 6809, Z80, 8086, PIC, MIPS, PNX1302, and some 8748 and 6805. Check out my public code on github. https://github.com/unfrozen
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7441
  • Country: fi
    • My home page and email address
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #82 on: January 20, 2025, 01:20:56 am »
Just out of interest, does the output include formatting floating-point numbers?  The standard libraries (newlib and others) are extremely slow in formatting and parsing them.

For Cortex-M0 with a "slow" multiplication, even integer formatting is slow in the standard libraries.  (If you have a fast 32×32 multiply that returns the high 32 bits of the 64-bit result, you can divide by ten quickly by multiplying by 3435973837 = 0xCCCCCCCD and shifting the upper 32 bits right by 3 bits, which gives the quotient.)  Assuming ILP32 (aapcs32 on Cortex-M0), with 32-bit int and long and 64-bit long long, repeated subtraction of 1× and 3× each power of ten can be much faster.  Here is an example implementation:
Code:
// SPDX-License-Identifier: CC0-1.0 (Public Domain)
// Author: Nominal Animal, 2025.
#include <stddef.h> // for NULL
#include <stdint.h>

static const uint32_t  decades_32bit[10][2] = {
    { UINT32_C(1),          UINT32_C(3) },
    { UINT32_C(10),         UINT32_C(30) },
    { UINT32_C(100),        UINT32_C(300) },
    { UINT32_C(1000),       UINT32_C(3000) },
    { UINT32_C(10000),      UINT32_C(30000) },
    { UINT32_C(100000),     UINT32_C(300000) },
    { UINT32_C(1000000),    UINT32_C(3000000) },
    { UINT32_C(10000000),   UINT32_C(30000000) },
    { UINT32_C(100000000),  UINT32_C(300000000) },
    { UINT32_C(1000000000), UINT32_C(3000000000) },
};

static const uint64_t  decades_64bit[11][2] = {
    { UINT64_C(1000000000),           UINT64_C(3000000000) },
    { UINT64_C(10000000000),          UINT64_C(30000000000) },
    { UINT64_C(100000000000),         UINT64_C(300000000000) },
    { UINT64_C(1000000000000),        UINT64_C(3000000000000) },
    { UINT64_C(10000000000000),       UINT64_C(30000000000000) },
    { UINT64_C(100000000000000),      UINT64_C(300000000000000) },
    { UINT64_C(1000000000000000),     UINT64_C(3000000000000000) },
    { UINT64_C(10000000000000000),    UINT64_C(30000000000000000) },
    { UINT64_C(100000000000000000),   UINT64_C(300000000000000000) },
    { UINT64_C(1000000000000000000),  UINT64_C(3000000000000000000) },
    { UINT64_C(10000000000000000000), UINT64_C(                   0) },
};

// Internal 32-bit unsigned integer conversion routine.
static char *do_append_u32(char *buf, char *const end, uint32_t val) {

    // Count the number of decimal digits.
    int_fast8_t  n = 0;
    while (val >= decades_32bit[n+1][0])
        if (++n >= 9)
            break;

    // Verify sufficient room in buffer.
    if (buf + n >= end)   // n+1 digits, then the nul, which may land on *end
        return NULL;

    // Convert to decimal digits via repeated subtraction.
    do {
        char  digit = '0';

        while (val >= decades_32bit[n][1]) {
            val    -= decades_32bit[n][1];
            digit  += 3;
        }
        while (val >= decades_32bit[n][0]) {
            val    -= decades_32bit[n][0];
            digit  += 1;
        }

        *(buf++) = digit;
    } while (n-->0);

    *buf = '\0';
    return buf;
}

// Internal 64-bit unsigned integer conversion routine.
static char *do_append_u64(char *buf, char *const end, uint64_t val) {

    // If fits in 32 bits, treat as 32-bit.
    if ((uint64_t)(uint32_t)(val) == val)
        return do_append_u32(buf, end, (uint32_t)val);

    // Above test ensures val >= decades_64bit[0][0].
    int_fast8_t  n = 0;
    while (val >= decades_64bit[n+1][0])
        if (++n >= 10)
            break;

    // Verify sufficient room in buffer.
    if (buf + n + 9 >= end)   // n+10 digits, then the nul, which may land on *end
        return NULL;

    // The first decimal digit of 2^64-1 is 1, so we need to treat it specially.
    if (n == 10) {
        char  digit = '0';

        while (val >= decades_64bit[10][0]) {
            val    -= decades_64bit[10][0];
            digit++;
        }

        *(buf++) = digit;
        n--;
    }
    do {
        char  digit = '0';

        while (val >= decades_64bit[n][1]) {
            val    -= decades_64bit[n][1];
            digit  += 3;
        }
        while (val >= decades_64bit[n][0]) {
            val    -= decades_64bit[n][0];
            digit  += 1;
        }

        *(buf++) = digit;
    } while (n-->0);

    // Add the nine 32-bit digits
    uint32_t  v32 = (uint32_t)val;
    n = 8;
    do {
        char  digit = '0';

        while (v32 >= decades_32bit[n][1]) {
            v32    -= decades_32bit[n][1];
            digit  += 3;
        }
        while (v32 >= decades_32bit[n][0]) {
            v32    -= decades_32bit[n][0];
            digit  += 1;
        }

        *(buf++) = digit;
    } while (n-->0);

    *buf = '\0';
    return buf;
}

// Convert an unsigned 32-bit integer (%u) to decimal string,
// and store to buf.  Will not write past end (but may write nul to *end).
// Returns a pointer to the string-terminating nul byte.
char *append_u32(char *buf, char *const end, uint32_t val) {

    // Abort if no buffer, or if buffer full.
    if (!buf || buf >= end)
        return NULL;

    return do_append_u32(buf, end, val);
}

// Convert a signed 32-bit integer (%d) to decimal string,
// and store to buf.  Will not write past end (but may write nul to *end).
// Returns a pointer to the string-terminating nul byte.
char *append_i32(char *buf, char *const end, int32_t val) {

    if (val < 0) {
        // Abort if no buffer, or if buffer full.
        if (!buf || buf + 1 >= end)
            return NULL;

        // Prepend negative sign, negate, and treat as unsigned.
        *buf = '-';
        return do_append_u32(buf + 1, end, (uint32_t)0 - (uint32_t)val);  // no overflow for INT32_MIN
    } else {
        if (!buf || buf >= end)
            return NULL;

        // Nonnegative, so treat as unsigned.
        return do_append_u32(buf, end, val);
    }
}

// Convert an unsigned 64-bit integer (%llu) to decimal string,
// and store to buf.  Will not write past end, but may write nul to *end.
// Returns a pointer to the string-terminating nul byte.
char *append_u64(char *buf, char *const end, uint64_t val) {

    // Abort if no buffer, or if buffer full.
    if (!buf || buf >= end)
        return NULL;

    return do_append_u64(buf, end, val);
}

// Convert a signed 64-bit integer (%lld) to decimal string,
// and store to buf.  Will not write past end, but may write nul to *end.
// Returns a pointer to the string-terminating nul byte.
char *append_i64(char *buf, char *const end, int64_t val) {

    if (val < 0) {
        // Abort if no buffer, or if buffer full.
        if (!buf || buf + 1 >= end)
            return NULL;

        // Prepend negative sign, negate, and treat as unsigned.
        *buf = '-';
        return do_append_u64(buf + 1, end, (uint64_t)0 - (uint64_t)val);  // no overflow for INT64_MIN

    } else {
        // Abort if no buffer, or if buffer full.
        if (!buf || buf >= end)
            return NULL;

        // Nonnegative, so treat as unsigned.
        return do_append_u64(buf, end, val);
    }
}
The idea in the end = append_type(dest, last, value); interface is to efficiently append the decimal value to the buffer.  When the value does not fit, it will return NULL (and append_type(NULL,...) is safe and will also return NULL).  You can always call the function with dest pointing to the next free character in your output buffer, and last pointing to the last character in that buffer, and if the function returns non-NULL, it points to the next dest, otherwise it did not fit.

(That is, you can at any point safely try to append a new substring to your buffer.  If it returns NULL, it didn't modify more than the start character (which should be the first free character in the buffer anyway), and that only when the value is negative; you can easily move the sign to after the digits have been filled, to ensure no modification is done.  You can also remove all the 64-bit stuff, if you don't use long long, uint64_t, int64_t, uintmax_t, or intmax_t types.)

Each digit requires 2.1 iterations on average (0 1 2 1 2 3 2 3 4 3, to be exact, per possible decimal digit), with each iteration consisting of one subtraction and one addition.  There are no multiplications or divisions at all (except for calculating the look-up array addresses, which are bit shifts), so this approach is suitable for slow-multiplication architectures in general, including 8-bitters (although they can benefit from adding 8- and 16-bit converters also).

This is not magic, though.  On x86-64 with fast multiplication, even the standard snprintf() is about 2.4× faster at converting 64-bit unsigned numbers (because it has a fast hardware 64×64=128-bit integer multiplication, so that (x/10) is implemented as (x*0xCCCCCCCCCCCCCCCD)>>67).  For 32-bit unsigned integers (append_u32() and append_i32() versus snprintf("%u") and snprintf("%d")), standard snprintf() is only about 1.5× faster.  (Edited: my laptop's frequency scaling skewed the original result, which had snprintf() about 2.6× slower.)  (These are on a microbenchmark covering all 32-bit unsigned integers uniformly randomly; if you typically print a smaller range of values, expect different results.)
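For reference, the multiply-high divide by ten mentioned at the start of this post, for 32-bit operands: 0xCCCCCCCD is 2^35/10 rounded up, so taking the full 64-bit product and shifting right by 35 (the upper 32 bits shifted right by 3) yields x/10 exactly for every 32-bit x. The Cortex-M0 MULS instruction only returns the low 32 bits, so on that core the compiler has to synthesize the 64-bit product from several instructions or a runtime helper, which is why the subtraction approach above can win there. `div10_u32()` is a hypothetical name.

```c
#include <stdint.h>

/* x / 10 for any 32-bit x: multiply by ceil(2^35 / 10), keep bits 35 and up. */
static inline uint32_t div10_u32(uint32_t x) {
    return (uint32_t)(((uint64_t)x * UINT32_C(0xCCCCCCCD)) >> 35);
}
```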
« Last Edit: January 20, 2025, 02:48:48 am by Nominal Animal »
 

Offline incf (Topic starter)

Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #83 on: January 20, 2025, 02:04:41 am »
I believe it mostly fakes floating point by using a split fixed-point integer representation: one integer for the whole number, and another for the decimal places.

While the library authors were ill-advised in their use of printf, they appear to have had enough sense to stay away from floating point in the places that I have looked.

We have a fast truncated multiply (32-bit results), but my gut feeling is that printf may not actually use it, based on the performance that I'm seeing (I have not measured, nor disassembled libc).

Multiplication as division is an interesting trick. I could easily see myself getting really into optimizing those routines. Still, I'm pretty sure that even the fastest formatting routine won't beat buffering the printf arguments in terms of speed. I am not looking forward to writing the FIFO for lists of variable-sized strings and print arguments, but I think I am obligated to take that path since it is likely much faster. (Also, it will finally allow me to do things like attach an accurate timestamp to each printf, which will be useful.)

I imagine that on a Cortex-M0, a fairly standard print statement with about 100 characters of text and 9 integers (7 decimal and 2 hex, many of them 6 to 9 digits) probably takes thousands of instruction cycles just for the conversions.

...
Each digit requires 2.1 iterations on average (0 1 2 1 2 3 2 3 4 3, to be exact, per possible decimal digit), with each iteration consisting of one subtraction and one addition.  There are no multiplications or divisions at all (except for calculating the look-up array addresses, which are bit shifts), so this approach is suitable for slow-multiplication architectures in general, including 8-bitters (although they can benefit from adding 8- and 16-bit converters also).
...

Whoa... that is faster than I expected.
« Last Edit: January 20, 2025, 11:40:30 am by incf »
 

Offline Nominal Animal

Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #84 on: January 20, 2025, 02:46:13 am »
I do have one Cortex-M0+ (Teensy LC, NXP Kinetis KL26 family, NXP MKL26Z64) I believe I could test the performance on, but it'll likely be somewhat fewer cycles than Cortex-M0 (as M0+ has a two-stage pipeline, M0 a three-stage one).  The newlib/newlib-nano (for comparison to snprintf()) I have for it is version 2.4.0, released in March 2016, so I'm unsure of how useful those results would be...

Formatting floats as two integers, one on each side of the decimal point, is a good approach.  One truncates the value to get the integer part, then subtracts it from the original value to get the fractional part, multiplies by the power of ten matching the desired number of decimals (1..9 – these are all exact for even 32-bit floats), takes the absolute value, and finally rounds to an integer.  If the result is or exceeds the power of ten, you subtract the power of ten from the result, and increment the magnitude of the integer part by one.  Simples!  Of course, in your case, storing the original double or float would be even faster.
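A sketch of the float recipe above, assuming the value fits in a `long` and 1 to 9 decimals are requested; `format_fixed()` is a hypothetical name, and the final step still uses snprintf() but only with integer conversions.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format x with 'decimals' (1..9) fractional digits using two integers. */
int format_fixed(char *out, size_t size, float x, int decimals) {
    static const uint32_t pow10[10] = { 1u, 10u, 100u, 1000u, 10000u, 100000u,
                                        1000000u, 10000000u, 100000000u, 1000000000u };
    uint32_t scale = pow10[decimals];
    long whole = (long)x;                        /* truncates toward zero */
    float f = (x - (float)whole) * (float)scale; /* scaled fractional part */
    if (f < 0.0f)
        f = -f;                                  /* absolute value */
    uint32_t frac = (uint32_t)(f + 0.5f);        /* round to nearest */
    if (frac >= scale) {                         /* rounding carried into the integer part */
        frac -= scale;
        whole += (x < 0.0f) ? -1 : 1;
    }
    /* A negative value with a zero integer part still needs its sign. */
    return snprintf(out, size, "%s%ld.%0*lu", (x < 0.0f && whole == 0) ? "-" : "",
                    whole, decimals, (unsigned long)frac);
}
```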
« Last Edit: January 20, 2025, 02:52:59 am by Nominal Animal »
 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 4072
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #85 on: January 20, 2025, 04:22:01 am »
If there are no float/double types and also no long long / (u)int64_t or other large types then the format string parsing becomes almost trivial: all arguments are exactly 32 bit, so you just need to count the number of them and extract that many 32 bit values with va_arg and store them in a buffer. 
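A sketch of that trivial case, with a hypothetical `copy_words()`: every '%' other than "%%" is counted as one 32-bit word. This relies on the ABI rather than strict C: fetching a pointer or a negative int as `uint32_t` with va_arg() is only safe when they have the same size and are passed the same way, as on aapcs32.

```c
#include <stdarg.h>
#include <stdint.h>

#define MAX_WORDS 8u   /* hypothetical per-call limit */

/* Assumes every argument is exactly 32 bits wide (no double, no long long,
   no '*'), so each conversion other than "%%" consumes one va_arg word. */
unsigned copy_words(uint32_t *dst, const char *fmt, ...) {
    va_list ap;
    unsigned n = 0;

    va_start(ap, fmt);
    for (const char *p = fmt; *p; p++) {
        if (*p != '%')
            continue;
        if (p[1] == '%') { p++; continue; }   /* literal percent, no argument */
        if (n < MAX_WORDS)
            dst[n++] = va_arg(ap, uint32_t);
    }
    va_end(ap);
    return n;
}
```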
 

Offline cv007

Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #86 on: January 20, 2025, 04:54:25 am »
Quote
One truncates the value to get the integer part...
This assumes you are starting with a float/double, which is probably already being avoided because it's a CM0+.

Which means there could also be divisions in the setup for the printf calls, and there's not much one can do about them other than hope they are not frequently used:
printf( "someval: %d.%02u\n", someval/100, __builtin_abs(someval%100) ); //someval is x100; %02u keeps 105 as "1.05" (values -99..-1 still lose their minus sign)

I would get a list of the formatting strings inside the library and see how many formatting options you will have to deal with.
« Last Edit: January 20, 2025, 08:43:14 am by cv007 »
 

Offline 5U4GB

  • Frequent Contributor
  • **
  • Posts: 678
  • Country: au
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #87 on: January 20, 2025, 06:26:12 am »
Quote
Unfortunately, It turns out that printf is so slow that it causes the system to fail in a variety of subtle yet completely catastrophic ways. (due to it missing real time deadlines and not being able to detect that it has missed them)

Has anyone dealt with this before?

Been there, done that.  The solution I used was to write an emulation of printf that only processed the small number of formatting strings the software needed in as fast/minimal a way as possible (use 'strings | grep %' or similar on the binary to find them).  It couldn't do anything other than the exact formats the calling code used, but it could do each of those damn fast.

If there's only a small number of formatting strings you could go even further and memcpy over pre-generated output text with spaces for %d's and whatever, then drop the numeric values into the appropriate locations.  You can also skip non-essential output text, so just turn the printf into a nop if it's producing too much unnecessary output.
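A sketch of the pre-generated-template idea for a single hypothetical format string, `"x=%5u y=%5u\n"`. The literal text is copied once, and only the digits are written into fixed, right-justified fields. The template, offsets, and function names are made up for illustration. The `% 10` loop could be swapped for any of the faster digit routines discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pre-generated output for the hypothetical format "x=%5u y=%5u\n":
   literal text with blank 5-character fields where the numbers go. */
static const char tmpl_xy[] = "x=" "     " " y=" "     " "\n";
enum { X_END = 7, Y_END = 15 };   /* each field ends just before this offset */

/* Drop v right-justified into the field ending at out[end - 1];
   the template's spaces remain as the leading padding. */
static void put_field(char *out, unsigned end, uint32_t v)
{
    unsigned pos = end;
    do {
        out[--pos] = (char)('0' + v % 10u);
        v /= 10u;
    } while (v != 0u);
}

/* Returns the output length; out must hold sizeof tmpl_xy bytes. */
static size_t fmt_xy(char *out, uint32_t x, uint32_t y)
{
    assert(x <= 99999u && y <= 99999u);         /* must fit the 5-wide fields */
    memcpy(out, tmpl_xy, sizeof tmpl_xy);       /* text + terminator in one copy */
    put_field(out, X_END, x);
    put_field(out, Y_END, y);
    return sizeof tmpl_xy - 1u;
}
```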
« Last Edit: January 20, 2025, 06:40:02 am by 5U4GB »
 

Online Analog Kid

  • Super Contributor
  • ***
  • Posts: 1560
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #88 on: January 20, 2025, 06:37:51 am »
Which is exactly what I advised several posts above yours.
 

Offline 5U4GB

  • Frequent Contributor
  • **
  • Posts: 678
  • Country: au
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #89 on: January 20, 2025, 07:12:23 am »
Quote
That's the first time I have heard something '...formally verified to be "correct"...' since the 80s. Since you mention "100kLOC" and "complicated", I presume the weasel word is "correct"; maybe it is a meaning unfamiliar to me.

It could be something that has to meet functional safety requirements and where the code is auto-generated by some tool that ensures this.  For example there's a form of self-checking code used in some European rail control systems whose name I'm blanking on that I doubt could be written directly by a human, and for which the auto-generation would lead to a huge size expansion, so the 100kLOC could have started as a few kLOC.

OTOH I can't imagine they'd run the result on a Cortex M0.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7441
  • Country: fi
    • My home page and email address
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #90 on: January 20, 2025, 10:03:51 pm »
Quote
We have fast truncated multiply (32 bit results)
For 32-bit values 0 .. 81919, inclusive, one can divide v by ten (and thus get the remainder, i.e. the next decimal digit) by multiplying by 0xCCCD and then shifting right by 19 bits.  This is equivalent to multiplying by 52429/524288 = 0.1000003814697265625.  One can extend the range to 0 .. 262148, inclusive, by multiplying v by 13107, adding one quarter of v to the product, and shifting the result right by 17 bits.  Whether these are faster than the repeated subtractions (noting that each iteration includes a load from Flash, a comparison, conditionally a subtraction and an addition, and a jump) is an open question.  You can see the Cortex-M0 generated code for these at godbolt.org/z/3osnbxdd7.

Also note that the remainder-after-divide-by-ten approach calculates the digits from right to left, whereas repeated subtraction produces them from left to right.
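A sketch of the multiply-and-shift digit loop that uses only 32-bit products, as on Cortex-M0, with the constants given above. The function names are hypothetical. The quotient is computed first and the remainder is derived from it, so digits appear right to left and the buffer is filled from its end.

```c
#include <assert.h>
#include <stdint.h>

/* v / 10 for 0 <= v <= 81919: v * 52429 still fits in 32 bits. */
static inline uint32_t div10_small(uint32_t v)
{
    return (v * 0xCCCDu) >> 19;
}

/* v / 10 for 0 <= v <= 262148: (v*13107 + v/4) >> 17, also 32-bit only. */
static inline uint32_t div10_ext(uint32_t v)
{
    return (v * 13107u + (v >> 2)) >> 17;
}

/* Decimal digits of 0 <= v <= 81919, produced right to left into buf[6].
   Returns a pointer to the first (most significant) digit. */
static char *u32_to_dec_small(char buf[6], uint32_t v)
{
    char *p = buf + 5;

    assert(v <= 81919u);
    *p = '\0';
    do {
        uint32_t q = div10_small(v);
        *--p = (char)('0' + (v - q * 10u));    /* remainder is the next digit */
        v = q;
    } while (v != 0u);
    return p;
}
```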

Quote
Multiplication as division is an interesting trick. I could easily see myself getting really into optimizing those routines.
The math behind this:
$$\left\lfloor \frac{x}{d} \right\rfloor = \left\lfloor \frac{x \cdot m}{2^n} \right\rfloor \quad \text{or} \quad \left\lfloor \frac{p(x)}{2^n} \right\rfloor$$
where \$p(x)\$ is a simple function of \$x\$ containing only multiplications, additions, and integer divisions by a power of two (right shifts), and \$\lfloor \dots \rfloor\$ is the truncation operation, rounding towards zero.  If
$$\frac{m}{2^n} = \frac{1}{d}$$
then this is exact, of course.  Typically you'll want to use \$m \gt 2^n / d\$.  For some ranges of \$x\$ (\$0 \dots N\$), \$p(x) = x \cdot m + C\$ (with \$0 \lt C \lt m\$) suffices.

Division by ten is annoying because it cannot be represented in binary exactly: 0.00011001100... in binary, i.e.
$$\frac{1}{10} = \frac{1}{2 \cdot 5} = \frac{1}{16}+\frac{1}{32}+\frac{1}{256}+\frac{1}{512}+\dots = \sum_{n=0}^\infty \left( \frac{3}{2^{4n + 5}} \right)$$
Fortunately, we don't have to get it exact, only so that it truncates to the correct integer result.
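You can check whether a particular m and n truncate correctly over a range by brute force on a PC. This hypothetical helper uses a 64-bit product, so it tests the constant itself. For m = 0xCCCD, n = 19 it reports the first failure at x = 262149. The 0 .. 81919 limit quoted earlier therefore comes from the product overflowing 32 bits (81920 × 52429 > 2^32), not from the accuracy of the constant.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest x in [0, limit] where (x * m) >> n differs from x / d,
   or limit + 1 if the pair (m, n) is exact over the whole range.
   The 64-bit product tests the constant itself, not 32-bit overflow. */
static uint64_t first_mismatch(uint32_t d, uint64_t m, unsigned n, uint64_t limit)
{
    assert(d != 0u && n < 64u);
    for (uint64_t x = 0; x <= limit; x++) {
        if (((x * m) >> n) != x / d)
            return x;
    }
    return limit + 1u;
}
```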
« Last Edit: January 20, 2025, 10:10:29 pm by Nominal Animal »
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 16100
  • Country: fr
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #91 on: January 20, 2025, 10:14:28 pm »
Yep.

On targets that have a slow hardware multiplication (or none at all, so even slower), the approach of iterating over digits as you showed earlier (and as I remember we discussed in an older thread) with precomputed powers of ten is often much faster. I've used that on 8-bit MCUs and that sped up conversion by a factor of at least 10.
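A sketch of the powers-of-ten approach described above. Digits come out left to right, with at most nine subtractions per digit and no multiplication or division at all. The table and the name `u32_to_dec_sub` are illustrative, not taken from the earlier post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const uint32_t pow10_desc[10] = {
    1000000000u, 100000000u, 10000000u, 1000000u, 100000u,
    10000u, 1000u, 100u, 10u, 1u
};

/* Write v in decimal, without leading zeros, into out[11]; returns the length. */
static unsigned u32_to_dec_sub(char out[11], uint32_t v)
{
    unsigned len = 0;

    assert(out != NULL);
    for (unsigned i = 0; i < 10u; i++) {
        const uint32_t p = pow10_desc[i];
        char d = '0';
        while (v >= p) {                  /* at most 9 subtractions per digit */
            v -= p;
            d++;
        }
        if (d != '0' || len != 0u || i == 9u)   /* skip leading zeros, keep "0" */
            out[len++] = d;
    }
    out[len] = '\0';
    return len;
}
```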
 

Online peter-h

  • Super Contributor
  • ***
  • Posts: 4503
  • Country: gb
  • Doing electronics since the 1960s...
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #92 on: January 20, 2025, 11:18:44 pm »
To go back a few steps, in coding since before 1980 I have never had a real use for a complete printf. Sure, one uses it because it is convenient, but if there was ever a performance issue, the full range of templates was never, ever needed. In practically all embedded applications one is outputting only one number format e.g. 123.45, or only integers, etc. and this can be done far more simply.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Online Analog Kid

  • Super Contributor
  • ***
  • Posts: 1560
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #93 on: January 20, 2025, 11:31:51 pm »
Quote
To go back a few steps, in coding since before 1980 I have never had a real use for a complete printf. Sure, one uses it because it is convenient, but if there was ever a performance issue, the full range of templates was never, ever needed. In practically all embedded applications one is outputting only one number format e.g. 123.45, or only integers, etc. and this can be done far more simply.

Well, there's code size due to handling all those different formats (which isn't an issue here), and then there's execution speed. My guess©® is that the conversion code for the more commonly-used formats (integers, maybe even floats) might be pretty good for the stock library printf().

But I don't know for sure.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 16100
  • Country: fr
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #94 on: January 20, 2025, 11:37:27 pm »
We use it because it's convenient, but it's not efficient for sure. The fact that printf is based on analyzing format and parameters at run time kind of bites. The only objective benefit of doing this at run time would be to use formats defined at run time, and while that's something you're allowed to do, it's usually not recommended - pretty dangerous. So with fixed (at compile time) formats, this is merely because of language limitations.

Variadic functions in C are themselves more a run time "trick" than a consistent language feature.
 

Online peter-h

  • Super Contributor
  • ***
  • Posts: 4503
  • Country: gb
  • Doing electronics since the 1960s...
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #95 on: January 20, 2025, 11:50:14 pm »
The way in which avoiding printf speeds things up is not by taking printf code and stripping it down. Indeed, the execution path of, say, a %u is fairly short. Most of the time will be spent in the eventual call to itoa or ltoa etc.

The thing which saves exec time is knowing the range of values you will be outputting and writing code to directly implement that. Let's say you know the value of something is from 0.0 to 1000.0 and you are holding it in a 16 bit uint variable, where 1 bit is 0.1 (so the 16 bit value actually holds 0 to 10000). Then you hard-code it so you output it to a 6 byte array (plus a byte for any trailing 0x00, etc.) and then, when done, you copy the value in element 5 to element 6 and replace element 5 with a '.' (counting elements from 1). You get the idea...
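A sketch of that hard-coded conversion for the 0.0 to 1000.0 example. It assumes the value is stored in tenths in a `uint16_t` and that leading positions are padded with spaces. Rather than writing five digits and then moving the last one, this hypothetical `format_tenths` writes every character straight into its final slot of a 7-byte array (6 characters plus the terminator). The resulting layout is the same.

```c
#include <assert.h>
#include <stdint.h>

/* v is in tenths, 0..10000 (0.0 to 1000.0). Output layout, 0-based:
   out[0..3] integer part, right-justified and space-padded,
   out[4] '.', out[5] tenths digit, out[6] '\0'.  1234 -> " 123.4" */
static void format_tenths(char out[7], uint16_t v)
{
    assert(v <= 10000u);
    out[6] = '\0';
    out[5] = (char)('0' + v % 10u);     /* tenths digit goes straight to its slot */
    out[4] = '.';
    v /= 10u;
    for (int i = 3; i >= 0; i--) {
        if (v != 0u || i == 3) {        /* always emit the units digit */
            out[i] = (char)('0' + v % 10u);
            v /= 10u;
        } else {
            out[i] = ' ';               /* leading padding */
        }
    }
}
```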

Assembler programmers always did this kind of thing naturally :)

Many years ago I was outputting HPGL values and managed to speed up the code by about 100x by rewriting the crappy IAR C in assembler, although to be fair one could have got at least a 10x speedup when doing it all in C. This was a 4MHz Z80. In some cases, notably sscanf, one could get a 1000x speedup by this sort of hard-coding.

 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7441
  • Country: fi
    • My home page and email address
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #96 on: January 21, 2025, 01:15:54 am »
Yup, fixed-point arithmetic is often more efficient than floating-point arithmetic, when you don't have hardware floating-point support.

The fractional part can be decimal or binary, too.  That is, in general terms fixed-point arithmetic means represent a real value v using integer q via v=q/R (or equivalently q=v*R), where R is a fixed positive integer value: the base or radix.

When R=10^k, the "division" is avoided by simply converting integer q to decimal, and inserting the decimal point before the k least significant decimal digits.  (Wikipedia says this is called decimal scaling.)

When R=2^k, the division is a right-shift.  Sign is handled separately, so we only consider positive integers and zero.  (Wikipedia says this is called binary scaling.)  The integer part is obviously q>>k.  The fractional decimal digits we do one by one, starting with just f containing the k least significant bits of q, i.e. M=(1<<k)-1, and f=q&M, followed by iterating
    t = f * 10
    d = t >> k
    f = t & M
once per digit, starting at the digit following the decimal point, where t is a temporary variable that must be able to hold the product, and d is the next digit, between 0 and 9, inclusive.  This can be repeated for as many digits as is desired, or you can stop when f becomes zero –– it will, at some point.
To apply rounding, you compare f to R/2.  If equal, you have a tie, and you need to decide how to break it.  (The common ones are upwards, and towards even.)  If larger, you increment the previous digit, rippling the overflow left, noting that this can lead to incrementing the integer part by 1 (as the fractional part rounds to 0.000...).  If smaller, the result is correctly rounded already.  Thus, it often makes sense to convert the fractional part first, from the decimal point rightwards, and only then the integral part, from the decimal point leftwards.
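A sketch of the binary-scaling digit loop above, including the rounding ripple, for an unsigned value q with k fractional bits (q / 2^k). Ties are rounded upwards, one of the two tie-breaking rules mentioned. The name `format_ufix`, the 15-digit limit, and the use of `snprintf` for the integer part are assumptions made to keep it short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format the unsigned fixed-point value q / 2^k with `decimals` fractional digits,
   rounding ties upwards. Requires k <= 28 so that f * 10 fits in 32 bits. */
static void format_ufix(char *out, size_t n, uint32_t q, unsigned k, unsigned decimals)
{
    char frac[16];
    assert(k <= 28u && decimals < sizeof frac);

    const uint32_t M = (UINT32_C(1) << k) - 1u;   /* mask of the k fraction bits */
    uint32_t ipart = q >> k;
    uint32_t f = q & M;
    unsigned i;

    for (i = 0; i < decimals; i++) {    /* t = f*10, d = t>>k, f = t&M */
        uint32_t t = f * 10u;
        frac[i] = (char)('0' + (t >> k));
        f = t & M;
    }
    frac[decimals] = '\0';

    /* Compare what is left to R/2 = 2^(k-1); on a tie, round up. */
    int carry = (k > 0u) && (f >= (UINT32_C(1) << (k - 1u)));
    for (i = decimals; carry && i > 0u; i--) {
        if (frac[i - 1u] == '9') {
            frac[i - 1u] = '0';         /* 9 + 1: keep rippling left */
        } else {
            frac[i - 1u]++;
            carry = 0;
        }
    }
    if (carry)
        ipart++;                        /* e.g. 0.999 -> 1.00 */

    /* Integer part via snprintf for brevity; any integer routine would do. */
    if (decimals > 0u)
        snprintf(out, n, "%lu.%s", (unsigned long)ipart, frac);
    else
        snprintf(out, n, "%lu", (unsigned long)ipart);
}
```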

On an architecture where you have only a truncating 32×32=32-bit multiplication like Cortex-M0, you can do up to 28 fractional bits, while only requiring one 32-bit multiplication, one k-bit right shift, and one 32-bit AND, per fractional decimal digit.  I find it very funny that it is the integer part that takes more work to convert to decimal.
« Last Edit: January 21, 2025, 01:17:44 am by Nominal Animal »
 

Online peter-h

  • Super Contributor
  • ***
  • Posts: 4503
  • Country: gb
  • Doing electronics since the 1960s...
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #97 on: January 21, 2025, 09:02:49 am »
Apollo used 24 or 32 bit fixed point, so it has to be good :)
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 21682
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #98 on: January 21, 2025, 09:57:23 am »
Quote
Apollo used 24 or 32 bit fixed point, so it has to be good :)

I'll ignore the ":)"

They had people who knew how to do numerical analysis, which is not a common skill.

A trivial example is if you have two numbers, X and Y, each of which is accurate to 1%: what is the accuracy of X+Y, X-Y, X*Y, X/Y? Too many people answer 2%, or 1%.
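For reference, here are the worst cases under one simple reading of that question. Assume X and Y are both positive and each is within ±1% of its true value, so the measured values are X(1+a) and Y(1+b) with |a|, |b| ≤ 0.01. Only the product and the quotient come out near 2%. The sum stays at 1%, and the difference has no fixed bound at all as X approaches Y.

$$X(1+a) + Y(1+b) = (X+Y)\left(1 + \frac{aX + bY}{X+Y}\right), \qquad \left|\frac{aX+bY}{X+Y}\right| \le 0.01$$
$$X(1+a) - Y(1+b) = (X-Y)\left(1 + \frac{aX - bY}{X-Y}\right), \qquad \left|\frac{aX-bY}{X-Y}\right| \le 0.01\,\frac{X+Y}{|X-Y|}$$
$$X(1+a)\,Y(1+b) = XY\,(1 + a + b + ab), \qquad |a + b + ab| \le 1.01^2 - 1 = 0.0201$$
$$\frac{X(1+a)}{Y(1+b)} = \frac{X}{Y}\cdot\frac{1+a}{1+b}, \qquad \left|\frac{1+a}{1+b} - 1\right| \le \frac{1.01}{0.99} - 1 \approx 0.0202$$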
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 9528
  • Country: fi
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #99 on: January 21, 2025, 10:09:50 am »
Yeah, with any important project, analysis of input ranges and how computational accuracy affects the output is a must. Usually it's not even rocket science so I suggest trying to do it even in non-critical projects. As a side effect of knowing how to analyze your ranges, efficient fixed point number formats become available. (Or turning this around: with fixed point representations you need to think about what is the largest number that needs to be presented, for every step, given worst-case input parameters; which is extra work. But it's really not a bad idea to do that anyway, in which case fixed point is not any more "extra work" after all.)

The rationale for floating point really is convenience; let the machine automatically find the balance between resolution of the smallest step vs. capability to store the largest possible number so that the programmer "doesn't need to think about accuracy and ranges at all". The downside is either more complex hardware, or a slow software implementation, and some bits of "overhead" (like, if you know your ranges and can fix them without extra safety margin, 24-bit integer gives you similar resolution to a 32-bit float).

Another downside of floating point is that in some rare cases the programmer should have been thinking about accuracy and ranges after all. Especially the single-precision float gives a false sense of security; always use doubles (which is clearly why e.g. the C language tends to do that by default) when you don't want to think.
« Last Edit: January 21, 2025, 10:27:26 am by Siwastaja »
 

Offline magic

  • Super Contributor
  • ***
  • Posts: 7538
  • Country: pl
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #100 on: January 21, 2025, 10:48:27 am »
Quote
We use it because it's convenient, but it's not efficient for sure. The fact that printf is based on analyzing format and parameters at run time kind of bites. The only objective benefit of doing this at run time would be to use formats defined at run time, and while that's something you're allowed to do, it's usually not recommended - pretty dangerous. So with fixed (at compile time) formats, this is merely because of language limitations.
It's space efficient, though, because "%d" is just two bytes which tends to be less than the assembly it would take to call a function (including a pointer to the function).

So printf is actually quite sensible in MCU applications, or for logging in situations where the CPU overhead is acceptable. (Besides, I'm not sure if scanning the format string significantly adds to the cost of actually printing those numbers, particularly on MCUs lacking hardware MUL/DIV instructions).

I became a fan of printf after seeing what happens to my flash usage after switching to custom print_string, print_int and print_float functions. The functions themselves were smaller than a printf implementation (no surprise, they didn't even have all its functionality) but the flash image became larger overall.

Which brings us back to everyone's favorite topic: R&$# sucks as much as C++ and both are inferior to C :D (at least when it comes to systems and embedded programming).
« Last Edit: January 21, 2025, 10:58:33 am by magic »
 
The following users thanked this post: mikerj, Siwastaja, Nominal Animal

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 21682
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #101 on: January 21, 2025, 11:07:22 am »
Quote
Yeah, with any important project, analysis of input ranges and how computational accuracy affects the output is a must. Usually it's not even rocket science so I suggest trying to do it even in non-critical projects. As a side effect of knowing how to analyze your ranges, efficient fixed point number formats become available. (Or turning this around: with fixed point representations you need to think about what is the largest number that needs to be presented, for every step, given worst-case input parameters; which is extra work. But it's really not a bad idea to do that anyway, in which case fixed point is not any more "extra work" after all.)

The rationale for floating point really is convenience; let the machine automatically find the balance between resolution of the smallest step vs. capability to store the largest possible number so that the programmer "doesn't need to think about accuracy and ranges at all". The downside is either more complex hardware, or a slow software implementation, and some bits of "overhead" (like, if you know your ranges and can fix them without extra safety margin, 24-bit integer gives you similar resolution to a 32-bit float).

Another downside of floating point is that in some rare cases the programmer should have been thinking about accuracy and ranges after all. Especially the single-precision float gives a false sense of security; always use doubles (which is clearly why e.g. the C language tends to do that by default) when you don't want to think.

Unfortunately pre-analysis is "difficult" in any large application or one where the inputs cannot be well constrained.

An example of the latter is anything involving matrix inversion, e.g. a Spice analysing a circuit. It is far from unknown for apparently trivial changes (e.g. node names or order) to the schematic/netlist to cause simulation to succeed or fail.

It is also amusing to see how well C coped with floating point computations where the hardware used an 80-bit internal representation, not 64-bit. Not sure whether that still happens; moved away from that mess long ago, thankfully.
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 21682
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #102 on: January 21, 2025, 11:09:54 am »
Quote
We use it because it's convenient, but it's not efficient for sure. The fact that printf is based on analyzing format and parameters at run time kind of bites. The only objective benefit of doing this at run time would be to use formats defined at run time, and while that's something you're allowed to do, it's usually not recommended - pretty dangerous. So with fixed (at compile time) formats, this is merely because of language limitations.

Quote
I became a fan of printf after seeing what happens to my flash usage after switching to custom print_string, print_int and print_float functions. The functions themselves were smaller than a printf implementation (no surprise, they didn't even have all its functionality) but the flash image became larger overall.

Presumably you did omit the printf image from the a.out file.
 

Offline magic

  • Super Contributor
  • ***
  • Posts: 7538
  • Country: pl
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #103 on: January 21, 2025, 11:45:57 am »
I inspected the binaries with objdump and there was no printf implementation in them, but the size of logging functions blew up. IIRC part of it was also register moves, because if you calculate a bunch of values which will be passed to printf later you can place them in the right registers in advance, but if you call print_int repeatedly, all values must be in the same "first" register, so there is no way to avoid moves between calls.

Another factor is that if you call printf("x=%d y=%d z=%d\n", x, y, z) you only call printf once, but if you replace it with a sequence of low-level calls, you end up with four calls to print_string (plus three to print_int). So further overhead is: three call instructions, three pointers to the print_string function embedded in those instructions, three string pointers, three instructions (or instruction sequences) to load them into some register.

The whole thing happened because the project was at the limits of available flash and I was looking for the smallest printf implementation or something better than printf. Ended up with some off the shelf minimized printf. And it wasn't even printing that much, I think it had a two digit number of % signs overall, so I was quite surprised to see it working in printf's favor. And yes, I was comparing against stripped down printf implementations, but with hundreds of % signs even a fully featured printf would win.
« Last Edit: January 21, 2025, 12:26:01 pm by magic »
 
The following users thanked this post: tggzzz, Siwastaja

Offline AVI-crak

  • Regular Contributor
  • *
  • Posts: 141
  • Country: ru
    • Rtos
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #104 on: January 21, 2025, 05:02:22 pm »
I don't understand the approach being taken to your problem; I can't make sense of this conversation.
You have a huge log in front of you, and everyone is rolling it around a huge field. The log won't get smaller, because it's a log; it has to be big.
Any variant of the classic printf() is that log. As long as there is a format string, the log will not disappear.
If you need speed and small code size, then you should give up the format string.

Translated with DeepL.com (free version)
 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 4072
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #105 on: January 21, 2025, 06:03:22 pm »
Quote
It is also amusing to see how well C coped with floating point computations where the hardware used an 80-bit internal representation, not 64-bit. Not sure whether that still happens; moved away from that mess long ago, thankfully.

No.  The x87 is essentially deprecated and SSE doesn't support 80 bit floats, nor does any current non-Intel platform.  Most people decided they would rather have (closer to) deterministic output than the best precision.  The way that most C compilers would handle the x87 was that temporary variables would be 80 bit as long as they were stored in registers, but would be truncated to 64 bit when spilled to memory.  This made the extra precision dependent on exactly how the code was compiled.

You can still use long double on most x86_64 platforms to get 80 bit floats that will use the x87.  In that case the memory and register representation will be 80 bits, but you pay a performance hit in most applications due to the more expensive load/store operations.

As partial compensation, we now have FMA operations that can reduce truncation errors in multiply accumulate operations.
 
The following users thanked this post: oPossum

Offline cv007

  • Frequent Contributor
  • **
  • Posts: 878
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #106 on: January 21, 2025, 08:24:30 pm »
Quote
I already added buffering to _write
I searched the thread to see if some other info provided is contrary to this (didn't see anything), but this implies you are using printf and not sprintf. If you are going through printf to get your _write to output to a buffer, maybe you should instead be going to the buffer directly via snprintf. The speed difference between printf and snprintf could be quite a bit depending on what stdio libraries are in use (we still don't know I guess, so just assume newlib nano).

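Code:
#include <stdarg.h>
#include <stdio.h>

static char log_buf[4096];  //example buffer, size is arbitrary
static size_t log_used;     //bytes of log_buf already filled

//replacement for printf, redirect to vsnprintf
int printf( const char* fmt, ... ){
    va_list ap;
    va_start( ap, fmt );
    //buf and siz according to remaining buffer space
    size_t siz = sizeof log_buf - log_used;
    int rv = vsnprintf( log_buf + log_used, siz, fmt, ap );
    va_end( ap );
    //adjust buffer space used/remaining (clamp if the output was truncated)
    if( rv > 0 ) log_used += ( (size_t)rv < siz ) ? (size_t)rv : siz - 1;
    return rv;
}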

 

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #107 on: January 21, 2025, 08:53:26 pm »
Quote
I already added buffering to _write
I searched the thread to see if some other info provided is contrary to this (didn't see anything), but this implies you are using printf and not sprintf. If you are going through printf to get your _write to output to a buffer, maybe you should instead be going to the buffer directly via snprintf. The speed difference between printf and snprintf could be quite a bit depending on what stdio libraries are in use (we still don't know I guess, so just assume newlib nano).

Code:
//replacement for printf, redirect to vsnprintf
int printf( const char* fmt, ... ){
    va_list ap;
    va_start( ap, fmt );
    //buf and siz according to remaining buffer space
    int rv = vsnprintf( buf, siz, fmt, ap );
    //adjust buffer space used/remaining
    va_end( ap );
    return rv;
}

Sorry, I should have been clearer. I went through a systematic process of reworking printing leading up to this post. I used to use _write in the past; I'm currently using snprintf for writing to thread-local buffers.

I ended up doing a benchmark of the snprintf wrapper that I supply to the library. I give it 8 large 32-bit numbers (1 hex, and 7 decimal). I found it takes 10000 cycles round trip to put the formatted string into the buffer. The library can easily do 10 of these in a row within the "critical section", which would take about 10ms, so it is not surprising that it was failing to reliably hit a ~1ms window. I would bet different numbers could result in substantially different speeds.

I determined that I actually need to get through the function call in less than 500 cycles - which I think I'll be able to do by saving the format string and all the arguments to a FIFO for later processing. (roughly 64 bytes of format string, and 32 bytes of numbers)
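A sketch of the idle-time half of that plan. It assumes each queued record holds a pointer to the format literal plus up to eight 32-bit integer arguments, with no `%s`, `*` widths, or length modifiers. Rather than pass all the raw words to one `snprintf` call, this hypothetical `log_replay` formats one conversion at a time. That way each argument is converted back to the type its conversion expects: `int` for `%d`, and `unsigned` for `%x` and friends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LOG_MAX_ARGS 8

typedef struct {
    const char *fmt;                  /* format literal, stored by pointer */
    uint8_t     nargs;
    uint32_t    args[LOG_MAX_ARGS];   /* raw 32-bit argument words */
} log_record;

/* Idle-time half: expand one record into out[0..n-1] (n >= 1).
   Returns the length written, truncated to n - 1 if out is too small. */
static size_t log_replay(char *out, size_t n, const log_record *r)
{
    char spec[16];                    /* a single conversion, e.g. "%08x" */
    const char *p = r->fmt;
    size_t len = 0;
    unsigned a = 0;

    assert(n > 0u);
    out[0] = '\0';
    while (*p != '\0' && len < n - 1u) {
        if (*p != '%' || p[1] == '%') {              /* literal char or "%%" */
            out[len++] = *p;
            out[len] = '\0';
            p += (*p == '%') ? 2 : 1;
            continue;
        }
        size_t s = strspn(p + 1, "-+ #0123456789.") + 2u;  /* '%' .. letter */
        char conv = p[s - 1u];
        if (conv == '\0' || strchr("diouxXc", conv) == NULL ||
            s >= sizeof spec || a >= r->nargs)
            break;                                   /* unsupported or out of args */
        memcpy(spec, p, s);
        spec[s] = '\0';
        int w = (conv == 'd' || conv == 'i' || conv == 'c')
                    ? snprintf(out + len, n - len, spec, (int)r->args[a])
                    : snprintf(out + len, n - len, spec, (unsigned)r->args[a]);
        a++;
        if (w < 0)
            break;
        len += ((size_t)w < n - len) ? (size_t)w : n - len - 1u;
        p += s;
    }
    return len;
}
```

This costs one `snprintf` call per conversion, but that work now happens while the CPU would otherwise be idle.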
« Last Edit: January 21, 2025, 09:13:56 pm by incf »
 

Online Analog Kid

  • Super Contributor
  • ***
  • Posts: 1560
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #108 on: January 21, 2025, 09:31:33 pm »
One question: have you figured out exactly which number formats are actually used by your logging code, and which ones you can ignore?

I ask because there's been a lot of stuff posted here about floats and whatnot. If it turns out that you're not logging any floating-point #s then we can just ignore that format entirely.
 

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #109 on: January 21, 2025, 10:16:45 pm »
Quote
One question: have you figured out exactly which number formats are actually used by your logging code, and which ones you can ignore?

I ask because there's been a lot of stuff posted here about floats and whatnot. If it turns out that you're not logging any floating-point #s then we can just ignore that format entirely.

It's all large 32 bit integers and strings. Often 8 per snprintf call.

And in 500 CPU cycles, there is not a lot I can do except copy the request into a buffer for later.

The side threads around optimization are very interesting, but not directly applicable for this particular issue.
« Last Edit: January 21, 2025, 10:18:43 pm by incf »
 

Online Analog Kid

  • Super Contributor
  • ***
  • Posts: 1560
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #110 on: January 21, 2025, 10:46:13 pm »
So another question, hopefully one that won't further muddy the waters:

Your scheme seems sound: on receipt of a logging request, copy the format string and the values to be logged to a buffer. Fine and dandy.

Then when you have a spare moment, retrieve that stuff from the buffer and display it. Also fine and dandy.

But what happens if your displaying (or saving to disk or whatever) code gets interrupted by a high-priority request to do whatever your thingy is doing (monitoring the network, etc.)? You'd have to drop your logging handling code, correct?

Is it OK if you lose some of the logged data, or is every bit of it precious?

Or could you simply suspend the logging thread, then resume it after the high-priority code has completed? You'd have to have coordination between your main processing thread(s) and the logging thread anyway, so you can remove what you output from the buffer.

I'm sure you've already thought of some or all of this. But those of us out here are curious ...
 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 4072
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #111 on: January 21, 2025, 10:48:26 pm »
Quote
It's all large 32 bit integers and strings. Often 8 per snprintf call.

Strings as in literal parts of the format string or %s format specifiers?
« Last Edit: January 21, 2025, 10:53:18 pm by ejeffrey »
 

Online Analog Kid

  • Super Contributor
  • ***
  • Posts: 1560
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #112 on: January 21, 2025, 10:51:44 pm »
If he's logging strings (%s specifiers), then he'd need to save both the format string ("User: %s") and the string that gets inserted into the format, probably in a memory heap.

That would complicate the buffering scheme, but not unduly.
 

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #113 on: January 21, 2025, 10:56:05 pm »
Quote
So another question, hopefully one that won't further muddy the waters:

Your scheme seems sound: on receipt of a logging request, copy the format string and the values to be logged to a buffer. Fine and dandy.

Then when you have a spare moment, retrieve that stuff from the buffer and display it. Also fine and dandy.

But what happens if your displaying (or saving to disk or whatever) code gets interrupted by a high-priority request to do whatever your thingy is doing (monitoring the network, etc.)? You'd have to drop your logging handling code, correct?

Is it OK if you lose some of the logged data, or is every bit of it precious?

Or could you simply suspend the logging thread, then resume it after the high-priority code has completed? You'd have to have coordination between your main processing thread(s) and the logging thread anyway, so you can remove what you output from the buffer.

I'm sure you've already thought of some or all of this. But those of us out here are curious ...

The OS and the IPC (interprocess communication) scheme handles all of that. Race conditions, data contention, etc. are all completely impossible by design. If I tried to do "something funny" it simply would not compile.

The data is not precious. It would be truncated if a FIFO got full.

There is a lot of idle time, and the buffers are huge. I consider it practically impossible to fill the FIFO enough to lose data.

The device is only awake and actively doing real-time work less than 0.1% of the time, it seems. The remainder is free for the processor to clean up the mess made by the real-time process before powering down.
 

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #114 on: January 21, 2025, 11:00:31 pm »
Quote
It's all large 32 bit integers and strings. Often 8 per snprintf call.

Strings as in literal parts of the format string or %s format specifiers?
If he's logging strings (%s specifiers), then he'd need to save both the format string ("User: %s") and the string that gets inserted into the format, probably in a memory heap.

That would complicate the buffering scheme, but not unduly.

Yes, there are %s strings that I have to copy into the buffer. It will make things more complicated but is fine.

No heap, we will probably statically allocate some data structure specifically for this. Some sort of ring buffer filled with variable length/layout structures.

The heap scares me; it seems so nondeterministic (at least without looking at exactly how libc decides to implement it). It would be so much easier to write if I were using a heap, though.
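One way to build a heap-free ring of variable-length records, as a minimal sketch under stated assumptions: a hypothetical 256-word static array, one producer and one consumer, a header word holding each record's length, and a header of 0 used as a "wrap to start" marker so that no record ever straddles the end of the array. The names (`rb_write()`, `rb_peek()`, `rb_release()`) are illustrative, not an existing API.

```c
#include <stdint.h>
#include <string.h>

#define RB_WORDS 256u   /* hypothetical size in 32-bit words; size it from spare RAM */

static uint32_t rb[RB_WORDS];
/* Single producer (logging wrapper) and single consumer (idle task) assumed.
   On the real target these would also need volatile access and a barrier. */
static uint32_t rb_head, rb_tail;   /* word indices in [0, RB_WORDS) */

/* Free words; one word stays unused so that head == tail means "empty". */
static uint32_t rb_free(void)
{
    return (rb_tail + RB_WORDS - rb_head - 1u) % RB_WORDS;
}

/* Append one record: a header word holding the record length in words
   (header included), then n payload words (format pointer, arguments,
   %s text packed into words). A header of 0 is a wrap marker. Returns 0,
   or -1 when the FIFO is full and the message is dropped. */
static int rb_write(const uint32_t *payload, uint32_t n)
{
    uint32_t len = n + 1u;
    uint32_t h = rb_head;
    uint32_t to_end = RB_WORDS - h;
    uint32_t need = (len <= to_end) ? len : to_end + len;  /* skipped tail counts too */

    if (need > rb_free())
        return -1;
    if (len > to_end) {          /* record would straddle the end: */
        rb[h] = 0u;              /* leave a wrap marker and restart at 0 */
        h = 0u;
    }
    rb[h] = len;
    memcpy(&rb[h + 1u], payload, n * sizeof payload[0]);
    rb_head = (h + len) % RB_WORDS;   /* publish only after the copy */
    return 0;
}

/* Consumer: oldest record (header word first), or NULL if empty. */
static const uint32_t *rb_peek(void)
{
    if (rb_tail == rb_head)
        return NULL;
    if (rb[rb_tail] == 0u) {     /* wrap marker left by the producer */
        rb_tail = 0u;
        if (rb_tail == rb_head)
            return NULL;
    }
    return &rb[rb_tail];
}

/* Consumer: drop the record returned by rb_peek(). */
static void rb_release(void)
{
    rb_tail = (rb_tail + rb[rb_tail]) % RB_WORDS;
}
```

Keeping each record contiguous wastes a few words at every wrap, but it lets the idle-time consumer hand a plain pointer to the formatter.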
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 16100
  • Country: fr
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #115 on: January 21, 2025, 11:01:03 pm »
Just a question, but is this amount of logging really useful? Can you not ignore at least part of it?
 

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #116 on: January 21, 2025, 11:08:15 pm »
Just a question, but is this amount of logging really useful? Can you not ignore at least part of it?
Unfortunately, the majority of it is relevant some portion of the time, or rather, the context it provides is, even if the data is sometimes redundant. The most valuable logging data comes directly from private/static variables in the "real-time" sections. Not to mention the limitation of printing being "all on" or "all off."

I don't really want to build hash tables when a fast buffering scheme will do it (there are too many seemingly identical printf statements in radically different contexts to feasibly sort out).

It's a state machine whose internal state is not visible except through human-readable logs. We don't want to get into the business of string parsing, since it won't solve our real problem: the formatted strings have to be computed before they can be parsed, many lines have to be parsed together as a unit for anything useful to be learned, and all of that happens 10-20 milliseconds after we've missed our real-time deadline.
« Last Edit: January 21, 2025, 11:19:14 pm by incf »
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 21682
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #117 on: January 21, 2025, 11:14:25 pm »
Yes, there are %s strings that I have to copy into the buffer. It will make things more complicated but is fine.

No heap, we will probably statically allocate some data structure specifically for this. Some sort of ring buffer filled with variable length/layout structures.

The heap scares me; it seems so nondeterministic (at least without looking at exactly how libc decides to implement it). It would be so much easier to write if I were using a heap, though.

Are the strings immutable, i.e. stored in ROM? If so, simply pass their address. Immutability is a powerful property in multithreaded environments. Alternatively pass the string the first time it is used in a printf, thereafter pass a pointer or index.
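Passing only the address requires telling whether a pointer refers to flash. A minimal sketch, assuming a hypothetical flash window of 256 KiB starting at address 0 (take the real bounds from the part's memory map or linker script); `in_flash()` and `string_arg_words()` are illustrative names. A real record would also need a tag so the consumer knows which form was stored.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flash window: take the real bounds from the part's memory
   map or linker script (many Cortex-M0+ parts map flash at address 0). */
#define FLASH_START 0x00000000u
#define FLASH_SIZE  0x00040000u   /* 256 KiB, assumed */

static bool in_flash(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    return a - FLASH_START < FLASH_SIZE;   /* one unsigned compare covers both ends */
}

/* 32-bit words a %s argument costs in a log record: just the pointer when the
   text is immutable in flash, otherwise a NUL-terminated copy padded to 4 bytes. */
static uint32_t string_arg_words(const char *s)
{
    if (in_flash(s))
        return 1u;
    return (uint32_t)((strlen(s) + 1u + 3u) / 4u);
}
```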

You are right to fear heaps, except for stuff malloc()ed during start-up and never free()d. In general it is much better to rely on a decent modern GC. That is likely to be correct and faster than something thrown together by someone who vaguely remembers refcounts from university lectures based on historic concepts. Irrelevant in your case, of course.
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 21682
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #118 on: January 21, 2025, 11:19:57 pm »
Not to mention the limitations around it being "all on" or "all off."

Ah, you need a "come from" mechanism so your printf can determine the component doing the logging, and hence whether to log a particular message.

Easy in a language with reflection, but you might be able to peek at the stack to see the return address.  >:D
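A minimal sketch of the return-address idea, assuming GCC or Clang (`__builtin_return_address(0)` is a compiler builtin, not standard C) and a hypothetical address window `keep_lo`..`keep_hi` for the component of interest, for example read from the linker map. With GNU ld, `-Wl,--wrap=printf` is one way to route the library's `printf` calls to a wrapper like this (renamed `__wrap_printf`) without touching its source.

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical address window of the component whose messages we keep
   (e.g. taken from the linker map); everything else is dropped. */
static uintptr_t keep_lo = 0, keep_hi = UINTPTR_MAX;
static uintptr_t last_caller;   /* kept only so the example can be checked */

/* noinline so the return address really is the call site in the caller. */
static __attribute__((noinline)) int filtered_printf(const char *fmt, ...)
{
    uintptr_t caller = (uintptr_t)__builtin_return_address(0);
    last_caller = caller;
    if (caller < keep_lo || caller >= keep_hi)
        return 0;                 /* not from the component of interest */

    va_list ap;
    va_start(ap, fmt);
    int n = vprintf(fmt, ap);     /* or: hand off to the deferred logger */
    va_end(ap);
    return n;
}
```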
 

Online Analog Kid

  • Super Contributor
  • ***
  • Posts: 1560
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #119 on: January 21, 2025, 11:20:17 pm »
No heap, we will probably statically allocate some data structure specifically for this. Some sort of ring buffer filled with variable length/layout structures.

The heap scares me; it seems so nondeterministic (at least without looking at exactly how libc decides to implement it). It would be so much easier to write if I were using a heap, though.

I get that; heaps are kinda scary. I've written heap-management routines that worked perfectly well, but it always took a lot of head-scratching and drawing stuff on paper.

If you've got a gazillion bytes of memory then it'll be a lot easier just to allocate fixed-length elements (you do know what the longest strings you'll be dealing with are, right?). Easy-peasy then.
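The fixed-length alternative needs almost no bookkeeping. A minimal sketch, assuming format strings live in flash (so only the pointer is stored), at most a hypothetical `MAX_ARGS` of 10 word-sized arguments, and one producer and one consumer; the names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define SLOT_COUNT 64u   /* hypothetical; a power of two so counters can wrap */
#define MAX_ARGS   10u   /* longest argument list you expect, assumed */

typedef struct {
    const char *fmt;             /* format string, assumed to live in flash */
    uint32_t    argc;
    uint32_t    args[MAX_ARGS];  /* raw 32-bit argument values */
} log_slot;

static log_slot slots[SLOT_COUNT];
static uint32_t slot_head, slot_tail;   /* free-running; difference = fill level */

/* Producer: copy one message into the next slot. Returns 0, or -1 if the
   message does not fit or every slot is in use (message dropped). */
static int slot_put(const char *fmt, uint32_t argc, const uint32_t *args)
{
    if (argc > MAX_ARGS || slot_head - slot_tail == SLOT_COUNT)
        return -1;
    log_slot *s = &slots[slot_head % SLOT_COUNT];
    s->fmt = fmt;
    s->argc = argc;
    for (uint32_t i = 0; i < argc; i++)
        s->args[i] = args[i];
    slot_head++;                 /* publish after filling */
    return 0;
}

/* Consumer (idle time): oldest slot or NULL; slot_drop() frees it. */
static const log_slot *slot_get(void)
{
    return (slot_head == slot_tail) ? NULL : &slots[slot_tail % SLOT_COUNT];
}

static void slot_drop(void) { slot_tail++; }
```

On a 32-bit target each slot is 48 bytes, so 64 slots cost 3 KiB; the price is wasted space whenever a message has fewer than `MAX_ARGS` arguments.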
 
The following users thanked this post: 5U4GB

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #120 on: January 21, 2025, 11:28:39 pm »
No heap, we will probably statically allocate some data structure specifically for this. Some sort of ring buffer filled with variable length/layout structures.

The heap scares me; it seems so nondeterministic (at least without looking at exactly how libc decides to implement it). It would be so much easier to write if I were using a heap, though.

I get that; heaps are kinda scary. I've written heap-management routines that worked perfectly well, but it always took a lot of head-scratching and drawing stuff on paper.

If you've got a gazillion bytes of memory then it'll be a lot easier just to allocate fixed-length elements (you do know what the longest strings you'll be dealing with are, right?). Easy-peasy then.

I've got 500 cycles to work with. I'm going with variable-sized elements for "reasons" (namely, large swaths of prints occasionally seem to be little more than a single %d or %u done in loops to print arrays; some prints are long strings with 10 fields, others are not).

Allocating 10 chunks of memory* from the heap and copying/populating the data/fields in only 500 clock cycles seems like it would be hard to do without a lot of consideration.

*1x 64 bytes, 1x 48 bytes, and 8x 16 bytes (or similar; sometimes there will be strings)
« Last Edit: January 21, 2025, 11:37:22 pm by incf »
 

Online Analog Kid

  • Super Contributor
  • ***
  • Posts: 1560
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #121 on: January 21, 2025, 11:32:20 pm »
I'm going with variable-sized elements for "reasons" (namely, large swaths of prints occasionally seem to be little more than a single %d or %u done in loops to print arrays)

So I take it your variable-sized elements include a size prefix (like those non-ASCIIZ C strings), yes?
If you used fixed-length allocations you'd have a lot less bookkeeping to do.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7441
  • Country: fi
    • My home page and email address
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #122 on: January 22, 2025, 12:12:41 am »
A temporary sidetrack:

What is the accuracy of X+Y, X-Y, X*Y, X/Y?
There is an easy but very informative and useful programming exercise here.

Implement X and Y as random numbers, with some distribution around some median.  Error bounds are by convention those that encompass 68.3% of all values symmetrically around the median, 34.15% above the median, and 34.15% below the median.  Repeat one of the operations above, creating a histogram of the results.  After a sufficient number of iterations, find the median, and the error bounds.  (For a normal distribution, the error bounds are exactly one standard deviation below and above the median.)
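A minimal C sketch of this experiment for X/Y, assuming X = 200 ± 2 and Y = 100 ± 1 (one standard deviation, i.e. ±1% each). To stay free of libm, normal samples are approximated by summing 12 uniform numbers (the Irwin–Hall trick) rather than with Box–Muller, and the samples are sorted to read off the percentiles instead of building a histogram. The xorshift32 generator, seed and sample count are arbitrary choices.

```c
#include <stdint.h>
#include <stdlib.h>

#define N_SAMPLES 20001

static uint32_t rng_state = 2463534242u;   /* xorshift32 seed, arbitrary */

static double uniform01(void)              /* uniform in [0, 1) */
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return rng_state / 4294967296.0;
}

/* Approximately normal: the sum of 12 uniforms has mean 6 and variance 1. */
static double normal_approx(double mu, double sigma)
{
    double s = 0.0;
    for (int i = 0; i < 12; i++)
        s += uniform01();
    return mu + sigma * (s - 6.0);
}

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

static double samples[N_SAMPLES];

/* Monte Carlo X/Y: the 15.85 %, 50 % and 84.15 % points of the results. */
static void ratio_bounds(double *lo, double *med, double *hi)
{
    for (int i = 0; i < N_SAMPLES; i++) {
        double x = normal_approx(200.0, 2.0);   /* X = 200 +/- 1 % */
        double y = normal_approx(100.0, 1.0);   /* Y = 100 +/- 1 % */
        samples[i] = x / y;
    }
    qsort(samples, N_SAMPLES, sizeof samples[0], cmp_double);
    *lo  = samples[(size_t)(0.1585 * (N_SAMPLES - 1))];
    *med = samples[(N_SAMPLES - 1) / 2];
    *hi  = samples[(size_t)(0.8415 * (N_SAMPLES - 1))];
}
```

With these inputs the bounds should land near 2 ± 0.028, about ±1.4%: two independent 1% errors combine to roughly √2 × 1%.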

The same approach can be applied to investigate "black box" multivariate functions numerically, functions that for some reason or another cannot be solved analytically.  Each result distribution for a fixed set of input arguments then represents one data point.  The function can then be characterized in the desired input argument domain to any desired fidelity, by having enough data points.

To generate random numbers in any distribution \$P(x)\$, first calculate or find out the cumulative distribution function \$F_x(x)\$,
$$F_x(x) = \int_{-\infty}^x P(\chi) \, d\chi$$
noting that \$F_x(-\infty) = 0\$ and \$F_x(+\infty) = 1\$.  To generate a random number \$x\$, you first generate a uniform random number \$p\$, \$0 \le p \le 1\$, and then find \$x\$ such that \$F_x(x) = p\$.  You can either calculate or approximate the inverse function, or you can use a binary search, because \$F_x(x)\$ is a nondecreasing function in \$x\$.
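The binary-search route needs nothing but the CDF itself. A minimal sketch, using a triangular distribution on [-1, 1] as a stand-in CDF (chosen only because it needs no libm; any nondecreasing CDF can be passed in); the function names are illustrative.

```c
/* CDF of a triangular distribution on [-1, 1] peaking at 0 (example only). */
static double tri_cdf(double x)
{
    if (x <= -1.0) return 0.0;
    if (x >= 1.0)  return 1.0;
    if (x <= 0.0)  return 0.5 * (x + 1.0) * (x + 1.0);
    return 1.0 - 0.5 * (1.0 - x) * (1.0 - x);
}

/* Find x in [lo, hi] with F(x) ~= p by bisection; valid because F is
   nondecreasing. 60 halvings shrink the bracket far below double precision
   for a unit-sized interval. */
static double inverse_cdf(double (*F)(double), double p, double lo, double hi)
{
    for (int i = 0; i < 60; i++) {
        double mid = 0.5 * (lo + hi);
        if (F(mid) < p)
            lo = mid;
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

Feeding uniform numbers p into `inverse_cdf(tri_cdf, p, -1.0, 1.0)` yields triangular-distributed samples; the erf-based normal CDF could be plugged in the same way using C99 `erf()`.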

For a normal distribution with median/average \$\mu\$ and standard deviation \$\sigma\$,
$$F_x(x) = \frac{1}{2} + \frac{1}{2} \operatorname{erf}\left( \frac{x - \mu}{\sqrt{2} \sigma} \right)$$
where \$\operatorname{erf}()\$ is the error function.  Often the Box–Muller method is preferable, though.

A variant is where you model the distribution as a simple interval, as a rectangular distribution.  Within this interval, each value is equally likely; outside the range the probability is zero.  This is called interval arithmetic.  Simply put, each variable Z actually refers to a continuous interval, (Zmin..Zmax).  Arithmetic binary operators are applied all four ways, with the result consisting of the minimum and the maximum of those results.

Because this distribution is rather unrealistic, it gives different results than when one uses the normal (Gaussian) distribution or any other more realistic distribution, but it can nevertheless be useful in some cases.  For example, given X = 200±1% = 198..202 and Y = 100±1% = 99..101, interval arithmetic yields
  • X + Y = 297 .. 303 = 300±1%
  • X - Y = 97 .. 103 = 100±3%
  • X * Y = 19602 .. 20402 ≃ 20002±2%
  • X / Y ≃ 1.96 .. 2.04 = 2±2%
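The "all four ways" rule fits in one helper. A minimal sketch (the `interval` type and `iv_apply()` are illustrative names) that reproduces the numbers above; taking the extremes of the four corner results is exact here because each operation is monotonic in each argument, provided for division that Y does not contain zero.

```c
typedef struct { double lo, hi; } interval;

static double op_add(double a, double b) { return a + b; }
static double op_sub(double a, double b) { return a - b; }
static double op_mul(double a, double b) { return a * b; }
static double op_div(double a, double b) { return a / b; }   /* y must exclude 0 */

/* Apply op to all four endpoint pairs and keep the minimum and maximum. */
static interval iv_apply(double (*op)(double, double), interval x, interval y)
{
    double r[4] = { op(x.lo, y.lo), op(x.lo, y.hi), op(x.hi, y.lo), op(x.hi, y.hi) };
    interval z = { r[0], r[0] };
    for (int i = 1; i < 4; i++) {
        if (r[i] < z.lo) z.lo = r[i];
        if (r[i] > z.hi) z.hi = r[i];
    }
    return z;
}
```

Writing a result as midpoint ± half-width gives the ±% form: for the product, (19602 + 20402)/2 = 20002 and 400/20002 ≈ 2%.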

I like to point out these kind of programming experiments, because it is how I personally gain a more intuitive understanding of how imprecise values behave.  One does not need to be a mathematician to be able to understand them and to use them to solve problems.
« Last Edit: January 22, 2025, 12:15:18 am by Nominal Animal »
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7441
  • Country: fi
    • My home page and email address
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #123 on: January 22, 2025, 12:27:17 am »
It's all large 32 bit integers and strings. Often 8 per snprintf call.

And in 500 CPU cycles, there is not a lot I can do except copy the request into a buffer for later.
Tight.

One option is to create a hash table of formats you have already parsed, in RAM at run time, with the target specifying what parameters you need to copy to the circular buffer.  If the format strings are all in Flash, they don't need to be copied; otherwise, the string itself has to be copied to the circular buffer.  If a %s points to Flash, only the pointer needs to be copied; otherwise you'll need to copy the contents also.  Essentially, you'll need to copy (from the variadic argument list) native words including pointers, native doublewords (if 64-bit values are supported), and RAM-based strings (because those are likely overwritten after the call).  If you use a single 32-bit number, two bits per item suffices, with 00 indicating end, allowing you to support up to 16-parameter formats.  This means that the circular buffer entries should have variable length, so start with the length of each record.
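A sketch of the descriptor-building step, with an assumed code assignment (00 = end, 01 = 32-bit word, 10 = 64-bit doubleword, 11 = string) and an assumed ILP32 target such as the Cortex-M0+, where int, long and pointers are 32-bit. Floating-point conversions are rejected here, since the OP has none and a double would need a code of its own; `describe_format()` is an illustrative name.

```c
#include <stdint.h>
#include <string.h>

/* Two bits per argument, first argument in the lowest bits; 00 ends the list. */
enum { ARG_END = 0, ARG_WORD = 1, ARG_DWORD = 2, ARG_STRING = 3 };

#define PUSH_ARG(code) do {                                \
        if (n == 16) return -1;   /* no room in 32 bits */ \
        d |= (uint32_t)(code) << (2 * n);                  \
        n++;                                               \
    } while (0)

/* Returns the argument count, or -1 for more than 16 arguments or an
   unsupported conversion (%n, floating point, truncated specifier). */
static int describe_format(const char *fmt, uint32_t *desc)
{
    uint32_t d = 0;
    int n = 0;
    while (*fmt) {
        if (*fmt++ != '%')
            continue;
        if (*fmt == '%') { fmt++; continue; }            /* "%%" is a literal */
        while (*fmt && strchr("-+ #0", *fmt)) fmt++;      /* flags */
        if (*fmt == '*') { PUSH_ARG(ARG_WORD); fmt++; }   /* width taken from an int */
        while (*fmt >= '0' && *fmt <= '9') fmt++;
        if (*fmt == '.') {
            fmt++;
            if (*fmt == '*') { PUSH_ARG(ARG_WORD); fmt++; }  /* precision from an int */
            while (*fmt >= '0' && *fmt <= '9') fmt++;
        }
        int longs = 0;
        for (;; fmt++) {                                  /* length modifiers */
            if (*fmt == 'l') longs++;
            else if (*fmt == 'j') longs += 2;             /* intmax_t: 64-bit */
            else if (*fmt != 'h' && *fmt != 'z' && *fmt != 't') break;
        }
        switch (*fmt) {
        case 'd': case 'i': case 'u': case 'o': case 'x': case 'X':
            PUSH_ARG(longs >= 2 ? ARG_DWORD : ARG_WORD); break;
        case 'c': case 'p':
            PUSH_ARG(ARG_WORD); break;
        case 's':
            PUSH_ARG(ARG_STRING); break;
        default:
            return -1;
        }
        fmt++;
    }
    *desc = d;
    return n;
}
```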

If the format is not in the hash table, you parse it, adding the data to the circular buffer, plus constructing the 32-bit number.  If you had fewer than 17 parameters to format, then you add the mapping to the hash table.  If there are just one or two parameters formatted, it might be faster to just re-parse it each time, and leave the limited hash table for the larger format strings.
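A sketch of the run-time cache, keyed on the format pointer because flash-resident format strings have stable addresses (formats built in RAM would need their text hashed instead). It is a direct-mapped table, the simplest kind of hash table, where a collision just evicts the older entry. `parse_format()` is a placeholder that treats every conversion as one word; the table size and the three-argument threshold are assumptions.

```c
#include <stdint.h>

#define FMT_CACHE_SLOTS    64u   /* hypothetical size, power of two */
#define FMT_CACHE_MIN_ARGS 3     /* short formats are simply re-parsed */

typedef struct { const char *fmt; uint32_t desc; int argc; } fmt_cache_entry;

static fmt_cache_entry fmt_cache[FMT_CACHE_SLOTS];
static unsigned parse_calls;     /* instrumentation for the example */

/* Placeholder parser: every conversion counts as one 32-bit word (code 01).
   A real one would classify %s, 64-bit values and so on. */
static int parse_format(const char *fmt, uint32_t *desc)
{
    uint32_t d = 0;
    int n = 0;
    parse_calls++;
    for (; *fmt; fmt++) {
        if (*fmt != '%') continue;
        if (fmt[1] == '%') { fmt++; continue; }      /* "%%" literal */
        if (n == 16) return -1;
        d |= 1u << (2 * n++);
    }
    *desc = d;
    return n;
}

static int lookup_format(const char *fmt, uint32_t *desc)
{
    uintptr_t key = (uintptr_t)fmt;
    fmt_cache_entry *e = &fmt_cache[(key >> 2) & (FMT_CACHE_SLOTS - 1u)];
    if (e->fmt == fmt) {                             /* hit: no parsing */
        *desc = e->desc;
        return e->argc;
    }
    int n = parse_format(fmt, desc);
    if (n >= FMT_CACHE_MIN_ARGS) {                   /* cache only the expensive ones */
        e->fmt = fmt;
        e->desc = *desc;
        e->argc = n;
    }
    return n;
}
```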

This does mean that you end up parsing the format strings twice, but in 500 cycles, I think it is a compromise you'll have to do.
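The copying half can then be driven by the descriptor alone. A minimal sketch, assuming the 2-bit code values 01 = word, 10 = 64-bit doubleword, 11 = string (only 00 = end is given in the post), an ILP32 target, and RAM strings copied inline; `copy_args()` and `log_capture()` are hypothetical names.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed code values; first argument in the lowest two bits. */
enum { ARG_END = 0, ARG_WORD = 1, ARG_DWORD = 2, ARG_STRING = 3 };

/* Copy the arguments described by desc from ap into out[] (32-bit words).
   %s text is copied inline, NUL-terminated and zero-padded to a word
   boundary. Returns the number of words written, or -1 if out is too small. */
static int copy_args(uint32_t desc, uint32_t *out, int out_words, va_list ap)
{
    int w = 0;
    for (; desc != 0; desc >>= 2) {           /* 00 in the remaining bits = end */
        switch (desc & 3u) {
        case ARG_WORD:                        /* int, unsigned, long, pointer (ILP32) */
            if (w + 1 > out_words) return -1;
            out[w++] = (uint32_t)va_arg(ap, unsigned int);
            break;
        case ARG_DWORD: {                     /* long long: low word first */
            if (w + 2 > out_words) return -1;
            unsigned long long v = va_arg(ap, unsigned long long);
            out[w++] = (uint32_t)v;
            out[w++] = (uint32_t)(v >> 32);
            break;
        }
        case ARG_STRING: {
            const char *s = va_arg(ap, const char *);
            if (s == NULL) s = "(null)";
            size_t len = strlen(s) + 1;       /* include the NUL */
            int words = (int)((len + 3) / 4);
            if (w + words > out_words) return -1;
            out[w + words - 1] = 0;           /* zero the padding bytes first */
            memcpy(&out[w], s, len);
            w += words;
            break;
        }
        }
    }
    return w;
}

/* Hypothetical front end: desc would come from the format parser or cache. */
static int log_capture(uint32_t *out, int out_words, uint32_t desc, ...)
{
    va_list ap;
    va_start(ap, desc);
    int w = copy_args(desc, out, out_words, ap);
    va_end(ap);
    return w;
}
```

Strings that live in flash could be stored as a single pointer word instead of being copied.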
« Last Edit: January 22, 2025, 12:29:06 am by Nominal Animal »
 

Online Analog Kid

  • Super Contributor
  • ***
  • Posts: 1560
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #124 on: January 22, 2025, 01:19:44 am »
Essentially, you'll need to copy (from the variadic argument list) native words including pointers, native doublewords (if 64-bit values are supported), and RAM-based strings (because those are likely overwritten after the call).

You might want to read the posts here more carefully. The OP has already written
Quote
It's all large 32 bit integers and strings. Often 8 per snprintf call.

No 64-bit values.
 

Offline 5U4GB

  • Frequent Contributor
  • **
  • Posts: 678
  • Country: au
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #125 on: January 22, 2025, 02:14:42 am »
The OS and the IPC (interprocess communication) scheme handles all of that. Race conditions, data contention, etc. are all completely impossible by design. If I tried to do "something funny" it simply would not compile.

Now I'm really curious: that sounds like a SIL/ASIL x device or similar. Are you allowed to share any more details on it?  PM is fine; there's so little published about real-world use of these things that I'm always interested in examples.
 

Online Analog Kid

  • Super Contributor
  • ***
  • Posts: 1560
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #126 on: January 22, 2025, 02:21:14 am »
More TLAs:
  • SIL: safety integrity level
  • ASIL: automotive safety integrity level

(OK, that last one is a FLA)
 

Offline 5U4GB

  • Frequent Contributor
  • **
  • Posts: 678
  • Country: au
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #127 on: January 22, 2025, 02:22:31 am »
Allocating 10 chunks of memory from the heap and copying/populating the data/fields in only 500 clock cycles seems like it would be hard to do without a lot of consideration.

I like @Analog Kid's suggestion of a circular buffer. If you've got a ton of RAM, then just divide it into fixed-size blocks and advance a counter to the next one on each printf call.  Dump the integer values as ASCII ('A' + nibble, one addition op per 4 bits) and postprocess them later into decimal integer values.
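A sketch of the nibble encoding: each 4-bit group becomes one letter 'A'..'P' using a shift, a mask and an add, so no division is needed on the device. That matters on a Cortex-M0+, which has no hardware divide instruction, so decimal conversion would go through a software division per digit. The decimal conversion is deferred to post-processing; the function names are illustrative.

```c
#include <stdint.h>

/* Encode a 32-bit value as 8 letters 'A'..'P', most significant nibble first. */
static void encode_nibbles(uint32_t v, char out[8])
{
    for (int i = 7; i >= 0; i--) {
        out[i] = (char)('A' + (v & 0xFu));
        v >>= 4;
    }
}

/* Post-processing side (e.g. on the PC): recover the value. */
static uint32_t decode_nibbles(const char in[8])
{
    uint32_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 4) | (uint32_t)(in[i] - 'A');
    return v;
}
```

For example, 0x000012AB encodes as "AAAABCKL".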
 

Online Analog Kid

  • Super Contributor
  • ***
  • Posts: 1560
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #128 on: January 22, 2025, 04:14:26 am »
To be honest I wasn't even thinking of a circular buffer but just a plain ol' buffer; since there's so much memory available, simply fill it, flush it and then start from the beginning again.

You could implement a circular buffer if you really wanted to, with a bit more overhead.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7441
  • Country: fi
    • My home page and email address
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #129 on: January 22, 2025, 04:16:19 am »
Essentially, you'll need to copy (from the variadic argument list) native words including pointers, native doublewords (if 64-bit values are supported), and RAM-based strings (because those are likely overwritten after the call).
You might want to read the posts here more carefully. The OP has already written
Quote
It's all large 32 bit integers and strings. Often 8 per snprintf call.
No 64-bit values.
And you might want to stop replying to my posts.  You obviously do not grasp how carefully I do read the posts; the parenthesized note is there exactly because I'm aware of the limitation the OP has, which does not apply in general to the technique I outlined.

Just because you do not understand something, you should not assume others do not either, because your ability to understand seems to be extremely limited.

Everyone else already knows that I do not like to limit the discussion to the minimum bounds set by the asker, but always explore how to solve similar problems.  Solving the stated problem alone is not worth discussing, but finding how it can be solved, and not for just the asker, but anyone reading the thread afterwards and dealing with similar type problems, is what makes the discussion worth having, and separates this forum from Q&A sites.

If you want answers to your questions without understanding how and why, limited to the exact question asked without any extension of the subject, go to StackExchange or Quora, or hire your own Mechanical Turk for solving your problems for you.  Here, we discuss.
 
The following users thanked this post: eliocor, SiliconWizard

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7441
  • Country: fi
    • My home page and email address
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #130 on: January 22, 2025, 04:17:14 am »
 

Online Analog Kid

  • Super Contributor
  • ***
  • Posts: 1560
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #131 on: January 22, 2025, 04:20:45 am »
Just because you do not understand something, you should not assume others do not either, because your ability to understand seems to be extremely limited.

Likewise, your ability to address the points which would be of most help to the questions being asked by the OP without introducing all sorts of extraneous matters which may or may not be germane, but which you apparently like to pontificate about, seems to be quite limited.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7441
  • Country: fi
    • My home page and email address
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #132 on: January 22, 2025, 06:24:59 am »
your ability to address the points which would be of most help
Based on your track record, I do not believe you have any idea whatsoever what that help might be.

Are you sure you're not a chatbot?
 
The following users thanked this post: eliocor, tggzzz

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 21682
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #133 on: January 22, 2025, 09:16:19 am »
If you want answers to your questions without understanding how and why, limited to the exact question asked without any extension of the subject, go to StackExchange or Quora, or hire your own Mechanical Turk for solving your problems for you.  Here, we discuss.

That indicates why I like this forum.

The answers (well, discussions) here are much more interesting and informative. "Give a man a fish" vs "teach a man to fish".
 
The following users thanked this post: Siwastaja, newbrain, 5U4GB

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 21682
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #134 on: January 22, 2025, 09:17:55 am »
I like @Analog Kid's suggestion
Wow, 5U4GB.

This thread does have more "triumphant reinventions/repetitions" of previous responses than most threads.
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 21682
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #135 on: January 22, 2025, 09:21:05 am »
Just because you do not understand something, you should not assume others do not either, because your ability to understand seems to be extremely limited.

Likewise, your ability to address the points which would be of most help to the questions being asked by the OP without introducing all sorts of extraneous matters which may or may not be germane, but which you apparently like to pontificate about, seems to be quite limited.

Nominal Animal is well aware of his style and is frequently very open about his characteristics. I wish all people were so self-aware.
 

Offline magic

  • Super Contributor
  • ***
  • Posts: 7538
  • Country: pl
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #136 on: January 22, 2025, 10:40:06 am »
Are you sure you're not a chatbot?
These days you just ask for a solution to something impossible... ;)
 

Offline AVI-crak

  • Regular Contributor
  • *
  • Posts: 141
  • Country: ru
    • Rtos
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #137 on: January 22, 2025, 11:42:07 am »
I wonder when the physical interface of the connection stopped being the bottleneck. With a 200 MHz core, the data flow is 0.5 megabytes per second of ready text formatted from double-precision numbers; a 17-character string can be produced from a double-precision number in 300 cycles. That's the limit: the ring buffer is choked and the external interface runs non-stop.
Why is it different for you?

https://github.com/AVI-crak/Rtos_cortex/tree/master/printo
 

Offline 5U4GB

  • Frequent Contributor
  • **
  • Posts: 678
  • Country: au
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #138 on: January 22, 2025, 12:01:40 pm »
This thread does have more "triumphant reinventions/repetitions" of previous responses than most threads.

That's because it's fairly active and respondents are spread across time zones, so there may be 20 more responses they haven't read yet following the one they're replying to.
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 21682
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #139 on: January 22, 2025, 12:10:24 pm »
This thread does have more "triumphant reinventions/repetitions" of previous responses than most threads.

That's because it's fairly active and respondents are spread across time zones, so there may be 20 more responses they haven't read yet following the one they're replying to.

Oh, I understand those are some of the causes, but I get the feeling some posters haven't read all the posts before the one they are replying to. ("Feeling" because I'm not going to spend time checking and supplying the evidence :) )
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7441
  • Country: fi
    • My home page and email address
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #140 on: January 23, 2025, 12:38:52 am »
I like @Analog Kid's suggestion
Wow, 5U4GB.
This thread does have more "triumphant reinventions/repetitions" of previous responses than most threads.
Mis-attributing suggestions to someone else is just one of the things that makes me absolutely :rant:.  I just cannot ignore it in any context.  Call it a personal fault, stemming from extensive personal negative experiences with that.  I call it a desire for honesty.

I know many users skip my posts due to their length, and that's okay, but when it's something only I mention (#28 including an implementation example, next in #123, and nobody else), and then in #127 5U4GB posts how they like the idea specifically and carefully attributing it to someone else, it feels like a deliberate personal insult.

Or perhaps I am reading the posts too carefully, eh?  Anyways, since my contributions here are not appreciated, I'll get my coat.  I'm out.
« Last Edit: January 23, 2025, 12:40:42 am by Nominal Animal »
 

Online Analog Kid

  • Super Contributor
  • ***
  • Posts: 1560
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #141 on: January 23, 2025, 12:47:01 am »
I know many users skip my posts due to their length, and that's okay, but when it's something only I mention (#28 including an implementation example, next in #123, and nobody else), and then in #127 5U4GB posts how they like the idea specifically and carefully attributing it to someone else, it feels like a deliberate personal insult.

Yeah, I'm sure 5U4GB intended that as a personal insult; couldn't have been just an honest mis-attribution.

Sheesh; anyone ever tell you that you might be just as narcissistic as Donald Trump, or perhaps even more so?

On top of being a pontificating kibitzer ...
 
The following users thanked this post: 5U4GB

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #142 on: January 23, 2025, 12:54:48 am »
 :(

I reran my benchmarks and a single "big" printf (8x 32-bit arguments) actually takes me 30000 instruction cycles with certain numbers. At around 10 MHz that's a whole 3 ms for a line of text! Good luck hitting a real-time window within ±500 µs of an event with dozens or hundreds of those floating around in the critical path. (Thankfully I know exactly how to solve the problem and get it below 500 cycles thanks to this thread.)

I honestly had no idea that the core string formatting operations were so expensive; I seem to have thought that it magically did all of that work in less than 10% of the time it actually takes. I should have guessed: if it takes 1024 bytes of stack space, it probably takes 10-30x as many CPU cycles as stack bytes doing work.
« Last Edit: January 23, 2025, 12:58:58 am by incf »
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 21682
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #143 on: January 23, 2025, 12:55:50 am »
I know many users skip my posts due to their length, and that's okay, but when it's something only I mention (#28 including an implementation example, next in #123, and nobody else), and then in #127 5U4GB posts how they like the idea specifically and carefully attributing it to someone else, it feels like a deliberate personal insult.

Yeah, I'm sure 5U4GB intended that as a personal insult; couldn't have been just an honest mis-attribution.

Sheesh; anyone ever tell you that you might be just as narcissistic as Donald Trump, or perhaps even more so?

On top of being a pontificating kibitzer ...

I've no idea what a kibitzer might be, but we can all imagine personal characteristics that make it less likely that a person will be able/willing to follow detailed thoughtful reasoned posts.
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 21682
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #144 on: January 23, 2025, 01:02:15 am »
:(

I reran my benchmarks and a single "big" printf (8x 32-bit arguments) actually takes me 30000 instruction cycles with certain numbers. At around 10 MHz that's a whole 3 ms for a line of text! Good luck hitting a real-time window within ±500 µs of an event with dozens or hundreds of those floating around in the critical path. (Thankfully I know exactly how to solve it thanks to this thread.)

I honestly had no idea that the core string formatting operations were so expensive; I seem to have thought that it magically did all of that work in less than 10% of the time it actually takes. I should have guessed: if it takes 1024 bytes of stack space, it probably takes 10-30x as many CPU cycles as stack bytes doing work.

A little measurement of known unknowns beats any amount of hope and guesswork! Knowing which things are the key unknowns requires skill, judgement and experience.

The timing doesn't surprise me. Part of the reason IBM big iron (think 360, 370, ...) was fast at business-type processing was not the central CPU but the many peripheral I/O processors that could do simple formatting as well as peripheral hardware control.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 16100
  • Country: fr
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #145 on: January 23, 2025, 01:04:24 am »
I like @Analog Kid's suggestion
Wow, 5U4GB.
This thread does have more "triumphant reinventions/repetitions" of previous responses than most threads.
Mis-attributing suggestions to someone else is just one of the things that makes me absolutely :rant:.  I just cannot ignore it in any context.  Call it a personal fault, stemming from extensive personal negative experiences with that.  I call it a desire for honesty.

I know many users skip my posts due to their length, and that's okay, but when it's something only I mention (#28 including an implementation example, next in #123, and nobody else), and then in #127 5U4GB posts how they like the idea specifically and carefully attributing it to someone else, it feels like a deliberate personal insult.

This is something that's relatively common on social media in general. It's very common to see some users post the exact same idea as a previous one, and manage to get the credit. I've seen that numerous times.
You can't always be certain this is deliberate - sometimes others will come up with a similar idea and may not have read the other post already presenting it. But when not just the core idea, but even some specific words are used similarly, then the doubt is hard to justify.

Even so, I'm not sure this is done completely deliberately - I think it's not uncommon on social media, where the discussions can be long and go in all directions, that people will have read about some particular idea in a previous post, but can't remember who said it or even if they actually read it in the same thread, and will just post the same idea later on.

And yes, most often, the posts that tend to catch people's attention are the ones that are shorter and, well, "catchy". That's unfortunately how the human mind works, and YouTubers know that very well. That leads to these kinds of situations, where some will get all the credit by actually appropriating other people's work.
 

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #146 on: January 23, 2025, 01:09:27 am »
I wonder when the physical interface of the connection stopped being the bottleneck. With a 200 MHz core, I get a data flow of 0.5 megabytes per second of ready text from double-precision numbers; a 17-character string can be created from a double-precision number in 300 cycles. At that point the ring buffer is saturated and the external interface runs non-stop.
Why is it different for you?

https://github.com/AVI-crak/Rtos_cortex/tree/master/printo

It takes me on the order of 3 ms to generate 90 bytes of string with printf. My UART is more than fast enough and the buffer is more than big enough.
 

Offline 5U4GB

  • Frequent Contributor
  • **
  • Posts: 678
  • Country: au
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #147 on: January 23, 2025, 01:17:17 am »
Yeah, I'm sure 5U4GB intended that as a personal insult; couldn't have been just an honest mis-attribution.

It wasn't even a mis-attribution, Analog Kid made the observation a few messages before the one I was responding to, and incf then quoted Analog Kid in a response.  I trimmed the quoted stuff a bit so there wouldn't be masses of repeated text to go through before getting to my response.

If it's really that big a deal, I'm happy to attribute all ideas and quotes to the Queen of Sheba, and everyone can decide for themselves whether I'm referring to them or not :P.
 
The following users thanked this post: Analog Kid

Offline AVI-crak

  • Regular Contributor
  • *
  • Posts: 141
  • Country: ru
    • Rtos
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #148 on: January 23, 2025, 09:49:03 am »
I should have guessed if it takes 1024 bytes of stack space, it probably takes 10-30x as many CPU cycles as stack bytes doing work.
Don't touch the stack, it's sacred. Classic printf() will not work without a stack. In fact, 1k is not enough, more often 100k bytes are allocated.
 

Offline macboy

  • Super Contributor
  • ***
  • Posts: 2319
  • Country: ca
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #149 on: January 23, 2025, 05:27:27 pm »
I've worked with embedded real time systems for decades. In my industry, hardware turnover is slow, and the systems I use currently have only a few hundred MHz CPU (32bit PowerPC) so it isn't a powerhouse. We do sometimes - too often actually - need to print logs from real time tasks or the kernel, or even from ISRs(!). I'll describe how we do it. Maybe it will be useful to some.

We have a low-priority task which handles the printing of these logs, just +1 above the idle task. The real-time task or kernel code which wants to output a log will call a function (e.g. "log_message()") which places the raw data into a circular/ring buffer. The function is simple: it takes exactly seven arguments and just copies those (plus a timestamp and the calling task ID) to the ring buffer. The first arg is a pointer to the printf-style format string. The other six args are just ints (32-bit int).

There are some restrictions you would not have when calling printf() directly. The pointer to the format string must point to some static/persistent memory, which is to say absolutely not on the stack or the heap, and the string itself must be const. The caller of log_message() doesn't know when the log will be printed: it could be soon, or seconds, or minutes from now. So the format string must not change, because we don't copy the string into the ring buffer, we just copy the pointer to it. As Dave would say, that's a trap for young players.

The six int args are nominally int, but they can in fact be any 32-bit value which can be cast to an int. Printf will interpret them according to the format string. If the caller needs fewer than six args, it fills the rest with zeros. If the format string specifies fewer than six fields, printf doesn't look at those extra zero args. Args larger than 32 bits (double FP) are not supported.
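Below is a minimal sketch of the enqueue side described above, built on a few assumptions. The ring has a fixed capacity. The timestamp and task ID are passed in explicitly so the example does not depend on an RTOS; on the real system they would be read inside the function. The body would also need interrupts masked, because several tasks and ISRs write to the same ring. The names `LogRecord`, `LogRing`, `kLogCapacity` and this `log_message` signature are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// One raw log entry: only the pointer to the format string is stored,
// so the string must live in static, const storage (e.g. a literal).
struct LogRecord {
    const char*   fmt;
    std::uint32_t timestamp;   // assumed: a free-running tick counter
    std::uint32_t task_id;
    std::int32_t  args[6];     // unused slots are zero-filled
};

constexpr std::size_t kLogCapacity = 64;   // assumed size; usable slots = 63

struct LogRing {
    LogRecord     slots[kLogCapacity];
    std::size_t   head = 0;     // next slot to write
    std::size_t   tail = 0;     // next slot to read (owned by the log task)
    std::uint32_t dropped = 0;  // entries lost because the ring was full
};

// Copies raw values only; no formatting happens here.
// On a real target this body would run with interrupts masked.
bool log_message(LogRing& r, std::uint32_t now, std::uint32_t task,
                 const char* fmt,
                 std::int32_t a0 = 0, std::int32_t a1 = 0, std::int32_t a2 = 0,
                 std::int32_t a3 = 0, std::int32_t a4 = 0, std::int32_t a5 = 0) {
    std::size_t next = (r.head + 1) % kLogCapacity;
    if (next == r.tail) {   // full: drop instead of blocking the caller
        ++r.dropped;
        return false;
    }
    r.slots[r.head] = LogRecord{fmt, now, task, {a0, a1, a2, a3, a4, a5}};
    r.head = next;
    return true;
}
```

The default arguments stand in for the "fill the rest with zeros" rule, so callers with fewer than six values do not have to pad by hand.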

When the low priority log_task eventually runs (usually very soon after someone calls log_message, it is a real time system after all), it pulls one log of data from the buffer. It converts the timestamp to human readable format, and looks up the task name. It calls printf() with that information, then calls printf() again using the supplied format string and the six args. When some unusual event or error occurs, several tasks may collectively print dozens of logs in a short time (within a millisecond). RT performance is sufficiently maintained, and that is the most critical point for our systems.
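The drain step can be sketched as follows. It formats into a caller-supplied buffer with `snprintf` rather than calling `printf` twice, which keeps it testable; that substitution and the `task_name()` stub are assumptions. Passing all six ints is legal even when the format uses fewer, because surplus variadic arguments are ignored. This sketch only handles integer conversions (`%d`, `%u`, `%x`); a `%s` pointer fits a 32-bit slot on the target but not on a 64-bit host.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

struct LogRecord {
    const char*   fmt;
    std::uint32_t timestamp;
    std::uint32_t task_id;
    std::int32_t  args[6];
};

// Hypothetical task-name lookup; a real system would ask the RTOS.
const char* task_name(std::uint32_t id) { return id == 1 ? "ctrl" : "other"; }

// Formats one record: a "[timestamp task] " prefix, then the caller's
// format string applied to all six stored args. Returns the total length.
int format_record(const LogRecord& rec, char* out, std::size_t n) {
    int used = std::snprintf(out, n, "[%lu %s] ",
                             static_cast<unsigned long>(rec.timestamp),
                             task_name(rec.task_id));
    if (used < 0 || static_cast<std::size_t>(used) >= n) return used;
    int more = std::snprintf(out + used, n - used, rec.fmt,
                             rec.args[0], rec.args[1], rec.args[2],
                             rec.args[3], rec.args[4], rec.args[5]);
    return more < 0 ? more : used + more;
}
```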

Because printf is called from the low priority task, the substantial time it spends formatting the output string and sending it to the UART doesn't cause real-time performance impact. The stack space consumed is also only on that task, not the potentially many other tasks "printing" logs. Also, since only one task is ever calling printf, any reentrancy issues are avoided. I don't claim this is a truly lightweight solution; we do need substantial memory for the ring buffer, substantial idle CPU time for the log task to run, and substantial code space for the printf implementation.

In general, these logs are printed as they happen, but in theory one could capture the ring buffer, use a copy of the .elf image to obtain the format strings, and print the logs post-mortem. The task IDs are the only thing that could not be interpreted offline into task names, unless one also captured that data. I guess you could also send the raw data out the UART in real time and use printf on a remote system to generate the text log, further reducing the demand on the embedded system's resources.
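Here is a sketch of the "send raw data, format on the PC" variant. The target packs each record as little-endian 32-bit words, with the format string represented by its 32-bit flash address. The host looks that address up in a table. The wire layout is an assumption, not an existing protocol. In practice the table would be extracted from the .elf (its read-only data section); here it is simply passed in.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

constexpr std::size_t kWireSize = 4 * 9;   // fmt address, timestamp, task, 6 args

void put_u32(std::uint8_t* p, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) p[i] = static_cast<std::uint8_t>(v >> (8 * i));
}

std::uint32_t get_u32(const std::uint8_t* p) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(p[i]) << (8 * i);
    return v;
}

// Target side: no formatting at all, just copying words.
void encode(std::uint8_t* out, std::uint32_t fmt_addr, std::uint32_t ts,
            std::uint32_t task, const std::uint32_t (&args)[6]) {
    put_u32(out, fmt_addr);
    put_u32(out + 4, ts);
    put_u32(out + 8, task);
    for (int i = 0; i < 6; ++i) put_u32(out + 12 + 4 * i, args[i]);
}

// Host side: resolve the format address, then format with the host's printf.
std::string decode(const std::uint8_t* in,
                   const std::map<std::uint32_t, std::string>& fmt_table) {
    auto it = fmt_table.find(get_u32(in));
    if (it == fmt_table.end()) return "<unknown format>";
    std::int32_t a[6];
    for (int i = 0; i < 6; ++i)
        a[i] = static_cast<std::int32_t>(get_u32(in + 12 + 4 * i));
    char buf[256];
    std::snprintf(buf, sizeof buf, it->second.c_str(),
                  a[0], a[1], a[2], a[3], a[4], a[5]);
    return "[" + std::to_string(get_u32(in + 4)) + "] " + buf;
}
```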
 
The following users thanked this post: 5U4GB

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 21682
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #150 on: January 23, 2025, 05:37:39 pm »
All those suggestions are sound :) And have already been made several times in this thread! :(
 

Online Analog Kid

  • Super Contributor
  • ***
  • Posts: 1560
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #151 on: January 23, 2025, 05:49:25 pm »
All those suggestions are sound :) And have already been made several times in this thread! :(

Yes. I think the OP has a pretty good handle on what needs to be done.
 

Offline magic

  • Super Contributor
  • ***
  • Posts: 7538
  • Country: pl
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #152 on: January 23, 2025, 06:07:28 pm »
It's the obvious thing to do if you can modify the logging code and don't need to accept the crazy printf ABI.

That being said, I'd expect the ABI not to be too crazy if there are no floats and/or long types involved, so writing a fake printf which simply saves the first 10 varargs (or whatever the worst case number is) without having any idea what they really are might be doable on most, if not all, architectures.
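Reading a fixed number of varargs "blind" relies on ABI behaviour: fetching more `va_arg`s than were actually passed is undefined behaviour in C and C++, even where it happens to work. Where the call sites can be recompiled, a C++ variadic template captures the arguments with the count known at compile time instead. This swaps in a different technique from the fake-printf idea. It does not intercept calls inside an already-compiled library. The names `Captured`, `to_slot` and `capture`, and the limit of 10 slots, are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

struct Captured {
    const char*   fmt;        // must point at a static, const string
    std::size_t   count;      // number of arguments actually passed
    std::uint32_t args[10];   // assumed worst case of 10 arguments
};

// Only integers of 32 bits or less fit a slot; doubles and 64-bit values
// are rejected at compile time rather than silently mangled.
template <typename T>
std::uint32_t to_slot(T v) {
    static_assert(std::is_integral<T>::value && sizeof(T) <= 4,
                  "only integers up to 32 bits fit a slot");
    return static_cast<std::uint32_t>(v);
}

template <typename... Args>
Captured capture(const char* fmt, Args... args) {
    static_assert(sizeof...(Args) <= 10, "too many arguments for one record");
    Captured c{fmt, sizeof...(Args), {to_slot(args)...}};   // rest zero-filled
    return c;
}
```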

But I'm repeating myself, and possibly others.
 

Online Analog Kid

  • Super Contributor
  • ***
  • Posts: 1560
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #153 on: January 23, 2025, 06:14:24 pm »
It's already been established, to those who bother to actually read the posts in this thread, that there are no floats here. All 32-bit ints and strings.
 

Online peter-h

  • Super Contributor
  • ***
  • Posts: 4503
  • Country: gb
  • Doing electronics since the 1960s...
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #154 on: January 23, 2025, 07:56:30 pm »
Quote
I honestly had no idea that the core string formatting operations were so expensive  - I seem to have thought that it magically did it all of that work in less than 10% of the time it actually takes. I should have guessed if it takes 1024 bytes of stack space, it probably takes 10-30x as many CPU cycles as stack bytes doing work.

One of the problems with "slow runtime libs" is that they are written in C :) That's OK; you want portability. But note that printf(%f) is defined as double, and a lot of CPUs have no hardware doubles. In fact the software double float lib in the 32F4 GCC stuff was written in C! That makes it probably 1000x slower than single floats.
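The reason `%f` always means double is default argument promotion. When a `float` is passed through `...`, it is converted to `double`, so the callee must read it back as `double`; `va_arg(ap, float)` would be undefined. On a core with single-precision-only or no FPU, every `%f` argument therefore goes through (software) double. A small demonstration, with a hypothetical function name:

```cpp
#include <cstdarg>

// Returns the first variadic argument, which is always read as double:
// any float the caller passed has already been promoted.
double first_vararg_as_double(int count, ...) {
    va_list ap;
    va_start(ap, count);
    double d = count > 0 ? va_arg(ap, double) : 0.0;
    va_end(ap);
    return d;
}
```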

Another problem is that a lot of runtime libs were knocked up in a hurry, to get the tool to the market. So you get crappy code, e.g. a sscanf which with say a 10 digit number string does 10 double float divisions by 10, as well as a ton of other crap.
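For contrast, parsing an unsigned decimal number needs no floating point at all: one multiply-by-10 and add per digit, plus an overflow check. This is a generic sketch, not any particular library's `sscanf` internals; `parse_u32` is a hypothetical name.

```cpp
#include <cstdint>

// Parses leading decimal digits of s into out. Returns false if there are
// no digits or the value would not fit in 32 bits. Stops at the first
// non-digit, like strtoul would.
bool parse_u32(const char* s, std::uint32_t& out) {
    std::uint32_t v = 0;
    const char* p = s;
    for (; *p >= '0' && *p <= '9'; ++p) {
        std::uint32_t d = static_cast<std::uint32_t>(*p - '0');
        if (v > (UINT32_MAX - d) / 10) return false;   // v * 10 + d would overflow
        v = v * 10 + d;
    }
    if (p == s) return false;
    out = v;
    return true;
}
```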

I know the OP was after 32-bit ints, but you get the same stuff there. Some crappy C source, built to get the tool out. Anyone remember YACC? :)

The 32F4 stdlib memcpy for example is just a byte moving loop. No attempt to optimise for a multiple of 4 length, alignment, etc. They did that presumably because some of the CPUs will hard fault with unaligned 32 bit ops, etc.

Quote
more often 100k bytes are allocated.

That would be ridiculous.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 16100
  • Country: fr
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #155 on: January 23, 2025, 10:28:00 pm »
I should have guessed if it takes 1024 bytes of stack space, it probably takes 10-30x as many CPU cycles as stack bytes doing work.
Don't touch the stack, it's sacred. Classic printf() will not work without a stack. In fact, 1k is not enough, more often 100k bytes are allocated.

Oh really. :-/O
 

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 9528
  • Country: fi
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #156 on: January 24, 2025, 07:29:48 am »
But note that printf(%f) is defined as double, and a lot of CPUs have no hardware doubles.

This "double precision trap" is all around the C language - not just printf but literals (f suffix needed to force single precision), implicit conversion rules, standard functions with well-known names using doubles and less well-known f-suffixed versions needed for single precision. So one needs to be really careful with SP FP hardware if they want to leverage the performance. Then again, with SP FP, one needs to be careful with accuracy too since the assumption "it just works" does not always hold like it practically does with doubles.
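The literal rule can be checked at compile time. An unsuffixed literal such as `0.5` is a `double`, so it pulls the whole expression into double precision. In C, `sqrt()` takes and returns `double` (`sqrtf()` is the float version); C++'s `std::sqrt` additionally has a `float` overload. The two helper functions are illustrative names only.

```cpp
#include <cmath>
#include <type_traits>

static_assert(std::is_same<decltype(2.0f * 0.5), double>::value,
              "float * unsuffixed literal is computed as double");
static_assert(std::is_same<decltype(2.0f * 0.5f), float>::value,
              "the f suffix keeps the expression in single precision");
static_assert(std::is_same<decltype(std::sqrt(2.0f)), float>::value,
              "C++ std::sqrt(float) returns float");

float half_sp(float v) { return v * 0.5f; }  // single-precision multiply
float half_dp(float v) { return v * 0.5; }   // promoted to double, then narrowed back
```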

So I guess HW single precision FP implementations are often pretty poorly used.

Quote
Another problem is that a lot of runtime libs were knocked up in a hurry, to get the tool to the market. So you get crappy code, e.g. a sscanf which with say a 10 digit number string does 10 double float divisions by 10, as well as a ton of other crap.
...
The 32F4 stdlib memcpy for example is just a byte moving loop. No attempt to optimise for a multiple of 4 length, alignment, etc.

I agree. There are two things which surprise me in the microcontroller world:
* Lack of simple examples (or difficulty finding them), leading to the wrong idea of the complexity involved (and to the use of very complicated/difficult hardware abstraction libraries)
* Low-effort, inefficient and large standard libraries. One would think that when you have APIs which have not changed for 40 years, CPUs which have not changed much for 15 years, and hundreds of millions of end-users (and tens of thousands of programmers and thousands of companies) worldwide, optimized, well-thought-out solutions would be in use by now. But nope. Still some random ancient stuff which is functionally correct but no more than that.

It's weird that one needs to do a lot of "homework" to sidestep these two issues. Talk about duplicated effort!
 

Online peter-h

  • Super Contributor
  • ***
  • Posts: 4503
  • Country: gb
  • Doing electronics since the 1960s...
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #157 on: January 24, 2025, 08:50:01 am »
Some time ago I spent hours putting "f" after a lot of numbers in some old code I had. Didn't make any difference because the 168MHz 32F4 is so fast it doesn't matter, but it would make a %f printf run perhaps 100x to 1000x faster... except that unless you dive into the printf source itself and fix it, it will still be using doubles internally.

For single floats to be a problem precision-wise, one must be doing something severely wrong, because 24 bits is more bits than the precision to which most natural constants are known, and is way more bits than the accuracy of any ADC or any analog circuit you can build.
 

Offline cv007

  • Frequent Contributor
  • **
  • Posts: 878
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #158 on: January 24, 2025, 11:49:30 am »
Quote
At around 10MHz
Is that an mcu limit, library limit or self imposed limit? That is quite a speed limit for an M0+, if that indeed is the speed limit.

Quote
I should have guessed if it takes 1024 bytes of stack space
You could mention what printf library you are using.



Not that this would help for the case at hand, but...

I used the following code to test what improvement a tweak to the divmod code would give if hex output were optimized instead of blindly calling divmod for any base (the MCU is an STM32G031). A class was created to act as a buffer, inheriting the Print class, so the speed test does not involve the UART hardware. Another base enum was added, and the divmod loop in the Print class was set to handle that case using bitwise AND and shift. The difference in this test was 1.25 ms vs 0.76 ms, so it is at least something. To take advantage of it, one would have to use hex and then either live with that or convert somewhere else, such as on the PC side. Most likely not worth the trouble when what you really want is decimal, but it is an easy speed improvement at hardly any cost.
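The difference between the two approaches is easy to see in isolation. The generic routine needs a division and a modulo per digit, which on a Cortex-M0/M0+ (no hardware divider) become runtime library calls when the base is not a compile-time constant. The hex routine only extracts 4-bit fields. Both functions are generic sketches, not cv007's Print class.

```cpp
#include <cstddef>
#include <cstdint>

static const char kDigits[] = "0123456789abcdef";

// Any base 2..16: one divide and one modulo per digit.
// Writes digits most-significant first, returns the count (no terminator).
std::size_t to_base(std::uint32_t v, unsigned base, char* out) {
    char tmp[32];
    std::size_t n = 0;
    do {
        tmp[n++] = kDigits[v % base];
        v /= base;
    } while (v != 0);
    for (std::size_t i = 0; i < n; ++i) out[i] = tmp[n - 1 - i];
    return n;
}

// Hex only: each digit is a nibble, so shift and mask replace division.
std::size_t to_hex(std::uint32_t v, char* out) {
    int shift = 28;
    while (shift > 0 && ((v >> shift) & 0xF) == 0) shift -= 4;   // skip leading zeros
    std::size_t n = 0;
    for (; shift >= 0; shift -= 4) out[n++] = kDigits[(v >> shift) & 0xF];
    return n;
}
```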

For something like newlib, instead of mucking around in printf/nano-printf one could make a simple replacement for the divmod library code where it would add a simple check for a divisor of 16 and handle that quickly without having to make the __udivsi3 library call. A one trick pony, but still one nice trick which would see some use in the case of printf and hex formatted numbers. Again, probably not a major improvement but is essentially free (unless you are hammering out lots of division elsewhere, then maybe the quick divisor test starts to add up). I know there are realities one may have to deal with that will not allow this, but if division was my game I would be looking to move up from an M0.
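The "one trick pony" check could look like the sketch below. It is a hypothetical wrapper, not newlib's actual divmod code: it tests for a divisor of 16 first and otherwise falls back to the ordinary operators. On a core without a hardware divider, the compiler lowers those to its runtime division routine.

```cpp
#include <cstdint>

struct DivMod { std::uint32_t quot, rem; };

DivMod divmod_u32(std::uint32_t n, std::uint32_t d) {
    if (d == 16) return DivMod{n >> 4, n & 0xFu};   // cheap path for hex digits
    return DivMod{n / d, n % d};                     // generic (library) path
}
```

The extra compare is the "quick divisor test" cost mentioned above: it is paid on every call, including non-hex divisions.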

I show the test code only because I think it's pretty neat what you can do with C++. The Print class is a single header, duplicates the 'real' C++ cout style pretty well, and is actually quite easy to write once you get the big picture. It also ends up being pretty small; at some point it will start to exceed the size of a standard printf, but you will have a good head start. The following code, with a system timer (lptimer), UART, GPIO, the chrono library, etc., compiles to 4.3k. The biggest plus is that you have code you wrote, understand what it's doing, and can change as the need arises (such as adding the test code to improve the divmod/16).
Code:
    while(1){

        u32 somenums[8]{
            0xFFFFFFFF, 0xFFFFFFFE, 0xFFFFFFFD, 0xFFFFFFFC,
            0xFFFFFFFB, 0xFFFFFFFA, 0xFFFFFFF8, 0xFFFFFFF7
            };

        auto n{ 0 };
        auto t1 = now(), t2 = now();

        // print base, CPU clock, elapsed time and character count over the UART
        auto summary = [&](u8 base, u32 count){
            board.uart, right,
                fg(WHITE), setwf(10,' '), "base: ", base, endl,
                fg(WHITE), setwf(10,' '), "cpuHz: ", System::cpuHz(), endl,
                fg(WHITE), setwf(10,' '), "et: ", fg(LIME), Lptim1ClockLSI::time_point(t2-t1), endl,
                fg(WHITE), setwf(10,' '), "chars: ", fg(LIME), count, endl2;
            };

        // format all 8 values into sbuf only (no UART involved) and time it
        auto test = [&](FMT::FMT_BASE base){
            n = 0;
            sbuf, countclr, base;
            t1 = now();
            for( auto& v : somenums ){ sbuf, '[', n++, "] ", v, endl; }
            t2 = now();
            summary( base == dec ? 10 : 16, sbuf.count() );
            };

        test(dec);
        test(hex);
        test(hexopt);   // hex via bitwise AND and shift instead of divmod

        delay(10s);

        }

Code:
    base: 10
   cpuHz: 16000000
      et:    0d00:00:00.001496
   chars: 120

    base: 16
   cpuHz: 16000000
      et:    0d00:00:00.001251
   chars: 104

    base: 16
   cpuHz: 16000000
      et:    0d00:00:00.000763
   chars: 104
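To show the general idea of a comma-chained output class, here is a tiny sink. It is hypothetical and is not cv007's Print class: values are appended to a fixed buffer through overloaded `operator,`, with no format string to parse at run time. In this sketch, integer values must be `unsigned` (e.g. `3u`), because a plain `int` would be ambiguous between the `char` and `unsigned` overloads.

```cpp
#include <cstddef>

struct StrBuf {
    char        data[128]{};   // always NUL-terminated
    std::size_t len = 0;
    void put(char c) {
        if (len < sizeof data - 1) data[len++] = c;   // silently truncate when full
        data[len] = '\0';
    }
};

inline StrBuf& operator,(StrBuf& b, char c) { b.put(c); return b; }

inline StrBuf& operator,(StrBuf& b, const char* s) {
    while (*s) b.put(*s++);
    return b;
}

// Decimal output for unsigned values: digits collected in reverse, then emitted.
inline StrBuf& operator,(StrBuf& b, unsigned v) {
    char tmp[10];
    int n = 0;
    do { tmp[n++] = static_cast<char>('0' + v % 10); v /= 10; } while (v);
    while (n) b.put(tmp[--n]);
    return b;
}
```

Usage mirrors the test loop above: `sb, "[", 3u, "] ", 4294967295u, '\n';`.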
 

Offline kevinpt

  • Contributor
  • Posts: 14
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #159 on: February 04, 2025, 10:09:06 pm »
Printf() may not be the problem at all. I didn't see an indication of where prints to stdout are going to. Is it direct to a UART or using a semi-hosting approach over a debugger?

Regardless, many vendor libraries implement default C stdio very poorly with blocking writes per character. This kills performance as your code is throttled by waiting for IO to clear, making it impossible to print from timing sensitive code. You will get better performance if you can offload the work to an interrupt driven IO system.

C stdio has its own buffering that will block whenever it can't write because it is designed to be lossless. This approach isn't the best for unhosted embedded platforms. The first step is to disable the library buffering with setvbuf(). Then re-plumb the stdlib IO backend to direct all writes on stdout into your own circular buffer that can be accessed safely from an ISR without concurrency issues. Configure a TX ISR to pull data from this buffer whenever it is available. Critically, if the buffer is ever full you will arrange to drop data, ensuring nothing with higher priority ever blocks. It is on you to ensure you have a large enough buffer and don't dump too much to it at once. This will work well for a UART. With semi-hosting you may be at the mercy of your vendor's design flaws.
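The "drop instead of block" buffer can be sketched as a single-producer/single-consumer byte ring. The producer is the stdout write hook and the consumer is the UART TX interrupt. Indices are free-running, and the capacity must be a power of two so wraparound stays consistent. The hook is a plain member function here; on newlib it would typically be called from the `_write()` syscall stub, but that wiring and the `TxRing` name are assumptions.

```cpp
#include <atomic>
#include <cstddef>

template <std::size_t N>
class TxRing {
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");
    unsigned char buf_[N];
    std::atomic<std::size_t> head_{0};   // written only by the producer
    std::atomic<std::size_t> tail_{0};   // written only by the consumer (ISR)

public:
    std::size_t dropped = 0;   // bytes discarded because the ring was full

    // Producer: copies what fits, drops the rest, never waits.
    // Reports len as written so stdio never retries or blocks.
    std::size_t write(const char* p, std::size_t len) {
        std::size_t h = head_.load(std::memory_order_relaxed);
        std::size_t t = tail_.load(std::memory_order_acquire);
        std::size_t i = 0;
        for (; i < len && h - t < N; ++i, ++h)
            buf_[h & (N - 1)] = static_cast<unsigned char>(p[i]);
        head_.store(h, std::memory_order_release);
        dropped += len - i;
        return len;
    }

    // Consumer: what a TX-empty ISR would call to fetch the next byte.
    bool pop(unsigned char& out) {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t == head_.load(std::memory_order_acquire)) return false;
        out = buf_[t & (N - 1)];
        tail_.store(t + 1, std::memory_order_release);
        return true;
    }
};
```

At startup, `setvbuf(stdout, NULL, _IONBF, 0)` turns off stdio's own buffering as suggested, so each write reaches the ring directly.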

You can also substitute lighter weight printf() implementations that strip down features in exchange for faster execution. If that is still too slow find strategies to print less information at better times or use simpler methods for generating output text.
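As an example of a stripped-down formatter, here is a sketch that supports only `%d`, `%u`, `%x`, `%s`, `%c` and `%%`, with no width, precision, flags or floating point. It is not any particular library's lightweight printf; `tiny_format` and `tiny_vformat` are hypothetical names. Output is truncated to fit and always NUL-terminated (`n` must be at least 1).

```cpp
#include <cstdarg>
#include <cstddef>

namespace {
struct Out {
    char*       p;
    std::size_t n, len;
    void put(char c) { if (len + 1 < n) p[len++] = c; }   // keep room for NUL
};

void put_uint(Out& o, unsigned v, unsigned base) {
    char tmp[32];
    int k = 0;
    do { tmp[k++] = "0123456789abcdef"[v % base]; v /= base; } while (v);
    while (k) o.put(tmp[--k]);
}
}  // namespace

std::size_t tiny_vformat(char* buf, std::size_t n, const char* fmt, va_list ap) {
    Out o{buf, n, 0};
    for (; *fmt; ++fmt) {
        if (*fmt != '%') { o.put(*fmt); continue; }
        switch (*++fmt) {
        case 'd': {
            int v = va_arg(ap, int);
            unsigned u = static_cast<unsigned>(v);
            if (v < 0) { o.put('-'); u = 0u - u; }   // also correct for INT_MIN
            put_uint(o, u, 10);
            break;
        }
        case 'u': put_uint(o, va_arg(ap, unsigned), 10); break;
        case 'x': put_uint(o, va_arg(ap, unsigned), 16); break;
        case 'c': o.put(static_cast<char>(va_arg(ap, int))); break;
        case 's': for (const char* s = va_arg(ap, const char*); *s; ++s) o.put(*s); break;
        case '%': o.put('%'); break;
        case '\0': --fmt; break;                        // lone '%' at end: stop
        default: o.put('%'); o.put(*fmt); break;        // unknown: echo as-is
        }
    }
    buf[o.len] = '\0';
    return o.len;
}

std::size_t tiny_format(char* buf, std::size_t n, const char* fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    std::size_t len = tiny_vformat(buf, n, fmt, ap);
    va_end(ap);
    return len;
}
```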
 

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #160 on: February 04, 2025, 10:30:43 pm »
Update: I fixed it

On a Cortex-M0 type CPU a test printf with 8 arguments takes about 600 CPU clock cycles when using argument buffering. (Our goal was 500 cycles)

It turns out that it was necessary to copy pointers wherever possible, and that copying strings, let alone parsing them to drive va_arg, is too slow.
We ended up opting to programmatically replace printfs with precomputed calls that use constant strings that can be passed via pointer.
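One possible shape for such "precomputed calls" is sketched below. This is an assumption, not the OP's actual generated code. A tool rewrites each `printf` call site into a fixed-arity logging function plus a matching formatter that knows the argument types. At log time only a function pointer, raw integers and string pointers are stored: no parsing, no copying of string contents. String arguments therefore must point at constant or static data. The names `format_site17`, `log_site17` and `drain_one` are hypothetical, and the ring assumes a single producer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

struct Rec;
using FormatFn = int (*)(const Rec&, char*, std::size_t);

struct Rec {
    FormatFn      format;   // generated per call site, knows the arg types
    std::uint32_t u[4];     // integer arguments
    const char*   s[2];     // string arguments, stored as pointers only
};

constexpr std::size_t kRecs = 32;
Rec         g_recs[kRecs];
std::size_t g_head = 0, g_tail = 0;

// --- what a generator might emit for: printf("ch %u state %s\n", ch, name) ---
static const char kFmt17[] = "ch %u state %s\n";

int format_site17(const Rec& r, char* out, std::size_t n) {
    return std::snprintf(out, n, kFmt17, static_cast<unsigned>(r.u[0]), r.s[0]);
}

inline void log_site17(std::uint32_t ch, const char* name) {
    std::size_t next = (g_head + 1) % kRecs;
    if (next == g_tail) return;   // full: drop, never block
    Rec& r = g_recs[g_head];
    r.format = format_site17;
    r.u[0] = ch;
    r.s[0] = name;                // pointer copy, no strlen/strcpy
    g_head = next;
}

// Idle-time drain: formats one pending record; returns its length or -1.
int drain_one(char* out, std::size_t n) {
    if (g_tail == g_head) return -1;
    int len = g_recs[g_tail].format(g_recs[g_tail], out, n);
    g_tail = (g_tail + 1) % kRecs;
    return len;
}
```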
« Last Edit: February 04, 2025, 10:32:41 pm by incf »
 
The following users thanked this post: SiliconWizard, rhodges

Offline rhodges

  • Frequent Contributor
  • **
  • Posts: 358
  • Country: us
  • Available for embedded projects.
    • My public libraries, code samples, and projects for STM8.
Re: [Solved] Saving "printf" arguments for later? Real-time on slow processor.
« Reply #161 on: February 05, 2025, 01:33:18 am »
I have followed this and thought this was the only good option. I thank everyone for the very good and very bad advice.
Currently developing embedded RISC-V. Recently STM32 and STM8. All are excellent choices. Past includes 6809, Z80, 8086, PIC, MIPS, PNX1302, and some 8748 and 6805. Check out my public code on github. https://github.com/unfrozen
 

Online mwb1100

  • Frequent Contributor
  • **
  • Posts: 675
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #162 on: February 05, 2025, 02:04:37 am »
We ended up opting to programmatically replace printfs with precomputed calls that use constant strings that can be passed via pointer.

How does this work with the 3rd party library which could not be modified?
 

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: [Solved] Saving "printf" arguments for later? Real-time on slow processor.
« Reply #163 on: February 05, 2025, 10:21:18 pm »
I addressed that a little in older replies.

Being able to satisfactorily prove that the requirements are impossible 'opened the door' to us doing things we couldn't/wouldn't/shouldn't do otherwise.

We took on a little bit more technical debt/maintenance and test burden than we had hoped for - but that's a problem for the future.
 
The following users thanked this post: mwb1100

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 9528
  • Country: fi
Re: [Solved] Saving "printf" arguments for later? Real-time on slow processor.
« Reply #164 on: February 06, 2025, 06:29:24 am »
So to wrap it up, you tried the suggestions that were made based on your original constraints, and found out even the best ones would not do it, showing that the original constraint (of link-time binary compatibility with external library) was too tight.

This often happens, and indeed getting too fixated over the constraints is a bad idea.

I like this forum exactly because this isn't a Q&A site. This thread showed a good mix of ideas which tried to stay within the original constraints (a mandatory requirement for a Q&A site), and those that didn't (which ended up being valuable this time).

Good job.  :-+
 
The following users thanked this post: davep238, 5U4GB

Offline eTobey

  • Super Contributor
  • ***
  • Posts: 1293
  • Country: de
  • Virtual Features for the SDS800XHD -> My website
    • Virtual feature script
Re: [Solved] Saving "printf" arguments for later? Real-time on slow processor.
« Reply #165 on: February 10, 2025, 08:25:14 am »
I have done my debug system like this:
Store an error code plus a value into an array, and worry about outputting it elsewhere. I haven't implemented a convenient output yet, but it's pretty fast. The only problem arises when more data gets stored than gets output. But DMA will do wonders here, once implemented.
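Below is a sketch of such a code-plus-value trace. It makes one assumption about policy: when storage outpaces output, it overwrites the oldest entries and keeps a running sequence counter, so a reader can tell how many entries were lost (`g_seq - last_read_seq > kTrace`). Dropping new entries instead is the other common choice. All names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

struct Trace {
    std::uint16_t code;
    std::uint32_t value;
};

constexpr std::size_t kTrace = 16;
Trace         g_trace[kTrace];
std::uint32_t g_seq = 0;   // total entries ever written

// Constant-time store: no formatting, no blocking, newest data always kept.
inline void trace(std::uint16_t code, std::uint32_t value) {
    g_trace[g_seq % kTrace] = Trace{code, value};
    ++g_seq;
}
```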

BTW:
Sometimes it's a good thing to talk the boss out of something. If there is not much headroom, you may end up with a headache some day!
"Sometimes, after talking with a person, you want to pet a dog, wave at a monkey, and take off your hat to an elephant."(Maxim Gorki)

SDS800X HD bugs/issues/workarounds (Updated 17. Feb. 2025)
 

