Author Topic: [Solved] Saving "printf" arguments for later? Real-time on slow processor.  (Read 10027 times)

0 Members and 1 Guest are viewing this topic.

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 21750
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #150 on: January 23, 2025, 05:37:39 pm »
All those suggestions are sound :) And have already been made several times in this thread! :(
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Online Analog Kid

  • Super Contributor
  • ***
  • Posts: 1622
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #151 on: January 23, 2025, 05:49:25 pm »
All those suggestions are sound :) And have already been made several times in this thread! :(

Yes. I think the OP has a pretty good handle on what needs to be done.
 

Online magic

  • Super Contributor
  • ***
  • Posts: 7550
  • Country: pl
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #152 on: January 23, 2025, 06:07:28 pm »
It's the obvious thing to do if you can modify the logging code and don't need to accept the crazy printf ABI.

That being said, I'd expect the ABI not to be too crazy if there are no floats and/or long types involved, so writing a fake printf which simply saves the first 10 varargs (or whatever the worst case number is) without having any idea what they really are might be doable on most, if not all, architectures.

But I'm repeating myself, and possibly others.
 

Online Analog Kid

  • Super Contributor
  • ***
  • Posts: 1622
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #153 on: January 23, 2025, 06:14:24 pm »
It's already been established, to those who bother to actually read the posts in this thread, that there are no floats here. All 32-bit ints and strings.
 

Offline peter-h

  • Super Contributor
  • ***
  • Posts: 4519
  • Country: gb
  • Doing electronics since the 1960s...
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #154 on: January 23, 2025, 07:56:30 pm »
Quote
I honestly had no idea that the core string formatting operations were so expensive  - I seem to have thought that it magically did it all of that work in less than 10% of the time it actually takes. I should have guessed if it takes 1024 bytes of stack space, it probably takes 10-30x as many CPU cycles as stack bytes doing work.

One of the problems with "slow runtime libs" is that they are written in C :) That's OK; you want portability. But note that printf(%f) is defined as double, and a lot of CPUs have no hardware doubles. In fact the software double float lib in the 32F4 GCC stuff was written in C! That makes it probably 1000x slower than single floats.

Another problem is that a lot of runtime libs were knocked up in a hurry, to get the tool to the market. So you get crappy code, e.g. a sscanf which with say a 10 digit number string does 10 double float divisions by 10, as well as a ton of other crap.

I know the OP was after 32 bit ints but you get the same stuff there. Some crappy C source, built to get the tool out. Anyone remembers YACC? :)

The 32F4 stdlib memcpy for example is just a byte moving loop. No attempt to optimise for a multiple of 4 length, alignment, etc. They did that presumably because some of the CPUs will hard fault with unaligned 32 bit ops, etc.

Quote
more often 100k bytes are allocated.

That would be ridiculous.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 16132
  • Country: fr
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #155 on: January 23, 2025, 10:28:00 pm »
I should have guessed if it takes 1024 bytes of stack space, it probably takes 10-30x as many CPU cycles as stack bytes doing work.
Don't touch the stack, it's sacred. Classic printf() will not work without a stack. In fact, 1k is not enough, more often 100k bytes are allocated.

Oh really. :-/O
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 9551
  • Country: fi
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #156 on: January 24, 2025, 07:29:48 am »
But note that printf(%f) is defined as double, and a lot of CPUs have no hardware doubles.

This "double precision trap" is all around the C language - not just printf but literals (f suffix needed to force single precision), implicit conversion rules, standard functions with well-known names using doubles and less well-known f-suffixed versions needed for single precision. So one needs to be really careful with SP FP hardware if they want to leverage the performance. Then again, with SP FP, one needs to be careful with accuracy too since the assumption "it just works" does not always hold like it practically does with doubles.

So I guess HW single precision FP implementations are often pretty poorly used.

Quote
Another problem is that a lot of runtime libs were knocked up in a hurry, to get the tool to the market. So you get crappy code, e.g. a sscanf which with say a 10 digit number string does 10 double float divisions by 10, as well as a ton of other crap.
...
The 32F4 stdlib memcpy for example is just a byte moving loop. No attempt to optimise for a multiple of 4 length, alignment, etc.

I agree. There are two things which surprise me in the microcontroller world:
* Lack (or difficulty of finding) of simple examples leading to the wrong idea of complexity (leading to use of very complicated/difficult hardware abstraction libraries)
* Low-effort, inefficient and large standard libraries. One could think that when you have APIs which have not changed for 40 years, and CPUs which have not changed much for 15 years, and hundreds of millions of end-users (and tens of thousands of programmers and thousands of companies) world-wide, optimized, well thought of solutions would be in use by now. But nope. Still some random ancient stuff which is functionally correct but no more than that.

It's weird that one needs to do a lot of "homework" to sidestep these two issues. Talk about duplicated effort!
 

Offline peter-h

  • Super Contributor
  • ***
  • Posts: 4519
  • Country: gb
  • Doing electronics since the 1960s...
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #157 on: January 24, 2025, 08:50:01 am »
Some time ago I spent hours putting "f" after a lot of numbers in some old code I had. Didn't make any difference because the 168MHz 32F4 is so fast it doesn't matter, but it would make a %f printf run perhaps 100x to 1000x faster... except that unless you dive into the printf source itself and fix it, it will still be using doubles internally.

For single floats to be a problem precision-wise, one must be doing something severely wrong, because 24 bits is more bits than the precision to which most natural constants are known, and is way more bits than the accuracy of any ADC or any analog circuit you can build.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline cv007

  • Frequent Contributor
  • **
  • Posts: 878
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #158 on: January 24, 2025, 11:49:30 am »
Quote
At around 10MHz
Is that an mcu limit, library limit or self imposed limit? That is quite a speed limit for an M0+, if that indeed is the speed limit.

Quote
I should have guessed if it takes 1024 bytes of stack space
You could mention what printf library you are using.



Not that this would help for the case at hand, but...

I used the following code to test what improvements a tweak to the divmod code would look like if the hex was optimized instead of blindly calling divmod for any base (mcu is an stm32g031). A class was created to be a buffer, inheriting the Print class so not dealing with the uart hardware for the test of speeds. Another base enum was added and the divmod loop in the Print class was set to handle that case by using bitand and shift. The difference in this test was 1.25ms vs 0.760ms so is at least something. To take advantage one would have to use hex and then either live with that or convert somewhere else such as on the pc side. Most likely not worth the trouble when what you really want is decimal, but it is an easy speed improvement with hardly any cost.

For something like newlib, instead of mucking around in printf/nano-printf one could make a simple replacement for the divmod library code where it would add a simple check for a divisor of 16 and handle that quickly without having to make the __udivsi3 library call. A one trick pony, but still one nice trick which would see some use in the case of printf and hex formatted numbers. Again, probably not a major improvement but is essentially free (unless you are hammering out lots of division elsewhere, then maybe the quick divisor test starts to add up). I know there are realities one may have to deal with that will not allow this, but if division was my game I would be looking to move up from an M0.

I show the test code only because I think its pretty neat what you can do with C++. The Print class is a single header, duplicates the 'real' C++ cout style pretty well, is actually quite easy to write once you get the big picture. It also ends up being pretty small, but at some point it will start to exceed the size of a standard printf but you will have a good head start. The following code, with a system timer (lptimer), uart, gpio, using the chrono library, etc., compiles to 4.3k. The biggest plus is you have code you wrote, understand what its doing, and can change as the need arises (such as adding the test code to improve the divmod/16).
Code: [Select]
                while(1){

                    u32 somenums[8]{
                        0xFFFFFFFF, 0xFFFFFFFE, 0xFFFFFFFD, 0xFFFFFFFC,
                        0xFFFFFFFB, 0xFFFFFFFA, 0xFFFFFFF8, 0xFFFFFFF7
                        };

                    auto n{ 0 };
                    auto t1 = now(), t2 = now();;               

                    auto summary = [&](u8 base, u32 count){
                        board.uart, right,
                            fg(WHITE), setwf(10,' '), "base: ", base, endl,
                            fg(WHITE), setwf(10,' '), "cpuHz: ", System::cpuHz(), endl,
                            fg(WHITE), setwf(10,' '), "et: ", fg(LIME), Lptim1ClockLSI::time_point(t2-t1), endl,                       
                            fg(WHITE), setwf(10,' '), "chars: ", fg(LIME), count, endl2;
                        };

                    auto test = [&](FMT::FMT_BASE base){
                        n = 0;
                        sbuf, countclr, base;
                        t1 = now();
                        for( auto& v : somenums ){ sbuf, '[', n++, "] ", v, endl; }
                        t2 = now();
                        summary( base == dec ? 10 : 16, sbuf.count() );
                        };

                    test(dec);                   
                    test(hex);
                    test(hexopt);

                    delay(10s);

                    }

Code: [Select]
    base: 10
   cpuHz: 16000000
      et:    0d00:00:00.001496
   chars: 120

    base: 16
   cpuHz: 16000000
      et:    0d00:00:00.001251
   chars: 104

    base: 16
   cpuHz: 16000000
      et:    0d00:00:00.000763
   chars: 104
 

Offline kevinpt

  • Contributor
  • Posts: 14
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #159 on: February 04, 2025, 10:09:06 pm »
Printf() may not be the problem at all. I didn't see an indication of where prints to stdout are going to. Is it direct to a UART or using a semi-hosting approach over a debugger?

Regardless, many vendor libraries implement default C stdio very poorly with blocking writes per character. This kills performance as your code is throttled by waiting for IO to clear, making it impossible to print from timing sensitive code. You will get better performance if you can offload the work to an interrupt driven IO system.

C stdio has its own buffering that will block whenever it can't write because it is designed to be lossless. This approach isn't the best for unhosted embedded platforms. The first step is to disable the library buffering with setvbuf(). Then re-plumb the stdlib IO backend to direct all writes on stdout into your own circular buffer that can be accessed safely from an ISR without concurrency issues. Configure a TX ISR to pull data from this buffer whenever it is available. Critically, if the buffer is ever full you will arrange to drop data, ensuring nothing with higher priority ever blocks. It is on you to ensure you have a large enough buffer and don't dump too much to it at once. This will work well for a UART. With semi-hosting you may be at the mercy of your vendor's design flaws.

You can also substitute lighter weight printf() implementations that strip down features in exchange for faster execution. If that is still too slow find strategies to print less information at better times or use simpler methods for generating output text.
 

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 172
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #160 on: February 04, 2025, 10:30:43 pm »
Update: I fixed it

On a Cortex-M0 type CPU a test printf with 8 arguments takes about 600 CPU clock cycles when using argument buffering. (Our goal was 500 cycles)

It turns out that is was necessary to copy pointers whereever possible. And that copying strings, let alone parsing them for doing va_args is too slow.
We ended up opting to programmatically replace printfs with precomputed calls that use constant strings that can be passed via pointer.
« Last Edit: February 04, 2025, 10:32:41 pm by incf »
 
The following users thanked this post: SiliconWizard, rhodges

Offline rhodges

  • Frequent Contributor
  • **
  • Posts: 358
  • Country: us
  • Available for embedded projects.
    • My public libraries, code samples, and projects for STM8.
Re: [Solved] Saving "printf" arguments for later? Real-time on slow processor.
« Reply #161 on: February 05, 2025, 01:33:18 am »
I have followed this and thought this was the only good option. I thank everyone for the very good and very bad advice.
Currently developing embedded RISC-V. Recently STM32 and STM8. All are excellent choices. Past includes 6809, Z80, 8086, PIC, MIPS, PNX1302, and some 8748 and 6805. Check out my public code on github. https://github.com/unfrozen
 

Online mwb1100

  • Frequent Contributor
  • **
  • Posts: 687
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #162 on: February 05, 2025, 02:04:37 am »
We ended up opting to programmatically replace printfs with precomputed calls that use constant strings that can be passed via pointer.

How does this work with the 3rd party library which could not be modified?
 

Offline incfTopic starter

  • Regular Contributor
  • *
  • Posts: 172
  • Country: us
  • ASCII > UTF8
Re: [Solved] Saving "printf" arguments for later? Real-time on slow processor.
« Reply #163 on: February 05, 2025, 10:21:18 pm »
I addressed that a little in older replies.

Being able to satisfactorily prove that the requirements are impossible 'opened the door' to us doing things we couldn't/wouldn't/shouldn't do otherwise.

We took on a little bit more technical debt/maintenance and test burden than we had hoped for - but that's a problem for the future.
 
The following users thanked this post: mwb1100

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 9551
  • Country: fi
Re: [Solved] Saving "printf" arguments for later? Real-time on slow processor.
« Reply #164 on: February 06, 2025, 06:29:24 am »
So to wrap it up, you tried the suggestions that were made based on your original constraints, and found out even the best ones would not do it, showing that the original constraint (of link-time binary compatibility with external library) was too tight.

This often happens, and indeed getting too fixated over the constraints is a bad idea.

I like this forum exactly because this isn't a Q&A site. This thread showed a good mix of ideas which tried to stay within the original constraints (a mandatory requirement for a Q&A site), and those that didn't (which ended up being valuable this time).

Good job.  :-+
 
The following users thanked this post: davep238, 5U4GB

Offline eTobey

  • Super Contributor
  • ***
  • Posts: 1302
  • Country: de
  • Virtual Features for the SDS800XHD -> My website
    • Virtual feature script
Re: [Solved] Saving "printf" arguments for later? Real-time on slow processor.
« Reply #165 on: February 10, 2025, 08:25:14 am »
I have done my debug system like this:
Store an error code, plus an value into an array, and worry about outputting elsewhere. Havent implemented a convenient output yet, but its pretty fast. The only problem arises, when more data gets stored, than gets output. But the DMA will do wonders here, once implemented.

BTW:
Sometimes its a good thing to talk the boss out of something. If there is not much headroom you may end with some headache some day!
"Sometimes, after talking with a person, you want to pet a dog, wave at a monkey, and take off your hat to an elephant."(Maxim Gorki)

SDS800X HD bugs/issues/workarounds (Updated 17. Feb. 2025)
 


Share me

Digg  Facebook  SlashDot  Delicious  Technorati  Twitter  Google  Yahoo
Smf