Author Topic: [Solved] Saving "printf" arguments for later? Real-time on slow processor.  (Read 9731 times)


Online incf (Topic starter)

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #75 on: January 19, 2025, 02:13:18 pm »
Quote
occasionally emit about 10kb of text in less than 100ms at one less-critical point
That either means 1Mbaud or 100kbaud depending on what your b in 10kb means.

What is your UART speed, what is your MCU speed, what speed is your MCU capable of, how is your UART moving the data (DMA or interrupt), and do you have a Segger debugger in use, where Segger RTT would be available?

Speed is going to be the best solution.
We already have a ~10 kB thread-local buffer (larger than the maximum output), which is copied to the UART thread's ~10 kB buffer when real-time stuff is not occurring, so IO speed is not an issue.
« Last Edit: January 19, 2025, 04:18:35 pm by incf »
 

Online incf (Topic starter)

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #76 on: January 19, 2025, 02:39:31 pm »
Quote
Your system isn't hard realtime, and does have sufficient "excess" processing power. Doesn't it?

So, what's the application domain? I hope it is neither military nor healthcare.

Nothing special. Mostly proprietary wireless communication with a small amount of measurement, system control, etc., on networks of battery-powered devices that talk to each other. A lot of logging to flash. A slow, low-power Cortex-M0-type processor. Real-time scheduling of network communications plus a bit of external device control and data crunching. (It is not capable of dealing with missed deadlines caused by events, like printfs, that occur after it has decided that it is going to do something. Unfortunately, it absolutely has to decide, and commit to the network, that it is going to do something before it actually does anything loggable, which means everything important must occur after it has irreversibly committed itself to meeting a particular deadline.)

The device has to move a chunk of data through the wireless network at a precisely scheduled time, and that process coexists (at a higher priority than everything else) with a pile of other software and state machines that influence the behavior of the networking process.

It is asleep most of the time. If it did not printf, it would work fine; but it has to printf. Interactions between different components in the system/network are complex enough that thorough logging inside the device simply is a requirement. And changing the third-party software is difficult for technical and non-technical reasons; it simply won't happen unless all other avenues have been exhausted.
« Last Edit: January 19, 2025, 03:12:41 pm by incf »
 

Online incf (Topic starter)

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #77 on: January 19, 2025, 03:17:35 pm »
What's the main source of slowness though? Is it formatting, or output?
Some answers address the former, some answers address the latter (e.g. _write override). OP, do you have numbers for both?
It's all formatting, plus a tiny bit of parsing/copying, that occurs as part of snprintf to a local exclusive buffer. The IO latency portion has been solved via lots of buffering.
« Last Edit: January 19, 2025, 03:21:28 pm by incf »
 

Online incf (Topic starter)

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #78 on: January 19, 2025, 03:20:21 pm »
But as I said, I think you can implement a pretty efficient scan of the format strings, as all you need is to identify the types of the arguments. You could make the scanning much faster by doing it in 32-bit chunks instead of byte by byte, but be aware that the Cortex-M0 doesn't support unaligned access, so if the format string isn't 4-byte aligned you'd have to scan up to 3 bytes first and then go word by word. I think that's probably worth a shot.

I agree, it is worth a shot.

(I think most of the suggestions about parsing, buffering, va_list/va_arg, etc. are correct - I appreciate the sample implementations)
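For concreteness, a rough sketch of that word-at-a-time scan (hypothetical helper; it uses the usual SWAR zero-byte trick, reads the string through a uint32_t pointer, and may read the few bytes that share the final aligned word with the terminator, which are the customary assumptions behind this technique):
Code: [Select]
#include <stddef.h>
#include <stdint.h>

// Nonzero if any byte of w is zero (classic SWAR "has zero byte" test).
#define HAS_ZERO_BYTE(w)  (((w) - 0x01010101u) & ~(w) & 0x80808080u)

// Return a pointer to the next '%' in s, or NULL if the string ends first.
static const char *next_percent(const char *s) {
    // Cortex-M0 has no unaligned loads: go byte by byte until 4-byte aligned.
    while ((uintptr_t)s & 3u) {
        if (*s == '%')  return s;
        if (*s == '\0') return NULL;
        s++;
    }
    // Word by word: stop at the first word containing '%' (0x25) or '\0'.
    const uint32_t *w = (const uint32_t *)(const void *)s;
    while (!HAS_ZERO_BYTE(*w) && !HAS_ZERO_BYTE(*w ^ 0x25252525u))
        w++;
    // Finish the interesting word byte by byte.
    s = (const char *)w;
    for (int i = 0; i < 4; i++, s++) {
        if (*s == '%')  return s;
        if (*s == '\0') return NULL;
    }
    return NULL;  // not reached: the word above contained '%' or '\0'
}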
« Last Edit: January 19, 2025, 03:21:53 pm by incf »
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 21679
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #79 on: January 19, 2025, 03:43:04 pm »
Quote
Your system isn't hard realtime, and does have sufficient "excess" processing power. Doesn't it?

So, what's the application domain? I hope it is neither military nor healthcare.

Nothing special. Mostly proprietary wireless communication with a small amount of measurement, system control, etc., on networks of battery-powered devices that talk to each other. A lot of logging to flash. A slow, low-power Cortex-M0-type processor. Real-time scheduling of network communications plus a bit of external device control and data crunching. (It is not capable of dealing with missed deadlines caused by events, like printfs, that occur after it has decided that it is going to do something. Unfortunately, it absolutely has to decide, and commit to the network, that it is going to do something before it actually does anything loggable, which means everything important must occur after it has irreversibly committed itself to meeting a particular deadline.)

The device has to move a chunk of data through the wireless network at a precisely scheduled time, and that process coexists (at a higher priority than everything else) with a pile of other software and state machines that influence the behavior of the networking process.

When I've designed such systems, any one of those constraints would be sufficient for me to completely avoid printf! It isn't difficult; just use puts() and putchar() of hex numbers. Let the log consumer format them as decimal digits or pixels on a graph or whatever.

Shame you can't use a cluebat on the library creator.
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Online cv007

  • Frequent Contributor
  • **
  • Posts: 878
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #80 on: January 19, 2025, 04:52:38 pm »
I guess you already said in the first post that it's the printf code that is the problem. I would also assume you are running your M0+ as fast as possible, at least when it is needed.

I didn't see mentioned anywhere which printf library is in use, but if it's a CM0+, maybe it's newlib. If the itoa in the printf engine is easily replaced, maybe the number conversion code can be changed to simply always output hex (a leading 0x can be added). It would shorten the conversion time, as no divide library call would be used (CM0+), and the strings require little processing so can remain as-is.


An example I showed from previous thread, modified to allow for hex only formatting of numbers (and using x86 so online output can be used)-
https://godbolt.org/z/5vrEov3Yq
(end of main does nonumformat example, end of output shows result)

This example simply shows a print function bypassing the normal number conversion (dec/bin/hex, which will use divide) with one that only uses bit-shift code and a table lookup for each character. In this case the full 32-bit hex value is always output, and a +/- is added to signify the sign, as the function takes in an unsigned value plus a flag for negative values. Of course with your own code it's easy enough to change as you see fit, whereas an existing printf library may not have many hooks into the process.

For a CM0+, eliminating the divide for each number conversion could be somewhat significant although testing would be needed to get some real values.
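Not the godbolt code itself, just a minimal sketch of the same idea (hypothetical function name): shift-and-lookup only, always the full 8 hex digits, with a sign character since the value is passed as unsigned plus a negative flag:
Code: [Select]
#include <stdint.h>

// Append "+0xXXXXXXXX" or "-0xXXXXXXXX" to buf: no divides, just shifts and a
// 16-entry lookup table.  Caller passes the magnitude and a negative flag.
static char *append_hex32(char *buf, uint32_t value, int negative) {
    static const char lut[16] = "0123456789ABCDEF";
    *buf++ = negative ? '-' : '+';
    *buf++ = '0';
    *buf++ = 'x';
    for (int shift = 28; shift >= 0; shift -= 4)
        *buf++ = lut[(value >> shift) & 0xFu];
    return buf;  // points just past the last digit; caller terminates as needed
}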
« Last Edit: January 19, 2025, 05:35:59 pm by cv007 »
 

Offline rhodges

  • Frequent Contributor
  • **
  • Posts: 358
  • Country: us
  • Available for embedded projects.
    • My public libraries, code samples, and projects for STM8.
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #81 on: January 19, 2025, 10:53:51 pm »
I don't know if this is helpful, but here is my STM32 M0 and M3 binary-decimal library. Just cut out the M3 code. The M0 code does not call divide.
Currently developing embedded RISC-V. Recently STM32 and STM8. All are excellent choices. Past includes 6809, Z80, 8086, PIC, MIPS, PNX1302, and some 8748 and 6805. Check out my public code on github. https://github.com/unfrozen
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7441
  • Country: fi
    • My home page and email address
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #82 on: January 20, 2025, 01:20:56 am »
Just out of interest, does the output include formatting floating-point numbers?  The standard libraries (newlib and others) are extremely slow in formatting and parsing them.

For a Cortex-M0 with "slow" multiplication, even integer formatting is slow in the standard libraries.  (If you have a fast 32×32 multiply that returns the high 32 bits of the 64-bit result, you can divide by ten quickly by multiplying by 3435973837 = 0xCCCCCCCD and shifting the upper 32 bits of the product right by 3 bits to get the quotient.)  Assuming ILP32 (aapcs32 on Cortex-M0), with 32-bit int and long and 64-bit long long, repeated subtraction of 3× and 1× multiples of each power of ten can be much faster.  Here is an example implementation:
Code: [Select]
// SPDX-License-Identifier: CC0-1.0 (Public Domain)
// Author: Nominal Animal, 2025.
#include <stddef.h> // for NULL
#include <stdint.h>

static const uint32_t  decades_32bit[10][2] = {
    { UINT32_C(1),          UINT32_C(3) },
    { UINT32_C(10),         UINT32_C(30) },
    { UINT32_C(100),        UINT32_C(300) },
    { UINT32_C(1000),       UINT32_C(3000) },
    { UINT32_C(10000),      UINT32_C(30000) },
    { UINT32_C(100000),     UINT32_C(300000) },
    { UINT32_C(1000000),    UINT32_C(3000000) },
    { UINT32_C(10000000),   UINT32_C(30000000) },
    { UINT32_C(100000000),  UINT32_C(300000000) },
    { UINT32_C(1000000000), UINT32_C(3000000000) },
};

static const uint64_t  decades_64bit[11][2] = {
    { UINT64_C(1000000000),           UINT64_C(3000000000) },
    { UINT64_C(10000000000),          UINT64_C(30000000000) },
    { UINT64_C(100000000000),         UINT64_C(300000000000) },
    { UINT64_C(1000000000000),        UINT64_C(3000000000000) },
    { UINT64_C(10000000000000),       UINT64_C(30000000000000) },
    { UINT64_C(100000000000000),      UINT64_C(300000000000000) },
    { UINT64_C(1000000000000000),     UINT64_C(3000000000000000) },
    { UINT64_C(10000000000000000),    UINT64_C(30000000000000000) },
    { UINT64_C(100000000000000000),   UINT64_C(300000000000000000) },
    { UINT64_C(1000000000000000000),  UINT64_C(3000000000000000000) },
    { UINT64_C(10000000000000000000), UINT64_C(                   0) },
};

// Internal 32-bit unsigned integer conversion routine.
static char *do_append_u32(char *buf, char *const end, uint32_t val) {

    // Count the number of decimal digits.
    int_fast8_t  n = 0;
    while (val >= decades_32bit[n+1][0])
        if (++n >= 9)
            break;

    // Verify sufficient room in buffer (digits plus the terminating nul).
    if (buf + n >= end)
        return NULL;

    // Convert to decimal digits via repeated subtraction.
    do {
        char  digit = '0';

        while (val >= decades_32bit[n][1]) {
            val    -= decades_32bit[n][1];
            digit  += 3;
        }
        while (val >= decades_32bit[n][0]) {
            val    -= decades_32bit[n][0];
            digit  += 1;
        }

        *(buf++) = digit;
    } while (n-->0);

    *buf = '\0';
    return buf;
}

// Internal 64-bit unsigned integer conversion routine.
static char *do_append_u64(char *buf, char *const end, uint64_t val) {

    // If fits in 32 bits, treat as 32-bit.
    if ((uint64_t)(uint32_t)(val) == val)
        return do_append_u32(buf, end, (uint32_t)val);

    // Above test ensures val >= decades_64bit[0][0].
    int_fast8_t  n = 0;
    while (val >= decades_64bit[n+1][0])
        if (++n >= 10)
            break;

    // Verify sufficient room in buffer (digits plus the terminating nul).
    if (buf + n + 9 >= end)
        return NULL;

    // The first decimal digit of 2^64-1 is 1, so we need to treat it specially.
    if (n == 10) {
        char  digit = '0';

        while (val >= decades_64bit[10][0]) {
            val    -= decades_64bit[10][0];
            digit++;
        }

        *(buf++) = digit;
        n--;
    }
    do {
        char  digit = '0';

        while (val >= decades_64bit[n][1]) {
            val    -= decades_64bit[n][1];
            digit  += 3;
        }
        while (val >= decades_64bit[n][0]) {
            val    -= decades_64bit[n][0];
            digit  += 1;
        }

        *(buf++) = digit;
    } while (n-->0);

    // Add the nine 32-bit digits
    uint32_t  v32 = (uint32_t)val;
    n = 8;
    do {
        char  digit = '0';

        while (v32 >= decades_32bit[n][1]) {
            v32    -= decades_32bit[n][1];
            digit  += 3;
        }
        while (v32 >= decades_32bit[n][0]) {
            v32    -= decades_32bit[n][0];
            digit  += 1;
        }

        *(buf++) = digit;
    } while (n-->0);

    *buf = '\0';
    return buf;
}

// Convert an unsigned 32-bit integer (%u) to decimal string,
// and store to buf.  Will not write past end (but may write nul to *end).
// Returns a pointer to the string-terminating nul byte.
char *append_u32(char *buf, char *const end, uint32_t val) {

    // Abort if no buffer, or if buffer full.
    if (!buf || buf >= end)
        return NULL;

    return do_append_u32(buf, end, val);
}

// Convert a signed 32-bit integer (%d) to decimal string,
// and store to buf.  Will not write past end (but may write nul to *end).
// Returns a pointer to the string-terminating nul byte.
char *append_i32(char *buf, char *const end, int32_t val) {

    if (val < 0) {
        // Abort if no buffer, or if buffer full.
        if (!buf || buf + 1 >= end)
            return NULL;

        // Prepend negative sign, negate, and treat as unsigned.
        *buf = '-';
        return do_append_u32(buf + 1, end, (uint32_t)(-val));
    } else {
        if (!buf || buf >= end)
            return NULL;

        // Nonnegative, so treat as unsigned.
        return do_append_u32(buf, end, val);
    }
}

// Convert an unsigned 64-bit integer (%llu) to decimal string,
// and store to buf.  Will not write past end, but may write nul to *end.
// Returns a pointer to the string-terminating nul byte.
char *append_u64(char *buf, char *const end, uint64_t val) {

    // Abort if no buffer, or if buffer full.
    if (!buf || buf >= end)
        return NULL;

    return do_append_u64(buf, end, val);
}

// Convert a signed 64-bit integer (%lld) to decimal string,
// and store to buf.  Will not write past end, but may write nul to *end.
// Returns a pointer to the string-terminating nul byte.
char *append_i64(char *buf, char *const end, int64_t val) {

    if (val < 0) {
        // Abort if no buffer, or if buffer full.
        if (!buf || buf + 1 >= end)
            return NULL;

        // Prepend negative sign, negate, and treat as unsigned.
        *buf = '-';
        return do_append_u64(buf + 1, end, (uint64_t)(-val));

    } else {
        // Abort if no buffer, or if buffer full.
        if (!buf || buf >= end)
            return NULL;

        // Nonnegative, so treat as unsigned.
        return do_append_u64(buf, end, val);
    }
}
The idea in the end = append_type(dest, last, value); interface is to efficiently append the decimal value to the buffer.  When the value does not fit, it will return NULL (and append_type(NULL,...) is safe and will also return NULL).  You can always call the function with dest pointing to the next free character in your output buffer, and last pointing to the last character in that buffer, and if the function returns non-NULL, it points to the next dest, otherwise it did not fit.

(That is, you can at any point safely try to append a new substring to your buffer.  If the call returns NULL, it modified at most the start character (which should be the first free character in the buffer anyway), and even that only when the value is negative; you could write the sign after the digits have been filled in instead, to ensure no modification is done at all.  You can also remove all the 64-bit stuff if you don't use the long long, uint64_t, int64_t, uintmax_t, or intmax_t types.)

Each digit requires 2.1 iterations on average (0 1 2 1 2 3 2 3 4 3, to be exact, per possible decimal digit), with each iteration consisting of one subtraction and one addition.  There are no multiplications or divisions at all (except for calculating the look-up array addresses, which are bit shifts), so this approach is suitable for slow-multiplication architectures in general, including 8-bitters (although they can benefit from adding 8- and 16-bit converters also).

This is not magic, though.  On x86-64 with fast multiplication, even the standard snprintf() is about 2.4× faster in converting 64-bit unsigned numbers (because it has a fast hardware 64×64=128-bit integer multiplication, so that (x/10) is implemented as (x*0xCCCCCCCCCCCCCCCD)>>67).  On the other hand, for 32-bit unsigned integers (append_u32() and append_i32() versus snprintf("%u") and snprintf("%d")), standard snprintf() is only about 1.5× faster (edited: originally measured as about 2.6× slower, but my laptop's frequency scaling had skewed those results).  (These are on a microbenchmark covering all 32-bit unsigned integers uniformly randomly; if you typically print a smaller range of values, expect different results.)
« Last Edit: January 20, 2025, 02:48:48 am by Nominal Animal »
 
The following users thanked this post: SiliconWizard, incf

Online incf (Topic starter)

  • Regular Contributor
  • *
  • Posts: 154
  • Country: us
  • ASCII > UTF8
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #83 on: January 20, 2025, 02:04:41 am »
I believe it mostly fakes floating point by using a split integer representation: one integer for the whole part, and another for the decimal places.

While the library authors were ill-advised on the use of printf, they appear to have had enough sense to stay away from floating point in the places that I have looked.

We have fast truncated multiply (32-bit results), but my gut feeling is that printf may not actually use it, based on the performance that I'm seeing (I have not measured it, nor disassembled libc).

Division via multiplication is an interesting trick. I could easily see myself getting really into optimizing those routines. Although I'm pretty sure that even the fastest formatting routine won't beat buffering the printf arguments in terms of speed. I am not looking forward to writing the FIFO for lists of variable-sized strings and print arguments, but I think I am obligated to take that path since it is likely much faster. (Also, it will finally allow me to do stuff like attach an accurate timestamp to each printf, which will be useful.)
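For concreteness, a rough sketch of the kind of capture being discussed (names and the ring-buffer push are hypothetical; it assumes every argument is a 32-bit value on this ILP32 target and that format strings and any %s arguments are string literals that stay valid until the log is drained):
Code: [Select]
#include <stdarg.h>
#include <stdint.h>

#define LOG_MAX_ARGS 10

// One deferred log entry: format pointer, timestamp, and the raw 32-bit args.
typedef struct {
    const char *fmt;
    uint32_t    timestamp;
    uint8_t     nargs;
    uint32_t    arg[LOG_MAX_ARGS];
} log_entry_t;

extern uint32_t ticks_now(void);                       // hypothetical timestamp source
extern int      log_fifo_push(const log_entry_t *e);   // hypothetical ring-buffer push

// Capture a printf call without formatting anything.  The format is scanned
// only to count conversions; all formatting happens later, off the real-time
// path.  Dynamic widths ("%*d") are not handled; extra conversions beyond
// LOG_MAX_ARGS are simply dropped.
void log_printf(const char *fmt, ...) {
    log_entry_t e;
    e.fmt = fmt;
    e.timestamp = ticks_now();
    e.nargs = 0;

    va_list ap;
    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        if (p[1] == '%') {          // "%%" is a literal percent sign
            p++;
            continue;
        }
        if (e.nargs < LOG_MAX_ARGS)
            e.arg[e.nargs++] = va_arg(ap, uint32_t);  // %d/%u/%x/%s are all 32 bits here
    }
    va_end(ap);

    (void)log_fifo_push(&e);        // drop or count overruns as you prefer
}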

I imagine that on a Cortex-M0, a fairly standard print statement with about 100 characters of text and 9 integers (7 decimal and 2 hex, lots of 6-to-9-digit numbers) probably takes thousands of instruction cycles to run the conversions.

...
Each digit requires 2.1 iterations on average (0 1 2 1 2 3 2 3 4 3, to be exact, per possible decimal digit), with each iteration consisting of one subtraction and one addition.  There are no multiplications or divisions at all (except for calculating the look-up array addresses, which are bit shifts), so this approach is suitable for slow-multiplication architectures in general, including 8-bitters (although they can benefit from adding 8- and 16-bit converters also).
...

Whoa... that is faster than I expected.
« Last Edit: January 20, 2025, 11:40:30 am by incf »
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7441
  • Country: fi
    • My home page and email address
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #84 on: January 20, 2025, 02:46:13 am »
I do have one Cortex-M0+ (Teensy LC, NXP Kinetis KL26 family, NXP MKL26Z64) I believe I could test the performance on, but it'll likely be somewhat fewer cycles than on a Cortex-M0 (the M0+ has a two-stage pipeline, the M0 a three-stage one).  The newlib/newlib-nano (for comparison to snprintf()) I have for it is version 2.4.0, released in March 2016, so I'm unsure how useful those results would be...

Formatting floats as two integers, one on each side of the decimal point, is a good approach.  One truncates the value to get the integer part, subtracts that from the original value to get the fractional part, multiplies by the power of ten matching the desired number of decimals (1..9; these are all exactly representable even in 32-bit floats), takes the absolute value, and finally rounds to an integer.  If the result equals or exceeds the power of ten, subtract the power of ten from it and increment the magnitude of the integer part by one.  Simples!  Of course, in your case, storing the original double or float would be even faster.
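A sketch of that, in case it helps (hypothetical helper; the caller still has to handle the sign of values between -1 and 0 itself, since the integer part alone carries no sign there):
Code: [Select]
#include <stdint.h>
#include <math.h>

// Split v into an integer part and a rounded fractional part scaled by
// 10^decimals (decimals = 1..9), as described above.
static void split_float(float v, int decimals, int32_t *int_part, uint32_t *frac_part) {
    static const uint32_t pow10[10] = { 1u, 10u, 100u, 1000u, 10000u, 100000u,
                                        1000000u, 10000000u, 100000000u, 1000000000u };
    const uint32_t scale = pow10[decimals];

    int32_t  i = (int32_t)v;                       // truncate toward zero
    float    frac = v - (float)i;                  // fractional remainder
    uint32_t f = (uint32_t)(fabsf(frac) * (float)scale + 0.5f);  // abs + round

    if (f >= scale) {                              // rounding carried over
        f -= scale;
        i += (v < 0.0f) ? -1 : 1;                  // bump the magnitude by one
    }
    *int_part  = i;
    *frac_part = f;
}
With three decimals one would then print it as, say, snprintf(buf, n, "%ld.%03lu", (long)int_part, (unsigned long)frac_part).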
« Last Edit: January 20, 2025, 02:52:59 am by Nominal Animal »
 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 4072
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #85 on: January 20, 2025, 04:22:01 am »
If there are no float/double types and also no long long / (u)int64_t or other large types, then the format string parsing becomes almost trivial: all arguments are exactly 32 bits, so you just need to count them, extract that many 32-bit values with va_arg, and store them in a buffer.
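A sketch of the replay side under the same assumptions (all arguments 32-bit, format and %s strings are literals that stay valid): since C evaluates but ignores excess arguments once the format string is exhausted, the drain code can simply pass the whole fixed-size slot array back to snprintf().
Code: [Select]
#include <stdio.h>
#include <stdint.h>

// Format one captured entry outside the real-time path.  'a' points at the
// entry's fixed array of 10 argument slots; snprintf() only consumes as many
// as fmt asks for, the rest are ignored.  Passing uint32_t for %d/%u/%x/%s
// relies on every conversion taking a 32-bit argument on this ILP32 target.
static int log_format(char *out, size_t size, const char *fmt, const uint32_t a[10])
{
    return snprintf(out, size, fmt, a[0], a[1], a[2], a[3], a[4],
                    a[5], a[6], a[7], a[8], a[9]);
}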
 

Online cv007

  • Frequent Contributor
  • **
  • Posts: 878
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #86 on: January 20, 2025, 04:54:25 am »
Quote
One truncates the value to get the integer part...
This assumes you are starting with a float/double to begin with, which is probably already being avoided because it's a CM0+.

Which means there could also be division in the setup of the printf calls, and there is not much one can do about that other than hope they are not used frequently:
printf( "someval: %d.%02u\n", someval/100, (unsigned)__builtin_abs(someval%100) ); // someval is scaled x100

I would get a list of the formatting strings inside the library and see how many formatting options you will have to deal with.
« Last Edit: January 20, 2025, 08:43:14 am by cv007 »
 

Offline 5U4GB

  • Frequent Contributor
  • **
  • Posts: 678
  • Country: au
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #87 on: January 20, 2025, 06:26:12 am »
Unfortunately, it turns out that printf is so slow that it causes the system to fail in a variety of subtle yet completely catastrophic ways (due to it missing real-time deadlines and not being able to detect that it has missed them).

Has anyone dealt with this before?

Been there, done that.  The solution I used was to write an emulation of printf that only processed the small number of formatting strings the software needed in as fast/minimal a way as possible (use 'strings | grep %' or similar on the binary to find them).  It couldn't do anything other than the exact formats the calling code used, but it could do each of those damn fast.

If there's only a small number of formatting strings you could go even further and memcpy over pre-generated output text with spaces for %d's and whatever, then drop the numeric values into the appropriate locations.  You can also skip non-essential output text, so just turn the printf into a nop if it's producing too much unnecessary output.
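A toy sketch of that template idea (the message, offsets, and field widths are made up; real code would generate them to match the actual strings found in the binary):
Code: [Select]
#include <string.h>
#include <stddef.h>
#include <stdint.h>

// Write v right-aligned as exactly 5 decimal digits (leading zeros) at dst.
// The /10 and %10 here could themselves use the no-divide tricks from earlier
// posts if they matter.
static void put_u16_5(char *dst, uint16_t v) {
    for (int i = 4; i >= 0; i--) {
        dst[i] = (char)('0' + (v % 10));
        v /= 10;
    }
}

// "printf" one specific message by copying a pre-built template and dropping
// the numbers into known columns.  Returns the number of bytes written.
static size_t log_adc_sample(char *out, uint16_t channel, uint16_t raw) {
    static const char tmpl[] = "ADC ch=..... raw=.....\n";
    memcpy(out, tmpl, sizeof tmpl - 1);
    put_u16_5(out + 7,  channel);   // columns of the two "....." fields above
    put_u16_5(out + 17, raw);
    return sizeof tmpl - 1;
}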
« Last Edit: January 20, 2025, 06:40:02 am by 5U4GB »
 

Offline Analog Kid

  • Super Contributor
  • ***
  • Posts: 1558
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #88 on: January 20, 2025, 06:37:51 am »
Which is exactly what I advised several posts above yours.
 

Offline 5U4GB

  • Frequent Contributor
  • **
  • Posts: 678
  • Country: au
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #89 on: January 20, 2025, 07:12:23 am »
That's the first time I have heard something '...formally verified to be "correct"...' since the 80s. Since you mention "100kLOC" and "complicated", I presume the weasel word is "correct"; maybe it is a meaning unfamiliar to me.

It could be something that has to meet functional safety requirements and where the code is auto-generated by some tool that ensures this.  For example, there's a form of self-checking code used in some European rail control systems (whose name I'm blanking on) that I doubt could be written directly by a human, and for which the auto-generation would lead to a huge size expansion, so the 100kLOC could have started as a few kLOC.

OTOH I can't imagine they'd run the result on a Cortex M0.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7441
  • Country: fi
    • My home page and email address
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #90 on: January 20, 2025, 10:03:51 pm »
We have fast truncated multiply (32 bit results)
For 32-bit values 0 .. 81919, inclusive, one can compute v/10 (and from it the remainder, i.e. the next decimal digit) by multiplying v by 0xCCCD and then shifting right by 19 bits.  This is equivalent to multiplying by 52429/524288 = 0.1000003814697265625.  One can extend the range to 0 .. 262148, inclusive, by multiplying v by 13107, adding one quarter of v to the product, and shifting the result right by 17 bits.  Whether these are faster than the repeated subtractions (noting that each iteration includes a load from flash, a comparison, conditionally a subtraction and an addition, and a jump) is an open question.  You can see the Cortex-M0 generated code for these at godbolt.org/z/3osnbxdd7.

Also note that the remainder-after-divide by ten approach calculates the digits from right to left, whereas repeated subtraction from left to right.
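In code form, the first of those looks something like this (helper names are made up; the 81919 limit is where the 32-bit product v * 0xCCCD would overflow):
Code: [Select]
#include <stdint.h>

// Quotient and remainder of v / 10 for 0 <= v <= 81919, using only a
// 32x32->32 multiply and a shift (no divide): 0xCCCD / 2^19 ~= 0.1000004.
static inline uint32_t div10_small(uint32_t v) { return (v * 0xCCCDu) >> 19; }
static inline uint32_t mod10_small(uint32_t v) { return v - 10u * div10_small(v); }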

Division via multiplication is an interesting trick. I could easily see myself getting really into optimizing those routines.
The math behind this:
$$\left\lfloor \frac{x}{d} \right\rfloor = \left\lfloor \frac{x \cdot m}{2^n} \right\rfloor \quad \text{or} \quad \left\lfloor \frac{p(x)}{2^n} \right\rfloor$$
where \$p(x)\$ is a simple function of \$x\$ containing only multiplications, additions, and integer divisions by a power of two (right shifts), and \$\lfloor \dots \rfloor\$ is the truncation operation, rounding towards zero.  If
$$\frac{m}{2^n} = \frac{1}{d}$$
then this is exact, of course.  Typically you'll want to use \$m \gt 2^n / d\$.  For some ranges of \$x\$ (\$0 \dots N\$), \$p(x) = x \cdot m + C\$ (with \$0 \lt C \lt m\$) suffices.

Division by ten is annoying because it cannot be represented in binary exactly: 0.00011001100... in binary, i.e.
$$\frac{1}{10} = \frac{1}{2 \cdot 5} = \frac{1}{16}+\frac{1}{32}+\frac{1}{256}+\frac{1}{512}+\dots = \sum_{n=0}^\infty \left( \frac{3}{2^{4n + 5}} \right)$$
Fortunately, we don't have to get it exact, only so that it truncates to the correct integer result.
« Last Edit: January 20, 2025, 10:10:29 pm by Nominal Animal »
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 16098
  • Country: fr
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #91 on: January 20, 2025, 10:14:28 pm »
Yep.

On targets that have a slow hardware multiplication (or none at all, so even slower), the approach of iterating over digits as you showed earlier (and as I remember we discussed in an older thread) with precomputed powers of ten is often much faster. I've used that on 8-bit MCUs and that sped up conversion by a factor of at least 10.
 

Online peter-h

  • Super Contributor
  • ***
  • Posts: 4502
  • Country: gb
  • Doing electronics since the 1960s...
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #92 on: January 20, 2025, 11:18:44 pm »
To go back a few steps, in coding since before 1980 I have never had a real use for a complete printf. Sure; one uses it because it is convenient, but if there was ever a performance issue, the full range of templates was never ever needed. In practically all embedded applications one is outputting only one number format e.g. 123.45, or only integers, etc. and this can be done far more simply.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline Analog Kid

  • Super Contributor
  • ***
  • Posts: 1558
  • Country: us
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #93 on: January 20, 2025, 11:31:51 pm »
To go back a few steps, in coding since before 1980 I have never had a real use for a complete printf. Sure; one uses it because it is convenient, but if there was ever a performance issue, the full range of templates was never ever needed. In practically all embedded applications one is outputting only one number format e.g. 123.45, or only integers, etc. and this can be done far more simply.

Well, there's code size due to handling all those different formats (which isn't an issue here), and then there's execution speed. My guess©® is that the conversion code for the more commonly-used formats (integers, maybe even floats) might be pretty good for the stock library printf().

But I don't know for sure.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 16098
  • Country: fr
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #94 on: January 20, 2025, 11:37:27 pm »
We use it because it's convenient, but it's not efficient for sure. The fact that printf is based on analyzing format and parameters at run time kind of bites. The only objective benefit of doing this at run time would be to use formats defined at run time, and while that's something you're allowed to do, it's usually not recommended - pretty dangerous. So with fixed (at compile time) formats, this is merely because of language limitations.

Variadic functions in C are themselves more a run time "trick" than a consistent language feature.
 

Online peter-h

  • Super Contributor
  • ***
  • Posts: 4502
  • Country: gb
  • Doing electronics since the 1960s...
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #95 on: January 20, 2025, 11:50:14 pm »
The way in which avoiding printf speeds things up is not by taking printf code and stripping it down. Indeed, the execution path of, say, a %u is fairly short; most of the time will be spent in the eventual call to itoa or ltoa etc.

The thing which saves execution time is knowing the range of values you will be outputting and writing code to directly implement that. Let's say you know the value of something is from 0.0 to 1000.0 and you are holding it in a 16-bit unsigned variable where 1 LSB is 0.1 (so the 16-bit value actually holds 0 to 10000). Then you hard-code it: output it to a 6-byte array (plus a byte for any trailing 0x00, etc.), and when done, copy the value in element 5 to element 6 and replace element 5 with a '.'. You get the idea...
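A sketch of that kind of hard-coded conversion (hypothetical helper; it uses the decade-subtraction trick from earlier in the thread, so there is no divide at all):
Code: [Select]
#include <stdint.h>

// Value v in 0..10000 represents 0.0..1000.0 (one implied decimal place).
// Writes e.g. "12.3" into out (at most 7 bytes including the terminator).
static void format_tenths(uint16_t v, char out[8]) {
    static const uint16_t decade[4] = { 10000, 1000, 100, 10 };
    int i = 0, started = 0;

    for (int d = 0; d < 4; d++) {           // thousands..units of the integer part
        char c = '0';
        while (v >= decade[d]) {
            v -= decade[d];
            c++;
        }
        if (started || c != '0' || d == 3) {
            out[i++] = c;
            started = 1;
        }
    }
    out[i++] = '.';
    out[i++] = (char)('0' + v);             // what remains (0..9) is the tenths digit
    out[i]   = '\0';
}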

Assembler programmers always did this kind of thing naturally :)

Many years ago I was outputting HPGL values and managed to speed up the code by about 100x by rewriting the crappy IAR C in assembler, although to be fair one could have got at least a 10x speedup when doing it all in C. This was a 4MHz Z80. In some cases, notably sscanf, one could get a 1000x speedup by this sort of hard-coding.

Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7441
  • Country: fi
    • My home page and email address
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #96 on: January 21, 2025, 01:15:54 am »
Yup, fixed-point arithmetic is often more efficient than floating-point arithmetic, when you don't have hardware floating-point support.

The fractional part can be decimal or binary, too.  That is, in general terms, fixed-point arithmetic means representing a real value v using an integer q via v=q/R (or equivalently q=v*R), where R is a fixed positive integer value: the base or radix.

When R=10^k, the "division" is avoided by simply converting integer q to decimal, and inserting the decimal point before the k least significant decimal digits.  (Wikipedia says this is called decimal scaling.)

When R=2^k, the division is a right-shift.  Sign is handled separately, so we only consider positive integers and zero.  (Wikipedia says this is called binary scaling.)  The integer part is obviously q>>k.  The fractional decimal digits we do one by one, starting with just f containing the k least significant bits of q, i.e. M=(1<<k)-1, and f=q&M, followed by iterating
    t = f * 10
    d = t >> k
    f = t & M
once per digit, starting at the digit following the decimal point, where t is a temporary variable that must be able to hold the product, and d is the next digit, between 0 and 9, inclusive.  This can be repeated for as many digits as is desired, or you can stop when f becomes zero –– it will, at some point.
To apply rounding, you compare f to R/2.  If equal, you have a tie, and you need to decide how to break it.  (The common ones are upwards, and towards even.)  If larger, you increment the previous digit, rippling the overflow left, noting that this can lead to incrementing the integer part by 1 (as the fractional part rounds to 0.000...).  If smaller, the result is correctly rounded already.  Thus, it often makes sense to convert the fractional part first, from the decimal point rightwards, and only then the integral part, from the decimal point leftwards.

On an architecture where you have only a truncating 32×32=32-bit multiplication like Cortex-M0, you can do up to 28 fractional bits, while only requiring one 32-bit multiplication, one k-bit right shift, and one 32-bit AND, per fractional decimal digit.  I find it very funny that it is the integer part that takes more work to convert to decimal.
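As a concrete sketch of that loop (hypothetical helper, no rounding, assuming k <= 28 as noted):
Code: [Select]
#include <stdint.h>

// Append the first 'digits' fractional decimal digits of a binary fixed-point
// value q with k fractional bits (k <= 28), exactly as described above:
// t = f*10; next digit = t >> k; f = t & ((1<<k)-1).  No divides anywhere.
static char *append_fraction(char *buf, uint32_t q, unsigned k, unsigned digits) {
    const uint32_t mask = ((uint32_t)1 << k) - 1;
    uint32_t f = q & mask;                     // fractional bits only

    *buf++ = '.';
    while (digits-- > 0) {
        uint32_t t = f * 10u;                  // fits in 32 bits because k <= 28
        *buf++ = (char)('0' + (t >> k));
        f = t & mask;
    }
    return buf;                                // integer part (q >> k) handled separately
}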
« Last Edit: January 21, 2025, 01:17:44 am by Nominal Animal »
 

Online peter-h

  • Super Contributor
  • ***
  • Posts: 4502
  • Country: gb
  • Doing electronics since the 1960s...
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #97 on: January 21, 2025, 09:02:49 am »
Apollo used 24 or 32 bit fixed point, so it has to be good :)
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 21679
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #98 on: January 21, 2025, 09:57:23 am »
Apollo used 24 or 32 bit fixed point, so it has to be good :)

I'll ignore the ":)"

They had people who knew how to do numerical analysis, which is not a common skill.

A trivial example: if you have two numbers, X and Y, each of which is accurate to 1%, what is the accuracy of X+Y, X-Y, X*Y, X/Y?  Too many people answer 2%, or 1%.
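(Roughly, for small independent errors: relative errors add under multiplication and division, so X*Y and X/Y come out near 2%; X+Y stays near 1% when X and Y have the same sign; but X-Y can be arbitrarily bad when X and Y nearly cancel, since in the worst case
$$\frac{\delta(X-Y)}{|X-Y|} \approx \frac{|X|+|Y|}{|X-Y|} \cdot 1\%$$
which is the classic cancellation trap.)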
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 9528
  • Country: fi
Re: Saving "printf" arguments for later? Real-time on slow processor.
« Reply #99 on: January 21, 2025, 10:09:50 am »
Yeah, with any important project, analysis of input ranges and how computational accuracy affects the output is a must. Usually it's not even rocket science, so I suggest trying to do it even in non-critical projects. As a side effect of knowing how to analyze your ranges, efficient fixed-point number formats become available. (Or, turning this around: with fixed-point representations you need to think about the largest number that needs to be represented, at every step, given worst-case input parameters, which is extra work. But it's really not a bad idea to do that anyway, in which case fixed point is not any more "extra work" after all.)

The rationale for floating point really is convenience: let the machine automatically handle the balance between the resolution of the smallest step and the ability to store the largest possible number, so that the programmer "doesn't need to think about accuracy and ranges at all". The downside is either more complex hardware or a slow software implementation, and some bits of "overhead" (for example, if you know your ranges and can fix them without extra safety margin, a 24-bit integer gives you similar resolution to a 32-bit float).

Another downside of floating point is that in some rare cases the programmer should have been thinking about accuracy and ranges after all. Single-precision float especially gives a false sense of security; always use doubles (which is clearly why e.g. the C language tends to do that by default) when you don't want to think.
« Last Edit: January 21, 2025, 10:27:26 am by Siwastaja »
 

