Author Topic: Best thread-safe "printf", and why does printf need the heap for %f etc? (Read 16247 times)

peter-h · « **on:** July 28, 2022, 07:19:15 am »

Context:
https://www.eevblog.com/forum/programming/a-question-on-mutexes-normal-v-recursive-and-printf/msg4324318/#msg4324318

I am looking for a "full standard" implementation.

I fail to understand this. For decades, C compilers for embedded work came with a printf lib and that never used the heap. Even those which used IEEE standard floats (which were several times slower than those which used proprietary float representation) didn't need the heap, or lots of RAM.

But this code from 1990, which appears to be a full implementation, uses not only mutexes to make itself thread-safe but also the heap:
https://sourceware.org/git?p=newlib-cygwin.git;a=blob;f=newlib/libc/stdio/vfprintf.c;h=6a198e2c657e8cf44b720c8bec76b1121921a42d;hb=HEAD#l866

I read somewhere that this bloat is the result of a special implementation which avoids stuff like 1+1=1.99999999 (well that's assuming 2 does have an exact representation in IEEE floats; I don't know enough about this). But that's pointless; with floats the whole idea of == disappears.

I replaced the above with https://github.com/MaJerle/lwprintf and https://github.com/eyalroz/printf has also been recommended.

rstofer · « **Reply #1 on:** August 12, 2022, 02:58:55 pm »

Quote from: peter-h on July 28, 2022, 07:19:15 am

I read somewhere that this bloat is the result of a special implementation which avoids stuff like 1+1=1.99999999 (well that's assuming 2 does have an exact representation in IEEE floats; I don't know enough about this). But that's pointless; with floats the whole idea of == disappears.

The idea that floats are seldom equal, and almost never exactly 0.0, is the reason Fortran has an arithmetic IF statement with three targets: less than, equal to, and greater than zero. If I want to branch on >= then I write something like IF (expression) 10, 20, 20: I wind up at statement 20 if the expression is >= 0.0, else at statement 10.

http://www-linac.kek.jp/cont/langinfo/web/Fortran/docs/lrm/lrm0128.htm

FWIW, this was in the first release of FORTRAN back in '57 so this isn't a new issue.

We have the block IF statement beginning around Fortran 77 and the early version, though still acceptable, is likely seldom used because programmers have walked away from statement numbers.

http://ponce.sdsu.edu/fortranbook04.html

eutectique · « **Reply #2 on:** August 12, 2022, 03:35:35 pm »

Quote from: peter-h on July 28, 2022, 07:19:15 am

But that's pointless; with floats the whole idea of == disappears.

This is the reason why all these epsilon constants exist in float.h:

Code: [Select]

#undef FLT_EPSILON
#undef DBL_EPSILON
#undef LDBL_EPSILON
#define FLT_EPSILON     __FLT_EPSILON__
#define DBL_EPSILON     __DBL_EPSILON__
#define LDBL_EPSILON    __LDBL_EPSILON__

The correct way of comparing for equality would be

Code: [Select]

#include <float.h>
#include <math.h>
#include <stdbool.h>

bool relativelyEqual(float a, float b)
{
    float maxRelativeDiff = FLT_EPSILON;
    float difference = fabsf(a - b);

    // Scale to the largest value.
    a = fabsf(a);
    b = fabsf(b);
    float scaledEpsilon = maxRelativeDiff * fmaxf(a, b);

    return difference <= scaledEpsilon;
}

ejeffrey · « **Reply #3 on:** August 12, 2022, 08:47:52 pm »

My guess is that this is because multi threaded applications often have tiny tiny stacks. Using dozens of bytes of stack space for automatic buffers could be a problem. Especially if you expect users to rarely if ever actually request maximum precision.

brucehoult · « **Reply #4 on:** August 12, 2022, 10:38:37 pm »

printf() doesn't need to use the heap.

Some printf()s do anyway, but they don't *need* to.

Steele & White's 1990 paper How to Print Floating-Point Numbers Accurately says that you need arbitrary-sized bignums to do the job properly. I proved in 2006 that you don't need to, that in fact the size is bounded by, IIRC, something like 144 bytes, and you need several variables this size. I had correspondence with a local university HoD who put me in touch with Steele, who agreed with my analysis. I expect I still have the correspondence somewhere. However my boss forbade me to publish, so the only implementation was on our Java-to-ARM-native (on BREW) compiler & runtime library "AlcheMo".

The product was obsoleted by iPhone and later Android, and the company is long gone, so I should probably resurrect that analysis and code, if no one else has rediscovered it in the meantime.

But, anyway, you don't need heap. You can allocate the variables on the stack, or statically if you don't care about multithreading or using that space for something else the rest of the time.

The goal of this algorithm, by the way, is that you can print any floating point number to a decimal string, and then convert the string back to a floating point number, and it will have the EXACT SAME bits as the original one, every time. And also produce in every case the shortest string for which this is true.
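The round-trip property can be observed with a brute-force search. This is a minimal sketch, not the bounded-bignum algorithm described above: it simply tries increasing precisions until the text converts back to the same double. The name `shortest_roundtrip` is hypothetical, and the sketch assumes the C library's `snprintf()` and `strtod()` are correctly rounded.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Brute-force sketch: find the smallest "%.Ng" precision whose output
   converts back to exactly x. Returns that precision (1..17), or -1 if
   buf is too small or no precision round-trips (for example NaN). */
int shortest_roundtrip(double x, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);

    for (int prec = 1; prec <= 17; prec++) {
        int n = snprintf(buf, size, "%.*g", prec, x);
        if (n < 0 || (size_t)n >= size)
            return -1;              /* output did not fit */
        if (strtod(buf, NULL) == x)
            return prec;            /* same value after the round trip */
    }
    return -1;
}
```

Calling it with 0.1 and a 32-byte buffer leaves "0.1" in the buffer. A purpose-built algorithm reaches the same string without the repeated conversions.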

Nominal Animal · « **Reply #5 on:** August 13, 2022, 07:44:57 am »

Quote from: brucehoult on August 12, 2022, 10:38:37 pm

I proved in 2006 that you don't need to, that in fact the size is bounded by, IIRC, something like 144 bytes, and you need several variables this size. I had correspondence with a local university HoD who put me in touch with Steele, who agreed with my analysis.
[...]
The goal of this algorithm, by the way, is that you can print any floating point number to a decimal string, and then convert the string back to a floating point number, and it will have the EXACT SAME bits as the original one, every time. And also produce in every case the shortest string for which this is true.

Interesting!

I hope you can publish at minimum an outline, because it sounds extremely useful and interesting.

I've investigated the opposite case, parsing decimal number strings, using bignums with per-limb radix for the fractional part. For example, to exactly parse 32 fractional bits (say, Q31.32) on a 32-bit architecture, I would use five 32-bit limbs, with radixes 2¹⁶, 2¹⁷, 5¹¹, 5¹¹, and 5¹¹. 0.5 ULP = 2^-33 = {0,1,0,0,0}, 10^-33 = {0,0,0,0,1}, and for example 0.1 = {6553,78643,9765625,0,0}.
The idea is to be able to describe all nonzero decimal digits in the decimal representation of 0.5 ULP, exactly. In the case of 32 fractional bits, 2^-33×10³³ = 116415321826934814453125 = 5³³, exactly; and it is easiest to represent on a 32-bit architecture using three limbs of radix 5¹¹ each, since 5¹¹ ≃ 0.7×2²⁶, leaving enough room for decimal carry/overflow handling in each limb.
Round towards zero mode uses the first and all but the least significant bit in the second limb. Round halfway up mode examines the LSB in the second limb, and if it is set, increments the second limb by one; then uses the first and all but the least significant bit in the second limb. Third to fifth limbs are only needed to obtain exact representation before rounding.
Adding two such values together means each limb can only carry by one, so comparison-subtraction-increment rippling from the least significant limb to most significant limb suffices. Adding a decimal digit involves multiply-by-up-to-ten-and-add, so those can overflow by up to 19×radix; but since the radix is a constant for each limb, that can be handled either by a loop or by division-by-radix-with-remainder.
These conversions are also exact, and match e.g. IEEE 754 rules.
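
The limb values quoted above can be reproduced with a short mixed-radix expansion. This is only a sketch for checking the layout, not the parser itself: `fraction_to_limbs` and `limb_radix` are hypothetical names, and the input is given as a rational num/den rather than as a decimal string.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_LIMBS 5

/* Per-limb radixes from the description: 2^16, 2^17, 5^11, 5^11, 5^11. */
static const uint32_t limb_radix[FRAC_LIMBS] = {
    UINT32_C(65536), UINT32_C(131072),
    UINT32_C(48828125), UINT32_C(48828125), UINT32_C(48828125)
};

/* Expands num/den (0 <= num < den) into mixed-radix limbs, most
   significant first. Returns the leftover numerator: 0 means the
   five limbs represent the fraction exactly. */
uint64_t fraction_to_limbs(uint64_t num, uint64_t den,
                           uint32_t limbs[FRAC_LIMBS])
{
    /* den <= 2^37 keeps num * radix below 2^63 (radix < 2^26). */
    assert(den > 0 && num < den && den <= (UINT64_C(1) << 37));

    for (int i = 0; i < FRAC_LIMBS; i++) {
        num *= limb_radix[i];
        limbs[i] = (uint32_t)(num / den);   /* always below the radix */
        num %= den;
    }
    return num;
}
```

With num = 1 and den = 10 this yields {6553, 78643, 9765625, 0, 0}, and with den = 2^33 it yields {0, 1, 0, 0, 0}, matching the values for 0.1 and 0.5 ULP.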

Back to topic at hand, converting floating-point or fixed-point numbers to decimal representation:

The fractional part is often implemented using a multiply-by-10ⁿ, then extracting and subtracting the n decimal digits from the integer part. For example, 1/8 = 0.125, using n=1:
10×1/8 = 10/8 = 1 + 2/8, yielding '1'
10×2/8 = 20/8 = 2 + 4/8, yielding '2'
10×4/8 = 40/8 = 5 + 0/8, yielding '5'
Any further iterations yield '0'
The "problem" is that if you are to output a specific number of decimal digits, and you end up with remainder that is at least one half (say, if printing 1/8 with just two decimals), you will need to increment the decimal result by one, rippling it possibly up to the previously yielded digits, including into whole decimals.
As an example, consider printing 255/256 using just two decimals. The first step yields '9' with remainder 246, the second '9' with remainder 156. If we are to round halfway up, 156/256 ≥ 1/2, thus we need to increment the previous digit. That turns it into '0', with the need to increment the one before that. That too turns into '0', leading to the need to increment the integer part by one too.

It is not a problem at all, if you can convert the fractional part into a string buffer, and do this before constructing the integer part.
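
The digit extraction and the carry ripple described above can be sketched as follows, using n=1 and a string buffer that is fixed up after the last digit. The function name `frac_to_decimal` and the restriction to denominators of at most 2^28 are assumptions made to keep the example small.

```c
#include <assert.h>
#include <stdint.h>

/* Writes num / 2^bits (0 <= num < 2^bits, bits <= 28) as "I.DDD" with
   `decimals` digits, rounding halfway cases up.
   out must have room for decimals + 3 chars. */
void frac_to_decimal(uint32_t num, unsigned bits, int decimals, char *out)
{
    const uint32_t den = UINT32_C(1) << bits;
    int i;

    assert(bits <= 28 && num < den && decimals >= 1);

    out[0] = '0';                   /* integer part; a carry can make it '1' */
    out[1] = '.';
    for (i = 0; i < decimals; i++) {
        num *= 10;                  /* fits: num < 2^28, so 10*num < 2^32 */
        out[2 + i] = (char)('0' + num / den);
        num %= den;
    }
    out[2 + decimals] = '\0';

    if (2 * (uint64_t)num >= den) { /* remainder is at least one half */
        for (i = 1 + decimals; i >= 0; i--) {
            if (out[i] == '.')
                continue;
            if (out[i] != '9') {
                out[i]++;
                break;
            }
            out[i] = '0';           /* '9' becomes '0', carry moves left */
        }
    }
}
```

For 255/256 with two decimals this produces "1.00", and for 1/8 it produces "0.125" with three decimals and "0.13" with two.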

To speed up conversion when the exponent of the floating-point mantissa is small, one can simply use larger n. Technically, we should have a mantissa with up to 1024 bits in it (for double precision), because each multiplication by 10ⁿ is a shift left by n bits, and a multiplication by 5ⁿ, increasing the width of the nonzero part of the mantissa by almost 2.322n bits. (It also shifts the mantissa up by one bit, and thus eliminates about 3.322n intervening bits.)

For example, if you start with a 53-bit mantissa, and there are say 44 zero bits between the MSB of the mantissa and the decimal point (so the LSB of the mantissa has value 2^-66), you can do n=13: multiply the 53-bit mantissa by 5¹³ = 1220703125, a 31-bit unsigned number. Your mantissa is now 84 bits wide, and since we also do a logical shift left by n=13 bits, this gets rid of 44 intervening zero bits in one step. Each such step adds 31 bits to the mantissa, getting rid of 44 intervening zero bits. For double precision, there can be up to 1023 of them, so 23 rounds of this yields an exact mantissa of 766 bits, producing the 299 decimal zero digits to the output buffer.

If one uses a straightforward buffer of 1024 fractional bits, then one can track where the least and most significant bits set are, and only operate on the related limbs, implementing at least a couple of different n steps (with n=1 mandatory).
Another option is to just convert the fractional part to radix-10ⁿ limbs, and discard/ignore the zero ones (as they represent n consecutive zeroes in the decimal representation).

Also, if you use floating point types to implement this, do note that the obtained representation is not exact when the exponent is negative enough; i.e. when there are zero bits between the top of the mantissa and the decimal point. This is because each multiply-by-10ⁿ generates a result that is 2.322n bits wider than the original mantissa, and when using floating point, you'll discard those extra bits.

Of course, none of this is as smart as what Bruce described, because this only generates the desired number of decimals, and does not know how many suffice. (And an implementation based on this would treat "%f" as "%.6f".) But it does kinda-sorta prove that if you don't mind being somewhat slow, you can definitely do it for double precision floating-point with something like 256 bytes of RAM plus a char array large enough to hold the entire result.
For single precision (AKA float), something like 40 bytes of RAM plus a char array large enough should suffice.
The more ROM/Flash you can spare for power of ten tables and similar, the faster you can make it.

I've elaborated somewhat on related ideas in this thread I started a year ago; some of you might find it interesting. My key point in that is that the *printf() interface in C is actually not that well suited for many tasks, and I've been investigating alternatives, especially those useful for systems programming and memory-constrained embedded development. In particular, I've found that I usually do have a buffer I want to put the result into, and snprintf() makes that annoying/difficult, because I have to very carefully track the fill point myself. For thread-safety and re-entrancy, any temporary memory needed should be provided by the caller, which also means the requirements must be known at compile time.
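
As an illustration of the fill-point bookkeeping mentioned above, here is a minimal sketch of an append helper around `vsnprintf()` that works only on caller-provided memory. The type `struct outbuf` and the function `outbuf_appendf` are hypothetical names, not an existing API.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Caller-owned buffer plus its fill point (hypothetical helper type). */
struct outbuf {
    char  *data;
    size_t size;    /* total capacity, including the terminating '\0' */
    size_t used;    /* characters stored so far, excluding '\0' */
};

/* Appends formatted text. Returns 0 on success, or -1 if it did not
   fit, in which case the previous contents are kept intact. */
int outbuf_appendf(struct outbuf *ob, const char *fmt, ...)
{
    va_list ap;
    size_t room;
    int n;

    assert(ob != NULL && ob->used <= ob->size);
    room = ob->size - ob->used;

    va_start(ap, fmt);
    n = vsnprintf(ob->data + ob->used, room, fmt, ap);
    va_end(ap);

    if (n < 0 || (size_t)n >= room) {
        if (room > 0)
            ob->data[ob->used] = '\0';  /* discard the partial write */
        return -1;
    }
    ob->used += (size_t)n;
    return 0;
}
```

The helper itself keeps no static state and allocates nothing, so it is as re-entrant as the underlying `vsnprintf()`.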

DiTBho · « **Reply #6 on:** August 13, 2022, 08:58:47 am »

With myC "printf" is banned.
You must not use it!

do you want to show a uint32 item? uint32_show(uint32_t item, string_t format)
do you want to show a uin16 item? uin16_show(uin16_t item, string_t format)
do you want to show a uint8 item? uint8_show(uint8_t item, string_t format)
do you want to show a sint32 item? sint32_show(sint32_t item, string_t format)
do you want to show a sin16 item? sin16_show(sin16_t item, string_t format)
do you want to show a sint8 item? sint8_show(sint8_t item, string_t format)
do you want to show a string item? string_show(string_t item, string_t format)
do you want to show a char item? char_show(char_t item, string_t format)
do you want to show a boolean item? boolean_show(boolean_t item, string_t format)
do you want to show a fp32 item? fp32_show(fp32_t item, string_t format)
do you want to show a fp64 item? fp64_show(fp64_t item, string_t format)
do you want to show a fx1616 item? fx1616_show(fx1616_t item, string_t format)
do you want to show a fx32 item? fx32_show(fx32_t item, string_t format)
do you want to show a cplx_fp32 item? cplx_fp32_show(cplx_fp32_t item, string_t format)
do you want to show a cplx_fp64 item? cplx_fp64_show(cplx_fp64_t item, string_t format)
do you want to show a cplx_fx1616 item? cplx_fx1616_show(cplx_fx1616_t item, string_t format)
do you want to show a cplx_fx32 item? cplx_fx32_show(cplx_fx32_t item, string_t format)
do you want to show a p_uint32 item? p_uint32_show(p_uint32_t item, string_t format)
do you want to show a p_uin16 item? p_uin16_show(p_uin16_t item, string_t format)
do you want to show a p_uint8 item? p_uint8_show(p_uint8_t item, string_t format)
do you want to show a p_sint32 item? p_sint32_show(p_sint32_t item, string_t format)
do you want to show a p_sin16 item? p_sin16_show(p_sin16_t item, string_t format)
do you want to show a p_sint8 item? p_sint8_show(p_sint8_t item, string_t format)
do you want to show a p_string item? p_string_show(p_string_t item, string_t format)
do you want to show a p_char item? p_char_show(p_char_t item, string_t format)
do you want to show a p_boolean item? p_boolean_show(p_boolean_t item, string_t format)
do you want to show a p_fp32 item? p_fp32_show(p_fp32_t item, string_t format)
do you want to show a p_fp64 item? p_fp64_show(p_fp64_t item, string_t format)
do you want to show a p_fx1616 item? p_fx1616_show(p_fx1616_t item, string_t format)
do you want to show a p_fx32 item? p_fx32_show(p_fx32_t item, string_t format)
do you want to show a p_cplx_fp32 item? p_cplx_fp32_show(p_cplx_fp32_t item, string_t format)
do you want to show a p_cplx_fp64 item? p_cplx_fp64_show(p_cplx_fp64_t item, string_t format)
do you want to show a p_cplx_fx1616 item? p_cplx_fx1616_show(p_cplx_fx1616_t item, string_t format)
do you want to show a p_cplx_fx32 item? p_cplx_fx32_show(p_cplx_fx32_t item, string_t format)
do you want to show a p_pointer32 item? p_pointer32_show(p_pointer32_t item, string_t format)
do you want to show a p_pointer64 item? p_pointer64_show(p_pointer64_t item, string_t format)
do you want to show a p_function item? p_function_show(p_function_t item, string_t format)

Code: [Select]

void item_show
(
    autorefrence item,
    string_t format
)
{
    type_t type;

    type=typeof(item);
    switch(type)
    {
         case uint32
         {
              uint32_show(item, format)
         }
         case uin16
         {
              uin16_show(item, format)
         }
         case uint8
         {
              uint8_show(item, format)
         }
         case sint32
         {
              sint32_show(item, format)
         }
         case sin16
         {
              sin16_show(item, format)
         }
         case sint8
         {
              sint8_show(item, format)
         }
         case string
         {
              string_show(item, format)
         }
         case char
         {
              char_show(item, format)
         }
         case boolean
         {
              boolean_show(item, format)
         }
         case fp32
         {
              fp32_show(item, format)
         }
         case fp64
         {
              fp64_show(item, format)
         }
         case fx1616
         {
              fx1616_show(item, format)
         }
         case fx32
         {
              fx32_show(item, format)
         }
         case cplx_fp32
         {
              cplx_fp32_show(item, format)
         }
         case cplx_fp64
         {
              cplx_fp64_show(item, format)
         }
         case cplx_fx1616
         {
              cplx_fx1616_show(item, format)
         }
         case cplx_fx32
         {
              cplx_fx32_show(item, format)
         }
         case p_uint32
         {
              p_uint32_show(item, format)
         }
         case p_uin16
         {
              p_uin16_show(item, format)
         }
         case p_uint8
         {
              p_uint8_show(item, format)
         }
         case p_sint32
         {
              p_sint32_show(item, format)
         }
         case p_sin16
         {
              p_sin16_show(item, format)
         }
         case p_sint8
         {
              p_sint8_show(item, format)
         }
         case p_string
         {
              p_string_show(item, format)
         }
         case p_char
         {
              p_char_show(item, format)
         }
         case p_boolean
         {
              p_boolean_show(item, format)
         }
         case p_fp32
         {
              p_fp32_show(item, format)
         }
         case p_fp64
         {
              p_fp64_show(item, format)
         }
         case p_fx1616
         {
              p_fx1616_show(item, format)
         }
         case p_fx32
         {
              p_fx32_show(item, format)
         }
         case p_cplx_fp32
         {
              p_cplx_fp32_show(item, format)
         }
         case p_cplx_fp64
         {
              p_cplx_fp64_show(item, format)
         }
         case p_cplx_fx1616
         {
              p_cplx_fx1616_show(item, format)
         }
         case p_cplx_fx32
         {
              p_cplx_fx32_show(item, format)
         }
         case p_pointer32
         {
              p_pointer32_show(item, format)
         }
         case p_pointer64
         {
              p_pointer64_show(item, format)
         }
         case p_function
         {
              p_function_show(item, format)
         }
         default
         {
              panic(module, id, "unknown type" );
         }
    }
}

show(whatever type, format), and you get the job done!

This is how a "data show" should be written!

brucehoult · « **Reply #7 on:** August 13, 2022, 11:17:09 am »

Quote from: Nominal Animal on August 13, 2022, 07:44:57 am

Of course, none of this is as smart as what Bruce described, because this only generates the desired number of decimals, and does not know how many suffice. (And an implementation based on this would treat "%f" as "%.6f".) But it does kinda-sorta prove that if you don't mind being somewhat slow, you can definitely do it for double precision floating-point with something like 256 bytes of RAM plus a char array large enough to hold the entire result.
For single precision (AKA float), something like 40 bytes of RAM plus a char array large enough should suffice.
The more ROM/Flash you can spare for power of ten tables and similar, the faster you can make it.

You're along very much the right lines.

The worst case is numbers with large exponents. For positive exponents, just convert the number to an explicit integer. For negative exponents (and small positive ones less than the significand size) the key is to multiply them by 10 enough times to turn them into an integer (and remember how many times it took). Digging deeeeeeep into memory, the upper limit on the number of bits needed in the bignum is the maximum exponent, plus twice the number of fraction bits (including the implied leading 1).

So for IEEE Double it is 1024 + 2*53 = 1130 bits, which rounded up to whole 32 bit or 64 bit words is 1152 bits or 144 bytes. Or maybe that 1024 should be 1023 or 1022. It doesn't affect the rounded answer :-)

For IEEE Float it is 128 + 2*24 = 176, which rounded up to whole 32 bit or 64 bit words is 192 bits or 24 bytes.
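
The arithmetic above can be written down as a small helper. This is a sketch of the stated bound only (maximum exponent plus twice the significand width, rounded up to whole words); the function name `bignum_bytes` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for one bignum under the bound quoted above:
   max_exponent + 2 * significand_bits, rounded up to whole words.
   significand_bits includes the implied leading 1. */
size_t bignum_bytes(unsigned max_exponent, unsigned significand_bits,
                    unsigned word_bits)
{
    unsigned bits, words;

    assert(word_bits == 32 || word_bits == 64);

    bits  = max_exponent + 2u * significand_bits;
    words = (bits + word_bits - 1u) / word_bits;    /* round up */
    return (size_t)words * (word_bits / 8u);
}
```

`bignum_bytes(1024, 53, 32)` gives 144 and `bignum_bytes(128, 24, 32)` gives 24. Being off by one or two bits in either input does not change these results.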

These are multiple precision integers, but as bignums go they aren't big. Especially for converting single precision it's probably not worth keeping track of the actual used length, especially if code size is of more concern than a small speed increment. Keeping output devices saturated is seldom a problem, at least with 32 bit machines.

Deciding how many digits to print is conceptually easy. Make two numbers, one 0.5 ULP less than the actual number, and one 0.5 ULP bigger and convert them to decimal in parallel. Stop outputting decimal digits when two digits differ. Of course you don't actually have to keep both numbers in full, but that's the conceptual level.

Converting from a string to binary is usually very simple. Most numbers in text don't use the full number of significant digits/bits, especially in program code. Just take all the digits and convert them to a binary integer. Convert this integer to an (always exact) floating point number, then do a floating point multiply or divide by the appropriate power of ten, either from a table, or calculated on the fly. Make sure you calculate the power of ten FIRST (which will be exact), and then a single multiply or divide with the significant digits, so you get only a single 0.5 ULP error.

If the integer from the decimal digits would exceed 2^24 (Float) or 2^53 (Double) then take as many as fit. In fact take one more digit than fits, then shift right until you have 24 or 53 bits, and remember the number of shifts as a final adjustment to the FP exponent later. Do the same multiply or divide by the appropriate power of ten as before.

No matter how many decimal digits you ignored, the result you have now is either the correct result of converting the entire decimal number (even if it is a million digits long!), or else you need to increase the LSB by 1. You can decide which by adding 0.5 ULP, converting to decimal (using the first algorithm), and compare the generated digits to the original string you were given. When a digit differs you know whether to add 1 ULP or not.

peter-h · « **Reply #8 on:** August 13, 2022, 05:00:49 pm »

Quote

My guess is that this is because multi threaded applications often have tiny tiny stacks. Using dozens of bytes of stack space for automatic buffers could be a problem. Especially if you expect users to rarely if ever actually request maximum precision.

It is true that in an RTOS environment, if you run say a printf() which uses 1k on the stack from each of 15 RTOS tasks, each of those tasks will need 1k of stack headroom just for that, so you need 15k (and a re-entrant printf), whereas if you structured your code as one big state machine or loop (etc etc) then you would need just the 1 x 1k (and the printf can use statics etc). But in reality this is not an issue. My current project has ~15 tasks and this is not an issue; most of the stack space needed in each task is actually stuff specific to that task. Total space allocated to RTOS stacks is 64k, out of 192k total RAM. But then I don't have a printf() using a lot of stack because I got rid of the one I did have.

And there are solutions if you are pushed e.g. having just one instance of the printf, in its own RTOS task, and send it the stuff to output via RTOS messages.

Quote

The goal of this algorithm, by the way, is that you can print any floating point number to a decimal string, and then convert the string back to a floating point number, and it will have the EXACT SAME bits as the original one, every time. And also produce in every case the shortest string for which this is true.

Doesn't that imply that the library contains a complementary printf() and scanf() ? From tracing code, I have not found evidence of my scanf() (actually sscanf()) using mutexes and mallocs like that notorious ex-1990 printf library does. But then I have never found the scanf source. The likely printf source URL is in my #1 post.

Quote

Steele & White's 1990 paper How to Print Floating-Point Numbers Accurately says that you need arbitrary-sized bignums to do the job properly.

ISTM that that paper was used in just about every printf implementation since 1990, for "big machines" with no memory issues. I can't tell if the printf source mentioned above uses it; it is too convoluted.

I went looking for the "1990" scanf source. Maybe here https://sourceware.org/cgi-bin/search.cgi?q=scanf+1990 where the interesting hits are #3 and #4. Could be this one
https://code.woboq.org/userspace/glibc/stdio-common/vfscanf-internal.c.html#__vfscanf_internal
which is full of mallocs! But these are much newer than 1990 and I have my doubts anyway because a breakpoint on _sbrk (which is always called by malloc) is not getting hit. And this code is certainly not running a private heap. It might use statics.

SiliconWizard · « **Reply #9 on:** August 13, 2022, 05:58:24 pm »

Quote from: DiTBho on August 13, 2022, 08:58:47 am

(...)
show(whatever type, format), and you get the job done!

Note that with C11, you can have the compiler select the right function at compile time instead of having this large switch/case function, thanks to _Generic.
But I'm kind of guessing that you don't use C11.

DiTBho · « **Reply #10 on:** August 13, 2022, 06:56:13 pm »

Quote from: SiliconWizard on August 13, 2022, 05:58:24 pm

Note that with C11, you can have the compiler select the right function at compile time instead of having this large switch/case function, thanks to _Generic.
But I'm kind of guessing that you don't use C11.

No, but this might be *very* interesting for those using C11.

In C11, _Generic works like a switch whose labels are type names: the type of the controlling (first) expression is matched against them at compile time, and the whole _Generic() expression evaluates to the result expression associated with the matching type.
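
A minimal sketch of such a compile-time dispatch, covering only four of the types listed earlier. The `*_show` functions here are hypothetical stand-ins that format into a caller buffer (the myC originals take a format string instead), and `show` is an illustrative macro name.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-type formatters; each returns snprintf()'s result. */
static int uint32_show(char *out, size_t n, uint32_t v)
{
    assert(out != NULL);
    return snprintf(out, n, "%" PRIu32, v);
}

static int sint32_show(char *out, size_t n, int32_t v)
{
    assert(out != NULL);
    return snprintf(out, n, "%" PRId32, v);
}

static int fp64_show(char *out, size_t n, double v)
{
    assert(out != NULL);
    return snprintf(out, n, "%g", v);
}

static int string_show(char *out, size_t n, const char *v)
{
    assert(out != NULL);
    return snprintf(out, n, "%s", v);
}

/* The compiler picks the function from the static type of `item`;
   a type not listed here is a compile-time error, not a runtime panic. */
#define show(out, n, item) _Generic((item), \
        uint32_t:     uint32_show,          \
        int32_t:      sint32_show,          \
        double:       fp64_show,            \
        char *:       string_show,          \
        const char *: string_show           \
    )((out), (n), (item))
```

`show(buf, sizeof buf, 2.5)` expands to a direct call to `fp64_show`, so no type tag or switch exists at run time.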

Some months ago, I read this blog.

Nominal Animal · « **Reply #11 on:** August 13, 2022, 09:58:03 pm »

Quote from: brucehoult on August 13, 2022, 11:17:09 am

Converting from a string to binary is usually very simple. Most numbers in text don't use the full number of significant digits/bits, especially in program code. Just take all the digits and convert them to a binary integer. Convert this integer to an (always exact) floating point number, then do a floating point multiply or divide by the appropriate power of ten, either from a table, or calculated on the fly. Make sure you calculate the power of ten FIRST (which will be exact), and then a single multiply or divide with the significant digits, so you get only a single 0.5 ULP error.

I've never proven to myself that this is so. In fact, I do believe I have an unfortunate counterexample below.

To keep everyone along: If we have a b-bit fraction (whose most significant bit corresponds to 0.5), and wish to convert it to a d-digit decimal number, we need to multiply by 10^d = 2^d × 5^d and divide by 2^b, where d≥1 and b≥1. This simplifies to a multiplication by 5^d and a bit shift by d-b bits (positive up, negative down).

For exact rounding, we need to be able to detect the case when the resulting fraction, before any rounding, is exactly half. This is the tie case, and can be solved either by rounding ties to even (most common) or away from zero (what I normally use). Aside from the tie case, we only need one extra bit to get rounding right.

The problem is that multiplying an n-bit number by 5^d yields an n+2.322d-bit number (where 2.322 ≳ log₂5, still assuming d≥1). When using IEEE 754 Binary32/64 floating point numbers (which we almost certainly are if using float or double nowadays, except for exotic hardware like DSPs), this is the point where rounding error may be introduced, because those extra bits need to be rounded out.

Now, here's the point I have a problem with:

If that rounding results in a floating-point value that is exactly k+½, we may end up doing the tie-breaking twice, when turning that into an integer.

For example, round(5⁶×144115188075.85596) = round(2251799813685249 + 3/8) = 2251799813685249, but with double-precision floating point the multiplication itself yields round(2251799813685249.5) = 2251799813685250, which is off by 5/8 when ULP = 1/2, i.e. the error is 1.25 ULP.
Ouch.

(Yes, I know: these cases are very, very rare. I had to go search for one deliberately, going backwards and examining FP bit patterns. Many people think this kind of rare error is perfectly acceptable. But, it isn't acceptable if we abide by IEEE 754 rules; and C libraries printf() et al. are designed for correctness, not for speed/efficiency.)

What we need, is an extra tri-state flag from the multiplication: result exact with no rounding; result was rounded towards zero; result was rounded away from zero. If we had this knowledge in hand, we could handle rounding correctly. Tie-breaking only applies when the result was exact, otherwise exact half gets rounded the opposite way.

It would be even better, if we can combine the multiply-by-power-of-five, the bit shift, and rounding to a single operation that yields just the correctly rounded integer part. Unfortunately, most floating-point implementations do not include such functionality.

brucehoult · « **Reply #12 on:** August 13, 2022, 11:25:40 pm »

Quote from: Nominal Animal on August 13, 2022, 09:58:03 pm

For example, round(5⁶×144115188075.85596) = round(2251799813685249 + 3/8) = 2251799813685249, but with double-precision floating point the multiplication itself yields round(2251799813685249.5) = 2251799813685250, which is off by 5/8 when ULP = 1/2, i.e. the error is 1.25 ULP.
Ouch.

Noooo!!

Everything except the final multiply / divide is done in INTEGER arithmetic.

If presented with "144115188075.85596" then we start from INTEGER 14411518807585596 and exponent = -5

Now, 14411518807585596 is bigger than 2^53 (9007199254740992), so it's already not an example of the "easy case" I was talking about.

Let's just pretend for the moment that the last 6 isn't there and you wrote "144115188075.8559". So we have INTEGER 1441151880758559 (0x51EB851EB851F) and exponent = -4.

The following function converts a positive integer less than 2^53 *exactly* to an IEEE double:

Code: [Select]

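#include <stdint.h>
#include <string.h>

// Requires 0 < v < 2^53; v == 0 would loop forever.
double int2double(uint64_t v){
  int binShift = 0;
  // use clz instead if you have it
  while (v < (UINT64_C(1)<<52)){
    v<<=1;
    binShift++;
  }
  v &= ~(UINT64_C(1)<<52);                      // drop the implied leading 1
  v |= (uint64_t)(1023 + (52-binShift)) << 52;  // biased exponent field
  double d;
  memcpy(&d, &v, sizeof d);                     // avoids the aliasing pointer cast
  return d;
}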

So now you just do int2double(1441151880758559)/10000 and you have your ±0.5 ULP converted value.

Oh, yes, I forgot to mention that this easy case only works for decimal exponents where the power of ten is also exactly representable. That is true if 5^exp is less than 2^53, so for values to be converted of up to 10^±22.

Fortunately, most numbers encountered in practice are in this range. Outside this exponent range (or with too many significant digits given) you need to follow a slower procedure, as previously outlined.
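
The easy case can be sketched end to end as below. The names `fast_decimal_to_double` and `pow10_exact` are hypothetical, and the sketch uses the compiler's own integer-to-double conversion, which is exact for values below 2^53, in place of a hand-written one.

```c
#include <assert.h>
#include <stdint.h>

/* 10^k is exactly representable as a double for 0 <= k <= 22. */
static const double pow10_exact[23] = {
    1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
    1e8,  1e9,  1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
    1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22
};

/* Easy case only: digits < 2^53 and -22 <= exp10 <= 22.
   Both operands are exact, so the single multiply or divide is the
   only rounding step. */
double fast_decimal_to_double(uint64_t digits, int exp10)
{
    double d;

    assert(digits < (UINT64_C(1) << 53));
    assert(exp10 >= -22 && exp10 <= 22);

    d = (double)digits;             /* exact below 2^53 */
    return exp10 < 0 ? d / pow10_exact[-exp10]
                     : d * pow10_exact[exp10];
}
```

For "144115188075.8559" the call is `fast_decimal_to_double(1441151880758559, -4)`. Inputs outside the asserted ranges need the slower procedure.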

PlainName · « **Reply #13 on:** August 13, 2022, 11:51:44 pm »

Quote from: peter-h

It is true that in an RTOS environment, if you run say a printf() which uses 1k on the stack from each of 15 RTOS tasks, each of those tasks will need 1k of stack headroom just for that, so you need 15k (and a re-entrant printf), whereas if you structured your code as one big state machine or loop (etc etc) then you would need just the 1 x 1k (and the printf can use statics etc).

Not quite. You could easily have a printf which uses a single block of memory¹ and which uses mutexes to prevent multiple threads using it at the same time. Printf shouldn't take long so whether or not that potential small pause would be a problem depends on the app. However, since you're waiting on the output device regardless it's not going to make a real difference if you wait at the start of printf or the output of it.

[Perhaps that other printf you had, with the mutex stuff, was intended to work like this.]

[1] Simplistically you could just allocate 1K of heap and save yourself 14K from those 15 tasks. What would be better is to be able to dynamically change how much memory is allocated to suit the job, so if it's a long string with many variables you might want a lot more memory than for a simple 'hello world'. Or you might consider 1K enough and just truncate a job that needs more.

Dynamic memory shouldn't be a problem - a stack is dynamic memory you have no control of after all! Of course, it depends on the system, but where many uses of dynamic memory come unstuck is by not allowing that the allocation may fail. What happens then depends on the system - maybe you just take a breath and try again, since whatever is gobbling it may by then have released it, or you just give up in a safe way. Or, you could just do the static allocations as normal, allocate all the stacks, and what you have left over is the printf buffer.
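A minimal sketch of the single-buffer idea described above, assuming a POSIX-style mutex as a stand-in for whatever the RTOS provides. The names `locked_printf`, `shared_buf` and `output_bytes` are made up for illustration, and `vsnprintf` merely stands in for whichever formatter is actually used:

```c
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

/* One shared work buffer for all tasks, guarded by one mutex.
   pthread is only a stand-in here for the RTOS mutex (assumption). */
static char shared_buf[1024];
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical output hook: replace with the UART/USB write routine. */
static void output_bytes(const char *p, size_t n) {
    fwrite(p, 1, n, stdout);
}

/* Returns the number of characters written (a job longer than the
   buffer is truncated), or a negative value on a formatting error. */
int locked_printf(const char *fmt, ...) {
    va_list ap;
    pthread_mutex_lock(&shared_lock);
    va_start(ap, fmt);
    int n = vsnprintf(shared_buf, sizeof shared_buf, fmt, ap);
    va_end(ap);
    if (n >= (int)sizeof shared_buf)
        n = (int)sizeof shared_buf - 1;   /* truncated job */
    if (n > 0)
        output_bytes(shared_buf, (size_t)n);
    pthread_mutex_unlock(&shared_lock);
    return n;
}
```

Whether this really saves stack in each task depends on the formatter doing its work in the shared buffer rather than in large local variables, so that needs checking for the formatter in question.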

Nominal Animal · « **Reply #14 on:** August 14, 2022, 03:42:40 am »

Quote from: brucehoult on August 13, 2022, 11:25:40 pm

Quote from: Nominal Animal on August 13, 2022, 09:58:03 pm
For example, round(5⁶×144115188075.85596) = round(2251799813685249 + 3/8) = 2251799813685249, but with double-precision floating point the multiplication itself yields round(2251799813685249.5) = 2251799813685250, which is off by 5/8 when ULP = 1/2, i.e. the error is 1.25 ULP.
Ouch.
Noooo!!

we start from INTEGER

D'oh! I was thinking of something completely different.

Yes, I do agree that whenever the result is known to be an integer, there is no danger of double-tie-rounding error, because the only rounding occurs when multiplying by the power of ten, and we can expect the implementation to abide by IEEE 754 rules, so the result will have at most 0.5 ULP of error.

Similarly, when one reads e.g. 1.2345e-6 = 12345e-10 = 0.0000012345, one can convert it to floating point by dividing 12345 by 10⁶⁺⁴=10¹⁰, as long as both can be exactly represented by the floating point type. This limits the divisor to between 10⁰=1 and 10²², inclusive, as you mentioned above. (For negative power of ten divisors, just multiply by a positive power of ten, also between 10⁰ and 10²², inclusive.)

Put simply, you can multiply or divide integers with up to 15 decimal digits with a power of ten between 10⁰=1 and 10²², using double-precision floating point numbers, and the result will be the correct double-precision floating point representation.

For single precision, the limits are smaller: only up to 7 decimal digits and a power of ten between 10⁰=1 and 10¹⁰, inclusive.

(For example, to convert e.g. "0.0000000ddddddddddddddd" to double-precision floating point, you parse ddddddddddddddd as an integer, then divide that by 10²². To convert e.g. ddddddd0000000000.0 to single-precision floating point, you can parse ddddddd as an integer (you can use a single-precision floating point for it), then multiply that by 10¹⁰. An opportunistic parser could start by assuming this is the case, but fail immediately when either limit has been exceeded. Then, the actual parser can call the opportunistic parser that either yields the correct answer or fails (but usually yields the correct answer); and when that fails, use a slower parsing method.)
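The opportunistic fast path described above might be sketched roughly as follows. The function name and calling convention are invented for illustration, and the sketch assumes IEEE 754 doubles evaluated without excess precision (for example SSE2 or a Cortex-M FPU, not x87 extended precision):

```c
#include <stdbool.h>
#include <stdint.h>

/* Powers of ten that are exactly representable in a double: 10^0 .. 10^22. */
static const double exact_pow10[23] = {
    1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
    1e8,  1e9,  1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
    1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22
};

/* 'digits' is the decimal significand parsed as an integer and 'exp10'
   the decimal exponent, so the value is digits * 10^exp10.
   Returns false when the easy case does not apply and the caller must
   fall back to a slower, fully general conversion. */
bool fast_decimal_to_double(uint64_t digits, int exp10, double *out) {
    if (digits >= (UINT64_C(1) << 53)) return false;  /* not exact as a double */
    if (exp10 < -22 || exp10 > 22) return false;      /* power of ten not exact */
    double d = (double)digits;                        /* exact conversion */
    /* Both operands are exact, so the single multiply or divide is the
       only rounding step and the result has at most 0.5 ULP of error. */
    *out = (exp10 >= 0) ? d * exact_pow10[exp10]
                        : d / exact_pow10[-exp10];
    return true;
}
```

A parser would call this first and only use the slow method when it returns false.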

peter-h · « **Reply #15 on:** August 14, 2022, 08:10:03 am »

Quote

You could easily have a printf which uses a single block of memory¹ and which uses mutexes to prevent multiple threads using it at the same time. Printf shouldn't take long so whether or not that potential small pause would be a problem depends on the app. However, since you're waiting on the output device regardless it's not going to make a real difference if you wait at the start of printf or the output of it.

[Perhaps that other printf you had, with the mutex stuff, was intended to work like this.]

AFAICT that 1990 printf lib uses the heap to get some RAM and then uses mutexes to prevent being used multithreaded. Well, malloc and free themselves need mutexes anyway. So they used recursive mutexes. Unfortunately these need to be initialised somewhere and none of the various bits of code I found online was doing the right thing. If I wrote a printf lib and published it, I would include the source to a printf_mutex_init function

The problem with the heap is that you will get fragmentation, and an eventual crash is guaranteed (the only unknown is when it happens), unless you mutex the whole heap access with a single mutex (so every malloc must be followed by a free), but programmers will hate you for that.

That 1990 code was written for a "big machine" on which memory leaks of the order of a few k will remain for ever undiscovered so nobody cares.

Sure, there is a load of ways around this, but a heap is the last thing I would use. At best, it makes a slow printf even slower. Doing a mutex, malloc, mutex for the malloc, and then all that in reverse, for every printf invocation, is crazy.

Also if you mutex a printf, then you don't need to use the heap. You can just allocate memory statically. On the target I am working on, a change in the ST-supplied USB code from using the heap to using statics dramatically improved the system reliability (zero crashes now). And a free() is replaced with nothing because statics don't need to be freed

The heap is just an old way of doing stuff... And the amount of code executed in a printf probably falls by 1/3 or 1/2.

I've just been running my target, with all tasks running including a load of code with sscanfs, on a debugger, with a breakpoint on _sbrk, and the breakpoint was never hit. So if the sscanf is using the heap, it may be using it only for complicated cases. However, as posted above, there should be no reason for lots of RAM for reading decimal numbers. Anyway, a good lesson and a good case for a DIY approach using atoi() rather than sscanf with %d, which is the lazy way.

Interesting discussion above. I didn't know IEEE floats have such a horrendous spec. I'd like to find the source for the ST-supplied sscanf so I could double-check it isn't doing something stupid, but have no idea where to start.

PlainName · « **Reply #16 on:** August 14, 2022, 10:58:55 am »

Quote

The problem with the heap is that you will get fragmentation, and an eventual crash which is guaranteed

It can do, but in the simple case for printf you don't have to do it like that. There is no real difference in allocating a static 1K array, and dynamically creating the same array at run time. The problems start appearing if you free it, but since you won't then there are no problems

You could just allocate the static array, of course. But by dynamically creating it you can use up whatever memory is left over after everything else has been set up. Just an example, and if you went this route I would suggest using the static array to get going and then mess with dynamic memory later.

Quote

Doing a mutex, malloc, mutex for the malloc, and then all that in reverse, for every printf invocation, is crazy

Only for us. Computers are pretty good at repetitive tasks

The effort of getting a mutex and releasing it is nothing compared to what printf is doing under the bonnet. And real good value to save 14K.

Quote

Also if you mutex a printf, then you don't need to use the heap. You can just allocate memory statically.

Yes, that's what I was suggesting. The dynamic stuff is icing you can implement later if you want (for instance, if there is other stuff that benefits from large memory but can use small allocations if necessary).

peter-h · « **Reply #17 on:** August 14, 2022, 11:30:55 am »

Quote

There is no real difference in allocating a static 1K array, and dynamically creating the same array at run time. The problems start appearing if you free it, but since you won't then there are no problems

Obviously if you malloc but never free, then all is good. Even I think that is ok for embedded work

But people will argue there is no point then in having a heap... Especially as with all the other methods (static, or on stack as a variable inside a function) you know how much you are grabbing and where it ends up, whereas with the heap you don't know who has got what and how much, until you run out - unless you use some heap analysis tool, or you prefill the heap area with 0xA5 or whatever and look at it after running it for a while. Any analysis tool will be totally specific to the particular heap code, and I don't even know where the ST-supplied libc.a heap code came from. I am pretty sure most people never bother.
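The prefill trick mentioned above can be sketched as below. It assumes the heap region grows upward from its start address (as with a typical `_sbrk`), and the function names are made up. Live data that happens to equal the fill byte at the top of the used area would make it under-report slightly:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILL_BYTE 0xA5u

/* Fill a memory region with a known pattern before it is put to use. */
void region_prefill(uint8_t *start, size_t len) {
    memset(start, FILL_BYTE, len);
}

/* Scan down from the end of the region and return the offset just past
   the highest byte that no longer holds the fill pattern, i.e. an
   estimate of how much of the region has ever been written. */
size_t region_high_water(const uint8_t *start, size_t len) {
    size_t used = len;
    while (used > 0 && start[used - 1] == FILL_BYTE)
        used--;
    return used;
}
```

On a target, `region_prefill` would run once at startup over the heap area given by the linker script, and `region_high_water` would be read out later from a debugger or a diagnostics command.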

It's staggering how much of 2022 I have spent going down rabbit holes to make this project solid. The sh*t I found in the ST-supplied libc.a - like, a non thread safe printf doing mallocs for long and float output, with both printf itself and the mallocs being mutex-protected, but the mutex calls being empty stubs which just returned straight back, all designed so that nobody finds out that it is all actually junk. And to top it all off nicely, libc.a containing non weak functions so the crap code can't be replaced (solved by using objcopy to weaken the whole lib, as posted previously).

PlainName · « **Reply #18 on:** August 14, 2022, 02:47:14 pm »

Quote

The sh*t I found in the ST-supplied libc.a

Not really what one expects from the chip vendor. There is good stuff out there, but often I find third-party sources to be like help pages for writing one's own code rather than something to be used verbatim.

SiliconWizard · « **Reply #19 on:** August 14, 2022, 06:42:27 pm »

True for vendor code. But the standard C library most often comes from newlib for MCUs, it's not the vendor that wrote it.
Besides, if you're not happy with the compiler (and its associated libraries) that is provided by the vendor, you can always use another. That's the beauty of using ARM Cortex-based MCUs. You can use GCC configured in any way you like, compiled with newlib or possible replacements, or even use LLVM if you're so inclined. But from what I've seen, most often vendor tools ship with the "official" GCC binaries provided by ARM, which is not unreasonable: https://developer.arm.com/downloads/-/arm-gnu-toolchain-downloads

peter-h · « **Reply #20 on:** August 14, 2022, 09:24:03 pm »

I am sure it is "newlib"-something but that doesn't quite connect with the screenshots here
https://www.eevblog.com/forum/microcontrollers/32f4-hard-fault-trap-how-to-track-this-down/msg4317418/#msg4317418
which say "Standard C / C++" and currently I have newlib-nano not checked.

I made a local copy of that libc.a so that selector doesn't do anything anyway.

The printf is history now because I replaced it, but the scanf is original. It doesn't use the heap in my tests but I can't test every one of the % formats.

ledtester · « **Reply #21 on:** August 14, 2022, 09:30:02 pm »

Quote from: peter-h on July 28, 2022, 07:19:15 am

But this code from 1990, which appears to be a full implementation, uses not only mutexes to make itself thread-safe but also uses the heap
https://sourceware.org/git?p=newlib-cygwin.git;a=blob;f=newlib/libc/stdio/vfprintf.c;h=6a198e2c657e8cf44b720c8bec76b1121921a42d;hb=HEAD#l866

From the copyright it appears this code is part of or was derived from BSD Unix which indicates it was meant to run on a multi-user machine with ample memory.

The modus operandi of the newlib implementation is to accumulate all of the formatted output into one buffer. When calling fprintf() this buffer is maintained as part of the FILE* structure.

The calls to _malloc_r are used in the following ways:

1. to create/enlarge the output buffer associated with a file handle (FILE*).
2. to create a workspace for the C99 'A' double conversion if the precision is greater than a certain limit
3. to create a workspace for the 'S' wide character string conversion

Otherwise conversions are performed in a stack allocated buffer before being transferred to the output buffer.

The lwprintf library just prints 'NaN' for the 'A' double conversion:

https://github.com/MaJerle/lwprintf/blob/develop/lwprintf/src/lwprintf/lwprintf.c#L993

The '%S' format seems to be deprecated -- even the Linux man page says not to use it.

https://man7.org/linux/man-pages/man3/printf.3.html

(search for "Synonym for ls")

peter-h · « **Reply #22 on:** August 14, 2022, 09:43:02 pm »

Thank you.

Does any of this change if one is not using the file versions (fprintf) but uses just the sprintf and snprintf ones?

This project does use floats but will never need doubles to be output as doubles.

I wonder if someone can dig out the likely sscanf source? It could well be from the same era (1990??). But isn't C99 well past 1990? The copyright says 1990 but then there are symbols like _WANT_IO_C99_FORMATS.

ledtester · « **Reply #23 on:** August 14, 2022, 09:59:19 pm »

The sprintf code appears to be here:

https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/stdio/sprintf.c;h=be66ec6f5d73589e5147afeb7634b9de0daba09d;hb=HEAD

and it looks like it creates a dummy file handle before calling something which does all the work -- see lines 584 and then 591.

tellurium · « **Reply #24 on:** August 14, 2022, 10:49:43 pm »

Maybe what I write below will be useful for somebody.

My company's product, a networking library, implements its own printf (a subset of ISO C). It is a fairly compact one:

https://github.com/cesanta/mongoose/blob/master/src/fmt.c
Doc: https://mongoose.ws/documentation/#mg_snprintf-mg_vsnprintf

In addition to some standard specifiers like %d, %u, %s, %g, %x, it implements some non-standard specifiers, like %q/%Q (for escaped quoted strings), %H (for hex-encoded), %V (for base64-encoded), and %M (where one can pass a custom printing function with arbitrary args).
This gives an ability to produce JSON strings easily - useful for communication.

That printf is fully retargetable and can print to anything by specifying a custom "putchar" function.
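To illustrate what "retargetable via a custom putchar function" means in general, here is a small sketch. It is not the mongoose API; all names (`out_fn`, `emit_uint`, `membuf`) are invented, and it only handles unsigned decimal output:

```c
#include <stddef.h>

/* Output sink: called once per character; 'arg' is caller-defined state. */
typedef void (*out_fn)(char c, void *arg);

/* Print an unsigned value in decimal through the sink. No heap is used,
   only a small local buffer. Returns the number of characters emitted. */
size_t emit_uint(out_fn out, void *arg, unsigned long v) {
    char tmp[3 * sizeof v];   /* enough digits for any unsigned long */
    size_t n = 0, len;
    do {                      /* digits come out least significant first */
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    len = n;
    while (n > 0)             /* emit them in the right order */
        out(tmp[--n], arg);
    return len;
}

/* Example sink: append to a bounded, NUL-terminated memory buffer. */
struct membuf { char *buf; size_t cap, len; };

void membuf_putc(char c, void *arg) {
    struct membuf *m = arg;
    if (m->len + 1 < m->cap) {
        m->buf[m->len++] = c;
        m->buf[m->len] = '\0';
    }
}
```

The same `emit_uint` could write to a UART, a socket or a file simply by passing a different sink function.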

DiTBho · « **Reply #25 on:** August 14, 2022, 10:59:44 pm »

why printfs? why don't people get rid of it once and for all?

IanB · « **Reply #26 on:** August 14, 2022, 11:06:27 pm »

Quote from: brucehoult on August 12, 2022, 10:38:37 pm

printf() doesn't need to use the heap.

Some printf()s do anyway, but they don't *need* to.

Steele & White's 1990 paper How to Print Floating-Point Numbers Accurately says that you need arbitrary-sized bignums to do the job properly. I proved in 2006 that you don't need to, that in fact the size is bounded by, IIRC, something like 144 bytes, and you need several variables this size. I had correspondence with a local university HoD who put me in touch with Steele, who agreed with my analysis. I expect I still have the correspondence somewhere. However my boss forbade me to publish, so the only implementation was on our Java-to-ARM-native (on BREW) compiler & runtime library "AlcheMo".

I don't understand this. Since an IEEE float has a fixed size and precision, it follows that you will only need a fixed maximum amount of temporary storage to convert it to decimal format. There must surely be some other point to the paper you reference, because on the surface it defies common sense?

brucehoult · « **Reply #27 on:** August 14, 2022, 11:29:07 pm »

Quote from: IanB on August 14, 2022, 11:06:27 pm

Quote from: brucehoult on August 12, 2022, 10:38:37 pm
printf() doesn't need to use the heap.

Some printf()s do anyway, but they don't *need* to.

Steele & White's 1990 paper How to Print Floating-Point Numbers Accurately says that you need arbitrary-sized bignums to do the job properly. I proved in 2006 that you don't need to, that in fact the size is bounded by, IIRC, something like 144 bytes, and you need several variables this size. I had correspondence with a local university HoD who put me in touch with Steele, who agreed with my analysis. I expect I still have the correspondence somewhere. However my boss forbade me to publish, so the only implementation was on our Java-to-ARM-native (on BREW) compiler & runtime library "AlcheMo".

I don't understand this. Since an IEEE float has a fixed size and precision, it follows that you will only need a fixed maximum amount of temporary storage to convert it to decimal format. There must surely be some other point to the paper you reference, because on the surface it defies common sense?

Steele & White correctly show that to do the job properly (i.e. accurately in every case), you need to use multi-word arithmetic, and not just 2 or 3 words but numbers hundreds or thousands of bits long. They gloss over whether there is an upper bound or what it is, and everyone who implements their algorithm uses general heap-based unlimited precision bignum libraries.

Like you, I reasoned that IEEE FP is a finite format and so the size of the bignums needed must be bounded, and I set out to find and prove the limit, and implement their algorithm in a fixed amount of RAM. And that size, for IEEE doubles, is 36 words on a 32 bit machine or 18 words on a 64 bit machine. Or 142 bytes on an 8 bit machine. Per variable, and several variables are needed. I didn't try to minimize the number of variables (especially as speed is important too), but the total storage is less than 1 KB, which was fine for my application on ARM-based mid 2000's mobile phones with minimum 400 KB RAM and more usually 1 or 2 MB. The big gain came from not needing dynamic storage.

If you want to do perfect double precision floating point printing on an ATMega328 then you might want to do more work. But usually people are only using single precision variables on machines like that, which require much less space anyway (22 bytes per bignum variable).
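The storage side of that argument can be sketched with a fixed-size bignum that never touches the heap. This is not the unpublished implementation referred to above; it only takes the 36-word bound from the post and shows two primitive operations plus decimal output, with invented names:

```c
#include <stddef.h>
#include <stdint.h>

#define BN_WORDS 36   /* per-variable bound quoted in the post (32-bit words) */

typedef struct { uint32_t w[BN_WORDS]; } bignum;   /* least significant word first */

void bn_set(bignum *b, uint32_t v) {
    for (size_t i = 0; i < BN_WORDS; i++) b->w[i] = 0;
    b->w[0] = v;
}

/* b *= m; returns the carry out of the top word (non-zero means overflow). */
uint32_t bn_mul_small(bignum *b, uint32_t m) {
    uint64_t carry = 0;
    for (size_t i = 0; i < BN_WORDS; i++) {
        uint64_t t = (uint64_t)b->w[i] * m + carry;
        b->w[i] = (uint32_t)t;
        carry = t >> 32;
    }
    return (uint32_t)carry;
}

/* b /= d; returns the remainder. d must be non-zero. */
uint32_t bn_divmod_small(bignum *b, uint32_t d) {
    uint64_t rem = 0;
    for (size_t i = BN_WORDS; i-- > 0; ) {
        uint64_t t = (rem << 32) | b->w[i];
        b->w[i] = (uint32_t)(t / d);
        rem = t % d;
    }
    return (uint32_t)rem;
}

int bn_is_zero(const bignum *b) {
    for (size_t i = 0; i < BN_WORDS; i++)
        if (b->w[i] != 0) return 0;
    return 1;
}

/* Write b in decimal into out (NUL-terminated) and return the length.
   Destroys b. If cap is too small, only the low-order digits are kept. */
size_t bn_to_decimal(bignum *b, char *out, size_t cap) {
    size_t n = 0;
    if (cap < 2) { if (cap) out[0] = '\0'; return 0; }
    do {
        out[n++] = (char)('0' + bn_divmod_small(b, 10));
    } while (!bn_is_zero(b) && n + 1 < cap);
    for (size_t i = 0, j = n - 1; i < j; i++, j--) {   /* reverse in place */
        char t = out[i]; out[i] = out[j]; out[j] = t;
    }
    out[n] = '\0';
    return n;
}
```

Each `bignum` is 144 bytes, so a handful of them as locals or statics stays well under 1 KB.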

SiliconWizard · « **Reply #28 on:** August 14, 2022, 11:41:08 pm »

Oh yes.

And then, I'm not sure it's worth all the trouble, especially on a very small target. Usually if you need to display/input numbers accurate to a given digit, using integers will be much easier and will make more sense.

Sure in some instances you *absolutely* have to use FP numbers, but it appears that in many cases, people use FP just because this is convenient or they just don't know how to do arithmetic with integers especially when "decimals" would be involved. Just my 2 cents. Learn proper arithmetics and suddenly a lot of things become much easier and you'll be a lot less dependent on the code of others.
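As a small illustration of the integer approach, a value kept as a count of thousandths (say millivolts) can be shown with a decimal point without any floating point. The function name is hypothetical, and `snprintf` is used here only with integer formats:

```c
#include <stdint.h>
#include <stdio.h>

/* Format a value held as an integer number of thousandths as
   "[-]units.mmm". Returns what snprintf returns. */
int format_milli(char *buf, size_t cap, int32_t milli) {
    /* Take the magnitude in unsigned arithmetic so INT32_MIN works too. */
    uint32_t mag = (milli < 0) ? (uint32_t)0 - (uint32_t)milli
                               : (uint32_t)milli;
    return snprintf(buf, cap, "%s%lu.%03lu",
                    (milli < 0) ? "-" : "",
                    (unsigned long)(mag / 1000u),
                    (unsigned long)(mag % 1000u));
}
```

The displayed digits are exact by construction, so the question of correctly rounded float printing never arises.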

brucehoult · « **Reply #29 on:** August 14, 2022, 11:52:11 pm »

Quote from: SiliconWizard on August 14, 2022, 11:41:08 pm

Sure in some instances you *absolutely* have to use FP numbers, but it appears that in many cases, people use FP just because this is convenient or they just don't know how to do arithmetic with integers especially when "decimals" would be involved. Just my 2 cents. Learn proper arithmetics and suddenly a lot of things become much easier and you'll be a lot less dependent on the code of others.

It's easier to get things such as PID algorithms stable using FP. The soft float library in Arduino on 16 MHz AVR takes about 5 µs per add/sub/mul. So you can do around 200k of them per second. No problems to do 10 or so operations for a PID even if it's every ms (which it's usually not, as physical systems don't react so quickly anyway).

Of course that FP library is designed for speed not IEEE compliance and probably has a few ULP of error, so worrying about completely precise printing is pointless anyway.

peter-h · « **Reply #30 on:** August 15, 2022, 06:17:49 am »

Quote

The sprintf code appears to be here:
https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/stdio/sprintf.c;h=be66ec6f5d73589e5147afeb7634b9de0daba09d;hb=HEAD
and it looks like it creates a dummy file handle before calling something which does all the work -- see lines 584 and then 591.

That calls _svfprintf_r(). Where could I find that?

Quote

It's easier to get things such as PID algorithms stable using FP.

Yes exactly; there are good reasons for using floats. Especially nowadays when ARM32 does a float mult etc in 1 clock cycle. On a Z80 you would be looking at 1-2ms which is only 100000 times slower

If you were happy with non-IEEE floats then you could improve this by a factor of 10. But double floats are very rarely needed because the range and accuracy exceeds (almost) anything in the physical world by a massive margin, and printing them even less so.

One can do "almost anything physical" using int32, and that's what Apollo used, evidently successfully, but you need to be very clever, know the full range of all variables, etc.

Quote

The soft float library in Arduino on 16 MHz AVR takes about 5 µs per add/sub/mul.

Must be using hardware for that.

brucehoult · « **Reply #31 on:** August 15, 2022, 07:35:05 am »

Quote from: peter-h on August 15, 2022, 06:17:49 am

Quote
The soft float library in Arduino on 16 MHz AVR takes about 5 µs per add/sub/mul.

Must be using hardware for that.

No, that's soft float. 5 µs at 16 MHz is 80 clock cycles, so up to 80 instructions, depending on the instructions.

There's 8x8->16 multiply in 2 cycles. 24x24 mul needs 9 of those and a few more than that adds. Let's say 35 cycles. TBH, you could probably just not do some of the low order muls and adds if you don't care about a few ULP of error.
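The nine partial products can be seen in a plain C sketch like the one below. It is not the Arduino library code; it only shows the decomposition, and the 64-bit accumulator is a convenience that real AVR code would replace with byte-wise add-with-carry:

```c
#include <stdint.h>

/* Multiply two 24-bit significands using only 8x8->16 products.
   Returns the full 48-bit product. */
uint64_t mul24x24(uint32_t a, uint32_t b) {
    uint8_t al[3] = { (uint8_t)a, (uint8_t)(a >> 8), (uint8_t)(a >> 16) };
    uint8_t bl[3] = { (uint8_t)b, (uint8_t)(b >> 8), (uint8_t)(b >> 16) };
    uint64_t acc = 0;
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            /* one 8x8->16 multiply; 3 x 3 = 9 of them in total */
            uint16_t p = (uint16_t)(al[i] * bl[j]);
            acc += (uint64_t)p << (8 * (i + j));
        }
    }
    return acc;
}
```

Dropping the products with the smallest `i + j` is the "skip some low order muls" shortcut mentioned above, at the cost of a few ULP.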

5 µs probably isn't the worst case for add and subtract, it's just what I've measured in practice on typical data. I'm sure it's a bit longer if subtract (or add) cancels a lot of bits and it needs a lot of shifting to re-normalize. But when that happens you've usually got other problems in your algorithm... Multiply shouldn't vary by much.

DiTBho · « **Reply #32 on:** August 15, 2022, 09:37:51 am »

fp64_t Softfloat on a 8bit MPU? Um, it must be the best idea ever (sarc)

brucehoult · « **Reply #33 on:** August 15, 2022, 09:48:17 am »

Quote from: DiTBho on August 15, 2022, 09:37:51 am

fp64_t Softfloat on a 8bit MPU? Um, it must be the best idea ever (sarc)

Why not?

Apple's SANE library implemented fully compliant 32 bit, 64 bit, and 80 bit FP on the 8 bit 6502 in the 1980s, API and bit for bit compatible with the Macintosh implementation.

https://en.wikipedia.org/wiki/Standard_Apple_Numerics_Environment

DiTBho · « **Reply #34 on:** August 15, 2022, 09:57:41 am »

Quote from: peter-h on August 15, 2022, 06:17:49 am

double floats are very rarely needed because the range and accuracy exceeds (almost) anything in the physical world by a massive margin

Except that in matrix LUP decomposition for solving systems of linear equations (A x = b), you keep accumulating errors on every single MAC and MSC iteration if the A matrix is intrinsically ill-conditioned (1), so things can literally explode into snow-crash-flakes if you don't adopt countermeasures:

fp64_t helps a lot when the A matrix is pathologically intrinsically ill-conditioned (weak diagonal)
full-pivoting helps and does the rest of the job

Only by combining these two can you save your day, and full-pivoting on fp64_t is better than full-pivoting on fp32_t.

(1) which is a property of your system of linear equations; you can't do anything about it, except "trying to swap rows and columns" to see if the calculation looks better. If you touch the equations, if you voodoo black magic (modify) your numbers, you are solving a different system.

DiTBho · « **Reply #35 on:** August 15, 2022, 10:01:16 am »

Quote from: brucehoult on August 15, 2022, 09:48:17 am

Why not?

because it adds more complexity than needed!
Fixedpoint should be simpler, and good enough.

brucehoult · « **Reply #36 on:** August 15, 2022, 10:12:16 am »

Quote from: DiTBho on August 15, 2022, 10:01:16 am

Quote from: brucehoult on August 15, 2022, 09:48:17 am
Why not?

because it adds more complexity than needed!

Than needed for what?

Quote

Fixedpoint should be simpler, and good enough.

Good enough for what?

I don't know any reason why people with slow computers should be banned from doing high precision calculations.

Even a 6502 or z80 is massively faster than an HP41CV or HP48 or whatever they made after those.

DiTBho · « **Reply #37 on:** August 15, 2022, 10:25:20 am »

Quote from: brucehoult on August 15, 2022, 09:48:17 am

FP on the 8 bit 6502 in the 1980s

Just checked my manual, intel designed 8051 in 1981 and released my BASIC51 ROM with softfloat fp32_t in 1986.

__________1986___________

Probably even Bill Gates contributed something, maybe to the BASIC core, maybe something more.

Yup, it's a very old technology, but the truth is: my Japanese SH3 pocket calculator doesn't use any floating point for calculations, instead it uses fixedpoint, thanks to a DSP fixedpoint engine, embedded with the CPU, it supports systems of up to 8 linear equations, and results are really good.

Which brings me to the question: does fixedpoint intrinsically solve the LUP problem with pathologically intrinsically ill-conditioned matrices? I have already tried and results are not bad, even better than results with fp64_t.

I have to investigate more, but if this is true, I will get rid of fp64_t entirely for my algorithms.

DiTBho · « **Reply #38 on:** August 15, 2022, 10:31:28 am »

Quote from: brucehoult on August 15, 2022, 10:12:16 am

Than needed for what?

Simple implementations, things you can read and understand.
University courses, library for Softcores, things to be used for hobby, university and similar.

Things where less and neat code is largely better.

I mean, just to print-out floating-point, you have to voodoo black-magic

That's no good.

brucehoult · « **Reply #39 on:** August 15, 2022, 11:00:48 am »

Quote from: DiTBho on August 15, 2022, 10:31:28 am

Quote from: brucehoult on August 15, 2022, 10:12:16 am
Than needed for what?

Simple implementations, things you can read and understand.
University courses, library for Softcores, things to be used for hobby, university and similar.

Things where less and neat code is largely better.

I can make any program as simple and fast as you want, if it doesn't have to be correct.

If you don't mind doing (2.0/7)*7 and getting back 1.999998 instead of 2.000000 then you don't have to worry about it.

Quote

I mean, just to print-out floating-point, you have to voodoo black-magic
That's no good.

It's not voodoo black-magic. It's just converting whatever floating point number you have (decimal or binary) to an exactly equivalent fraction using two integers in the source number base, converting both integers to the destination number base, then doing the long division.

e.g.

123.456 = 123456/1000 = 0b11110001001000000/0b1111101000 = 0b1111011.01110100101111001... = 0b1.11101101110100101111001 x 2^6

So the Float is 0 10000101 11101101110100101111001 i.e. 1 * 2^(133-127) * (1 + 0b0.11101101110100101111001)

The intermediate integers can get long, but it's not *complicated*.
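For small inputs the long division fits in 64-bit integers, so the whole procedure can be sketched in a few lines. The function name is made up, and the sketch assumes n >= d > 0 so that the result is a normal single-precision number of at least 1.0:

```c
#include <stdint.h>

/* Convert the exact fraction n/d to IEEE 754 single-precision bits,
   rounding to nearest with ties to even. Assumes n >= d > 0. */
uint32_t ratio_to_float_bits(uint32_t n, uint32_t d) {
    int e = 0;
    while (((uint64_t)d << (e + 1)) <= n)   /* find e with 2^e <= n/d < 2^(e+1) */
        e++;
    uint64_t num = n, den = d;
    if (e <= 23) num <<= (23 - e);          /* scale so the quotient has 24 bits */
    else         den <<= (e - 23);
    uint64_t q = num / den;                 /* the long division */
    uint64_t r = num % den;
    if (2 * r > den || (2 * r == den && (q & 1)))
        q++;                                /* round to nearest, ties to even */
    if (q == (UINT64_C(1) << 24)) {         /* rounding carried into a 25th bit */
        q >>= 1;
        e++;
    }
    /* biased exponent, then the 23 stored bits (implicit leading 1 dropped) */
    return ((uint32_t)(e + 127) << 23) | ((uint32_t)q & 0x7FFFFFu);
}
```

For 123456/1000 this finds e = 6, a 24-bit quotient of 16181625 after rounding, and the bit pattern 0x42F6E979.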

Nominal Animal · « **Reply #40 on:** August 15, 2022, 11:26:03 am »

Quote from: DiTBho on August 15, 2022, 10:25:20 am

Which brings me to the question: does fixedpoint intrinsically solve the LUP problem with pathologically intrinsically ill-conditioned matrices?

The difference between fixed and floating point types is the range. If you know the range of your values, then a suitable fixed point type can have more precision.

The numerical stability properties in cases like matrix LU decomposition, are much harder to quantify. Some features, like the absolute precision make it easier to manage (consider machine epsilon, for example), but I do not believe it intrinsically solves the issues with ill-conditioned matrices... with fixed point, the detection of and boundary in the parameter space for ill-conditioned matrices is simpler, so it can definitely feel like it solves such problems.

Note that in many linear algebra use cases, one uses mostly additions, subtractions, and multiplications. If you can use two's complement format, the addition and subtraction become extremely fast – you don't even need to know the location of the decimal point, if you use modular arithmetic for additions and subtractions –, and you can keep your accumulators/sums at "double" precision; that is, you can implement fused multiply-and-add-double-precision type to accumulate the sum of products at double the normal precision, and only afterwards rescale it back to the original fixed point format.
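A rough sketch of that accumulate-wide-then-rescale idea, assuming a Q16.16 format (the type and function names are invented). The caller is assumed to keep the sum within the 64-bit accumulator and the final result within Q16.16 range:

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t q16_16;   /* fixed point: 16 integer bits, 16 fraction bits */

/* Dot product of two Q16.16 vectors. Products are accumulated exactly
   in a 64-bit (Q32.32) accumulator; rounding happens once, at the end. */
q16_16 dot_q16(const q16_16 *a, const q16_16 *b, size_t n) {
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int64_t)a[i] * b[i];   /* exact 32x32->64 product */
    acc += INT64_C(1) << 15;           /* add half an LSB for round-to-nearest */
    /* Floor-divide by 2^16 without relying on >> of a negative value,
       which is implementation-defined in C. */
    int64_t q = acc / 65536;
    if (acc % 65536 < 0)
        q--;
    return (q16_16)q;
}
```

Rounding each product back to Q16.16 before adding would instead introduce one rounding error per term.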

For multiplication, you probably want to take a look at the Karatsuba algorithm (which replaces one of the partial multiplications with a few additions and subtractions). Note that if you work on only positive values (pre-negate and post-negate if necessary), the MSB will always be zero, therefore the sum of two-limb numbers' high and low parts (as used in Karatsuba to replace that multiplication) can never overflow. A nifty trick, although not worth it if you have a fast 32×32=64 (or low and high 32 bit results) machine multiplication operation.

brucehoult · « **Reply #41 on:** August 15, 2022, 11:55:03 am »

Quote from: brucehoult on August 15, 2022, 11:00:48 am

123.456 = 123456/1000 = 0b11110001001000000/0b1111101000 = 0b1111011.01110100101111001... = 0b1.11101101110100101111001 x 2^6

So the Float is 0 10000101 11101101110100101111001 i.e. 1 * 2^(133-127) * (1 + 0b0.11101101110100101111001)

And, just to illustrate the reverse direction...

0x42f6e979 = 0100 0010 1111 0110 1110 1001 0111 1001 = 0 10000101 11101101110100101111001 = 0b1111011.01110100101111001 = 0b111101101110100101111001/0b100000000000000000 = 16181625/131072.

Add and subtract 0.5 ULP to that:

32363251/262144 = 123.456005096
32363250/262144 = 123.456001282
32363249/262144 = 123.455997467

So the correct result is 123.45600.

Any apparent complications are just to make this process faster. You can write simple code for it that works, just a bit more slowly.
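The first step of the reverse direction, turning the bit pattern into an exact fraction, might look like this in C. The struct and function names are invented, and the sketch only handles positive normal single-precision values on a platform with IEEE 754 floats:

```c
#include <stdint.h>
#include <string.h>

/* Exact value of a float as num * 2^exp2 (exp2 is usually negative). */
typedef struct { uint32_t num; int exp2; } float_frac;

/* Assumes f is a positive, normal IEEE 754 single. */
float_frac float_to_fraction(float f) {
    uint32_t bits;
    float_frac r;
    memcpy(&bits, &f, sizeof bits);                   /* well-defined type pun */
    r.num  = (bits & 0x7FFFFFu) | 0x800000u;          /* restore the implicit 1 */
    r.exp2 = (int)((bits >> 23) & 0xFFu) - 127 - 23;  /* unbias, then scale by 2^-23 */
    return r;
}
```

For 0x42f6e979 this gives num = 16181625 and exp2 = -17, i.e. 16181625/131072. The half-ULP neighbours are then (2·num ± 1) · 2^(exp2 − 1), which is where 32363251/262144 and 32363249/262144 come from.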

peter-h · « **Reply #42 on:** August 15, 2022, 12:31:36 pm »

Q1: Does scanf come into this at all? Earlier it was mentioned that IEEE requires it to create an exact mirror image of printf.

Q2: I wonder where this precision is relevant. It doesn't relate to anything in the physical world, except to make some output look pretty.

#2 can be achieved with very simple code. Many years ago I built a custom product for a customer in the film business. It passed 35mm film over some sprockets, and on one of these was a shaft encoder. The quadrature signals went to a Z80 CTC which had a special circuit which was just right for this job. Then my box had a bipolar DAC for driving a servo motor for winding the film to a specified position, which could be frames, feet (an exact multiple of frames), or metres! The last one of course produced a nonexact result, but it was unavoidable. One day the customer turned up and said he wants to get back to the same position, if going forwards in feet and backwards in metres. I spent many days doing that (Z80 asm, int32 maths)

But this was not needed; it was just for visual presentation at exhibitions. One could do the same by faking the value displayed if the abs(error) is < some amount.

Nominal Animal · « **Reply #43 on:** August 15, 2022, 04:03:57 pm »

Quote from: peter-h on August 15, 2022, 12:31:36 pm

Q1: Does scanf come into this at all? Earlier it was mentioned that IEEE requires it to create an exact mirror image of printf.

scanf() isn't as good (as reliable, or as able to report errors) as strtod()/strtof() are.
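A minimal sketch of the kind of error reporting strtod() allows, using only standard C library calls. The wrapper name `parse_double` and its policy of rejecting anything flagged with ERANGE are choices made for this example:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse the whole string as a double. Rejects empty input, trailing
   junk, and values the library reports as out of range. */
bool parse_double(const char *s, double *out) {
    char *end;
    errno = 0;
    double v = strtod(s, &end);
    if (end == s) return false;          /* no conversion performed */
    if (*end != '\0') return false;      /* trailing characters */
    if (errno == ERANGE) return false;   /* overflow or underflow reported */
    *out = v;
    return true;
}
```

With sscanf("%lf") the caller only learns how many fields were converted, not where parsing stopped or whether the value overflowed.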

Some of the really good approaches use a custom version while generating the number, to handle the requirement of no more error than half a unit in the least significant place (0.5 ULP), with correct rounding (including ties).

Quote from: peter-h on August 15, 2022, 12:31:36 pm

Q2: I wonder where this precision is relevant. It doesn't relate to anything in the physical world, except to make some output look pretty.

Repeatability, for example.

Let's say you make a device that takes a number as an input, does something, and pops out a number.
The user uses the device, and knows that when they use X as input, the result should be Y.

Because output does not need to be precise, just pretty, a tiny change in the firmware causes the next revision of the device to produce Z instead of Y when inputting X. Or perhaps they decide to throw away the device and implement it in R/Python/Matlab/Octave/C, and scratch their head when they cannot reproduce the functionality.

peter-h · « **Reply #44 on:** August 15, 2022, 04:10:46 pm »

Isn't strtof etc inside a sscanf %f anyway?

I do normally use these (and atoi etc) but sometimes a sscanf %d %d %d to extract 3 integers is quite handy.

Can anyone find the source of the likely ST libc.a scanf lib? Probably it will have the same 1990 copyright message. I doubt (from debugging) that it uses the heap, but it would be nice to be more sure.

SiliconWizard · « **Reply #45 on:** August 15, 2022, 06:18:53 pm »

Quote from: brucehoult on August 14, 2022, 11:52:11 pm

Quote from: SiliconWizard on August 14, 2022, 11:41:08 pm
Sure in some instances you *absolutely* have to use FP numbers, but it appears that in many cases, people use FP just because this is convenient or they just don't know how to do arithmetic with integers especially when "decimals" would be involved. Just my 2 cents. Learn proper arithmetics and suddenly a lot of things become much easier and you'll be a lot less dependent on the code of others.

It's easier to get things such as PID algorithms stable using FP.(...)

Sorta. But that's the case I mentioned ("in some instances you *absolutely* have to use FP numbers"). Sometimes sure integer/fixed-point is not well adapted or would take a lot more dev time while requiring a lot of extra bits due to the limited dynamic range.

But that's my point. Those use cases are almost orthogonal to the need to accurately display and input floating point numbers per se.
Unless maybe you are specifically designing a calculator, in which case using IEEE FP may not be the best idea ever anyway.

(And yes, other than that, I've seen many, many devs just using FP not because they actually needed it, but because they could not figure out how to do simple integer/fixed-point arithmetics at all.)

peter-h · « **Reply #46 on:** August 15, 2022, 09:01:21 pm »

For a calculator you probably want what used to be called BCD Reals.

The Z80 has special instructions for that. Historically calculators used 4 bit micros which were fine for BCD maths. Very slow; you could see the time they took

I've done plenty of fixed point and it is fine, and 32 bit has a huge dynamic range, vastly more than nearly everything in the physical world needs, but undeniably takes longer to code. I did it mostly in the old days when floats were much slower. Well written 16x16=32 mult, or 32/16+16 divide, were quick - around 100us. For appropriately scaled variables I used a lot of 32x32=32 multiplies.

Surprisingly some products have used int8 maths. I know of one autopilot for light aircraft which does this internally. It works well enough - until something goes out of range and then you get massive underflow, with a resulting burning out of the servo motors.

Nominal Animal · « **Reply #47 on:** August 16, 2022, 04:15:06 am »

On 32-bit architectures, it is interesting to note that 10⁹ < 2³⁰, and 1,999,999,999 < 2³¹. So, if you have a 32×32=64-bit multiply and 64/32=32,32 division with remainder, you can pack to binary and extract from binary nine decimal digits at a time. x86, x86-64, Risc-V (rv32gc), armv8a, thumb2, and arm64 at least can do this (according to trivial test at Compiler Explorer).

The interesting part of 10⁹ is that it is equal to 2⁹ 5⁹ = 512 × 125 × 125 × 125, and that 125 = 128 - 3.
In other words, multiplying x by 10⁹ can be implemented as
((x * (128-3)) * (128-3) * (128-3)) << 9
where each multiplication by 128-3 is essentially just (x << 7) - (x << 1) - x. In other words, "multiply 30-bit integer by 100 or 10⁹ and add a 30-bit integer" fused-multiply-adds/subs can be implemented very efficiently even on 8-bit microcontrollers without fast hardware multiplication, as long as they implement bit shifts through carry and additions/subtractions through carry.
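A minimal sketch of that shift-and-subtract multiply in C. The function names (mul125, mul_1e9, push_nine_digits) are made up for illustration, and the result is assumed to fit in 64 bits.

```c
#include <stdint.h>

/* x * 125 using only shifts and subtractions: 125 = 128 - 2 - 1. */
static uint64_t mul125(uint64_t x)
{
    return (x << 7) - (x << 1) - x;
}

/* x * 10^9 = x * 125 * 125 * 125 * 512. */
uint64_t mul_1e9(uint64_t x)
{
    return mul125(mul125(mul125(x))) << 9;
}

/* Hypothetical fused step: acc * 10^9 + chunk, where chunk < 10^9
 * holds the next nine decimal digits being packed to binary. */
uint64_t push_nine_digits(uint64_t acc, uint32_t chunk)
{
    return mul_1e9(acc) + chunk;
}
```

For example, push_nine_digits(123, 456789012) yields 123456789012.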

It is similarly interesting to note that 1/5, 1/25, 1/125 etc. are repeating bit patterns,

Code: [Select]

5⁻¹ = 0.0011 0011 ...
5⁻² = 0.00001010001111010111 00001010001111010111 ...
5⁻³ = 0.0000001000001100010010011011101001011110001101010011111101111100111011011001000101101000011100101011 0000001000001100010010011011101001011110001101010011111101111100111011011001000101101000011100101011 ...

with lengths of 4, 20, and 100 bits. In other words, division by 10, 100, or 1000 can be implemented as a multiplication by a repeating reciprocal (of 4, 20, or 100 bit units) and a bit shift right (by 1, 2, or 3 bits, respectively) –– if you don't need the remainder and can deal with a fractional quotient.
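For the 32-bit case this is the familiar compiler trick. A minimal sketch, assuming a 32×32=64-bit multiply is available: the constant 0xCCCCCCCD is the repeating 1100 pattern of 4/5 scaled by 2³² and rounded up, so the total shift is 32 + 3 bits. The names div10 and mod10 are mine.

```c
#include <stdint.h>

/* x / 10 for any 32-bit x, using one multiply and one shift.
 * 0xCCCCCCCD = ceil(0.8 * 2^32), and 0.8 / 8 = 0.1. */
uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* The remainder needs one extra multiply-subtract. */
uint32_t mod10(uint32_t x)
{
    return x - 10u * div10(x);
}
```

Rounding the constant up is what keeps the truncated quotient exact over the whole 32-bit range; wider operands need a longer run of the repeating pattern.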

peter-h · « **Reply #48 on:** August 16, 2022, 06:13:14 am »

That's clever.

Yes; I used to pre-scale integer representations by some power of 2, too.

On an arm32, single float mult is 1 clock whereas div is 16 clocks (IIRC) and this would be a lot worse for doubles, which is probably another good reason to avoid a printf (even if you have the sources to it and know it has not been hacked with dummy mutexes etc) which converts float to double.

Nominal Animal · « **Reply #49 on:** August 16, 2022, 07:05:02 am »

Quote from: peter-h on August 16, 2022, 06:13:14 am

probably another good reason to avoid a printf [...] which converts float to double.

Right; the requirement for C to promote variadic float arguments to double is an important reason why one should use something else instead.

There's a lot of interesting stuff in the "simple" arithmetic here. For example, I once worked out how to determine how often a division calculated using fused multiply-adds and constant reciprocals (Brisebarre-Muller-Raina or Markstein algorithm), given a constant integer divisor, would give an incorrect result (off by one ULP) among all finite floats.

In particular, any single-precision floating point division by an odd integer can be done exactly using the Markstein algorithm, at the cost of one floating-point multiply (by a constant), plus two floating-point fused multiply-adds. (FP fused multiply-adds are available on ARM VFPv3 and NEON, at least.)

That is, if you have
result = float / odd_integer_constant;
you can precalculate C1 = 1.0f / odd_integer_constant and C2 = -(float)odd_integer_constant, and do the calculation as
temp1 = float * C1;
temp2 = fmaf(C2, temp1, float);
result = fmaf(C1, temp2, temp1);
It also works for power of two integer constants; i.e. whenever
(odd_integer_constant & 1) || !((odd_integer_constant) & (odd_integer_constant - 1))
is true. Interestingly, there are algorithms that can obtain the reciprocal much faster than doing a floating-point division, so implementing the subset of divisions by an odd integer or power of two via that route, might be interesting for some workloads.

(There is an IEEE 2018 paper on fast float reciprocal using fused multiply adds, doi.org/10.1109/IDAACS-SWS.2018.8525803, but lacking IEEE access, I haven't read it.)

peter-h · « **Reply #50 on:** August 16, 2022, 07:34:21 am »

There was also Newton-Raphson division, which used a multiply. I don't know the details (I know what N-R is; used it for square roots - the ((N/a)+a)/2 formula) but if you have a full size barrel shifter then multiplies are very fast.

brucehoult · « **Reply #51 on:** August 16, 2022, 08:26:36 am »

Quote from: peter-h on August 16, 2022, 07:34:21 am

There was also Newton-Raphson division, which used a multiply. I don't know the details (I know what N-R is; used it for square roots - the ((N/a)+a)/2 formula) but if you have a full size barrel shifter then multiplies are very fast.

You need some initial approximation for the reciprocal. Then you use r' = r * (2 - n * r) several times to improve the reciprocal. Then you multiply by the thing you wanted to divide into.
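A minimal sketch of that iteration in C. The function names and the fixed count of three steps are my assumptions; the initial estimate is passed in by the caller, for example from a small lookup table.

```c
/* Refine an estimate r of 1/n with `steps` Newton-Raphson iterations,
 * r' = r * (2 - n * r). No division is used anywhere. */
float refine_reciprocal(float n, float r, int steps)
{
    for (int i = 0; i < steps; i++)
        r = r * (2.0f - n * r);
    return r;
}

/* a / n computed as a * (1/n); `estimate` is a rough guess of 1/n. */
float divide_via_reciprocal(float a, float n, float estimate)
{
    return a * refine_reciprocal(n, estimate, 3);
}
```

The estimate has to be reasonably close (between 0 and 2/n), otherwise the iteration diverges instead of converging.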

Here's one approach to an initial approximation that you can do in hardware.

https://observablehq.com/@drom/reciprocal-approximation

Or you can use a ROM table. Most modern ISAs have a reciprocal estimate instruction, especially in their Vector/SIMD ISAs. The RISC-V Vector ISA explicitly specifies the 128-entry x 7 bit table to be used for reciprocal estimate and reciprocal sqrt estimate instructions:

https://github.com/riscv/riscv-v-spec/blob/master/vfrec7.adoc
https://github.com/riscv/riscv-v-spec/blob/master/vfrsqrt7.adoc

"The 7 bit accuracy was chosen as it requires 0,1,2,3 Newton-Raphson iterations to converge to close to bfloat16, FP16, FP32, FP64 accuracy respectively."

Where "close" is as in "good enough for 3D graphics purposes"

Nominal Animal · « **Reply #52 on:** August 16, 2022, 09:17:37 am »

Although Bruce already explained it well, let me be me and describe how that \$x_{i+1} = x_{i} ( 2 - x_i n )\$ comes about.
I'm posting this because it is a simple and useful tool that can help in related problems, as peter-h already mentioned.

The Newton-Raphson method is a root-finding method. We have some differentiable function \$f\$ that has a (single) root at the desired \$x\$, \$f(x) = 0\$, and want to know the exact \$x\$ for that root. Using \$f^\prime(x)\$ for the derivative, we iterate
$$x_{i+1} = x_{i} - \frac{f(x_i)}{f^\prime(x_i)}$$

In this particular case, we pick function \$f(x) = 1/x - n\$. Then, the derivative is \$f^\prime(x) = -1/x^2\$, and
$$x - \frac{f(x)}{f^\prime(x)} = x + x^2 \left(\frac{1}{x} - n\right) = 2 x - x^2 n = x (2 - x n)$$
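
One more step worth spelling out with the same symbols: the relative error squares on every iteration. Writing \$e_i = 1 - n x_i\$ for the relative error of the estimate \$x_i\$,
$$e_{i+1} = 1 - n x_{i+1} = 1 - n x_i (2 - n x_i) = 1 - 2 n x_i + n^2 x_i^2 = (1 - n x_i)^2 = e_i^2$$
So the number of correct bits roughly doubles per step (ignoring rounding in the arithmetic itself): a 7-bit estimate gives about 14, 28, and 56 bits after one, two, and three iterations, which fits the 0,1,2,3 iteration counts in the RISC-V quote above.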

Oftentimes, the true "trick" is picking a good function. For that, I recommend computer algebra systems like Maxima/wxMaxima, SageMath, Maple, et cetera. For example, in Maxima/wxMaxima:
f(x) := 1/x - n $ factor( x - f(x) / diff(f(x),x) );
outputs
-x ( n x - 2 )
which is good enough. (We really want x(2-xn).)
The $ is a separator (like ; except suppresses output). factor() is a function that tries to factor the polynomial; tends to do better than e.g. ratsimp() in this kind of cases. Then, just try different forms for f(x) you can think of, until the expression is something you can easily calculate.

nctnico · « **Reply #53 on:** August 16, 2022, 10:33:40 am »

Quote from: Nominal Animal on August 16, 2022, 07:05:02 am

Quote from: peter-h on August 16, 2022, 06:13:14 am
probably another good reason to avoid a printf [...] which converts float to double.
Right; the requirement for C to promote variadic float arguments to double is an important reason why one should use something else instead.

Nahh, don't worry about that. In most applications formatting / printing numbers is not critical for performance.

peter-h · « **Reply #54 on:** August 16, 2022, 01:50:34 pm »

Yes, which is why I wonder why it was decided to make the printf family use double as a default.

Double floats were not fast in those days... a PC was an 8086 running at 16MHz, and no hardware float unless you bought the extra chip.

If I need a fast "sprintf" I would do itoa() or ltoa(). If I needed a fast %f then practically always I control the format, e.g. %6.2f, and then doing a ltoa, printing a '.', and a second ltoa, is many times faster. Many years ago I was implementing HPGL parsing and generation (also Calcomp plotter language, etc, which was great fun) and those tricks were used because the moment the IAR Z180 compiler saw a %f it would go away for 10ms

Super clever stuff above about N-R iteration. I was always crap at maths (I do enjoy reading "popular maths" textbooks) but I do get it. They could not chuck me out of univ because despite failing all the maths exams I was getting 100% in electronics

DiTBho · « **Reply #55 on:** August 16, 2022, 04:07:36 pm »

Quote from: brucehoult on August 16, 2022, 08:26:36 am

The RISC-V Vector ISA

This is where MIPS4++ wins: RV32IMAC doesn’t have, but R18200 has a count-leading-zeros instruction, which is not directly supported by C but if encoded in assembly it's extremely useful during the Look up step when you have to choose the best initial estimate of reciprocal using leading 3 nonzero bits.

Modern ARM should have something similar.

SiliconWizard · « **Reply #56 on:** August 16, 2022, 07:14:55 pm »

And RISC-V is a highly modular ISA as opposed to most others. That's by design. The Bitmanip extension does provide such an instruction. Of course a given core must implement this extension, which is not going to happen for a while at least on commercial cores since the extension has been finalized relatively recently...

brucehoult · « **Reply #57 on:** August 16, 2022, 09:02:15 pm »

Quote from: DiTBho on August 16, 2022, 04:07:36 pm

Quote from: brucehoult on August 16, 2022, 08:26:36 am
The RISC-V Vector ISA

This is where MIPS4++ wins: RV32IMAC doesn’t have, but R18200 has a count-leading-zeros instruction, which is not directly supported by C but if encoded in assembly it's extremely useful during the Look up step when you have to choose the best initial estimate of reciprocal using leading 3 nonzero bits.

Reciprocal estimate is a floating point instruction. Floating point values are normalised, meaning the (implied) MSB is always 1, so there is no shifting needed. Denorms do exist, but the reciprocal of a denorm with two or more leading 0s is infinity anyway.

It would seem kind of crazy to do Newton-Raphson using integers. It would be tricky to avoid under/overflow.

Quote

Modern ARM should have something similar.

Modern RISC-V has CLZ. The original RV64GC from 2015 (ratified essentially unchanged in 2019, now called RVA20) doesn't, but RVA22 does, or will once it's ratified. Any new chip aimed at "real" OSes coming out from now on will have it, along with the rest of the "B" extension, in order to be RVA22 compliant. The B extension instructions are easy to implement, and uncontroversial, and were ready to go well before ratification. Unlike the much larger and more complicated Vector extension, which is optional in RVA22.

Microcontroller people are free to implement CLZ or not, depending on whether it's worth it for their application. It's a large chunk of silicon to implement a fast CLZ and probably doesn't make sense if you don't have things such as multiply and divide.

westfw · « **Reply #58 on:** August 17, 2022, 01:08:58 am »

Quote

Steele & White correctly show that to do the job properly (i.e. accurately in every case), you need to use multi-word arithmetic, and not just 2 or 3 words but numbers hundreds or thousands of bits long.

Is the amount of storage required dependent on the number of digits that you actually want to print? I would think that printing "several fewer" digits than are accurate in the internal representation would be easier.

The paper is here, BTW. https://lists.nongnu.org/archive/html/gcl-devel/2012-10/pdfkieTlklRzN.pdf I guess I should read it!

The desire to "print any floating point number to a decimal string, and then convert the string back to a floating point number, and it will have the EXACT SAME bits as the original one" seems an unnecessarily high bar, for the typical embedded system displaying a number to a human user. :-(

Quote

why printfs? why don't people get rid of it once and for all?

Is there an alternative for "formatted" output? That used to be pretty important back in the fortran and cobol days, though I guess the main excuse in modern times is just "prettyness." C's printf() seems to be a pretty reasonable compromise for all its warts (mostly caused by it being a function, rather than part of the language, I guess.)
I actually did some work on a Pascal compiler once, to add fortran-style formatted output. It was pretty gross (and certainly couldn't have been done in pascal itself.) I see This conversation about Ada, which amounts to "it's quite verbose and ugly."

Nominal Animal · « **Reply #59 on:** August 17, 2022, 02:26:07 am »

Quote from: westfw on August 17, 2022, 01:08:58 am

Is the amount of storage required dependent on the number of digits that you actually want to print?

Yes, if the range of values you want to print is clamped.

For example, you can create a function that prints double-precision floating point values with at most 19 decimal digits – for example ±ddddddddddddd.ffffff – it is trivial to do using a single 64-bit (unsigned) integer with exact output.

The problem is that unlike Fortran, C printf() does not clamp the range when using say %6.0f (which prints the double-precision value rounded to an integer). It only ensures the output is at least six characters wide, so for +0.0 it prints "     0" and for -0.0 it prints "    -0"; but for e.g. 1e9 it still prints "1000000000".

If the value is very large, say 1e300, and you want it to print that value instead of a number that has the first 20 or so digits 9 or 1 followed by zeroes, and then some pseudo-random digits arising from rounding errors, you do need about a thousand bits of storage (or a bit less than that if you're clever) to do so.
We (brucehoult and I) know that it can be done in a fixed amount of space, even though "current wisdom" is somewhat different; and brucehoult has even discussed this with the authors of the "current wisdom", who agreed. It does not matter if the output format is utterly silly, like say "%999999.9999"; the required space is determined by the magnitude and precision of the type used.

This is what some of my ramblings in this and the other related threads drive at: you could have a formatting function that splits into multiple implementations, so that if the format and range is limited to an "easier" one like the ±ddddddddddddd.ffffff above, you could use a very fast, no-extra-memory needing version even in an interrupt context, but it would just say No by returning an error if the value is not within the range and clamping is not allowed. For the general printf()-like conversion without any range limits, the wrapper function would just step up to the next formatter function, with the big-slow (what printf() uses) being the final backup version. If all you use are the clamped versions, you don't need the big-slow at all.
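A rough sketch of what the "easy" tier could look like for the ±ddddddddddddd.ffffff case. The name format_fixed6() and the -1 error convention are my assumptions. It uses a single 64-bit integer and no extra memory, and simply refuses values outside the range so that a wrapper can step up to a bigger formatter. The rounding here is done in double arithmetic, so values within about 1e-10 of a tie in the last digit may round differently from printf(); a bit-exact version would need more care.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fast path: format v as [-]d...d.ffffff (six decimals).
 * Returns the string length, or -1 if v is NaN, |v| >= 1e13, or dst
 * is too small, so the caller can fall back to a slower formatter. */
int format_fixed6(char *dst, size_t size, double v)
{
    char tmp[24];
    int n = 0, len = 0;
    int negative = (v < 0.0);
    double a = negative ? -v : v;

    if (!(a < 1e13))                 /* also rejects NaN and infinities */
        return -1;

    uint64_t whole = (uint64_t)a;    /* exact, since a < 2^44 */
    uint64_t frac = (uint64_t)((a - (double)whole) * 1e6 + 0.5);
    if (frac >= 1000000u) { frac -= 1000000u; whole++; }
    if (whole >= 10000000000000u)    /* rounding carried past the range */
        return -1;

    uint64_t digits = whole * 1000000u + frac;   /* at most 19 digits */
    do {                             /* least significant digit first */
        tmp[n++] = (char)('0' + digits % 10u);
        digits /= 10u;
    } while (digits != 0 || n < 7);  /* always keep a leading "0." */

    if ((size_t)(n + 2 + negative) > size)
        return -1;
    if (negative) dst[len++] = '-';
    while (n > 6) dst[len++] = tmp[--n];
    dst[len++] = '.';
    while (n > 0) dst[len++] = tmp[--n];
    dst[len] = '\0';
    return len;
}
```

A wrapper would call this first and only fall through to the big-slow version when it returns -1, or report the error if no such fallback is linked in.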

But you cannot really do this with printf() in any sane manner. For example, you cannot tell printf() to print say ±999999.999 for numbers that cannot be printed in ±6.3 format. It cannot tell if at runtime the big-slow is needed or not, so it would have to be included in the firmware anyway; and then the other versions become just extra bloat –– a runtime speed optimization! –– which is not in the purview of a standard/base C library at all.

Thus, a different interface is needed. As long as one is limited to the printf() interface, you have to be able to output the entire range and full precision. I guess you could play with some preprocessor macros that specify some global precision and range limits used, but then you need to recompile not only your embedded firmware, but also the base C/C++ library it uses, every time you change those... Urgh, definitely a land of bugs and odd behaviour awaits there.

brucehoult · « **Reply #60 on:** August 17, 2022, 02:26:46 am »

Quote from: westfw on August 17, 2022, 01:08:58 am

Quote
Steele & White correctly show that to do the job properly (i.e. accurately in every case), you need to use multi-word arithmetic, and not just 2 or 3 words but numbers hundreds or thousands of bits long.
Is the amount of storage required dependent on the number of digits that you actually want to print?

No, the actual storage needed is dependent on the absolute value of the exponent. If you're not pushing the limits of 10^±308 (or 10^±38 for float) then you don't need anywhere near as much.

Quote

I would think that printing "several fewer" digits than are accurate in the internal representation would be easier.

Sure, if you want to print only an approximate rounded value then that's always much easier, especially if you don't care if something right on the cusp sometimes rounds the wrong way.

Quote

The desire to "print any floating point number to a decimal string, and then convert the string back to a floating point number, and it will have the EXACT SAME bits as the original one" seems an unnecessarily high bar, for the typical embedded system displaying a number to a human user. :-(

Sure, which is why embedded systems typically take all kinds of shortcuts, including arithmetic that isn't IEEE-compliant in the first place.

If you just want to put the value into something you can transmit over a communication channel that isn't binary-safe, you can print hex rather than decimal. Or some kind of Base-64 if you don't mind a little more bit shuffling.
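For the hex route, a minimal sketch assuming float is a 32-bit IEEE single (true on the usual embedded targets, but an assumption nonetheless). The function names are made up. The round trip is bit-exact by construction and needs no printf at all.

```c
#include <stdint.h>
#include <string.h>

/* Write the 32 bits of f as eight hex digits plus a terminating NUL. */
void float_to_hex(float f, char out[9])
{
    static const char digits[] = "0123456789ABCDEF";
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);  /* well-defined way to read the bits */
    for (int i = 7; i >= 0; i--) {
        out[i] = digits[bits & 0xFu];
        bits >>= 4;
    }
    out[8] = '\0';
}

/* Inverse; returns 0 on success, -1 on a malformed string. */
int hex_to_float(const char *in, float *f)
{
    uint32_t bits = 0;

    for (int i = 0; i < 8; i++) {
        char c = in[i];
        uint32_t nibble;
        if (c >= '0' && c <= '9')      nibble = (uint32_t)(c - '0');
        else if (c >= 'A' && c <= 'F') nibble = (uint32_t)(c - 'A' + 10);
        else return -1;              /* also catches a too-short string */
        bits = (bits << 4) | nibble;
    }
    if (in[8] != '\0')
        return -1;
    memcpy(f, &bits, sizeof bits);
    return 0;
}
```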

brucehoult · « **Reply #61 on:** August 17, 2022, 02:39:11 am »

Quote from: Nominal Animal on August 17, 2022, 02:26:07 am

if the format and range is limited to an "easier" one like the ±ddddddddddddd.ffffff above, you could use a very fast, no-extra-memory needing version even in an interrupt context, but it would just say No by returning an error if the value is not within the range and clamping is not allowed.

Sure, if you're allowed to print "OVERFLOW" or clamp to the maximum value or something then no problem.

Quote

But you cannot really do this with printf() in any sane manner. For example, you cannot tell printf() to print say ±999999.999 for numbers that cannot be printed in ±6.3 format. It cannot tell if at runtime the big-slow is needed or not, so it would have to be included in the firmware anyway; and then the other versions become just extra bloat –– a runtime speed optimization!

The main issue in small machines is that you need to have the MEMORY SPACE available for the big slow version in any case, whether it's a static buffer, heap space that is maybe never deallocated, or stack space. I tend to favour stack space (about 1 KB for IEEE double) because usually you are printing from your program main function (or near to it), and you can use that stack space for computation code when you are not printing.

Quote

–– which is not in the purview of a standard/base C library at all.

It's entirely appropriate for the C library to pick a faster version that uses less memory when it knows it will do the job. That's extra code space, but code space is maybe less likely to be running out. If code space *is* a problem then compile your libc with appropriate options.

Nominal Animal · « **Reply #62 on:** August 17, 2022, 02:56:49 am »

Quote from: brucehoult on August 17, 2022, 02:39:11 am

The main issue in small machines is that you need to have the MEMORY SPACE available for the big slow version in any case, whether it's a static buffer, heap space that is maybe never deallocated, or stack space. I tend to favour stack space (about 1 KB for IEEE double) because usually you are printing from your program main function (or near to it), and you can use that stack space for computation code when you are not printing.

Me too; and splitting the formatting function into helpers not only makes the code easier to work with, but also ensures stack space is "reused" and thus minimized.

(GCC in particular is not very good at determining when it can reuse the same storage for different local variables in the same function.)

Quote from: brucehoult on August 17, 2022, 02:39:11 am

Quote
–– which is not in the purview of a standard/base C library at all.
It's entirely appropriate for the C library to pick a faster version that uses less memory when it knows it will do the job. That's extra code space, but code space is maybe less likely to be running out. If code space *is* a problem then compile your libc with appropriate options.

Well, I'd like to agree, but my (limited and not at all recent) experience with existing C library implementations is that the developers tend to reject such, saying it is outside the scope/purview of their project.

I don't know how I could have worded that better, but it seems to be so hard to get such stuff included in the existing open source standard C or base C libraries (like newlib) that I definitely would not bother even trying. (Which is one reason I've been dabbling in designing something better for C from scratch.)

In this particular case, there is no need to throw the entire base C library away, just drop printf() (or at least its ability to print floating point numbers), and use something else instead.

peter-h · « **Reply #63 on:** August 17, 2022, 09:48:36 am »

As I think I've said before, ever since C came out for embedded, say 40 years ago, one had

- integer-only printf options (mine was a stripped down fuller one, ints and longs, and I called it ilsprintf() )
- non-IEEE floats (the somewhat weird accuracy specs are irrelevant, especially on output where one is mostly doing e.g. %7.2f)
- no doubles (doubles have only highly esoteric applications and practically never in embedded systems)

None of the above need a heap, mutexes, etc. Even IEEE single floats don't need a heap.

DiTBho · « **Reply #64 on:** August 17, 2022, 12:08:31 pm »

Quote from: Nominal Animal on August 17, 2022, 02:56:49 am

drop printf()

precisely!

it's simpler to have a show() method for each datatype.
- it consumes less resources
- it's less error-prone
- it's embedded-friendly
- it can be easily customized on demand

PlainName · « **Reply #65 on:** August 17, 2022, 01:41:50 pm »

Quote

it's simpler to have a show() method for each datatype.

Then you end up with:

Quote

puts("blah ");
show_type(val);
puts("blah blah: ");
show_type(stuff);
puts("\r\n");

Which just makes your code that little harder to read and change. So now you have an overarching function that encapsulates all that, perhaps giving an easily-modified string for the format and then the show stuff as parameters, and all of a sudden you've reinvented printf();

If you want the show stuff then why not just rewrite printf to use those functions instead of whatever it came with? Then you get what you want and the code still remains compatible with standard libraries.

DiTBho · « **Reply #66 on:** August 17, 2022, 02:47:34 pm »

Quote from: dunkemhigh on August 17, 2022, 01:41:50 pm

Then you end up with:
..
Which just makes your code that little harder to read and change.

Yup, a bit cumbersome and quite a bit more verbose, but it's perfectly fine!

Note that, as mentioned by @SiliconWizard, C11 can give you a generic "show()" method thanks to "_Generic", so you won't need to write show_${datatype}, just show() and the compiler will do the job for you.
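
A small sketch of what that could look like. The show_* functions, the buffer sink, and the set of types are all made up for illustration, and it assumes int32_t, uint32_t and bool are distinct types (on toolchains where int32_t is long, a plain int argument would need its own entry).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output sink; on a real target this might feed a UART. */
static char   show_buf[64];
static size_t show_len;

void show_char(char c)
{
    if (show_len + 1 < sizeof show_buf) {
        show_buf[show_len++] = c;
        show_buf[show_len] = '\0';
    }
}

void show_str(const char *s) { while (*s) show_char(*s++); }

void show_u32(uint32_t v)
{
    char tmp[10];
    int n = 0;
    do { tmp[n++] = (char)('0' + v % 10u); v /= 10u; } while (v != 0);
    while (n > 0) show_char(tmp[--n]);
}

void show_i32(int32_t v)
{
    uint32_t mag = (uint32_t)v;
    if (v < 0) { show_char('-'); mag = 0u - mag; }
    show_u32(mag);
}

void show_bool(bool b) { show_str(b ? "true" : "false"); }

/* C11: pick the formatter from the static type of the argument. */
#define show(x) _Generic((x),     \
        bool:         show_bool,  \
        int32_t:      show_i32,   \
        uint32_t:     show_u32,   \
        char *:       show_str,   \
        const char *: show_str)(x)
```

The call sites then read show("val="); show(val); with no format string to get out of sync with the argument types.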

In Ada you *mostly* write that show_${datatype} way(1), and it's perfectly ok!

Code: [Select]

datatypes="uint32, uint16, uint8, ... boolean, ..."
for datatype in ${datatypes}
       do
              use show_${datatype}()
       done

In C, show_${data_type}():
- is simpler to be read (especially when people pad printf(...) with 1981548...32171 parameters)
- is simpler for breakpoints or breakouts
- is simpler for patches, since you can apply a patch to each line and it makes sense

and, on the top of this, to *mimic* printfs() a language like myC would need to involve
- autoreference
- autotype
- monads
- local pool (with auto free) to support list but avoiding to use malloc&C
- list

All of this, just to allow you to write something like this

Code: [Select]

show
(
    @list
    (
        @string("blah "),
        @uint32(val),
        @string("blah blah: "),
        @stuff(stuff),
        @string("\r\n"),
    )
);

(1) to *mimic* printf, Ada offers even more complex inner mechanisms.

Is it worth it .... ?

I don't think so. Hence, thanks, but no

Nominal Animal · « **Reply #67 on:** August 17, 2022, 04:23:04 pm »

Quote from: dunkemhigh on August 17, 2022, 01:41:50 pm

Which just makes your code that little harder to read and change.

Yes: a formatting interface where you construct strings piecewise using separate functions, will take some effort to get used to.

Thing is, if you want the benefits, something has gotta give.

(Side note: Maintainability is important, but not because it makes developers' jobs easier. It is important because it reduces the number of bugs. The true impact on maintainability cannot be measured by how easy some code is to read or how nice it looks; it must be measured by how likely it is for an expected level of developer to modify and maintain such code without introducing new bugs, how likely they are to notice bugs in existing code, and how easy it is for them to fix such bugs.)

In the Constructing short C strings in limited or constrained situations thread from a year ago, in reply #24, I outlined an example of a replacement for printf()-like formatting I've been mulling in my mind. It is completely different to how printf() specifiers work, but would allow the kind of features that now require separate formatting functions ("show_type()").

Quote from: dunkemhigh on August 17, 2022, 01:41:50 pm

If you want the show stuff then why not just rewrite printf to use those functions instead of whatever it came with?

Because printf() formatting interface does not let you specify things like "clamp to this width".

When you extend the printf() interface, the formatting gets even uglier; just look at the hoops the Linux kernel jumps through: it adds extra characters after the standard conversion specifier pattern. They get really hairy really quickly. (The benefit, however, is that the compiler can still check the variadic arguments for their correct types, and thus help avoid bugs.)

Or, put more simply, the proper interface to formatting functions is not just show_type(variable); it is
status = format_type(destination, value, options);
and it is exactly the options part that requires a non-printf interface.
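
To make that concrete, a sketch of one such function for 32-bit integers. The struct, its fields, and the name format_i32() are hypothetical; the point is only that "clamp or fail" lives in the options, which a printf() conversion cannot express.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical options: things a printf() conversion cannot say. */
struct format_options {
    unsigned max_digits;  /* most digits allowed, 1..10               */
    int      clamp;       /* nonzero: saturate to 99...9 on overflow  */
};

/* Format value into dst. Returns the length written, or -1 if the
 * options are invalid, the value does not fit and clamping is off,
 * or dst is too small. */
int format_i32(char *dst, size_t size, int32_t value,
               const struct format_options *opt)
{
    char tmp[10];
    int n = 0, len = 0;
    uint32_t mag = (uint32_t)value;
    uint64_t limit = 1;

    if (opt->max_digits < 1 || opt->max_digits > 10)
        return -1;
    if (value < 0)
        mag = 0u - mag;              /* magnitude, safe for INT32_MIN */

    for (unsigned i = 0; i < opt->max_digits; i++)
        limit *= 10u;                /* limit = 10^max_digits */
    if (mag >= limit) {
        if (!opt->clamp)
            return -1;               /* just say No */
        mag = (uint32_t)(limit - 1u);
    }

    do { tmp[n++] = (char)('0' + mag % 10u); mag /= 10u; } while (mag != 0);
    if ((size_t)(n + 1 + (value < 0)) > size)
        return -1;
    if (value < 0) dst[len++] = '-';
    while (n > 0) dst[len++] = tmp[--n];
    dst[len] = '\0';
    return len;
}
```

With max_digits = 3, the value -1234 either yields an error or "-999", depending on the clamp flag.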

It is just that it is easiest to implement as separate functions, because a formatting specification language is something built on top of those.
(The standard C library implementations typically do not expose those separate functions at all. Even optional functions like strfromf() use the printf formatting language.)

DiTBho · « **Reply #68 on:** August 17, 2022, 05:54:39 pm »

Quote from: Nominal Animal on August 17, 2022, 04:23:04 pm

Or, put more simply, the proper interface to formatting functions is not just show_type(variable); it is
status = format_type(destination, value, options);
and it is exactly the options part that requires a non-printf interface.

Yup, even better: it's the step++ from the total demolition of printf;

day1, first step, you delete printf.c or whatever is the filename from your libc, then you recompile, but you have to be brave, don't listen to any colleague, you have to cross out every single mention in every single book that mentions it with an ash black marker, and hack every single eBooks with sed search and replace (with nothing, which is what printf deserves ... nothing of nothing)
day2, you spend the whole day celebrating the death of the big fat evil dragon. It's vanished, gone, never existed and never it will, may he rest in peace burning to pieces in the deepest worst computer science ideas ever.
If you have a wwf badge because you have feelings for the protection of rare animals, well ... I know, I know, it might sound cruel, but in this case it's all a matter of the Newton's third law to move forward you have to leave something behind
day3, you introduce show_type(), simple piece of code, suitable for light C support library
day4, you spend the whole day appreciating how best it is
day5, next step, you eat a pancake batter and you make it better, so you evolve show_type into format_type(), a bit more complex, but not so much, more flexible and much more powerful

One week, 5 steps! Maybe you will need to extend day5 into a second week to perfect it, but hey? It's ok, it couldn't get any better

DiTBho · « **Reply #69 on:** August 17, 2022, 05:57:22 pm »

(
next next step, in the far far future, likely in a different galaxy where people measure the power of things by talking about light side or dark side of things (the dark side is always unlimited, don't question) if you are willing to add more features to a language like myC, well ... the best way to make it *dark side* is to ...

... embed the behavior within data_type and let format_type() take advantage of it

)

Nominal Animal · « **Reply #70 on:** August 17, 2022, 07:23:09 pm »

Quote from: DiTBho on August 17, 2022, 05:57:22 pm

embed the behavior within data_type

Yup, exactly.

Let's say we construct a completely new string formatting system, based on say
'{' [ number ':' ] type [ '/' options ] '}'
where number identifies the parameter to be formatted, type is a short string identifying the formatter, and options is a string passed as-is to the formatter. The number is useful in localization, so that the order of items in the string can differ from the order of the parameters to the function. The actual formatter function interface could be something like say
int formatter(buffer, pointer-to-data, options, context)

This assumes that instead of passing variadic arguments to be formatted (to the actual printf-replacement), we pass their addresses instead. This avoids the standards-required type promotions for variadic arguments, like float to double. So, one could print e.g. Hello world to Serial using
format_string(Serial, "Hello, world!\n");
or say a debugging message related to the user poking a touch panel using
format_string(Serial, "Touch event at ({1:i},{2:i})\n", &x, &y);
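A minimal sketch of how one such specification could be parsed, just to make the grammar above concrete. The names fmt_spec and parse_spec, and the 31-character limit on the options string, are assumptions made for this illustration only; the 8-character type limit matches the structure size discussed further down.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed form of one '{' [ number ':' ] type [ '/' options ] '}' item. */
struct fmt_spec {
    int  number;       /* 1-based parameter index, 0 if omitted */
    char type[9];      /* formatter name, at most 8 characters plus NUL */
    char options[32];  /* passed as-is to the formatter; may be empty */
};

/* Parse one specification starting at s[0] == '{'.
   Returns the number of characters consumed, or 0 if it is malformed. */
static size_t parse_spec(const char *s, struct fmt_spec *out)
{
    const char *p = s;
    size_t n;

    if (*p++ != '{')
        return 0;

    /* Optional "number:" prefix; digits not followed by ':' belong to the type. */
    out->number = 0;
    if (isdigit((unsigned char)*p)) {
        const char *q = p;
        int value = 0;
        while (isdigit((unsigned char)*q) && value < 1000)
            value = 10 * value + (*q++ - '0');
        if (*q == ':') {
            out->number = value;
            p = q + 1;
        }
    }

    /* Type name: everything up to '/' or '}'. */
    n = strcspn(p, "/}");
    if (n == 0 || n >= sizeof out->type || p[n] == '\0')
        return 0;
    memcpy(out->type, p, n);
    out->type[n] = '\0';
    p += n;

    /* Optional "/options" suffix, copied verbatim. */
    out->options[0] = '\0';
    if (*p == '/') {
        p++;
        n = strcspn(p, "}");
        if (n >= sizeof out->options || p[n] == '\0')
            return 0;
        memcpy(out->options, p, n);
        out->options[n] = '\0';
        p += n;
    }

    if (*p != '}')
        return 0;
    return (size_t)(p + 1 - s);
}
```

A format_string() implementation would call something like this each time it meets a '{', look up the formatter by type, and hand it the options string untouched.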

If we limit the implementation to systems that support ELF object files and other formats with section support, we can use section magic to autoregister formatters at build time. That way, no RAM is wasted in describing the base set of formatters. Then, if someone wants to implement their own formatter, all they need to do is define the formatter function, and then use a preprocessor macro, say
DECLARE_FORMATTER(formatter, "type", context);
to add their own to the same list. The macro emits a data structure to a dedicated section, so that the linker gathers them all into a single array in memory. On a 32-bit system, it would make sense to limit the type names to 8 characters, so that the structure would be 16 bytes (and reside in ROM/Flash).
(It would be nice to sort that array as a post-linking step, so that a binary search could be used to quickly find the proper formatter.)
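A rough sketch of the section trick, assuming GCC or Clang with GNU ld on an ELF target. The struct layout, the section name "formatters" and the lookup function are made up for this example; the __start_/__stop_ symbols are what GNU ld provides for sections whose names are valid C identifiers. The two formatters lean on snprintf() only to keep the example short.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter signature, following
   int formatter(buffer, pointer-to-data, options, context). */
typedef int (*formatter_fn)(char *buf, size_t len, const void *data,
                            const char *options, void *context);

/* One registry entry; with 32-bit pointers this is 8 + 4 + 4 = 16 bytes. */
struct formatter_entry {
    char          type[8];   /* not NUL-terminated when all 8 bytes are used */
    formatter_fn  func;
    void         *context;
};

/* Emit a const entry into the "formatters" section. The explicit alignment
   stops the compiler from over-aligning the object, which would otherwise
   insert padding between entries and break array-style iteration. */
#define DECLARE_FORMATTER(func_, type_, context_)                          \
    static const struct formatter_entry formatter_entry_##func_            \
        __attribute__((section("formatters"), used,                        \
                       aligned(__alignof__(struct formatter_entry)))) =    \
        { type_, func_, context_ }

/* Provided by GNU ld: bounds of the gathered "formatters" section. */
extern const struct formatter_entry __start_formatters[];
extern const struct formatter_entry __stop_formatters[];

/* Linear search; a sorted array would allow a binary search instead. */
static const struct formatter_entry *find_formatter(const char *type)
{
    const struct formatter_entry *e;
    for (e = __start_formatters; e < __stop_formatters; e++)
        if (strncmp(e->type, type, sizeof e->type) == 0)
            return e;
    return NULL;
}

/* Two example formatters. */
static int format_int(char *buf, size_t len, const void *data,
                      const char *options, void *context)
{
    (void)options; (void)context;
    return snprintf(buf, len, "%d", *(const int *)data);
}
DECLARE_FORMATTER(format_int, "i", NULL);

static int format_str(char *buf, size_t len, const void *data,
                      const char *options, void *context)
{
    (void)options; (void)context;
    return snprintf(buf, len, "%s", (const char *)data);
}
DECLARE_FORMATTER(format_str, "s", NULL);
```

Because every entry is const, the whole table can live in ROM/Flash. On a 64-bit host the entries are 24 bytes rather than 16, which does not matter for the sketch.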

Since the linker uses the addresses of the functions declared as formatters, it can trivially leave any undeclared formatters out from the final binary. This means preprocessor macros can be easily used to control what formatters are available by default.

On an AVR with separate address spaces for Flash and RAM, you could have an option that indicates when the pointer points to the non-default address space, or you could simply have different formatters for e.g. strings in Flash vs. strings in RAM.

If the buffer interface is not just an array, but also supports a window to the output buffer (so that in cases where the data is too long to fit into the buffer, only some of it is stored in the buffer, and another identical call but with a different window will generate more of it later), the same formatting interface can be bolted on to any kind of stream, FIFO, socket, file, or other contraption.
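One possible shape for such a windowed buffer, as a sketch only. The struct and function names are invented here. The formatter always produces its full logical output, but only the bytes that fall inside the window are stored, so calling it again with a later window yields the next chunk.

```c
#include <stddef.h>

/* Hypothetical output window: keeps only bytes [skip, skip + size) of the
   logical output, while 'total' counts everything that was produced. */
struct out_window {
    char   *data;    /* storage for the visible part */
    size_t  size;    /* capacity of data[] */
    size_t  skip;    /* logical offset where the window starts */
    size_t  total;   /* logical bytes produced so far */
};

/* Append len bytes of logical output, storing only what falls in the window. */
static void window_put(struct out_window *w, const char *src, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        size_t pos = w->total + i;
        if (pos >= w->skip && pos - w->skip < w->size)
            w->data[pos - w->skip] = src[i];
    }
    w->total += len;
}

/* Number of valid bytes currently stored in data[]. */
static size_t window_used(const struct out_window *w)
{
    if (w->total <= w->skip)
        return 0;
    return (w->total - w->skip < w->size) ? w->total - w->skip : w->size;
}
```

After a call, total > skip + size tells the caller that more output remains, so it can flush data[], advance skip by size, and repeat the identical formatting call.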

The interface itself should be re-entrant, so that each formatter can use the same interface to implement itself. For example, a formatter for date might use the context parameter to point to the current "locale", and format the date using said locale automagically, by picking the formatting string based on the locale. There is some risk here for accidentally deep recursion, though, if users create silly formatters.. but C does not protect silly people from creating footguns, so I think it is mostly a documentation issue.

This is just an example, but hopefully gives an idea how one could replace printf with something much better, much easier to control –– and based on user-extensible formatter functions. And all designed to support embedded development, especially when tight on RAM.

SiliconWizard · « **Reply #71 on:** August 17, 2022, 07:31:01 pm »

One real problem with formatted printing is the security bomb lurking if the formatting string either doesn't match the arguments or gets modified at run-time for an unexpected reason.
If the formatting string is a constant, the compiler may be smart enough to optimize this and generate code that would essentially look like the manual, separate function for each part of the format, but I'm not too sure about that.

PlainName · « **Reply #72 on:** August 17, 2022, 07:39:34 pm »

Quote

Hello world to Serial using
format_string(Serial, "Hello, world!\n");
or say a debugging message related to the user poking a touch panel using
format_string(Serial, "Touch event at ({1:i},{2:i})\n", &x, &y);

That's the kind of thing I had in mind when I said you'd¹ just reinvented printf.

---
[1] The royal 'you'

tellurium · « **Reply #73 on:** August 17, 2022, 08:28:57 pm »

Quote from: DiTBho on August 17, 2022, 05:54:39 pm

day5, next step, you eat a pancake batter and you make it better, so you evolve show_type into format_type(), a bit more complex, but not so much, more flexible and much more powerful

Sounds interesting. Could you share a couple of examples, please? Like, a typical log line that shows a bunch of values, or something like that.

nctnico · « **Reply #74 on:** August 17, 2022, 11:23:07 pm »

Quote from: brucehoult on August 17, 2022, 02:39:11 am

It's entirely appropriate for the C library to pick a faster version that uses less memory when it knows it will do the job. That's extra code space, but code space is maybe less likely to be running out. If code space *is* a problem then compile your libc with appropriate options.

Precisely! The problem isn't printf; it's that the library implementation may not suit your needs. I use 3 or 4 different printf implementations in my embedded projects depending on formatting features needed. In the end the C library designers were pretty clever to come up with a text printing & formatting solution / convention (=printf) which is both versatile and can be light-weight if it has to.

Having basic type specific print functions is the worst idea ever. I've seen several people do that and it always ends in a mess. It isn't standard, it is not easy to use and circles back to why printf exists in the first place: it is a good basic way of formatting numbers. Don't try to re-invent a square wheel, printf is the perfectly round wheel.

If you still persist in using type specific printing then switch to C++ and use streams. You can even overload the formatting to print your own defined type.

westfw · « **Reply #75 on:** August 18, 2022, 02:28:27 am »

So let me see if I understand the "problems" with printf():

The whole varargs thing with different types is dangerous (fixed in modern compilers via type checking the arguments against the format?) And usually inefficient.
Bloat due to runtime selection of format - binary must include code for all possible types, whether or not they're used.
Output is permitted to overflow field widths.
Floating point output takes lots of RAM (common to all "correct" floating point output functions in all languages?)

Nominal Animal · « **Reply #76 on:** August 18, 2022, 05:14:10 am »

Quote from: westfw on August 18, 2022, 02:28:27 am

So let me see if I understand the "problems" with printf():

Plus, the format is fixed. You cannot do your own extensions to it (except by prefixing or suffixing format specifiers that look like ordinary printable content).

As a practical and relevant example, take a look at the format and format_arg function attributes.

Quote from: SiliconWizard on August 17, 2022, 07:31:01 pm

If the formatting string is a constant, the compiler may be smart enough to optimize this and generate code that would essentially look like the manual, separate function for each part of the format, but I'm not too sure about that.

I've never seen anything like that at all. What most C compilers do do, however, is change printf() and fprintf() calls with a formatting string without conversions into fputs(), but that's it.

Quote from: dunkemhigh on August 17, 2022, 07:39:34 pm

That's the kind of thing I had in mind when I said you'd just reinvented printf.

What an odd way to define "reinvent".

In that other thread, I explained when formatting strings are useful (localization). It is not about ease of use, it is about providing useful functionality.

If you replace printf() with an interface that allows you to register a formatting function for a stack trace or processor state dump by just writing it as a function with a specific signature, and then use that formatter as part of your formatting strings everywhere in your firmware/application, is it reinventing?

For one, the printf() formatting specification cannot support that. The interface you call "reinvented" can support that in plain ANSI/ISO C89. With ISO C99 or later and ELF-compatible object formats, you can make it much more powerful.

I think you just do not think there is any problem in the printf() interface, and believe it is someone else's job to fix any implementation issues in it if anyone does have a problem. That is a fallacy, because this thread exists: there are problems that cannot be adequately solved within the existing printf() family of functions.

Now, quite a few posts in this thread have pointed out that it is perfectly possible for a printf() to be thread-safe, and that it is not necessary for printf() to require heap. It has also been shown that an implementation printing floating-point numbers can often –– given the formats and values typically printed –– work with very little run-time temporary RAM use (stack!); but, depending on the numerical value, the conversions can require well over a hundred bytes of stack space.

Let's say you make a variant of printf() that breaks the standard, and enforces explicit field widths. That is, if you tell it to print say %+20.9f (±ddddddddd.ddddddddd), and the value cannot fit in that, it will return an error. To make sure your floats don't exceed the valid range, you do
if (val < -999999999.999999999f) val = -999999999.999999999f;
if (val > +999999999.999999999f) val = +999999999.999999999f;
and then wonder why you still get errors. (The reason is that 999999999.999999999f is the same as 1000000000.0f, and even the same as 1000000032.0f.)
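To make the rounding concrete, here is a small sketch assuming float is IEEE 754 binary32 with round-to-nearest. Near 1e9 adjacent floats are 64 apart, so the largest float that still fits the ±ddddddddd.ddddddddd field is 999999936.0f. The names FIELD_MAX, clamp_to_field and format_fixed are made up for the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Largest float strictly below 1e9 (1e9 - 64), assuming IEEE 754 binary32. */
#define FIELD_MAX 999999936.0f

/* Clamp so that the value always fits nine integer digits. */
static float clamp_to_field(float val)
{
    if (val < -FIELD_MAX) val = -FIELD_MAX;
    if (val > +FIELD_MAX) val = +FIELD_MAX;
    return val;
}

/* Format as %+20.9f, returning 20 on success or -1 if the result would not
   be exactly 20 characters or does not fit the buffer. */
static int format_fixed(char *buf, size_t size, float val)
{
    int n = snprintf(buf, size, "%+20.9f", (double)clamp_to_field(val));
    return (n == 20 && (size_t)n < size) ? n : -1;
}
```

With the limit written as 999999999.999999999f instead, the comparison is really against 1000000000.0f, so a value of exactly 1e9 passes the clamp and needs ten integer digits.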

Thus, the best an implementation could do is support only a subset of floating-point formatting, specifically those with fixed width (even if the output is not padded to that width) so that it can clamp any values outside that width to the maximum value (or an overflow string), so that everything it does support can be done with strict, minimal stack use limits, in a re-entrant, thread-safe manner.

The goddamn annoying thing with that is that you will be reimplementing printf() from scratch, and instead of fixing the problems and giving the users of the function new ways to solve the problems that lead to this reimplementation in the first place, you'll be just patching issues with spit and bubblegum.

That is much, much worse than "reinventing" (replacing) an interface; I'm talking from experience here. Not only are you not really fixing anything, just papering over the problems that halt the current project –– while doing at least the equivalent amount of work a replacement would need ––, but you will end up having to maintain and tweak it for years to come because none of the users of said bubblegum-fixed-printf() will be happy with the tradeoffs you chose.
It is exactly why developers who are stuck at papering over problems like that, instead of fixing them, are either sociopaths or burn out.

The above should explain why I say "fuck no" to "just fix printf()", and instead reach for something completely different. printf() is a hammer, a tool. Replacing it with a laser engraver is not reinventing a hammer, it is replacing one tool with another that is designed to be better at the task at hand (if you know what you're doing; not all developers do).

DiTBho · « **Reply #77 on:** August 18, 2022, 09:12:08 am »

Quote from: nctnico on August 17, 2022, 11:23:07 pm

Precisely! The problem isn't printf

the problem *IS* printf from its grungy interface down

Quote from: nctnico on August 17, 2022, 11:23:07 pm

I use 3 or 4 different printf implementations in my embedded projects depending on formatting features needed.

yeah, like those who like putting fresh paint on grungy railings and call it "well done job" ain't it?

Quote from: nctnico on August 17, 2022, 11:23:07 pm

Having basic type specific print functions is the worst idea ever. I've seen several people do that and it always ends in a mess.

oh, really?

It's not basic-type, it's show_${datatype} for *every single type*, including your typedef.
Everything that needs a format MUST have its format_datatype.

With Ada we have been doing show_${datatype} for years for things that must pass DO-178B certifications, nobody has ever claimed shit out of that.

Quote from: nctnico on August 17, 2022, 11:23:07 pm

In the end the C library designers were pretty clever

Clever, sure. It's so clever that it's incredibly easy to get undefined behavior and, worse still, quite a few cases of undefined behavior usually produce innocuous-looking results, so it's fairly common for bugs to exist for years before somebody tests under the right circumstances necessary to demonstrate the bug.

DiTBho · « **Reply #78 on:** August 18, 2022, 09:34:24 am »

Quote from: westfw on August 18, 2022, 02:28:27 am

The whole varargs thing with different types is dangerous (fixed in modern compilers via type checking the arguments against the format?) And usually inefficient

yup, when you design a language with pure interfaces in mind, you will find it *VERY* annoying that you have to waste time on type-checking the arguments against the format.

It's all wrong from the design point of view, and it's like telling everyone "stepping on shit brings good luck".

So you have what when you step on shit with Gcc v11? A verbose warning message, telling you messed up something in a printf?

sorry, you stepped on shit at line 59310294231, you wrote "%s" but argument 2938109 is uint32 not a string

It's sooooooooooooo dangeroussssssssssssss, you could print the whole memory until putS reaches '\0', but don't worry Gcc keeps getting better and it will catch these bugs ... most of the time, and if not, well ... anything that can go wrong will go wrong, what do you want? don't blame the compiler if you step on shit, human!

Quote from: westfw on August 18, 2022, 02:28:27 am

Bloat due to runtime selection of format - binary must include code for all possible types, whether or not they're used.
Output is permitted to overflow field widths.
Floating point output takes lots of RAM (common to all "correct" floating point output functions in all languages?)

yes, yes, yes

plus, it misses decent support for fixed-point, and - talking about my last project - the support for complex numbers requires patches and dirty hacks.

nctnico · « **Reply #79 on:** August 18, 2022, 09:55:06 am »

Quote from: Nominal Animal on August 18, 2022, 05:14:10 am

Plus, the format is fixed. You cannot do your own extensions to it (except by prefixing or suffixing format specifiers that look like ordinary printable content).

You can. Just write your own extension into printf. Your own printf will take precedence over the one supplied by the C library.

PlainName · « **Reply #80 on:** August 18, 2022, 10:09:58 am »

Quote from: Nominal Animal on August 18, 2022, 05:14:10 am

Quote from: dunkemhigh on August 17, 2022, 07:39:34 pm
That's the kind of thing I had in mind when I said you'd just reinvented printf.
What an odd way to define "reinvent".

In that other thread, I explained when formatting strings are useful (localization). It is not about ease of use, it is about providing useful functionality.

If you replace printf() with an interface that allows you to register a formatting function for a stack trace or processor state dump by just writing it as a function with a specific signature, and then use that formatter as part of your formatting strings everywhere in your firmware/application, is it reinventing?

'Reinvent' doesn't necessarily mean 'the exact same thing'. However, in the sense I used it, it was the single wrapper function vs many lines of multiple functions that I was focusing on.

DiTBho · « **Reply #81 on:** August 18, 2022, 10:24:58 am »

Quote from: nctnico on August 18, 2022, 09:55:06 am

You can. Just write your own extension into printf. Your own printf will take precedence over the one supplied by the C library.

I can do a lot of things, including something like this
(it's the first step, NominalAnimal's solution will be the step++)

Code:

void cplx_fx1616_eshow
(
      p_char_t prologue, /* it can be empty, "" */
      p_cplx_fx1616_t p_data,
      p_char_t epilogue  /* it can be empty, "" */
);

with myC, there is a mechanism to auto-detect the datatype, therefore

Code:

void eshow
(
      p_char_t prologue,
      p_autotype p_data, /* C11 offers "_Generic", it's different, it works at compile-time, but useful */
      p_char_t epilogue 
)
{
      switch(type_of(p_data))
      {
            ...
            case p_cplx_fx1616: /* the auto-type is passed as p_datatype */
            {
                  cplx_fx1616_eshow(prologue, p_data, epilogue);
                  break;
            }
            ...
      }
}

so I can write this

cplx_fx1616_t C;
p_cplx_fx1616_t p_C;

p_C = get_address(C);

C'let(p_C, "1.1 i1.1"); /* it will invoke cplx_fx1616_let */
C'pow(p_C, p_C); /* it will invoke cplx_fx1616_pow */
C'eshow("X^X = ", "\n"); /* it will invoke cplx_fx1616_eshow */
or alternatively
eshow("X^X = ", p_C, "\n");

I successfully used this approach for matrix computations, LUP decompositions, and complex number algebra.

C++ will look similar, C needs to be more verbose, but you can adapt it to C11 at least for the show wrapper.

Nominal Animal · « **Reply #82 on:** August 18, 2022, 11:25:02 am »

Quote from: dunkemhigh on August 18, 2022, 10:09:58 am

Quote from: Nominal Animal on August 18, 2022, 05:14:10 am
Quote from: dunkemhigh on August 17, 2022, 07:39:34 pm
That's the kind of thing I had in mind when I said you'd just reinvented printf.
What an odd way to define "reinvent".

In that other thread, I explained when formatting strings are useful (localization). It is not about ease of use, it is about providing useful functionality.

If you replace printf() with an interface that allows you to register a formatting function for a stack trace or processor state dump by just writing it as a function with a specific signature, and then use that formatter as part of your formatting strings everywhere in your firmware/application, is it reinventing?

'Reinvent' doesn't necessarily mean 'the exact same thing'. However, in the sense I used it, it was the single wrapper function vs many lines of multiple functions that I was focusing on.

The dictionary definition does imply 'substantially the same thing', though.

Anyway, do note that current standard/base C libraries do not expose those sub-functions at all, only the topmost string-formatting interface.
Thus, even if you bolt on top a string-formatting interface, it is substantially different from printf(), exactly because it is all about those sub-formatters, and letting the "end-user" implement their own if they wish –– and at minimum, set their own limits (like float precision and range) to those formatters, making the big difference here.

Quote from: DiTBho on August 18, 2022, 10:24:58 am

Quote from: nctnico on August 18, 2022, 09:55:06 am
You can. Just write your own extension into printf. Your own printf will take precedence over the one supplied by the C library.
I can do a lot of things, including something like this

If you write a replacement, you're not extending anything. You're just replacing something with something else.

If you implement a printf() that uses a different formatting than that specified in the C standard, the compiler won't see it your way, and will complain. (Of course, you can silence those complaints.) You cannot extend the formatting by adding a new feature, unless it is reinterpreting the meaning or intent of an already supported feature, because again, the compiler won't see it your way, and will complain. You can replace it with something else, but even if it is almost exactly like the standard C printf formatting string, the compiler will still complain when it sees things that differ from the standard – or you drop all support for the compiler to check the formatting at all.

Therefore, no extension is possible: only replacement.

nctnico · « **Reply #83 on:** August 18, 2022, 11:29:04 am »

Quote from: Nominal Animal on August 18, 2022, 11:25:02 am

If you implement a printf() that uses a different formatting than that specified in the C standard, the compiler won't see it your way, and will complain.

GCC seems to have already thought about that (see function attributes):

format (archetype, string-index, first-to-check)
The format attribute specifies that a function takes printf, scanf, strftime or strfmon style arguments which should be type-checked against a format string. For example, the declaration:

extern int
my_printf (void *my_object, const char *my_format, ...)
__attribute__ ((format (printf, 2, 3)));

causes the compiler to check the arguments in calls to my_printf for consistency with the printf style format string argument my_format.

The parameter archetype determines how the format string is interpreted, and should be printf, scanf, strftime or strfmon. (You can also use __printf__, __scanf__, __strftime__ or __strfmon__.) The parameter string-index specifies which argument is the format string argument (starting from 1), while first-to-check is the number of the first argument to check against the format string. For functions where the arguments are not available to be checked (such as vprintf), specify the third parameter as zero. In this case the compiler only checks the format string for consistency. For strftime formats, the third parameter is required to be zero.

In the example above, the format string (my_format) is the second argument of the function my_printf, and the arguments to check start with the third argument, so the correct parameters for the format attribute are 2 and 3.

The format attribute allows you to identify your own functions which take format strings as arguments, so that GCC can check the calls to these functions for errors. The compiler always (unless -ffreestanding is used) checks formats for the standard library functions printf, fprintf, sprintf, scanf, fscanf, sscanf, strftime, vprintf, vfprintf and vsprintf whenever such warnings are requested (using -Wformat), so there is no need to modify the header file stdio.h. In C99 mode, the functions snprintf, vsnprintf, vscanf, vfscanf and vsscanf are also checked. Except in strictly conforming C standard modes, the X/Open function strfmon is also checked as are printf_unlocked and fprintf_unlocked. See Options Controlling C Dialect.
format_arg (string-index)
The format_arg attribute specifies that a function takes a format string for a printf, scanf, strftime or strfmon style function and modifies it (for example, to translate it into another language), so the result can be passed to a printf, scanf, strftime or strfmon style function (with the remaining arguments to the format function the same as they would have been for the unmodified string). For example, the declaration:

extern char *
my_dgettext (char *my_domain, const char *my_format)
__attribute__ ((format_arg (2)));

causes the compiler to check the arguments in calls to a printf, scanf, strftime or strfmon type function, whose format string argument is a call to the my_dgettext function, for consistency with the format string argument my_format. If the format_arg attribute had not been specified, all the compiler could tell in such calls to format functions would be that the format string argument is not constant; this would generate a warning when -Wformat-nonliteral is used, but the calls could not be checked without the attribute.

The parameter string-index specifies which argument is the format string argument (starting from 1).

The format_arg attribute allows you to identify your own functions which modify format strings, so that GCC can check the calls to printf, scanf, strftime or strfmon type function whose operands are a call to one of your own functions. The compiler always treats gettext, dgettext, and dcgettext in this manner except when strict ISO C support is requested by -ansi or an appropriate -std option, or -ffreestanding is used. See Options Controlling C Dialect.

PlainName · « **Reply #84 on:** August 18, 2022, 11:36:45 am »

Quote

The dictionary definition does imply 'substantially the same thing', though.

Anyway, do note that current standard/base C libraries do not expose those sub-functions at all, only the topmost string-formatting interface.
Thus, even if you bolt on top a string-formatting interface, it is substantially different from printf(), exactly because it is all about those sub-formatters, and letting the "end-user" implement their own if they wish –– and at minimum, set their own limits (like float precision and range) to those formatters, making the big difference here.

I shall try to be clearer and specific. This led from:

Quote from: DiTBho on August 17, 2022, 12:08:31 pm

Quote from: Nominal Animal on August 17, 2022, 02:56:49 am
drop printf()

precisely!

it's simpler to have a show() method for each datatype.
- it consumes less resources
- it's less error-prone
- it's embedded-friendly
- it can be easily customized on demand

And from that I presumed he meant to do the kind of thing that I showed as:

Code:

puts("blah ");
show_type(val);
puts("blah blah: ");
show_type(stuff);
puts("\r\n");

And that looked daft to me. It would be better with a wrapper function to hide all that, so you could easily change the format string (or make it completely different). With a printf-alike wrapper you can change everything just by changing the string, which seems a natural way to progress. You start with a call, then some more and pretty soon you think "I should shove all that in a function".

Hence I suggested you've just reinvented printf. Not THE printf, but a printf-style function.

In that sense, even though I disagree with the dictionary definition you've found, that would indeed be a pretty much exact replica of a printf style function, but it wouldn't be printf. That's as opposed to spelling out each string and value function each time, as illustrated above.

Hope that clears it up and we can move on.

Nominal Animal · « **Reply #85 on:** August 18, 2022, 11:38:45 am »

Quote from: nctnico on August 18, 2022, 11:29:04 am

Quote from: Nominal Animal on August 18, 2022, 11:25:02 am
If you implement a printf() that uses a different formatting than that specified in the C standard, the compiler won't see it your way, and will complain.
GCC seems to have already thought about that (see function attributes):

Nope. What you can do with the format and format_arg function attributes is tell the C compiler that your own function uses the exact same string format specification as printf()/scanf()/strftime()/strfmon().

It does not let you add or change the string format specification. You cannot extend it; you can only replace it wholesale with something else, and lose the support of the compiler in checking that it matches the arguments to your function.
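For illustration, a minimal wrapper using the attribute (GCC and Clang only). The function name my_log and its parameters are invented for this sketch; the attribute syntax is the documented one.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper: argument 3 is the format string, and the values to
   check against it start at argument 4. */
int my_log(char *dst, size_t size, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

int my_log(char *dst, size_t size, const char *fmt, ...)
{
    va_list args;
    int n;

    va_start(args, fmt);
    n = vsnprintf(dst, size, fmt, args);
    va_end(args);
    return n;
}

/* With -Wformat enabled, the compiler now warns about calls such as
 *     my_log(buf, sizeof buf, "%s", 42);
 * exactly as it would for printf(). It also warns about
 *     my_log(buf, sizeof buf, "%Q", &my_object);
 * because Q is not a conversion in the printf archetype, and the attribute
 * offers no way to teach it one.
 */
```

So the attribute covers wrappers that keep the printf format language unchanged, and nothing more.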

You'd know this, if you'd ever used any of this in practice. Instead, you're just wasting people's time by making incorrect claims about things you know nothing about. Please do something useful instead.

nctnico · « **Reply #86 on:** August 18, 2022, 11:54:05 am »

It looked like a sensible extension at first glance. Maybe some compilers other than GCC offer such a function. But in the end it is just compiler checking which basically is an extra service to the programmer. It is what it is. Still, following what is established as a standard is way way better than coming up with something else which is non-standard. You also need to think about the fact that C certainly has its warts and some programming problems are better solved using a different language rather than trying to contort C (and the supplied libraries) into something it isn't. The latter will only get ugly.

DiTBho · « **Reply #87 on:** August 18, 2022, 12:11:15 pm »

Quote from: nctnico on August 18, 2022, 11:54:05 am

The latter will only get ugly.

appearing ugly-code is a matter of personal tastes.
being shitty because full of bugs is a matter of wasting human resources.

nctnico · « **Reply #88 on:** August 18, 2022, 12:23:17 pm »

Quote from: DiTBho on August 18, 2022, 12:11:15 pm

Quote from: nctnico on August 18, 2022, 11:54:05 am
The latter will only get ugly.

appearing ugly-code is a matter of personal tastes.

It is not about ugly code but creating an unmaintainable mess. 'I can do this better' and 'not invented here' is strong in some people while their documentation skills are not. That leaves a pile of code that is costly to maintain and the company is often better served by rewriting it in a way people actually understand what is going on following accepted language & library standards. Again, if resilience and reliability are a concern, then C is not the best language to start with as it takes quite a bit of experience & effort to write software in C that is hardened against faults & attacks.

Nominal Animal · « **Reply #89 on:** August 18, 2022, 12:42:56 pm »

Quote from: dunkemhigh on August 18, 2022, 11:36:45 am

I shall try to be clearer and specific.

Thank you for the effort; I now see your point.

Quote from: dunkemhigh on August 18, 2022, 11:36:45 am

Code:

puts("blah ");
show_type(val);
puts("blah blah: ");
show_type(stuff);
puts("\r\n");

And that looked daft to me. It would be better with a wrapper function to hide all that, so you could easily change the format string (or make it completely different).

Sure; that exact interface would be utterly daft.

You wouldn't even need a wrapper function to hide all that, because a preprocessor varargs macro could split each parameter into a separate function-call-like expression, which the _Generic facility can expand to individual function calls based on the type of the argument. In other words, letting the programmer write
show("blah", val, "blah blah: ", stuff, "\r\n");
and have the preprocessor expand it into the code you quoted.
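A sketch of how that expansion could look in C11, limited to five arguments to keep it short. The helper names (show_str, show_int, show_dbl, SHOW_FUNC and the SHOW_n macros) are made up here, and only three types are handled.

```c
#include <stdio.h>

/* Per-type output functions, standing in for the show_type() family. */
static void show_str(const char *s) { fputs(s, stdout); }
static void show_int(int v)         { printf("%d", v); }
static void show_dbl(double v)      { printf("%g", v); }

/* Select the function from the static type of the argument. */
#define SHOW_FUNC(x) _Generic((x),       \
        char *:       show_str,          \
        const char *: show_str,          \
        int:          show_int,          \
        double:       show_dbl)
#define SHOW_ONE(x) SHOW_FUNC(x)(x)

/* Peel off one argument at a time; the comma operator keeps the calls in
   left-to-right order. */
#define SHOW_1(a)      SHOW_ONE(a)
#define SHOW_2(a, ...) SHOW_ONE(a), SHOW_1(__VA_ARGS__)
#define SHOW_3(a, ...) SHOW_ONE(a), SHOW_2(__VA_ARGS__)
#define SHOW_4(a, ...) SHOW_ONE(a), SHOW_3(__VA_ARGS__)
#define SHOW_5(a, ...) SHOW_ONE(a), SHOW_4(__VA_ARGS__)

/* Pick SHOW_n according to the number of arguments (1 to 5). */
#define SHOW_PICK(_1, _2, _3, _4, _5, name, ...) name
#define show(...) \
    (SHOW_PICK(__VA_ARGS__, SHOW_5, SHOW_4, SHOW_3, SHOW_2, SHOW_1, 0)(__VA_ARGS__))

/* Example use: expands to five direct calls, no format string parsed at run time. */
static void show_example(int val, double stuff)
{
    show("blah ", val, " blah blah: ", stuff, "\r\n");
}
```

Since each call site names only the functions it actually needs, the linker can drop every show_* function that is never referenced.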

I thought I explained this in #67, where I said

Quote from: Nominal Animal on August 17, 2022, 04:23:04 pm

Or, put more simply, the proper interface to formatting functions is not just show_type(variable); it is
status = format_type(destination, value, options);
and it is exactly the options part that requires a non-printf interface.

(Requires a non-printf interface, because there is no way to augment/change/add to the printf string specification to support additional options, except by placing them outside the formatting specifiers like the Linux kernel does – it adds all kinds of variants for modifying how pointer types are shown –, and that seems more confusing and harder to maintain than any of the alternatives to me.)

The destination is a pointer to a structure that describes the buffering available for formatting, as well as options for flushing the buffer if it is not large enough; value is the value to be formatted, compatible with type, and options specifies the formatting options, including range limitation for floats, number of decimal digits, whether the output is padded or not, whether the output should clamp values outside the allowed range or error out, and so on, whatever the formatter supports.

This, too, can be made easier to use via preprocessor macros and _Generic, so that a uniform interface similar to
status = format(destination, value); or
status = format(destination, value, options);
can be used; the former being shorthand, allowing omitting an unneeded NULL pointer for default options.

The overarching _Generic macro can be exposed in a header file, and omitted if the user wants to override it with a different one that supports a different set of available types, even on a file-by-file basis. Note how this immediately lets the linker omit functions not used for formatting, giving developers in very tightly constrained situations very fine-grained control over all this. I can imagine even having a small subset of interrupt-safe formatters in specific situations.

The format-string formatter is then an alternative interface to the underlying formatting functions. It could be for example
status = format_using(destination, formatting-string, &value1, &value2, &value3);
where formatting-string is a string literal or string variable, containing both string data to be emitted and formatting specifications (including (optional) options in string form), or even
status = format_using(destination, formatting-string, &value1, &value2, &value3, &options1);
if the formatting-string options part for the conversion of the first value says to take the fourth variadic argument as the options for it.

The main reason for one wanting such a format-string formatter is localization and multi-language support.

Quote from: dunkemhigh on August 18, 2022, 11:36:45 am

Hence I suggested you've just reinvented printf. Not THE printf, but a printf-style function.

Okay, I understand.

In a real sense, anything in C that takes a formatting string and optionally additional values to be formatted, is more or less printf-style; the important bits are in the details. Anything that simply replicates the functionality of printf/scanf/etc. is "reinventing the wheel", and daft, yes.

Here, the key point is the individual formatting functions, with the "printf-style function" just being the highest-level abstraction, not the core interface.
The small individual-type formatting functions are an absolutely crucial detail, because everything builds on top of them: and the C printf interface just isn't designed to work this way at all. (I guess one has to try it and see how futile it is!)

Note, in particular, how I defined an example of such a formatting string specification in a way that allows trivial parsing of the formatting string, without having to understand the exact details of the format. I suspect even the options thing itself should be a structure with a pointer to a custom/private binary structure describing the formatting, a pointer to the C string specifying the formatting options if the formatting-string interface is used, and a type identifier. This kind of spec makes the formatting language extensible, even at runtime, with very little overhead.

Quote from: dunkemhigh on August 18, 2022, 11:36:45 am

Hope that clears it up and we can move on.

Indeed it does. Thank you for spending the effort, I do appreciate it.

DiTBho · « **Reply #90 on:** August 18, 2022, 01:32:37 pm »

Quote from: nctnico on August 18, 2022, 12:23:17 pm

It is not about ugly code but creating an unmaintainable mess.

WHICH mess?
I wrote 15 libraries this way, no mess at all.

Code: [Select]

procedure Test is

   subtype Index is Positive range 95 .. 1223;

   procedure Put_Line ( I : in out Index; Name : String; Phone : Natural; Address : String; T : in out Time ) is
   begin
      Put (I, Index'Width);
      Put (": ");
      Put (Head (Name, 10, ' '));
      Put (" | ");
      Put (Tail (Phone'Img (Phone'Img'First + 1 .. Phone'Img'Last), 13, '0'));
      Put (" | ");
      Put (Head (Address, 20, ' '));
      Put (Year (T), Year_Number'Width);
      Put ("-");
      Put (Month (T), Month_Number'Width);
      Put ("-");
      Put (Day (T), Day_Number'Width);
      I := Positive'Succ (I);
      T := T + Duration (60 * 60 * 24 * 3);
      New_Line;
   end;

(Ada)

Code: [Select]

   95: Ashley     | 0001033438392 | Wellington, New Zeal 2015-  5- 24
   96: Aloha      | 0001087651234 | Hawaii, United State 2015-  5- 27
   97: Jack       | 0001082840184 | Beijing, China       2015-  5- 30
 1220: Ashley     | 0001033438392 | Wellington, New Zeal 2015-  6-  2
 1221: Aloha      | 0001087651234 | Hawaii, United State 2015-  6-  5
 1222: Jack       | 0001082840184 | Beijing, China       2015-  6-  8

is it mess?

DiTBho · « **Reply #91 on:** August 18, 2022, 01:50:59 pm »

Quote from: Nominal Animal on August 18, 2022, 12:42:56 pm

that exact interface would be utterly daft.

Show_${datatype} is the simplest, most coherent and fastest replacement for printf.

I wrote several complex libraries this way, and understood why Ada used in Avionics is full of examples where people write firmware exactly this way.

eshow_${datatype} with prologue and epilogue strings proved to be even more practical because most of the time it's what you need to log or print.

This stuff misses a generic way to express the format in a parametric way, which is step++, but the pre-processor must be banned and demolished entirely.

Nominal Animal · « **Reply #92 on:** August 18, 2022, 03:06:51 pm »

Quote from: DiTBho on August 18, 2022, 01:50:59 pm

Quote from: Nominal Animal on August 18, 2022, 12:42:56 pm
that exact interface would be utterly daft.
Show_${datatype} is the simplest, most coherent and fastest replacement for printf.

It would be a daft interface in C, because it can only operate on one output stream.

The underlying implementation does need some way to specify the output stream (preferably as the first parameter for simplicity), so that the same functions can be used for all I/O, and not have to duplicate the code for each.

Similarly, having an ability to pass the formatting options, even if usually NULL, lets you do Useful Stuff. Consider the case where in most cases you use one way to format floats, except that in a few cases you use a slight variation. Do you duplicate the code, or do you pass an options so you can use the same function for both?

Finally, you do want the formatting function to report success or failure, even if it is ignored in most cases. Just consider how useful it is when, while debugging, you notice that the call reported a failure, for example.

This means you end up with something that is closer to
status = format_type(destination, value, options);
instead.

If you really want, you can then make trivial wrappers (functions or preprocessor macros) that supply the default destination. If you want to allow the callers to specify either two or three parameters, with the two-parameter version being equivalent to passing 0 or NULL or some other default as the third parameter, you do need to use the preprocessor. But the ordinary preprocessor can indeed do this.
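A rough sketch of the two-or-three-parameter trick, using only the ordinary preprocessor (argument counting). The names struct dest, struct int_opts, format_int and the format() macro are assumptions made up for this example, and the formatter handles only decimal integers with optional left padding.

```c
#include <stddef.h>

struct dest     { char *buf; size_t len, cap; };   /* hypothetical destination */
struct int_opts { char pad; int width; };          /* hypothetical options */

/* Append value in decimal; opts may be NULL for defaults.
   Returns 0 on success, -1 if the destination is missing or too small. */
int format_int(struct dest *d, int value, const struct int_opts *opts)
{
    char          tmp[3 * sizeof (int) + 2];
    size_t        n = 0;
    unsigned int  u = (value < 0) ? 0u - (unsigned int)value : (unsigned int)value;
    const size_t  width = (opts && opts->width > 0) ? (size_t)opts->width : 0;
    const char    pad = opts ? opts->pad : ' ';

    /* Digits are generated least significant first. */
    do { tmp[n++] = (char)('0' + u % 10u); u /= 10u; } while (u);
    if (value < 0)
        tmp[n++] = '-';

    const size_t  total = (n > width) ? n : width;
    if (!d || d->len + total + 1 > d->cap)
        return -1;

    for (size_t i = n; i < total; i++)
        d->buf[d->len++] = pad;          /* left padding, before the sign */
    while (n)
        d->buf[d->len++] = tmp[--n];     /* reverse into the destination */
    d->buf[d->len] = '\0';
    return 0;
}

/* format(d, v) passes NULL options; format(d, v, o) passes o. */
#define FORMAT_2(d, v)     format_int((d), (v), NULL)
#define FORMAT_3(d, v, o)  format_int((d), (v), (o))
#define FORMAT_PICK(_1, _2, _3, NAME, ...) NAME
#define format(...) FORMAT_PICK(__VA_ARGS__, FORMAT_3, FORMAT_2, 0)(__VA_ARGS__)
```

A type-generic version would replace format_int in FORMAT_2 and FORMAT_3 with a _Generic selection on the value argument; that part is left out here to keep the optional-argument mechanism visible.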

Now, for your myC, designed for extremely tightly constrained situations (not hardware capabilities, but strict requirements on the human developers), if you restrict all textual I/O to a single stream, then it isn't daft. But that is a completely different context, and a different programming language. The topic at hand is C after all, and not myC.

DiTBho · « **Reply #93 on:** August 18, 2022, 05:01:41 pm »

Quote from: Nominal Animal on August 18, 2022, 03:06:51 pm

It would be a daft interface in C, because it can only operate on one output stream.

the first attempt uses console_out directly, like printf uses putch; the second attempt uses a smart buffer

not a problem at all

/*
* convert a datatype into safestring
* safestring knows its size and len
* return ans_t
*
* ans.is_valid
* True on success
* False on failure
* ans.error
* contains the reason of failure
* - safestring has not enough buffer space
* - other errors
*/
ans_t form_${datatype}
(
p_safestring_t p_safestring, /* target */
p_char_t prologue,
p_${datatype}_t p_data,
p_char_t epilogue
);

I wrote it this afternoon, it works fine for me on both C and myC so it doesn't use too weird mechanisms.

Quote from: Nominal Animal on August 18, 2022, 03:06:51 pm

The underlying implementation does need some way to specify the output stream (preferably as the first parameter for simplicity), so that the same functions can be used for all I/O, and not have to duplicate the code for each.

I waited because I want to use safestring as output.

Why safestring instead of an array of chars? because safestring is safer and doesn't need further - off by n-bytes from the boundary - checks.

If something inside the form method tries to write outside the boundary, the function immediately returns an error, and it can easily be programmed to invoke panic(module, fid, reason) and cpu_halt()!

p_safestring->on_error = NULL; /* it will return */
p_safestring->on_error = panic; /* callback to panic */

(this is a precise attempt to mimic a functor, with Haskell, Scala, Java, ... it could be done better ...)
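To make the idea concrete, here is a minimal sketch of a bounded string with an error callback. This is not the actual safestring implementation, which is not shown in the thread; the field names, the one-argument callback signature, safestring_append() and count_panic() are all assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool        is_valid;   /* true on success */
    const char *error;      /* reason of failure, NULL on success */
} ans_t;

typedef struct {
    char   *data;                              /* caller-provided storage */
    size_t  size;                              /* capacity, including the NUL */
    size_t  len;                               /* characters currently stored */
    void  (*on_error)(const char *reason);     /* NULL: just return the error */
} safestring_t;

/* Append text, refusing (without writing anything) if it does not fit. */
static ans_t safestring_append(safestring_t *s, const char *text)
{
    size_t n = 0;
    while (text[n])
        n++;

    if (s->len + n + 1 > s->size) {
        const char *reason = "safestring has not enough buffer space";
        if (s->on_error)
            s->on_error(reason);               /* e.g. panic, or break to debugger */
        return (ans_t){ false, reason };
    }

    for (size_t i = 0; i < n; i++)
        s->data[s->len++] = text[i];
    s->data[s->len] = '\0';
    return (ans_t){ true, NULL };
}

/* Stand-in for panic(): only counts how often it was invoked. */
static int panic_count;
static void count_panic(const char *reason) { (void)reason; panic_count++; }
```

Because the bounds check lives in one place, every form_${datatype} function built on top of the append primitive inherits it, and the choice between "return an error" and "halt" is a single pointer assignment.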

SiliconWizard · « **Reply #94 on:** August 18, 2022, 05:04:00 pm »

While I favor using fully standard C, avoiding any weird assumption for UBs or relying on specific implementations, etc, as far as the core language is concerned, I have a different view for the standard lib.

C std lib is OKish - kinda - but frankly half of it is crap with clunky and unsafe interfaces, all having this whole legacy baggage.
There are some more recent additions of functions that are not as crappy as the older ones, but still fully inspired by the old ones, so look more like bandaids than anything else. Plus, looks like most developers actually rarely use the newer ones, just like many do not even fully use C99 features - there's enormous inertia there and lack of knowledge.

I have no problem using my own functions when it's more convenient/faster/more secure/etc. Now of course it depends on the platform and the project to some level. I do use printf() and the like for desktop applications/tools. Rarely on MCUs.

Among the rare functions that I use as is in almost all cases, there are memset(), memcpy(), memmove()... which happen to be compiled as inline code relatively cleverly in most cases.

SiliconWizard · « **Reply #95 on:** August 18, 2022, 05:11:43 pm »

Quote from: Nominal Animal on August 18, 2022, 05:14:10 am

Quote from: SiliconWizard on August 17, 2022, 07:31:01 pm
If the formatting string is a constant, the compiler may be smart enough to optimize this and generate code that would essentially look like the manual, separate function for each part of the format, but I'm not too sure about that.
I've never seen anything like that at all.

Me neither. But in theory, this is fully possible. Compilers already have knowledge about some other std functions (admittedly much simpler) such as memcpy() that allows them to emit smart inline code, they could do the same for printf(). Compiler maintainers probably think that is not worth the trouble.

Quote from: Nominal Animal on August 18, 2022, 05:14:10 am

What most C compilers do do, however, is change printf() and fprintf() calls with a formatting string without conversions into fputs(), but that's it.

Yes. Well, to be more precise, they do this only if the string to be printed ends with a newline character ('\n'), since puts() automatically adds one. So compilers can only transform printf() calls whose formatting string contains no conversions *and* is a constant string, since the compiler must know whether the string ends with a newline or not. It then removes the newline char when storing the string constant.

DiTBho · « **Reply #96 on:** August 18, 2022, 05:44:43 pm »

Quote from: Nominal Animal on August 18, 2022, 03:06:51 pm

Similarly, having an ability to pass the formatting options, even if usually NULL, lets you do Useful Stuff.

this is step++, but I haven't yet found a definitive solution for "options".

I have a prototype that uses a string as option, so similarly to printf. I don't like it too much.

basically I need:

padding, "#<ch>,<width>" , useful for { uint64, sint64, uint32, sint32, uint16, sint16, uint8, sint8 }
base, { "bbin", "bhex", "bdec" }, useful for { uint64, sint64, uint32, sint32, uint16, sint16, uint8, sint8, p_* }
sign, { '|-', '|+' }, useful for { sint64, sint32, sint16, sint8 } + { fp32, fp64, fx* } and their cplx extensions (sign '-' only when negative? sign always?)
integer digits, "i<n>;", useful for { fp32, fp64, fx* } and their cplx extensions
fractional digits, "f<n>;", useful for { fp32, fp64, fx* } and their cplx extensions

(temporary solution)
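A possible parser for part of that temporary syntax, as a sketch only. It covers the base, sign, "i<n>;" and "f<n>;" items exactly as spelled above and leaves padding out. The struct, the function name, the default values, and the reading of "|+" as "always show the sign" are assumptions.

```c
#include <string.h>

struct num_options {
    int base;          /* 2, 10 or 16; defaults to 10 */
    int always_sign;   /* nonzero: emit '+' for non-negative values too */
    int int_digits;    /* from "i<n>;", -1 if not given */
    int frac_digits;   /* from "f<n>;", -1 if not given */
};

/* Returns 0 on success, -1 if an item is not recognised. */
static int parse_num_options(const char *s, struct num_options *o)
{
    o->base = 10;
    o->always_sign = 0;
    o->int_digits = -1;
    o->frac_digits = -1;

    while (*s) {
        if (s[0] == '|' && (s[1] == '+' || s[1] == '-')) {
            o->always_sign = (s[1] == '+');
            s += 2;
        } else if (!strncmp(s, "bbin", 4)) {
            o->base = 2;
            s += 4;
        } else if (!strncmp(s, "bhex", 4)) {
            o->base = 16;
            s += 4;
        } else if (!strncmp(s, "bdec", 4)) {
            o->base = 10;
            s += 4;
        } else if (*s == 'i' || *s == 'f') {
            const char kind = *s++;
            int n = 0;
            if (*s < '0' || *s > '9')
                return -1;                       /* at least one digit required */
            while (*s >= '0' && *s <= '9') {
                n = 10 * n + (*s++ - '0');
                if (n > 99)
                    return -1;                   /* arbitrary sanity limit */
            }
            if (*s++ != ';')
                return -1;                       /* missing terminator */
            if (kind == 'i')
                o->int_digits = n;
            else
                o->frac_digits = n;
        } else {
            return -1;                           /* option_not_supported */
        }
    }
    return 0;
}
```

Parsing once into a small struct like this also means the string form and a binary options structure can share the same formatting code afterwards.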

Quote from: Nominal Animal on August 18, 2022, 03:06:51 pm

Consider the case where in most cases you use one way to format floats, except that in a few cases you use a slight variation. Do you duplicate the code, or do you pass an options so you can use the same function for both?

{ fp32, fp64, fx* } and their cplx extensions should share part of the formatting code if it's possible.

Quote from: Nominal Animal on August 18, 2022, 03:06:51 pm

Finally, you do want the formatting function to report success or failure, even if it is ignored in most cases.

One step at a time. I needed to introduce ans_t and safestring, which offer native protection and native reaction on error; this way it's solved. I would like to use monads, but it's too complex in C because functions with additional structures are completely unsupported.

Quote from: Nominal Animal on August 18, 2022, 03:06:51 pm

Just consider how useful it is when, while debugging, you notice that the call reported a failure, for example.

Technically, safestring could invoke an ICE break, which would stop the CPU and alert the debugger.
It's why safestring has a built-in callback.

I implemented safestring in C89 5 years ago, it helped writing the lexer and tokenizer of myC, plus many other libraries, including the last bZtree.

Quote from: Nominal Animal on August 18, 2022, 03:06:51 pm

status = format_type(destination, value, options);
instead.

At the moment

/*
* convert a datatype into safestring
* safestring knows its size and len
* return ans_t
*
* ans.is_valid
* True on success
* False on failure
* ans.error
* contains the reason of failure
* - safestring has not enough buffer space
* - other errors
*
* options specify how to format data
* - padding "#<ch><width>"
* - base "bbin", "bhex", "bdec"
* - sign "|-", "|+"
* sign only when negative?
* sign always?
* - integer digits "i<n>;"
* - fractional digits "f<n>;"
* options is ${datatype}-specific
* if not valid
* ans.is_valid = False
* ans.error = option_not_supported
*/
ans_t format_${datatype}
(
p_safestring_t p_safestring, /* target */
p_char_t prologue,
p_${datatype}_t p_data,
p_char_t epilogue,
p_char_t options
);

Quote from: Nominal Animal on August 18, 2022, 03:06:51 pm

use the preprocessor. But the ordinary preprocessor can indeed do this.

no way, *for me* the pre-processor is banned in C and not implemented in myC.

Nominal Animal · « **Reply #97 on:** August 18, 2022, 05:47:52 pm »

Quote from: SiliconWizard on August 18, 2022, 05:11:43 pm

Quote from: Nominal Animal on August 18, 2022, 05:14:10 am
What most C compilers do do, however, is change printf() and fprintf() calls with a formatting string without conversions into fputs(), but that's it.
Yes. Well, to be more precise, they do this only if the string to be printed ends with a newline character ('\n'), since puts() automatically adds one.

To be even more precise, the exact call they compile to varies a bit. If we look at say
void foo(FILE *out) { fprintf(out, "Foo\n"); }
we'll see that GCC compiles it into a fputs("Foo\n", out); call, and Clang into a fwrite("Foo\n", 1, 4, out); call.

It's not strictly limited to printf() and standard output, it applies to all streams. puts() is specific to standard output, though.

DiTBho · « **Reply #98 on:** August 18, 2022, 06:11:17 pm »

Quote from: SiliconWizard on August 18, 2022, 05:11:43 pm

puts() automatically adds one

that's part of the reason for the epilogue field

Code: [Select]

void fshow_${datatype}
(
    p_char_t prologue,
    p_${datatype}_t p_data,
    p_char_t epilogue,
    p_char_t options
)
{
    ans_t    ans;
    p_char_t p_buffer;

    p_buffer = p_dev_ss_buffer_out->data;
    ans      = format_${datatype}(p_dev_ss_buffer_out, prologue, p_data, epilogue, options);
    react_on(ans);
    p_dev->context.IO.out_string(p_dev, p_buffer);
}

Written and tested today, fresh code, p_dev can be conIO on my Linux/MacMini-G4 or the serial on my embedded MIPS4++ board.

epilogue can do CR + LF, or just nothing

SiliconWizard · « **Reply #99 on:** August 18, 2022, 07:29:07 pm »

So ${datatype} is some kind of macro parameter?

DiTBho · « **Reply #100 on:** August 18, 2022, 07:51:37 pm »

Quote from: SiliconWizard on August 18, 2022, 07:29:07 pm

So ${datatype} is some kind of macro parameter?

no, it's a practical way to express this

Code: [Select]

void format_uint64
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_uint64_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_sint64
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_sint64_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_uint32
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_uint32_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_sint32
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_sint32_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_uint16
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_uint16_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_sint8
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_sint8_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_uint8
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_uint8_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_boolean
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_boolean_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_string
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_string_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_char
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_char_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_fp64
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_fp64_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_fp32
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_fp32_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_fx1616
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_fx1616_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_fx32r10
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_fx32r10_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_cplx_fp64
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_cplx_fp64_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_cplx_fp32
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_cplx_fp32_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_cplx_fx1616
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_cplx_fx1616_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_cplx_fx32r10
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_cplx_fx32r10_t p_data,
    p_char_t epilogue,
    p_char_t options
)

brucehoult · « **Reply #101 on:** August 19, 2022, 12:10:04 am »

Quote from: DiTBho on August 18, 2022, 01:32:37 pm

Code: [Select]
   95: Ashley     | 0001033438392 | Wellington, New Zeal 2015-  5- 24
   96: Aloha      | 0001087651234 | Hawaii, United State 2015-  5- 27
   97: Jack       | 0001082840184 | Beijing, China       2015-  5- 30
 1220: Ashley     | 0001033438392 | Wellington, New Zeal 2015-  6-  2
 1221: Aloha      | 0001087651234 | Hawaii, United State 2015-  6-  5
 1222: Jack       | 0001082840184 | Beijing, China       2015-  6-  8

NZ, represent! Punching above its weight.

DiTBho · « **Reply #102 on:** August 19, 2022, 09:35:42 am »

"options" needs to be better defined.

padding in Ada has a lot of support, and it's very useful even for strings

e.g.
padding '_' width 10, "hAllo"
len("hAllo") = 5
means padding 5 x "_"
"hAllo_____"

* * *please, can someone define "options" in terms of syntax?* * *

thanks

DiTBho · « **Reply #103 on:** August 19, 2022, 09:50:59 am »

"precision" is a very difficult argument for "options" because it impacts to the way fractional is evaluated therefore how the fractional sub-string is built, I have no idea * how * to define it for

- fixedpoint
- floatingpoint

At the moment, I only need to support 32bit floating point, and 32bit fractional, so I am using 10x8byte=128+16=144bit unsigned algebra to evaluate the fractional part, truncate it as specified by "f<n>" in "options" (how many fractional digits?) and convert it into string.

"f0" 3.1415 -> nothing
"f1" 3.1415 -> fractionalpart = 1
"f2" 3.1415 -> fractionalpart = 14
"f3" 3.1415 -> fractionalpart = 141
"f4" 3.1415 -> fractionalpart = 1415
"f5" 3.1415 -> fractionalpart = 0

So it always works at the highest internal precision possible and just truncates the unwanted fractional digits. It works, but it's a full waste of resources (CPU cycles, code space, RAM at run time for the LUT and buffers, etc.) that is not needed when you only need "low precision for fast responses" or when you just don't care about high internal precision.

-

I'd also like to mimic the "Eng" display mode of my CASIO FX9860GIII graphing pocket calculator

you type "1000", the display shows "1K"
you type "1/1000", the display shows "1m"
you type "1100", the display shows "1.1K"
you type "1010", the display shows "1.01K"
you type "1001", the display shows "1K"

nice to have

PlainName · « **Reply #104 on:** August 19, 2022, 10:29:33 am »

Quote

to evaluate the fractional part, truncate it as specified by "f<n>"

Should n+1 be evaluated, if not displayed, so the result can be rounded up? "f3" would then be 142 which is mathematically more appropriate than a simple string truncation.
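The difference between the two behaviours can be sketched on a decimal fixed-point value, here 3.1415 stored as the integer 31415 with four fractional digits. The function name and parameters are hypothetical, and this is plain integer arithmetic, not the wide unsigned algebra described above.

```c
/* 10^n for small non-negative n. */
static unsigned long pow10u(int n)
{
    unsigned long p = 1;
    while (n-- > 0)
        p *= 10u;
    return p;
}

/* scaled holds a value multiplied by 10^scale_digits (31415 with 4 is 3.1415).
   Keep 'keep' fractional digits, either truncating or rounding half up.
   Returns 0 on success, -1 if the request cannot be satisfied. */
static int split_fixed(unsigned long scaled, int scale_digits, int keep,
                       int round_half_up,
                       unsigned long *ipart, unsigned long *fpart)
{
    if (scale_digits < 0 || scale_digits > 9 || keep < 0 || keep > scale_digits)
        return -1;

    const unsigned long div = pow10u(scale_digits - keep);  /* digits to drop */
    unsigned long       q   = scaled / div;

    /* Look at the dropped digits: at least half of div means round up. */
    if (round_half_up && (scaled % div) * 2u >= div)
        q++;

    const unsigned long kdiv = pow10u(keep);
    *ipart = q / kdiv;      /* a carry out of the fraction ends up here */
    *fpart = q % kdiv;
    return 0;
}
```

For "f3", truncation gives a fractional part of 141 and rounding gives 142. Note that the rounding carry can reach the integer part (2.9996 becomes 3.000), and that the fractional part must be printed zero-padded to exactly keep digits.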

Nominal Animal · « **Reply #105 on:** August 19, 2022, 11:25:40 am »

Quote from: DiTBho on August 19, 2022, 09:35:42 am

"options" needs to be better defined.

True.

The reason I haven't defined it well enough is that I haven't yet found anything that works well enough.
(I also didn't realize that in show_type(value), you included a lot of the things in type that I put in explicit arguments.)

If we ignore formatted-string formatters (specifically, providing the options as part of the formatting string), then it is just a pointer to a private structure passed to the formatting function. In other words, whatever that particular formatter happens to want.

For example, if we have a formatter that can support both integers and binary fixed point types, we might have

Code: [Select]

#define  INTEGER_OPTION_PREPAD  (1)  /* Padding before sign */
#define  INTEGER_OPTION_MIDPAD  (3)  /* Padding between sign and digits */
#define  INTEGER_OPTION_POSTPAD (2)  /* Padding after digits */

struct integer_options {
    const int      typeid;     /* For checking that the user passed a pointer to a valid structure */
    int            min_value;  /* Values below this are clamped to this value */
    int            max_value;  /* Values above this are clamped to this value */
    unsigned int   opts;       /* collection of INTEGER_OPTION_ flags */
    signed char    width;      /* Positive if fixed to a specific limit */
    signed char    markerpos;  /* Nonnegative if conversion places a decimal point */
    unsigned char  pad;        /* Padding character, usually '0' or space */
    unsigned char  marker;     /* Decimal point character */
};

When formatting-string formatting is used, it might be better to split that into two, so that the options from the formatting strings are passed separately. Otherwise we need an union of structures with a common prefix consisting of the typeid, and a const pointer to the formatting-string optional parts in string form... I still haven't found anything I really like for that case.

Note that the destination would contain basically the things you put in ${datatype}.

One extremely useful thing I have found is windowed buffers. In the case that we have some room in the buffer, but not enough, it is sometimes useful to call the formatter twice with the exact same data, but with a different part of the output saved in the buffer each time. (This is how we deal with e.g. partial framebuffers on intelligent displays: we draw a small slice of the display, transfer that, then redraw the display but buffering another part, and so on. For example, some of the Adafruit Arduino display libraries can do this.)

At minimum, it could be a pointer to

Code: [Select]

struct destination {
    int  pos;  /* Current position in the buffer */
    int  head;  /* First accessible index in the buffer */
    int  tail;  /* The index following the last accessible one in the buffer */
    unsigned int  status;  /* For recording formatting event issues */
    unsigned char  data[];
};

#define  DESTINATION_MISSED_BEFORE  (1<<0)  /* An attempt to write before head occurred */
#define  DESTINATION_MISSED_AFTER  (1<<1)  /* An attempt to write at or after tail occurred */
#define  DESTINATION_MISSED  (DESTINATION_MISSED_BEFORE | DESTINATION_MISSED_AFTER)

static inline int destination_pos(struct destination *dst) { return (dst) ? dst->pos : 0; }

static inline void destination_set(struct destination *dst, int index, unsigned char value)
{
    if (!dst) {
        return;
    } else
    if (index < dst->head) {
        dst->status |= DESTINATION_MISSED_BEFORE;
        return;
    } else
    if (index >= dst->tail) {
        dst->status |= DESTINATION_MISSED_AFTER;
        return;
    } else {
        dst->data[index - dst->head] = value;
    }
}

static inline void destination_advance(struct destination *dst, int len)
{
    /* TODO: Check for overflow/wrap.  Add a dst->status flag to indicate that happened. */
    if (dst)
        dst->pos += len;
}

where you always use destination_set(dst, index, chr) to store stuff in the buffer, with destination_pos(dst) the initial index you "write" to, and destination_advance(dst, len) setting the emitted length. It just adds len to the index, and you can do it first or last; just use negative indexes relative to the position (if you advance before set), or positive indexes (if you advance after set).

This way, instead of the formatter function trying to decide when to flush the buffer or not, it is up to whoever is calling the formatter function to handle that.
If (dst->status & DESTINATION_MISSED) is nonzero, then not all of the formatting was captured in the buffer.
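The caller-side loop could look roughly like the following sketch. It uses a simplified window structure with a data pointer instead of the flexible array member above, and a trivial stand-in formatter; struct window, window_put(), format_text() and format_in_slices() are hypothetical names, and sink stands for wherever the finished slices go (a UART, a display).

```c
#define WIN_MISSED_BEFORE  (1u << 0)   /* write attempted before head */
#define WIN_MISSED_AFTER   (1u << 1)   /* write attempted at or after tail */

struct window {
    int            pos;      /* current logical output position */
    int            head;     /* first logical index kept in data[] */
    int            tail;     /* one past the last logical index kept */
    unsigned int   status;
    unsigned char *data;     /* storage for tail - head characters */
};

/* Store one character if it falls inside the window; always advance. */
static void window_put(struct window *w, unsigned char c)
{
    if (w->pos < w->head)
        w->status |= WIN_MISSED_BEFORE;
    else if (w->pos >= w->tail)
        w->status |= WIN_MISSED_AFTER;
    else
        w->data[w->pos - w->head] = c;
    w->pos++;
}

/* Stand-in formatter: knows nothing about flushing or buffer sizes. */
static void format_text(struct window *w, const char *s)
{
    while (*s)
        window_put(w, (unsigned char)*s++);
}

/* Repeat the same formatting, sliding the window until nothing is missed
   after the tail. sink must have room for the whole output plus a NUL.
   Returns the number of passes needed. */
static int format_in_slices(const char *text, unsigned char *slice,
                            int slice_len, char *sink)
{
    struct window w = { .pos = 0, .head = 0, .tail = slice_len,
                        .status = 0, .data = slice };
    int passes = 0, done = 0;

    for (;;) {
        w.pos = 0;
        w.status = 0;
        format_text(&w, text);
        passes++;

        /* Flush the part of the window that was actually written. */
        const int end = (w.pos < w.tail) ? w.pos : w.tail;
        for (int i = 0; i < end - w.head; i++)
            sink[done++] = (char)slice[i];

        if (!(w.status & WIN_MISSED_AFTER))
            break;                       /* everything has been emitted */
        w.head = w.tail;                 /* slide the window forward */
        w.tail += slice_len;
    }
    sink[done] = '\0';
    return passes;
}
```

The cost is that the formatting work is repeated once per slice, which is the trade made deliberately here: CPU time for RAM.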

Would an actual, compilable example C code help you see how this would work in practice?
I would have included a practical example here, but it is very hot and humid right now where I am, and my brain is running very slow; sorry.

DiTBho · « **Reply #106 on:** August 19, 2022, 12:12:14 pm »

Quote from: dunkemhigh on August 19, 2022, 10:29:33 am

result [..] rounded up?

good point!
- just truncated at digit n
- rounded at digit n

two different options!

DiTBho · « **Reply #107 on:** August 19, 2022, 01:11:59 pm »

Quote from: Nominal Animal on August 19, 2022, 11:25:40 am

Would an actual, compilable example C code help you see how this would work in practice?

Yup, I am already using this stuff while I am developing other stuff.
Examples are useful to see if they match needs

eugene · « **Reply #108 on:** August 19, 2022, 02:41:55 pm »

First, let me say that I am like the child listening to the adults having a conversation, which is completely fine until the child speaks...

Anyway, my interest is in formatting floating point (rarely) and fixed point (often) on MCUs with limited resources. A sprintf() style function would be fine, except that I don't want to link the entire sprintf() into my code. I like Dr Animal's idea of using a pointer to a struct instead of a format string, so a function declaration might look something like

int fmt_float(char *buf, float value, fmt_struct *fmt);

which returns some status information. Converting fixed point to a char array is not too hard even for me, but I wonder if you guys can point me to a good (efficient) algorithm to convert floating point (32 or 64 bit) to a char array without consuming more resources than required. That seems to be exactly the subject of this thread; using only the resources needed to get a result that's good enough. Algorithms are being tossed around as though each of you expect the others to already be familiar with them, but I'm not.

If it makes Dr Animal feel good to write a 10000 word post expounding all of the details, I won't deny him that pleasure.

But really I'm just looking for something I can find online. Academic papers are fine (I am trained in physics but self-taught in CS.)

Nominal Animal · « **Reply #109 on:** August 19, 2022, 06:25:48 pm »

Okay, here is an example program.

Do note that this is not something I can suggest as-is, and is just work-in-progress. I'm very happy to hear ideas and suggestions for improvement, too, but do note that this is mainly intended to show how my above examples would work in practice. It is all licensed under CC0-1.0 (i.e. public domain, do as you wish, just don't try to sue me for any damages you cause if you do use it), too.

First, here is the example C program, example.c that you can compile and run under any OS. (Well, I used Linux, but it should compile and run everywhere.)

Code: [Select]

// SPDX-License-Identifier: CC0-1.0
//
#include <stdlib.h>
#include <stdio.h>
#include "writeonly-buffer.h"
#include "int-format.h"

int main(void)
{
    /* Define a buffer, */
    unsigned char     mybuf_data[50];
    /* and declare the write-only buffer that can write into it. */
    writeonly_buffer  mybuf = WRITEONLY_BUFFER(mybuf_data, sizeof mybuf_data);

    /* Define a fixed-point decimal integer formatting, */
    const int_format  myfix_format = {
        .flags = INT_FORMAT_USE_PLUS | INT_FORMAT_POINT,
        .min_value = -999999,
        .max_value = +999999,
        .width = 0,  /* Not set, so whatever minimum width is needed */
        .decimals = 3,
        .point_char = '.'
    };
    /* and check it is valid. */
    if (int_format_invalid(&myfix_format)) {
        fprintf(stderr, "Oops, myfix_format is invalid.\n");
        return EXIT_FAILURE;
    }

    /* Format something. */

    int  x = 45267;
    int  y = -13;

    format_string(&mybuf, "x is ");
    format_int(&mybuf, x, NULL);
    format_string(&mybuf, " and y is ");
    format_int(&mybuf, y, &myfix_format);

    /* Finalise the buffer. */
    int  len = writeonly_buffer_finish(&mybuf);

    /* Check for errors. */
    if (len < 0) {
        fprintf(stderr, "writeonly_buffer_finish() failed: error %d.\n", len);
        return EXIT_FAILURE;
    } else
    if (writeonly_buffer_state(&mybuf)) {
        fprintf(stderr, "writeonly_buffer_state() reported %d.\n", writeonly_buffer_state(&mybuf));
        return EXIT_FAILURE;
    }

    /* Show what we have. */
    printf("Constructed a string containing %d characters: \"%s\".\n", len, mybuf_data);

    return EXIT_SUCCESS;
}

The format_type() calls in the middle are the salient point, as well as the definition of the fixed point integer formatting options (which I named spec, because I realized "specification" describes its purpose better than "options") just preceding them. Note that I omitted the status checks from the formatting calls themselves, and instead moved them to the writeonly_buffer_finish().

The fixed point decimal integer type means that the integer value represents the same fixed point number with the decimal point omitted. Because it only requires dropping in the decimal point at the needed spot, I folded these into the same formatting facility.
You can also play with the .width and .flags, especially INT_FORMAT_ constants in the last file, to see how you can use that very same formatter to format integers to a specific number of characters with leading spaces, padded with zeroes with the sign on the extreme left, including + sign for positive numbers, the .min_value and .max_value clamping, and so on.
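As a standalone illustration of "dropping in the decimal point", here is a small sketch that is independent of the int-format.h code. The function name and its limits are assumptions; it ignores width, padding and the plus sign entirely.

```c
#include <stddef.h>

/* Write value as a decimal fixed-point number with 'decimals' fractional
   digits, e.g. value -13 with 3 decimals gives "-0.013".
   Returns the string length, or -1 on bad arguments or too small a buffer. */
static int fixed_to_string(int value, int decimals, char *out, size_t cap)
{
    char          tmp[3 * sizeof (int) + 4];
    size_t        n = 0;
    int           digits = 0;
    unsigned int  u = (value < 0) ? 0u - (unsigned int)value : (unsigned int)value;

    if (!out || decimals < 0 || decimals > 9)
        return -1;

    /* Generate digits least significant first. Keep going until the value is
       exhausted and there is at least one digit before the decimal point. */
    do {
        if (decimals > 0 && digits == decimals)
            tmp[n++] = '.';
        tmp[n++] = (char)('0' + u % 10u);
        u /= 10u;
        digits++;
    } while (u || digits <= decimals);

    if (value < 0)
        tmp[n++] = '-';

    if (n + 1 > cap)
        return -1;
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];         /* reverse into the caller's buffer */
    out[n] = '\0';
    return (int)n;
}
```

The only difference from plain integer formatting is the one extra character and the loop condition that forces leading zeros, which is why both fit naturally in the same formatter.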

Just note that if you ask it to do the impossible, like show 6 decimals but keep width down to 5 characters, it will do wonky output. I was too lazy to implement the width checks in int_format_invalid(), which would be responsible for checking that the formatting choices are acceptable.

Now, the writeonly_buffer_state() will report if the buffer was not large enough to format everything we wanted to. The program will report an error in that case, but a more sensible program or embedded firmware could do the formatting in a loop, and just move the head,tail part so that no matter how small the buffer is, we eventually get all of the formatted content. Sure, it is slower than just dynamically allocating a buffer large enough, but remember, we're talking about stuff intended for very memory-constrained environments, and the ability to work even with very small buffers may come in useful!

So, let's look at that write-only buffer stuff next. writeonly-buffer.h:

Code: [Select]

// SPDX-License-Identifier: CC0-1.0
//
#ifndef   WRITEONLY_BUFFER_H
#define   WRITEONLY_BUFFER_H
#include <limits.h>

typedef struct {
    unsigned int    state;
    int             pos;
    int             head;
    int             tail;
    unsigned char  *data;
} writeonly_buffer;

#define  WRITEONLY_BUFFER(dataref, size)    \
    {   .state = 0,                         \
        .pos   = 0,                         \
        .head  = 0,                         \
        .tail  = (size),                    \
        .data  = (dataref)  }

#define  WRITEONLY_BUFFER_STATE_BEFORE      1   /* Data store attempt before head */
#define  WRITEONLY_BUFFER_STATE_AFTER       2   /* Data store attempt at or after tail */
#define  WRITEONLY_BUFFER_STATE_OVERFLOW    4   /* pos wraparound or limit exceeded */

static inline unsigned int  writeonly_buffer_state(writeonly_buffer *wo)
{
    return (wo) ? wo->state : 0;
}

static inline int  writeonly_buffer_pos(writeonly_buffer *wo)
{
    return (wo) ? wo->pos : -1;
}

static inline void  writeonly_buffer_commit(writeonly_buffer *wo, int pos)
{
    if (!wo)
        return;
    else
    if (pos < wo->pos)
        wo->state |= WRITEONLY_BUFFER_STATE_OVERFLOW;
    else
        wo->pos = pos;
}

static inline void  writeonly_buffer_set(writeonly_buffer *wo, int pos, int ch)
{
    if (!wo)
        return;
    else
    if (pos < wo->head) {
        wo->state |= WRITEONLY_BUFFER_STATE_BEFORE;
        return;
    } else
    if (pos >= wo->tail) {
        wo->state |= WRITEONLY_BUFFER_STATE_AFTER;
        return;
    } else {
        wo->data[pos - wo->head] = (unsigned char)ch;
        return;
    }
}

static inline int  writeonly_buffer_finish(writeonly_buffer *wo)
{
    if (!wo)
        return -1; /* No buffer specified */

    if (wo->state & WRITEONLY_BUFFER_STATE_OVERFLOW)
        return -2; /* Buffer len (position) overflow */

    /* Add string-terminating NUL char */
    if (wo->pos >= wo->head && wo->pos < wo->tail)
        wo->data[wo->pos - wo->head] = '\0';

    /* Return the length of the data emitted to the buffer. */
    return wo->pos;
}

/*
 * String and single-character formatters.
*/

__attribute__((unused))
static void format_string(writeonly_buffer *wo, const char *src)
{
    /* Nothing to add? */
    if (!src || !*src)
        return;

    int  pos = writeonly_buffer_pos(wo);

    while (*src)
        writeonly_buffer_set(wo, pos++, *(src++));

    writeonly_buffer_commit(wo, pos);
}

__attribute__((unused))
static void format_char(writeonly_buffer *wo, int ch)
{
    /* Nothing to add? */
    if (ch <= 0 || ch > UCHAR_MAX)
        return;

    int  pos = writeonly_buffer_pos(wo);

    writeonly_buffer_set(wo, pos++, ch);
    writeonly_buffer_commit(wo, pos);
}

#endif /* WRITEONLY_BUFFER_H */

The writeonly_buffer structure near the beginning is the key here.

I probably should have named the pos field the len field, because it indicates where the current string construction point is. It is updated by a call to writeonly_buffer_commit(), specifying the new position/length. data points to the current real data buffer (a window, not a complete buffer), where the (tail-head) char positions starting at position head are stored. When data is the entire buffer, then head is zero, and tail is the length of that buffer. That's what the WRITEONLY_BUFFER() macro does for you: it initializes the structure members that way, zeroing the initial position/length.

The state member is a bit cookie tracking things related to how the buffer was accessed. If someone tries to commit the buffer backwards, the bits set in WRITEONLY_BUFFER_STATE_OVERFLOW will be set in state (currently, bit 2, value 2²=4). If someone tries to set already flushed buffer data (i.e., data before the head of the current window), then WRITEONLY_BUFFER_STATE_BEFORE gets set. If someone tries to set buffer data past the current window (thus indicating that a larger buffer is needed), WRITEONLY_BUFFER_STATE_AFTER gets set.

(So, when using a smaller buffer than the stuff we want to format, our initial formatting will finish with WRITEONLY_BUFFER_STATE_AFTER. We write out the buffered data, set head to what tail was, and add the buffer size to tail, and do the formatting calls again. We now expect to see WRITEONLY_BUFFER_STATE_BEFORE. When WRITEONLY_BUFFER_STATE_AFTER is no longer set, we have the last (pos-head) chars in the buffer. When we have written those out, we're fully done. This way does need the formatting to be wrapped inside a do..while loop, where the loop condition function is one that writes the buffered data out, and only lets the loop exit when all of the formatted data is printed. I can show a separate example of that if you want, but I haven't yet even verified this works correctly...)
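A minimal sketch of that repeat-formatting loop follows. It is not the code from writeonly-buffer.h: the wbuf type, the WBUF_ constants and format_windowed() are hypothetical stand-ins, the message is built from plain strings only, and "writing out" a window just appends it to a caller-supplied array. One detail the description above leaves implicit is assumed here: pos and state are reset before each pass.

```c
#include <assert.h>
#include <string.h>

/* Cut-down stand-in for writeonly_buffer; all names are hypothetical. */
typedef struct {
    unsigned int   state;
    int            pos, head, tail;
    unsigned char *data;
} wbuf;

#define WBUF_BEFORE 1u  /* store attempted before head */
#define WBUF_AFTER  2u  /* store attempted at or after tail */

static void wbuf_set(wbuf *wo, int pos, int ch)
{
    if (pos < wo->head)
        wo->state |= WBUF_BEFORE;
    else if (pos >= wo->tail)
        wo->state |= WBUF_AFTER;
    else
        wo->data[pos - wo->head] = (unsigned char)ch;
}

static void wbuf_string(wbuf *wo, const char *src)
{
    int pos = wo->pos;
    while (*src)
        wbuf_set(wo, pos++, *(src++));
    wo->pos = pos;
}

/* Formats a fixed message through an 8-byte window, appending each
   flushed window to out[]. Returns the total number of chars emitted. */
static int format_windowed(char *out, int outsize)
{
    unsigned char window[8];
    wbuf          wo = { 0u, 0, 0, (int)sizeof window, window };
    unsigned int  state;
    int           used = 0;

    do {
        /* Redo the whole formatting against the current window. */
        wo.state = 0u;
        wo.pos = 0;
        wbuf_string(&wo, "x is ");
        wbuf_string(&wo, "45267");
        wbuf_string(&wo, " and y is ");
        wbuf_string(&wo, "-0.013");
        state = wo.state;

        /* "Write out" the part of the window that was filled. */
        int end = (wo.pos < wo.tail) ? wo.pos : wo.tail;
        int n = end - wo.head;
        assert(n >= 0 && used + n < outsize);
        memcpy(out + used, window, (size_t)n);
        used += n;

        /* Slide the window forward by one buffer size. */
        wo.head = wo.tail;
        wo.tail += (int)sizeof window;
    } while (state & WBUF_AFTER);

    out[used] = '\0';
    return used;
}
```

The 26-character message passes through the 8-byte window in four passes. The cost is that every pass repeats all the formatting work, which is the trade-off for needing only a tiny buffer.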

The __attribute__((unused)) just tells the compiler to not complain if one of the helper functions is not used.

Note that the inline here is purely for us humans; the compiler treats it as a hint at most, and makes its own inlining decisions. I use static inline for helper/accessor type trivial functions, and static for local functions. It helps me think about the functions in an organized manner.

There are a lot of safety checks, but that is intentional. Making sure that only the valid parts of the buffer are accessed is worth the extra cost; even passing NULL pointers should be absolutely safe.

The writeonly_buffer_set() function is the one formatters will use to set any character in the logical buffer, in whatever order they want. When they have "written" a chunk, they then call writeonly_buffer_commit() to set the position/length they think should be now completed.

The buffer is write-only, because we cannot support access to already written/set data without keeping it in memory. That's just something we need to deal with, that's all.

Finally, let's take a look at how the format_int() is implemented. int-format.h:

Code: [Select]

// SPDX-License-Identifier: CC0-1.0
//
#ifndef   INT_FORMAT_H
#define   INT_FORMAT_H
#include <limits.h>
#include "writeonly-buffer.h"

#define  INT_FORMAT_PREPAD           1  /* Padding before sign */
#define  INT_FORMAT_MIDPAD           3  /* Padding between sign and digits */
#define  INT_FORMAT_POSTPAD          2  /* Padding after digits */
#define  INT_FORMAT_PADDING          3  /* Padding selection mask */
#define  INT_FORMAT_OMIT_MINUS       4  /* Omit '-' sign even if negative */
#define  INT_FORMAT_USE_PLUS         8  /* Use '+' sign if positive */
#define  INT_FORMAT_POINT           16  /* Add decimal point */

typedef struct {
    unsigned int    flags;              /* INT_FORMAT_ flags */
    int             min_value;          /* Minimum value for clamping, inclusive */
    int             max_value;          /* Maximum value for clamping, inclusive */
    signed char     width;              /* Formatted total width */
    signed char     decimals;           /* Number of fractional digits */
    unsigned char   padding_char;       /* Padding character */
    unsigned char   point_char;         /* Decimal point character */
} int_format;

static const int_format  default_int_format = {
    .flags = 0,             /* No padding, signed integers, only use - if negative, no decimal point */
    .min_value = INT_MIN,   /* No clamping */
    .max_value = INT_MAX,   /* No clamping */
    .width = 0,             /* Unspecified */
    .decimals = 0,          /* None */
    .padding_char = ' ',    /* Default padding would be with spaces */
    .point_char = '.',      /* Default decimal point is '.' */
};

static int  int_format_invalid(const int_format *spec) {
    /* TODO: Verify sanity of formatting spec */
    (void)spec;  /* For now, just silence any warnings about unused parameters... */
    return 0;
}

static inline int  uint_decimal_digits(unsigned int value)
{
    int  digits = 1;

    /* TODO: Implement more efficient way, e.g. an if tree. */

    while (value >= 1000) {
        value /= 1000;
        digits += 3;
    }
    while (value >= 10) {
        value /= 10;
        digits += 1;
    }

    return digits;
}

static void format_int(writeonly_buffer *wo, int value, const int_format *spec)
{
    /* Position in buffer. */
    int  pos = writeonly_buffer_pos(wo);
    /* We can drop out, if there is no buffer to write to. */
    if (pos < 0)
        return;

    /* If NULL spec, we use the default integer format. */
    if (!spec)
        spec = &default_int_format;

    /* Apply clamping. */
    if (value > spec->max_value)
        value = spec->max_value;
    if (value < spec->min_value)
        value = spec->min_value;

    /* The magnitude of the value to be formatted. Negating in unsigned
       arithmetic avoids signed overflow when value is INT_MIN. */
    unsigned int  absval = (value < 0) ? 0u - (unsigned int)value : (unsigned int)value;

    /* Count how many decimal digits we'll need. */
    int  digits = uint_decimal_digits(absval);
    if (digits <= spec->decimals)
        digits = spec->decimals + 1;

    /* Actual width, and number of padding characters. */
    int  width = digits + (!!(spec->flags & INT_FORMAT_POINT))
               + ((value < 0) ? (!(spec->flags & INT_FORMAT_OMIT_MINUS)) : 0)
               + ((value > 0) ? (!!(spec->flags & INT_FORMAT_USE_PLUS)) : 0)
               ;
    int  padding = (spec->width > width) ? spec->width - width : 0;

    /* Prepad? */
    if (padding && (spec->flags & INT_FORMAT_PADDING) == INT_FORMAT_PREPAD) {
        while (padding-->0)
            writeonly_buffer_set(wo, pos++, spec->padding_char);
    }

    /* Sign? */
    if (value < 0 && !(spec->flags & INT_FORMAT_OMIT_MINUS))
        writeonly_buffer_set(wo, pos++, '-');
    else
    if (value > 0 && (spec->flags & INT_FORMAT_USE_PLUS))
        writeonly_buffer_set(wo, pos++, '+');

    /* Midpad? */
    if (padding && (spec->flags & INT_FORMAT_PADDING) == INT_FORMAT_MIDPAD) {
        while (padding-->0)
            writeonly_buffer_set(wo, pos++, spec->padding_char);
    }

    /* Digits and decimal point, if any. */
    if ((spec->flags & INT_FORMAT_POINT)) {
        pos += digits;
        for (int d = 0; d <= digits; d++) {
            if (d == spec->decimals) {
                writeonly_buffer_set(wo, pos - d, spec->point_char);
            } else {
                writeonly_buffer_set(wo, pos - d, '0' + (absval % 10));
                absval /= 10;
            }
        }
        pos++;
    } else {
        pos += digits;
        for (int d = 1; d <= digits; d++) {
            writeonly_buffer_set(wo, pos - d, '0' + (absval % 10));
            absval /= 10;
        }
    }

    /* Postpad? */
    if (padding && (spec->flags & INT_FORMAT_PADDING) == INT_FORMAT_POSTPAD) {
        while (padding-->0)
            writeonly_buffer_set(wo, pos++, spec->padding_char);
    }

    /* Commit. */
    writeonly_buffer_commit(wo, pos);
}

#endif /* INT_FORMAT_H */

I'm not "happy" with the format_int() implementation, but it should suffice as an example. (Note how simple the format_string() one defined in writeonly-buffer.h is for comparison. That one is so simple it doesn't take any spec/options as a parameter.)
This integer formatting implementation is based on the right-to-left conversion, repeatedly dividing the integer by ten and using the remainder as the next digit to be set in order of increasing importance.

Note that because the buffer is write-only, we cannot just temporarily save the digits to the beginning of the buffer, then reverse and insert the decimal point afterwards. That's why it uses the call to uint_decimal_digits() to find out how many digits will be needed.
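The TODO in uint_decimal_digits() mentions an if tree. A minimal sketch of what that could look like, assuming the value fits in 32 bits (the name u32_decimal_digits() is hypothetical, not part of the posted headers):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical if-tree replacement for uint_decimal_digits(),
   for 32-bit unsigned values: at most four comparisons per call. */
static int u32_decimal_digits(uint32_t value)
{
    assert(value <= UINT32_MAX);  /* documents the 32-bit assumption */

    if (value < 10000u) {
        if (value < 100u)
            return (value < 10u) ? 1 : 2;
        return (value < 1000u) ? 3 : 4;
    }
    if (value < 100000000u) {
        if (value < 1000000u)
            return (value < 100000u) ? 5 : 6;
        return (value < 10000000u) ? 7 : 8;
    }
    return (value < 1000000000u) ? 9 : 10;
}
```

Compared to the division loops, this needs no division at all, which matters on cores without a hardware divider.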

The format_int() function first obtains the current position/length using a call to writeonly_buffer_pos(), then sets characters in a wonky order using writeonly_buffer_set(), and finally sets the new position/length using a call to writeonly_buffer_commit(). This is common to all formatters. Everything else, including how they use the spec or whether they even accept one, is up to each formatter.

(To support formatting-string formatting, I would "register" each formatting function with the associated spec. For the example program, one could use for example "i" for NULL spec, i.e. default signed integer formatting, and "i3.3" for myfix_format. This does require that the conversion specifier has both a start and an end character, and most other languages are using braces; so that's why I used braces too. The example program formatting call would then be format_using(target, "x is {1i} and y is {2i3.3}", &x, &y) for example. The target wouldn't be mybuf, but a stream handle, that would contain a mybuf and a function pointer to output buffers, so that format_using() can do the repeat-formatting do-while loop with any size of stream buffer.)

I don't know why anyone would ever use INT_FORMAT_POSTPAD, but I added that because of symmetry. Also, I consider integer zero signless, so even if you use INT_FORMAT_USE_PLUS, zero won't have a sign. And while the code should implement all the formatting features implied by the flags and the formatting structure, there probably are bugs in it, because again, it's a hot, humid Friday evening, and my brain is in slow mode.
And once again, I curse at not having learned to write better comments from the get go when I first learned to program. It is damned hard to learn to write them well afterwards.

eutectique · « **Reply #110 on:** August 19, 2022, 06:29:24 pm »

Quote from: DiTBho on August 19, 2022, 12:12:14 pm

- just truncated at digit n
- rounded at digit n

two different options!

- round to even

Make it three.

Nominal Animal · « **Reply #111 on:** August 19, 2022, 06:57:37 pm »

Quote from: eugene on August 19, 2022, 02:41:55 pm

I like Dr Animal's idea of using a pointer to a struct instead of a format string, so a function declaration might look something like

Hey, it's my idea, not some goddamn PhD's! (And if that was a honorific, drop it: I'm a Finn, we don't use them. And even if we did, I'm not one; I'm just another Uncle Bumblefuck here trying to be of use.)

Quote from: eugene on August 19, 2022, 02:41:55 pm

I wonder if you guys can point me to a good (efficient) algorithm to convert floating point (32 or 64 bit) to a char array without consuming more resources than required.

The basic idea that bruce and I discussed can be described as first splitting the floating point number at the binary point, and handling the fractional part first, and then the integer part. For Binary32 (float), each part is treated like a 128 or 152-bit unsigned integer; for Binary64 (double), like a 1024 or 1077-bit unsigned integer. Of course, there is no direct support in C for such huge numbers, so we use the BigInt approach: multiple "limbs" of suitable size, typically machine native word size.

The fractional part is converted by repeatedly multiplying it by ten, and taking the ensuing integer part (and clearing or subtracting it from the value); this gives the decimal digits in order of descending importance (left to right, starting at just after the decimal point).

The integer part is converted either by repeatedly subtracting the largest power of ten not larger than the value itself, or by repeatedly dividing the value by ten and obtaining the remainder. The repeated subtraction (the number of subtractions needed gives the decimal digit corresponding to that power of ten) gives the decimal digits in order of descending importance (left to right), and the divide by ten and use the remainder gives the decimal digits in order of ascending importance (right to left).
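To make the fractional half concrete, here is a minimal sketch for Binary32 only, assuming float is IEEE 754 Binary32. The names frac_mul10() and float_fraction_digits() are hypothetical, it does no rounding, and it ignores the sign, infinities and NaNs. The fraction is held in five 32-bit limbs, most significant first, and each multiply-by-ten pushes the next decimal digit out of the top limb as the carry.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAC_LIMBS 5  /* 160 bits; the smallest float bit has weight 2^-149 */

/* Multiply the fraction by ten; the carry out of the top limb
   is the next decimal digit (0..9). frac[0] is most significant. */
static int frac_mul10(uint32_t frac[FRAC_LIMBS])
{
    uint32_t carry = 0;
    for (int i = FRAC_LIMBS - 1; i >= 0; i--) {
        uint64_t t = (uint64_t)frac[i] * 10u + carry;
        frac[i] = (uint32_t)t;
        carry   = (uint32_t)(t >> 32);
    }
    return (int)carry;
}

/* Writes up to maxdigits exact decimal digits of the fractional part
   of |f| to out[] (maxdigits + 1 chars). Returns the digit count. */
static int float_fraction_digits(float f, char *out, int maxdigits)
{
    uint32_t frac[FRAC_LIMBS] = { 0 };
    uint32_t bits;
    int      n = 0;

    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);

    int      e = (int)((bits >> 23) & 0xFFu);  /* biased exponent */
    uint32_t m = bits & 0x7FFFFFu;             /* stored mantissa bits */
    if (e) m |= 0x800000u; else e = 1;         /* implicit bit, or subnormal */

    /* |f| = m * 2^(e-150). Copy the bits with negative weight. */
    for (int k = 0; k < 24; k++) {
        int j = 150 - e - k;                   /* bit k has weight 2^-j */
        if (j >= 1 && (m & ((uint32_t)1 << k)))
            frac[(j - 1) / 32] |= UINT32_C(0x80000000) >> ((j - 1) % 32);
    }

    /* Left to right: one decimal digit per multiplication by ten. */
    while (n < maxdigits && (frac[0] | frac[1] | frac[2] | frac[3] | frac[4]))
        out[n++] = (char)('0' + frac_mul10(frac));

    out[n] = '\0';
    return n;
}
```

For 0.1f this yields 100000001490... because the nearest float to 0.1 is slightly larger than 0.1; a real formatter would stop at the requested precision and round.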

My suggestion is that instead of using heap or stack for those temporary bits, one puts it into the formatting structure. Then, the formatting function itself is perfectly re-entrant and even thread-safe, but the use of the formatting structure is not! That is, you just need to ensure that the formatting structure is only used sequentially. If you're unsure, or want to use one in say interrupt handler, you need to give it its own formatting structure.

For float on a 32-bit architecture, an uint32_t cache[5]; (20 bytes) would suffice.
For double on a 32-bit architecture, an uint32_t cache[34]; (136 bytes) would suffice.
So, we're not talking about that much of RAM reserved for just the formatting cache, especially since tracking the use of the formatting structure is so much easier than trying to track all possible call chains; I'd say the benefits are worth it.
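A minimal sketch of the cache-in-the-structure idea for the integer part of a float, assuming IEEE 754 Binary32. The float_format type and both function names are hypothetical; the only point is that the scratch limbs live in the caller-owned structure, so the function itself touches neither heap nor any static state.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical spec: formatting options plus the scratch limbs. */
typedef struct {
    int       decimals;   /* placeholder formatting option, unused here */
    uint32_t  cache[5];   /* scratch limbs, least significant first */
} float_format;

/* Divide the multi-limb value by ten in place; return the remainder. */
static int limbs_divmod10(uint32_t *limb, int count)
{
    uint32_t rem = 0;
    for (int i = count - 1; i >= 0; i--) {
        uint64_t t = ((uint64_t)rem << 32) | limb[i];
        limb[i] = (uint32_t)(t / 10u);
        rem     = (uint32_t)(t % 10u);
    }
    return (int)rem;
}

static int limbs_nonzero(const uint32_t *limb, int count)
{
    for (int i = 0; i < count; i++)
        if (limb[i])
            return 1;
    return 0;
}

/* Writes the integer part of |f| to out[] (at least 40 chars).
   Returns the number of digits, or -1 for infinities and NaNs. */
static int float_integer_digits(float f, float_format *spec, char *out)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);

    int      e = (int)((bits >> 23) & 0xFFu);
    uint32_t m = bits & 0x7FFFFFu;
    if (e == 255)
        return -1;
    if (e) m |= 0x800000u; else e = 1;

    /* |f| = m * 2^(e-150). Copy the bits with non-negative weight. */
    memset(spec->cache, 0, sizeof spec->cache);
    for (int k = 0; k < 24; k++) {
        int s = k + e - 150;                  /* bit k has weight 2^s */
        if (s >= 0 && (m & ((uint32_t)1 << k)))
            spec->cache[s / 32] |= (uint32_t)1 << (s % 32);
    }

    /* Right to left: each division by ten yields one digit. */
    int n = 0;
    do {
        out[n++] = (char)('0' + limbs_divmod10(spec->cache, 5));
    } while (limbs_nonzero(spec->cache, 5));

    /* out[] is an ordinary array here, so reversing in place is fine. */
    for (int i = 0, j = n - 1; i < j; i++, j--) {
        char c = out[i]; out[i] = out[j]; out[j] = c;
    }
    out[n] = '\0';
    return n;
}
```

Two contexts (say, the main loop and an interrupt handler) can call this concurrently as long as each passes its own float_format, which is exactly the usage rule described above.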

During operation, most numbers formatted will be relatively close to 1.0 in magnitude, i.e. absolute value between say 1e-10 and 1e10. Then, only the first cache limb (two for Binary64 fractional part) is ever nonzero, and huge speedups can be obtained. Exploiting this tends to complicate the code, although it does not really need to. So, the main reason I haven't posted one is that I haven't yet done the work needed to do it right, with code that does not make me cringe in shame. I probably should, because it is the sort of thing I can still accomplish (me being a burned out husk of a man), and it seems it would be useful to many; it'd delight me to be able to help that way.

(I even know how to implement the various tie-breaking rounding rules: floating-point on most architectures uses round exact-half to even.)
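For the three options mentioned earlier in the thread (truncate, round, round to even), a minimal sketch operating on an exact decimal digit string. The enum and round_digits() are hypothetical names, and the sketch assumes at least one digit is kept so that "even" has a digit to refer to.

```c
#include <assert.h>
#include <string.h>

enum round_mode { ROUND_TRUNCATE, ROUND_HALF_UP, ROUND_HALF_EVEN };

/* Rounds the digit string in place to its first 'keep' digits.
   Returns 1 if a carry propagated out of the first digit
   (the caller must then add one to whatever precedes it). */
static int round_digits(char *digits, int keep, enum round_mode mode)
{
    int len = (int)strlen(digits);
    int up = 0;

    assert(keep >= 1);
    if (keep >= len)
        return 0;                       /* nothing to drop */

    if (mode != ROUND_TRUNCATE) {
        int first = digits[keep] - '0'; /* first dropped digit */
        if (first > 5) {
            up = 1;
        } else if (first == 5) {
            /* Exactly half only if every later digit is zero. */
            int exact_half = 1;
            for (int i = keep + 1; i < len; i++)
                if (digits[i] != '0')
                    exact_half = 0;
            if (!exact_half || mode == ROUND_HALF_UP)
                up = 1;
            else
                up = (digits[keep - 1] - '0') & 1;  /* tie: go to even */
        }
    }

    digits[keep] = '\0';

    /* Propagate the increment right to left through any nines. */
    for (int i = keep - 1; up && i >= 0; i--) {
        if (digits[i] == '9') {
            digits[i] = '0';
        } else {
            digits[i]++;
            up = 0;
        }
    }
    return up;
}
```

With digits "125" and two kept, truncation gives "12", half-up gives "13", and half-to-even gives "12"; "135" becomes "14" under half-to-even.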

Quote from: eugene on August 19, 2022, 02:41:55 pm

But really I'm just looking for something I can find online. Academic papers are fine (I am trained in physics but self-taught in CS.)

That's why I was hoping Bruce could publish his work. The current ones use arbitrary-precision BigNum constructs instead of the fixed-point analogue approach, you see.

(I have a background in computational materials physics myself, but consider myself more of a toolmaker, since I've specialized in developing non-QM molecular simulators and cluster or distributed-parallel processing and such. No CUDA though, me dislike single-vendor dependencies.)

DiTBho · « **Reply #112 on:** August 19, 2022, 07:21:36 pm »

Quote from: Nominal Animal on August 19, 2022, 06:25:48 pm

Now, the writeonly_buffer_state() will report if the buffer was not large enough to format everything we wanted to

that's the same job done by safestring, silently and hidden so I can focus on other things

DiTBho · « **Reply #113 on:** August 19, 2022, 07:29:42 pm »

Quote from: Nominal Animal on August 19, 2022, 06:25:48 pm

Code: [Select]
typedef struct {
    unsigned int    flags;              /* INT_FORMAT_ flags */
    int             min_value;          /* Minimum value for clamping, inclusive */
    int             max_value;          /* Maximum value for clamping, inclusive */
    signed char     width;              /* Formatted total width */
    signed char     decimals;           /* Number of fractional digits */
    unsigned char   padding_char;       /* Padding character */
    unsigned char   point_char;         /* Decimal point character */
} int_format;

this is ok; basically, even if the options are expressed by a string, it could be parsed to extract all these points

Nominal Animal · « **Reply #114 on:** August 19, 2022, 07:31:36 pm »

Quote from: DiTBho on August 19, 2022, 07:21:36 pm

Quote from: Nominal Animal on August 19, 2022, 06:25:48 pm
Now, the writeonly_buffer_state() will report if the buffer was not large enough to format everything we wanted to
that's the same job done by safestring, silently and hidden so I can focus on other things

Yes, but note that it is only used before the buffer contents are needed. You, too, will have a function operating on safestring that checks for that at some point. I just already exposed that accessor function, that's all; you're still hiding it underneath safestring.

Basically, writeonly_buffer is an example of what your safestring will become, if you decide to support windowed buffering at some point: the generation of strings longer than the buffer itself.

Now, I still omitted the "stream" abstraction layer, the thing that can flush the already filled part of the buffer, making room for further data. That's the one that will be using writeonly_buffer_state() to decide when to do that and find out when the entire formatting operation has been done. Because I want my formatters to be usable even on static buffers, I cannot incorporate those into the writeonly buffers yet (whereas you can incorporate them into safestring, unless you too move to windowed buffers).

DiTBho · « **Reply #115 on:** August 19, 2022, 08:17:11 pm »

Two days ago I created a bit library to support artificial datatypes like uint128, uint256, uint512.

Code: [Select]

void test_umul128()
{
    uint32_t    i0;
    uint32_t    carry;
    uint128_t   xA;
    uint128_t   xB;
    uint128_t   xC;
    p_uint128_t p_xA;
    p_uint128_t p_xB;
    p_uint128_t p_xC;

    p_xA = get_address(xA);
    p_xB = get_address(xB);
    p_xC = get_address(xC);

    uint128_let_with(p_xA, 1);  /* A = 1 */
    uint128_let_with(p_xB, 10); /* B = 10 */
    uint128_let_with(p_xC, 0);  /* C = 0 */

    fshow("xA=0x", p_xA, "\n", "bhex");
    fshow("xB=0x", p_xB, "\n", "bhex");

    for (i0 = 1; i0 < 38; i0++)
    {
        carry = uint128_mul(p_xC, p_xA, p_xB); /* C = A x B */
        uint128_let_to(p_xC, p_xA);            /* A = C */
        fshow("cycle-", p_uint32(i0), " ", "bdec #0,2"); /* cycle-01 .. cycle-10 .. cycle-37 */
        fshow("10^", p_uint32(i0), "=", "bdec");         /* 10^1 .. 10^10 .. 10^37 */
        fshow("0x", p_xC, "\n", "bhex");
    }
    fshow("carry=0b", p_uint32(carry), "\n", "bbin"); /* overflow ? */
}

(fshow is the wrapper described some posts ago; it calls format_uint128 or format_uint32 depending on the datatype of its argument)

This is a real code example, see how things look in practice with the old string-options

(bhex is, at the moment, the only working format for those big numbers;
I am going to implement bdec as soon as possible)

DiTBho · « **Reply #116 on:** August 19, 2022, 08:28:20 pm »

p.s.
money-formats also need to be supported
"1000 euro" and "21 cents" = "1.000,21 euro"

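A minimal sketch of such a money format, treating the amount as an integer count of cents (the same fixed-point idea as in int-format.h). format_money() and its parameters are hypothetical; it handles non-negative amounts only, and the currency name is left to the caller.

```c
#include <assert.h>
#include <string.h>

/* Formats an amount given in cents with a thousands separator and a
   two-digit fraction. Returns the length, or -1 if out[] is too small. */
static int format_money(unsigned long cents, char group, char point,
                        char *out, int outsize)
{
    char          tmp[40];  /* digits are generated right to left */
    int           n = 0;
    int           count = 0;
    unsigned long whole = cents / 100u;
    unsigned int  frac  = (unsigned int)(cents % 100u);

    assert(out && outsize > 0);

    /* Two fractional digits and the decimal separator. */
    tmp[n++] = (char)('0' + frac % 10u);
    tmp[n++] = (char)('0' + frac / 10u);
    tmp[n++] = point;

    /* Whole part, inserting a group separator every three digits. */
    do {
        if (count && count % 3 == 0)
            tmp[n++] = group;
        tmp[n++] = (char)('0' + (int)(whole % 10u));
        whole /= 10u;
        count++;
    } while (whole);

    if (n >= outsize)
        return -1;

    /* Copy out in reading order. */
    for (int i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return n;
}
```

Calling format_money(100021, '.', ',', buf, sizeof buf) produces "1.000,21"; swapping the two separator arguments gives the English-style "1,000.21".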
DiTBho · « **Reply #117 on:** August 19, 2022, 11:52:07 pm »

Quote from: Nominal Animal on August 19, 2022, 07:31:36 pm

Basically, writeonly_buffer is an example of what your safestring will become, if you decide to support windowed buffering at some point: the generation of strings longer than the buffer itself.

I just checked what Elizabeth wrote as a note before the introduction of safestring years ago.

Quote

Turns out this is a stupid memcopy error!!!

Will someone please kill the C language ?
Or at least make a runtime lib that does boundary checking so that this kind of crap can never occur ?

Arrays should have a structure in them telling their size, as opposed to allowing memcopy to take whatever size you feed it.

The runtime for memory operation must do three things :
-1- store the real size of the array in the array
-2- flush a newly created array with all zeroes, upon resize flush the released space or claimed space with zeroes.
-3- each add must go through a method, which does boundary checking and invokes panic if found broken

I added her lines as motivation for the library in motivation.txt, after which I decided to write the library, as she asked me to, to reduce the time we wasted debugging similar stuff

Technically safestring operates on a circular buffer and there is a callback that can be used to *flush* things once tail reaches head (buffer full), the callback is currently NULL; the feature has never been really used, but ... it can be resumed ... to support windowed buffering.

Never tried, it's there because Elizabeth asked me it, then forgot about it, which is good, because I could recycle the code for other projects, including experimental stuff like myC

Which brings us to: who is Elizabeth? I guess you guessed it ... yes she is the boss, the person who leads and coordinates project preparation, and pays the salary, hence is the person to whom you can only say "yes ma'am, it will be implemented" ...

... and then forget about it

(well, in this case, it turned out to be a good idea)

SiliconWizard · « **Reply #118 on:** August 20, 2022, 12:13:38 am »

Blame the tools when you make a mistake!

Nominal Animal · « **Reply #119 on:** August 20, 2022, 06:04:34 am »

Quote from: DiTBho on August 19, 2022, 11:52:07 pm

Technically safestring operates on a circular buffer and there is a callback that can be used to *flush* things once tail reaches head (buffer full), the callback is currently NULL; the feature has never been really used, but ... it can be resumed ... to support windowed buffering.

There are some important differences: mine allows random write access into the entire virtual buffer and storing any contiguous window of it, whereas yours is limited to forward-going windows; and unlike yours, mine explicitly will not have the flush capability, because the windowing is controlled at a different level of abstraction. But yes.

Also, to implement some of the very useful operations making e.g. integer formatting even simpler, I really should implement writeonly_buffer_memcpy(wo,deststart,sourcestart,length) and writeonly_buffer_reverse(wo,deststart,sourcestart,length), because they can make formatting things so much easier. In particular, the digits can be initially stored in increasing order of importance to the buffer as part of obtaining the number of decimal digits in it, and then just swapped and moved to the correct place.

So it is definitely a work in progress.

I admit, I have found myself thinking about
len = format_float(charbufferptr, length, formatspecptr, floatval, fcacheptr);
len = format_double(charbufferptr, length, formatspecptr, doubleval, dcacheptr);
and their implementation on 32-bit architectures like ARM, using integer math only, for the last few hours. If I do get something in a form I can post, I shall start a new thread in here (Programming sub-forum) and post a note in this thread too. They will definitely not be "best" in any sense, but they might be useful and informative.

eugene · « **Reply #120 on:** August 21, 2022, 04:42:48 pm »

Quote from: Nominal Animal on August 19, 2022, 06:25:48 pm

Okay, here is an example program.
[...]

Thank you! It will take me a while to grok all of it. Actually, it will take me a while to get around to even attempting it, but I do have an upcoming project that will require me to store 32 bit floats and occasionally convert them to char arrays with limited precision. So I will almost certainly use some of what you wrote... eventually.

Quote from: Nominal Animal on August 19, 2022, 06:57:37 pm

Quote from: eugene on August 19, 2022, 02:41:55 pm
I like Dr Animal's idea of using a pointer to a struct instead of a format string, so a function declaration might look something like
Hey, it's my idea, not some goddamn PhD's! (And if that was a honorific, drop it: I'm a Finn, we don't use them. And even if we did, I'm not one; I'm just another Uncle Bumblefuck here trying to be of use.)

Sorry. That's a mistake I will not make a second time!

westfw · « **Reply #121 on:** August 21, 2022, 09:11:02 pm »

y'all are addressing technical issues without paying much attention to the cosmetic but very real desire of people to be able to tell what the output is going to look like by reading the source code.

IanB · « **Reply #122 on:** August 21, 2022, 09:49:24 pm »

Quote from: westfw on August 21, 2022, 09:11:02 pm

y'all are addressing technical issues without paying much attention to the cosmetic but very real desire of people to be able to tell what the output is going to look like by reading the source code.

I've long since stopped reading this thread.

It's insane how people can make such a palaver over such a simple requirement as printing out a number in a form suitable for accountants, scientists or engineers to read.

brucehoult · « **Reply #123 on:** August 21, 2022, 10:43:25 pm »

Quote from: IanB on August 21, 2022, 09:49:24 pm

Quote from: westfw on August 21, 2022, 09:11:02 pm
y'all are addressing technical issues without paying much attention to the cosmetic but very real desire of people to be able to tell what the output is going to look like by reading the source code.

I've long since stopped reading this thread.

It's insane how people can make such a palaver over such a simple requirement as printing out a number in a form suitable for accountants, scientists or engineers to read.

Oh gosh. Simple requirement doesn't imply simple implementation. That's like MS-DOS people asking why you'd want to waste all that lovely CPU power rendering the Mac UI.

And in this case, it's not all that much harder to do it right than to do it wrong, assuming you work from published algorithms rather than trying to roll your own.

Scientists and engineers might not care what is happening in the last decimal place, but accountants sure do!

PlainName · « **Reply #124 on:** August 21, 2022, 10:56:12 pm »

Quote

It's insane how people can make such a palaver over ...

Er, how would you know? You've stopped reading the thread, remember

DiTBho · « **Reply #125 on:** August 21, 2022, 11:50:16 pm »

I wish someone had commented on the old format options instead, or on the new struct format spec.

Probably both are needed, but I am not sure; what is for sure is that bloody printf does not need further comments.

IanB · « **Reply #126 on:** August 22, 2022, 04:43:03 am »

Quote from: DiTBho on August 21, 2022, 11:50:16 pm

Probably both are needed, but I am not sure, but for sure that bloody Printf does no need further comments.

There is only one certainty in the world. If you try to eliminate printf and replace it with atomic output functions, then sooner or later someone will reinvent printf or its equivalent. It is one of those things that has to exist.

IanB · « **Reply #127 on:** August 22, 2022, 04:53:21 am »

Quote from: brucehoult on August 21, 2022, 10:43:25 pm

Oh gosh. Simple requirement doesn't imply simple implementation. That's like MS-DOS people asking why you'd want to waste all that lovely CPU power rendering the Mac UI.

And in this case, it's not all that much harder to do it right than to do it wrong, assuming you work from published algorithms rather than trying to roll your own.

Scientists and engineers might not care what is happening in the last decimal place, but accountants sure do!

In the 1960's Fortran was able to do a perfectly satisfactory job of printing floating point numbers on machines with about 128 K of memory. Every compiler ever written since then has been able to do the same thing. If I ever need to write a compiler I will go back to original references. I see no reason to believe this is not a solved problem.

brucehoult · « **Reply #128 on:** August 22, 2022, 09:14:57 am »

Quote from: IanB on August 22, 2022, 04:53:21 am

Quote from: brucehoult on August 21, 2022, 10:43:25 pm
Oh gosh. Simple requirement doesn't imply simple implementation. That's like MS-DOS people asking why you'd want to waste all that lovely CPU power rendering the Mac UI.

And in this case, it's not all that much harder to do it right than to do it wrong, assuming you work from published algorithms rather than trying to roll your own.

Scientists and engineers might not care what is happening in the last decimal place, but accountants sure do!

In the 1960's Fortran was able to do a perfectly satisfactory job of printing floating point numbers on machines with about 128 K of memory. Every compiler ever written since then has been able to do the same thing. If I ever need to write a compiler I will go back to original references. I see no reason to believe this is not a solved problem.

Well, that's where you are wrong. I perfectly well remember writing FORTRAN programs in the 1970s and early 80s and getting a lot of results like 2.359998.

128 K of memory is a LOT. Doing a proper job needs less than 1 K, temporarily, during the conversion from double precision. Single precision needs a lot less -- something like 150 bytes.

IanB · « **Reply #129 on:** August 22, 2022, 04:52:52 pm »

Quote from: brucehoult on August 22, 2022, 09:14:57 am

Well, that's where you are wrong. I perfectly well remember writing FORTRAN programs in the 1970s and early 80s and getting a lot of results like 2.359998.

That's interesting. But of course, if you had asked for four decimal places you would have got 2.36.

Quote

128 K of memory is a LOT.

Could someone remind modern programmers, who think nothing of requiring 16 GB for a modern workstation?

DiTBho · « **Reply #130 on:** August 22, 2022, 09:43:37 pm »

Quote from: brucehoult on August 22, 2022, 09:14:57 am

Single precision needs a lot less -- something like 150 bytes.

I likely did it wrong: I implemented 256-bit unsigned logic because I need it for other stuff, so it got exploited for computing the base-10 printable fractional part of fp32_t.

You deal with numbers >~ 10^37 base10.

The uint256 library supports all the operators { +, -, *, /, %, logic, bitwise, cmp, shift, rotate }.
For MIPS32 it consumes 2 KB of ROM space, and 518 bytes all inclusive at run time (including stack space) for format-uint256 and the buffer used by safestring.

So, half a kilobyte of RAM from "show" to "chars sent on the serial port" to print an fp32 number by using uint256, safestring and format-uint256.

myC on MIPS4++ consumes 634 bytes.

DiTBho · « **Reply #131 on:** August 22, 2022, 09:58:23 pm »

I mean, extract the fractional part

Code:

integer_part "." fractional part

fractional part = { bit0=1/2, bit1=1/4, bit2=1/8, ... }

assume bit0 is 5000000000000...000 base10, uint256, limited < 244 bit
assume bit1 is 2500000000000...000 base10, uint256, limited < 244 bit
assume bit2 is 1250000000000...000 base10, uint256, limited < 244 bit
(values pre-calculated and stored in a LUT)

sum the contributions to get the final big uint256 number
convert the first(1) high n digits into a string
optionally, round the last digit on the right, instead of truncating

that's the job done.

a lot of cycles, no doubt about it
simple approach with no black magic tricks

(1) baseN to string works in reverse
from right to left, so ... you need an extra uint256bit division by 10^k here
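
A scaled-down sketch of the scheme above, assuming a 64-bit accumulator instead of uint256 and therefore only 19 fractional bits (10^19 is the largest power of ten below 2^64, and 10^19 / 2^19 is still an exact integer). The name frac_to_decimal and the on-the-fly halving of the weight (instead of a pre-calculated LUT) are illustrative choices, not the library described in this post.

```c
#include <stdint.h>

#define FRAC_BITS 19   /* hypothetical limit: 10^19 still fits in a uint64_t */

/* Convert a FRAC_BITS-bit binary fraction to exactly FRAC_BITS decimal digits.
 * Bit (FRAC_BITS-1) of frac weighs 1/2, the next lower bit 1/4, and so on.
 * Each weight is kept as an integer scaled by 10^19, so 1/2 is 5000...000. */
static void frac_to_decimal(uint32_t frac, char out[FRAC_BITS + 1])
{
    uint64_t weight = UINT64_C(10000000000000000000); /* 10^19 */
    uint64_t sum = 0;
    int i;

    /* Sum the contribution of every set bit. Halving is exact for all
     * 19 steps because 10^19 = 2^19 * 5^19. */
    for (i = FRAC_BITS - 1; i >= 0; i--) {
        weight /= 2;
        if (frac & (UINT32_C(1) << i))
            sum += weight;
    }

    /* Base-N to string works in reverse, from right to left. */
    for (i = FRAC_BITS - 1; i >= 0; i--) {
        out[i] = (char)('0' + (int)(sum % 10));
        sum /= 10;
    }
    out[FRAC_BITS] = '\0';
}
```

Passing 3 << 17 (binary 0.11) yields the digits 7500000000000000000, i.e. 0.75. A full fp32 conversion needs the wider integers discussed in this thread, because the smallest fraction bits need far more than 19 decimal digits.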

brucehoult · « **Reply #132 on:** August 23, 2022, 01:21:36 am »

Quote from: IanB on August 22, 2022, 04:52:52 pm

Quote
128 K of memory is a LOT.

Could someone remind modern programmers, who think nothing of requiring 16 GB for a modern workstation?

16 GB RAM is entirely reasonable for a *workstation*, given that:

1) working effectively requires multiple opened windows with web pages, PDFs etc. The ARMv8.1-A manual is over 50 MB and I have others that are hundreds of MB. Every shutter click of my rather old camera makes a 15 MB JPG and newer ones much more (let alone RAW). Movies for demonstrations, tutorials etc are many GB in size and need to be edited, filtered, compressed etc.

2) 16 GB DDR4 costs about 40 minutes of my salary (not even the grossed-up one including overheads and benefits etc). If 1 TB of RAM increased productivity by 10% it would be a completely reasonable purchase.

On the other hand, I and many others on this board are regularly writing code that RUNS on hardware with 64 KB, 16 KB, 2 KB (ATMega328), 512 bytes (ATTiny85) or even less RAM.

brucehoult · « **Reply #133 on:** August 23, 2022, 01:26:54 am »

Quote from: DiTBho on August 22, 2022, 09:43:37 pm

Quote from: brucehoult on August 22, 2022, 09:14:57 am
Single precision needs a lot less -- something like 150 bytes.

I likely did it wrong: I implemented 256-bit unsigned logic because I need it for other stuff, so it got exploited for computing the base-10 printable fractional part of fp32_t.

For single precision you need around 160 bits *per variable*. 256 is overkill, a little. But you need both numerator and denominator that size, plus a handful of other working variables. 150 bytes total is probably too generous, but I think 100 bytes is not quite enough.

DiTBho · « **Reply #134 on:** August 23, 2022, 01:40:59 am »

Quote from: brucehoult on August 23, 2022, 01:26:54 am

For single precision you need around 160 bits *per variable*. 256 is overkill, a little.

uint128 was not enough. The choice was between 128 and 256, but since I need uint256 for other stuff, I chose to use it even for this job.

brucehoult · « **Reply #135 on:** August 23, 2022, 02:02:54 am »

Quote from: DiTBho on August 23, 2022, 01:40:59 am

Quote from: brucehoult on August 23, 2022, 01:26:54 am
For single precision you need around 160 bits *per variable*. 256 is overkill, a little.

uint128 was not enough. The choice was between 128 and 256, but since I need uint256 for other stuff, I chose to use it even for this job.

Sure, of course. No harm in using integers slightly larger than absolutely necessary if you've already got the code and the memory space. On an 8 bit machine I think you technically could use 152 bits, but a 16 or 32 bit machine is going to round up to 160 bits, and a 64 bit machine to 192 bits.
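
The rounding in the paragraph above is just "round the bit count up to a whole number of machine words". A minimal sketch, where round_up_bits is a made-up helper name and 152 bits is taken as the raw requirement:

```c
#include <stddef.h>

/* Round a bit count up to the next multiple of the machine word size. */
static size_t round_up_bits(size_t bits, size_t word_bits)
{
    return (bits + word_bits - 1) / word_bits * word_bits;
}
```

With 152 bits required this gives 152 for 8-bit words, 160 for 16-bit and 32-bit words, and 192 for 64-bit words.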

westfw · « **Reply #136 on:** August 23, 2022, 06:54:36 am »

Was a bit surprised not to find (in a somewhat quick web search) a C implementation of Fortran formatted output. Something that would parse an actual Fortran format specifier string and produce compatible output. I would have thought that the inefficiency would be small compared to the general cost of I/O, and the speed of modern CPUs. Or maybe even mitigated by some sort of JIT interpreter or magic pre-processing. Just as a boon to interaction with old code.

Is there really no such thing?

brucehoult · « **Reply #137 on:** August 23, 2022, 08:04:15 am »

Picture that!

DiTBho · « **Reply #138 on:** August 23, 2022, 08:31:24 am »

I don't know Fortran. What don't you like about format-${datatype} struct specs (NominalAnimal's idea)? What do you like about Fortran formatted output?

Examples, please

I need to support Fortran only because it is mandatory in Gentoo.

Code:

Using built-in specs.
 --enable-languages=c,c++,fortran

Therefore I have its profile for Catalyst extended with +=ada, but I have never programmed anything in Fortran, and I am investing time with Haskell to master functors and functoids monads.

DiTBho · « **Reply #139 on:** August 23, 2022, 02:39:59 pm »

Quote from: brucehoult on August 23, 2022, 02:02:54 am

32 bit machine is going to round up to 160 bits, and a 64 bit machine to 192 bits.

ok, I abstracted the library support for both C and myC, and now I have
- uint128_t
- uint160_t
- uint192_t
- uint256_t

they are fully supported with all the operators { +, -, *, /, bitwise, logic, cmp, inc, dec, shift, rotate } and pointers.

% is missing, and sint versions are not yet supported; only unsigned logic for now.

What I love about myC is that you can do this:

Code:

uint192_t a0;
uint192_t a1;
uint192_t a2;
uint192_format_spec_t uint192_format_spec; /* yeah, I like the NominalAnimal's way */
dev_t dev;

a0 = 1;
a1 = 10^37;
a2 = a0 + a1;

p_dev = dev_open("/dev/ttyS1", "rw", panics_on_failure );

uint192_format_spec = 
{
    .width = 40; /* this way, it will be right-aligned, padded with ' ', which is the default */
};
a2'show(p_uint192_format_spec); /* it will invoke 'format, and will output on the default console */
a2'fshow(p_dev, p_uint192_format_spec); /* will output on the new console; in both cases safestring internally uses the dev's buffer */

a2'show2(p_dev, "C=", "\n", p_uint192_format_spec); /* show with prologue and epilogue */

dev_close(p_dev);

operations on uint192 datatype look native to the user while they are artificial.

C++ offers something similar, but you need to overload things.
With C ... you have explicit methods ${datatype}_let, ${datatype}_add, ${datatype}_show, etc ... a bit too verbose, but it's the price to pay.
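
For comparison, a rough sketch of what the explicit-method style looks like in plain C. The type layout (six 32-bit words, least significant first) and the function bodies are assumptions for illustration only; they are not the actual library discussed here.

```c
#include <stdint.h>

#define UINT192_WORDS 6

/* Hypothetical layout: six 32-bit words, w[0] least significant. */
typedef struct {
    uint32_t w[UINT192_WORDS];
} uint192_t;

/* ${datatype}_let: assign a small unsigned value. */
static void uint192_let(uint192_t *dst, uint32_t value)
{
    int i;
    dst->w[0] = value;
    for (i = 1; i < UINT192_WORDS; i++)
        dst->w[i] = 0;
}

/* ${datatype}_add: dst = a + b, returning the carry out of the top word.
 * dst may alias a or b. */
static uint32_t uint192_add(uint192_t *dst, const uint192_t *a, const uint192_t *b)
{
    uint32_t carry = 0;
    int i;
    for (i = 0; i < UINT192_WORDS; i++) {
        uint32_t s  = a->w[i] + carry;
        uint32_t c1 = (s < carry);      /* wrapped when adding the carry */
        uint32_t bw = b->w[i];
        s += bw;
        dst->w[i] = s;
        carry = c1 | (s < bw);          /* wrapped when adding b's word */
    }
    return carry;
}
```

So the myC line a2 = a0 + a1; becomes uint192_add(&a2, &a0, &a1); in this style, which is the verbosity mentioned above.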

Frankly, I do find it nice, useful, comfortable, light years ahead of printf.

How can people not appreciate it and love it at first sight?!?

p.s.
Approved even by Ania, she likes it a lot!

PlainName · « **Reply #140 on:** August 23, 2022, 02:47:53 pm »

Quote

a2'show(p_uint192_format_spec)

What is the ' character? Rather, what does it mean?

DiTBho · « **Reply #141 on:** August 23, 2022, 04:13:43 pm »

Quote from: dunkemhigh on August 23, 2022, 02:47:53 pm

Quote
a2'show(p_uint192_format_spec)

What is the ' character? Rather, what does it mean?

Ada style for property

In myC the lexer passes it as property_token.
The lexer works with a dictionary.
If you don't like a keyword you can easily redefine it.
It is possible even at runtime, but you'd better recompile the lib_tokener, because during the process the dictionary is optimized by auto hashes which make find 10x faster.

Candidates were : ' of

Of looked too verbose
' like ada, so I chose it

The parser can look at the datatype properties.
A method is a property of a datatype in myC.
When the parser sees a property token it looks at the datatype table; if the request matches and it's a method, it gets the function address of the method (e.g. show) and prepares the function call by passing parameters.
The first parameter is an inner pointer to the datatype, the second is an inner datatype kind.
What you call typedef enum in C: each datatype has its id.

All automatic, simpler than in C++, but ...
... every time you add a datatype in myC you have to recompile the compiler to fully use your datatype as a native datatype.
I recompile it ... Up to 5 times per day ... O man
Differential compile only compiles three modules and it takes 90 seconds on my macmini Intel core duo
Not bad

Anyway C++ is superior here in every way, even if it uses dot for methods, because it has classes and more complex mechanisms to overload stuff.

PlainName · « **Reply #142 on:** August 23, 2022, 05:46:21 pm »

Ah. Thanks

SiliconWizard · « **Reply #143 on:** August 23, 2022, 07:12:13 pm »

Just a question, but how did you validate the "myC" tool? From what I got, you use it for relatively safety critical stuff?
Validation would be a major endeavour and something I would tend to shy away from in practice for this reason.

DiTBho · « **Reply #144 on:** August 23, 2022, 07:55:35 pm »

Quote from: SiliconWizard on August 23, 2022, 07:12:13 pm

Just a question, but how did you validate the "myC" tool?

Each major version of our ICE support includes external (paid) activities to verify it.
And even this way, it cannot compete with Green Hills' tools if this is the question

edit: cut

DiTBho · « **Reply #145 on:** August 27, 2022, 12:24:25 am »

Quote from: westfw on August 23, 2022, 06:54:36 am

Was a bit surprised not to find (in a somewhat quick web search) a C implementation of Fortran formatted output. Something that would parse an actual Fortran format specifier string and produce compatible output. I would have thought that the inefficiency would be small compared to the general cost of I/O, and the speed of modern CPUs. Or maybe even mitigated by some sort of JIT interpreter or magic pre-processing. Just as a boon to interaction with old code.

Is there really no such thing?

is it like this?

Nominal Animal · « **Reply #146 on:** August 27, 2022, 06:02:24 pm »

I've done some testing for formatting floats, and it's looking pretty nice (not complete yet, though).

I'm using a temporary work area of 8×32 bits (32 bytes), six of which form a 6×28-bit unsigned integer ("work area", six 28-bit limbs) used in the calculation.

Technically, I'm calling m28f the fractional type, with limb order most significant first. That is, bit 27 in the first limb corresponds to value 2^-1 = 0.5, bit 0 to 2^-28 = 0.000000003725, and so on. I'm calling m28i the integral type, with limb order least significant first: bit 0 in the first limb corresponds to 2⁰ = 1, bit 27 to 2²⁷ = 134217728.

When moving the mantissa (with the implicitly set bit 23 (24th) if exponent is nonzero), the mantissa can span two limbs.
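
As a hedged sketch of that first step, here is one way to pull a float apart into a 24-bit mantissa and a power-of-two exponent before placing it into the limbs. It assumes float is IEEE-754 binary32; the struct and function names are made up for this example and are not taken from the test code described in this post. Infinities and NaNs (exponent field 255) are not handled.

```c
#include <stdint.h>
#include <string.h>

/* value = (negative ? -1 : 1) * mantissa * 2^exponent */
typedef struct {
    int      negative;
    int      exponent;
    uint32_t mantissa;   /* up to 24 significant bits */
} float_parts;

static float_parts float_decompose(float f)
{
    uint32_t bits, e;
    float_parts p;

    memcpy(&bits, &f, sizeof bits);   /* assumes a 32-bit IEEE-754 float */
    e = (bits >> 23) & 0xFFu;
    p.negative = (int)(bits >> 31);
    p.mantissa = bits & UINT32_C(0x7FFFFF);
    if (e != 0) {
        p.mantissa |= UINT32_C(1) << 23;  /* implicit bit 23 for normal numbers */
        p.exponent = (int)e - 150;        /* bias 127 plus 23 mantissa bits */
    } else {
        p.exponent = -149;                /* subnormal: no implicit bit */
    }
    return p;
}
```

For 0.75f this gives mantissa 0xC00000 and exponent -24. The sign of the exponent, together with the mantissa width, then decides how the bits are split between the integral and fractional limbs.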

The order can seem odd, but the first limb is always the one next to the decimal point. Furthermore, the key operation –– multiply by ten with carry for the fractional part, divide by 10 with remainder for the integer part –– proceeds always towards the first limb, starting at the furthest nonzero limb. Since it is trivial to keep track of the furthest nonzero limb (starting at when the mantissa is moved to the limbs, one only needs to check when it becomes zero, and move to the next closer limb), we do not need to check or conditionally operate on zero limbs; we only deal with the limbs that we need to, and no more.

What's with the 28 bit radix? Well, it turns out that using an extra 14% of memory for the limbs, all base operations stay 32-bit. Extracting a decimal digit from the integer part (divide by 10 with remainder) only requires two 32-bit multiplications, some bit shifts, and additions per active limb; and extracting a decimal digit from the fractional part (multiply by 10 with carry) only requires one, plus some bit shifts and a bitwise AND, per active limb. No floating-point or 64-bit operations are needed at all –– my test code does not use any of the compiler support functions (__udivdi3 et cetera) on x86, risc-v rv32gc, or 32-bit ARMs.

To divide a 32-bit number by ten, you simply multiply it by 3435973837 = 0xCCCCCCCD, and shift the result right by 35 bits. Most 32-bit architectures have a multiply-high instruction, which returns the 32 high bits, so the result only needs to be shifted right by three bits. (So, a divide by ten with remainder is two 32-bit multiplications (one multiply-high, one normal unsigned multiply), subtraction, and a shift right by three bits.)
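
A minimal sketch of that divide-by-ten step and of its use on the integer limbs. The function names are invented for this example, and the 64-bit cast is only a portable way to express multiply-high in C; on a 32-bit target a compiler would typically emit a single widening multiply for it, but that depends on the toolchain.

```c
#include <stdint.h>

#define LIMB_BITS 28

/* n / 10 for any 32-bit n: multiply by 0xCCCCCCCD and shift right by 35. */
static uint32_t div10_u32(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * UINT32_C(0xCCCCCCCD)) >> 35);
}

/* Divide a multi-limb integer (28-bit limbs, limbs[0] least significant)
 * by ten in place and return the remainder, i.e. the next decimal digit.
 * Proceeds from the furthest limb towards the first one. */
static unsigned limbs_div10(uint32_t *limbs, int count)
{
    uint32_t rem = 0;
    int i;
    for (i = count - 1; i >= 0; i--) {
        /* rem < 10 and limb < 2^28, so t < 10 * 2^28 < 2^32 */
        uint32_t t = (rem << LIMB_BITS) | limbs[i];
        uint32_t q = div10_u32(t);      /* multiply-high plus shift */
        rem = t - q * 10;               /* normal multiply plus subtraction */
        limbs[i] = q;
    }
    return (unsigned)rem;
}
```

Calling limbs_div10 repeatedly until all limbs are zero yields the decimal digits of the integer part from right to left.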

When multiplying by ten, the high 4 bits of the 32-bit result form the carry, which is added to the result of the next higher multiplication. This cannot overflow, because 10×0x0FFFFFFF+9 = 0x9FFFFFFF. So, a multiply by ten with carry is just one 32-bit multiplication and addition; plus a bit shift to extract the carry, and a bitwise AND to extract the result.
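
The multiply-by-ten step for the fractional limbs can be sketched as follows. The function name and signature are hypothetical; only the 28-bit limb layout (limbs[0] next to the decimal point) is taken from the description above.

```c
#include <stdint.h>

#define LIMB_BITS 28
#define LIMB_MASK UINT32_C(0x0FFFFFFF)

/* Multiply a multi-limb fraction (28-bit limbs, limbs[0] most significant)
 * by ten in place and return the digit that carries out past the decimal
 * point. Proceeds from the furthest limb towards the first one. */
static unsigned limbs_mul10(uint32_t *limbs, int count)
{
    uint32_t carry = 0;
    int i;
    for (i = count - 1; i >= 0; i--) {
        /* at most 10 * 0x0FFFFFFF + 9 = 0x9FFFFFFF, so no 32-bit overflow */
        uint32_t t = limbs[i] * 10 + carry;
        limbs[i] = t & LIMB_MASK;   /* low 28 bits stay in the limb */
        carry = t >> LIMB_BITS;     /* high 4 bits move to the next limb */
    }
    return (unsigned)carry;
}
```

Starting from the single limb 0x0C000000 (0.75), two calls return 7 and then 5, after which the limb is zero and no more digits remain.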

In summary, we're talking about exact precision and rounding according to IEEE-754 rules (or whatever you want), with something like a dozen or two cycles per decimal digit emitted, plus a few dozen cycles overhead, with a tight upper bound on the memory needed (which can be statically allocated beforehand, or allocated internally on stack via alloca()/__builtin_alloca()).

Finally, the exact same procedure applies to double precision, except that the temporary storage needs 39 limbs, and a temporary work area of about 1312 bits or 164 bytes. The mantissa is 52-bit (if subnormal) or 53-bit (high bit implicitly set if exponent is nonzero), so it can span three limbs initially. But the base operations stay exactly the same.

I find this rather exciting, I must say, since it looks like it is way more efficient than any standard C library implementation I've seen, while still capable of producing the exact same results. I am currently rejecting infinities and NaNs with an error, but they're trivial to add in if someone wants to. I am having a bit of a struggle to decide which rounding modes I want to support; IEEE defaults to round to nearest breaking ties to even, and I like round to nearest breaking ties away from zero, and while others are possible, it might be nice to KISS and omit stuff not needed/used at run time. For example, does anyone need to actually change the tie-breaking mode at runtime? I don't think so, but...
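
To make the tie-breaking question concrete, here is a small sketch of the rounding decision on decimal digits. It assumes the exact digits are available, as they are with the method above; the function name and the flag parameter are invented for this example, and a compile-time choice would work just as well as a runtime one.

```c
/* Decide whether the last kept digit must be incremented.
 *   last_digit   : the last digit that is kept
 *   next_digit   : the first digit that is dropped
 *   tail_nonzero : nonzero if any digit after next_digit is nonzero
 *   ties_to_even : nonzero for round-to-nearest-even,
 *                  zero for round-to-nearest, ties away from zero
 * The value is treated as a magnitude; the sign does not matter for
 * either of these two modes. */
static int should_round_up(unsigned last_digit, unsigned next_digit,
                           int tail_nonzero, int ties_to_even)
{
    if (next_digit > 5) return 1;
    if (next_digit < 5) return 0;
    if (tail_nonzero)   return 1;            /* strictly above the halfway point */
    return ties_to_even ? (int)(last_digit & 1u) : 1;
}
```

For example, 0.125 rounded to two decimals is an exact tie: ties-to-even keeps 0.12, ties-away-from-zero gives 0.13.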

DiTBho · « **Reply #147 on:** August 27, 2022, 11:31:25 pm »

For both fixed point and floating point, to show the decimal part I am using uint160_t; inside, it uses uint32 operations.

Functoids can distinguish between valid and not valid with a proper behavior (panic on NaN, overflow, underflow, div by zero) or just format a different string.

Love Haskell for this, but it's functional programming, difficult to replicate with imperative procedural programming, so I am using callbacks to mimic it in C.

westfw · « **Reply #148 on:** August 29, 2022, 12:19:58 am »

Quote

What don't you like about format-${datatype} struct specs (NominalAnimal's idea)?

I thought I already made clear that what I want is to be able to glance at the source code and tell what the output looks like.
A C printf format like "raw data = %6ld (0x%8lx), converted = %6.3f\n"
or a Fortran format like '"raw data = ", I6, " (0x", Z8, "), converted = ", F6.3'
makes that relatively clear, compared to a string of individual statements for each part (or compared to C++'s "streams").
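
As a small illustration of that point, a sketch using only standard snprintf; the wrapper name format_reading and the sample values are made up for this example.

```c
#include <stdio.h>

/* Format one reading into buf; the whole output layout is visible in the
 * single format string. Returns what snprintf returns. */
static int format_reading(char *buf, size_t size, long raw, double converted)
{
    return snprintf(buf, size,
                    "raw data = %6ld (0x%8lx), converted = %6.3f\n",
                    raw, (unsigned long)raw, converted);
}
```

With raw = 1234 and converted = 1.5 the buffer holds "raw data =   1234 (0x     4d2), converted =  1.500" followed by a newline, which can be read off the format string without running anything.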

Quote

What do you like about Fortran formatted output?

Quote

is it like this?

Yeah, more or less like that. Although those particular examples seem to delight in "remoting" the actual format string from the write statements, contrary to what I said I wanted above. I don't really LIKE Fortran formats, they've just stood the test of time pretty well, and seem to address some of the complaints I've seen about printf().

Have we mentioned symmetry of input and output formatting yet? (NOT the same as the symmetry of values.)
In Fortran you can write the values 123 and 456 with format "2I3" and get output like "123456", and then if you read with the same format you get two values 123 and 456 again, even without a delimiter between them. That was pretty important back when you were fitting things on 72-column cards (or files that emulated card decks). Maybe not so much any more.
(I had an interesting experience back in college. Me and another programmer were "hired" to write some Fortran code to pretty-print some data array. But we were not allowed to know what the data WAS (including stuff like the ranges of values, IIRC.) This turned out to be more difficult than you might imagine!)
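
For what it's worth, C's field widths can get part of the way to the "2I3" round trip. A hedged sketch, with made-up helper names; note that scanf's %3d skips leading blanks without counting them against the width, so the round trip is only reliable when both fields are completely filled with digits, which is weaker than the Fortran behaviour.

```c
#include <stdio.h>

/* Write two integers as two 3-character fields with no delimiter ("2I3"). */
static int write_2i3(char buf[7], int a, int b)
{
    return snprintf(buf, 7, "%3d%3d", a, b);
}

/* Read them back with the same widths. Returns the number of values read. */
static int read_2i3(const char *buf, int *a, int *b)
{
    return sscanf(buf, "%3d%3d", a, b);
}
```

Writing 123 and 456 produces "123456", and reading that string back yields 123 and 456 again.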

DiTBho · « **Reply #149 on:** August 29, 2022, 01:01:09 am »

Quote from: westfw on August 29, 2022, 12:19:58 am

Makes that relatively clear

The prologue and the epilogue look like a good compromise to me.
Now my interest is only in the string form of "specs".
The Fortran way, maybe something from it.

