Author Topic: Best thread-safe "printf", and why does printf need the heap for %f etc? (Read 16159 times)

DiTBho · « **Reply #100 on:** August 18, 2022, 07:51:37 pm »

Quote from: SiliconWizard on August 18, 2022, 07:29:07 pm

So ${datatype} is some kind of macro parameter?

no, it's a practical way to express this

void format_uint64
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_uint64_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_sint64
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_sint64_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_uint32
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_uint32_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_sint32
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_sint32_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_uint16
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_uint16_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_sint8
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_sint8_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_uint8
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_uint8_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_boolean
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_boolean_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_string
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_string_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_char
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_char_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_fp64
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_fp64_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_fp32
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_fp32_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_fx1616
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_fx1616_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_fx32r10
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_fx32r10_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_cplx_fp64
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_cplx_fp64_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_cplx_fp32
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_cplx_fp32_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_cplx_fx1616
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_cplx_fx1616_t p_data,
    p_char_t epilogue,
    p_char_t options
)
void format_cplx_fx32r10
(
    p_safestring_t p_safestring,
    p_char_t prologue,
    p_cplx_fx32r10_t p_data,
    p_char_t epilogue,
    p_char_t options
)

brucehoult · « **Reply #101 on:** August 19, 2022, 12:10:04 am »

Quote from: DiTBho on August 18, 2022, 01:32:37 pm

Code: [Select]
95: Ashley | 0001033438392 | Wellington, New Zeal 2015- 5- 24
96: Aloha | 0001087651234 | Hawaii, United State 2015- 5- 27
97: Jack | 0001082840184 | Beijing, China 2015- 5- 30
1220: Ashley | 0001033438392 | Wellington, New Zeal 2015- 6- 2
1221: Aloha | 0001087651234 | Hawaii, United State 2015- 6- 5
1222: Jack | 0001082840184 | Beijing, China 2015- 6- 8

NZ, represent! Punching above its weight.

DiTBho · « **Reply #102 on:** August 19, 2022, 09:35:42 am »

"options" needs to be better defined.

padding in Ada has a lot of support, and it's very useful even for strings

e.g.
padding '_' width 10, "hAllo"
len("hAllo") = 5
means padding 5 x "_"
"hAllo_____"

* * *please, can someone define "options" in terms of syntax?* * *

thanks
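For reference, the string padding example above could be sketched in plain C roughly like this. pad_right() and its parameters are made-up names for illustration, not part of any existing library, and the sketch only pads on the right:

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst, then append pad_char until the result is `width`
 * characters long.  dst_size is the capacity of dst, including the NUL.
 * Returns the resulting length, or 0 if an argument is NULL or dst is
 * too small.  (Hypothetical helper, for illustration only.) */
size_t pad_right(char *dst, size_t dst_size, const char *src,
                 char pad_char, size_t width)
{
    if (!dst || !src)
        return 0;

    size_t len = strlen(src);
    size_t total = (len > width) ? len : width;  /* never truncate src */

    if (total + 1 > dst_size)
        return 0;

    memcpy(dst, src, len);
    memset(dst + len, pad_char, total - len);
    dst[total] = '\0';
    return total;
}
```

Calling pad_right(buf, sizeof buf, "hAllo", '_', 10) with a 16-byte buf returns 10 and leaves "hAllo_____" in buf.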

DiTBho · « **Reply #103 on:** August 19, 2022, 09:50:59 am »

"precision" is a very difficult argument for "options" because it affects the way the fractional part is evaluated, and therefore how the fractional sub-string is built. I have no idea * how * to define it for

- fixedpoint
- floatingpoint

At the moment, I only need to support 32bit floating point, and 32bit fractional, so I am using 10x8byte=128+16=144bit unsigned algebra to evaluate the fractional part, truncate it as specified by "f<n>" in "options" (how many fractional digits?) and convert it into string.

"f0" 3.1415 -> nothing
"f1" 3.1415 -> fractionalpart = 1
"f2" 3.1415 -> fractionalpart = 14
"f3" 3.1415 -> fractionalpart = 141
"f4" 3.1415 -> fractionalpart = 1415
"f5" 3.1415 -> fractionalpart = 0

so it always works at the highest internal precision possible and just truncates the unwanted fractional digits. It works, but it is a complete waste of resources (CPU cycles, code space, RAM at run time for the LUT and buffers, etc.) when you only need "low precision for fast responses" or simply don't care about high internal precision.
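For the binary fixed-point case, the truncating "f<n>" behaviour can be had one digit at a time, so that only n steps are ever computed. This is only a sketch under assumptions: the value is an unsigned 16.16 fixed-point number held in a uint32_t, and fx1616_frac_digits() is a hypothetical name, not the code described above.

```c
#include <stdint.h>
#include <stddef.h>

/* Write exactly n decimal digits of the fractional part of an unsigned
 * 16.16 fixed-point value into out, truncating everything after them.
 * out must have room for n + 1 chars.  Returns n. */
size_t fx1616_frac_digits(uint32_t value, size_t n, char *out)
{
    uint32_t frac = value & 0xFFFFu;    /* fraction, in units of 1/65536 */

    for (size_t i = 0; i < n; i++) {
        frac *= 10u;                    /* at most 655350, fits in 32 bits */
        out[i] = (char)('0' + (frac >> 16));  /* integer part is the digit */
        frac &= 0xFFFFu;                /* keep only the remaining fraction */
    }
    out[n] = '\0';
    return n;
}
```

With value 0x00034000 (3.25) and n = 3 this produces "250"; with n = 0 it produces an empty string, matching the "f0" case.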

-

I'd also like to mimic the "Eng" display mode of my CASIO FX9860GIII graphing pocket calculator

you type "1000", the display shows "1K"
you type "1/1000", the display shows "1m"
you type "1100", the display shows "1.1K"
you type "1010", the display shows "1.01K"
you type "1001", the display shows "1K"

nice to have

PlainName · « **Reply #104 on:** August 19, 2022, 10:29:33 am »

Quote

to evaluate the fractional part, truncate it as specified by "f<n>"

Should n+1 be evaluated, if not displayed, so the result can be rounded up? "f3" would then be 142 which is mathematically more appropriate than a simple string truncation.

Nominal Animal · « **Reply #105 on:** August 19, 2022, 11:25:40 am »

Quote from: DiTBho on August 19, 2022, 09:35:42 am

"options" needs to be better defined.

True.

The reason I haven't defined it well enough is that I haven't yet found anything that works well enough.
(I also didn't realize that in show_type(value), you included a lot of the things in type that I put in explicit arguments.)

If we ignore formatted-string formatters (specifically, providing the options as part of the formatting string), then it is just a pointer to a private structure passed to the formatting function. In other words, whatever that particular formatter happens to want.

For example, if we have a formatter that can support both integers and binary fixed point types, we might have

Code: [Select]

#define  INTEGER_OPTION_PREPAD  (1)  /* Padding before sign */
#define  INTEGER_OPTION_MIDPAD  (3)  /* Padding between sign and digits */
#define  INTEGER_OPTION_POSTPAD (2)  /* Padding after digits */

struct integer_options {
    const int      typeid;     /* For checking that the user passed a pointer to a valid structure */
    int            min_value;  /* Values below this are clamped to this value */
    int            max_value;  /* Values above this are clamped to this value */
    unsigned int   opts;       /* collection of INTEGER_OPTION_ flags */
    signed char    width;      /* Positive if fixed to a specific limit */
    signed char    markerpos;  /* Nonnegative if conversion places a decimal point */
    unsigned char  pad;        /* Padding character, usually '0' or space */
    unsigned char  marker;     /* Decimal point character */
};

When formatting-string formatting is used, it might be better to split that into two, so that the options from the formatting strings are passed separately. Otherwise we need a union of structures with a common prefix consisting of the typeid, and a const pointer to the formatting-string optional parts in string form... I still haven't found anything I really like for that case.

Note that the destination would contain basically the things you put in ${datatype}.

One extremely useful thing I have found is windowed buffers. In the case that we have some room in the buffer, but not enough, it is sometimes useful to call the formatter twice with the exact same data, but with a different part of the output saved in the buffer. (This is how we deal with e.g. partial framebuffers on intelligent displays: we draw a small slice of the display, transfer that, then redraw the display but buffering another part, and so on. For example, some of the Adafruit Arduino display libraries can do this.)

At minimum, it could be a pointer to

Code: [Select]

struct destination {
    int  pos;  /* Current position in the buffer */
    int  head;  /* First accessible index in the buffer */
    int  tail;  /* The index following the last accessible one in the buffer */
    unsigned int  status;  /* For recording formatting event issues */
    unsigned char  data[];
};

#define  DESTINATION_MISSED_BEFORE  (1<<0)  /* An attempt to write before head occurred */
#define  DESTINATION_MISSED_AFTER  (1<<1)  /* An attempt to write at or after tail occurred */
#define  DESTINATION_MISSED  (DESTINATION_MISSED_BEFORE | DESTINATION_MISSED_AFTER)

static inline int destination_pos(struct destination *dst) { return (dst) ? dst->pos : 0; }

static inline void destination_set(struct destination *dst, int index, unsigned char value)
{
    if (!dst) {
        return;
    } else
    if (index < dst->head) {
        dst->status |= DESTINATION_MISSED_BEFORE;
        return;
    } else
    if (index >= dst->tail) {
        dst->status |= DESTINATION_MISSED_AFTER;
        return;
    } else {
        dst->data[index - dst->head] = value;
    }
}

static inline void destination_advance(struct destination *dst, int len)
{
    /* TODO: Check for overflow/wrap.  Add a dst->status flag to indicate that happened. */
    if (dst)
        dst->pos += len;
}

where you always use destination_set(dst, index, chr) to store stuff in the buffer, with destination_pos(dst) the initial index you "write" to, and destination_advance(dst, len) setting the emitted length. It just adds len to the position, and you can do it first or last; just use negative offsets relative to destination_pos(dst) (if you advance before set), or nonnegative offsets (if you advance after set).

This way, instead of the formatter function trying to decide when to flush the buffer or not, it is up to whoever is calling the formatter function to handle that.
If (dst->status & DESTINATION_MISSED) is nonzero, then not all of the formatting was captured in the buffer.

Would an actual, compilable example C code help you see how this would work in practice?
I would have included a practical example here, but it is very hot and humid right now where I am, and my brain is running very slow; sorry.

DiTBho · « **Reply #106 on:** August 19, 2022, 12:12:14 pm »

Quote from: dunkemhigh on August 19, 2022, 10:29:33 am

result [..] rounded up?

good point!
- just truncated at digit n
- rounded at digit n

two different options!

DiTBho · « **Reply #107 on:** August 19, 2022, 01:11:59 pm »

Quote from: Nominal Animal on August 19, 2022, 11:25:40 am

Would an actual, compilable example C code help you see how this would work in practice?

Yup, I am already using this stuff while I am developing other stuff.
Examples are useful to see if they match needs

eugene · « **Reply #108 on:** August 19, 2022, 02:41:55 pm »

First, let me say that I am like the child listening to the adults having a conversation, which is completely fine until the child speaks...

Anyway, my interest is in formatting floating point (rarely) and fixed point (often) on MCUs with limited resources. A sprintf() style function would be fine, except that I don't want to link the entire sprintf() into my code. I like Dr Animal's idea of using a pointer to a struct instead of a format string, so a function declaration might look something like

int fmt_float(char *buf, float value, fmt_struct *fmt);

which returns some status information. Converting fixed point to a char array is not too hard even for me, but I wonder if you guys can point me to a good (efficient) algorithm to convert floating point (32 or 64 bit) to a char array without consuming more resources than required. That seems to be exactly the subject of this thread; using only the resources needed to get a result that's good enough. Algorithms are being tossed around as though each of you expect the others to already be familiar with them, but I'm not.

If it makes Dr Animal feel good to write a 10000 word post expounding all of the details, I won't deny him that pleasure.

But really I'm just looking for something I can find online. Academic papers are fine (I am trained in physics but self-taught in CS.)

Nominal Animal · « **Reply #109 on:** August 19, 2022, 06:25:48 pm »

Okay, here is an example program.

Do note that this is not something I can suggest as-is, and is just work-in-progress. I'm very happy to hear ideas and suggestions for improvement, too, but do note that this is mainly intended to show how my above examples would work in practice. It is all licensed under CC0-1.0 (i.e. public domain, do as you wish, just don't try to sue me for any damages you cause if you do use it), too.

First, here is the example C program, example.c that you can compile and run under any OS. (Well, I used Linux, but it should compile and run everywhere.)

Code: [Select]

// SPDX-License-Identifier: CC0-1.0
//
#include <stdlib.h>
#include <stdio.h>
#include "writeonly-buffer.h"
#include "int-format.h"

int main(void)
{
    /* Define a buffer, */
    unsigned char     mybuf_data[50];
    /* and declare the write-only buffer that can write into it. */
    writeonly_buffer  mybuf = WRITEONLY_BUFFER(mybuf_data, sizeof mybuf_data);

    /* Define a fixed-point decimal integer formatting, */
    const int_format  myfix_format = {
        .flags = INT_FORMAT_USE_PLUS | INT_FORMAT_POINT,
        .min_value = -999999,
        .max_value = +999999,
        .width = 0,  /* Not set, so whatever minimum width is needed */
        .decimals = 3,
        .point_char = '.'
    };
    /* and check it is valid. */
    if (int_format_invalid(&myfix_format)) {
        fprintf(stderr, "Oops, myfix_format is invalid.\n");
        return EXIT_FAILURE;
    }

    /* Format something. */

    int  x = 45267;
    int  y = -13;

    format_string(&mybuf, "x is ");
    format_int(&mybuf, x, NULL);
    format_string(&mybuf, " and y is ");
    format_int(&mybuf, y, &myfix_format);

    /* Finalise the buffer. */
    int  len = writeonly_buffer_finish(&mybuf);

    /* Check for errors. */
    if (len < 0) {
        fprintf(stderr, "writeonly_buffer_finish() failed: error %d.\n", len);
        return EXIT_FAILURE;
    } else
    if (writeonly_buffer_state(&mybuf)) {
        fprintf(stderr, "writeonly_buffer_state() reported %d.\n", writeonly_buffer_state(&mybuf));
        return EXIT_FAILURE;
    }

    /* Show what we have. */
    printf("Constructed a string containing %d characters: \"%s\".\n", len, mybuf_data);

    return EXIT_SUCCESS;
}

The format_type() calls in the middle are the salient point, as well as the definition of the fixed point integer formatting options (which I named spec, because I realized "specification" describes its purpose better than "options") just preceding it. Note that I omitted the status checks from the formatting themselves, and instead moved it to the writeonly_buffer_finish().

The fixed point decimal integer type means that the integer value represents the same fixed point number with the decimal point omitted. Because it only requires dropping in the decimal point at the needed spot, I folded these into the same formatting facility.
You can also play with the .width and .flags, especially INT_FORMAT_ constants in the last file, to see how you can use that very same formatter to format integers to a specific number of characters with leading spaces, padded with zeroes with the sign on the extreme left, including + sign for positive numbers, the .min_value and .max_value clamping, and so on.

Just note that if you ask it to do the impossible, like show 6 decimals but keep width down to 5 characters, it will do wonky output. I was too lazy to implement the width checks in int_format_invalid(), which would be responsible for checking that the formatting choices are acceptable.

Now, the writeonly_buffer_state() will report if the buffer was not large enough to format everything we wanted to. The program will report an error in that case, but a more sensible program or embedded firmware could do the formatting in a loop, and just move the head,tail part so that no matter how small the buffer is, we eventually get all of the formatted content. Sure, it is slower than just dynamically allocating a buffer large enough, but remember, we're talking about stuff intended for very memory-constrained environments, and the ability to work even with very small buffers may come in useful!

So, let's look at that write-only buffer stuff next. writeonly-buffer.h:

Code: [Select]

// SPDX-License-Identifier: CC0-1.0
//
#ifndef   WRITEONLY_BUFFER_H
#define   WRITEONLY_BUFFER_H
#include <limits.h>

typedef struct {
    unsigned int    state;
    int             pos;
    int             head;
    int             tail;
    unsigned char  *data;
} writeonly_buffer;

#define  WRITEONLY_BUFFER(dataref, size)    \
    {   .state = 0,                         \
        .pos   = 0,                         \
        .head  = 0,                         \
        .tail  = (size),                    \
        .data  = (dataref)  }

#define  WRITEONLY_BUFFER_STATE_BEFORE      1   /* Data store attempt before head */
#define  WRITEONLY_BUFFER_STATE_AFTER       2   /* Data store attempt at or after tail */
#define  WRITEONLY_BUFFER_STATE_OVERFLOW    4   /* pos wraparound or limit exceeded */

static inline unsigned int  writeonly_buffer_state(writeonly_buffer *wo)
{
    return (wo) ? wo->state : 0;
}

static inline int  writeonly_buffer_pos(writeonly_buffer *wo)
{
    return (wo) ? wo->pos : -1;
}

static inline void  writeonly_buffer_commit(writeonly_buffer *wo, int pos)
{
    if (!wo)
        return;
    else
    if (pos < wo->pos)
        wo->state |= WRITEONLY_BUFFER_STATE_OVERFLOW;
    else
        wo->pos = pos;
}

static inline void  writeonly_buffer_set(writeonly_buffer *wo, int pos, int ch)
{
    if (!wo)
        return;
    else
    if (pos < wo->head) {
        wo->state |= WRITEONLY_BUFFER_STATE_BEFORE;
        return;
    } else
    if (pos >= wo->tail) {
        wo->state |= WRITEONLY_BUFFER_STATE_AFTER;
        return;
    } else {
        wo->data[pos - wo->head] = (unsigned char)ch;
        return;
    }
}

static inline int  writeonly_buffer_finish(writeonly_buffer *wo)
{
    if (!wo)
        return -1; /* No buffer specified */

    if (wo->state & WRITEONLY_BUFFER_STATE_OVERFLOW)
        return -2; /* Buffer len (position) overflow */

    /* Add string-terminating NUL char */
    if (wo->pos >= wo->head && wo->pos < wo->tail)
        wo->data[wo->pos - wo->head] = '\0';

    /* Return the length of the data emitted to the buffer. */
    return wo->pos;
}

/*
 * String and single-character formatters.
*/

__attribute__((unused))
static void format_string(writeonly_buffer *wo, const char *src)
{
    /* Nothing to add? */
    if (!src || !*src)
        return;

    int  pos = writeonly_buffer_pos(wo);

    while (*src)
        writeonly_buffer_set(wo, pos++, *(src++));

    writeonly_buffer_commit(wo, pos);
}

__attribute__((unused))
static void format_char(writeonly_buffer *wo, int ch)
{
    /* Nothing to add? */
    if (ch <= 0 || ch > UCHAR_MAX)
        return;

    int  pos = writeonly_buffer_pos(wo);

    writeonly_buffer_set(wo, pos++, ch);
    writeonly_buffer_commit(wo, pos);
}

#endif /* WRITEONLY_BUFFER_H */

The writeonly_buffer structure near the beginning is the key here.

I probably should have named the pos field the len field, because it indicates where the current string construction point is. It is updated by a call to writeonly_buffer_commit(), specifying the new position/length. data points to the current real data buffer (window, not a complete buffer), where the (tail-head) char positions starting at position head are stored at. When data is the entire buffer, then head is zero, and tail is the length of that buffer. That's what the WRITEONLY_BUFFER() macro does for you: initializes the structure members that way, zeroing the initial position/length.

The state member is a bit cookie tracking things related to how the buffer was accessed. If someone tries to commit the buffer backwards, the bits in WRITEONLY_BUFFER_STATE_OVERFLOW will be set in state (currently, bit 2, value 2²=4). If someone tries to set already flushed buffer data (i.e., data before head, the start of the current window), then WRITEONLY_BUFFER_STATE_BEFORE gets set. If someone tries to set buffer data past the current window (at or after tail, thus indicating that a larger buffer or another pass is needed), WRITEONLY_BUFFER_STATE_AFTER gets set.

(So, when using a smaller buffer than the stuff we want to format, our initial formatting will finish with WRITEONLY_BUFFER_STATE_AFTER. We write out the buffered data, set head to what tail was, and add the buffer size to tail, and do the formatting calls again. We now expect to see WRITEONLY_BUFFER_STATE_BEFORE. When WRITEONLY_BUFFER_STATE_AFTER is no longer set, we have the last (pos-head) chars in the buffer. When we have written those out, we're fully done. This way does need the formatting to be wrapped inside a do..while loop, where the loop condition function is one that writes the buffered data out, and only lets the loop exit when all of the formatted data is printed. I can show a separate example of that if you want, but I haven't yet even verified this works correctly...)
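A minimal sketch of that repeat-formatting loop, under several assumptions: wo_buf, wo_set() and wo_string() are condensed stand-ins for the writeonly_buffer functions above (without the NULL checks) so that the block compiles on its own; sink_write() is a hypothetical output function that just appends to a string where real firmware would write to a UART or display; and pos and state are reset at the start of every pass, which the description above does not spell out.

```c
#include <stddef.h>
#include <string.h>

/* Condensed stand-in for the thread's writeonly_buffer. */
typedef struct {
    unsigned int   state;
    int            pos, head, tail;
    unsigned char *data;
} wo_buf;

#define WO_BEFORE 1u    /* store attempt before head */
#define WO_AFTER  2u    /* store attempt at or after tail */

static void wo_set(wo_buf *wo, int pos, int ch)
{
    if (pos < wo->head)       wo->state |= WO_BEFORE;
    else if (pos >= wo->tail) wo->state |= WO_AFTER;
    else                      wo->data[pos - wo->head] = (unsigned char)ch;
}

static void wo_string(wo_buf *wo, const char *src)
{
    int pos = wo->pos;
    while (*src)
        wo_set(wo, pos++, *src++);
    wo->pos = pos;
}

/* Hypothetical sink: appends n chars to the NUL-terminated string out. */
static void sink_write(char *out, const unsigned char *src, int n)
{
    strncat(out, (const char *)src, (size_t)n);
}

/* Builds "x is <a> and y is <b>" in out through a window of `size` chars
 * (1..16), re-running the same formatting calls until all is emitted.
 * out must be large enough for the whole result. */
void format_windowed(char *out, const char *a, const char *b, int size)
{
    unsigned char window[16];

    if (!out || !a || !b || size < 1 || size > (int)sizeof window)
        return;

    wo_buf wo = { .state = 0, .pos = 0, .head = 0, .tail = size, .data = window };
    out[0] = '\0';

    do {
        wo.state = 0;   /* forget the flags of the previous pass */
        wo.pos = 0;     /* every pass formats from the very beginning */

        wo_string(&wo, "x is ");
        wo_string(&wo, a);
        wo_string(&wo, " and y is ");
        wo_string(&wo, b);

        /* Chars captured in this pass: head up to min(pos, tail). */
        int end = (wo.pos < wo.tail) ? wo.pos : wo.tail;
        if (end > wo.head)
            sink_write(out, window, end - wo.head);

        /* Slide the window forward for the next pass. */
        wo.head = wo.tail;
        wo.tail += size;
    } while (wo.state & WO_AFTER);
}
```

With an 8-char window, format_windowed(out, "45267", "-0.013", 8) needs four passes to assemble "x is 45267 and y is -0.013" in out. The cost is that every pass repeats all of the formatting work, which is the trade-off described above.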

The __attribute__((unused)) just tells the compiler to not complain if one of the helper functions is not used.

Note that the inline here is mostly for us humans; the compiler treats it as no more than a hint and is free to ignore it. I use static inline for helper/accessor type trivial functions, and static for local functions. It helps me think about the functions in an organized manner.

There are a lot of safety checks, but that is intentional. Making sure that only the valid parts of the buffer are accessed is worth the extra cost; even passing NULL pointers should be absolutely safe.

The writeonly_buffer_set() function is the one formatters will use to set any character in the logical buffer, in whatever order they want. When they have "written" a chunk, they then call writeonly_buffer_commit() to set the position/length they think should be now completed.

The buffer is write-only, because we cannot support access to already written/set data without keeping it in memory. That's just something we need to deal with, that's all.

Finally, let's take a look at how the format_int() is implemented. int-format.h:

Code: [Select]

// SPDX-License-Identifier: CC0-1.0
//
#ifndef   INT_FORMAT_H
#define   INT_FORMAT_H
#include <limits.h>
#include "writeonly-buffer.h"

#define  INT_FORMAT_PREPAD           1  /* Padding before sign */
#define  INT_FORMAT_MIDPAD           3  /* Padding between sign and digits */
#define  INT_FORMAT_POSTPAD          2  /* Padding after digits */
#define  INT_FORMAT_PADDING          3  /* Padding selection mask */
#define  INT_FORMAT_OMIT_MINUS       4  /* Omit '-' sign even if negative */
#define  INT_FORMAT_USE_PLUS         8  /* Use '+' sign if positive */
#define  INT_FORMAT_POINT           16  /* Add decimal point */

typedef struct {
    unsigned int    flags;              /* INT_FORMAT_ flags */
    int             min_value;          /* Minimum value for clamping, inclusive */
    int             max_value;          /* Maximum value for clamping, inclusive */
    signed char     width;              /* Formatted total width */
    signed char     decimals;           /* Number of fractional digits */
    unsigned char   padding_char;       /* Padding character */
    unsigned char   point_char;         /* Decimal point character */
} int_format;

static const int_format  default_int_format = {
    .flags = 0,             /* No padding, signed integers, only use - if negative, no decimal point */
    .min_value = INT_MIN,   /* No clamping */
    .max_value = INT_MAX,   /* No clamping */
    .width = 0,             /* Unspecified */
    .decimals = 0,          /* None */
    .padding_char = ' ',    /* Default padding would be with spaces */
    .point_char = '.',      /* Default decimal point is '.' */
};

static int  int_format_invalid(const int_format *spec) {
    /* TODO: Verify sanity of formatting spec */
    (void)spec;  /* For now, just silence any warnings about unused parameters... */
    return 0;
}

static inline int  uint_decimal_digits(unsigned int value)
{
    int  digits = 1;

    /* TODO: Implement more efficient way, e.g. an if tree. */

    while (value >= 1000) {
        value /= 1000;
        digits += 3;
    }
    while (value >= 10) {
        value /= 10;
        digits += 1;
    }

    return digits;
}

static void format_int(writeonly_buffer *wo, int value, const int_format *spec)
{
    /* Position in buffer. */
    int  pos = writeonly_buffer_pos(wo);
    /* We can drop out, if there is no buffer to write to. */
    if (pos < 0)
        return;

    /* If NULL spec, we use the default integer format. */
    if (!spec)
        spec = &default_int_format;

    /* Apply clamping. */
    if (value > spec->max_value)
        value = spec->max_value;
    if (value < spec->min_value)
        value = spec->min_value;

    /* The magnitude of the value to be formatted.  Negating as unsigned avoids undefined behaviour for INT_MIN. */
    unsigned int  absval = (value < 0) ? (0u - (unsigned int)value) : (unsigned int)value;

    /* Count how many decimal digits we'll need. */
    int  digits = uint_decimal_digits(absval);
    if (digits <= spec->decimals)
        digits = spec->decimals + 1;

    /* Actual width, and number of padding characters. */
    int  width = digits + (!!(spec->flags & INT_FORMAT_POINT))
               + ((value < 0) ? (!(spec->flags & INT_FORMAT_OMIT_MINUS)) : 0)
               + ((value > 0) ? (!!(spec->flags & INT_FORMAT_USE_PLUS)) : 0)
               ;
    int  padding = (spec->width > width) ? spec->width - width : 0;

    /* Prepad? */
    if (padding && (spec->flags & INT_FORMAT_PADDING) == INT_FORMAT_PREPAD) {
        while (padding-->0)
            writeonly_buffer_set(wo, pos++, spec->padding_char);
    }

    /* Sign? */
    if (value < 0 && !(spec->flags & INT_FORMAT_OMIT_MINUS))
        writeonly_buffer_set(wo, pos++, '-');
    else
    if (value > 0 && (spec->flags & INT_FORMAT_USE_PLUS))
        writeonly_buffer_set(wo, pos++, '+');

    /* Midpad? */
    if (padding && (spec->flags & INT_FORMAT_PADDING) == INT_FORMAT_MIDPAD) {
        while (padding-->0)
            writeonly_buffer_set(wo, pos++, spec->padding_char);
    }

    /* Digits and decimal point, if any. */
    if ((spec->flags & INT_FORMAT_POINT)) {
        pos += digits;
        for (int d = 0; d <= digits; d++) {
            if (d == spec->decimals) {
                writeonly_buffer_set(wo, pos - d, spec->point_char);
            } else {
                writeonly_buffer_set(wo, pos - d, '0' + (absval % 10));
                absval /= 10;
            }
        }
        pos++;
    } else {
        pos += digits;
        for (int d = 1; d <= digits; d++) {
            writeonly_buffer_set(wo, pos - d, '0' + (absval % 10));
            absval /= 10;
        }
    }

    /* Postpad? */
    if (padding && (spec->flags & INT_FORMAT_PADDING) == INT_FORMAT_POSTPAD) {
        while (padding-->0)
            writeonly_buffer_set(wo, pos++, spec->padding_char);
    }

    /* Commit. */
    writeonly_buffer_commit(wo, pos);
}

#endif /* INT_FORMAT_H */

I'm not "happy" with the format_int() implementation, but it should suffice as an example. (Note how simple the format_string() one defined in writeonly-buffer.h is for comparison. That one is so simple it doesn't take any spec/options as a parameter.)
This integer formatting implementation is based on the right-to-left conversion, repeatedly dividing the integer by ten and using the remainder as the next digit to be set in order of increasing importance.

Note that because the buffer is write-only, we cannot just temporarily save the digits to the beginning of the buffer, then reverse and insert the decimal point afterwards. That's why it uses the call to uint_decimal_digits() to find out how many digits will be needed.

The format_int() function first obtains the current position/length using a call to writeonly_buffer_pos(), then sets characters in a wonky order using writeonly_buffer_set(), and finally sets the new position/length using a call to writeonly_buffer_commit(). This is common to all formatters. Everything else, including how they use the spec or whether they even accept one, is up to each formatter.

(To support formatting-string formatting, I would "register" each formatting function with the associated spec. For the example program, one could use for example "i" for NULL spec, i.e. default signed integer formatting, and "i3.3" for myfix_format. This does require that the conversion specifier has both a start and an end character, and most other languages are using braces; so that's why I used braces too. The example program formatting call would then be format_using(target, "x is {1i} and y is {2i3.3}", &x, &y) for example. The target wouldn't be mybuf, but a stream handle, that would contain a mybuf and a function pointer to output buffers, so that format_using() can do the repeat-formatting do-while loop with any size of stream buffer.)
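To make the brace syntax a bit more concrete, here is a rough sketch of how a single conversion specifier such as {2i3.3} could be split into its argument index and spec name. Everything here is an assumption for illustration: the placeholder type, the parse_placeholder() name, the 7-character limit on the spec name, and the rule that the index is the leading run of digits. Looking up the registered formatter for the spec name is left out.

```c
#include <stddef.h>

/* One parsed "{<index><spec>}" conversion specifier (hypothetical type). */
typedef struct {
    int  index;     /* which argument to format, as written in the string */
    char spec[8];   /* registered spec name, e.g. "i" or "i3.3" */
} placeholder;

/* Parse one placeholder starting at s (s[0] must be '{').
 * Returns a pointer just past the closing '}', or NULL if malformed.
 * On failure the contents of *ph are unspecified. */
const char *parse_placeholder(const char *s, placeholder *ph)
{
    if (!s || !ph || *s != '{')
        return NULL;
    s++;

    /* Argument index: one or more decimal digits. */
    if (*s < '0' || *s > '9')
        return NULL;
    int index = 0;
    while (*s >= '0' && *s <= '9') {
        if (index > 999)
            return NULL;            /* reject absurdly large indexes */
        index = index * 10 + (*s++ - '0');
    }

    /* Spec name: everything up to the closing brace. */
    size_t n = 0;
    while (*s && *s != '}') {
        if (n + 1 >= sizeof ph->spec)
            return NULL;            /* spec name too long */
        ph->spec[n++] = *s++;
    }
    if (*s != '}')
        return NULL;                /* unterminated placeholder */

    ph->spec[n] = '\0';
    ph->index = index;
    return s + 1;
}
```

For the input "{2i3.3} tail" this fills in index 2 and spec "i3.3" and returns a pointer to " tail". Because the index is the leading digits, a spec name in this sketch must not itself begin with a digit.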

I don't know why anyone would ever use INT_FORMAT_POSTPAD, but I added that because of symmetry. Also, I consider integer zero signless, so even if you use INT_FORMAT_USE_PLUS, zero won't have a sign. And while the code should implement all the formatting features implied by the flags and the formatting structure, there probably are bugs in it, because again, it's a hot, humid Friday evening, and my brain is in slow mode.
And once again, I curse at not having learned to write better comments from the get go when I first learned to program. It is damned hard to learn to write them well afterwards.

eutectique · « **Reply #110 on:** August 19, 2022, 06:29:24 pm »

Quote from: DiTBho on August 19, 2022, 12:12:14 pm

- just truncated at digit n
- rounded at digit n

two different options!

- round to even

Make it three.
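The three options differ only in what happens to the digits that get dropped. A small sketch on scaled integers, assuming the value already holds all the digits and divisor is the power of ten being dropped (the enum and round_div() are hypothetical names):

```c
#include <stdint.h>

enum round_mode { ROUND_TRUNCATE, ROUND_HALF_UP, ROUND_HALF_EVEN };

/* Divide value by divisor (which must be nonzero) and resolve the
 * remainder according to mode. */
uint32_t round_div(uint32_t value, uint32_t divisor, enum round_mode mode)
{
    uint32_t q = value / divisor;
    uint32_t r = value % divisor;
    uint32_t rest = divisor - r;    /* distance up to the next multiple */

    /* Comparing r with rest is the same as comparing 2*r with divisor,
     * but cannot overflow. */
    switch (mode) {
    case ROUND_HALF_UP:
        if (r >= rest)
            q++;
        break;
    case ROUND_HALF_EVEN:
        if (r > rest || (r == rest && (q & 1u)))
            q++;                    /* exact ties go to the even neighbour */
        break;
    case ROUND_TRUNCATE:
    default:
        break;
    }
    return q;
}
```

Dropping the last digit of 1415 (divisor 10) gives 141 when truncating and 142 in both rounding modes; for 1425 it gives 142, 143 and 142 respectively, which is where half-up and half-even part ways.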

Nominal Animal · « **Reply #111 on:** August 19, 2022, 06:57:37 pm »

Quote from: eugene on August 19, 2022, 02:41:55 pm

I like Dr Animal's idea of using a pointer to a struct instead of a format string, so a function declaration might look something like

Hey, it's my idea, not some goddamn PhD's! (And if that was an honorific, drop it: I'm a Finn, we don't use them. And even if we did, I'm not one; I'm just another Uncle Bumblefuck here trying to be of use.)

Quote from: eugene on August 19, 2022, 02:41:55 pm

I wonder if you guys can point me to a good (efficient) algorithm to convert floating point (32 or 64 bit) to a char array without consuming more resources than required.

The basic idea that bruce and I discussed can be described as first splitting the floating point number at the binary point, and handling the fractional part first, and then the integer part. For Binary32 (float), each part is treated like a 128 or 152-bit unsigned integer; for Binary64 (double), like a 1024 or 1077-bit unsigned integer. Of course, there is no direct support in C for such huge numbers, so we use the BigInt approach: multiple "limbs" of suitable size, typically machine native word size.

The fractional part is converted by repeatedly multiplying it by ten, and taking the ensuing integer part (and clearing or subtracting it from the value); this gives the decimal digits in order of descending importance (left to right, starting at just after the decimal point).

The integer part is converted either by repeatedly subtracting the largest power of ten not larger than the value itself, or by repeatedly dividing the value by ten and obtaining the remainder. The repeated subtraction (the number of subtractions needed gives the decimal digit corresponding to that power of ten) gives the decimal digits in order of descending importance (left to right), and the divide by ten and use the remainder gives the decimal digits in order of ascending importance (right to left).
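As a rough illustration of the fractional half of that description, here is a sketch that extracts the first n fractional decimal digits of a float by truncation, using five 32-bit limbs as a fixed-point fraction. It is not the code bruce and I discussed; float_frac_digits() is a hypothetical name, it assumes float is IEEE 754 Binary32, it skips the integer part and rounding entirely, and it keeps the limbs on the stack for brevity.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define FRAC_LIMBS 5    /* 5 x 32 = 160 bits; Binary32 fractions need at most 149 */

/* Write the first n fractional decimal digits of |f| into out (truncated,
 * NUL-terminated).  out must have room for n + 1 chars.
 * Returns n, or 0 for infinities and NaNs, which are not handled. */
size_t float_frac_digits(float f, size_t n, char *out)
{
    if (!out)
        return 0;

    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);     /* assumes a 32-bit IEEE 754 float */

    uint32_t expo = (bits >> 23) & 0xFFu;
    uint32_t mant = bits & 0x7FFFFFu;
    if (expo == 0xFFu) {
        out[0] = '\0';
        return 0;
    }

    int e;                              /* |f| == mant * 2^e */
    if (expo) {
        mant |= 0x800000u;              /* implicit leading one */
        e = (int)expo - 127 - 23;
    } else {
        e = -126 - 23;                  /* subnormal */
    }

    /* limb[0] holds weights 2^-1..2^-32, limb[1] 2^-33..2^-64, and so on. */
    uint32_t limb[FRAC_LIMBS] = { 0 };
    for (int i = 0; i < 24; i++) {
        int k = -(e + i);               /* mantissa bit i has weight 2^-k */
        if (k >= 1 && ((mant >> i) & 1u))
            limb[(k - 1) / 32] |= UINT32_C(1) << (31 - (k - 1) % 32);
    }

    /* Multiply the whole fraction by ten; the carry out of the top limb
     * is the next decimal digit. */
    for (size_t d = 0; d < n; d++) {
        uint32_t carry = 0;
        for (int j = FRAC_LIMBS - 1; j >= 0; j--) {
            uint64_t t = (uint64_t)limb[j] * 10u + carry;
            limb[j] = (uint32_t)t;
            carry = (uint32_t)(t >> 32);
        }
        out[d] = (char)('0' + carry);
    }
    out[n] = '\0';
    return n;
}
```

For example, float_frac_digits(3.25f, 2, buf) leaves "25" in buf, and asking for nine digits of 0.1f gives "100000001", which shows the exact binary value rather than the decimal literal.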

My suggestion is that instead of using heap or stack for those temporary bits, one puts them into the formatting structure. Then, the formatting function itself is perfectly re-entrant and even thread-safe, but the use of the formatting structure is not! That is, you just need to ensure that the formatting structure is only used sequentially. If you're unsure, or want to use one in, say, an interrupt handler, you need to give it its own formatting structure.

For float on a 32-bit architecture, a uint32_t cache[5]; (20 bytes) would suffice.
For double on a 32-bit architecture, a uint32_t cache[34]; (136 bytes) would suffice.
So, we're not talking about that much RAM reserved for just the formatting cache, especially since tracking the use of the formatting structure is so much easier than trying to track all possible call chains; I'd say the benefits are worth it.
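
The sizes above follow from rounding the 152-bit and 1077-bit widths up to whole 32-bit limbs. A minimal sketch of that arithmetic, with one cache per execution context so that each cache is only ever used sequentially, might look like this. All names here (LIMBS_FOR_BITS, float_cache_sketch, double_cache_sketch, float_cache_for) are hypothetical and only illustrate the idea.

```c
#include <stdint.h>

#define LIMB_BITS 32u
/* Number of 32-bit limbs needed to hold `bits` bits, rounded up. */
#define LIMBS_FOR_BITS(bits) (((bits) + LIMB_BITS - 1u) / LIMB_BITS)

/* Scratch space for float: 152 bits -> 5 limbs -> 20 bytes. */
typedef struct {
    uint32_t cache[LIMBS_FOR_BITS(152u)];
} float_cache_sketch;

/* Scratch space for double: 1077 bits -> 34 limbs -> 136 bytes. */
typedef struct {
    uint32_t cache[LIMBS_FOR_BITS(1077u)];
} double_cache_sketch;

/* One cache per context: the formatter stays re-entrant because it
   never touches scratch memory that the caller did not hand in. */
static float_cache_sketch main_float_cache;
static float_cache_sketch isr_float_cache;

static float_cache_sketch *float_cache_for(int in_interrupt)
{
    return in_interrupt ? &isr_float_cache : &main_float_cache;
}
```

An interrupt handler would pass float_cache_for(1) and the main loop float_cache_for(0), so the two can never trample each other's intermediate limbs.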

During operation, most numbers formatted will be relatively close to 1.0 in magnitude, i.e. absolute value between say 1e10 and 1e-10. Then, only the first cache limb (two for Binary64 fractional part) is ever nonzero, and huge speedups can be obtained. This tends to complicate the code, although it does not really need to. So, the main reason I haven't posted one is that I haven't yet done the work needed to do it right, with code that does not make me cringe in shame. I probably should, because it is the sort of thing I can still accomplish (me being a burned out husk of a man), and it seems it would be useful to many; it'd delight me to be able to help that way.
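
One way the speedup could be had without much extra code is to trim zero limbs from the top and drop to plain word arithmetic once a single limb remains. The following is only a sketch under that assumption, for the integer part; limbs_used and int_digits_reversed are hypothetical names, and the digits come out least significant first.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of limbs up to and including the highest nonzero one
   (limb[0] is least significant). */
static size_t limbs_used(const uint32_t *limb, size_t n)
{
    while (n > 0 && limb[n - 1] == 0u) n--;
    return n;
}

/* Emits the decimal digits of the value, least significant first.
   Returns the digit count, or 0 if out is too small. Clobbers limb[]. */
static size_t int_digits_reversed(uint32_t *limb, size_t n,
                                  char *out, size_t outsize)
{
    size_t count = 0;
    n = limbs_used(limb, n);

    /* Slow path: multi-limb division, only over the limbs still in use. */
    while (n > 1) {
        uint32_t rem = 0;
        for (size_t i = n; i-- > 0; ) {
            uint64_t t = ((uint64_t)rem << 32) | limb[i];
            limb[i] = (uint32_t)(t / 10u);
            rem = (uint32_t)(t % 10u);
        }
        if (count >= outsize) return 0;
        out[count++] = (char)('0' + rem);
        n = limbs_used(limb, n);
    }

    /* Fast path: the remaining value fits in one machine word. */
    uint32_t v = (n == 1) ? limb[0] : 0u;
    do {
        if (count >= outsize) return 0;
        out[count++] = (char)('0' + v % 10u);
        v /= 10u;
    } while (v != 0u);
    if (n == 1) limb[0] = 0u;
    return count;
}
```

For a value that already fits in one limb, the multi-limb loop is never entered at all, which is the common case described above.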

(I even know how to implement the various tie-breaking rounding rules: floating-point on most architectures uses round exact-half to even.)
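
For reference, round-half-to-even applied to an exact string of decimal digits can be sketched as below. This is an illustration of the tie-breaking rule only, not anyone's posted implementation; round_half_even is a hypothetical name, and it assumes the digit string is the exact expansion, so a tie can be detected by checking that everything after the first dropped '5' is zero.

```c
#include <stddef.h>

/* digits[0..len-1] are exact decimal digits '0'..'9' (writable buffer).
   Keeps the first `keep` digits, rounding ties to even, and terminates
   the string there. Returns 1 if the rounding carried out of the leftmost
   digit (the caller must then prepend a '1'), otherwise 0. */
static int round_half_even(char *digits, size_t len, size_t keep)
{
    int round_up;

    if (keep >= len) return 0;          /* nothing is dropped */

    if (digits[keep] != '5') {
        round_up = (digits[keep] > '5');
    } else {
        /* An exact half only if every later digit is zero. */
        int tail_nonzero = 0;
        for (size_t i = keep + 1; i < len; i++)
            if (digits[i] != '0') tail_nonzero = 1;
        if (tail_nonzero)
            round_up = 1;               /* above half */
        else                            /* tie: round up only if last kept digit is odd */
            round_up = (keep > 0) && (((digits[keep - 1] - '0') & 1) != 0);
    }

    digits[keep] = '\0';
    if (!round_up) return 0;

    /* Propagate the increment leftwards through any nines. */
    for (size_t i = keep; i-- > 0; ) {
        if (digits[i] != '9') { digits[i]++; return 0; }
        digits[i] = '0';
    }
    return 1;
}
```

With two digits kept, "125" becomes "12" and "135" becomes "14" (ties go to the even neighbour), while "1251" becomes "13" because the dropped tail is above one half.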

Quote from: eugene on August 19, 2022, 02:41:55 pm

But really I'm just looking for something I can find online. Academic papers are fine (I am trained in physics but self-taught in CS.)

That's why I was hoping Bruce could publish his work. The current ones use arbitrary-precision BigNum constructs instead of the fixed-point analogue approach, you see.

(I have a background in computational materials physics myself, but consider myself more of a toolmaker, since I've specialized in developing non-QM molecular simulators and cluster or distributed-parallel processing and such. No CUDA though, me dislike single-vendor dependencies.)

DiTBho · « **Reply #112 on:** August 19, 2022, 07:21:36 pm »

Quote from: Nominal Animal on August 19, 2022, 06:25:48 pm

Now, the writeonly_buffer_state() will report if the buffer was not large enough to format everything we wanted to

that's the same job done by safestring, silently and hidden so I can focus on other things

DiTBho · « **Reply #113 on:** August 19, 2022, 07:29:42 pm »

Quote from: Nominal Animal on August 19, 2022, 06:25:48 pm

Code: [Select]
typedef struct {
    unsigned int  flags;         /* INT_FORMAT_ flags */
    int           min_value;     /* Minimum value for clamping, inclusive */
    int           max_value;     /* Maximum value for clamping, inclusive */
    signed char   width;         /* Formatted total width */
    signed char   decimals;      /* Number of fractional digits */
    unsigned char padding_char;  /* Padding character */
    unsigned char point_char;    /* Decimal point character */
} int_format;

this is ok; basically, even if the option is expressed by a string, it could be parsed to extract all these fields
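
As a rough illustration of how a structure like this could drive a formatter, here is a sketch. It is not the formatter from the quoted post: format_int_sketch is a hypothetical name, the flags field is ignored, and treating the value as a fixed-point number with `decimals` fractional digits is an assumption. Digits are produced right to left into the output buffer and reversed at the end; padding always goes to the left of the sign.

```c
#include <stddef.h>

typedef struct {
    unsigned int  flags;         /* ignored in this sketch */
    int           min_value;     /* clamp lower bound, inclusive */
    int           max_value;     /* clamp upper bound, inclusive */
    signed char   width;         /* minimum total width */
    signed char   decimals;      /* number of fractional digits */
    unsigned char padding_char;  /* left padding character */
    unsigned char point_char;    /* decimal point character */
} int_format;

/* Returns the number of characters written (excluding the NUL),
   or 0 if the buffer is too small. */
static size_t format_int_sketch(char *buf, size_t size,
                                const int_format *fmt, int value)
{
    size_t n = 0;
    int digits = 0;
    int decimals = (fmt->decimals > 0) ? fmt->decimals : 0;
    size_t width = (fmt->width > 0) ? (size_t)fmt->width : 0;

    if (value < fmt->min_value) value = fmt->min_value;
    if (value > fmt->max_value) value = fmt->max_value;

    int negative = (value < 0);
    /* Unsigned negation avoids overflow for the most negative int. */
    unsigned int mag = negative ? 0u - (unsigned int)value : (unsigned int)value;

    /* Least significant digit first; the point goes in after `decimals` digits. */
    do {
        if (decimals > 0 && digits == decimals) {
            if (n + 1 >= size) return 0;
            buf[n++] = (char)fmt->point_char;
        }
        if (n + 1 >= size) return 0;
        buf[n++] = (char)('0' + mag % 10u);
        mag /= 10u;
        digits++;
    } while (mag != 0u || digits <= decimals);

    if (negative) {
        if (n + 1 >= size) return 0;
        buf[n++] = '-';
    }
    while (n < width) {
        if (n + 1 >= size) return 0;
        buf[n++] = (char)fmt->padding_char;
    }

    /* Everything was written back to front; swap it into reading order. */
    for (size_t i = 0, j = n - 1; i < j; i++, j--) {
        char c = buf[i]; buf[i] = buf[j]; buf[j] = c;
    }
    buf[n] = '\0';
    return n;
}
```

With min_value -1000, max_value 1000, width 8, decimals 2, a space as padding_char and ',' as point_char, the value 12345 is clamped to 1000 and comes out as "   10,00".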

Nominal Animal · « **Reply #114 on:** August 19, 2022, 07:31:36 pm »

Quote from: DiTBho on August 19, 2022, 07:21:36 pm

Quote from: Nominal Animal on August 19, 2022, 06:25:48 pm
Now, the writeonly_buffer_state() will report if the buffer was not large enough to format everything we wanted to
that's the same job done by safestring, silently and hidden so I can focus on other things

Yes, but note that it is only used before the buffer contents are needed. You, too, will have a function operating on safestring that checks for that at some point. I just already exposed that accessor function, that's all; you're still hiding it underneath safestring.

Basically, writeonly_buffer is an example of what your safestring will become, if you decide to support windowed buffering at some point: the generation of strings longer than the buffer itself.

Now, I still omitted the "stream" abstraction layer, the thing that can flush the already filled part of the buffer, making room for further data. That's the one that will be using writeonly_buffer_state() to decide when to do that and find out when the entire formatting operation has been done. Because I want my formatters to be usable even on static buffers, I cannot incorporate those into the writeonly buffers yet (whereas you can incorporate them into safestring, unless you too move to windowed buffers).
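
The windowing idea, generating a string longer than the buffer by keeping only one window of it at a time, can be sketched as follows. This is not the writeonly_buffer code from the earlier post; window_buf and its functions are hypothetical names, and re-running the same formatting calls once per window is just one assumed way of driving it.

```c
#include <stddef.h>

typedef struct {
    char  *data;   /* storage for the visible window */
    size_t size;   /* window length in bytes */
    size_t start;  /* virtual offset that data[0] corresponds to */
    size_t pos;    /* virtual write position; keeps counting outside the window */
} window_buf;

static void window_init(window_buf *w, char *data, size_t size, size_t start)
{
    w->data = data;
    w->size = size;
    w->start = start;
    w->pos = 0;
}

/* Stores the byte only if its virtual position falls inside the window. */
static void window_put(window_buf *w, char c)
{
    if (w->pos >= w->start && w->pos - w->start < w->size)
        w->data[w->pos - w->start] = c;
    w->pos++;
}

static void window_puts(window_buf *w, const char *s)
{
    while (*s) window_put(w, *s++);
}

/* Number of valid bytes currently in the window. */
static size_t window_filled(const window_buf *w)
{
    if (w->pos <= w->start) return 0;
    size_t n = w->pos - w->start;
    return (n < w->size) ? n : w->size;
}

/* Nonzero when output continued past the end of the window. */
static int window_truncated(const window_buf *w)
{
    return w->pos > w->start && w->pos - w->start > w->size;
}
```

Formatting "hello world" three times through a 4-byte window at offsets 0, 4 and 8 yields "hell", "o wo" and "rld"; window_truncated() tells the caller whether another pass is needed.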

DiTBho · « **Reply #115 on:** August 19, 2022, 08:17:11 pm »

Two days ago I created a bit library to support artificial datatypes like uint128, uint256, uint512.

Code: [Select]

void test_umul128()
{
    uint32_t    i0;
    uint32_t    carry;
    uint128_t   xA;
    uint128_t   xB;
    uint128_t   xC;
    p_uint128_t p_xA;
    p_uint128_t p_xB;
    p_uint128_t p_xC;

    p_xA = get_address(xA);
    p_xB = get_address(xB);
    p_xC = get_address(xC);

    uint128_let_with(p_xA, 1);  /* A = 1 */
    uint128_let_with(p_xB, 10); /* B = 10 */
    uint128_let_with(p_xC, 0);  /* C = 0 */

    fshow("xA=0x", p_xA, "\n", "bhex");
    fshow("xB=0x", p_xB, "\n", "bhex");

    for (i0 = 1; i0 < 38; i0++)
    {
        carry = uint128_mul(p_xC, p_xA, p_xB); /* C = A x B */
        uint128_let_to(p_xC, p_xA);            /* A = C */
        fshow("cycle-", p_uint32(i0), " ", "bdec #0,2"); /* cycle-01 .. cycle-10 .. cycle-37 */
        fshow("10^", p_uint32(i0), "=", "bdec");         /* 10^1 .. 10^10 .. 10^37 */
        fshow("0x", p_xC, "\n", "bhex");
    }
    fshow("carry=0b", p_uint32(carry), "\n", "bbin"); /* overflow ? */
}

(fshow is the wrapper described some posts ago here; it calls format_uint128 or format_uint32 depending on the datatype of its argument)

This is a real code example, see how things look in practice with the old string-options

(bhex is, at the moment, the only working format for those big numbers; I am going to implement bdec as soon as possible)
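
For readers without such a library at hand, the carry-returning multiply that the test above relies on can be approximated with four 32-bit limbs. This is a sketch only, not the library's uint128_mul: u128_sketch, u128_set and u128_mul_small are hypothetical names, and the multiplier is limited to a single 32-bit word, which is enough for computing powers of ten.

```c
#include <stdint.h>

/* 128-bit unsigned value as four 32-bit limbs, limb[0] least significant. */
typedef struct {
    uint32_t limb[4];
} u128_sketch;

static void u128_set(u128_sketch *x, uint32_t value)
{
    x->limb[0] = value;
    x->limb[1] = 0u;
    x->limb[2] = 0u;
    x->limb[3] = 0u;
}

/* x = x * m, in place. Returns the 32 bits that did not fit into
   128 bits; a nonzero return means the product overflowed. */
static uint32_t u128_mul_small(u128_sketch *x, uint32_t m)
{
    uint32_t carry = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t t = (uint64_t)x->limb[i] * m + carry;
        x->limb[i] = (uint32_t)t;
        carry = (uint32_t)(t >> 32);
    }
    return carry;
}
```

Starting from 1 and multiplying by 10 repeatedly, the carry stays zero through 10^38 (which is below 2^128) and first becomes nonzero at 10^39.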

DiTBho · « **Reply #116 on:** August 19, 2022, 08:28:20 pm »

p.s.
money-formats also need to be supported
"1000 euro" and "21 cents" = "1.000,21 euro"
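
A money format of that shape needs little more than digit grouping. The sketch below is an assumption about what such a formatter could look like, not part of any library in this thread: format_euro_sketch is a hypothetical name, the '.' group separator, ',' decimal separator and " euro" suffix are hard-coded, and the amount is passed as whole euros plus cents.

```c
#include <stddef.h>
#include <string.h>

/* Writes e.g. "1.000,21 euro" into buf. Returns the string length,
   or 0 if cents > 99 or the buffer is too small. */
static size_t format_euro_sketch(char *buf, size_t size,
                                 unsigned long euros, unsigned int cents)
{
    static const char tail[] = " euro";
    unsigned long t = euros;
    size_t ndigits = 1;

    while (t >= 10u) { t /= 10u; ndigits++; }

    size_t intlen = ndigits + (ndigits - 1) / 3;     /* digits plus separators */
    size_t total = intlen + 3 + (sizeof tail - 1);   /* ",cc" and the suffix */
    if (cents > 99u || total + 1 > size) return 0;

    /* Fill the integer part from its right edge, a '.' every three digits. */
    size_t p = intlen;
    unsigned int group = 0;
    do {
        if (group == 3) { buf[--p] = '.'; group = 0; }
        buf[--p] = (char)('0' + euros % 10u);
        euros /= 10u;
        group++;
    } while (euros != 0u);

    buf[intlen]     = ',';
    buf[intlen + 1] = (char)('0' + cents / 10u);
    buf[intlen + 2] = (char)('0' + cents % 10u);
    memcpy(buf + intlen + 3, tail, sizeof tail);     /* copies the NUL too */
    return total;
}
```

Calling it with 1000 euros and 21 cents gives "1.000,21 euro", matching the example above.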

DiTBho · « **Reply #117 on:** August 19, 2022, 11:52:07 pm »

Quote from: Nominal Animal on August 19, 2022, 07:31:36 pm

Basically, writeonly_buffer is an example of what your safestring will become, if you decide to support windowed buffering at some point: the generation of strings longer than the buffer itself.

I just checked what Elizabeth wrote as a note before the introduction of safestring years ago.

Quote

Turns out this is a stupid memcopy error!!!

Will someone please kill the C language ?
Or at least make a runtime lib that does boundary checking so that this kind of crap can never occur ?

Arrays should have a structure in them telling their size, as opposed to allowing memcopy to take whatever size you feed it.

The runtime for memory operation must do three things :
-1- store the real size of the array in the array
-2- flush a newly created array with all zeroes, upon resize flush the released space or claimed space with zeroes.
-3- each add must go through a method, which does boundary checking and invokes panic if found broken

I added her lines as motivation for the library in motivation.txt, after which I decided to write the library, as she asked me to reduce the time we wasted debugging similar stuff.

Technically safestring operates on a circular buffer and there is a callback that can be used to *flush* things once tail reaches head (buffer full), the callback is currently NULL; the feature has never been really used, but ... it can be resumed ... to support windowed buffering.
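
In outline, a circular buffer with an optional flush callback can be sketched like this. It is a guess at the general shape, not the actual safestring code: ring_sketch, ring_put, ring_get, ring_drain, and the demo callback collect_chunk are all hypothetical names. With a NULL callback the writer simply records an overflow when the buffer is full.

```c
#include <stddef.h>
#include <string.h>

typedef void (*flush_fn)(const char *chunk, size_t len, void *ctx);

typedef struct {
    char    *data;
    size_t   size;
    size_t   head;      /* index of the oldest stored byte */
    size_t   count;     /* number of bytes stored */
    int      overflow;  /* set when a byte had to be dropped */
    flush_fn flush;     /* may be NULL: then a full buffer drops data */
    void    *ctx;
} ring_sketch;

/* Hands the stored bytes to the callback, in order, as one or two chunks. */
static void ring_drain(ring_sketch *r)
{
    size_t first = r->size - r->head;            /* bytes before the wrap */
    if (first > r->count) first = r->count;
    if (first > 0) r->flush(r->data + r->head, first, r->ctx);
    if (r->count > first) r->flush(r->data, r->count - first, r->ctx);
    r->head = 0;
    r->count = 0;
}

static void ring_put(ring_sketch *r, char c)
{
    if (r->size == 0) { r->overflow = 1; return; }
    if (r->count == r->size) {                   /* tail reached head */
        if (r->flush == NULL) { r->overflow = 1; return; }
        ring_drain(r);
    }
    r->data[(r->head + r->count) % r->size] = c;
    r->count++;
}

/* Removes and returns the oldest byte, or -1 if the buffer is empty. */
static int ring_get(ring_sketch *r)
{
    if (r->count == 0) return -1;
    unsigned char c = (unsigned char)r->data[r->head];
    r->head = (r->head + 1) % r->size;
    r->count--;
    return c;
}

/* Demo callback: appends flushed chunks to a small fixed sink. */
typedef struct { char text[64]; size_t len; } sink_sketch;

static void collect_chunk(const char *chunk, size_t len, void *ctx)
{
    sink_sketch *s = ctx;
    if (len > sizeof s->text - s->len) len = sizeof s->text - s->len;
    memcpy(s->text + s->len, chunk, len);
    s->len += len;
}
```

Supporting windowed buffering would then amount to installing a non-NULL callback, so the writer keeps going after the buffer fills instead of flagging an overflow.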

Never tried; it's there because Elizabeth asked me for it and then forgot about it, which is good, because I could recycle the code for other projects, including experimental stuff like myC.

Which brings us to: who is Elizabeth? I guess you guessed it ... yes, she is the boss, the person who leads and coordinates project preparation and pays the salary, hence the person to whom you can only say "yes ma'am, it will be implemented" ...

... and then forget about it

(well, in this case, it turned out to be a good idea)

SiliconWizard · « **Reply #118 on:** August 20, 2022, 12:13:38 am »

Blame the tools when you make a mistake!

Nominal Animal · « **Reply #119 on:** August 20, 2022, 06:04:34 am »

Quote from: DiTBho on August 19, 2022, 11:52:07 pm

Technically safestring operates on a circular buffer and there is a callback that can be used to *flush* things once tail reaches head (buffer full), the callback is currently NULL; the feature has never been really used, but ... it can be resumed ... to support windowed buffering.

There are some important differences: mine allows random write access into the entire virtual buffer and storing any contiguous window of it, whereas yours is limited to forward-going windows; and unlike yours, mine explicitly will not have the flush capability, because the windowing is controlled at a different level of abstraction. But yes.

Also, to implement some of the very useful operations making e.g. integer formatting even simpler, I really should implement writeonly_buffer_memcpy(wo,deststart,sourcestart,length) and writeonly_buffer_reverse(wo,deststart,sourcestart,length), because they can make formatting things so much easier. In particular, the digits can be initially stored in increasing order of importance to the buffer as part of obtaining the number of decimal digits in it, and then just swapped and moved to the correct place.
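
The "store ascending, then swap and move" trick can be sketched on a plain char array as follows. These are not the writeonly_buffer_* functions named above (those take a buffer object and are not shown here); reverse_span and put_unsigned_right are hypothetical stand-ins that use memmove and memset directly.

```c
#include <stddef.h>
#include <string.h>

/* Reverses len bytes in place. */
static void reverse_span(char *p, size_t len)
{
    size_t i = 0, j = len;
    while (i + 1 < j) {
        j--;
        char c = p[i]; p[i] = p[j]; p[j] = c;
        i++;
    }
}

/* Right-aligns `value` in buf[start .. start+width-1], padded on the left.
   Digits are first written in ascending order of importance, which also
   yields their count; they are then reversed and moved into place.
   Returns 1 on success, 0 if the value needs more than `width` digits. */
static int put_unsigned_right(char *buf, size_t start, size_t width,
                              unsigned long value, char pad)
{
    size_t n = 0;
    do {
        if (n >= width) return 0;
        buf[start + n++] = (char)('0' + value % 10u);
        value /= 10u;
    } while (value != 0u);

    reverse_span(buf + start, n);
    memmove(buf + start + (width - n), buf + start, n);  /* regions may overlap */
    memset(buf + start, pad, width - n);
    return 1;
}
```

For a field width of 6 and a space as padding, the value 42 is first stored as "24", reversed to "42", and then moved to the right edge, giving "    42".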

So it is definitely a work in progress.

I admit, I have found myself thinking about
len = format_float(charbufferptr, length, formatspecptr, floatval, fcacheptr);
len = format_double(charbufferptr, length, formatspecptr, doubleval, dcacheptr);
and their implementation on 32-bit architectures like ARM, using integer math only, for the last few hours. If I do get something in a form I can post, I shall start a new thread in here (Programming sub-forum) and post a note in this thread too. They will definitely not be "best" in any sense, but they might be useful and informative.

eugene · « **Reply #120 on:** August 21, 2022, 04:42:48 pm »

Quote from: Nominal Animal on August 19, 2022, 06:25:48 pm

Okay, here is an example program.
[...]

Thank you! It will take me a while to grok all of it. Actually, it will take me a while to get around to even attempting it, but I do have an upcoming project that will require me to store 32 bit floats and occasionally convert them to char arrays with limited precision. So I will almost certainly use some of what you wrote... eventually.

Quote from: Nominal Animal on August 19, 2022, 06:57:37 pm

Quote from: eugene on August 19, 2022, 02:41:55 pm
I like Dr Animal's idea of using a pointer to a struct instead of a format string, so a function declaration might look something like
Hey, it's my idea, not some goddamn PhD's! (And if that was an honorific, drop it: I'm a Finn, we don't use them. And even if we did, I'm not one; I'm just another Uncle Bumblefuck here trying to be of use.)

Sorry. That's a mistake I will not make a second time!

westfw · « **Reply #121 on:** August 21, 2022, 09:11:02 pm »

y'all are addressing technical issues without paying much attention to the cosmetic but very real desire of people to be able to tell what the output is going to look like by reading the source code.

IanB · « **Reply #122 on:** August 21, 2022, 09:49:24 pm »

Quote from: westfw on August 21, 2022, 09:11:02 pm

y'all are addressing technical issues without paying much attention to the cosmetic but very real desire of people to be able to tell what the output is going to look like by reading the source code.

I've long since stopped reading this thread.

It's insane how people can make such a palaver over such a simple requirement as printing out a number in a form suitable for accountants, scientists or engineers to read.

brucehoult · « **Reply #123 on:** August 21, 2022, 10:43:25 pm »

Quote from: IanB on August 21, 2022, 09:49:24 pm

Quote from: westfw on August 21, 2022, 09:11:02 pm
y'all are addressing technical issues without paying much attention to the cosmetic but very real desire of people to be able to tell what the output is going to look like by reading the source code.

I've long since stopped reading this thread.

It's insane how people can make such a palaver over such a simple requirement as printing out a number in a form suitable for accountants, scientists or engineers to read.

Oh gosh. Simple requirement doesn't imply simple implementation. That's like MS-DOS people asking why you'd want to waste all that lovely CPU power rendering the Mac UI.

And in this case, it's not all that much harder to do it right than to do it wrong, assuming you work from published algorithms rather than trying to roll your own.

Scientists and engineers might not care what is happening in the last decimal place, but accountants sure do!

PlainName · « **Reply #124 on:** August 21, 2022, 10:56:12 pm »

Quote

It's insane how people can make such a palaver over ...

Er, how would you know? You've stopped reading the thread, remember

