Author Topic: Best thread-safe "printf", and why does printf need the heap for %f etc? (Read 16652 times)

DiTBho · « **Reply #125 on:** August 21, 2022, 11:50:16 pm »

I wish someone had commented on the old format options instead, or on the new struct format spec.

Probably both are needed, though I am not sure. But for sure that bloody printf does not need further comments.

IanB · « **Reply #126 on:** August 22, 2022, 04:43:03 am »

Quote from: DiTBho on August 21, 2022, 11:50:16 pm

Probably both are needed, though I am not sure. But for sure that bloody printf does not need further comments.

There is only one certainty in the world. If you try to eliminate printf and replace it with atomic output functions, then sooner or later someone will reinvent printf or its equivalent. It is one of those things that has to exist.

IanB · « **Reply #127 on:** August 22, 2022, 04:53:21 am »

Quote from: brucehoult on August 21, 2022, 10:43:25 pm

Oh gosh. Simple requirement doesn't imply simple implementation. That's like MS-DOS people asking why you'd want to waste all that lovely CPU power rendering the Mac UI.

And in this case, it's not all that much harder to do it right than to do it wrong, assuming you work from published algorithms rather than trying to roll your own.

Scientists and engineers might not care what is happening in the last decimal place, but accountants sure do!

In the 1960's Fortran was able to do a perfectly satisfactory job of printing floating point numbers on machines with about 128 K of memory. Every compiler ever written since then has been able to do the same thing. If I ever need to write a compiler I will go back to original references. I see no reason to believe this is not a solved problem.

brucehoult · « **Reply #128 on:** August 22, 2022, 09:14:57 am »

Quote from: IanB on August 22, 2022, 04:53:21 am

Quote from: brucehoult on August 21, 2022, 10:43:25 pm
Oh gosh. Simple requirement doesn't imply simple implementation. That's like MS-DOS people asking why you'd want to waste all that lovely CPU power rendering the Mac UI.

And in this case, it's not all that much harder to do it right than to do it wrong, assuming you work from published algorithms rather than trying to roll your own.

Scientists and engineers might not care what is happening in the last decimal place, but accountants sure do!

In the 1960's Fortran was able to do a perfectly satisfactory job of printing floating point numbers on machines with about 128 K of memory. Every compiler ever written since then has been able to do the same thing. If I ever need to write a compiler I will go back to original references. I see no reason to believe this is not a solved problem.

Well, that's where you are wrong. I perfectly well remember writing FORTRAN programs in the 1970s and early 80s and getting a lot of results like 2.359998.

128 K of memory is a LOT. Doing a proper job needs less than 1 K, temporarily, during the conversion from double precision. Single precision needs a lot less -- something like 150 bytes.

IanB · « **Reply #129 on:** August 22, 2022, 04:52:52 pm »

Quote from: brucehoult on August 22, 2022, 09:14:57 am

Well, that's where you are wrong. I perfectly well remember writing FORTRAN programs in the 1970s and early 80s and getting a lot of results like 2.359998.

That's interesting. But of course, if you had asked for four decimal places you would have got 2.36.
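As a small illustration of that point in C rather than Fortran (a sketch only; format_fixed is a made-up helper, not something from the programs being discussed), asking for fewer decimal places lets the rounding absorb the error in the last digits:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format x with a chosen number of decimal places.
   Returns the number of characters snprintf wanted to write. */
static int format_fixed(char *buf, size_t size, double x, int places)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%.*f", places, x);
}
```

Calling format_fixed(buf, sizeof buf, 2.359998, 4) gives "2.3600", while six places reproduces "2.359998".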

Quote

128 K of memory is a LOT.

Could someone remind modern programmers, who think nothing of requiring 16 GB for a modern workstation?

DiTBho · « **Reply #130 on:** August 22, 2022, 09:43:37 pm »

Quote from: brucehoult on August 22, 2022, 09:14:57 am

Single precision needs a lot less -- something like 150 bytes.

I likely did it wrong: I implemented 256-bit unsigned arithmetic because I need it for other stuff, so it got reused for computing the printable base-10 fractional part of an fp32_t.

You have to deal with numbers of roughly 10^37 or more in base 10.

The uint256 library supports all the operators { +, -, *, /, %, logic, bitwise, cmp, shift, rotate }.
For MIPS32 it consumes 2 KB of ROM space and 518 bytes at run time, all inclusive (including stack space), for format-uint256 and the buffer used by safestring.

So, half a kilobyte of RAM from "show" to "chars sent on the serial port" to print an fp32 number using uint256, safestring and format-uint256.

myC on MIPS4++ consumes 634 bytes.

DiTBho · « **Reply #131 on:** August 22, 2022, 09:58:23 pm »

I mean, extract the fractional part

Code:

integer_part "." fractional part

fractional part = { bit0=1/2, bit1=1/4, bit2=1/8, ... }

assume bit0 is 5000000000000...000 base10, uint256, limited < 244 bit
assume bit1 is 2500000000000...000 base10, uint256, limited < 244 bit
assume bit2 is 1250000000000...000 base10, uint256, limited < 244 bit
(values pre-calculated and stored in a LUT)

sum the contributions of the set bits to get the final big uint256 number
convert the first(1) n high digits into a string
optionally, round the last digit instead of truncating

and that's the job done.

a lot of cycles, no doubt about it
a simple approach with no black magic tricks

(1) baseN to string works in reverse,
from right to left, so ... you need an extra uint256 division by 10^k here

brucehoult · « **Reply #132 on:** August 23, 2022, 01:21:36 am »

Quote from: IanB on August 22, 2022, 04:52:52 pm

Quote
128 K of memory is a LOT.

Could someone remind modern programmers, who think nothing of requiring 16 GB for a modern workstation?

16 GB RAM is entirely reasonable for a *workstation*, given that:

1) working effectively requires multiple open windows with web pages, PDFs etc. The ARMv8.1-A manual is over 50 MB and I have others that are hundreds of MB. Every shutter click of my rather old camera makes a 15 MB JPG and newer ones much more (let alone RAW). Movies for demonstrations, tutorials etc are many GB in size and need to be edited, filtered, compressed etc.

2) 16 GB DDR4 costs about 40 minutes of my salary (not even the grossed-up one including overheads and benefits etc). If 1 TB of RAM increased productivity by 10% it would be a completely reasonable purchase.

On the other hand, I and many others on this board are regularly writing code that RUNS on hardware with 64 KB, 16 KB, 2 KB (ATMega328), 512 bytes (ATTiny85) or even less RAM.

brucehoult · « **Reply #133 on:** August 23, 2022, 01:26:54 am »

Quote from: DiTBho on August 22, 2022, 09:43:37 pm

Quote from: brucehoult on August 22, 2022, 09:14:57 am
Single precision needs a lot less -- something like 150 bytes.

I likely did it wrong: I implemented 256-bit unsigned arithmetic because I need it for other stuff, so it got reused for computing the printable base-10 fractional part of an fp32_t.

For single precision you need around 160 bits *per variable*. 256 is overkill, a little. But you need both numerator and denominator that size, plus a handful of other working variables. 150 bytes total is probably too generous, but I think 100 bytes is not quite enough.

DiTBho · « **Reply #134 on:** August 23, 2022, 01:40:59 am »

Quote from: brucehoult on August 23, 2022, 01:26:54 am

For single precision you need around 160 bits *per variable*. 256 is overkill, a little.

uint128 was not enough. The choice was between 128 and 256, but since I need uint256 for other stuff, I chose to use it for this job too.

brucehoult · « **Reply #135 on:** August 23, 2022, 02:02:54 am »

Quote from: DiTBho on August 23, 2022, 01:40:59 am

Quote from: brucehoult on August 23, 2022, 01:26:54 am
For single precision you need around 160 bits *per variable*. 256 is overkill, a little.

uint128 was not enough. The choice was between 128 and 256, but since I need uint256 for other stuff, I chose to use it for this job too.

Sure, of course. No harm in using integers slightly larger than absolutely necessary if you've already got the code and the memory space. On an 8 bit machine I think you technically could use 152 bits, but a 16 or 32 bit machine is going to round up to 160 bits, and a 64 bit machine to 192 bits.
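The rounding described here is just rounding a bit count up to a whole number of machine words. A minimal sketch, taking the 152-bit figure from the post as given rather than deriving it; round_up_bits is a hypothetical helper:

```c
#include <assert.h>

/* Round a bit count up to the next multiple of the machine word size. */
static unsigned round_up_bits(unsigned bits, unsigned word_bits)
{
    assert(word_bits > 0);
    return (bits + word_bits - 1u) / word_bits * word_bits;
}
```

round_up_bits(152, 8) stays at 152, word sizes of 16 and 32 both give 160, and 64 gives 192, matching the numbers above.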

westfw · « **Reply #136 on:** August 23, 2022, 06:54:36 am »

I was a bit surprised not to find (in a somewhat quick web search) a C implementation of Fortran formatted output. Something that would parse an actual Fortran format specifier string and produce compatible output. I would have thought that the inefficiency would be small compared to the general cost of I/O, and the speed of modern CPUs. Or maybe even mitigated by some sort of JIT interpreter or magic pre-processing. Just as a boon to interaction with old code.

Is there really no such thing?

brucehoult · « **Reply #137 on:** August 23, 2022, 08:04:15 am »

Picture that!

DiTBho · « **Reply #138 on:** August 23, 2022, 08:31:24 am »

I don't know Fortran. What don't you like about format-${datatype} struct specs (NominalAnimal's idea)? What do you like about Fortran formatted output?

Examples, please

I need to support Fortran only because it is mandatory in Gentoo.

Code:

Using built-in specs.
 --enable-languages=c,c++,fortran

Therefore I have its profile for Catalyst extended with +=ada, but I have never programmed anything in Fortran, and I am investing time in Haskell to master functors, functoids and monads.

DiTBho · « **Reply #139 on:** August 23, 2022, 02:39:59 pm »

Quote from: brucehoult on August 23, 2022, 02:02:54 am

32 bit machine is going to round up to 160 bits, and a 64 bit machine to 192 bits.

ok, I abstracted the library support for both C and myC, and now I have
- uint128_t
- uint160_t
- uint192_t
- uint256_t

they are fully supported with all the operators { +, -, *, /, bitwise, logic, cmp, inc, dec, shift, rotate } and pointers.

% is missing, and sint versions are not yet supported, only unsigned logic for now;

What I love about myC is that you can do this:

Code:

uint192_t a0;
uint192_t a1;
uint192_t a2;
uint192_format_spec_t uint192_format_spec; /* yeah, I like the NominalAnimal's way */
dev_t dev;

a0 = 1;
a1 = 10^37;
a2 = a0 + a1;

p_dev = dev_open("/dev/ttyS1", "rw", panics_on_failure);

uint192_format_spec =
{
    .width = 40; /* this way, the output is right-aligned and padded with ' ', which is the default */
};
a2'show(p_uint192_format_spec); /* it will invoke 'format, and will output on the default console */
a2'fshow(p_dev, p_uint192_format_spec); /* will output on the new console; in both cases safestring internally uses the dev's buffer */

a2'show2(p_dev, "C=", "\n", p_uint192_format_spec); /* show with prologue and epilogue */

dev_close(p_dev);

Operations on the uint192 datatype look native to the user, even though they are artificial.

C++ offers something similar, but you need to overload things.
With C ... you have explicit methods ${datatype}_let, ${datatype}_add, ${datatype}_show, etc ... a bit too verbose, but it's the price to pay.

Frankly, I do find it nice, useful, comfortable, light years ahead of printf.

How can people not appreciate it and love it at first sight?!?

p.s.
Approved even by Ania, she likes it a lot!

PlainName · « **Reply #140 on:** August 23, 2022, 02:47:53 pm »

Quote

a2'show(p_uint192_format_spec)

What is the ' character? Rather, what does it mean?

DiTBho · « **Reply #141 on:** August 23, 2022, 04:13:43 pm »

Quote from: dunkemhigh on August 23, 2022, 02:47:53 pm

Quote
a2'show(p_uint192_format_spec)

What is the ' character? Rather, what does it mean?

It's Ada style for a property.

In myC the lexer passes it as property_token.
The lexer works with a dictionary.
If you don't like a keyword you can easily redefine it.
That is possible even at run time, but you'd better recompile lib_tokener, because during that process the dictionary is optimized with auto hashes, which make lookups 10x faster.

The candidates were: ' and "of".

"of" looked too verbose.
' is like Ada, so I chose it.

The parser can look at the datatype properties.
A method is a property of a datatype in myC.
When the parser sees a property token it looks at the datatype table; if the request matches and it's a method, it gets the function address of the method (e.g. show) and prepares the function call by passing parameters.
The first parameter is an inner pointer to the datatype, the second is an inner datatype kind.
It is what you would call a typedef enum in C: each datatype has its id.

All automatic, simpler than in C++, but ...
... every time you add a datatype in myC you have to recompile the compiler to fully use it as a native datatype.
I recompile it ... up to 5 times per day ... oh man.
A differential compile only compiles three modules and takes 90 seconds on my Intel Core Duo Mac mini.
Not bad.

Anyway, C++ is superior here in every way, even if it uses a dot for methods, because it has classes and more complex mechanisms to overload stuff.

PlainName · « **Reply #142 on:** August 23, 2022, 05:46:21 pm »

Ah. Thanks

SiliconWizard · « **Reply #143 on:** August 23, 2022, 07:12:13 pm »

Just a question, but how did you validate the "myC" tool? From what I got, you use it for relatively safety critical stuff?
Validation would be a major endeavour and something I would tend to shy away from in practice for this reason.

DiTBho · « **Reply #144 on:** August 23, 2022, 07:55:35 pm »

Quote from: SiliconWizard on August 23, 2022, 07:12:13 pm

Just a question, but how did you validate the "myC" tool?

Each major version of our ICE support includes external (paid) activities to verify it.
Even so, it cannot compete with Green Hills' tools, if that is the question.

edit: cut

DiTBho · « **Reply #145 on:** August 27, 2022, 12:24:25 am »

Quote from: westfw on August 23, 2022, 06:54:36 am

I was a bit surprised not to find (in a somewhat quick web search) a C implementation of Fortran formatted output. Something that would parse an actual Fortran format specifier string and produce compatible output. I would have thought that the inefficiency would be small compared to the general cost of I/O, and the speed of modern CPUs. Or maybe even mitigated by some sort of JIT interpreter or magic pre-processing. Just as a boon to interaction with old code.

Is there really no such thing?

is it like this?

Nominal Animal · « **Reply #146 on:** August 27, 2022, 06:02:24 pm »

I've done some testing for formatting floats, and it's looking pretty nice (not complete yet, though).

I'm using a temporary work area of 8×32 bits (32 bytes), six of which form a 6×28-bit unsigned integer ("work area", six 28-bit limbs) used in the calculation.

Technically, I'm calling m28f the fractional type, with limb order most significant first. That is, bit 27 in the first limb corresponds to value 2^-1 = 0.5, bit 0 to 2^-28 = 0.000000003725, and so on. I'm calling m28i the integral type, with limb order least significant first: bit 0 in the first limb corresponds to 2⁰ = 1, bit 27 to 2²⁷ = 134217728.

When moving the mantissa (with the implicitly set bit 23 (24th) if exponent is nonzero), the mantissa can span two limbs.

The order can seem odd, but the first limb is always the one next to the decimal point. Furthermore, the key operation –– multiply by ten with carry for the fractional part, divide by 10 with remainder for the integer part –– proceeds always towards the first limb, starting at the furthest nonzero limb. Since it is trivial to keep track of the furthest nonzero limb (starting at when the mantissa is moved to the limbs, one only needs to check when it becomes zero, and move to the next closer limb), we do not need to check or conditionally operate on zero limbs; we only deal with the limbs that we need to, and no more.

What's with the 28-bit radix? Well, it turns out that by using an extra 14% of memory for the limbs, all base operations stay 32-bit. Extracting a decimal digit from the integer part (divide by 10 with remainder) only requires two 32-bit multiplications, some bit shifts, and additions per active limb; and extracting a decimal digit from the fractional part (multiply by 10 with carry) only requires one multiplication, plus some bit shifts and a bitwise AND, per active limb. No floating-point or 64-bit operations are needed at all: my test code does not use any of the compiler support functions (__udivdi3 et cetera) on x86, RISC-V rv32gc, or 32-bit ARMs.

To divide a 32-bit number by ten, you simply multiply it by 3435973837 = 0xCCCCCCCD, and shift the full 64-bit product right by 35 bits. Most 32-bit architectures have a multiply-high instruction, which returns the 32 high bits, so the result only needs to be shifted right by three bits. (So, a divide by ten with remainder is two 32-bit multiplications (one multiply-high, one normal unsigned multiply), subtraction, and a shift right by three bits.)

When multiplying by ten, the high 4 bits of the 32-bit result form the carry, which is added to the result of the next higher multiplication. This cannot overflow, because 10×0x0FFFFFFF+9 = 0x9FFFFFFF. So, a multiply by ten with carry is just one 32-bit multiplication and addition; plus a bit shift to extract the carry, and a bitwise AND to extract the result.

In summary, we're talking about exact precision and rounding according to IEEE-754 rules (or whatever you want), with something like a dozen or two cycles per decimal digit emitted, plus a few dozen cycles overhead, with a tight upper bound on the memory needed (which can be statically allocated beforehand, or allocated internally on stack via alloca()/__builtin_alloca()).

Finally, the exact same procedure applies to double precision, except that the temporary storage needs 39 limbs, and a temporary work area of about 1312 bits or 164 bytes. The mantissa is 52-bit (if subnormal) or 53-bit (high bit implicitly set if exponent is nonzero), so it can span three limbs initially. But the base operations stay exactly the same.
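One way to arrive at these limb counts, as a back-of-envelope sketch and not necessarily how the figures were derived: take the widest integer part (below 2^128 for single, 2^1024 for double) and the deepest fractional part (down to 2^-149 and 2^-1074 for subnormals), and round each up to whole 28-bit limbs. The two extra 32-bit words of bookkeeping are an assumption inferred from "8×32 bits, six of which" are limbs.

```c
#include <assert.h>

#define LIMB_BITS   28u
#define EXTRA_WORDS 2u   /* assumption: non-limb words in the work area */

/* Number of 28-bit limbs needed to hold the given number of bits. */
static unsigned limbs_for_bits(unsigned bits)
{
    assert(bits > 0);
    return (bits + LIMB_BITS - 1u) / LIMB_BITS;
}

/* Limbs needed when one area serves either the integer or the fraction. */
static unsigned limbs_needed(unsigned int_bits, unsigned frac_bits)
{
    unsigned i = limbs_for_bits(int_bits);
    unsigned f = limbs_for_bits(frac_bits);
    return i > f ? i : f;
}

/* Work area size in bytes, with each limb stored in a 32-bit word. */
static unsigned work_area_bytes(unsigned int_bits, unsigned frac_bits)
{
    return (limbs_needed(int_bits, frac_bits) + EXTRA_WORDS) * 4u;
}
```

Under these assumptions, single precision (128, 149) gives 6 limbs and 32 bytes, and double precision (1024, 1074) gives 39 limbs and 164 bytes, which agrees with the numbers quoted.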

I find this rather exciting, I must say, since it looks like it is way more efficient than any standard C library implementation I've seen, while still capable of producing the exact same results. I am currently rejecting infinities and NaNs with an error, but they're trivial to add in if someone wants to. I am having a bit of a struggle to decide which rounding modes I want to support; IEEE defaults to round to nearest breaking ties to even, and I like round to nearest breaking ties away from zero, and while others are possible, it might be nice to KISS and omit stuff not needed/used at run time. For example, does anyone need to actually change the tie-breaking mode at runtime? I don't think so, but...

DiTBho · « **Reply #147 on:** August 27, 2022, 11:31:25 pm »

For both fixed point and floating point, to show the decimal part I am using uint160_t; internally it uses uint32 operations.

Functoids can distinguish between valid and invalid results and react accordingly (panic on NaN, overflow, underflow, division by zero), or just format a different string.

I love Haskell for this, but it's functional programming, which is difficult to replicate with imperative procedural programming, so I am using callbacks to mimic it in C.

westfw · « **Reply #148 on:** August 29, 2022, 12:19:58 am »

Quote

What don't you like about format-${datatype} struct specs (NominalAnimal's idea)?

I thought I already made clear that what I want is to be able to glance at the source code and tell what the output looks like.
A C printf format like "raw data = %6ld (0x%8lx), converted = %6.3f\n"
or a Fortran format like '"raw data = ", I6, " (0x", Z8, "), converted = ", F6.3'
makes that relatively clear, compared to a string of individual statements for each part (or compared to C++'s "streams").

Quote

What do you like about Fortran formatted output?

is it like this?

Yeah, more or less like that. Although those particular examples seem to delight in "remoting" the actual format string from the write statements, contrary to what I said I wanted above. I don't really LIKE Fortran formats, they've just stood the test of time pretty well, and seem to address some of the complaints I've seen about printf().

Have we mentioned symmetry of input and output formatting yet? (NOT the same as the symmetry of values.)
In Fortran you can write "2I3" with 123456 and get output like "123456", and then if you read with the same format you get two values 123 and 456 again, even without a delimiter between them. That was pretty important back when you were fitting things on 72-column cards (or files that emulated card decks). Maybe not so much any more.
(I had an interesting experience back in college. Another programmer and I were "hired" to write some Fortran code to pretty-print some data array. But we were not allowed to know what the data WAS (including stuff like the ranges of values, IIRC). This turned out to be more difficult than you might imagine!)
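For comparison, the closest C analogue of the "2I3" round trip is probably "%3d%3d" with printf and scanf. A minimal sketch (roundtrip_2i3 is a made-up name):

```c
#include <assert.h>
#include <stdio.h>

/* Write a and b as two 3-character fields with no delimiter, then read
   them back with the same format. Returns the number of fields read. */
static int roundtrip_2i3(int a, int b, int *ra, int *rb)
{
    char card[16];
    assert(ra != NULL && rb != NULL);
    snprintf(card, sizeof card, "%3d%3d", a, b);  /* 123, 456 -> "123456" */
    return sscanf(card, "%3d%3d", ra, rb);
}
```

This reads 123 and 456 back correctly, but the symmetry is weaker than what is described for Fortran: scanf skips leading blanks without counting them against the field width, so 1 and 234 are written as "  1234" and read back as 123 and 4. As far as I understand, Fortran's I3 treats the blanks as part of the field, which is what makes the fixed-column round trip work there.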

DiTBho · « **Reply #149 on:** August 29, 2022, 01:01:09 am »

Quote from: westfw on August 29, 2022, 12:19:58 am

Makes that relatively clear

Prologue and epilogue look like a good compromise to me.
Now my interest is only in the string form of "specs".
Maybe I can take something from the Fortran way.

