Author Topic: stdlib + MCU haters: std itoa() uses less resources!  (Read 17345 times)


Offline DavidAlfaTopic starter

  • Super Contributor
  • ***
  • Posts: 5835
  • Country: es
stdlib + MCU haters: std itoa() uses less resources!
« on: May 27, 2023, 04:29:29 pm »
Just had to show this to the old "I never use std libs in MCU" dogs  ;), showing it's not always the case.
(Sure enough, the flash usage will blast off when using any std print function)
Custom itoa, taken from here (Can it get more simple than this?):
Code: [Select]
char* _itoa(int val, int base){
    static char buf[32];
    int i=30;
    for(; val && i ; --i, val /= base)
      buf[i] = "0123456789ABCDEF"[val % base];
    return &buf[i+1];
  }

End result was standard itoa using less resources overall  ::). In custom itoa, stdlib is not used at all.
« Last Edit: May 27, 2023, 04:33:41 pm by DavidAlfa »
Hantek DSO2x1x            Drive        FAQ          DON'T BUY HANTEK! (Aka HALF-MADE)
Stm32 Soldering FW      Forum      Github      Donate
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8110
  • Country: fi
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #1 on: May 27, 2023, 04:54:27 pm »
You have invented a pretty interesting strawman. stdlib haters? I have never heard anyone criticizing itoa() as big or bloated. Maybe you are confusing itoa() and printf()? I do use my own itoa()-like function, but it's not because of assumed size or resource use, it's usability: my own returns a pointer to the end of the string, allowing easier chaining and thus more maintainable code.

(It would make very little sense to duplicate itoa(). Try with fixed base 10 instead - that would bring the code size down a bit; of course assuming base 10 is all you need.)
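
A minimal sketch of the end-pointer idea described above, not the poster's actual code. The names `u32_to_dec_end` and `str_end` are hypothetical, and the caller is assumed to provide a large enough buffer (11 bytes cover any 32-bit value plus the terminator).

```c
#include <stdint.h>

/* Hypothetical helper: write `value` in decimal at `dst`, NUL-terminate,
   and return a pointer to the terminating NUL so calls can be chained. */
char *u32_to_dec_end(char *dst, uint32_t value)
{
    char tmp[10];                /* a 32-bit value has at most 10 digits */
    int n = 0;

    do {                         /* collect digits, least significant first */
        tmp[n++] = (char)('0' + value % 10u);
        value /= 10u;
    } while (value != 0u);

    while (n > 0)                /* copy out, most significant first */
        *dst++ = tmp[--n];

    *dst = '\0';
    return dst;
}

/* Hypothetical helper: copy a string and return a pointer to its new end. */
char *str_end(char *dst, const char *src)
{
    while (*src)
        *dst++ = *src++;
    *dst = '\0';
    return dst;
}
```

Chaining then looks like `p = str_end(p, "t="); p = u32_to_dec_end(p, 30530); p = str_end(p, " s");` with `p` starting at the beginning of the output buffer. No strlen() or separate length bookkeeping is needed between the calls.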
« Last Edit: May 27, 2023, 04:56:10 pm by Siwastaja »
 

Offline DavidAlfaTopic starter

  • Super Contributor
  • ***
  • Posts: 5835
  • Country: es
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #2 on: May 27, 2023, 05:15:22 pm »
I guess the compiler optimization should already notice that, as the function is called only with base 10?
C'mon, we've all seen lots of very polarized dudes here, still thinking we're in the 64-byte RAM MCU era!
#include <std-  STOP YOU FOOL!! Inefficient!! Wasting resources! Make it from scratch! Baremetal only! Waah!!  :-DD
« Last Edit: May 28, 2023, 02:52:27 am by DavidAlfa »
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8110
  • Country: fi
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #3 on: May 27, 2023, 06:25:39 pm »
Alcohol?

Funny rant, but you are fighting against a made-up enemy. Your output would not make you look as ridiculous as it does now if you took a few moments to learn the basic concepts. Here are a few points that would be helpful, though seeing how you think in "us vs. them" terms, you are likely to ignore the chance to learn:

* bare metal refers to not using an operating system, such as RTOS. I'm sure your projects are mostly that way. A weird enemy to pick; maybe you think it means something else?
* I really can't remember anyone advising against using the C standard library at all. It's part of the C standard and thus portable. Some particular functions are heavier (e.g. printf) than others (e.g. memcpy), and some are not always relevant on MCUs (e.g. fopen()). But using anything from the C standard library considered harmful? A weird idea. I do remember Nominal Animal criticizing some C standard library interfaces, in particular file access functions (opendir() etc.), but this isn't very relevant in microcontroller systems.
* Recurring critique against some vendor-specific hardware abstraction libraries which abstract hardware poorly, limit the feature set of peripherals and lock in to a single vendor is what I have seen, and it's mostly spot-on. It's your call to make the right choice and you'll be responsible to your customers.
* Small RAM and ROM is pretty much still a thing. Very small and cheap gadgets have their place, not only in super inexpensive gadgets, but distributed IO controller nodes etc., too.
* A bigger reason than performance* to do things yourself is that you get exactly what needs to be done. Using existing work will limit the scope to that provided by said work.

*) not to say performance never matters

Last point is important; your opening post is a good summary of this recurring misunderstanding we are seeing, in other words:
>I need to do X
<Use libxxx, it does X, Y, Z, Å, Ä and Ö, and doing X requires doing Q, W, E, R, T, Y, U, I, O
>But I just need to do X so I wrote this 100 LoC piece which does X, why is it wrong?
<Geez, libxxx is 170kLoC and developed over tens of thousands of man-hours, why would you do all that from scratch?
>No but I'm not doing all that from scratch - guess what, nevermind.

Your opening post demonstrates this because you replicated the interface of itoa(), i.e., made another itoa implementation and compared performance to the existing work. Save for some very special cases, such work is not usually fruitful. Instead, you would write something similar (but not identical) to itoa(), but which solves your actual problem better. The reason for that work would not be poor performance of itoa(), but poor interface or functionality of itoa() for solving your particular problem.

The most important feature for a successful developer is to correctly identify when you are solving a truly complex problem which already has a solution you can reuse, and when you are not doing that.
« Last Edit: May 27, 2023, 06:40:28 pm by Siwastaja »
 
The following users thanked this post: janoc, tellurium, MMMarco

Offline T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21609
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #4 on: May 27, 2023, 06:40:07 pm »
Nice buffer overrun. :D

ASCII is built intentionally so that this kind of thing is easy: save memory by using digit = value % base + '0', and add 'a'-'0' - 10 (or 'A' if you prefer) when the remainder exceeds 9.  Or don't even bother with the subtraction and set the bit(s) directly, since number and alphabet sequences are aligned.

You'll also save stack frame size and initialization by setting that string to static const.  Always sanity-check your output when optimizing!

The one thing an implementation (as opposed to a fixed library blob) does gain is, if you're only using it for one base, the compiler can potentially collect and propagate const params and use multiplies or shift and mask for the arithmetic (if applicable).  (Remainder is possible by multiplication, but I think it's not commonly implemented by compilers?)

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8110
  • Country: fi
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #5 on: May 27, 2023, 06:43:15 pm »
I guess the compiler optimization should already notice that, as the function is called only with base 10?

Not necessarily, because you wrote a library function replacement, i.e. a function with external linkage for which the compiler needs to emit code which can be called with any argument values, preventing many types of optimizations that would be possible with static functions. It is still possible the compiler implements two copies of the function, one of which is optimized, especially at -O3, but don't count on it. You can easily find out by looking at the compiler output.

BTW, this is also the reason why library functions can have quite significant CPU time overhead when the functions are short and called in inner loops, over a custom function which can be static (even inlined) and thus optimized by compiler with understanding of argument values used. This is also why compilers offer so called built-ins so when you think you are calling memcpy() (string.h, in libc), the compiler might be inlining a "custom" implementation of memcpy at that spot instead, for performance. This is also a good reason not to reinvent memcpy because compiler probably does better.
« Last Edit: May 27, 2023, 06:48:08 pm by Siwastaja »
 

Offline AVI-crak

  • Regular Contributor
  • *
  • Posts: 124
  • Country: ru
    • Rtos
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #6 on: May 27, 2023, 08:09:21 pm »
Can it get more simple than this?
Stop mocking common sense. I tested your code on cortex-m7, cortex-m0, RISC-V (CH32V307VC, CH32V003J4) - the _itoa(a,b) function returns garbage and sometimes crashes the program. On a completely clean project (absolutely bare metal), _itoa(a,b) produced garbage in a nominal 189 processor cycles (with hardware division and multiplication available). At the last moment, _itoa(a,b) adds 4 bytes to the address of the string.
Testing was carried out on the maximum number of uint32_t (4294967295u).
My version runs in 98 cycles on cortex-m7.
Code: [Select]
char* u32_char (char* tail_txt, uint32_t value)//40
{
    *tail_txt = 0;
    uint32_t res, tmp;
    do{
        tmp = ((uint64_t) value * 3435973837UL >> 32);
        res = value + '0';
        value = tmp >> 3;
        tmp = value + (value << 2);
        res -= tmp << 1;
        *(--tail_txt) = res;
    }while (value > 0);
    return tail_txt;
}
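
For readers wondering about the constant in the loop above: 3435973837 is 0xCCCCCCCD, i.e. 2^35 / 10 rounded up, so taking the upper 32 bits of the 64-bit product and shifting right by 3 more gives value / 10 without a divide instruction. The following host-side check is only a sketch to illustrate that; `div10` and `div10_selfcheck` are hypothetical names, not part of the code posted above.

```c
#include <stdint.h>

/* value / 10 via multiply and shift: (value * ceil(2^35 / 10)) >> 35 */
static uint32_t div10(uint32_t value)
{
    return (uint32_t)(((uint64_t)value * 3435973837u) >> 35);
}

/* Returns 1 if div10() agrees with the '/' operator on a handful of edge
   values and on a strided sweep across most of the 32-bit range. */
int div10_selfcheck(void)
{
    static const uint32_t edge[] = {
        0u, 1u, 9u, 10u, 11u, 99u, 100u,
        2147483647u, 4294967289u, 4294967290u, 4294967295u
    };
    unsigned i;
    uint32_t v;

    for (i = 0; i < sizeof edge / sizeof edge[0]; i++)
        if (div10(edge[i]) != edge[i] / 10u)
            return 0;

    /* The stride is an arbitrary prime; the upper bound is chosen so that
       v never wraps around. */
    for (v = 0; v < 4294900000u; v += 65521u)
        if (div10(v) != v / 10u)
            return 0;

    return 1;
}
```

Note that u32_char() writes backwards from the pointer it is given, so the caller passes the address of the last byte of the buffer: for a 32-bit value that means a buffer of at least 11 bytes and `&buf[10]` as the first argument.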
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #7 on: May 27, 2023, 11:05:41 pm »
Just as a fun reminder - unless I missed it and someone already said it - but itoa() is actually not part of the standard C library.
It's an addition that you may, or may not find along with a given compiler/environment.

With GCC on Linux/x86_64, it is not available - I ran into this with some old code, lately!
 
The following users thanked this post: janoc, Siwastaja, JPortici, eugene, tellurium

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #8 on: May 28, 2023, 12:44:29 am »
Just had to show this to the old "I never use std libs in MCU" dogs  ;), showing it's not always the case.
(Sure enough, the flash usage will blast off when using any std print function)
Custom itoa, taken from here (Can it get more simple than this?):
Code: [Select]
char* _itoa(int val, int base){
    static char buf[32];
    int i=30;
    for(; val && i ; --i, val /= base)
      buf[i] = "0123456789ABCDEF"[val % base];
    return &buf[i+1];
  }

I can't spot the place where you're adding the "-" sign.
 

Offline DavidAlfaTopic starter

  • Super Contributor
  • ***
  • Posts: 5835
  • Country: es
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #9 on: May 28, 2023, 03:03:27 am »
Alcohol? Nope! Why? The bad writing?
My phone tends to f** my writing as the keyboard is not set to English. Gboard has certain issues that ruin typing in my native language when enabling multiple languages... So it either doesn't correct anything or does it with garbage.

I know it's not a fully implemented itoa!
I just needed to show some positive integers up to 16bit size, for only that it works just fine!
I've been running a bouncing ball the entire afternoon, the second counter is already 30530 right now.

The point was just to show how the most skinny and crippled itoa that ever existed would still use more resources than the standard one.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #10 on: May 28, 2023, 03:42:15 am »
It would pretty much entirely depend on the platform. For MCUs, the std lib is most often implemented by newlib, and even so there are several versions depending on options.

For those curious, this is the core of the implementation of itoa() in newlib (it's a cascade of functions as is usual in newlib, but the bulk of the work is this):
https://github.com/leaningtech/cheerp-newlib/blob/master/newlib/libc/stdlib/utoa.c

As I mentioned, technically itoa() is not part of the standard C lib and may not be present. If it's not, you'll know why.
If you stick to newlib, chances are it'll always be there though.
 

Offline cv007

  • Frequent Contributor
  • **
  • Posts: 822
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #11 on: May 28, 2023, 04:01:28 am »
Quote
The point was just to show how the most skinny and crippled itoa that ever existed would still use more resources than the standard one.
Not sure how that could be, but we do not know the mcu in use or the particular library version being compared to.

Cortex-m newlib itoa/utoa source, along with the skinny and crippled version given-
https://godbolt.org/z/qc9co79fv

Not sure how the skinny/crippled one could be producing more code unless it was being used inline with multiple uses. It should not matter which MCU was in use, as all library versions essentially look the same. edit- the RAM use would be explained by the static buffer, assuming the library version was tested without an equivalent static buffer (a buffer on the stack, for example, will not show up in RAM stats).

The bigger problem, aside from the fact that a library replacement is created that doesn't match the library function signature and does not have the same functionality (and it then becomes something different where comparisons become somewhat meaningless), is the use of a static buffer shared by anyone who wants to use itoa. That would be fine up until it's not, and the not will be found at some point. There is a good reason the library version takes a char* as an argument.
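
A small sketch of how the shared static buffer bites. `itoa_static` is the thread's minimal function under a different name (names with a leading underscore are reserved at file scope), and `utoa10_into` is a hypothetical caller-buffer alternative limited to unsigned base 10, shown only for contrast.

```c
/* The minimal converter from the opening post, renamed. Every caller
   shares the one static buffer. */
char *itoa_static(int val, int base)
{
    static char buf[32];
    int i = 30;
    for (; val && i; --i, val /= base)
        buf[i] = "0123456789ABCDEF"[val % base];
    return &buf[i + 1];
}

/* Hypothetical alternative: the caller owns the storage. `end` points one
   past the last usable byte; digits are written backwards from there.
   No bounds check: the caller must supply room for the digits plus NUL. */
char *utoa10_into(char *end, unsigned val)
{
    *--end = '\0';
    do {
        *--end = (char)('0' + val % 10u);
        val /= 10u;
    } while (val != 0u);
    return end;
}
```

With `char *a = itoa_static(12, 10); char *b = itoa_static(345, 10);` the second call overwrites the digits the first pointer refers to, so `a` now reads "45" while `b` reads "345". Two separate buffers passed to `utoa10_into` keep both results intact.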

I happen to think in many cases it makes more sense to skip right over the 'minor' formatting league and just use snprintf/printf/equivalent from the start. Once that library is brought in, use it everywhere as it becomes essentially free after the first use. So much easier than the bits and pieces one always ends up with when the original intention was small/light but eventually turned into complex/touch-it-break-it code.
« Last Edit: May 28, 2023, 04:34:53 am by cv007 »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #12 on: May 28, 2023, 04:49:43 am »
Not sure how the skinny/crippled one could be producing more code unless it was being used inline with multiple uses.

If using the function more than once in the same file as it's defined, then the compiler will inline it (unless you go with no or basic optimization level) and that would be the main reason why the overall code size would be slightly larger indeed.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6173
  • Country: fi
    • My home page and email address
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #13 on: May 28, 2023, 07:36:37 am »
I started a thread about this a couple of years ago.  Reply #1 includes the signed 16-bit integer to decimal, backwards, via divide-by-ten and modulus, as an example.  It also shows how to do it very fast on architectures with fast subtract-with-carry, without relying on multiplication or division at all, as well.

While both avr-libc and newlib include itoa(), it is not a standard C function at all, so really, stdlib is not exactly involved here.

(The actual non-stdio.h string-to-numeric conversion routines in the standard C library are strtol(), strtoul(), strtof(), strtod(), and since C99, strtoll(), strtoull(), strtold(), strtoimax(), and strtoumax().  (There are also corresponding functions for wide character strings.)
Unlike e.g. sscanf(), these do report errors in input.   These are of course inverse to how itoa() etc. operate; all standard string construction operations to generate a string from arbitrary integers are in <stdio.h>.)
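
As a sketch of the error reporting mentioned above (hosted environment assumed; the wrapper name `parse_long` and its return convention are hypothetical), strtol() lets the caller detect empty input, trailing junk, and overflow:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical wrapper: parse the whole string as a base-10 long.
   Returns 0 on success, -1 on empty input, trailing characters,
   or a value that does not fit in a long. */
int parse_long(const char *s, long *out)
{
    char *end;
    long v;

    errno = 0;                        /* strtol() only sets errno on error */
    v = strtol(s, &end, 10);

    if (end == s)        return -1;   /* no digits were consumed */
    if (*end != '\0')    return -1;   /* something follows the number */
    if (errno == ERANGE) return -1;   /* result clamped to LONG_MIN/LONG_MAX */

    *out = v;
    return 0;
}
```

The three checks are separate on purpose: on overflow strtol() still returns a (clamped) value and advances the end pointer, so only errno tells the difference.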
 _ _ _ _ _

There is no reason for the "haters" hyperbole, even in jest, exactly because the standard C library is not an inseparable part of the C language.

The C standards specifically define "hosted" and "freestanding" environments, with "hosted" the one most consider "proper C", with all of the standard library features available.  "Freestanding" environment is one where the C standard library is not available, and only a subset of header files (basically those provided by the compiler for the target architecture, in practice) are accessible; things like <stdint.h>, for example.

That is also the reason for that thread, and my relatively scarce participation in "new programming language" threads.  If we understand the development history and pressures involved in the standard C library, instead of starting from scratch, we can simply replace the standard C library with something better and get most of the way there.  (Indeed, I have come to suspect that a single base language change regarding pointers and arrays would suffice to let the compiler detect all cases where buffer underrun or overrun is possible.  That, combined with a "replacement standard library" using knowledge hard-earned in practice over the last three decades or so, would be a huge leap forward, in my opinion.)

That thread I started involved some ideas regarding string construction in very limited or constrained situations, i.e. using minimal resources, exactly like when one is programming microcontrollers, for example.  Aside from reserved function names and naming, it is perfectly "standard" C to use the freestanding environment, and implement your own base library.  Usually, some things (syscalls, when running under an OS kernel or hypervisor) do require extensions or external implementations or extended inline assembly; I personally favour the last (with GCC and Clang).

The same applies to C++ as well, except that C++ leaves almost all of the freestanding environment implementation-defined; and this leads to many freestanding developers to actually use a subset of C++ or mix of C and C++ freestanding environments, especially on Harvard architectures, as the standards have not grown separate address space support and it requires compiler extensions, with Clang supporting them well in both C and C++, but GCC only in C (leading to all of the oddness wrt. strings and flash memory accesses in Arduino, which uses GCC's C++ frontend on AVRs).

There is also a thread somewhere about how it is possible to format even float (IEEE 754 Binary32 or Binary64, single or double precision) exactly correctly (the rounding is annoying!) using very little resources.  This is because the <stdio.h> part of the C standard library has never been optimized for performance; its output is very carefully specified and tuned for correctness instead.  Similarly, it is very simple to parse limited ranges of floating-point numbers in decimal format (exactly correctly), and at least an order of magnitude faster than what standard C library implementations do; exactly because they are designed per the language spec for correctness and not for speed/throughput.  Mostly, it is the subnormal numbers and exponential notation (in cases that ends up requiring a bit-correct division by ten) that are the slowest and most resource-hungry to implement; and even they only require a limited-size conversion workspace (unlike most implementations in standard libraries, which use arbitrary-precision math for this).  There are even scientific papers in ACM and elsewhere about such conversions...
« Last Edit: May 28, 2023, 07:48:11 am by Nominal Animal »
 

Offline DavidAlfaTopic starter

  • Super Contributor
  • ***
  • Posts: 5835
  • Country: es
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #14 on: May 28, 2023, 08:56:59 am »
I don't think it's inlining it, I'm using -Os + -flto to pack it as much as possible.
The tested MCU is a Cortex M0 HK32F030M, using existing GCC from ST CubeIDE.
It only has 16KB flash so size matters! :)
« Last Edit: May 28, 2023, 02:18:10 pm by DavidAlfa »
 

Offline AVI-crak

  • Regular Contributor
  • *
  • Posts: 124
  • Country: ru
    • Rtos
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #15 on: May 28, 2023, 09:33:32 am »
I just needed to show some positive integers up to 16bit size, for only that it works just fine!
Only some of them, and even with a limitation of 16 bits, are commendable.
The reminder "do not put kittens in the microwave" is not implemented in your function. Using an "int" input parameter is a very bad idea: on an eight-bit MCU and on 64-bit ARM the result will be unpredictable. Using division is very stupid; it is the slowest instruction on all architectures. The execution time of _itoa(a,b) for the number 2147483647 = 223 ticks.
Ideologically correct function for int16_t https://godbolt.org/z/vccEcazhs
 
The following users thanked this post: janoc

Offline jnk0le

  • Contributor
  • Posts: 40
  • Country: pl
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #16 on: May 28, 2023, 11:12:25 am »
the "standard" itoa/utoa is overbloated by design. The only bases that are actually used 99.99% of times are 2,10,16.

about newlib:
https://github.com/leaningtech/cheerp-newlib/blob/master/newlib/libc/stdlib/utoa.c#LL36C3-L36C64
This whole string will be put on the stack every time the function is called.
Which is of course slower than even hitting waitstates in .rodata lookups.
 

Offline cv007

  • Frequent Contributor
  • **
  • Posts: 822
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #17 on: May 28, 2023, 11:45:15 am »
Quote
I don't think it's inlining it
What the compiler produces is not a secret, so you can see what it's doing if you wish. I think your extra flash space used for the simple version is because it's using integer division where the library version is using unsigned division (itoa does not do division, its call to utoa does).

Create a couple versions, compile and objdump, look for the division functions, the udiv version will be smaller than the div version. If you do not otherwise use both, there will be a difference due to one using udiv the other div.
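
A sketch of such a pair of test functions (all four names are hypothetical). On a core without a hardware divider, such as Cortex-M0, GCC for the ARM EABI typically turns the signed operations into calls to __aeabi_idiv / __aeabi_idivmod and the unsigned ones into __aeabi_uidiv / __aeabi_uidivmod, which is what to look for in the objdump or map file output.

```c
/* Signed variants: what the minimal itoa in the opening post uses. */
int last_digit_signed(int val, int base)
{
    return val % base;   /* negative for negative val (C99 and later) */
}

int drop_digit_signed(int val, int base)
{
    return val / base;
}

/* Unsigned variants: what the library utoa() uses. */
unsigned last_digit_unsigned(unsigned val, unsigned base)
{
    return val % base;
}

unsigned drop_digit_unsigned(unsigned val, unsigned base)
{
    return val / base;
}
```

The signed remainder is also where the minimal version goes wrong for negative input: a negative result used as an index into the digit string reads before the start of the string.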

edit- but maybe we should just let it rest so you keep using the better library version.
« Last Edit: May 28, 2023, 11:58:29 am by cv007 »
 
The following users thanked this post: SiliconWizard

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #18 on: May 28, 2023, 12:51:10 pm »
the "standard" itoa/utoa is overbloated by design. The only bases that are actually used 99.99% of times are 2,10,16.

While that is true, the cost of supporting 15 (or even 35) bases instead of 3 is very small.

Quote
about newlib:
https://github.com/leaningtech/cheerp-newlib/blob/master/newlib/libc/stdlib/utoa.c#LL36C3-L36C64
This whole string will be put on the stack every time the function is called.
Which is of course slower than even hitting waitstates in .rodata lookups.

I doubt that speed of converting integers to strings is very often a concern in embedded uses, even on a 1 MHz 6502 let alone a 24 or 48 or more MHz Cortex-M0. It's probably only updating a small display a few times a second (no point in doing it more often than a human can observe and react to), or writing to a log file where you don't want to be spewing out MBs of text every second. Something like 223 cycles (maybe 5-10 µs) doesn't really need to be reduced, and if it was 10x more probably would not matter.

The code size has got to be much more important in most cases.

Which makes me think including a 16 (or 36) byte string is pretty crazy.

Code: [Select]
int digitToChar(int digit){
    int res = digit + '0';
    if (res > '9') res += 'A'-'0'-10;
    return res;
}

Code: [Select]
  adds r0, r0, #48
  cmp r0, #57
  ble 1f
  adds r0, r0, #7
1:

8 bytes of code VS
a 16 or 36 byte string PLUS 4 bytes of code PLUS 4 bytes of literal pool = 24 or 44 bytes total

The pure code solution isn't even significantly slower.
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8110
  • Country: fi
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #19 on: May 28, 2023, 01:02:53 pm »
I agree code size is usually more important than execution speed for visualization / status print functions because it should not be in the timing-critical path.
 

Offline jnk0le

  • Contributor
  • Posts: 40
  • Country: pl
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #20 on: May 28, 2023, 01:59:42 pm »

Code: [Select]
int digitToChar(int digit){
    int res = digit + '0';
    if (res > '9') res += 'A'-'0'-10;
    return res;
}

Code: [Select]
  adds r0, r0, #48
  cmp r0, #57
  ble 1f
  adds r0, r0, #7
1:

8 bytes of code VS
a 16 or 36 byte string PLUS 4 bytes of code PLUS 4 bytes of literal pool = 24 or 44 bytes total

plus the code for memcpy call and related spilling in case of newlib.

Code: [Select]
          __utoa:
00000420:   addi    sp,sp,-72
00000424:   sw      s0,64(sp)
00000426:   sw      s1,60(sp)
00000428:   sw      a0,0(sp)
0000042a:   mv      s1,a2
0000042c:   mv      s0,a1
0000042e:   li      a2,37
00000432:   auipc   a1,0x0
00000436:   addi    a1,a1,-814 # 0x104
0000043a:   addi    a0,sp,20
0000043c:   sw      ra,68(sp)
0000043e:   jal     0x70e <memcpy>
00000440:   addi    a2,s1,-2
00000444:   li      a3,34
00000448:   li      a4,0
0000044a:   bgeu    a3,a2,0x462 <__utoa+66>
0000044e:   sb      zero,0(s0)
00000452:   li      s0,0
00000454:   lw      ra,68(sp)
00000456:   mv      a0,s0
00000458:   lw      s0,64(sp)
0000045a:   lw      s1,60(sp)
0000045c:   addi    sp,sp,72
00000460:   ret     
00000462:   lw      a0,0(sp)
00000464:   mv      a2,a4
00000466:   addi    a4,a4,1
00000468:   add     a5,s0,a4
0000046c:   mv      a1,s1
0000046e:   sw      a2,16(sp)
00000470:   sw      a4,12(sp)
00000472:   sw      a5,4(sp)
00000474:   jal     ra,0x16ac <__umodsi3>
00000478:   addi    a4,sp,20
0000047a:   addi    a5,a0,40
0000047e:   add     a0,a5,a4
00000482:   lbu     a3,-40(a0)
00000486:   lw      a5,4(sp)
00000488:   mv      a1,s1
0000048a:   sb      a3,-1(a5)
0000048e:   lw      a3,0(sp)
00000490:   mv      a0,a3
00000492:   sw      a3,8(sp)
00000494:   jal     ra,0x1680 <__udivsi3>
00000498:   lw      a3,8(sp)
0000049a:   sw      a0,0(sp)
0000049c:   lw      a4,12(sp)
0000049e:   lw      a2,16(sp)
000004a0:   bgeu    a3,s1,0x462 <__utoa+66>
000004a4:   lw      a5,4(sp)
000004a6:   add     a3,s0,a2
000004aa:   li      a4,0
000004ac:   sb      zero,0(a5)
000004b0:   sub     a1,a2,a4
000004b4:   bge     a4,a1,0x454 <__utoa+52>
000004b8:   lbu     t1,0(a3)
000004bc:   add     a1,s0,a4
000004c0:   lbu     a0,0(a1)
000004c4:   sb      t1,0(a1)
000004c8:   addi    a4,a4,1
000004ca:   sb      a0,0(a3)
000004ce:   addi    a3,a3,-1
000004d0:   j       0x4b0 <__utoa+144>
 

Offline AVI-crak

  • Regular Contributor
  • *
  • Posts: 124
  • Country: ru
    • Rtos
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #21 on: May 28, 2023, 02:05:22 pm »
I agree code size is usually more important than execution speed for visualization / status print functions because it should not be in the timing-critical path.

The number of lines in the program and the size of the firmware have a very weak relationship with each other.
I have a library with a total of more than 10k lines, the result of its work takes 142 bytes of function size, and 4 bytes of data. Because I'm very lazy, and I'm too lazy to read the documentation every time.
On the other hand, a person writes in the function "take a percentage of a number" - it turns out very compactly, literally one line. But in real code, a slow built-in function is called, sometimes twice.

A real programmer must be as lazy as possible: write once - use always and everywhere. You only need to think well once, and do not touch again. You can even forget what's inside - you've already done the work.
 

Offline cv007

  • Frequent Contributor
  • **
  • Posts: 822
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #22 on: May 28, 2023, 05:28:36 pm »
Quote
Create a couple versions, compile and objdump, look for the division functions,...
I did create a couple versions to compare (simple code, -Os, and not optimized away), and the library version uses unsigned division and the replacement signed division. The signed division library function takes an extra 200 bytes over the unsigned version (for m0), the library also brings in memset/memcpy, and library itoa functions (itoa/utoa) are about 130 bytes larger. In the end the 'minimal' version ends up about 46 bytes larger for the simple test example.

It's easy to draw the wrong conclusions by changing code and looking only at compiled size. It's not unusual to change something trivial and end up with a code size that drastically changes. In this case, an obviously smaller function that ends up with larger code means there is something not obvious taking place, obviously.
 
The following users thanked this post: SiliconWizard

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6173
  • Country: fi
    • My home page and email address
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #23 on: May 29, 2023, 01:14:20 am »
I doubt that speed of converting integers to strings is very often a concern in embedded uses, even on a 1 MHz 6502 let alone a 24 or 48 or more MHz Cortex-M0.
I agree.

The cases where it does matter is in specific types of devices, that for example parse HPGL (which includes decimal numbers) or G-code, common in 3D printers, vinyl cutters, plasma cutters, and so on.  (Many of the purely 2D ones use HPGL.)

So, it is not often, but when you do, it is quite important to have it efficient, correct, and robust.  (By robust, I mean error detection and aborting in case of numerical overflows, instead of best-effort parsing like e.g. sscanf() does.)

I first encountered this a couple of decades ago, parsing molecular simulation data from PDB (protein data bank) and similar format files; these are text-based files that for example describe the atom positions in a simulation.  When you have say a million atoms and a thousand frames, that's at minimum three short billion decimal/scientific notation floating-point numbers, and even at that time standard C library parsing functions became the speed bottleneck, even on bog-standard workstation-grade spinny rust hard drives.  Much more so now, with fast SSD drives.  So, even in some specific niches on desktop computing, this is still an issue.  A rare one, granted, but an important one when it happens to oneself.  It isn't fun waiting for minutes for simulation data to load, when it can be done in seconds, at I/O speeds.

Sure, there are dedicated binary formats, but text-based formats like JSON are still easiest/most reliable to use for data transfer across completely different systems.  (I do have added custom binary data file support to a couple of molecular dynamics simulators myself, but even then I had to write translators to/from the common text-based formats, simply because adding common binary file format support, no matter how standard, to all utilities accessing such data is just too much work: there are hundreds of them, and even I used easily dozens for each simulation.)
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #24 on: May 29, 2023, 03:37:36 am »
I first encountered this a couple of decades ago, parsing molecular simulation data from PDB (protein data bank) and similar format files; these are text-based files that for example describe the atom positions in a simulation.  When you have say a million atoms and a thousand frames, that's at minimum three short billion decimal/scientific notation floating-point numbers, and even at that time standard C library parsing functions became the speed bottleneck, even on bog-standard workstation-grade spinny rust hard drives.  Much more so now, with fast SSD drives.  So, even in some specific niches on desktop computing, this is still an issue.  A rare one, granted, but an important one when it happens to oneself.  It isn't fun waiting for minutes for simulation data to load, when it can be done in seconds, at I/O speeds.

That is text to binary, which is a different problem. As is floating point.

Also, I thought we were talking about machines where you have 2 KB or 16 KB or something of flash, and 1 or 2 KB of RAM, not big computers with megabytes if not gigabytes of RAM.
 

