Author Topic: stdlib + MCU haters: std itoa() uses less resources!  (Read 16786 times)


Offline westfw

  • Super Contributor
  • ***
  • Posts: 4191
  • Country: us
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #175 on: June 12, 2023, 08:15:40 am »


Quote from: brucehoult on Yesterday at 07:46:27 pm

Cortex-M0 has hardware 32x32->64 multiply (that usually takes 1 cycle), but doesn't have any instruction that directly calculates the upper 32 bits, so a 64 bit result will necessarily take much longer than a 32 bit one.


I don't understand what that sentence is supposed to mean.
CM0 only has 32*32=32bit multiply.
From the v6m ARM ARM:

Quote
The only multiply instruction supported in ARMv6-M performs a 32x32 multiply that generates a 32-bit
result, see MUL on page A6-159. The instruction can operate on signed or unsigned quantities.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 3947
  • Country: nz
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #176 on: June 12, 2023, 09:02:22 am »


Quote from: westfw

Quote from: brucehoult on Yesterday at 07:46:27 pm

Cortex-M0 has hardware 32x32->64 multiply (that usually takes 1 cycle), but doesn't have any instruction that directly calculates the upper 32 bits, so a 64 bit result will necessarily take much longer than a 32 bit one.


I don't understand what that sentence is supposed to mean.
CM0 only has 32*32=32bit multiply.

Sorry, typo. As the rest of the sentence "doesn't have any instruction that directly calculates the upper 32 bits" makes clear, I meant 32x32->32.
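
To make the cost of the missing upper half concrete, here is a minimal sketch of how the high 32 bits of an unsigned product can be computed when every multiply result has to fit in 32 bits. The function names `umulhi32` and `umulhi32_check` are made up for illustration, and this is the generic schoolbook decomposition into 16-bit halves, not necessarily what any particular compiler or runtime library emits for Cortex-M0.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits of an unsigned 32x32 product, built only from
   16x16->32 partial products, so no multiply result ever needs
   more than 32 bits. Hypothetical helper, for illustration. */
uint32_t umulhi32(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t lo_lo = a_lo * b_lo;   /* weight 2^0  */
    uint32_t hi_lo = a_hi * b_lo;   /* weight 2^16 */
    uint32_t lo_hi = a_lo * b_hi;   /* weight 2^16 */
    uint32_t hi_hi = a_hi * b_hi;   /* weight 2^32 */

    /* Carry out of bits 16..31: three 16-bit values, cannot overflow. */
    uint32_t mid = (lo_lo >> 16) + (hi_lo & 0xFFFFu) + (lo_hi & 0xFFFFu);

    return hi_hi + (hi_lo >> 16) + (lo_hi >> 16) + (mid >> 16);
}

/* Cross-check against the compiler's own 64-bit multiply. */
void umulhi32_check(uint32_t a, uint32_t b)
{
    assert(umulhi32(a, b) == (uint32_t)(((uint64_t)a * b) >> 32));
}
```

That is four multiplies plus shifts, masks and adds for the upper word alone, compared with a single MUL for the lower word.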
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14082
  • Country: fr
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #177 on: June 12, 2023, 07:43:18 pm »
Quote

I'm really struggling to see any circumstance in which a 32x32->64 multiply could be faster than a 32x32->32 one.

There isn't. At least not on a 32-bit processor, that just wouldn't make any sense.
Possibly on some odd 64-bit processor (that I do not know) that could be possible, say if they favored the 32x32->64 multiply and the ISA was such that getting the low 32-bit part would require an additional instruction.

Of course, with such an assertion above, we may be all the way back to the original post: a questionable analysis of a very particular case in a particular context with a particular compiler.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 5992
  • Country: fi
    • My home page and email address
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #178 on: June 12, 2023, 08:49:33 pm »
Quote from: SiliconWizard

a questionable analysis of a very particular case in a particular context with a particular compiler.

Yup.  Microbenchmarking becomes more difficult the shorter the target code is and the fewer clock cycles it takes, especially so on architectures with caches or multiple execution units or arithmetic-logic units: the surrounding code, including the elapsed time measurement, affects the target code too much for the measurements to be useful.  At some point, it becomes nonsensical, because the measurement uncertainty and biases exceed the target duration.

Much better – but still microbenchmarking – is to implement a function in more than one way, and run them in loops with precomputed inputs, and measure the time taken to handle all inputs.  This is then repeated a few thousand times, and the durations recorded.  The most useful measure is not the average (because the error in timing under normal operating systems is always positive: the task may be interrupted by other stuff) but the median (or some other percentile).  The minimum is only academically interesting, in the sense that it is the time taken "when all stars happen to align"; an unrealistic minimum that may occur, but cannot be relied upon to occur.  The median is an easy one to explain: in half the cases, the time taken was at most the median.
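
The repeat-and-take-the-median procedure could be sketched roughly like this. The `tick_fn` and `batch_fn` callback types and both function names are assumptions made for illustration: the tick source might be a cycle counter on an MCU or a monotonic clock on a hosted system, and the batch callback is expected to run one implementation over all the precomputed inputs.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t (*tick_fn)(void);  /* hypothetical: returns a monotonic tick count */
typedef void (*batch_fn)(void);     /* hypothetical: runs one candidate over all inputs */

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sorts the samples in place and returns the lower median:
   at least half of the samples are at most this value.
   Requires n >= 1. */
uint64_t median_u64(uint64_t *samples, size_t n)
{
    qsort(samples, n, sizeof samples[0], cmp_u64);
    return samples[(n - 1) / 2];
}

/* Times `runs` repetitions of the whole batch (runs >= 1) and
   returns the median duration in ticks. The caller provides a
   `samples` buffer with room for `runs` entries. */
uint64_t bench_median(batch_fn batch, tick_fn now,
                      uint64_t *samples, size_t runs)
{
    for (size_t r = 0; r < runs; r++) {
        uint64_t start = now();
        batch();
        samples[r] = now() - start;
    }
    return median_u64(samples, runs);
}
```

Because the samples end up sorted, other percentiles can be read straight out of the same buffer afterwards.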

Proper benchmarking involves taking a real world task and data set, and processing it using different implementations.  Then, of course, you don't benchmark a single operation, but the implementations.

Premature optimization, like trying to make the fastest atoi() you can before making sure it is a limiting bottleneck in your task at hand, is an extremely common mistake, especially among programmers without sufficiently wide experience: they spend a lot of time on "optimizing" something that has no effect on the end result, essentially wasting valuable time.  Most often, true optimization avoids having to do that thing altogether, and achieves an order of magnitude greater savings.

An example of that is reading lots of unsorted data from storage, when you need it in order, with a human waiting for the operation to complete.  You can discuss sort algorithms as much as you want, but instead of reading all data and then sorting it, you can get the task done faster (using slightly more computing resources) by sorting the data as it becomes available, for example by reading each data entry into a binary heap or a (balanced) tree.  This is less important now with extremely fast SSD drives, but with e.g. SD cards (often used for removable storage on microcontrollers and embedded devices, even on phones) and other storage media with limited transfer rates, the insertion (online sorting of each new entry into the data structure) takes place during time which otherwise would be wasted waiting for new data to arrive.  The end result on these slower media is that even though you end up using more CPU cycles, the data is sorted and ready basically as soon as all of it has been read, whereas the fastest offline sorting algorithm is only just starting its work at that point.
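
A minimal sketch of the binary heap variant, assuming the entries are plain `uint32_t` keys and a fixed capacity is acceptable (`HEAP_CAP`, `struct min_heap`, `heap_push` and `heap_pop` are illustrative names only). Each entry would be pushed as soon as it arrives from storage, and popping afterwards yields the entries in ascending order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_CAP 1024u   /* assumed fixed capacity for the sketch */

struct min_heap {
    uint32_t item[HEAP_CAP];
    size_t   len;
};

/* Insert one key as it arrives: sift up, O(log n). */
void heap_push(struct min_heap *h, uint32_t key)
{
    assert(h->len < HEAP_CAP);
    size_t i = h->len++;
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (h->item[parent] <= key)
            break;
        h->item[i] = h->item[parent];   /* move parent down */
        i = parent;
    }
    h->item[i] = key;
}

/* Remove and return the smallest key: sift down, O(log n). */
uint32_t heap_pop(struct min_heap *h)
{
    assert(h->len > 0);
    uint32_t top  = h->item[0];
    uint32_t last = h->item[--h->len];
    size_t i = 0;
    for (;;) {
        size_t child = 2 * i + 1;
        if (child >= h->len)
            break;
        if (child + 1 < h->len && h->item[child + 1] < h->item[child])
            child++;                    /* pick the smaller child */
        if (last <= h->item[child])
            break;
        h->item[i] = h->item[child];    /* move child up */
        i = child;
    }
    h->item[i] = last;
    return top;
}
```

With a heap, only the insertions overlap with the waiting for storage; the ordered output still costs a pop per entry, which may or may not matter depending on how the data is consumed.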
 
The following users thanked this post: newbrain, MMMarco

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14082
  • Country: fr
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #179 on: June 12, 2023, 08:57:02 pm »
Quote from: Nominal Animal

Premature optimization, like trying to make the fastest atoi() you can before making sure it is a limiting bottleneck in your task at hand, is an extremely common mistake, especially among programmers without sufficiently wide experience: they spend a lot of time on "optimizing" something that has no effect on the end result, essentially wasting valuable time.  Most often, true optimization avoids having to do that thing altogether, and achieves an order of magnitude greater savings.

Yes, yes, and yes!

It's interesting to see many people navigating between these two extremes: either excessive optimization on unimportant stuff, or complete waste of resources to save a couple hours, or sometimes just actually minutes, of development time.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 3947
  • Country: nz
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #180 on: June 12, 2023, 10:50:06 pm »
Quote from: SiliconWizard

Quote from: Nominal Animal

Premature optimization, like trying to make the fastest atoi() you can before making sure it is a limiting bottleneck in your task at hand, is an extremely common mistake, especially among programmers without sufficiently wide experience: they spend a lot of time on "optimizing" something that has no effect on the end result, essentially wasting valuable time.  Most often, true optimization avoids having to do that thing altogether, and achieves an order of magnitude greater savings.

Yes, yes, and yes!

It's interesting to see many people navigating between these two extremes: either excessive optimization on unimportant stuff, or complete waste of resources to save a couple hours, or sometimes just actually minutes, of development time.

Small differences in execution time seldom make any difference, but still you don't want to needlessly pessimise things.

Code size, on the other hand, is an absolute that is easy to measure and often leads to second order speedups.

I can't believe how many people just throw -O3 at everything, often bloating code size by factors, for very little speed gain over -O1. And making caches that much less effective when you start measuring the whole program, not just that one function in isolation.
 
The following users thanked this post: MK14

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 5992
  • Country: fi
    • My home page and email address
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #181 on: June 12, 2023, 11:24:41 pm »
Another related detail is that code you intend to be temporary, often isn't.

My own criterion is "Will I curse myself, if I have to come back and maintain/modify this code after a year, after I've forgotten all the relevant details?"
It also reminds me to write comments that document my intent and the underlying idea or logic of the code; I will need those.
 
The following users thanked this post: newbrain

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26575
  • Country: nl
    • NCT Developments
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #182 on: June 12, 2023, 11:58:04 pm »
In the end optimisation is a bit of an art where you shouldn't lose sight of the big picture. Optimising a problem at the higher level by choosing the right algorithm is more effective than trying to optimise a randomly chosen algorithm. And there is also the factor of NRE costs. When I really need to optimise something (size / execution time), I do this to the level where it is enough. That will be the most cost effective solution. Unfortunately embedded programming is littered with dogmas that are supposed to be 'good generic rules' but more often than not lead to sub-optimal results for those who believe in these dogmas religiously.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 
The following users thanked this post: MK14, MMMarco

Offline AVI-crak

  • Regular Contributor
  • *
  • Posts: 121
  • Country: ru
    • Rtos
Re: stdlib + MCU haters: std itoa() uses less resources!
« Reply #183 on: June 13, 2023, 04:06:48 pm »
Quote from: nctnico

In the end optimisation is a bit of an art where you shouldn't lose sight of the big picture.

In a particular case, it is impossible to see the overall picture, because we are viewing a small part under a magnifying glass. The consideration and discussion have been long, so long that the optimization process started naturally (new version: https://godbolt.org/z/bnfEbaxa1).

It is customary to draw the general picture without details, with a large wide brush. I'm talking about the general API of the classical libraries, created in the image and likeness of prehistoric dinosaurs. The main headache (which everyone stubbornly ignores) is operating on one character (letter, digit) at a time when printing or saving to a file. Machines used to be big and stupid; now they are small and very powerful. But the style has remained the same.
Stone that cannot be moved -> "printf()".
 

