Author Topic: Inline function in program  (Read 2965 times)

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14667
  • Country: fr
Re: Inline function in program
« Reply #25 on: November 08, 2023, 08:06:52 am »
As a side note, as was discussed recently in another thread, while GCC optimizations (especially starting with -O2, and very aggressively with -O3) will inline a lot of functions automatically, -Os tends to do the exact opposite, which can lead to pretty odd stuff: it tries to factor pieces of code that are repeated into functions and call them.

In both cases, the end result may not be favorable in terms of actual performance - the only way to tell is to test, and have a look at the generated assembly code. If it matters, of course. Otherwise just let the compiler do its stuff and move on.
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8243
  • Country: fi
Re: Inline function in program
« Reply #26 on: November 08, 2023, 09:55:42 am »
Actually, you may have reminded me that I already did a quick experiment on that and found, as you quite rightly point out, that it made no real difference whether I marked functions inline or not: inlining was pretty much in the hands of the compiler. So for what I was doing at the time, there wasn't much of a guarantee of any benefit from having my code inlined :-DD

For micromanaging inlining (which is sometimes useful in embedded, if you don't want to go as far as writing assembly, but instead use the "write-C-look-at-assembly-output" or "write-C-measure-with-oscilloscope" strategies), in GCC, just use:

#define ALWAYS_INLINE static inline __attribute__((always_inline))

ALWAYS_INLINE void set_pin() { ... }

I also use this pattern when I refactor something out of a function and possibly pass large objects by value (for the code to be most readable); then I feel confident it always works without a significant performance penalty. I guess it's still not 100% guaranteed, but it has never failed me so far. Note though that if you forcefully inline large functions that are called more than once, it will waste a significant amount of code memory.
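As a minimal sketch of how the macro above might be used: fake_port and PIN_MASK are hypothetical stand-ins for a real memory-mapped GPIO register and pin mask, and __attribute__((always_inline)) is a GCC/Clang extension, not standard C.

```c
#include <stdint.h>

/* GCC/Clang extension: request inlining at every call site. */
#define ALWAYS_INLINE static inline __attribute__((always_inline))

/* Hypothetical stand-in for a memory-mapped GPIO output register. */
static volatile uint32_t fake_port = 0;

/* Hypothetical pin mask (bit 5). */
#define PIN_MASK (UINT32_C(1) << 5)

ALWAYS_INLINE void set_pin(void)   { fake_port |= PIN_MASK; }
ALWAYS_INLINE void clear_pin(void) { fake_port &= ~PIN_MASK; }

/* Both helpers should be expanded in place here, so the generated
   code contains two read-modify-write sequences and no calls. */
void pulse_pin(void)
{
    set_pin();
    clear_pin();
}
```

Checking the assembly output of pulse_pin() is still the only way to confirm what the compiler actually did on a given target.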
« Last Edit: November 08, 2023, 09:59:34 am by Siwastaja »
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6421
  • Country: fi
    • My home page and email address
Re: Inline function in program
« Reply #27 on: November 08, 2023, 10:09:14 am »
I have a practical example on function inlining in a program, in an effort to help Kittu20 and others to understand what inlining actually is and is not.

Let's say you are doing some work that calculates the values of univariate polynomials of the form \$f(x) = C_0 + C_1 x + C_2 x^2 + \dots + C_{N-2} x^{N-2} + C_{N-1} x^{N-1}\$, and you don't want to hardcode the values of \$C_k\$ into your C program, but instead read them from a file.  Then, you can use a function like
Code:
#include <stdlib.h>

double poly(const double x,
            const size_t n,
            const double c[n])
{
    if (n < 1)
        return 0.0;  // Or NaN

    double result = c[0];
    double factor = x;  // x^1, the power that multiplies c[1]

    for (size_t i = 1; i < n; i++) {
        result += factor * c[i];
        factor *= x;
    }

    return result;
}
to calculate the value of any given polynomial, given its coefficients c[] and the value of x to evaluate it at.  For example, to calculate \$f(x) = x^2 + 25 x - 4\$, you'd use n = 3 and double c[3] = { -4.0, 25.0, 1.0 } (always \$C_0\$ first, \$C_{N-1}\$ last).
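A quick way to check the coefficient order is to evaluate a small polynomial by hand and compare. The sketch below repeats the function so it compiles on its own, with the running power starting at x so that c[i] multiplies x^i. The helper example_quadratic() is a hypothetical name used only for illustration.

```c
#include <stdlib.h>

double poly(const double x, const size_t n, const double c[n])
{
    if (n < 1)
        return 0.0;

    double result = c[0];
    double factor = x;          /* x^1, then x^2, x^3, ... */

    for (size_t i = 1; i < n; i++) {
        result += factor * c[i];
        factor *= x;
    }
    return result;
}

/* Hypothetical helper: evaluates -4 + 25 x + x^2 at the given x. */
double example_quadratic(const double x)
{
    const double c[3] = { -4.0, 25.0, 1.0 };  /* C_0 first */
    return poly(x, sizeof c / sizeof c[0], c);
}
```

With these coefficients, example_quadratic(2.0) works out to -4 + 50 + 4 = 50.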

(In C, since C99, a function parameter list can specify the length of an array parameter, as long as the length parameter comes before the array itself.  You can even use multi-dimensional arrays: (const size_t n, const size_t k, const double m[n][k]) is perfectly valid.  For some reason, it seems rarely used.  I like to promote it every now and then, because if more programmers were to use it, then the C compilers would more likely grow facilities to report array overruns, giving us programmers a very powerful tool to avoid buffer overrun bugs, the bane of bad C code.  The array notation is completely compatible with the pointer notation, and you can supply a pointer to double as the third parameter to poly() without any problems, because in C, an array passed to a function always decays to a pointer to its initial element.  The size_t type, and not int, is also the proper type for sizes.)
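To illustrate the multi-dimensional form, here is a small sketch. The names matrix_sum() and matrix_sum_demo() are hypothetical, and the code assumes a compiler that supports variable-length array declarators, which C11 made optional and which C++ does not have.

```c
#include <stdlib.h>

/* Sum of all elements of an n-by-k matrix.  The sizes come first so
   that they are in scope when the array parameter is declared. */
double matrix_sum(const size_t n, const size_t k, const double m[n][k])
{
    double sum = 0.0;

    for (size_t row = 0; row < n; row++)
        for (size_t col = 0; col < k; col++)
            sum += m[row][col];

    return sum;
}

/* Hypothetical caller.  The array is declared const so that it
   matches the const-qualified parameter without a cast. */
double matrix_sum_demo(void)
{
    const double m[2][3] = { { 1.0, 2.0, 3.0 },
                             { 4.0, 5.0, 6.0 } };
    return matrix_sum(2, 3, m);
}
```

Inside matrix_sum(), m[row][col] indexes correctly for any k, because the compiler knows the row length from the parameter list.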

(Also, while modern C compilers can infer the const-ness automagically from the code itself, and removing/adding them does not actually cause any change in the machine code generated by current C compilers, they are important –– for us humans.  They indicate the author's intent, as a promise that "no attempt to modify the values of these variables will be made in this code" (which the compiler will check, and warn or error if violated).  Simply put, because x, n, and the contents of the c array are marked const, we humans know their values should stay the same within each invocation of the function, and therefore don't need to spend any mental effort to track them.  I use them because it makes long-term code maintenance much easier for me: instead of looking at the code for a minute, I recognize the function parameter const-ness, and can parse what the function body does within a few seconds.  Other experienced C programmers may be thrown off a bit by the variably-modified array argument, but as I wrote above, I think they too should use it more, instead of the more common pointer notation.)

Let's say, however, that your actual task is to calculate the sum of the values \$f(x)\$ for a set or array of \$x\$ instead.  You come up with a function using a very similar interface:
Code:
double poly_sum(const size_t  nx,
                const double  x[nx],
                const size_t  nc,
                const double  c[nc])
{
    double result = 0.0;
    if (nx < 1 || nc < 1)
        return 0.0;  // Or NaN

    for (size_t i = 0; i < nx; i++)
        result += poly(x[i], nc, c);

    return result;
}
Let's assume we write a few tests to verify the code works, and that it indeed does.
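Such tests might look like the following sketch. It repeats both functions (marked static here) so that it compiles on its own, with the running power in poly() starting at x. The test_poly_sum() helper and the chosen test values are assumptions for illustration; the values are picked so that every result is exactly representable as a double.

```c
#include <assert.h>
#include <stdlib.h>

static double poly(const double x, const size_t n, const double c[n])
{
    if (n < 1)
        return 0.0;

    double result = c[0];
    double factor = x;

    for (size_t i = 1; i < n; i++) {
        result += factor * c[i];
        factor *= x;
    }
    return result;
}

static double poly_sum(const size_t nx, const double x[nx],
                       const size_t nc, const double c[nc])
{
    double result = 0.0;
    if (nx < 1 || nc < 1)
        return 0.0;

    for (size_t i = 0; i < nx; i++)
        result += poly(x[i], nc, c);

    return result;
}

/* Hypothetical test helper: aborts via assert() on the first failure. */
void test_poly_sum(void)
{
    const double c[3] = { -4.0, 25.0, 1.0 };  /* -4 + 25 x + x^2 */
    const double x[3] = { 0.0, 1.0, 2.0 };

    /* f(0) = -4, f(1) = 22, f(2) = 50 */
    assert(poly(0.0, 3, c) == -4.0);
    assert(poly(1.0, 3, c) == 22.0);
    assert(poly(2.0, 3, c) == 50.0);

    /* Sum over all three points: -4 + 22 + 50 = 68 */
    assert(poly_sum(3, x, 3, c) == 68.0);

    /* Empty inputs take the early-return path. */
    assert(poly_sum(0, x, 3, c) == 0.0);
    assert(poly_sum(3, x, 0, c) == 0.0);
}
```

Note that assert() compiles to nothing when NDEBUG is defined, so tests like these need to be built without it.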

Now, in the latter function implementation as machine code, does the C compiler generate a call to the previously defined poly() function, or does it inline the body of that function?

The only correct answer is that the compiler gets to choose.  It may generate a call, or copy the machine code implementing the poly() function. Even when using the exact same compiler and the same optimization settings, it may make different choices on different hardware architectures.  And changing the compiler settings can of course change that too.

Let's say we realize we don't actually need the poly() function outside the current compilation unit (this C file at hand), and we mark it static or static inline –– the two are exactly equivalent here ––, i.e. "static double poly(const double x,...".  What changes?

Without static, the compiler must still provide object code for the poly() function even in the case it chooses to inline its implementation within the poly_sum() function.  The static keyword tells the compiler we only use it locally, no external linkage is needed, so it can drop the externally-visible implementation.  The compiler is still free to choose whether to generate separate code for the poly() function, or to inline it within the still externally-visible poly_sum() function.

All architectures you can compile C code to, have their own function call application binary interface, ABI, or "calling conventions".  For example, on 32-bit Windows, when you mark your functions __fastcall, it means they will get their first parameter in the ECX register, and the second in the EDX register.  What we call 'function call overhead' therefore usually includes quite a bit of shuffling data from/to registers and stack, since the way functions take parameters and return value(s) is fixed.  When the compiler inlines a function implementation, it not only gets rid of the calling instruction, but also all that shuffling of values to/from registers and stack.  If the core functionality of the called function is just a few machine instructions, or a short loop like here (on hardware with built-in double precision floating-point math support), the version of the outer poly_sum() function can end up shorter with the other function inlined, because many data-shuffling instructions can be avoided.  Or not; depending on the exact hardware and ABI, which is why it is a good thing we can leave it up to the compiler to decide on a case-by-case basis.

If we were to look at the generated machine code when the C compiler decides to inline the code (say, at Compiler Explorer), we'd also find that when inlined, the compiler omitted the superfluous check of whether n is zero at the beginning of poly(), because the call site, poly_sum(), already checked that earlier, and the compiler knows it will never trigger and can therefore omit the code for checking it altogether.  Yes, many expert programmers will tell you that that check in poly() is "unneeded"; I am telling you that because of the auto-inlining facilities of modern C compilers, it does not hurt either.  You might hide it behind a NDEBUG check or similar, but having that check there is useful, because again, it tells us humans what the author intended when they wrote the function: it is not supposed to return anything useful when the array has zero elements, because then the function \$f(x)\$ is undefined: it has zero terms!

In fact, when the function is marked local-linkage only, static, even when not inlined, the C compiler can note that all call sites have already verified that n will be greater than zero, and even though choosing not to inline it, can generate code for poly() that excludes the explicit check.  Nice!
However, if you forget the static, and think "hey, it's just an extra symbol in the linker which costs nothing, I'll worry about that later, for now it might come in handy elsewhere", the compiler will NOT be allowed to drop that check from the externally visible function, and therefore you'll get machine code that does unnecessary checking (and possibly other unnecessary work), just because you-the-programmer did not want to think about it just now.

One key in programming is being as descriptive as you can to the compiler.  You do NOT leave things as broad as possible "for now" –– say, because you think you might want to use it from another file later on, and believe it might "save you time" by leaving it exported "for now" ––, and then think you'll restrict them later on as an optimization step, because later never comes.  Instead, be as strict and precise and limited as you can be by default, and only relax those when necessary.  This ensures the compiler has maximum amount of information available from the get go, and can do those optimization choices the best way possible.  Even its warnings about stuff it thinks is contradictory are more useful then.  This also means that marking function parameters const just by habit, while deemed "unnecessary" by "1337 c0d3rZ", is actually a Very Good Thing.

That leaves just the difference between static and static inline functions.  For practical purposes, they are identical.  Some compilers do warn about an unused static function but not about an unused static inline function, which is useful, but not required/suggested/based on the C standard.  (Any tricks one might do with external/internal function visibility are better done using compiler extensions, especially via ELF "weak" symbol support.)

Personally, I use this to my advantage, by habitually using static for normal local functions, and static inline for accessor and helper functions (especially those defined in header files).  The compiler does not care, but for us humans, it is another indicator of the author's intent.  If you find you need to expand a static inline function, you should instead create a new function that calls the static inline function to provide the added functionality, because adding extra stuff to an accessor/helper function likely called elsewhere will add unnecessary overhead for those other calls.  On the other hand, a normal local static function is supposed to do meaningful work, so the consideration shifts to refactoring: "is this way of splitting the functionality, into these local functions, the most sensible one?"  Only if the answer is "yeah, this still makes sense" do you go ahead and add the functionality to the existing function; otherwise you refactor the code, changing how you split the overall task into smaller sub-tasks, and which functions handle which ones.
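A small sketch of that convention, using a hypothetical ring buffer (all names here are made up for illustration): the tiny accessors are static inline, the function doing the real work is plain static, and only the entry point is externally visible.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical ring buffer, used only to illustrate the convention. */
struct ring {
    size_t head;              /* next write position */
    size_t tail;              /* next read position */
    unsigned char data[64];
};

/* Accessors/helpers: tiny, expected to vanish into their callers. */
static inline size_t ring_used(const struct ring *const r)
{
    return (r->head - r->tail) % sizeof r->data;
}

static inline size_t ring_free(const struct ring *const r)
{
    /* One slot stays empty so that full and empty are distinguishable. */
    return sizeof r->data - 1 - ring_used(r);
}

/* Normal local function: does the meaningful work. */
static size_t ring_write(struct ring *const r,
                         const unsigned char *const src,
                         const size_t len)
{
    size_t n = 0;

    while (n < len && ring_free(r) > 0) {
        r->data[r->head] = src[n++];
        r->head = (r->head + 1) % sizeof r->data;
    }
    return n;  /* number of bytes actually stored */
}

/* Externally visible entry point. */
size_t ring_put_string(struct ring *const r, const char *const s)
{
    return ring_write(r, (const unsigned char *)s, strlen(s));
}
```

If ring_write() later needs more functionality, it is the natural place to add it; ring_used() and ring_free() stay minimal because other callers rely on them being cheap.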

All this means that whether the compiler does or does not inline a particular function is something you do not worry about!  What you should worry about is whether you, the human writing the code, have given enough information for the compiler to make good optimization decisions like inlining.
Actually, I personally prefer to worry about whether I have given enough information for the compiler to avoid making horribly bad optimization decisions instead.
It is really not at all important for the compiler to always make the optimal choices, because they are already so good that the difference between "good" choice and "optimal" choice is minuscule.  However, horribly bad optimization choices still exist, and those do bite us in the butt: you can lose orders of magnitude of performance by writing "stupid C" which does not allow the compiler to generate efficient machine code, because of the rules.
As programmers and developers, we should always concern ourselves most at the algorithm level –– whether our own approach is an efficient one, or whether some other one would work better ––, making sure we implement the code with sufficient specificity and precision that the compiler can generate good code, or at least avoid generating horrible code.
 
The following users thanked this post: Kittu20

