I have a practical example on function inlining in a program, in an effort to help Kittu20 and others to understand what inlining actually is and is not.
Let's say you are doing some work that calculates the values of univariate polynomials of the form \$f(x) = C_0 + C_1 x + C_2 x^2 + \dots + C_{N-2} x^{N-2} + C_{N-1} x^{N-1}\$, and you don't want to hardcode the values of \$C_k\$ into your C program, but instead read them from a file. Then, you can use a function like
#include <stdlib.h>

double poly(const double x,
            const size_t n,
            const double c[n])
{
    if (n < 1)
        return 0.0; /* Or NaN */
    double result = c[0];
    double factor = x; /* x^1 for the c[1] term, then x^2, x^3, ... */
    for (size_t i = 1; i < n; i++) {
        result += factor * c[i];
        factor *= x;
    }
    return result;
}
to calculate the value of any such polynomial, given its coefficients
c[] and the value of
x to evaluate it at. For example, to calculate \$f(x) = x^3 + 25 x - 4\$, you'd use
n = 4 and
double c[4] = { -4.0, 25.0, 0.0, 1.0 } (always \$C_0\$ first, \$C_{N-1}\$ last; note the zero for the missing \$x^2\$ term).
(In C, you
can provide the length of an array parameter, if you just declare it before the array itself. You can even use multi-dimensional arrays:
(const size_t n, const size_t k, const double m[n][k]) is perfectly valid. For some reason, it seems rarely used. I like to promote it every now and then, because if more programmers were to use it, C compilers would be more likely to grow facilities to report array overruns, giving us programmers a very powerful tool against buffer overrun bugs, the bane of bad C code. The array notation is completely compatible with the pointer notation, and you can supply a pointer to
double as the third parameter to
poly() without any problems, because in C, arrays always decay to a pointer to their initial element. The
size_t type, and not
int, is also the proper type for sizes.)
(Also, while modern C compilers can infer the
const-ness automagically from the code itself, and removing/adding them does not actually cause any change in the machine code generated by current C compilers, they
are important: for us humans. They indicate the author's intent, as a promise that "no attempt to modify the values of these variables will be made in this code" (which the compiler will check, and warn or error about if violated). Simply put, because
x,
n, and the contents of the
c array are marked
const, we humans know their value should stay the same within each invocation of the function, and therefore don't need to spend any mental effort to track them. I use them, because it makes it much easier to maintain code long term for me: instead of looking at the code for a minute, I recognize the function parameter
const-ness, and can parse what the function body does within a few seconds. Other experienced C programmers may be thrown off a bit by the variably-modified array argument, but as I wrote above, I think they too should use it more, instead of the more common pointer notation.)
Let's say, however, that your
actual task is to calculate the sum of the values \$f(x)\$ for a set or array of \$x\$ instead. You come up with a function using a very similar interface:
double poly_sum(const size_t nx,
                const double x[nx],
                const size_t nc,
                const double c[nc])
{
    double result = 0.0;
    if (nx < 1 || nc < 1)
        return 0.0; /* Or NaN */
    for (size_t i = 0; i < nx; i++)
        result += poly(x[i], nc, c);
    return result;
}
Let's assume we write a few tests to verify the code works, and that it indeed does.
Now, in the latter function implementation as machine code, does the C compiler generate a call to the previously defined
poly() function, or does it inline the body of that function?
The only correct answer is that
the compiler gets to choose. It may generate a call, or copy in the machine code implementing the
poly() function. Even when using the exact same compiler and the same optimization settings, it may make different choices on different hardware architectures. And changing the compiler options can of course change the outcome too.
Let's say we realize we don't actually need the
poly() function outside the current compilation unit (this C file at hand), and we mark it
static or
static inline (the two are exactly equivalent here), i.e. "
static double poly(const double x,...". What changes?
Without
static, the compiler must still provide object code for the
poly() function even in the case it chooses to inline its implementation within the
poly_sum() function. The
static keyword tells the compiler we only use it locally, no external linkage is needed, so it can drop the externally-visible implementation. The compiler is still free to choose whether to generate separate code for the
poly() function, or to inline it within the still externally-visible
poly_sum() function.
All architectures you can compile C code to have their own function call application binary interface (ABI), or "calling conventions". For example, on 32-bit Windows, when you mark your functions
__fastcall, it means they will get their first parameter in the ECX register, and the second in the EDX register. What we call 'function call overhead' therefore usually includes quite a bit of shuffling data from/to registers and stack, since the way functions take parameters and return value(s) is fixed. When the compiler inlines a function implementation, it not only gets rid of the calling instruction, but also all that shuffling of values to/from registers and stack. If the core functionality of the called function is just a few machine instructions, or a short loop like here (on hardware with built-in double precision floating-point math support), the version of the outer
poly_sum() function can end up
shorter with the other function inlined, because many data-shuffling instructions can be avoided. Or not; depending on the exact hardware and ABI, which is why it is a good thing we can leave it up to the compiler to decide on a case-by-case basis.
If we were to look at the generated machine code when the C compiler decides to inline the code (at, say,
Compiler Explorer), we'd find that the compiler also omitted the superfluous check of whether
n is zero at the beginning of
poly(), because the call site,
poly_sum(), already checked that earlier: the compiler knows the check will never trigger, and can therefore omit the code for it altogether. Yes, many expert programmers will tell you that that check in
poly() is "unneeded"; I am telling you that because of the auto-inlining facilities of modern C compilers,
it does not hurt either. You might hide it behind a
NDEBUG check or similar, but having that check there is useful, because again, it tells us humans what the author intended when they wrote the function: it is not supposed to return anything useful when the array has zero elements, because then the function \$f(x)\$ is undefined: it has zero terms!
In fact, when the function is marked local-linkage only with
static, the C compiler can note that all call sites have already verified that
n will be greater than zero, and even when choosing not to inline it, can generate code for
poly() that excludes the explicit check. Nice!
However, if you forget the
static, and think "hey, it's just an extra symbol in the linker which costs nothing, I'll worry about that
later, for now it might come in handy elsewhere", the compiler will NOT be allowed to drop that check, and you will therefore get machine code that does unnecessary checking (and possibly other unnecessary work), just because you-the-programmer did not want to think about it just now.
One key in programming is being as descriptive as you can to the compiler. You do NOT leave things as broad as possible "for now" (say, because you think you might want to use it from another file later on, and believe leaving it exported might "save you time"), planning to restrict them later as an optimization step, because later never comes. Instead, be as strict, precise, and limited as you can by default, and only relax those constraints when necessary. This ensures the compiler has the maximum amount of information available from the get-go, and can make those optimization choices in the best way possible. Even its warnings about things it considers contradictory are more useful then. This also means that marking function parameters
const just by habit, while deemed "unnecessary" by "1337 c0d3rZ", is actually a Very Good Thing.
That leaves just the difference between
static and
static inline functions. For practical purposes, they are identical. Some compilers do warn about an unused
static function but not about an unused
static inline function, which is useful, but that behavior is neither required nor suggested by the C standard. (Any tricks one might do with external/internal function visibility are better done using compiler extensions, especially via ELF "weak" symbol support.)
Personally, I use this to my advantage, by habitually using
static for normal local functions, and
static inline for accessor and helper functions (especially those defined in header files). The compiler does not care, but for us humans, it is another indicator of author intent. If you find you need to expand a
static inline function, you should instead create a new function that calls the
static inline function to provide the added functionality, because adding extra stuff to an accessor/helper function likely called elsewhere will add unnecessary overhead for those other calls. On the other hand, a normal local
static function is supposed to do meaningful work, so the consideration shifts to refactoring: "is this way of splitting the functionality, into these local functions, the most sensible one?", and only if the answer is "yeah, this still makes sense" do you go ahead and add the functionality to the existing function; otherwise you refactor the code, changing how you split the overall task into smaller sub-tasks, and which functions handle which ones.
All this means that
whether the compiler does or does not inline a particular function is something you do not worry about! What you should worry about is whether you, the human writing the code, have given enough information for the compiler to make good optimization decisions like inlining.
Actually, I personally prefer to worry about whether I have given enough information for the compiler to
avoid making horribly bad optimization decisions instead.
It is really not at all important for the compiler to always make the optimal choices, because they are already so good that the difference between "good" choice and "optimal" choice is minuscule. However, horribly bad optimization choices still exist, and those do bite us in the butt: you can lose orders of magnitude of performance by writing "stupid C" which does not allow the compiler to generate efficient machine code, because of the rules.
As programmers and developers, we should always concern ourselves most with the algorithm level (whether our own approach is an efficient one, or whether some other one would work better), making sure we implement the code with sufficient specificity and precision that the compiler can generate good code, or at least avoid generating horrible code.